Oct 2025
www.biorxiv.org

New submission 09/01/2024, 09:28:25

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The manuscript by Wagstyl et al. describes an extensive analysis of gene expression in the human cerebral cortex and the association with a large variety of maps capturing many of its microscopic and macroscopic properties. The core methodological contribution is the computation of continuous maps of gene expression for >20k genes, which are being shared with the community. The manuscript is a demonstration of several ways in which these maps can be used to relate gene expression with histological features of the human cortex, cytoarchitecture, folding, function, development and disease risk. The main scientific contribution is to provide data and tools to help substantiate the idea of the genetic regulation of multi-scale aspects of the organisation of the human brain. The manuscript is dense, but clearly written and beautifully illustrated.
  
  Main comments
  
  The starting point for the manuscript is the construction of continuous maps of gene expression for most human genes. These maps are based on the microarray data from 6 left human brain hemispheres made available by the Allen Brain Institute. By technological necessity, the microarray data is very sparse: only 1304 samples to map all the cortex after all subjects were combined (a single individual's hemisphere has ~400 samples). Sampling is also inhomogeneous due to the coronal slicing of the tissue. To obtain continuous maps on a mesh, the authors filled the gaps using nearest-neighbour interpolation followed by strong smoothing. This may have two potentially important consequences that the authors may want to discuss further: (a) the intrinsic geometry of the mesh used for smoothing will introduce structure in the expression map, and (b) strong smoothing will produce substantial, spatially heterogeneous, autocorrelations in the signal, which are known to lead to a significant increase in the false positive rate (FPR) in the spin tests they used.
  
  Many thanks to the reviewer for their considered feedback. We have addressed these primary concerns in point-by-point responses below. The key conclusions from our new analyses are: (i) while the intrinsic geometry of the mesh had not originally been accounted for in sufficient detail, the findings presented in this manuscript are not driven by mesh-induced structure, and (ii) the spin test null models used in this manuscript (including a modified version introduced in response to (i)) are currently the most appropriate way to mitigate inflated false positive rates when making statistical inferences on smooth, surface-based data.
  
  a. Structured smoothing
  
  A brain surface has intrinsic curvature (Gaussian curvature, which cannot be flattened away without tearing). The size of the neighbourhood around each surface vertex will be determined by this curvature. During surface smoothing, this means that the weight of each vertex will also be modulated by the local curvature, i.e., by large geometric structures such as poles, fissures and folds. The article by Ciantar et al (2022, https://doi.org/10.1007/s00429-022-02536-4) provides a clear illustration of this effect: even the mapping of a volume of pure noise into a brain mesh will produce a pattern over the surface strikingly similar to that obtained by mapping resting state functional data or functional data related to a motor task.
  
  Comment 1
  
  It may be important to make the readers aware of this possible limitation, which is in large part a consequence of the sparsity of the microarray sampling and the necessity to map that to a mesh. This may confound the assessments of reproducibility (results, p4). Reproducibility was assessed by comparing pairs of subgroups split from the total 6. But if the mesh is introducing structure into the data, and if the same mesh was used for both groups, then what's being reproduced could be a combination of signal from the expression data and signal induced by the mesh structure.
  
  Response 1
  
  The reviewer raises an important question regarding the potential for interpolation and smoothing on a cortical mesh to induce a common/correlated signal due to the intrinsic mesh structure. We have now generated a new null model to test this idea which indicates that intrinsic mesh structure is not inflating reproducibility in interpolated expression maps. This new null model spins the original samples prior to interpolation, smoothing and comparison between triplet splits of the six donors, with independent spins shared across the triplet. For computational tractability we took one pair of triplets and regenerated the dataset for each triplet using 10 independent spins. We used these to estimate gene-gene null reproducibility for 90 independent pairwise combinations of these 10 spins. Across these 90 permutations, the average median gene-gene correlation was R=0.03, whereas in the unspun triplet comparisons this was R=0.36. These results indicate that the primary source of the gene-level triplet reproducibility is the underlying shared gene expression pattern rather than interpolation-induced structure.
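  The count of 90 null comparisons can be illustrated with a minimal sketch. It assumes, as one reading of the text rather than something stated explicitly, that the same 10 spins were applied to each triplet and that the 90 combinations are the ordered pairs of two different spins, one per triplet. The toy maps are pure noise, and the names `null_reproducibility`, `maps_a` and `maps_b` are hypothetical.

```python
import itertools
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def null_reproducibility(maps_a, maps_b):
    """Correlate each spun map of triplet A with each differently spun map of triplet B."""
    # Ordered pairs (i, j) with i != j: 10 * 9 = 90 when there are 10 spins.
    pairs = list(itertools.permutations(range(len(maps_a)), 2))
    rs = [pearson(maps_a[i], maps_b[j]) for i, j in pairs]
    return pairs, statistics.median(rs)


rng = random.Random(0)
n_spins, n_vertices = 10, 500
# Toy stand-ins: one gene's map per spin for each triplet (noise only, an assumption).
maps_a = [[rng.gauss(0, 1) for _ in range(n_vertices)] for _ in range(n_spins)]
maps_b = [[rng.gauss(0, 1) for _ in range(n_vertices)] for _ in range(n_spins)]
pairs, median_r = null_reproducibility(maps_a, maps_b)
print(len(pairs), round(median_r, 3))
```

  With real data, each toy list would be replaced by a gene's interpolated and smoothed map regenerated after the corresponding spin, and the median would be taken across genes as well as pairs.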
  
  In Methods 2a: "An additional null dataset was generated to test whether the intrinsic geometry of the cortical mesh, and its impact on interpolation, could influence benchmarking analyses of DEMs and gradients (Fig S1d, Fig S2d, Fig S3c). In these analyses, the original samples were rotated on the spherical surface prior to subsequent interpolation, smoothing and gradient calculation. Due to computational constraints the full dataset was recreated only for 10 independent spins. These are referred to as the “spun+interpolated null”."
  
  Author response image 1.
  
  Figure S1d. Gene predictability was higher across all triplet-triplet pairs than in the spun+interpolated null.
  
  Comment 2
  
  It's also possible that mesh-induced structure is responsible in part for the "signal boost" observed when comparing raw expression data and interpolated data (fig S1a). How do you explain the signal boost of the smooth data compared with the raw data otherwise?
  
  Response 2
  
  We thank the reviewer for highlighting this issue of mesh-induced structure. We first sought to quantify the impact of mesh-induced structure through the new null model, in which the data are spun prior to interpolation. New Figures S1d, S2d and S3c all show that the main findings are not driven by interpolation over a common mesh structure, but rather originate in the underlying expression data.
  
  Specifically, for the original Figure S1a, the reviewer highlights a limitation that we compared intersubject predictability of raw-sample to raw-sample and interpolated-to-interpolated. In this original formulation improved prediction scores for interpolated-to-interpolated (the “signal boost”) could be driven by mesh-induced structure being applied to both the input and predicted maps. We have updated this so that we are now comparing raw-to-raw and interpolated-to-raw, i.e. whether interpolated values are better estimations of the measured expression values. The new Fig S1a&b (see below) shows a signal boost in gene-level and vertex level prediction scores (delta R = +0.05) and we attribute this to the minimisation of location and measurement noise in the raw data, improving the intersubject predictability of expression levels.
  
  In Methods 2b: "To assess the effect of data interpolation in DEM generation we compared gene-level and vertex-level reproducibility of DEMs against a “ground truth” estimate of these reproducibility metrics based on uninterpolated expression data. To achieve a strict comparison of gene expression values between different individuals at identical spatial locations we focused these analyses on the subset of AHBA samples where a sample from one subject was within 3 mm geodesic distance of another. This resulted in 1097 instances (spatial locations) with measures of raw gene expression of one donor, and predicted values from the second donor’s un-interpolated AHBA expression data and interpolated DEM. We computed gene-level and vertex-level reproducibility of expression using the paired donor data at each of these sample points for both DEM and uninterpolated AHBA expression values. By comparing DEM reproducibility estimates with those for uninterpolated AHBA expression data, we were able to quantify the combined effect of interpolation and smoothing steps in DEM generation. We used gene-level reproducibility values from DEMs and uninterpolated AHBA expression data to compute a gene-level difference in reproducibility, and we then visualized the distribution of these difference values across genes (Fig S1a). We used gene-rank correlation to compare vertex-level reproducibility values between DEMs and uninterpolated AHBA expression data (Fig S1b)."
  
  Author response image 2.
  
  Figure S1. Reproducibility of Dense Expression Maps (DEMs) interpolated from spatially sparse postmortem measures of cortical gene expression. a, Signal boost in the interpolated DEM dataset vs. spatially sparse expression data. Restricting to samples taken from approximately the same cortical location in pairs of individuals (within 3mm geodesic distance), there was an overall improvement in intersubject spatial predictability in the interpolated maps. Furthermore, genes with lower predictability in the interpolated maps were less predictable in the raw dataset, suggesting these regions exhibit higher underlying biological variability rather than methodologically introduced bias. b, Similarly at the paired sample locations, gene-rank predictability was generally improved in DEMs vs. sparse expression data (median change in R from sparse samples to interpolated for each pair of subjects, +0.5).
  
  Comment 3

  How do you explain that despite the difference in absolute value the combined expression maps of genes with and without cortical expression look similar? (fig S1e: in both cases there are high values in the dorsal part of the central sulcus, in the occipital pole, in the temporal pole, and low values in the precuneus and close to the angular gyrus). Could this also reflect mesh-smoothing-induced structure?
  
  Response 3
  
  As with comment 1, this is an interesting perspective that we had not fully considered. We would first like to clarify that non-cortical expression is defined from the independent datasets including the “cortex” tissue class of the human protein atlas and genes identified as markers for cortical layers or cortical cells in previous studies. This is still likely an underestimate of true cortically expressed genes as some of these “non-cortical genes” had high intersubject reproducibility scores. Nevertheless we think it appropriate to use a measure of brain expression independent of anything included in other analyses for this paper. These considerations are part of the reason we provide all gene maps with accompanying uncertainty scores for user discretion rather than simply filtering them out.
  
  In terms of the spatially consistent pattern of the gene ranks of Fig S1f, this consistent spatial pattern mirrors Transcriptomic Distinctiveness (r=0.52 for non-cortical genes, r=0.75 for cortical genes), so we think that as the differences in expression signatures become more extreme, the relative ranks of genes in that region are more reproducible/easier to predict.
  
  To assess whether mesh-smoothing-induced structure is playing a role, we used the new null model introduced in response to comment 1, and asked if the per-vertex gene rank reproducibility of independently spun subgroup triplets showed a similar structure to that in our original analyses. Across the 90 permutations, the median correlation between vertex reproducibility and TD was R=0.10. We also recalculated the TD maps for the 10 spun datasets and the mean correlation with the original TD did not significantly differ from zero (mean R = 0.01, p=0.2, nspins =10). These results indicate that folding morphology is not the major driver of local or large scale patterning in the dataset. We have included this as a new Figure S3c.
  
  We have updated the text as follows:
  
  In Methods 3a: "Third, to assess whether the covariance in spatial patterning across genes could be a result of mesh-associated structure introduced through interpolation and smoothing, TD maps were recomputed for the spun+interpolated null datasets and compared to the original TD map (Fig S3c)."
  
  In Results: "The TD map observed from the full DEMs library was highly stable between all disjoint triplets of donors (Methods, Fig S3a, median cross-vertex correlation in TD scores between triplets r=0.77) and across library subsets at all deciles of DEM reproducibility (Methods, Fig S3b, cross-vertex correlation in TD scores r>0.8 for the 3rd-10th deciles), but was not recapitulated in spun null datasets (Fig S3c)."
  
  Author response image 3.
  
  Figure S3c, Correlations between TD and TD maps regenerated on datasets spun using two independent nulls, one where the rotation is applied prior to interpolation and smoothing (spun+interpolated) and one where it is applied to the already-created DEMs. In each null, the same rotation matrix is applied to all genes.
  
  Comment 4
  
  Could you provide more information about the way in which the nearest-neighbours were identified (results p4). Were they nearest in Euclidean space? Geodesic? If geodesic, geodesic over the native brain surface? over the spherically deformed brain? (Methods cite Moresi & Mather's Stripy toolbox, which seems to be meant to be used on spheres). If the distance was geodesic over the sphere, could the distortions introduced by mapping (due to brain anatomy) influence the geometry of the expression maps?
  
  Response 4
  
  We have clarified in the Methods that the mapping is to nearest neighbors on the spherically-inflated surface.
  
  The new null model we have introduced in response to comments 1 & 3 preserves any mesh-induced structure alongside any smoothing-induced spatial autocorrelations, and the additional analyses above indicate that the main results are not induced by systematic mesh-related interpolation signal. In response to an additional suggestion from the reviewer (Comment 13), we also assessed whether local distortions due to the mesh could be creating apparent border effects in the data, for instance at the V1-V2 boundary. At the V1-V2 border, which coincides anatomically with the calcarine sulcus, we identified the 10 genes with the highest expression gradient along this boundary in the actual dataset and the spun+interpolated null. The median expression gradient along this border in the actual dataset was higher than in any of the spun datasets, indicating that these boundary effects are not explained by the effects of interpolation and cortical geometry on the data (new Fig S2d). The text has been updated as follows:
  
  In Methods 1: "For cortical vertices with no directly sampled expression, expression values were interpolated from their nearest sampled neighbor vertex on the spherical surface (Moresi and Mather, 2019) (Fig 1b)."
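  As a minimal sketch of nearest-neighbour interpolation on a sphere (not the Stripy-based implementation cited above), each unsampled vertex can take the value of the sample with the smallest great-circle distance, which on a unit sphere is the sample with the largest dot product. The function name `nearest_sample_interpolate` and the toy coordinates are hypothetical.

```python
import math


def unit(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def nearest_sample_interpolate(vertices, sample_coords, sample_values):
    """Assign each sphere vertex the value of its nearest sampled point.

    On a unit sphere the great-circle distance is arccos(u . s), so the
    nearest neighbour is the sample with the largest dot product.
    """
    samples = [unit(s) for s in sample_coords]
    filled = []
    for v in vertices:
        u = unit(v)
        dots = [sum(a * b for a, b in zip(u, s)) for s in samples]
        filled.append(sample_values[dots.index(max(dots))])
    return filled


# Toy example: two samples at the poles, three vertices to fill in.
sample_coords = [(0, 0, 1), (0, 0, -1)]
sample_values = [2.5, -1.0]
vertices = [(0.1, 0.0, 0.9), (0.3, 0.2, -0.8), (1.0, 0.0, 0.2)]
filled = nearest_sample_interpolate(vertices, sample_coords, sample_values)
print(filled)  # [2.5, -1.0, 2.5]
```

  The brute-force search is only for illustration; with a full cortical mesh a spatial index would be needed for reasonable run times.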
  
  In Methods 2: "We used the spun+interpolated null to test whether high gene gradients could be driven by non-uniform interpolation across cortical folds. We quantified the average gradient for all genes along the V1-V2 border in the atlas, as well as for 10 iterations of the atlas where the samples were spun prior to interpolation. We computed the median gradient magnitude for the 20 top-ranked genes for each (Fig S2d)."
  
  Author response image 4.
  
  Figure S2d. Mean of gradient magnitudes for 20 genes with largest gradients along V1-V2 border, compared to values along the same boundary on the spun+interpolated null atlas. Gradients were higher in the actual dataset than in all spun versions, indicating this high gradient feature is not primarily due to the effects of calcarine sulcus morphology on interpolation.
  
  Comment 5
  
  Could you provide more information about the smoothing algorithm? Volumetric, geodesic over the native mesh, geodesic over the sphere, averaging of values in neighbouring vertices, cotangent-weighted laplacian smoothing, something else?
  
  Response 5
  
  We use surface-based smoothing, geodesic over the white surface, as described in Glasser et al., 2013 and implemented in the HCP workbench toolbox (https://www.humanconnectome.org/software/connectome-workbench). We have updated the methods to clarify this.
  
  In Methods 1: "Surface expression maps were smoothed using the Connectome Workbench toolbox (Glasser et al. 2013) with a 20mm full-width at half maximum Gaussian kernel, selected to be consistent with this sampling density (Fig 1c)."
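  For readers more used to kernel widths expressed as a standard deviation, the following sketch converts the 20mm FWHM to sigma using the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·sigma. It is a generic calculation, not a description of Connectome Workbench internals; the helper names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full-width at half maximum to its standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm, sigma_mm):
    """Unnormalised Gaussian weight at a given (geodesic) distance."""
    return math.exp(-0.5 * (distance_mm / sigma_mm) ** 2)


sigma = fwhm_to_sigma(20.0)
print(round(sigma, 2))                         # 8.49
print(round(gaussian_weight(10.0, sigma), 3))  # 0.5 (half maximum at FWHM / 2)
```

  A 20mm FWHM therefore corresponds to a sigma of roughly 8.5mm, so a vertex 10mm away along the surface contributes half the weight of the centre vertex before normalisation.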
  
  Comment 6
  
  Could you provide more information about the method used for computing the gradient of the expression maps (p6)? The gradient and the laplacian operator are related (the laplacian is the divergence of the gradient), which could also be responsible in part for the relationships observed between expression transitions and brain geometry.
  
  Response 6
  
  We are using Connectome Workbench’s metric gradient command for this (Glasser et al., 2013), as used in the HCP workbench pipeline. The source code for gradient calculation can be found here: https://github.com/Washington-University/workbench/blob/131e84f7b885d82af76ebe21adf2fa97795e2484/src/Algorithms/AlgorithmMetricGradient.cxx
  
  In Methods 2: "For each of the resulting 20,781 gene-level expression maps, the orientation and magnitude of gene expression change at each vertex (i.e. the gradient) was calculated for folded, inflated, spherical and flattened mesh representations of the cortical sheet using Connectome Workbench’s metric gradient command (Glasser et al. 2013)."

  b. Potentially inflated FPR for spin tests on autocorrelated data
  
  Spin tests are extensively used in this work and it would be useful to make the readers aware of their limitations, which may confound some of the results presented. Spin tests aim at establishing if two brain maps are similar by comparing a measure of their similarity over a spherical deformation of the brains against a distribution of similarities obtained by randomly spinning one of the spheres. It is not clear which specific variety of spin test was used, but the original spin test has well known limitations, such as the violation of the assumption of spatial stationarity of the covariance structure (not all positions of the spinning sphere are equivalent, some are contracted, some are expanded), or the treatment of the medial wall (a big hole with no data is introduced when hemispheres are isolated).
  
  Another important limitation results from the comparison of maps showing autocorrelation. This problem has been extensively described by Markello & Misic (2021). The strong smoothing used to make a continuous map out of just ~1300 samples introduces large, geometry dependent autocorrelations. Indeed, the expression maps presented in the manuscript look similar to those with the highest degree of autocorrelation studied by Markello & Misic (alpha=3). In this case, naive permutations should lead to a false positive rate ~46% when comparing pairs of random maps, and even the most sophisticated methods have FPR>10%.
  
  Comment 7

  There are currently several researchers working on testing spatial similarity, and the readers would benefit from being made aware of the problem of the spin test and potential solutions. There are also packages providing alternative implementations of spin tests, such as BrainSMASH and BrainSpace, which could be mentioned.
  
  Response 7
  
  We thank the reviewer for raising the issue of null models. First, with reference to the false positive rate of 46% when maps exhibit spatial autocorrelation, we absolutely agree that this is an issue that must be accounted for and we address this using the spin test. We acknowledge there has been other work on nulls such as BrainSMASH and BrainSpace. Nevertheless, in the Markello and Misic paper to which the reviewer refers, the BrainSMASH null models perform worse with smoother maps (with false positive rates approaching 30% in panel e below), whereas the spin test maintains false positive rates below 10%.
  
  Author response image 5.
  
  We have added a brief description of the challenge and our use of the spin test.
  
  In Methods 2a: "Cortical maps exhibit spatial autocorrelation that can inflate the False Positive Rate, for which a number of methods have been proposed (Alexander-Bloch et al. 2018; Burt et al. 2020; Vos de Wael et al. 2020). At higher degrees of spatial smoothness, this high False Positive Rate is most effectively mitigated using the spin test (Alexander-Bloch et al. 2018; Markello and Misic 2021; Vos de Wael et al. 2020). In the following analyses when generating a test statistic comparing two spatial maps, to generate a null distribution, we computed 1000 independent spins of the cortical surface using https://netneurotools.readthedocs.io, and applied them to the first map whilst keeping the second map unchanged. The test statistic was then recomputed 1000 times to generate a null distribution for values one might observe by chance if the maps shared no common organizational features. This is referred to throughout as the “spin test” and the derived p-values as pspin."
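  The logic of the spin test described above can be sketched in plain Python. This is an illustrative toy, not the netneurotools implementation: it assumes a Fibonacci lattice as a stand-in for mesh vertices, ignores the medial wall and hemispheres, uses 100 spins rather than 1000 to keep it fast, and uses a two-sided p-value with a +1 correction, which is an assumed convention. All function names are hypothetical.

```python
import math
import random


def fibonacci_sphere(n):
    """Roughly uniform points on the unit sphere (stand-in for mesh vertices)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def random_rotation(rng):
    """Uniformly random 3D rotation matrix from a normalised Gaussian quaternion."""
    q = [rng.gauss(0, 1) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)),
        (2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)),
        (2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)),
    )


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def spin_map(values, coords, rot):
    """Rotate the sphere, then give each vertex the value of the nearest rotated vertex."""
    rotated = [tuple(dot(row, c) for row in rot) for c in coords]
    spun = []
    for c in coords:
        dots = [dot(c, r) for r in rotated]
        spun.append(values[dots.index(max(dots))])
    return spun


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spin_test(map_a, map_b, coords, n_spins, rng):
    """Return the observed correlation, the spin null distribution and p_spin."""
    r_obs = pearson(map_a, map_b)
    # Spin only the first map; the second map stays fixed.
    null = [pearson(spin_map(map_a, coords, random_rotation(rng)), map_b)
            for _ in range(n_spins)]
    n_extreme = sum(abs(r) >= abs(r_obs) for r in null)
    return r_obs, null, (1 + n_extreme) / (1 + n_spins)


rng = random.Random(0)
coords = fibonacci_sphere(100)
# Two toy maps sharing a smooth superior-inferior trend plus independent noise.
map_a = [c[2] + rng.gauss(0, 0.1) for c in coords]
map_b = [c[2] + rng.gauss(0, 0.1) for c in coords]
r_obs, null, p_spin = spin_test(map_a, map_b, coords, n_spins=100, rng=rng)
print(round(r_obs, 2), p_spin)
```

  Because each spin preserves the spatial smoothness of the first map, the null correlations are far more dispersed than they would be under vertex-wise shuffling, which is the property that guards against inflated false positive rates for autocorrelated maps.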
  
  Comment 8
  
  Could it be possible to measure the degree of spatial autocorrelation?
  
  Response 8
  
  We agree this could be a useful metric to generate for spatial cortical maps. However, there are multiple potential metrics to choose from and each of the DEMs would have their own value. To address this properly would require the creation of a set of validated tools and it is not clear how we could summarize this variety of potential metrics for 20k genes. Moreover, as discussed above the spin method is an adequate null across a range of spatial autocorrelation degrees, thus while we agree that in general estimation of spatial smoothness could be a useful imaging metric to report, we consider that it is beyond the scope of the current manuscript.
  
  Comment 9
  
  Could you clarify which version of the spin test was used? Does the implementation come from a package or was it coded from scratch?
  
  Response 9
  
  As Markello & Misic note, at the vertex level, the various implementations of the spin test become roughly equivalent to the ‘original’ Alexander-Bloch et al. implementation. We used the code for the ‘original’ version implemented in python here: https://netneurotools.readthedocs.io/en/latest/_modules/netneurotools/stats.html#gen_spinsamples.
  
  This has been updated in the methods (see Response 7).
  
  Comment 10
  
  Cortex and non-cortex vertex-level gene rank predictability maps (fig S1e) are strikingly similar. Would the spin test come up statistically significant? What would be the meaning of that, if the cortical map of genes not expressed in the cortex appeared to be statistically significantly similar to that of genes expressed in the cortex?
  
  Response 10
  
  Please see response to comment 3, which also addresses this observation.
  
  Reviewer #2 (Public Review):
  
  The authors convert the AHBA dataset into a dense cortical map and conduct an impressively large number of analyses demonstrating the value of having such data.
  
  I only have comments on the methodology.
  
  Comment 1
  
  First, the authors create dense maps by simply using nearest neighbour interpolation followed by smoothing. Since one of the main points of the paper is the use of a dense map, I find it quite light in assessing the validity of this dense map. The reproducibility values they calculate by taking subsets of subjects are hugely under-powered, given that there are only 6 brains, and they don't inform on local, vertex-wise uncertainties. I wonder if the authors would consider using Gaussian process interpolation. It is really tailored to this kind of problem and can give local estimates of uncertainty in the interpolated values. For hyperparameter tuning, they could use leave-one-brain-out.
  
  I know it is a lot to ask to change the base method, as that means re-doing all the analyses. But I think it would strengthen the paper if the authors put as much effort in the dense mapping as they did in their downstream analyses of the data.
  
  Response 1
  
  We thank the reviewer for the suggestion to explore Gaussian process interpolation. We have implemented this for our dataset and attempted to compare this with our original method with the 3 following tests: i) intertriplet reproducibility of individual gene maps, ii) microscale validations: area markers, iii) macroscale validations: bio patterns.
  
  Overall, compared to our original nearest-neighbor interpolation method, GP regression (i) did not substantially improve gene-level reproducibility of expression maps (median correlation increase of R=0.07, which was greater for genes without documented protein expression in cortex), (ii) substantially worsened performance in predicting areal marker genes, and (iii) showed similar but slightly worse performance at predicting macroscale patterns from Figure 1.
  
  Given the significantly poorer performance on one of our key tests (ii) we have opted not to replace our original database, but we do now include code for the alternative GP regression methodology in the github repository so others can reproduce/further develop these methods.
  
  Author response image 6.
  
  ii) Genes ranked by mean expression gradient from current DEMs (left) and Gaussian process-derived interpolation maps (right). Established human and macaque markers are consistently higher-ranked in DEM maps. iii) Figure 1 Interpolated vs GP regression
  
  Author response table 1.
  
  Comment 2
  
  It is nice that the authors share some code and a notebook, but I think it is rather light. It would be good if the code was better documented, and if the user could have access to the non-smoothed data, in case they want to produce their own dense maps. I was only wondering why the authors didn't share the code that reproduces the many analyses/results in the paper.
  
  Response 2
  
  We thank the reviewer for this suggestion. In response we have updated the shared github repository (https://github.com/kwagstyl/magicc). This now includes code and notebooks to reproduce the main analyses and figures.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Minor comments
  
  Comment 11
  
  p4 mentions Fig S1h, but the supp figures only go from S1a to S1g
  
  Response 11
  
  We thank the reviewer for capturing this error. It was in fact referring to what is now Fig S1h and has been updated.
  
  Comment 12
  
  It would be important that the authors share all the code used to produce the results in the paper in addition to the maps. The core methodological contribution of the work is a series of continuous maps of gene expression, which could become an important tool for annotation in neuroimaging research. Many arbitrary (reasonable) decisions were made, it would be important to enable users to evaluate their influence on the results.
  
  Response 12
  
  We thank both reviewers for this suggestion. We have updated the GitHub repository so that the dense maps and key figures can be reproduced with our methods.
  
  Comment 13
  
  p5: Could the sharp border reflect the effect of the geometry of the calcarine sulcus on map smoothing? More generally, could there be an effect of folds on TD?
  
  Response 13
  
  Please see our response to Reviewer 1, Comment 1 above, where we introduce the new null models now analyzed to test for effects of mesh geometry on our findings. These new null models, in which the original source data were spun prior to interpolation, suggest that neither the sharp V1/2 border nor the TD map is an effect of mesh geometry. Specifically: (i) the magnitudes of gradients along the V1/2 boundary from null models were notably smaller than those in our original analyses (see new Figure S2d), and (ii) TD maps computed from the new null models showed no correlation with TD maps from our original analyses (new Figure S3c, mean R = 0.01, p=0.2, nspins =10).
  
  Comment 14
  
  p5: Similar for the matching with the areas in Glasser's parcellation: the definition of these areas involves alignment through folds (based on freesurfer 'sulc' map, see Glasser et al 2016). If folds influence the geometry of TDs, could that influence the match?
  
  Response 14
  
  We note that Fig S3c provided evidence that folding was not the primary driver of the TD patterning. However, it is true that Glasser et al. use both neuroanatomy (folding, thickness and myelin) and fMRI-derived maps to delineate their cortical areas. As such Figure 2 f & g aren’t fully independent assessments. Nevertheless the reason that these features are used is that many of the sulci in question have been shown to reliably delineate cytoarchitectonic boundaries (Fischl et al., 2008).
  
  In Results: "A similar alignment was seen when comparing gradients of transcriptional change with the spatial orientation of putative cortical areas defined by multimodal functional and structural in vivo neuroimaging (Glasser et al., 2016) (expression change running perpendicular to area long-axis, pspin<0.01, Fig 2g, Methods)."
  
  Comment 15
  
  p6: TD peaks are said to overlap with functionally-specialised regions. A comment on why audition is not there, nor language, but ba 9-46d is? Would that suggest a lesser genetic regulation of those functions?
  
  Response 15
  
  The reviewer raises a valid point and this was a result that we were also surprised by. The finding that the auditory cortex is not as microstructurally distinctive as, say, V1 is consistent with other studies applying dimensionality-reduction techniques to multimodal microstructural receptor data (e.g. Zilles et al., 2017, Goulas et al., 2020). These studies found that the auditory microstructure is not as extreme as that of either visual or somatomotor areas. From a methodological viewpoint, the primary auditory cortex is significantly smaller than both visual and somatomotor areas, and therefore is captured by fewer independent samples, which could reduce the detail in which its structure is being mapped in our dataset.
  
  For the frontal areas, we would note that i) the frontal peak is the smallest of all peaks found and was more strongly characterised by low z-score genes than high z-score genes, and ii) the anatomical areas in the frontal cortex are much more highly variable with respect to folding morphology (e.g. Rajkowska 1995). The anatomical label of ba9-46d (and indeed all other labels) was automatically generated as a localiser rather than a strict area label. We have clarified this in the text as follows:
  
  In Methods 3a: "Automated labels to localize TD peaks were generated based on their intersection with a reference multimodal neuroimaging parcellation of the human cortex (Glasser et al., 2016). Each TD was given the label of the multimodal parcel that showed greatest overlap (Fig 2b)."
  
  Comment 16.
  
  p7: The proposition that "there is a tendency for cortical sulci to run perpendicular to the direction of fastest transcriptional change", could also be "there is a tendency for the direction of fastest transcriptional change to run perpendicular to cortical sulci"? More pragmatically, this could result from the geometry of transcriptional maps being influenced by sulcal geometry in their construction.
  
  Response 16
  
  Please see our response to Reviewer 1, Comment 1 above, where we introduce the new null models now analyzed to test for effects of mesh geometry on our findings. These models indicate that the topography of interpolated gene expression maps does not reflect influences of sulcal geometry on their construction.
  
  Comment 17
  
  p7: TD transitions are indicated to precede folding. This is based on a consideration of folding development based on the article by Chi et al 1977, which is quite an old reference. In that paper, the authors estimated the tempo of human folding development based on the inspection of photographs, which may not be sufficient for detecting the first changes in curvature leading to folds. The work of the Developing Human Connectome consortium may provide a more recent indication for timing. In their data, by PCW 21 there's already central sulcus, pre-central, post-central, intra-parietal, superior temporal, superior frontal which can be detected by computing the mean curvature of the pial surface (I can only provide a tweet for reference: https://twitter.com/R3RT0/status/1617119196617261056). Even by PCW 9-13 the callosal sulcus, sylvian fissure, parieto-occipital fissure, olfactory sulcus, cingulate sulcus and calcarine fissure have been reported to be present (Kostovic & Vasung 2009).
  
  Response 17
  
  Our field lacks the data necessary to provide a comprehensive empirical test for the temporal ordering of regional transcriptional profiles and the emergence of folding. Our results show that the transcriptional identities of V1 and TGd are, at the least, present at the very earliest stages of sulcation in these regions. In response to the reviewer's comment, we have updated the text to cite a fetal mapping project that similarly shows evidence of folds between weeks 17 and 21, and we have made the language around directionality more cautious.
  
  In Results: "The observed distribution of these angles across vertices was significantly skewed relative to a null based on random alignment between angles (pspin<0.01, Fig 2f, Methods) - indicating that there is indeed a tendency for cortical sulci and the direction of fastest transcriptional change to run perpendicular to each other.
  
  As a preliminary probe for causality, we examined the developmental ordering of regional folding and regional transcriptional identity. Mapping the expression of high-ranking TD genes in fetal cortical laser dissection microarray data(Miller et al., 2014) from 21 PCW (Post Conception Weeks) (Methods) showed that the localized transcriptional identity of V1 and TGd regions in adulthood is apparent during the fetal periods when folding topology begins to emerge (Chi et al. 1977; Xu et al. 2022) (Fig S2d)."
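
  To make the angle statistic in the quoted Results text concrete, the sketch below computes the angle between a sulcal direction and a direction of fastest transcriptional change at a single vertex. It is a minimal illustration, not the authors' implementation: the function name and the example vectors are hypothetical, and we assume both directions are treated as undirected lines (so angles fall in [0, 90] degrees), which the section does not state explicitly.

```python
import math


def axial_angle_deg(u, v):
    """Angle in [0, 90] degrees between two undirected lines given as 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("direction vectors must be non-zero")
    # abs() removes the arbitrary sign of each direction; min() guards against
    # floating-point values marginally above 1 before calling acos.
    cos_theta = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.degrees(math.acos(cos_theta))


# Hypothetical vectors: a sulcus running along x, expression changing fastest along y.
sulcal_direction = (1.0, 0.0, 0.0)
expression_gradient = (0.0, 2.0, 0.0)
print(axial_angle_deg(sulcal_direction, expression_gradient))
```

  A distribution of such angles across vertices that is skewed toward 90 degrees, relative to a spatial null, is what the quoted passage describes as a tendency toward perpendicularity.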
  
  In Discussion: "By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a “protomap”(Rakic 1988; O'Leary 1989; O'Leary et al. 2007; Rakic et al. 2009) like model where the placement of some cortical folds is set-up by rapid tangential changes in cyto-laminar composition of the developing cortex(Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding(Llinares-Benadero and Borrell 2019; Heuer and Toro 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship."
  
  Comment 18
  
  p7: In my supplemental figures (obtained from biorxiv, because I didn't find them among the files submitted to eLife) there's no S2j (only S2a-S2i).
  
  Response 18
  
  We apologize, this figure refers to S3k (formerly S3j), rather than S2j. We have updated the main text.
  
  Comment 19
  
  p7: It is not clear from the methods (section 3b) how the adult and fetal brains were compared. Maybe using MSM (Robinson et al 2014)?
  
  Response 19
  
  We have now clarified this in Methods text as reproduced below.
  
  In Methods 3b: "We averaged scaled regional gene expression values between donors per gene, and filtered for genes in the fetal LDM dataset that were also represented in the adult DEM dataset - yielding a single final 20,476*235 gene-by-sample matrix of expression values for the human cortex at 21 PCW. Each TD peak region was then paired with the closest matching cortical label within the fetal regions. This matrix was then used to test if each TD expression signature discovered in the adult DEM dataset (Fig 2, Table 3) was already present in similar cortical regions at 21 PCW."
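
  The averaging and filtering step described in Methods 3b can be sketched as below. This is an illustration under assumptions: the nested-dictionary layout, the function name, and the toy gene and region names are hypothetical, and we assume samples are keyed by a region label shared across donors.

```python
from statistics import mean


def average_and_filter(fetal_by_donor, adult_genes):
    """Average expression across donors per gene and region, keeping shared genes only.

    fetal_by_donor: {donor: {gene: {region: scaled expression value}}} (hypothetical layout)
    adult_genes: set of gene names represented in the adult dataset
    """
    pooled = {}
    for donor_expression in fetal_by_donor.values():
        for gene, regions in donor_expression.items():
            if gene not in adult_genes:
                continue  # drop genes that are absent from the adult dataset
            for region, value in regions.items():
                pooled.setdefault(gene, {}).setdefault(region, []).append(value)
    # Collapse the per-donor values into one mean per gene and region.
    return {
        gene: {region: mean(values) for region, values in regions.items()}
        for gene, regions in pooled.items()
    }


# Toy data (hypothetical): two donors, two genes, one of which is missing in the adult set.
fetal_by_donor = {
    "donor1": {"GENE_A": {"V1": 1.0, "TGd": 0.5}, "GENE_B": {"V1": 2.0}},
    "donor2": {"GENE_A": {"V1": 3.0, "TGd": 1.5}, "GENE_B": {"V1": 4.0}},
}
fetal_matrix = average_and_filter(fetal_by_donor, adult_genes={"GENE_A"})
print(fetal_matrix)
```

  The result plays the role of the gene-by-sample matrix in the quoted text, restricted to genes available in both datasets.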
  
  Comment 20
  
  p7: WGCNA is used prominently, could you provide a brief introduction to its objectives? The gene coexpression networks are produced after adjusting the weight of the network edges to follow a scale-free topology, which is meant to reflect the nature of protein-protein interactions. Soft thresholding increases contrast, but doesn't this decrease a potential role of infinitesimal regulatory signals?
  
  Response 20
  
  We agree with the reviewer that the introduction to WGCNA needed additional detail and have amended the Results (see below). One limitation of WGCNA-derived associations is that the method down-weights smaller relationships, including potentially important regulatory signals, because it has been tuned to capture strong relationships. This is an inherent limitation of all co-expression-driven methods and leads to an incomplete characterisation of the molecular biology. Nevertheless, we feel these stronger relationships are still worth capturing and interrogating. We have updated the text to introduce WGCNA and acknowledge this potential weakness in the approach.
  
  In Results: "Briefly, WGCNA constructs a connectivity matrix by quantifying pairwise co-expression between genes, raising the correlations to a power (here 6) to emphasize strong correlations while penalizing weaker ones, and creating a Topological Overlap Matrix (TOM) to capture both pairwise similarities in expression and connectivity. Modules of highly interconnected genes are identified through hierarchical clustering. The resultant WGCNA modules enable topographic and genetic integration because they each exist as both (i) a single expression map (eigenmap) for spatial comparison with neuroimaging data (Fig 3a,b, Methods) and, (ii) a unique gene set for enrichment analysis against marker genes systematically capturing multiple scales of cortical organization, namely: cortical layers, cell types, cell compartments, protein-protein interactions (PPI) and GO terms (Methods, Table S2 and S4)."
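
  The soft-thresholding and topological-overlap steps summarised above can be illustrated with a small sketch. It assumes the commonly used unsigned formulation (adjacency = |correlation| raised to a power, followed by the standard topological overlap measure); whether the authors used a signed or unsigned network is not stated in this section, the function names are hypothetical, and the hierarchical clustering step is omitted.

```python
def soft_threshold_adjacency(corr, power=6):
    """Unsigned adjacency: |correlation| ** power, with a zero diagonal."""
    n = len(corr)
    return [
        [0.0 if i == j else abs(corr[i][j]) ** power for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """Topological overlap: (shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # k_i; the diagonal of adj is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            denominator = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denominator
    return tom


# Toy correlation matrix (hypothetical): genes 0 and 1 are strongly co-expressed,
# gene 2 is only moderately correlated with both.
corr = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
adj = soft_threshold_adjacency(corr, power=6)
tom = topological_overlap(adj)
print(adj[0][1], adj[0][2])
print(tom[0][1], tom[0][2])
```

  With a power of 6, a correlation of 0.9 keeps roughly half of its weight while a correlation of 0.5 drops below 0.02, which is the contrast enhancement, and the down-weighting of weak signals, discussed in Response 20.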
  
  Comment 21
  
  WGCNA modules look even more smooth than the gene expression maps. Are these maps comparable to low frequency eigenvectors? Autocorrelation in that case should be very strong?
  
  Response 21
  
  These modules are smooth because they are indeed eigenvectors, which likely smooth out some of the more detailed but less common features seen in individual gene maps. They do exhibit high degrees of autocorrelation. Nevertheless, we apply the spin test, which is currently the appropriate null model for spatially autocorrelated cortical maps (Response 7).
  
  Comment 22
  
  If the WGCNA modules provide an orthogonal basis for surface data, is it completely unexpected that some of them will correlate with low-frequency patterns? What would happen if random low frequency patterns were generated? Would they also show correlations with some of the 16 WGCNA modules?
  
  Response 22
  
  We agree with the reviewer that if we used a generative model like BrainSMASH, we would likely see similar low-frequency patterns. However, the figure inserted in Response 7 from Markello & Misic provides evidence that it is not as conservative a null as the spin test when data exhibit high spatial autocorrelation. The spatial enrichment tests on the WGCNA modules are all carried out using the spin test.
  
  Comment 23
  
  In part (a) I commented on the possibility that brain anatomy may introduce artifactual structure into the data that's being mapped. But what if the relationship between brain geometry and brain organisation were deeper than just the introduction of artefacts? The work of Lefèvre et al (2014, https://doi.org/10.1109/ICPR.2014.107; 2018, https://doi.org/10.3389/fnins.2018.00354) shows that clustering based on the 3 lowest frequency eigenvectors of the Laplacian of a brain hemisphere mesh produces an almost perfect parcellation into lobes, with remarkable coincidences between parcel boundaries and primary folds and fissures. The work of Pang et al (https://doi.org/10.1101/2022.10.04.510897) suggests that the geometry of the brain plays a critical role in constraining its dynamics: they analyse >10k task-evoked brain maps and show that the eigenvectors of the brain Laplacian parsimoniously explain the activity patterns. Could brain anatomy have a downward effect on brain organisation?
  
  Response 23
  
  The reviewer raises a fascinating extension of our work identifying spatial modes of gene expression. We agree that these are low frequency in nature, but would first like to note that the newly introduced null model indicates that the overlaps with salient neuroanatomical features are inherent in the expression data and not purely driven by anatomy in a methodological sense.
  
  Nevertheless we absolutely agree there is likely to be a complex multidirectional interplay between genetic expression patterns through development, developing morphology and the “final” adult topography of expression, neuroanatomical and functional patterns.
  
  The current manuscript already contains many in-depth analyses of these expression data, but we agree that a more extensive modeling analysis of how expression might pattern or explain functional activation would be a fascinating follow-on, especially in light of the studies from Pang and Lefèvre. We think that this must be left for a future modeling paper integrating these modes of microscale, macroscale and functional anatomy.
  
  In Discussion: "Indeed, future work might find direct links between these module eigenvectors and similar low-frequency eigenvectors of cortical geometry, which have been used as basis functions to segment the cortex (Lefèvre et al. 2018) and explain complex functional activation patterns(Pang et al. 2023)."
  
  Comment 24
  
  On p11: ASD related to rare, deleterious mutations of strong effect is often associated with intellectual disability (where the social interaction component of ASD is more challenging to assess). Was there some indication of a relationship with that type of cognitive phenotype?
  
  Response 24
  
  Across the two ABIDE cohorts, the total number of participants with ASD and IQ <70 (the clinical threshold for intellectual disability) was n=10, which unfortunately did not allow us to conduct a meaningful test of whether ID impacts the relationship between imaging changes in ASD and the expression maps of genes implicated in ASD by rare variants.
  
  Comment 25
  
  Could you clarify if the 6 donors were aligned using the folding-based method in freesurfer?
  
  Response 25
  
  The 6 donors were aligned using MSMSulc (Robinson et al., 2014), which is a folding-based method from the HCP group. This is now clarified in the methods.
  
  In Methods 1: "Cortical surfaces were reconstructed for each AHBA donor MRI using FreeSurfer(Fischl, 2012), and coregistered between donors using surface matching of individuals’ folding morphology (MSMSulc) (Robinson et al., 2018)."
  
  Comment 26
  
  The authors make available a rich resource and a series of tools to facilitate their use. They have paid attention to encode their data in standard formats, and their code was made in Python using freely accessible packages instead of proprietary alternatives such as matlab. All this should greatly facilitate the adoption of the approach. I think it would be important to state more explicitly the conceptual assumptions that the methodology brings. In the same way that a GWAS approach relies on a Mendelian idea that individual alleles encode for phenotypes, what is the idea about the organisation of the brain implied by the orthogonal gene expression modules? Is it that phenotypes - micro and macro - are encoded by linear combinations of a reduced number of gene expression patterns? What would be the role of the environment? The role of non-genic regulatory regions? Some modalities of functional organisation do not seem to be encoded by the expression of any module. Is it just for lack of data or should this be seen as the sign for a different organisational principle? Likewise, what about the aspects of disorders that are not captured by expression modules? Would that hint, for example, to stronger environmental effects? What about linear combinations of modules? Nonlinear? Overall, the authors adopt implicitly, en passant, a gene-centric conceptual standpoint, which would benefit from being more clearly identified and articulated. There are citations to Rakic's protomap idea (I would also cite the original 1988 paper, and O'Leary's 1989 "protocortex" paper stressing the role of plasticity), which proposes that a basic version of brain cytoarchitecture is genetically determined and transposed from the proliferative ventricular zone regions to the cortical plate through radial migration. In p13 the authors indicate that their results support Rakic's protomap. 
Additionally, in p7 the authors suggest that their results support a causal arrow going from gene expression to sulcal anatomy. The reviews by O'Leary et al (2007), Ronan & Fletcher (2014, already cited), Llinares-Benadero & Borrell (2019) could be considered, which also advocate for a similar perspective. For nuances on the idea that molecular signals provide positional information for brain development, the article by Sharpe (2019, DOI: 10.1242/dev.185967) is interesting. For nuances on the gene-centric approach of the paper the articles by Rockman (2012, DOI: 10.1111/j.1558-5646.2011.01486.x) but also from the ENCODE consortium showing the importance of non-genic regions of the genome ("Perspectives on ENCODE" 2020 DOI: 10.1038/s41586-021-04213-8) could be considered. I wouldn't ask to cite ideas from the extended evolutionary synthesis about different inheritance systems (as reviewed by Jablonka & Lamb, DOI: 10.1017/9781108685412) or the idea of inherency (Newman 2017, DOI: 10.1007/978-3-319-33038-9_78-1), but the authors may find them interesting. Same goes for our own work on mechanical morphogenesis which expands on the idea of a downward causality (Heuer and Toro 2019, DOI: 10.1016/j.plrev.2019.01.012)
  
  Response 26
  
  We thank the reviewer for recommending these papers, which we enjoyed reading and have deepened our thinking on the topic. In addition to toning down some of the language with respect to causality that our data cannot directly address, we have included additional discussion and references as follows:
  
  In Discussion: "By establishing that some of these cortical zones are evident at the time of cortical folding, we lend support to a “protomap”(Rakic 1988; O'Leary 1989; O'Leary et al. 2007; Rakic et al. 2009) like model where the placement of some cortical folds is set-up by rapid tangential changes in cyto-laminar composition of the developing cortex(Ronan et al., 2014; Toro and Burnod, 2005; Van Essen, 2020). The DEMs are derived from fully folded adult donors, and therefore some of the measured genetic-folding alignment might also be induced by mechanical distortion of the tissue during folding(Llinares-Benadero and Borrell 2019; Heuer and Toro 2019). However, no data currently exist to conclusively assess the directionality of this gene-folding relationship."
  
  Overall, the manuscript is very interesting and a great contribution. The amount of work involved is impressive, and the presentation of the results very clear. My comments indicate some aspects that could be made clearer, for example by providing additional methodological information in the supplemental material, and by making readers and future users of MAGICC aware of the methodological and conceptual challenges that remain to be addressed in the future for this field of research.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Comment 1
  
  The supplementary figures seem to be missing from the eLife submission (although I was able to find them on europepmc)
  
  Response 1
  
  We apologize that these were not included in the documents sent to reviewers. The up-to-date supplementary figures are included in this resubmission and again on biorxiv.
  
URL

biorxiv.org/content/10.1101/2022.06.13.495984v3
www.biorxiv.org www.biorxiv.org

Age-Related Decline in Blood-Brain Barrier Function is More Pronounced in Males than Females in Parietal and Temporal Regions

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This work revealed an important finding that the blood-brain barrier (BBB) functionality changes with age and is more pronounced in males. The authors applied a non-invasive, contrast-agent-free approach of MRI called diffusion-prepared arterial spin labeling (DP-pCASL) to a large cohort of healthy human volunteers. DP-pCASL works by tracking the movement of magnetically labeled water (spins) in blood as it perfuses brain tissue. It probes the molecular diffusion of water, which is sensitive to microstructural barriers, and characterizes the signal coming from fast-moving spins as blood and slow-moving spins as tissue, using different diffusion gradients (b-values). This differentiation is then used to assess the water exchange rates (kw) across the BBB, which acts as a marker for BBB functionality. The main finding of the authors is that kw decreases with age, and in some brain regions, kw decreases faster in males. The neuroprotective role of the female sex hormone, estrogen, on BBB function is discussed as one of the explanations for this finding, supported by literature. The study also shows that BBB function remains stable until the early 60s and remarkably decreases thereafter.
  
  Strengths:
  
  The two main strengths of the study are the MRI method used and the amount of data. The authors employed a contrast-agent-free MRI method called ASL, which offers the opportunity to repeat such experiments multiple times without any health risk - a significant advantage of ASL. Since ASL is an emerging field that requires further exploration and testing, a study evaluating blood-brain barrier functionality is of great importance. The authors utilized a large dataset of healthy humans, where volunteer data from various studies were combined to create a substantial pool. This strategy is effective for statistically evaluating differences in age and gender.
  
  Weaknesses:
  
  R1.0: Gender-related differences are only present in some brain regions, not in the whole brain or gray matter - which is usually the assumption unless stated otherwise. From the title, this was not clear. Including simulations could increase readers' understanding related to model fitting and the interdependence of parameters, if present. The discussion follows a clear line of argument supported by literature; however, focusing solely on AQP4 channels and missing a critical consideration of other known/proven changes in transport mechanisms through the BBB and their effects substantially weakens the discussion.
  
  Thanks for your insightful feedback and suggestions. We have made the following changes to the manuscript:
  
  (1) The title has been modified to highlight the sex differences in specific brain regions: “Age-Related Decline in Blood-Brain Barrier Function is More Pronounced in Males than Females in Parietal and Temporal Regions.”
  
  (2) To study the potential impact of prolonged ATT seen in males on estimated kw, we simulated kw distribution for females by adjusting ATT by +60 ms to match males' ATT. This led to marginally higher kw values (Supplemental Figure S2), suggesting that the kw difference between males and females is not a direct result of prolonged ATT. Additionally, we have added a section titled “Data and Code Availability Statements” in the revised manuscript to indicate that we are willing to share the reconstruction toolbox with interested groups. The toolbox is a standalone MATLAB-based program (no license required) to generate kw, CBF, and ATT maps, which can run on Windows or Mac computers.
  
  (3) We agree with the reviewer that BBB water exchange can be facilitated by other transport mechanisms, as we mentioned in the introduction: “Water exchange across the BBB occurs at a relatively high level and is mediated by passive diffusion, active co-transport through the endothelial membrane, and facilitated diffusion through the dedicated water channel, aquaporin-4 (AQP4), at the end-feet of astrocytes.” We emphasized our findings related to AQP4 based on the technical properties of DP-pCASL, which is more sensitive to the exchange occurring across astrocyte end-feet. We also acknowledge that different techniques can be helpful to study other components of BBB water exchange, and we have added the following discussion to the updated manuscript: “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method. These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging. In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. 
The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states. Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements.”
  
  Reviewer #1 (Recommendations For The Authors):
  
  R1.1 The manuscript is well-organized and presents arguments in a logical order. The visual representation of results in the form of figures is sufficient (see style suggestions below).
  
  Thanks for your suggestions on improving the figures. We have updated the figures for better visualization (please see our responses to R1.5, R1.6, R1.7 and R1.8).
  
  R1.2 It would be beneficial if the model/toolbox could be made publicly available so that fellow researchers from the community could apply and test it in their research.
  
  We have added a section “Data and code availability statements” in the revised manuscript to indicate that we are willing to share the toolbox with interested groups (L529 in the annotated manuscript). The toolbox is a standalone MATLAB-based program (no license required) that generates kw, CBF and ATT maps and runs on Windows or Mac computers. Indeed, we have been sharing our reconstruction toolbox with over 50 collaboration sites. The following screenshots are examples of three steps performed by the toolbox (shared by one collaborator):
  
  Author response image 1.
  
  Step 1: Loading raw data and calculating the T1 map
  
  Author response image 2.
  
  Step 2: Motion correction and skull stripping
  
  Author response image 3.
  
  Step 3: kw, CBF and ATT quantification (nii files will be saved)
  
  R1.3 Line 46 states that the technique is novel, but it has been introduced and used before (Shao, et al. MRM 2019). It sure is innovative but the term novel is too strong and may confuse the readers that it is something new introduced in this manuscript.
  
  Thanks for the suggestion. We agree that the term ‘novel’ may cause confusion about the technique and have removed it in the revised manuscript (L48, L50).
  
  R1.4 Line 395, kw was generated using PLD = 1.8 s with b = 0, 50 s/mm². Is only one time point enough for estimating kw? To me, it is not clear how robust the kw estimation is with only one PLD.
  
  According to the single-pass approximation (SPA) model (1), kw can be accurately estimated when the PLD is longer than the ATT. We recruited cognitively normal participants in this study and found the longest ATT to be 1526.7±117.4 and 1468.1±166.9 ms in aged (62-92 years) males and females, respectively. A PLD of 1.8 s was chosen to balance the SNR of the data and the accuracy of the model fitting, which should be sufficient for this study. However, for future studies involving diseased populations with prolonged ATT, a longer PLD should be used, or a multi-PLD protocol could be helpful to improve the robustness of quantification accuracy.
  
  We have added a limitation statement in the revised manuscript (L407): "A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2)."
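
  The headroom between the single PLD and the reported ATT values can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the reported "±" values are standard deviations, which this response does not state explicitly.

```python
def pld_margin_ms(pld_ms, att_mean_ms, att_sd_ms, n_sd=1.0):
    """Return PLD minus (mean ATT + n_sd * SD); positive means the PLD is longer."""
    return pld_ms - (att_mean_ms + n_sd * att_sd_ms)


# Longest ATT values reported in the response (ms), assumed to be mean and SD.
groups = {
    "aged males": (1526.7, 117.4),
    "aged females": (1468.1, 166.9),
}
PLD_MS = 1800.0

for name, (att_mean, att_sd) in groups.items():
    margin = pld_margin_ms(PLD_MS, att_mean, att_sd, n_sd=1.0)
    print(f"{name}: {margin:.1f} ms of headroom at mean + 1 SD")
```

  Raising `n_sd` shows how quickly the headroom shrinks for participants in the upper tail of the ATT distribution, which is the motivation for the longer or multi-PLD protocols mentioned in the limitation statement.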
  
  R1.5 Suggestion: Figure 3A, colormap for kw appears suboptimal. Regional differences are hard to see.
  
  Thanks for the suggestion, we have updated the range of color scale (from [0, 200], to [70, 160]) to highlight the regional differences in the updated Figure 3:
  
  We prefer to use the same blue colormap that we and our collaborators have been using in publications, to maintain consistency. We have also acknowledged the limited spatial resolution of the kw maps in the updated manuscript (L412): “To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently developed motion-compensated diffusion weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in the future studies (e.g. 3.5 mm³ isotropic maps in 10 mins) (2).”
  
  R1.6 Suggestion: use same/similar colormaps for the same parameters (kw, ATT, CBF) to help the reader follow across Figures 3, 4, and 5.
  
  Thanks for your suggestion; we agree that using the same colors would make it easier for readers to follow. However, Figures 4 and 5 were created to show age- and sex-dependent changes, so we used warm and cold colors to indicate decreases and increases, respectively. We have clarified the choice of colormap in the figure captions (L260, L284): “The effects of decrease or increase were represented by warm colors (yellow to red) and cold (gray to blue) colors, respectively.”
  
  R1.7 Suggestion: please be consistent with the ordering of parameters in Figures 3, 4, and 5.
  
  Thanks for the suggestion, we have updated Figure 3 to consistently show kw, CBF and ATT results in order from left to right:
  
  R1.8 Suggestion: use the same scaling (e.g.[|1.9|, |11 |] for Fig. 4, [|1.9|, |4|] for Figure 5) to enhance comparability across parameters in the subfigures.
  
  Thanks for the suggestion, we agree that the same scaling would enhance the comparability across parameters. We have updated the color scales for Figure 5 using maximal |T| = 4:
  
  However, the range of maximal |T| was relatively large for Figure 4 (i.e. 5 for kw, 11 for CBF and 7 for ATT), and using the same color scale might oversaturate the regional responses or diminish the visibility of regional differences. Therefore, we prefer to keep the original color scales for Figure 4.
  
  R1.9 In Figure 5, the interaction of age with sex in kw parameter seems to be more on one side of the brain. What could be the reasons for possible lateralization?
  
  We agree with the reviewer that the concentration of the age and sex interaction effects on one side of the brain is an interesting finding. While we do not have a clear explanation at present, we suspect it may relate to aging-related asymmetrical vascular burdens. Giannakopoulos et al. reported that vascular scores, indicating higher vascular burden, were significantly higher in the left hemisphere across all Clinical Dementia Rating scores. Moreover, the predominance of Alzheimer’s disease and vascular pathology in the right hemisphere correlated with significantly higher Clinical Dementia Rating scores (3). We added the following to the updated manuscript to discuss this potential mechanism (L370): “… We also observed an asymmetric effect on left and right brain hemispheres, which might be associated with asymmetrically developed vascular burdens in aging (3).”
  
  R1.10 A comparison between the present study and DCE MRI as well as other ASL methods evaluating BBB function with age is missing. ASL techniques probing transverse relaxation and DCE MRI have reported increased kw with age in humans as well as in animal models. What could be the reasons?
  
  We agree with the reviewer that BBB water exchange measured by other methods should be sufficiently discussed, especially regarding their age-related changes. We added the following discussion in the updated manuscript (L415): “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging (5, 6). In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13).”
  
  R1.11 Line 163/164, a rapid decrease of CBF in males in the region of the hippocampus is reported. It would be beneficial to discuss this in discussion further (has this been reported before, possible reasons, etc).
  
  Thanks for the suggestion. We agree that the accelerated CBF decline in males in the hippocampus is an important finding, and we have added discussion in the revised manuscript (L300): "Furthermore, we found a more pronounced age-related decline in CBF in the hippocampus of males compared to females (Fig. 2, Supplemental Table S2). To the best of our knowledge, no study has previously reported this accelerated hippocampal CBF decline in males. This finding may be linked to the accelerated hippocampal volume loss in males, as reported in a study analyzing 19,793 generally healthy UK Biobank participants (14). Lower hippocampal perfusion has been associated with poor memory performance (15, 16), suggesting that males might be more vulnerable to potential cognitive decline (17)."
  
  R1.12 Lines 198-202 describe a simulation done to test the dependence of kw on ATT. This is important and could be explained more in detail. Adding simulation results (numeric or figure) to supplementary materials would increase reproducibility and understanding for others.
  
  We apologize for not referencing the simulation results in the main text. We simulated the kw distribution for females by adjusting ATT by +60 ms to match the males’ ATT, which led to marginally higher kw values. These results are shown in Supplemental Figure S2C (yellow).
  
  We have now referenced the simulation results in the updated manuscript (L206).
  
  R1.13 No limitations of the presented work are mentioned. A critical perspective would increase the scientific impact on future research decisions and implementation of this method by others.
  
  Thanks for the suggestion. We agree that the limitations need to be acknowledged. We have added a limitation paragraph in the revised manuscript (L406): "Limitations of the study and future directions: There are a few limitations of this study. A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2). To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently developed motion-compensated diffusion-weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in future studies (e.g. 3.5 mm3 isotropic maps in 10 mins) (2). Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging (5, 6).
In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological stages (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13). Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies. Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to the unavailability or limited availability in some cohorts. 
  We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable."
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This study used a novel diffusion-weighted pseudo-continuous arterial spin labelling (pCASL) technique to simultaneously explore age- and sex-related differences in brain tissue perfusion (i.e., cerebral blood flow (CBF) & arterial transit time (ATT) - a measure of CBF delivery to brain tissue) and blood-brain barrier (BBB) function, measured as the water exchange (kw) across the BBB. While age- and sex-related effects on CBF are well known, this study provides new insights to support the growing evidence of these important factors in cerebrovascular health, particularly in BBB function. Across the brain, the decline in CBF and BBB function (kw) and elevation in ATT were reported in older adults, after the age of 60, and more so in males compared to females. This was also evident in key cognitive regions including the insular, prefrontal, and medial temporal regions, stressing the consideration of age and sex in these brain physiological assessments.
  
  Strengths:
  
  Simultaneous assessment of CBF with BBB along with transit time and at the voxel-level helped elucidate the brain's vulnerability to age and sex-effects. It is apparent that the investigators carefully designed this study to assess regional associations of age and sex with attention to exploring potential non-linear effects.
  
  Weaknesses:
  
  R2.0 It appears that no brain region showed concurrent CBF and BBB dysfunction (kw), based on the results reported in the main manuscript and supplemental information. Was an association analysis between CBF and kw performed? There is a potential effect of the level of formal education on CBF (PMID: 12633147; 15534055), which could have been considered and accounted for as well, especially for a cohort with stated diversity (age, race, sex).
  
  Thank you for your positive feedback and comments on the potential associations between BBB kw and other physiological parameters (e.g., CBF) and socioeconomic factors (e.g., education). We have made the following changes to the updated manuscript:
  
  (1) We conducted additional linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6. We found that BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be influenced by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.
  
  (2) One limitation of this study is the lack of information on participants’ geographical, cultural, physical characteristics, and socioeconomic factors. While we included race as a covariate to account for potential variations observed in previous research, race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes. We have acknowledged this limitation by adding the following discussion in the updated manuscript: “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research. However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health. For example, education has been shown to be highly relevant to regional CBF changes in AD. Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”
  
  Reviewer #2 (Recommendations For The Authors):
  
  General comments:
  
  I commend the authors on a very well-written and laid-out study. General remarks have been provided in the short assessment and public review sections.
  
  We would like to thank the reviewer for the insightful suggestions and overall positive feedback. We have substantially revised and improved our manuscript, and point-by-point responses can be found in the following sections and in the annotated manuscript.
  
  Specific comments:
  
  Results:
  
  R2.1 Line 127: "since race may influence the changes in perfusion and kw with aging, it was included as a covariate". It is not clear how race - a simplistic term for ethnicity or to be more specific ancestry has been shown to influence changes in perfusion? Is it known for a fact that for example, older Black people have lower/higher CBF or kw compared to Asians or Asians to Caucasian Americans? Can this be extrapolated to Japanese Brazilians having different patterns of regional CBF to Caucasian or Black Brazilians or similar patterns of CBF to Japanese people in Japan since they share similar race? Do Dutch people in the Netherlands share CBF characteristics to their descendants in the US or in South Africa? Would the geographical, cultural, and other physical characteristics of one's ethnicity or lineage impact CBF? Race is often used as a poor substitute for the complex interactions of physical, socioeconomic, and geopolitical factors that produce disparities that may have measurable biological effects including CBF. But it is not clear why being one race vs the other will impact CBF, without carefully parcelling out the many factors beyond biology, if any. Is any of the participants in the study mixed race? How about recently settled individuals who may identify for example as Black but have spent all their life up to adult years outside of the US and marked here in the study as simply African American? Not that I am saying this is the case. However this simplification may require more careful analysis.
  
  In our study, no participant indicated to be mixed-race, and unfortunately we do not have additional information about their specific ancestry or information about their geographical, cultural, and other physical characteristics. We acknowledge that race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes, including perfusion and BBB function. The use of race as a covariate in our study is intended to account for potential variations observed in previous research, rather than to imply a direct causal relationship.
  
  Research has shown differences in blood flow among racial groups (18, 19). However, these differences are not solely attributable to race, and they are also shaped by environmental exposures, lifestyle factors, healthcare access, and other social determinants of health (20). We have added the following discussion in the updated manuscript (L436): “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”
  
  R2.2 Figure 3: Could the standard deviation of the reported values be also stated so the variance can be appreciated?
  
  Thanks for the suggestion. We have added the standard deviations of the kw, CBF, and ATT values to the updated Figure 3.
  
  R2.3 Discussions: Line 280: .."observed distinct trajectory of kw changes with aging as compared with CBF and ATT. I presume this as compared to the earlier statements (line 268) of pervasive increase in ATT and decrease in CBF across the brain. Were there any brain regions that showed increased ATT, decreased CBF and kw as a function of age or even sex?? Was there any association between CBF and kw in any brain regions, across the participants after controlling for sex differences? If there is a suspicion of early BBB dysfunction (line 286) preceding cognitive decline that has been also suspected with CBF, is this concomitant with CBF in most people? This could maybe make CBF an easier and more straightforward biomarker since its effects mirror that of BBB? I suspect it generally does not, even in healthy aging. It would have been great to shed more light on this with your results and in your discussion.
  
  Thank you for your comments. By 'distinct trajectory of kw changes with aging,' we refer to the ‘turning point’ in age at which kw starts declining. BBB kw remained relatively stable and began to decline in the early 60s, while CBF consistently decreased and ATT consistently increased with age, although the rates of change differed at 22 years and 36 years, respectively. Using linear regressions for voxel analysis, Figure 4 shows that age-dependent decreases in CBF and increases in ATT were observed in most of the brain. However, significant age-related decreases in kw were more localized to specific brain regions and were mostly accompanied by simultaneous decreases in CBF and increases in ATT. We highlighted this finding in the updated manuscript (L250): “In the brain regions showing significant age-related kw decreases (Fig. 4A), these decreases are mostly accompanied by CBF decreases (Fig. 4B) and ATT increases (Fig. 4C).”
  
  Thank you for your suggestion regarding the relationship between kw and CBF. We further conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6.
  
  This new supplemental table shows many interesting results. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years.
  
  We have added the following discussion to the updated manuscript (L307): “We observed a distinct trajectory of kw changes with aging compared to CBF and ATT. To study the potential regional associations between kw and CBF and ATT, we conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining), respectively. The results are shown in Supplemental Table S6. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, PHG, and MTL in participants aged 8-61 years (when kw was relatively consistent across ages), but no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional brain regions, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be affected by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.”
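  For readers who want to reproduce this kind of analysis on their own data, the regression described above can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function names, the 0/1 coding of sex, and the synthetic values are all assumptions, and ordinary least squares via NumPy stands in for whatever statistical package was actually used.

```python
import numpy as np

def fit_kw_vs_predictor(kw, predictor, sex):
    """OLS fit of kw ~ intercept + predictor + sex.

    Returns (beta, t), where beta holds the three coefficients and
    t holds the corresponding t-statistics.
    """
    kw = np.asarray(kw, dtype=float)
    X = np.column_stack([
        np.ones_like(kw),                    # intercept
        np.asarray(predictor, dtype=float),  # regional CBF or ATT
        np.asarray(sex, dtype=float),        # covariate (assumed 0/1 coding)
    ])
    beta, _, _, _ = np.linalg.lstsq(X, kw, rcond=None)
    resid = kw - X @ beta
    dof = len(kw) - X.shape[1]
    sigma2 = resid @ resid / dof             # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)    # coefficient covariance
    t = beta / np.sqrt(np.diag(cov))
    return beta, t

def fit_by_age_group(age, kw, predictor, sex, split_age=62):
    """Run the same regression separately below and at/above split_age."""
    age = np.asarray(age)
    kw, predictor, sex = map(np.asarray, (kw, predictor, sex))
    groups = {"younger": age < split_age, "older": age >= split_age}
    return {
        name: fit_kw_vs_predictor(kw[mask], predictor[mask], sex[mask])
        for name, mask in groups.items()
    }

# Synthetic, purely hypothetical data for one region of interest.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(8, 93, n)
cbf = rng.uniform(30, 80, n)
sex = rng.integers(0, 2, n)
kw = 120 - 0.3 * cbf + 3.0 * sex + rng.normal(0, 5, n)

for group, (beta, t) in fit_by_age_group(age, kw, cbf, sex).items():
    print(group, "slope for CBF:", beta[1], "t:", t[1])
```

  In practice the fit would be repeated for each region and each predictor (CBF, ATT), with a correction for multiple comparisons applied across regions.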
  
  Other notes:
  
  R2.4 While reading the results section, two things that jump out at me when I saw the sex differences: 1) hematocrit and 2) menopausal status. I saw in the discussion that these were touched on. I may have missed this in the methods, was hematocrit collected and included in the parameters estimates?? Was the menopausal status including ERT (estrogen replacement therapies) recorded and factored in? If not these could be included as limitations that may confound the results, especially when the age groups were split to include a group comprising or potentially both pre-and post-menopausal females (36-61).
  
  We do not have information about hematocrit or menopausal status, so they were not included in the data analysis. We agree that this is a limitation of the current study, and we discuss it in the updated manuscript (L442): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”
  
  R2.5 The general vascular health of the cohort is not well described especially if some of the participants were from sickle cell study. While they are cognitively normal and free from major medical illnesses, or neurological disorders, did the sample also include individuals with considerable vascular risk factors and metabolic syndrome (known to affect CBF), especially in the older cohort??
  
  We agree with the reviewer that vascular health can significantly impact perfusion and BBB function. Since the data presented in this study were collected from multiple cohorts, vascular risk factors were not available in all cohorts and thus were not included as covariates in the data analysis. To account for potential vascular variations across participants, we included CBF and ATT as covariates in our analysis of age-related BBB kw changes. We have added discussion in the updated manuscript (L442, same as our response to the previous comment): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”
  
  References:
  
  (1) K. S. St Lawrence, D. Owen, D. J. Wang, A two-stage approach for measuring vascular water exchange and arterial transit time by diffusion-weighted perfusion MRI. Magn Reson Med 67, 1275-1284 (2012).
  
  (2) X. Shao, C. Zhao, Q. Shou, K. S. St Lawrence, D. J. Wang, Quantification of blood–brain barrier water exchange and permeability with multidelay diffusion‐weighted pseudo‐continuous arterial spin labeling. Magnetic Resonance in Medicine (2023).
  
  (3) P. Giannakopoulos, E. Kövari, F. R. Herrmann, P. R. Hof, C. Bouras, Interhemispheric distribution of Alzheimer disease and vascular pathology in brain aging. Stroke (2009).
  
  (4) A. Mahroo, S. Konstandin, M. Günther, Blood–Brain Barrier Permeability to Water Measured Using Multiple Echo Time Arterial Spin Labeling MRI in the Aging Human Brain. Journal of Magnetic Resonance Imaging 59, 1269-1282 (2024).
  
  (5) Y. Ohene et al., Increased blood–brain barrier permeability to water in the aging brain detected using noninvasive multi‐TE ASL MRI. Magnetic resonance in medicine 85, 326-333 (2021).
  
  (6) B. R. Dickie, H. Boutin, G. J. Parker, L. M. Parkes, Alzheimer's disease pathology is associated with earlier alterations to blood–brain barrier water permeability compared with healthy ageing in TgF344‐AD rats. NMR in Biomedicine 34, e4510 (2021).
  
  (7) Y. Ying et al., Heterogeneous blood‐brain barrier dysfunction in cerebral small vessel diseases. Alzheimer's & Dementia (2024).
  
  (8) V. Zachariou et al., Regional differences in the link between water exchange rate across the blood–brain barrier and cognitive performance in normal aging. GeroScience, 1-18 (2023).
  
  (9) Y. Zhang et al., Increased cerebral vascularization and decreased water exchange across the blood-brain barrier in aquaporin-4 knockout mice. PLoS One 14, e0218415 (2019).
  
  (10) Y. Ohene et al., Non-invasive MRI of brain clearance pathways using multiple echo time arterial spin labelling: an aquaporin-4 study. NeuroImage 188, 515-523 (2019).
  
  (11) Y. V. Tiwari, J. Lu, Q. Shen, B. Cerqueira, T. Q. Duong, Magnetic resonance imaging of blood–brain barrier permeability in ischemic stroke using diffusion-weighted arterial spin labeling in rats. Journal of Cerebral Blood Flow & Metabolism 37, 2706-2715 (2017).
  
  (12) Z. Wei et al., Non-contrast assessment of blood-brain barrier permeability to water in mice: an arterial spin labeling study at cerebral veins. NeuroImage, 119870 (2023).
  
  (13) Y. Jia et al., Transmembrane water-efflux rate measured by magnetic resonance imaging as a biomarker of the expression of aquaporin-4 in gliomas. Nature Biomedical Engineering 7, 236-252 (2023).
  
  (14) L. Nobis et al., Hippocampal volume across age: Nomograms derived from over 19,700 people in UK Biobank. NeuroImage: Clinical 23, 101904 (2019).
  
  (15) S. Rane et al., Inverse correspondence between hippocampal perfusion and verbal memory performance in older adults. Hippocampus 23, 213-220 (2013).
  
  (16) S. Heo et al., Resting hippocampal blood flow, spatial memory and aging. Brain research 1315, 119-127 (2010).
  
  (17) O. Gannon, L. Robison, A. Custozzo, K. Zuloaga, Sex differences in risk factors for vascular contributions to cognitive impairment & dementia. Neurochemistry international 127, 38-55 (2019).
  
  (18) A. E. Leeuwis et al., Cerebral blood flow and cognitive functioning in a community-based, multi-ethnic cohort: the SABRE study. Frontiers in aging neuroscience 10, 279 (2018).
  
  (19) L. R. Clark et al., Association of cardiovascular and Alzheimer’s disease risk factors with intracranial arterial blood flow in Whites and African Americans. Journal of Alzheimer's Disease 72, 919-929 (2019).
  
  (20) D. R. Williams, S. A. Mohammed, Discrimination and racial disparities in health: evidence and needed research. Journal of behavioral medicine 32, 20-47 (2009).
  
  (21) N. Scarmeas et al., Association of life activities with cerebral blood flow in Alzheimer disease: implications for the cognitive reserve hypothesis. Archives of neurology 60, 359-365 (2003).
  
  (22) N.-T. Chiu, B.-F. Lee, S. Hsiao, M.-C. Pai, Educational level influences regional cerebral blood flow in patients with Alzheimer’s disease. Journal of Nuclear Medicine 45, 1860-1863 (2004).
  
  (23) R. C. Gur et al., Gender differences in age effect on brain atrophy measured by magnetic resonance imaging. Proceedings of the National Academy of Sciences 88, 2845-2849 (1991).
  
  (24) M. J. Cipolla, J. A. Godfrey, M. J. Wiegman, The effect of ovariectomy and estrogen on penetrating brain arterioles and blood-brain barrier permeability. Microcirculation 16, 685-693 (2009).
  
  (25) A. C. Wilson et al., Reproductive hormones regulate the selective permeability of the blood-brain barrier. Biochim Biophys Acta 1782, 401-407 (2008).
  
  (26) M. S. Stringer et al., Tracer kinetic assessment of blood–brain barrier leakage and blood volume in cerebral small vessel disease: Associations with disease burden and vascular risk factors. NeuroImage: Clinical 32, 102883 (2021).
  
URL

biorxiv.org/content/10.1101/2024.01.12.575463v2
www.biorxiv.org www.biorxiv.org

New submission 11/02/2024, 12:12:42

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1
  
  Strengths:
  
  This study uses a carefully constructed experiment design and decision-making task that allows separation of multiple electroencephalographic (EEG) signals thought to track different stages of decision-making. For example, the steady-state visual evoked potential measures can be cleanly dissociated from more anterior beta-band activity over the motor cortex. They also allow evaluation of how cued expectancy effects may unfold over a number of testing sessions. This is important because the most consistent evidence of expectation-related modulations of electrophysiological measures (using EEG, local field potentials, or single neuron firing rates) is from studies of nonhuman primates that involved many days of cue-stimulus contingency learning, and there is a lack of similar work using several testing sessions in humans. Although there were several experimental conditions included in the study, careful trial-balancing was conducted to minimise biases due to incidental differences in the number of trials included for analyses across each condition. Performance for each individual was also carefully calibrated to maximise the possibility of identifying subtle changes in task performance by expectation and avoid floor or ceiling effects.
  
  We would like to thank Reviewer 1 for these very positive comments.
  
  Weaknesses:
  
  Although the experiment and analysis methods are cohesive and well-designed, there are some shortcomings that limit the inferences that can be drawn from the presented findings.
  
  Comment #1
  
  The first relates to the measures of SSVEPs and their relevance for decision-making in the task. In order to eliminate the influence of sporadic pulses of contrast changes that occurred during stimulus presentation, a time window of 680-975 ms post-stimulus onset was used to measure the SSVEPs. The mean response times for the valid and neutral cues were around 850-900 ms for correct responses, and within the same time window for errors in the invalid cue condition. In addition, a large portion of response times in perceptual decision-making tasks are substantially faster than the mean due to right-skewed response time distributions that are typically observed. It has also been estimated that executing a motor action (e.g., a keypress response) requires 70-100 ms following the commitment to a decision. This raises some concerns about the proportion of trials in which the contrast-dependent visual responses (indexed by the SSVEPs) indexed visual input that was actually used to make the decision in a given trial. Additional analyses of SSVEPs that take the trial-varying pulses into account could be run to determine whether expectations influenced visual responses earlier in the trial.
  
  The reviewer raises a very valid point and, indeed, it is an issue that we grappled with in our analyses. Actually, in this study, the RT distributions were not right-skewed, but appear to be relatively normal (RT distributions shown below). This is something that we have previously observed when using tasks that involve an initial zero-evidence lead-in at the start of each trial, which means that participants cannot start accumulating at stimulus onset and must rely on their knowledge of the lead-in duration to determine when the physical evidence has become available (e.g. Kelly et al., 2021, Nat Hum Beh). We agree that it is important to establish whether the reported SSVEP modulations occur before or after choice commitment. In our original submission we had sought to address this question through our analysis of the response-locked ‘difference SSVEP’. Figure 4D clearly indicates that the cue modulations are evident before as well as after response.
  
  However, we have decided to include an additional Bayesian analysis of the response-locked signal to offer more evidence that the cue effect is not a post-response phenomenon.
  
  Manuscript Changes
  
  To quantify the evidence that the cue effect was not driven by changes in the signal after the response, we ran Bayesian one-way ANOVAs on the SSVEP comparing the difference across cue conditions before and after the response. If the cue effect only emerged after the response, we would expect the difference between invalid and neutral or invalid and valid cues to increase in the post-response window. There was no compelling evidence of an increase in the effect when comparing invalid to neutral (BF10 = 1.58) or valid cues (BF10 = 0.32).
  
  Comment #2
  
  Presenting response time quantile plots may also help to determine the proportions of motor responses (used to report a decision) that occurred during or after the SSVEP measurement window.
  
  We agree that it may be helpful for the reader to be able to determine the proportion of responses occurring at different phases of the trial, so we have included the requested response time quantile plot (shown below) as a supplementary figure.
  
  Author response image 1.
  
  Reaction time quantiles across cue conditions. The plot illustrates the proportion of trials where responses occurred at different stages of the trial. The SSVEP analysis window is highlighted in purple.
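  The check behind this figure can be sketched in a few lines. The example below is a hedged illustration rather than the authors' code: the 680-975 ms window and the 70-100 ms motor execution estimate come from the text above, while the function names, the quantile levels, and the synthetic response times (including their spread) are assumptions made for demonstration.

```python
import numpy as np

SSVEP_WINDOW_MS = (680.0, 975.0)  # SSVEP measurement window stated in the review

def rt_quantiles(rt_ms, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Response time quantiles at the given probability levels."""
    return np.quantile(np.asarray(rt_ms, dtype=float), probs)

def response_timing_proportions(rt_ms, window=SSVEP_WINDOW_MS, motor_delay_ms=0.0):
    """Proportion of trials falling before, within, or after the window.

    Subtracting motor_delay_ms from each response time gives a rough
    estimate of when the decision was committed rather than executed.
    """
    commit = np.asarray(rt_ms, dtype=float) - motor_delay_ms
    start, end = window
    return {
        "before": float(np.mean(commit < start)),
        "within": float(np.mean((commit >= start) & (commit <= end))),
        "after": float(np.mean(commit > end)),
    }

# Hypothetical, roughly normal response times centred near the reported means.
rng = np.random.default_rng(0)
rts = rng.normal(loc=875.0, scale=120.0, size=2000)

print("quantiles (ms):", rt_quantiles(rts))
print("raw responses:", response_timing_proportions(rts))
print("minus 100 ms motor delay:", response_timing_proportions(rts, motor_delay_ms=100.0))
```

  Running the same function per cue condition, with and without the motor delay, shows how much of the measurement window precedes the estimated commitment time on a typical trial.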
  
  Comment #3
  
  In addition, an argument is made for changes in the evidence accumulation rate (called the drift rate) by stimulus expectancy, corresponding to the observed changes in SSVEP measures and differences in the sensory encoding of the stimulus. This inference is limited by the fact that evidence accumulation models (such as the Diffusion Decision Model) were not used to test for drift rate changes as could be determined from the behavioural data (by modelling response time distributions). There appear to be ample numbers of trials per participant to test for drift rate changes in addition to the starting point bias captured in earlier models. Due to the very high number of trials, models could potentially be evaluated for each single participant. This would provide more direct evidence for drift rate changes than the findings based on the SSVEPs, particularly due to the issues with the measurement window relating to the response times as mentioned above.
  
  The focus of the present study was on testing for sensory-level modulations by predictive cues, rather than testing any particular models. Given that the SSVEP bears all the characteristics of a sensory evidence encoding signal, we believe it is reasonable to point out that its modulation by the cues would very likely translate to a drift rate effect. But we do agree with the reviewer that any connection between our results and previously reported drift rate effects can only be confirmed with modelling and we have tried to make this clear in the revised text. We plan to comprehensively model the data from this study in a future project. While we do indeed have the benefit of plenty of trials, the modelling process will not be straightforward as it will require taking account of the pulse effects which could have potentially complicated, non-linear effects. In the meantime, we have made changes to the text to qualify the suggestion and stress that modelling would be necessary to determine if our hypothesis about a drift rate effect is correct.
  
  Manuscript Changes
  
  (Discussion): [...] We suggest that participants may have been able to stabilise their performance across task exposure, despite reductions in the available sensory evidence, by incorporating the small sensory modulation we detected in the SSVEP. This would suggest that the decision process may not operate precisely as the models used in theoretical work describe. Instead, our study tentatively supports a small number of modelling investigations that have challenged the solitary role of starting point bias, implicating a drift bias (i.e. a modulation of the evidence before or upon entry to the decision variable) as an additional source of prior probability effects in perceptual decisions (Dunovan et al., 2014; Hanks et al., 2011; Kelly et al., 2021; van Ravenzwaaij et al., 2012; Wyart et al., 2012) and indicates that these drift biases could, at least partly, originate at the sensory level. However, this link could only be firmly established with modelling in a future study.
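The distinction between a starting point bias and a drift bias can be illustrated with a small simulation. The sketch below assumes a basic two-bound diffusion process with constant drift. All parameter values and the function name are illustrative. It is not the model the authors plan to fit, and it ignores the pulse effects they mention.

```python
import random


def simulate_ddm(n_trials, drift, start, bound=1.0, dt=0.01, noise=1.0, seed=0):
    """Simulate a two-bound diffusion process.

    The decision variable starts at `start` (between -bound and +bound) and
    accumulates noisy evidence with mean rate `drift` until it reaches a bound.
    Returns (proportion of upper-bound choices, mean decision time in seconds).
    """
    rng = random.Random(seed)
    step_sd = noise * dt ** 0.5  # standard deviation of the noise per time step
    upper_hits = 0
    total_time = 0.0
    for _ in range(n_trials):
        x, t = start, 0.0
        while -bound < x < bound:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        upper_hits += x >= bound
        total_time += t
    return upper_hits / n_trials, total_time / n_trials


# Hypothetical settings: no bias, a starting point bias, and a drift bias
baseline = simulate_ddm(2000, drift=0.0, start=0.0)
start_bias = simulate_ddm(2000, drift=0.0, start=0.3)
drift_bias = simulate_ddm(2000, drift=0.5, start=0.0)
print(baseline, start_bias, drift_bias)
```

Both manipulations raise the proportion of choices at the favoured bound, so choice proportions alone cannot separate them. This is why fitting full response time distributions is needed to attribute a cue effect to one parameter or the other.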
  
  Recommendations For The Authors:
  
  Comment #4
  
  The text for the axis labels and legends in the figures is quite small relative to the sizes of the accompanying plots. I would recommend substantially increasing the sizes of the text to aid readability.
  
  Thank you for this suggestion. We have increased the size of the axis labels and made the text in the figure legends just 1pt smaller than the text in the main body of the manuscript.
  
  Comment #5
  
  It is unclear if the scalp maps for Figure 5 (showing the mu/beta distributions) are on the same scale or different scales. I assume they are on different scales (adjusted to the minimum/maximum within each colour map range), as a lack of consistent signals (in the neutral condition) would be expected to lead to a patchy pattern on the scalp as displayed in that figure (due to the colour range shrinking to the degree of noise across electrodes). I would recommend including some sort of colour scale to show that, for example, in the neutral condition there are no large-amplitude mu/beta fluctuations distributed somewhat randomly across the scalp.
  
  Thank you to the reviewer for pointing this out. They were correct: the original topographies were each plotted on their own scale. The topographies in Figure 5 have now been updated to put them on a common scale and we have included a colour bar (as shown below). The caption for Figure 5 has also been updated to confirm that the topographies are on a common scale.
  
  Author response image 2.
  
  Manuscript Changes
  
  (Figure 5 Caption): [...] The topography of MB activity in the window -200:0 ms before evidence onset is plotted on a common scale for neutral and cued conditions separately.
  
  Comment #6
  
  In Figure 2, the legend is split across the two panels, despite the valid/invalid/neutral legend also applying to the first panel. This gives an initial impression that the legend is incomplete for the first panel, which may confuse readers. I would suggest putting all of the legend entries in the first panel, so that all of this information is available to readers at once.
  
  We are grateful to the reviewer for spotting this. Figure 2 has been updated so that the full legend is presented in the first panel, as shown below.
  
  Author response image 3.
  
  Comment #7
  
  Although linear mixed-effects models (using Gaussian families) for response times are standard in the literature, they incorrectly specify the distributions of response times to be Gaussian instead of substantially right-skewed. Generalised linear mixed-effects models using gamma families and identity functions have been shown to more accurately model distributions of response times (see Lo and Andrews, 2015. Frontiers in Psychology). The authors may consider using these models in line with good practice, although it might not make a substantial difference relating to the patterns of response time differences.
  
  We appreciate this thoughtful comment from Reviewer 1. Although RT distributions are often right skewed, we have previously observed that RT distributions can be closer to normal when the trial incorporates a lead-in phase with no evidence (e.g. Kelly et al 2021, Nat Hum Beh). Indeed, the distributions we observed in this study were markedly Gaussian (as shown in the plot below). Given the shape of these distributions and the reviewer’s suggestion that adopting alternative models may not lead to substantial differences to our results, we have decided to leave the mixed effects models as they are in the manuscript, but we will take note of this advice in future work.
  
  Author response image 4.
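How far a reaction time distribution departs from symmetry can also be checked numerically, as a complement to a plot. The sketch below computes moment-based sample skewness, which is close to zero for a Gaussian-like distribution and positive for a right-skewed one. The reaction times are hypothetical, and this check is an illustration rather than an analysis the authors report.

```python
from statistics import fmean


def sample_skewness(values):
    """Return the moment-based skewness g1 = m3 / m2**1.5 (biased estimator)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical reaction times (ms)
symmetric_rts = [400, 450, 500, 550, 600]      # symmetric about 500 ms
right_skewed_rts = [400, 420, 450, 500, 900]   # long right tail

print(sample_skewness(symmetric_rts))
print(sample_skewness(right_skewed_rts))
```

A clearly positive value would favour the gamma-family models the reviewer suggests, whereas a value near zero is consistent with retaining a Gaussian family.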
  
  Reviewer #2
  
  Strengths:
  
  The work is executed expertly and focuses cleverly on two features of the EEG signals that can be closely connected to specific loci of the perceptual decision-making process - the SSVEP which connects closely to sensory (visual) encoding, and Mu-Beta lateralisation which connects closely to movement preparation. This is a very appropriate design choice given the authors' research question.
  
  Another advantage of the design is the use of an unusually long training regime (i.e., for humans) - which makes it possible to probe the emergence of different expectation biases in the brain over different timecourses, and in a way that may be more comparable to work with nonhuman animals (who are routinely trained for much longer than humans).
  
  We are very grateful for these positive comments from Reviewer 2.
  
  Weaknesses:
  
  In my view, the principal shortcoming of this study is that the experimental task confounds expectations about stimulus identity with expectations about to-be-performed responses. That is, cues in the task don't just tell participants what they will (probably) see, but what they (probably) should do.
  
  In many respects, this feature of the paradigm might seem inevitable, as if specific stimuli are not connected to specific responses, it is not possible to observe motor preparation of this kind (e.g., de Lange, Rahnev, Donner & Lau, 2013 - JoN).
  
  However, the theoretical models that the authors focus on (e.g., drift-diffusion models) are models of decision (i.e., commitment to a proposition about the world) as much as they are models of choice (i.e., commitment to action). Expectation researchers interested in these models are often interested in asking whether predictions influence perceptual processing, perceptual decision, and/or response selection stages (e.g., Feuerriegel, Blom & Hogendoorn, 2021 - Cortex), and other researchers have shown that parameters like drift bias and start point bias can be shifted in paradigms where observers cannot possibly prepare a response (e.g., Thomas, Yon, de Lange & Press, 2020 - Psych Sci).
  
  The present paradigm used by Walsh et al makes it possible to disentangle sensory processing from later decisional processes, but it blurs together the processes of deciding about the stimulus and choosing/initiating the response. This ultimately limits the insights we can draw from this study - as it remains unclear whether rapid changes in motor preparation we see reflect rapid acquisition of new decision criterion or simple cue-action learning. I think this would be important for comprehensively testing the models the authors target - and a good avenue for future work.
  
  Thank you to Reviewer 2 for these observations. We adopted this paradigm because it is typical of the perceptual decision making literature and our central focus in this study was to test for a sensory-level modulation as a source of a decision bias. We are pleased that the Reviewer agrees that the paradigm successfully disentangles sensory encoding from later decisional processes since this was our priority. However, we agree with Reviewer 2 that because the response mapping was known to the participants, the cues predicted both the outcome of the perceptual decision (“Is this a left- or right-tilted grating?”) and the motor response that the participant should anticipate making (“It’s probably going to be a left click on this trial”). They are correct that this makes it difficult to know whether the changes in motor preparation elicited by the predictive cues reflect action-specific preparation or a more general shift in the boundaries associated with the alternate perceptual interpretations. We fully agree that it remains an interesting and important question and in our future work we hope to conduct investigations that better dissect the distinct components of the decision process during prior-informed decisions. In the interim, we have made some changes to the manuscript to reflect the Reviewer’s concerns and better address this limitation of the study design (these are detailed in the response to the comment below).
  
  Recommendations For The Authors:
  
  Comment #8
  
  As in my public review, my main recommendation to the authors is to think a bit more in the presentation of the Introduction and Discussion about the difference between 'perceiving', 'deciding', and 'responding'.
  
  The paper is presently framed in terms of the debates around whether expectations bias decision or bias perception - and these debates are in turn mapped onto different aspects of the drift-diffusion model. Biases in sensory gain, for instance, are connected to biases in the drift rate parameter, while decisional shifts are connected to parameters like start points.
  
  In line with this kind of typology, the authors map their particular EEG signals (SSVEP and MB lateralisation) onto perception and decision. I see the logic, but I think the reality of these models is more nuanced.
  
  In particular, strictly speaking, the process of evidence accumulation to bound is the formation of a 'decision' (i.e., a commitment to having seen a particular stimulus). Indeed, the dynamics of this process have been beautifully described by other authors on this paper in the past. Since observers in this task simultaneously form decisions and prepare actions (because stimuli and responses are confounded) it is unclear whether changes in motor preparation are reflecting changes in what perceivers 'decide' (i.e., changes in what crosses the decision threshold) or what they 'do' (i.e., changes in the motor response threshold). This is particularly important for the debate around whether expectations change 'perception' or 'decision' because - in some accounts - it is the accumulation of evidence to the bound that is hypothesised to cause the perceptual experience observers actually have (Pereira, Perrin & Faivre, 2022 - TiCS). The relevant 'bound' here though is not the bound to push the button, but the bound for the brain to decide what one is actually 'seeing'.
  
  I completely understand the logic behind the authors' choices, but I would have liked more discussion of this issue. In particular, it seems strange to me to talk about the confounding of stimuli and responses as a particular 'strength' of this design in the manuscript - when really it is a 'necessary evil' for getting the motor preparation components to work. Here is one example from the Introduction:
  
  "While some have reported expectation effects in humans using EEG/MEG, these studies either measured sensory signals whose relevance to the decision process is uncertain (e.g. Blom et al., 2020; Solomon et al., 2021; Tang et al., 2018) and/or used cues that were implicit or predicted a forthcoming stimulus but not the correct choice alternative (e.g. Aitken et al., 2020; Feuerriegel et al., 2021b; Kok et al., 2017). To assess whether prior probabilities modulate sensory-level signals directly related to participants' perceptual decisions, we implemented a contrast discrimination task in which the cues explicitly predicted the correct choice and where sensory signals that selectively trace the evidence feeding the decision process could be measured during the process of deliberation."
  
  I would contend that this design allows you to pinpoint signals related to participants' 'choices' or 'actions' but not necessarily their 'decisions' in the sense outlined above.
  
  As I say though, I don't think this is fatal and I think the paper is extremely interesting in any case. But I think it would be strengthened if some of these nuances were discussed a bit more explicitly, as a 'perceptual decision' is more than pushing a button. Indeed, the authors might want to consider discussing work that shows the neural overlap between deciding and acting breaks down when Ps cannot anticipate which actions to use to report their choices ahead of time (Filimon, Philiastides, Nelson, Kloosterman & Heekeren, 2013 - JoN) and/or work which has combined expectations with drift diffusion modelling to show how expectations change drift bias (Yon, Zainzinger, de Lange, Eimer & Press, 2020 - JEP:General) and/or start bias (Thomas, Yon, de Lange & Press, 2020 - Psych Sci) even when Ps cannot prepare a motor response ahead of time.
  
  While our focus was on testing for sensory-level modulations, we think the question of whether the motor-level effects we observed are attributable to the task design or represent a more general perceptual bound adjustment is an important question for future research. In our previous work, we have examined this distinction between abstract, movement-independent evidence accumulation (indexed by the centro-parietal positivity, CPP) and response preparation in detail. The CPP has been shown to trace evidence accumulation irrespective of whether the sensory alternatives are associated with a specific response or not (Twomey et al 2016, J Neurosci). When speed pressure is manipulated in tasks with fixed stimulus-response mappings we have found that the CPP undergoes systematic adjustments in its pre-response amplitude that closely accord with the starting-level modulations observed in mu/beta, suggesting that motor-level adjustments do still translate to differences at the perceptual level under these task conditions (e.g. Kelly et al 2021, Nat Hum Beh; Steinemann et al., 2018, Nat Comms). We have also observed that the CPP and mu-beta exhibit corresponding adjustments in response to predictive cues (Kelly et al., 2021) that are consistent with both a starting-point shift and drift rate bias. However, the Kelly et al. study did not include a signature of sensory encoding and therefore could not test for sensory-level modulations.
  
  We have added some remarks to the discussion to acknowledge this issue with the interpretation of the preparatory shifts in mu-beta activity we observed when the predictive cues were presented, and we have included references to the papers that the reviewer helpfully provided. We have also offered some additional consideration of the features of the task design that may have influenced the SSVEP results.
  
  Manuscript Changes
  
  An implication of using cues that predict not just the upcoming stimulus, but the most likely response, is that it becomes difficult to determine if preparatory shifts in mu-beta (MB) activity that we observed reflect adjustments directly influencing the perceptual interpretation of the stimulus or simply preparation of the more probable action. When perceptual decisions are explicitly tied to particular modes of response, the decision state can be read from activity in motor regions associated with the preparation of that kind of action (e.g. de Lafuente et al., 2015; Ding & Gold, 2012; Shadlen & Newsome, 2001; Romo et al., 2004), but these modules appear to be part of a constellation of decision-related areas that are flexibly recruited based on the response modality (e.g. Filimon et al., 2013). When the response mapping is withheld or no response is required, MB no longer traces decision formation (Twomey et al., 2015), but an abstract decision process is still readily detectable (e.g. O’Connell et al., 2012), and modelling work suggests that drift biases and starting point biases (Thomas et al., 2020; Yon et al., 2021) continue to influence prior-informed decision making. While the design of the present study does not allow us to offer further insight about whether the MB effects we observed were inherited from strategic adjustments at this abstract level of the decision process, we hope to conduct investigations in the future that better dissect the distinct components of prior-informed decisions to address this question.
  
  Several other issues remain unaddressed by the present study. One is that it is not clear to what extent the sensory effects may be influenced by features of the task design (e.g. speeded responses under a strict deadline) and if these sensory effects would generalise to many kinds of perceptual decision-making tasks or whether they are particular to contrast discrimination.
  
  Comment #9
  
  On a smaller, unrelated point - I thought the discussion in the Discussion section about expectation suppression was interesting, but I did not think it was completely logically sound. The authors suggest that they may see relative suppression (rather than enhancement) of their marginal SSVEP under a 'sharpening' account because these accounts suggest that there is a relative suppression of off-channel sensory units, and there are more off-channel sensory units than on-channel sensory units (i.e., there are usually more possibilities we don't expect than possibilities that we do, and suppressing the things we don't expect should therefore yield overall suppression).
  
  However, this strikes me as a non-sequitur given that the marginal SSVEP only reflects feature-specific visual activity (i.e., activity tuned to one of the two grating stimuli used). The idea that there are more off-channel than on-channel units makes sense for explaining why we would see overall signal drops on expected trials e.g., in an entire visual ROI in an fMRI experiment. But surely this explanation cannot hold in this case, as there is presumably an equal number of units tuned to each particular grating?
  
  My sense is that this possibility should probably be removed from the manuscript - and I suspect it is more likely that the absence of a difference in marginal SSVEP for Valid vs Neutral trials has more to do with the fact that participants appear to be especially attentive on Neutral trials (and so any relative enhancement of feature-specific activity for expected events is hard to detect against a baseline of generally high-precision sensory evidence on these highly attentive, neutral trials).
  
  We thank the reviewer for flagging that we did not clearly articulate our thoughts in this section of the manuscript. Our primary purpose in mentioning this sharpening account was simply to point out that, where at first blush our results seem to conflict with expectation suppression effects in the fMRI literature, the sharpening account provides an explanation that can reconcile them. In the case of BOLD data, the sharpening account proposes that on-channel sensory units are boosted and off-channel units are suppressed and, due to the latter being more prevalent, this leads to an overall suppression of the global signal. In the case of the SSVEP, the signal isolates just the on-units and so the sharpening account would predict that when there is a valid cue, the SSVEP signal associated with the high-contrast, expected stimulus should be boosted and the SSVEP signal associated with the low-contrast, unexpected stimulus should be weakened; this would result in a larger difference between these signals and therefore, a larger ‘marginal SSVEP’. Conversely, when there is an invalid cue, the SSVEP signal associated with the, now unexpected, high-contrast stimulus should be relatively weakened and the SSVEP signal associated with the expected, but low-contrast stimulus should be relatively boosted; this would result in a smaller difference between these signals and therefore, a lower amplitude marginal SSVEP. We do not think that this account needs to make reference to any channels beyond those feature-specific channels driving the two SSVEP signals. Again, our central point is simply that the sharpening account offers a means of reconciling our SSVEP findings with expectation suppression effects previously reported in the fMRI literature.
  
  We suspect that this was not adequately explained in the discussion. We have adjusted the way this section is phrased to make it clear that we are not invoking off-channel activity to explain the SSVEP effect we observed and we thank the Reviewer for pointing out that this was unclear in the original text.
  
  Manuscript Changes
  
  An alternative account for expectation suppression effects, which is consistent with our SSVEP results, is that they arise, not from a suppression of expected activity, but from a ‘sharpening’ effect whereby the response of neurons that are tuned to the expected feature is enhanced while the responses of neurons tuned to unexpected features are suppressed (de Lange et al., 2018). On this account, the expectation suppression commonly reported in fMRI studies arises because voxels contain intermingled populations with diverse stimulus preferences and the populations tuned to the unexpected features outnumber those tuned to the expected feature. In contrast to these fMRI data, the SSVEP represents the activity of sensory units driven at the same frequency as the stimulus, and thus better isolates the feature-specific populations encoding the task-relevant sensory evidence. Therefore, according to the sharpening account, an invalid cue would have enhanced the SSVEP signal associated with the low contrast grating and weakened the SSVEP signal associated with the high contrast grating. As this would result in a smaller difference between these signals, and therefore, a lower amplitude marginal SSVEP compared to the neutral cue condition, this could explain the effect we observed.
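The direction of the predictions in this passage can be checked with a toy calculation. The sketch below treats the marginal SSVEP as the high-contrast signal minus the low-contrast signal and applies a hypothetical proportional gain to the expected and unexpected gratings. The gain value, the amplitudes, and the symmetric form of the rule are assumptions made for illustration, not quantities estimated in the study.

```python
def marginal_ssvep(high_amp, low_amp, cue, gain=0.1):
    """Marginal SSVEP (high minus low contrast signal) under a toy sharpening rule.

    cue:  "valid"   -> the high-contrast grating is the expected one
          "invalid" -> the low-contrast grating is the expected one
          "neutral" -> no expectation, amplitudes unchanged
    gain: hypothetical proportional boost (expected) / suppression (unexpected)
    """
    if cue == "valid":
        high_amp, low_amp = high_amp * (1 + gain), low_amp * (1 - gain)
    elif cue == "invalid":
        high_amp, low_amp = high_amp * (1 - gain), low_amp * (1 + gain)
    elif cue != "neutral":
        raise ValueError(f"unknown cue type: {cue!r}")
    return high_amp - low_amp


# Hypothetical amplitudes in arbitrary units
for cue in ("invalid", "neutral", "valid"):
    print(cue, marginal_ssvep(2.0, 1.0, cue))
```

Under these assumptions the invalid-cue margin falls below the neutral margin, which is the direction of the effect described above. The matching increase for valid cues is a consequence of the symmetric rule used in the sketch and should not be read as a reported result.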
  
  Reviewer #3
  
  Observers make judgements about expected stimuli faster and more accurately. How expectations facilitate such perceptual decisions remains an ongoing area of investigation, however, as expectations may exert their effects in multiple ways. Expectations may directly influence the encoding of sensory signals. Alternatively (or additionally), expectations may influence later stages of decision-making, such as motor preparation, when they bear on the appropriate behavioral response.
  
  In the present study, Walsh and colleagues directly measured the effect of expectations on sensory and motor signals by making clever use of the encephalogram (EEG) recorded from human observers performing a contrast discrimination task. On each trial, a predictive cue indicated which of two superimposed stimuli would likely be higher contrast and, therefore, whether a left or right button press was likely to yield a correct response. Deft design choices allowed the authors to extract both contrast-dependent sensory signals and motor preparation signals from the EEG. The authors provide compelling evidence that, when predictive cues provide information about both a forthcoming stimulus and the appropriate behavioral response, expectation effects are immediately manifest in motor preparation signals and only emerge in sensory signals after extensive training.
  
  Future work should attempt to reconcile these results with related investigations in the field. As the authors note, several groups have reported expectation-induced modulation of sensory signals (using both fMRI and EEG/MEG) on shorter timescales (e.g. just one or two sessions of a few hundred trials, versus the intensive multi-session study reported here). One interesting possibility is that perceptual expectations are not automatic but demand the deployment of feature-based attention, while motor preparation is comparatively less effortful and so dominates when both sources of information are available, as in the present study. This hypothesis is consistent with the authors' thoughtful analysis showing decreased neural signatures of attention over posterior electrodes following predictive cues. Therefore, observing the timescale of sensory effects using the same design and methods (facilitating direct comparison with the present work), but altering task demands slightly such that cues are no longer predictive of the appropriate behavioral response, could be illuminating.
  
  We would like to thank Reviewer 3 for their positive comments and thoughtful suggestions for future work.
  
  Recommendations For The Authors:
  
  Comment #10
  
  In the methods, the term 'session' is used early on but only fleshed out at the end of the 'Procedure' subsection and never entirely explained (e.g., did sessions take place over multiple days?). A brief sentence laying this out early on, perhaps in 'Participants' after the (impressive) trial counts are reported, might be helpful.
  
  Thank you to Reviewer 3 for pointing out that this was not clear in the original draft. We have amended the text in the Methods section to better explain the relationship between sessions, days, and trial bins.
  
  Manuscript Changes
  
  (Methods - Participants): [...] All procedures were approved by the Trinity College Dublin School of Psychology Ethics Committee and were in accordance with the Declaration of Helsinki. Participants completed between 4 and 6 testing sessions, each on a different day. While the sample size was small, on average, participants completed 5750 (SD = 1066) trials each.
  
  (Methods - Data Analysis): [...] As there were two lengths of testing session and participants completed different numbers of sessions, we analysed the effect of task exposure by pooling trials within-subjects and dividing them into five ‘trial bins’. The first bin represents the participants’ earliest exposure to the task and the final bin represents trials at the end of their participation, when they had had substantial task exposure. All trials with valid responses and reaction times greater than 100 ms were included in the analyses of behavioural data and the SSVEP.
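The binning step described here can be sketched as follows. The code assumes that each participant's trials are already pooled in chronological order and that the five bins are contiguous and as equal in size as possible. The text does not state the exact binning rule, so that choice, the function name, and the trial count are assumptions.

```python
def split_into_trial_bins(trials, n_bins=5):
    """Split chronologically ordered trials into contiguous bins.

    Bin sizes differ by at most one trial; any remainder goes to the earliest bins.
    """
    base, extra = divmod(len(trials), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(trials[start:start + size])
        start += size
    return bins


# Hypothetical participant with 5752 pooled trials, identified by index
trial_bins = split_into_trial_bins(list(range(5752)))
print([len(b) for b in trial_bins])
```

Because bins are defined within each participant, the first bin always reflects that participant's earliest exposure regardless of how many sessions they completed.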
  
  Comment #11
  
  On a related note: participants completed a variable number of trials/sessions. To facilitate comparison across subjects, training effects are reported by dividing each subject's data into 5 exposure bins. This is entirely reasonable but does leave the reader wondering about whether you found any effects of rest or sleep between sessions.
  
  We agree with the reviewer that this is an interesting question that absolutely merits further investigation. As different participants completed different numbers of sessions, different session lengths, and had variable gaps between their sessions, we do not think a per-session analysis would be informative. We think it may be better addressed in a future study, perhaps one with a larger sample where we could collect data specifically about sleep and more systematically control the intervals between testing sessions.
  
  Comment #12
  
  Fig 2B: the 'correct' and 'neutral' labels in the legend are switched
  
  Thank you to the reviewer for spotting that error; the labels in Figure 2 have been corrected.
  
  Comment #13
  
  Fig 4B: it's a bit difficult to distinguish which lines are 'thick' and 'thin'
  
  We have updated Figure 4.B to increase the difference in line thickness between the thick and thin lines (as shown below).
  
  Author response image 5.
  
  Comment #14
  
  Fig 4C: missing (I believe?) the vertical lines indicating median reaction time
  
  We have updated Figure 4.C to include the median reaction times.
  
  Author response image 6.
  
URL

biorxiv.org/content/10.1101/2023.07.15.549123v3
www.biorxiv.org www.biorxiv.org

A Statistical Framework for Analysis of Trial-Level Temporal Dynamics in Fiber Photometry Experiments

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions; however, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies including non-linear analysis tools, as well as a discussion of the importance of the data normalization are needed.
  
  Thank you for reviewing our manuscript and giving us the opportunity to respond and improve our paper. In our revision, we have strived to address the points raised in the comments, and implement suggested changes where feasible. We have also improved our package and created an analysis guide (available on our GitHub - https://github.com/gloewing/fastFMM and https://github.com/gloewing/photometry_fGLMM), showing users how to apply our methods and interpret their results. Below, we provide a detailed point-by-point response to the reviewers.
  
  Reviewer #1:
  
  Summary:
  
  Fiber photometry has become a very popular tool in recording neuronal activity in freely behaving animals. Despite the number of papers published with the method, as the authors rightly note, there are currently no standardized ways to analyze the data produced. Moreover, most of the data analyses are confined to simple measurements of averaged activity and by doing so, erase valuable information encoded in the data. The authors offer an approach based on functional linear mixed modeling, where beyond changes in overall activity various functions of the data can also be analyzed. More in-depth analysis, more variables taken into account, and better statistical power all lead to higher quality science.
  
  Strengths:
  
  The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.
  
  Thank you for your favorable and detailed description of our work!
  
  Weaknesses:
  
  However, this also leads to several questions. The normalization method employed for raw fiber photometry data is different from lab to lab. This imposes a significant challenge to applying a single tool of analysis.
  
  Thank you for these important suggestions. We agree that many data pre-processing steps will influence the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we would argue that the sensitivity of analysis results to pre-processing choices should motivate the development of statistical techniques that reduce the need for pre-processing, and properly account for structure in the data arising from experimental designs. For example, even without many standard pre-processing steps, FLMM provides smooth estimation results across trial timepoints (i.e., the “functional domain”), has the ability to adjust for between-trial and -animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. We appreciate the reviewer’s suggestion to emphasize and further elaborate on our method from this perspective. We have now included the following in the Discussion section:
  
  “FLMM can help model signal components unrelated to the scientific question of interest, and provides a systematic framework to quantify the additional uncertainty from those modeling choices. For example, analysts sometimes normalize data with trial-specific baselines because longitudinal experiments can induce correlation patterns across trials that standard techniques (e.g., repeated measures ANOVA) may not adequately account for. Even without many standard data pre-processing steps, FLMM provides smooth estimation results across trial time-points (the “functional domain”), has the ability to adjust for between-trial and -animal heterogeneity, and provides a valid statistical inference approach that quantifies the resulting uncertainty. For instance, session-to-session variability in signal magnitudes or dynamics (e.g., a decreasing baseline within-session from bleaching or satiation) could be accounted for, at least in part, through the inclusion of trial-level fixed or random effects. Similarly, signal heterogeneity due to subject characteristics (e.g., sex, CS+ cue identity) could be incorporated into a model through inclusion of animal-specific random effects. Inclusion of these effects would then influence the width of the confidence intervals. By expressing one’s “beliefs” in an FLMM model specification, one can compare models (e.g., with AIC). Even the level of smoothing in FLMM is largely selected as a function of the data, and is accounted for directly in the equations used to construct confidence intervals. This stands in contrast to “trying to clean up the data” with a pre-processing step that may have an unknown impact on the final statistical inferences.”
  
  Does the method that the authors propose work similarly efficiently whether the data are normalized in a running average dF/F as it is described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) in itself may lead to pattern dilution.
  
  By modeling trial signals as “functions”, the method accounts for and exploits correlation across trial timepoints and, as such, any pre-smoothing of the signals should not negatively affect the validity of the 95% CI coverage. It will, however, change inferential results and the interpretation of the data, but this is not unique to FLMM, or many other statistical procedures.
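  
  The "pattern dilution" concern raised above can be illustrated with a minimal sketch. It assumes a hypothetical trace (a flat baseline with a brief three-sample transient) and a centered running average; the trace, the window length, and the helper name `running_average` are illustrative assumptions and are not taken from Jeong et al. (2022) or from the fastFMM package.

```python
def running_average(signal, window):
    """Centered moving average; near the edges, average only the samples available."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical trace: flat baseline with a brief 3-sample transient of height 1.
trace = [0.0] * 50
for t in (24, 25, 26):
    trace[t] = 1.0

smoothed = running_average(trace, window=9)
peak_raw = max(trace)
peak_smoothed = max(smoothed)  # the 9-sample window spreads the 3-sample transient
print(peak_raw, round(peak_smoothed, 3))
```

  In this toy case the smoothed peak is one third of the raw peak, which is the sense in which pre-smoothing changes the inferential target even though it need not invalidate the confidence intervals.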
  
  The same question applies if the z-score is calculated based on various responses or even baselines. How reliable is the method if the data are non-stationary and the baselines undergo major changes between separate trials?
  
  Adjustment for trial-to-trial variability in signal magnitudes or dynamics could be accounted for, at least in part, through the inclusion of trial-level random effects. This heterogeneity would then influence the width of the confidence intervals, directly conveying the effect of the variability on the conclusions being drawn from the data. This stands in contrast to “trying to clean up the data” with a pre-processing step that may have an unknown impact on the final statistical inferences. Indeed, non-stationarity (e.g., a decreasing baseline within-session) due to, for example, measurement artifacts (e.g., bleaching) or behavioral causes (e.g., satiation, learning) should, if possible, be accounted for in the model. As mentioned above, one can often achieve the same goals that motivate pre-processing steps by instead applying specific FLMM models (e.g., that include trial-specific intercepts to reflect changes in baseline) to the unprocessed data. One can then compare model criteria in an objective fashion (e.g., with AIC) and quantify the uncertainty associated with those modeling choices. Even the level of smoothing in FLMM is largely selected as a function of the data, and is accounted for directly in the equations used to construct confidence intervals. In sum, our method provides both a tool to account for challenges in the data, and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.
  
  Finally, what is the rationale for not using non-linear analysis methods? Following the paper’s logic, non-linear analysis can capture more information that is diluted by linear methods.
  
  This is a good question that we imagine many readers will be curious about as well. We have added in notes to the Discussion and Methods Section 4.3 to address this (copied below). We thank the reviewer for raising this point, as your feedback also motivated us to discuss this point in Part 5 of our Analysis Guide.
  
  Methods
  
  “FLMM models each trial’s signal as a function that varies smoothly across trial time-points (i.e., along the “functional domain”). It is thus a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (straight line). FLMM and other functional data analysis methods model data as functions, when there is a natural ordering (e.g., time-series data are ordered by time, imaging data are ordered by x-y coordinates), and are assumed to vary smoothly along the functional domain (e.g., one assumes values of a photometry signal at close time-points in a trial have similar values). Functional data analysis approaches exploit this smoothness and natural ordering to capture more information during estimation and inference.”
  
  Discussion
  
  “In this paper, we specified FLMM models with linear covariate–signal relationships at a fixed trial time-point across trials/sessions, to compare the FLMM analogue of the analyses conducted in (Jeong et al., 2022). However, our package allows modeling of covariate–signal relationships with non-linear functions of covariates, using splines or other basis functions. One must consider, however, the tradeoff between flexibility and interpretability when specifying potentially complex models, especially since FLMM is designed for statistical inference.”
  
  Reviewer #2:
  
  Summary:
  
  This work describes a statistical framework that combines functional linear mixed modeling with joint 95% confidence intervals, which improves statistical power and provides less conservative statistical inferences than in previous studies. As recently reviewed by Simpson et al. (2023), linear regression analysis has been used extensively to analyze time series signals from a wide range of neuroscience recording techniques, with recent studies applying them to photometry data. The novelty of this study lies in 1) the introduction of joint 95% confidence intervals for statistical testing of functional mixed models with nested random-effects, and 2) providing an open-source R package implementing this framework. This study also highlights how summary statistics as opposed to trial-by-trial analysis can obscure or even change the direction of statistical results by reanalyzing two other studies.
  
  Strengths:
  
  The open-source R package, which uses a syntax similar to that of the lme4 package to implement this framework on photometry data, enhances accessibility and use by other researchers. Moreover, the model's decreased fitting time, in comparison with a similar package on simulated data, makes the framework more likely to be adopted.
  
  The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each time-point on the trial can reveal information obscured by averaging across trials. Furthermore, this work also exemplifies how session and subject variability can lead to opposite conclusions when not considered.
  
  We appreciate the in-depth description of our work and, in particular, the R package. This is an area where we put a lot of effort, since our group is very concerned with the practical experience of users.
  
  Weaknesses:
  
  Although this work has reanalyzed previous work that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial. As described by the authors, fitting pointwise linear mixed models and performing t-tests with a Benjamini–Hochberg correction, as in Lee et al. (2019), has some caveats. Using joint confidence intervals has the potential to improve statistical robustness; however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in this work.
  
  Thank you for making this important point. We agree that this offers an opportunity to showcase the advantages of FLMM over non-functional data analysis methods, such as the approach applied in Lee et al. (2019). As mentioned in the text, fitting entirely separate models at each trial timepoint (without smoothing regression coefficient point and variance estimates across timepoints), and applying multiple comparisons corrections as a function of the number of timepoints, has substantial conceptual drawbacks. To see why, consider that applying this strategy with two different sub-sampling rates requires adjustment for different numbers of comparisons, and could thus lead to very different proportions of timepoints achieving statistical significance. In light of your comments, we decided that it would be useful to provide a demonstration of this. To that effect, we have added Appendix Section 2 comparing FLMM with the method in Lee et al. (2019) on a real dataset, and show that FLMM yields far less conservative and more stable inference across different sub-sampling rates. We conducted this comparison on the delay-length experiment (shown in Figure 6) data, sub-sampled at evenly spaced intervals at a range of sampling rates. We fit either a collection of separate linear mixed models (LMM) followed by a Benjamini–Hochberg (BH) correction, or FLMM with statistical significance determined with both Pointwise and Joint 95% CIs. As shown in Appendix Tables 1-2, the proportion of timepoints at which effects are statistically significant with FLMM Joint CIs is fairly stable across sampling rates. In contrast, the percentage is highly inconsistent with the BH approach and is often highly conservative. This illustrates a core advantage of functional data analysis methods: borrowing strength across trial timepoints (i.e., the functional domain) can improve estimation efficiency and lower sensitivity to how the data is sub-sampled. A multiple comparisons correction may, however, yield stable results if one first smooths both regression coefficient point and variance estimates. Because of that smoothing step, this approach would essentially constitute a functional mixed model estimation strategy that uses multiple comparisons correction instead of a joint CI. We have now added in a description of this experiment in Section 2.4 (copied below).
  
  “We further analyze this dataset in Appendix Section 2, to compare FLMM with the approach applied in Lee et al. (2019) of fitting pointwise LMMs (without any smoothing) and applying a Benjamini–Hochberg (BH) correction. Our hypothesis was that the Lee et al. (2019) approach would yield substantially different analysis results, depending on the sampling rate of the signal data (since the number of tests being corrected for is determined by the sampling rate). The proportion of timepoints at which effects are deemed statistically significant by FLMM joint 95% CIs is fairly stable across sampling rates. In contrast, that proportion is both inconsistent and often low (i.e., highly conservative) across sampling rates with the Lee et al. (2019) approach. These results illustrate the advantages of modeling a trial signal as a function, and conducting estimation and inference in a manner that uses information across the entire trial.”
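  
  The mechanical dependence of a BH correction on the number of tests can be sketched as follows. This is a toy illustration, not a reproduction of the Appendix Section 2 analysis: the p-values are invented, and the extra timepoints are assumed to carry no effect. It shows only that the same small p-values can be rejected or retained depending on how many other tests accompany them.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the null is rejected at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under rank/m * alpha.
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Invented p-values: three small ones plus null-like timepoints (p = 0.5).
small = [0.004, 0.008, 0.02]
few_tests = small + [0.5] * 7     # 10 timepoints in total
many_tests = small + [0.5] * 97   # 100 timepoints in total

n_rejected_few = sum(benjamini_hochberg(few_tests))
n_rejected_many = sum(benjamini_hochberg(many_tests))
print(n_rejected_few, n_rejected_many)
```

  With 10 tests, two of the small p-values clear their BH thresholds; with 100 tests, none do, because every threshold shrinks by a factor of ten.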
  
  In this work, FLMM usages included only one or two covariates. However, in complex behavioral experiments, where variables are correlated, more than two may be needed (see Simpson et al. (2023); Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.
  
  Thank you for bringing this up, as we endeavored to create code that is able to scale to complex models and large datasets. We agree that highlighting this capability in the paper will strengthen the work. We now state in the Discussion section that “[T]he package is fast and maintains a low memory footprint even for complex models (see Section 4.6 for an example) and relatively large datasets.” Methods Section 4.6 now includes the following:
  
  Our fastFMM package scales to the dataset sizes and model specifications common in photometry. The majority of the analyses presented in the Results Section (Section 2) included fairly simple functional fixed and random effect model specifications because we were implementing the FLMM versions of the summary measure analyses presented in Jeong et al. (2022). However, we fit the following FLMM to demonstrate the scalability of our method with more complex model specifications:
  
  We use the same notation as the Reward Number model in Section 4.5.2, with the additional variable TL_{i,j,l} denoting the Total Licks on trial j of session l for animal i. In a dataset with over 3,200 total trials (pooled across animals), this model took ∼1.2 min to fit on a MacBook Pro with an Apple M1 Max chip with 64 GB of RAM. Model fitting had a low memory footprint. This can be fit with the code:
  
  model_fit = fui(photometry ~ session + trial + iri + lick_time + licks + (session + trial + iri + lick_time + licks | id), parallel = TRUE, data = photometry_data)
  
  This provides a simple illustration of the scalability of our method. The code (including timing) for this demonstration is now included on our Github repository.
  
  Reviewer #3:
  
  Summary:
  
  Loewinger et al. extend a previously described framework (Cui et al., 2021) to provide new methods for statistical analysis of fiber photometry data. The methodology combines functional regression with linear mixed models, allowing inference on complex study designs that are common in photometry studies. To demonstrate its utility, they reanalyze datasets from two recent fiber photometry studies into mesolimbic dopamine. Then, through simulation, they demonstrate the superiority of their approach compared to other common methods.
  
  Strengths:
  
  The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.
  
  Modeling the entire trial (functional regression) removes the need to choose appropriate summary statistics, removing the opportunity to introduce bias, for example in searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.
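  
  As a toy illustration of how an AUC can mask timing, consider two invented traces with the same area but different peak times. The values below are assumptions chosen for clarity and do not come from any dataset discussed here.

```python
# Two hypothetical trial-averaged traces sampled at unit spacing.
early_peak = [0, 3, 1, 0, 0, 0]
late_peak = [0, 0, 0, 0, 1, 3]

# AUC via a simple rectangle rule (unit spacing, so the AUC is just the sum).
auc_early = sum(early_peak)
auc_late = sum(late_peak)

# Timepoint index at which each trace reaches its maximum.
peak_time_early = early_peak.index(max(early_peak))
peak_time_late = late_peak.index(max(late_peak))

print(auc_early, auc_late, peak_time_early, peak_time_late)
```

  The two AUCs are identical, so any test on the AUC alone cannot distinguish the traces, whereas a timepoint-by-timepoint analysis can.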
  
  Meanwhile, using linear mixed methods allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.
  
  We would like to thank the reviewer for the deep reading and understanding of our paper and method, and the thoughtful feedback provided. We agree with this summary, and will respond in detail to all the concerns raised.
  
  Weaknesses:
  
  While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.
  
  Thank you for this point. While we went to great effort to explain things clearly, our efforts to be concise likely resulted in some lack of clarity. To address this, we have created a series of analysis guides for a more general neuroscience audience, reflecting our experience working with researchers at the NIH and the broader community. These guides walk users through the code, its deployment in typical scenarios, and the interpretation of results.
  
  While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:
  
  In section 2.3, the authors use FLMM to identify an instance of Simpson’s Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors’ metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors’ approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects.
  
  Our goal was to demonstrate that FLMM provides insight into why the opposing within- and between-session effects occur: the between-session and within-session changes appear to occur at different trial timepoints. Thus, while the AUC metrics applied in Jeong et al. (2022) are enough to show the presence of Simpson’s paradox, it is difficult to hypothesize why the opposing within-/between-session effects occur. An AUC analysis cannot determine at what trial timepoints (relative to licking) those opposing trends occur.
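  
  The opposing within- and between-session trends can be sketched with invented numbers. In this hypothetical example the signal drops by one unit per trial within each session but rises by ten units from one session to the next; regressing the signal on the cumulative trial number, with sessions concatenated, then yields a positive slope even though every within-session slope is negative. The session structure, effect sizes, and the helper `ols_slope` are assumptions for illustration only.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


n_sessions, trials_per_session = 3, 5
cumulative_trial, signal, session_id = [], [], []
for s in range(n_sessions):
    for t in range(trials_per_session):
        cumulative_trial.append(s * trials_per_session + t)
        signal.append(10.0 * s - 1.0 * t)  # rises across sessions, falls within
        session_id.append(s)

# Pooled analysis: sessions concatenated, session structure ignored.
pooled_slope = ols_slope(cumulative_trial, signal)

# Within-session analysis: one slope per session.
within_slopes = []
for s in range(n_sessions):
    xs = [x for x, sid in zip(cumulative_trial, session_id) if sid == s]
    ys = [y for y, sid in zip(signal, session_id) if sid == s]
    within_slopes.append(ols_slope(xs, ys))

print(round(pooled_slope, 3), within_slopes)
```

  A mixed model with session-level terms separates these two trends; a single pooled slope, or a single AUC per trial, reports only their net effect.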
  
  The discussion of this result is muddled. Having identified the paradox, there is some appropriate speculation as to what is causing these opposing effects, particularly the decrease within sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve) and (3) photobleaching as potential explanations of the decrease within days. Having identified these effects, but without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al. were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g. over sessions) and variation in predictability (e.g. by attention level to sounds of each mouse) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.
  
  While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point re: potential reward predictability that we had not considered. They have convinced us that acknowledging this alternative perspective will strengthen the paper, and we have added it into the Discussion. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals may sense the reward delivery. After discussing extensively with the authors of Jeong et al. (2022), it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely changes in pressure from the solenoid (rather than a sound) that may have served as a cue. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this potential learned predictability could, at least partially, account for the increase in signal magnitude across sessions. As this paper is focused on analysis methods, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting this explanation in detail, for consideration in future experiments. We have substantially edited this discussion and, as per the reviewer’s suggestion, have qualified our interpretations to reflect the uncertainty in explaining the observed trends.
  
  If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al 2022 provided in the discussion is not germane. Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.
  
  Thank you for this point. We agree with you that, given the scope of the paper, we should avoid any extensive comparison between the models. To address your comment, we have now removed portions of the Discussion that compared RPE and ANCCR. Overall, we agree with the reviewer, and think that future experiments will be needed for conclusively testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our description of several conversations with the Jeong et al., 2022 authors could have gone deeper, we hope the reviewer can appreciate that inclusion of these conversations was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting our discussion. We do commend the authors of Jeong et al., 2022 for their willingness to discuss all these details. They could easily have avoided acknowledging any potential incompleteness of their theory by claiming that our results do not invalidate their predictions for a random reward, because the reward could potentially have been predicted (due to an inadvertent CS+ generated from the solenoid pressure). Instead, they emphasized that they thought their experiment did test a random reward, to the extent they could determine, and that our results suggest components of their theory that should be updated. We think that engagement with re-analyses of one’s data, even when findings are at odds with an initial theoretical framing, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.
  
  Finally, we would like to reiterate that this conversation is happening at least in part because of our method: by analyzing the signal at every trial timepoint, it provides a formal way to test for the presence of a neural signal indicative of reward delivery perception. Ultimately, this was what we set out to do: help researchers ask questions of their data that may have been harder to ask before. We believe that having a demonstration that we can indeed do this for a “live” scientific issue is the most appropriate way of demonstrating the usefulness of the method.
  
  Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (∆F/F) with smoothing and baseline correction and this does not seem to have been considered in the argument. Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.
  
  We appreciate the nuance of this point, and we have made considerable efforts in the Results and Discussion sections to caution that alternative hypotheses (e.g., photobleaching) cannot be definitively ruled out. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in this data, and it is not clear to us why photobleaching would be visible in one time-window of a trial, but not at another (less than a second away), despite high ∆F/F magnitudes in both time-windows. We do wish to point out that the Jeong et al. (2022) authors were also concerned about photobleaching as a possible explanation. At their request, we analyzed data from additional experiments, collected from the same animals. In most cases, we did not observe signal patterns that seemed to indicate photobleaching. Given the additional scrutiny, we do not think that photobleaching is more likely to invalidate results in this particular set of experiments than it would be in any other photometry experiment. While the role of photobleaching may be more complicated with this sensor than others in the references, that citation was included primarily as a way of acknowledging that it is possible that non-linearities in photobleaching could occur. Regardless, your point is well taken and we have qualified our description of these analyses to express that photobleaching cannot be ruled out.
  
  Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors’ description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.
  
  Thank you for pointing this out! We removed the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.
  
  The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.
  
  Our point was initially included to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of re-analyzing shared datasets is acknowledging both areas where new analyses support the original results, as well as those where they conflict with them. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we have made those changes. We have qualified the conclusions of our analysis to emphasize they are a demonstration of how FLMM can be used to answer a certain style of question with hypothesis testing (how signal dynamics change across sessions), as opposed to providing evidence for/against the backpropagation hypothesis.
  
  A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.
  
  Thank you for this suggestion. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we made changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. Given the length of the manuscript as it stands, we could only include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al., data to an appendix. Realistically, it would be hard for us to justify including analyses from a third dataset, only to have to relegate them to an appendix. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with many groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method, and compares the results with those yielded by standard analysis of AUCs, is already published (Beas et al., 2024). Finally, in our analysis guide we describe additional analyses, not included in the manuscript, that replicate positive results. Hence there are numerous demonstrations of FLMM’s performance in less controversial settings. We take your point that our description of the data supporting one theory or the other should be qualified, and we have corrected that. Specifically for your suggestion of Amo et al. 2022, we have not had the opportunity to personally reanalyze their data, but we are already in contact with other groups who have conducted preliminary analyses of their data with FLMM. We are delighted to see this, in light of your comments and our decision to restrict the scope of our paper. We will help them and other groups working on this question to the extent we can.
  
  Recommendations for the Authors:
  
  Reviewer #2:
  
  First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.
  
  Thank you for the positive feedback!
  
  I would suggest the authors consider adding to the manuscript either some evidence or some intuition on how feasible it would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.
  
  Thank you for this suggestion. As we described above in response to Reviewer #2’s Public Reviews, we have added in a demonstration of the scalability of the method. Since our initial manuscript submission, we have further increased the package’s speed (e.g., through further parallelization). We are releasing the updated version of our package on CRAN.
  
  From my understanding, this package might potentially be useful not just for photometry data but also for two-photon recordings for example. If so, I would also suggest the authors add to the discussion this potential use.
  
  This is a great point. Our updated manuscript Discussion includes the following:
  
  “The FLMM framework may also be applicable to techniques like electrophysiology and calcium imaging. For example, our package can fit functional generalized LMMs with a count distribution (e.g., Poisson). Additionally, our method can be extended to model time-varying covariates. This would enable one to estimate how the level of association between signals, simultaneously recorded from different brain regions, fluctuates across trial time-points. This would also enable modeling of trials that differ in length due to, for example, variable behavioral response times (e.g., latency-to-press).”
  
  Reviewer #3:
  
  The authors should define ’function’ in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7.
  
  We include a description of the alternate tests in Appendix Section 5.2. We have updated the Methods Section (Section 4) to introduce the reader to how ‘functions’ are conceptualized and modeled in the functional data analysis literature. Specifically, we added the following text:
  
  “FLMM models each trial’s signal as a function that varies smoothly across trial time-points (i.e., along the “functional domain”). It is thus a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (straight line). FLMM and other functional data analysis methods model data as functions, when there is a natural ordering (e.g., time-series data are ordered by time, imaging data are ordered by x-y coordinates), and are assumed to vary smoothly along the functional domain (e.g., one assumes values of a photometry signal at close time-points in a trial have similar values). Functional data analysis approaches exploit this smoothness and natural ordering to capture more information during estimation and inference.”
  
  Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).
  
  We appreciate your pointing this out, as the distinction is nuanced. Our manuscript includes a description of how joint CIs enable one to interpret effects as statistically significant for time-intervals as opposed to individual timepoints. Unlike joint CIs, assessing significance with pointwise CIs suffers from multiple-comparisons problems. As a result of your suggestion, we have included a short discussion of this to our analysis guide (Part 1), entitled “Pointwise or Joint 95% Confidence Intervals.” The Methods section of our manuscript also includes the following:
  
  “The construction of joint CIs in the context of functional data analysis is an important research question; see Cui et al. (2021) and references therein. Each point at which the pointwise 95% CI does not contain 0 indicates that the coefficient is statistically significantly different from 0 at that point. Compared with pointwise CIs, joint CIs take into account the autocorrelation of signal values across trial time-points (the functional domain). Therefore, instead of interpreting results at a specific timepoint, joint CIs enable joint interpretations at multiple locations along the functional domain. This aligns with interpreting covariate effects on the photometry signals across time-intervals (e.g., a cue period) as opposed to at a single trial time-point. Previous methodological work has provided functional mixed model implementations for either joint 95% CIs for simple random-effects models (Cui et al., 2021), or pointwise 95% CIs for nested models (Scheipl et al., 2016), but to our knowledge, do not provide explicit formulas or software for computing joint 95% CIs in the presence of general random-effects specifications.”
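  
  To make the pointwise/joint distinction concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a hypothetical coefficient estimate `beta_hat` over 50 trial time-points with an AR(1)-style covariance `cov` (both invented for illustration), and it obtains the joint critical value from a generic simulation of the maximum absolute standardized deviation, which is one common construction in the functional data literature. The manuscript's exact procedure may differ.

```python
import numpy as np


def joint_critical_value(cov, n_sim=10000, level=0.95, seed=0):
    """Simulate the `level` quantile of max_t |Z_t| under the estimator's correlation."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)  # standardize so each time-point has unit variance
    draws = rng.multivariate_normal(np.zeros(len(sd)), corr, size=n_sim)
    max_abs = np.abs(draws).max(axis=1)  # largest deviation across the whole trial
    return float(np.quantile(max_abs, level))


# Hypothetical inputs (assumptions for illustration only).
n_points = 50
t = np.arange(n_points)
rho = 0.9  # assumed autocorrelation between neighbouring time-points
cov = 0.04 * rho ** np.abs(t[:, None] - t[None, :])
beta_hat = 1.0 * np.exp(-((t - 25) / 8.0) ** 2)  # a smooth bump around t = 25
se = np.sqrt(np.diag(cov))

z_pointwise = 1.959963984540054  # standard normal 97.5% quantile
z_joint = joint_critical_value(cov)

# A time-point is flagged when its CI excludes 0.
pointwise_sig = np.abs(beta_hat) > z_pointwise * se
joint_sig = np.abs(beta_hat) > z_joint * se

print(f"pointwise critical value: {z_pointwise:.2f}, joint: {z_joint:.2f}")
print(f"time-points flagged: pointwise={pointwise_sig.sum()}, joint={joint_sig.sum()}")
```

  Because the joint critical value exceeds the pointwise one, the joint band is wider and flags no more time-points than the pointwise band. In exchange, the flagged time-points can be interpreted together as an interval.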
  
  The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).
  
  This is a fantastic point and we have added the following into the Discussion:
  
  “...[S]ignal heterogeneity due to subject characteristics (e.g., sex, CS+ cue identity) could be incorporated into a model through inclusion of animal-specific random effects.”
  
  In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.
  
  Good point. We have made this change.
  
  Minor corrections:
  
  Panels are mislabeled in Figure 5.
  
  Thank you. We have corrected this.
  
  The Crowder (2009) reference is incorrect, being a review of the book with the book presumably being the correct citation.
  
  Good catch, thank you! Corrected.
  
  In Section 5 (first appendix), the authors could include the alternate spelling ’fibre photometry’ to capture any citations that use British English spelling.
  
  This is a great suggestion, but we did not have time to recreate these figures before re-submission.
  
  Section 7.4 is almost all quotation, though unevenly using the block quotation formatting. It is unclear why such a large quotation is included.
  
  Thank you for pointing this out. We have removed this Appendix section (formerly Section 7.4) as the relevant text was already included in the Methods section.
  
  References
  
  Sofia Beas, Isbah Khan, Claire Gao, Gabriel Loewinger, Emma Macdonald, Alison Bashford, Shakira Rodriguez-Gonzalez, Francisco Pereira, and Mario A Penzo. Dissociable encoding of motivated behavior by parallel thalamo-striatal projections. Current Biology, 34(7):1549–1560, 2024.
  
  Erjia Cui, Andrew Leroux, Ekaterina Smirnova, and Ciprian Crainiceanu. Fast univariate inference for longitudinal functional models. Journal of Computational and Graphical Statistics, 31:1–27, 07 2021. doi: 10.1080/10618600.2021.1950006.
  
  Huijeong Jeong, Annie Taylor, Joseph R Floeder, Martin Lohmann, Stefan Mihalas, Brenda Wu, Mingkang Zhou, Dennis A Burke, and Vijay Mohan K Namboodiri. Mesolimbic dopamine release conveys causal associations. Science, 378(6626):eabq6740, 2022. doi: 10.1126/science.abq6740. URL https://www.science.org/doi/abs/10.1126/science.abq6740.
  
  Rachel S Lee, Marcelo G Mattar, Nathan F Parker, Ilana B Witten, and Nathaniel D Daw. Reward prediction error does not explain movement selectivity in dms-projecting dopamine neurons. eLife, 8:e42992, apr 2019. ISSN 2050-084X. doi: 10.7554/eLife.42992. URL https://doi.org/10.7554/eLife.42992.
  
  Fabian Scheipl, Jan Gertheiss, and Sonja Greven. Generalized functional additive mixed models. Electronic Journal of Statistics, 10(1):1455 – 1492, 2016. doi: 10.1214/16-EJS1145. URL https://doi.org/10.1214/16-EJS1145.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.06.565896v3
www.biorxiv.org

New submission 19/02/2024, 08:57:13

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This important study elucidates the molecular divergence of caspase 3 and 7 in the vertebrate lineage. Convincing biochemical and mutational data provide evidence that in humans, caspase 7 has lost the ability to cleave gasdermin E due to changes in a key residue, S234. However, the physiological relevance of the findings is incomplete and requires further experimental work.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary
  
  In this study, Xu et al. provide insights into the substrate divergence of CASP3 and CASP7 for GSDME cleavage and activation during vertebrate evolution. Using biochemical assays, domain swapping, site-directed mutagenesis, and bioinformatics tools, the authors demonstrate that the human GSDME C-terminal region and the S234 residue of human CASP7 are the key determinants that impede the cleavage of human GSDME by human CASP7.
  
  Strengths
  
  The authors made an important contribution to the field by demonstrating how human CASP7 has functionally diverged to lose the ability to cleave GSDME and showing that reverse-mutations in CASP7 can restore GSDME cleavage. The use of multiple methods to support their conclusions strengthens the authors' findings. The unbiased mutagenesis screen performed to identify S234 in huCASP7 as the determinant of its GSDME cleavability is also a strength.
  
  Weaknesses
  
  While the authors utilized an in-depth experimental setup to understand the CASP7-mediated GSDME cleavage across evolution, the physiological relevance of their findings is not assessed in detail. Additional methodology information should also be provided.
  
  Specific recommendations for the authors
  
  (1) The authors should expand their evaluation of the physiological relevance by assessing GSDME cleavage by the human CASP7 S234N mutant in response to triggers such as etoposide or VSV, which are known to induce CASP3 to cleave GSDME (PMID: 28045099). The authors could also test whether the human CASP7 S234N mutation affects substrate preference beyond human GSDME by testing cleavage of mouse GSDME and other CASP3 and CASP7 substrates in this mutant.
  
  (1) The physiological relevance was discussed in the revised manuscript (lines 328-340). Our study revealed the molecular mechanism underlying the divergence of CASP3- and CASP7-mediated GSDME activation in vertebrates. One of the physiological consequences is that in humans, CASP7 no longer directly participates in GSDME-mediated cell death, which enables CASP7 to be engaged in other cellular processes. Another physiological consequence is that GSDME activation is limited to CASP3 cleavage, thus restricting GSDME activity to more specific situations, such as those that induce CASP3 activation. The divergence and specialization of the physiological functions of different CASPs are consistent with and possibly conducive to the development of refined regulations of the sophisticated human GSDM pathways, which are executed by multiple GSDM members (A, B, C, D, and E), rather than by GSDME solely as in teleosts, such as Takifugu. More physiological consequences of CASP3/7 divergence in GSDME activation need to be explored in future studies.
  
  With respect to the reviewer’s suggestion of assessing GSDME cleavage by the human CASP7 S234N mutant in response to triggers such as etoposide or VSV: (i) CASP7 S234N is a creation of our study, not a natural human product, hence its response to CASP7 triggers cannot happen under normal physiological conditions except in the case of application, such as medical application, which is not the aim of our study. (ii) CASP3/7 activators (such as raptinal) induced robust activation of the endogenous CASP3 (Heimer et al., Cell Death Dis. 2019;10:556) and CASP7 (Author response image 1, below) in human cells. Since CASP3 is the natural activator of GSDME, the presence of the triggers inevitably activates GSDME via CASP3. Hence, under this condition, it will be difficult to examine the effect of CASP7 S234N.
  
  Author response image 1.
  
  HsCASP7 activation by raptinal. HEK293T cells were transfected with the empty vector (-), or the vector expressing HsCASP7 or HsCASP7-S234N for 24 h. The cells were then treated with or without (control) 5 μM raptinal for 4 h. The cells were lysed, and the lysates were blotted with anti-CASP7 antibody.
  
  (2) As suggested by the reviewer, the cleavage of other CASP7 substrates, i.e., poly (ADP-ribose) polymerase 1 (PARP1) and gelsolin, by HsCASP7 and S234N mutant was determined. The results showed that HsCASP7 and HsCASP7-S234N exhibited similar cleavage capacities. Figure 5-figure supplement 1 and lines 212-214.
  
  (2) It would also be interesting to examine the GSDME structure in different species to gain insight into the nature of mouse GSDME, which cannot be cleaved by either mouse or human CASP7.
  
  Because the three-dimensional structure of GSDME is not solved, we are unable to explore the structural mechanism underlying the GSDME cleavage by caspase. Since our results showed that the C-terminal domain was essential for caspase-mediated cleavage of GSDME, it is likely that the C-terminal domain of mouse GSDME possesses some specific features that render it resistant to mouse and human CASP7.
  
  (3) The evolutionary analysis does not explain why mammalian CASP7 evolved independently to acquire an amino acid change (N234 to S234) in the substrate-binding motif. Since it is difficult to experimentally identify why a functional divergence occurs, it would be beneficial for the authors to speculate on how CASP7 may have acquired functional divergence in mammals; potentially this occurred because of functional redundancies in cell death pathways, for example.
  
  According to the reviewer’s suggestion, a speculation was added. Lines 328-340.
  
  (4) For the recombinant proteins produced for these analyses, it would be helpful to know whether size-exclusion chromatography was used to purify these proteins and whether these purified proteins are soluble. Additionally, the SDS-PAGE in Figure S1B and C show multiple bands for recombinant mutants of TrCASP7 and HsCASP7. Performing protein ID to confirm that the detected bands belong to the respective proteins would be beneficial.
  
  The recombinant proteins in this study are soluble and purified by Ni-NTA affinity chromatography. Size-exclusion chromatography was not used in protein purification.
  
  For the SDS-PAGE in Figure 4-figure supplement 1B and C (Figure S1B and C in the previous submission), the multiple bands are most likely due to the activation cleavage of the TrCASP7 and HsCASP7 variants, which can result in multiple bands, including p10 and p20. According to the reviewer’s suggestion, the cleaved p10 was verified by immunoblotting. Figure 4-figure supplement 1B and C.
  
  (5) For Figures 3C and 4A, it would be helpful to mention what parameters or PDB files were used to attribute these secondary structural features to the proteins. In particular, in Figure 3C, residues 261-266 are displayed as a β-strand; however, the well-known α-model represents this region as a loop. Providing the parameters used for these callouts could explain this difference.
  
  For Figure 3C, in the revised manuscript, we used the structure of mouse GSDMA3 (PDB: 5b5r) for the structural analysis of HsGSDME. As indicated by the reviewer, the region of 261-266 is a loop. The description was revised in lines 172 and 174, Figure 3C and Figure 3C legend.
  
  For Figure 4A, the alignment of CASP7 was constructed by using ESPript (https://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi) with human CASP7 (PDB:1k86) as the template. The description was revised in the Figure legend.
  
  (6) Were divergent sequences selected for the sequence alignment analyses (particularly in Figure 6A)? The selection of sequences can directly influence the outcome of the amino acid residues in each position, and using diverse sequences can reduce the impact of the number of sequences on the LOGO in each phylogenetic group.
  
  In Figure 6A, the sequences were selected without bias. For Mammalia, 45 CASP3 and 43 CASP7 were selected; for Aves, 41 CASP3 and 52 CASP7 were selected; for Reptilia, 31 CASP3 and 39 CASP7 were selected; for Amphibia, 11 CASP3 and 12 CASP7 were selected; for Osteichthyes, 40 CASP3 and 43 CASP7 were selected. The sequence information was shown in Table 1 and Table 2.
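  
  The reviewer's concern about how the number of sequences affects a LOGO can be illustrated with a small sketch. It computes the information content of a single logo column using the small-sample correction that logo tools commonly apply. The function name and the toy columns are hypothetical and are not taken from Table 1 or Table 2, and the logos in Figure 6A may have been generated with different settings.

```python
import math
from collections import Counter


def column_information(residues, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Uses the small-sample correction e(n) = (s - 1) / (2 * ln(2) * n),
    where s is the alphabet size and n the number of sequences.
    """
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    correction = (alphabet_size - 1) / (2 * math.log(2) * n)
    return max(0.0, math.log2(alphabet_size) - entropy - correction)


# Toy columns (hypothetical): the same fully conserved residue,
# observed in a large group and in a small group of sequences.
large_group = "N" * 45
small_group = "N" * 11

info_large = column_information(large_group)
info_small = column_information(small_group)
print(f"45 sequences: {info_large:.2f} bits; 11 sequences: {info_small:.2f} bits")
```

  Under this correction, an equally conserved position is drawn shorter in a group with fewer sequences, so letter heights are not directly comparable between groups of very different sizes.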
  
  (7) For clarity, it would help if the authors provided additional rationale for the selection of residues for mutagenesis, such as selecting Q276, D278, and H283 as exosite residues, when the CASP7 PDB structures (4jr2, 3ibf, and 1k86) suggest that these residues are enriched with loop elements rather than the β sheets expected to facilitate substrate recognition in exosites for caspases (PMID: 32109412). It is possible that the inability to form β-sheets around these positions might indicate the absence of an exosite in CASP7, which further supports the functional effect of the exosite mutations performed.
  
  According to the suggestion, the rationale for the selection of residues for mutagenesis was added (lines 216-222). Unlike the exosite in HsCASP1/4, which is located in a β sheet, the Q276, D278, and H283 of HsCASP7 are located in a loop region (Figure 5-figure supplement 2), which may explain the mutation results and the absence of an exosite in HsCASP7 as suggested by the reviewer.
  
  Reviewer #2 (Public Review):
  
  The authors wanted to address the differential processing of GSDME by caspase 3 and 7, finding that while in humans GSDME is only processed by CASP3, Takifugu GSDME, and other mammalian can be processed by CASP3 and 7. This is due to a change in a residue in the human CASP7 active site that abrogates GSDME cleavage. This phenomenon is present in humans and other primates, but not in other mammals such as cats or rodents. This study sheds light on the evolutionary changes inside CASP7, using sequences from different species. Although the study is somewhat interesting and elegantly provides strong evidence of this observation, it lacks the physiological relevance of this finding, i.e. on human side, mouse side, and fish what are the consequences of CASP3/7 vs CASP3 cleavage of GSDME.
  
  Our study revealed the molecular mechanism underlying the divergence of CASP3- and CASP7-mediated GSDME activation in vertebrates. One of the physiological consequences is that in humans, CASP7 no longer directly participates in GSDME-mediated cell death, which enables CASP7 to be engaged in other cellular processes. Another physiological consequence is that GSDME activation is limited to CASP3 cleavage, thus restricting GSDME activity to more specific situations, such as those that induce CASP3 activation. The divergence and specialization of the physiological functions of different CASPs are consistent with and possibly conducive to the development of refined regulations of the sophisticated human GSDM pathways, which are executed by multiple GSDM members (A, B, C, D, and E), rather than by GSDME solely as in teleosts, such as Takifugu. More physiological consequences of CASP3/7 divergence in GSDME activation need to be explored in future studies. Lines 328-340.
  
  Fish also present a duplication of GSDME gene and Takifugu present GSDMEa and GSDMEb. It is not clear in the whole study if when referring to TrGSDME is the a or b. This should be stated in the text and discussed in the differential function of both GSDME in fish physiology (i.e. PMIDs: 34252476, 32111733 or 36685536).
  
  The TrGSDME used in this study belongs to the GSDMEa lineage of teleost GSDME. The relevant information was added. Figure 1-figure supplement 1 and lines 119, 271, 274-276, 287 and 288.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) For the chimeric and truncated constructs, such as HsNT-TrCT, TrNT-HsCT, Hsp20-Trp10, Trp20-Hsp10, etc., the authors should provide a table denoting which amino acids were taken from each protein to create the fusion or truncation.
  
  According to the reviewer’s suggestion, information on the truncated/chimeric proteins was provided in Table 4.
  
  (2) Both reviewers agree that functional physiological experiments are needed to increase the significance of the work. Specifically, the physiological relevance of these findings can be assessed by using western blotting to monitor GSDME cleavage by the human CASP7 S234N mutant compared with wild type CASP7 in response to triggers such as etoposide or VSV, which are known to induce CASP3 to cleave GSDME (PMID: 28045099).
  
  Additionally, the authors can assess cell death in HEK293 cells, HEK293 cells transfected with TrGSDME, HEK293 cells expressing TrCASP3/7 plus TrGSDME, and TrCASP3/7 plus the D255R/D258A mutant. These cells can be stimulated, and pyroptosis can be assessed by using ELISA to measure the release of the cytoplasmic enzyme LDH as well as IL-1β and IL-18, and the percentage of cell death (PI+ positive cells) may also be assessed.
  
  (1) With respect to the physiological relevance, please see the above reply to Reviewer 1’s comment of “Specific recommendations for the authors, 1”.
  
  (2) As shown in our results (Fig. 2), co-expression of TrCASP3/7 and TrGSDME in HEK293T cells induced robust cell death without the need of any stimulation, as evidenced by LDH release and TrGSDME cleavage. In the revised manuscript, similar experiments were performed as suggested, and cell death was assessed by Sytox Green staining (Figure 2-figure supplement 3A and B) and immunoblot to detect the cleavage of both wild type and mutant TrGSDME (Figure 2-figure supplement 3C). The results confirmed the results of Figure 2.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Abstract:
  
  Although the authors try to summarize the principal results of this study, please rewrite the abstract section to make it easier to follow and to emphasise the implications of their results.
  
  We have modified the Abstract as suggested by the reviewer.
  
  Introduction:
  
  The authors do not mention anything about the implication of inflammasome activation in triggering pyroptosis through GSDM cleavage by inflammatory caspases. Please consider including this in the introduction section, as is done in the discussion section.
  
  The introduction was modified according to the reviewer’s suggestion. Lines 58-61.
  
  From the results section the authors name the human GSDM as HsGSDM and the human CASP as HsCASP, maybe the author could use the same nomenclature in the introduction section. The same for the fish GSDM (Tr) and CASP.
  
  According to the reviewer’s suggestion, the same nomenclature was used in the introduction.
  
  Line 39. Remove the word necrotic.
  
  “necrotic” was removed.
  
  Line 42. Change channels by pores. In the manuscript, change channels by pores overall.
  
  “channels” was replaced by “pores”.
  
  Line 42: Include that proinflammatory cytokines can be released through these pores and that, if the pores are not resolved, pyroptosis occurs. Please rephrase this statement.
  
  According to the reviewer's suggestion, the sentence was rephrased. Lines 46-48.
  
  Line 45. GSDMF is not an approved gene name, its official nomenclature is PJVK (Uniprot Q0ZLH3). Please use PJVK instead GSDMF.
  
  GSDMF was changed to PJVK.
  
  Line 103: Can the authors explain better the molecular determinant?
  
  The sentence was revised, line 109.
  
  Results:
  
  Line 110: Reference for this statement.
  
  The reference for this statement was added in line 116.
  
  Figure 1A, B: Concentration or units used of HsCASP?
  
  The unit (1 U) of HsCASPs was added to the figure legend (line 661).
  
  Line 113: Add Hs or Tr after CASP would be helpful to follow the story.
  
  “CASP” was changed to “HsCASP”.
  
  Fig 1D: Why do the authors not use the DMPD tetrapeptide (HsGSDME CASP3 cut site) in this assay? Compared with the data obtained in Fig 3B, the TrCASP3 activity is going to be very close to that obtained for VEID or VDQQD in the CASP3 panel.
  
  The purpose of Figure 1D was to determine the cleavage preference of TrCASPs. For this purpose, a series of commercially available CASP substrates were used, including DEVD, which is commonly used as a testing substrate for CASP3. Figure 3B was to compare the cleavage of HsCASP3/7 and TrCASP3/7 specifically against the motifs from TrGSDME (DAVD) and HsGSDME (DMPD).
  
  Figure 1D and Figure 3B are different experiments and were performed under different conditions. In Figure 1D, CASP3 was incubated with the commercial substrates at 37 ℃ for 2 h, while in Figure 3B, CASP3/7 were incubated with non-commercial DAVD (motif from TrGSDME) and DMPD (motif from HsGSDME) at 37 ℃ for 30 min. More experimental details were added to Materials and Methods, lines 443 and 447.
  
  Fig 1H: What is the concentration used of the inhibitors?
  
  The concentration (20 μM) was added to the figure legend (line 669).
  
  Do HsCASP3/7 fail to cleave the TrGSDME mutants (D255R and D258A)? The authors do not show this result, so they cannot assume that HsCASP3/7 cleave that sequence (although this is to be expected).
  
  The result of HsCASP3/7 cleavage of the TrGSDME mutants was added as Figure 1-figure supplement 2 and described in Results, line 133.
  
  Line 132-133: Can the author specify where is placed the mCherry tag? In the N terminal or C terminal portion of the different engineered proteins?
  
  The mCherry tag is attached to the C-terminus. Figure 2 legend (line 676).
  
  Fig 2A: Although is quite clear, a column histogram showing the quantification is going to be helpful.
  
  The expression of TrGSDME-FL, -NT and -CT was determined by Western blot, and the result was added as Figure 2-figure supplement 1.
  
  Fig 2A, B, C: After how many hours of expression are the pictures taken? Can the authors show a Western blot showing that the expression of the different constructions is similar?
  
  The time was added to Figure 2 legend and Materials and Methods (line 466). The expression of TrGSDME-FL, -NT and -CT was determined by Western blot, and the result was added as Figure 2-figure supplement 1.
  
  Fig 2C: Another helpful assay can be to measure the YO-PRO or another small dye internalization, to complete the LDH data.
  
  According to the reviewer’s suggestion, in addition to LDH release, Sytox Green was also used to detect cell death. The result was added as Figure 2-figure supplement 2 and described in Results, line 146.
  
  Fig 2C: On the y-axis of the figure, change LHD to LDH.
  
  The word was corrected.
  
  Fig 2D: Change HKE293T by HEK293T in the caption.
  
  The word was corrected.
  
  Fig 2G: Please add the concentration used with the two plasmids co-transfection. A Western blot showing CASP3/7 expression vs TrGSDME is missing. Is that assay after 24h? please specify better the methodology.
  
  The concentration of plasmid used in co-transfection and the time post transfection were added to the Materials and Methods (lines 422 and 424). In addition, the expression of CASP3/7 was added to Figure 2I.
  
  Fig 2 J, K: Change HKE293T by HEK293T in the figure caption. The concentration of the caspase inhibitors is missing. Depending on the concentration used, these inhibitors used could provoke toxicity on the cells by themselves.
  
  The word was corrected in the figure caption. The inhibitor concentration (10 μM) was added to the figure legend (line 690).
  
  Line 151: TrCASP3/7 instead of CASP3/7
  
  CASP3/7 was changed to TrCASP3/7.
  
  Fig 3A, 3B: Please add the units used of the HsCASP
  
  The unit was added to the figure legends (line 697).
  
  Fig 3A: Can the authors add the SDS-PAGE to see the Nt terminal portion as has been done in Fig 1A? Maybe in a supplementary figure.
  
  The SDS-PAGE was added as Figure 3-figure supplement 1.
  
  Fig 3B: If the authors could add some data about the caspase activity using any other CASP such as CASP2, CASP1 to compare the activity data with CASP3 and CASP7 would be helpful.
  
  The proteolytic activity of TrCASP1 was provided as Figure 3-figure supplement 2.
  
  Fig 3C: To state this (Line 160), the authors should use another prediction software to reach a consensus with the sequences of the first analysis. In fact, what happens when GSDME is modelled 3-dimensionally by comparing it to crystalized structures such as mouse GSDMA? If the authors add an arrow indicating where the Nt terminal portion ends and where Ct portion begins would make the figure clearer.
  
  According to the suggestions of both reviewers, in the revised manuscript, we used mouse GSDMA3 (PDB: 5b5r) for the structural analysis of HsGSDME, which showed that the 261-266 region of HsGSDME was a loop. As a result, Figure 3C was revised. Relevant change in Results: lines 172 and 174.
  
  As suggested by the reviewer, we modelled the three-dimensional structure of HsGSDME by using SWISS-MODEL with mouse GSDMA3 as the template (Author response image 2, below).
  
  Author response image 2.
  
  The three-dimensional structure model of HsGSDME. (A) The structure of HsGSDME was modeled by using mouse GSDMA3 (MmGSDMA3) as the template. The N-terminal domain (1-246 aa) and the C-terminal domain (279-468 aa) of HsGSDME are shown in red and blue, respectively. (B) The superposed structure of HsGSDME (cyan) and MmGSDMA3 (purple).
  
  Fig 3F: If this is an immunoblot, why can NT be seen? In other Western blots only the CT is detected; why? The use of the TrGSDME mouse polyclonal needs more details (is it a purified Ab, was it produced for this study, what dilutions were used...)
  
  Since the anti-TrGSDME antibody was generated using the full-length TrGSDME, it reacted with both the N-terminal and the C-terminal fragments of TrGSDME in Figure 3F. In Figure 3G, the GSDME chimera contained only TrGSDME-CT, so only the CT fragment was detected by anti-TrGSDME antibody. More information on antibody preparation and immunoblot was added to “Materials and Methods” (lines 390 and 391).
  
  Fig 4B: Can the authors show in which amino acid the p20 finish for each CASP? (Similarly, as they have done in panel 3E)
  
  Fig 4B was revised as suggested.
  
  Fig 5F: With 4 units of WT CASP7 the authors show a HsGSDME Ct in the same proportion than when the S234N mutant is used (at lower concentrations). How do the authors explain this?
  
  The result showed that the cleavage by 4U of HsCASP7 was comparable to the cleavage by 0.25U of HsCASP7-S234N, indicating that the S234N mutation increased the cleavage ability of HsCASP7 16-fold.
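  
  The 16-fold figure follows from the ratio of enzyme units that give comparable cleavage. The sketch below spells out that arithmetic. It assumes, as a simplification not stated in the response, that comparable band intensity at two enzyme doses can be read as a dose-equivalence ratio; `fold_change` is a hypothetical helper, not part of any analysis pipeline used in the study.

```python
def fold_change(reference_units: float, variant_units: float) -> float:
    """Ratio of enzyme units needed by the reference vs. the variant
    to reach comparable substrate cleavage."""
    if variant_units <= 0:
        raise ValueError("variant_units must be positive")
    return reference_units / variant_units


# Units quoted in the response: 4U of HsCASP7 vs. 0.25U of HsCASP7-S234N.
s234n_gain = fold_change(reference_units=4.0, variant_units=0.25)
print(f"HsCASP7-S234N vs. HsCASP7: {s234n_gain:g}-fold")
```

  A value above 1 means the variant needs fewer units than the reference for the same cleavage. This is a semi-quantitative reading of blot intensities rather than a kinetic measurement.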
  
  Line 203: Can the authors show an alignment between this region of casp1/4 and 7? Maybe in supplementary figures.
  
  As reported by Wang et al. (PMID: 32109412), the βIII/βIII’ sheet of CASP1/4 forms the exosite critical for GSDMD recognition. The structural comparison among HsCASP1/4/7 and the sequence alignment of HsCASP1/4 βIII/βIII’ region with its corresponding region in HsCASP7 were added as Figure 5-figure supplement 2.
  
  Line 205: A mutation including S234N with the exosite mutations (S234+Q276W+D278E+H283S) is required to support this statement.
  
  The sentence of “suggesting that, unlike human GSDMD, HsGSDME cleavage by CASPs probably did not involve exosite interaction” was deleted in the revised manuscript.
  
  Fig 5I, 5J: which is the amount of HsGSDME and TrGSDME? I would place these figures in supplementary material.
  
  The protein expression of TrGSDME/HsGSDME was shown in the figure. Fig 5I and 5J were moved to Figure 5-figure supplement 3.
  
  Line 218: I would specify that this importance is for HUMAN CASP7 to cleave human GSDME.
  
  “CASP7” and “GSDME” were changed to “HsCASP7” and “HsGSDME”, respectively.
  
  Fig 6C: 4 units is the amount of S234N mutant needed to see an optimal HsGSDME cleavage in Fig 5F.
  
  In Figure 6C, the cleavage efficacy of HsCASP3-N208S was apparently decreased compared to that of HsCASP3, and 4U of HsCASP3-N208S was roughly equivalent to 1U of HsCASP3 in cleavage efficacy. In Figure 5F, cleavage by 4U of HsCASP7 was comparable to the cleavage by 0.25U of HsCASP7-S234N. Together, these results confirmed the critical role of S234/N208 in HsCASP3/7 cleavage of HsGSDM.
  
  Fig 6I: Could be the fact that the mouse GSDME has a longer Ct than human GSDME affect the interaction with CASP7? Less accessible to the cut site? Needs a positive control of mouse GSDME with mouse Caspase 3.
  
  Although mouse GSDME (MmGSDME) (512 aa) is larger than HsGSDME (496 aa), the length of the C-terminal domain of MmGSDME (186 aa) is comparable to that of HsGSDME (190 aa).
  
  Author response image 3.
  
  Conserved domain analysis of mouse (upper) and human (lower) GSDME.
  
  As suggested by the reviewer, the cleavage of MmGSDME by mouse caspase-3 (MmCASP3) was added as Figure 6-figure supplement 2 and described in Results, line 258.
  
  Material and Methods:
  
  -Overall, concentrations or amounts used in this study regarding the active enzyme or plasmids used are missing and need to be added.
  
  The missing concentrations of the enzymes and plasmids were added in Material and Methods (lines 421, 453, 457, and 470) or figure legends (Figure 1 and 3).
  
  -It would be helpful if the authors label in the immunoblotting panels what is the GSDME that they are using. (Hs GSDME FL...).
  
  As suggested, the labels were added to Figures 1A ,1B, and 3.
  
  -Add the units of enzyme used.
  
  The units of enzyme were added to figure legends (Figure 1A, 3A, 3D, and 3F) or Material and Methods (lines 453 and 457).
  
  The GSDME sequence obtained for Takifugu after amplification of the RNA extracted should be shown and specified (GSDMEa or GSDMEb). From which tissue was the RNA extracted?
  
  The details were added to Materials and Methods (lines 398 and 402).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.28.546966v2
www.biorxiv.org

Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction and Cognitive Decision-Making

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This work made a considerable effort to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level when monkeys were passively exposed or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating their roles in temporal perception, 2) IC neuronal firing rates increased as sound sequences progress, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and their capability of adjusting responses based on reward predictability, 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.
  
  Strengths:
  
  The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.
  
  We appreciate the reviewer’s acknowledgment of the efforts and strengths of our study. Indeed, our goal was to provide a comprehensive exploration of the multifaceted roles of the inferior colliculus (IC) in auditory processing and beyond, particularly in sensory prediction and reward processing. The use of electrophysiological recordings in behaving monkeys was central to our approach, as we sought to uncover the underexplored aspects of IC function in these complex cognitive domains. We are pleased that the reviewer recognizes the value of investigating the IC, a structure that has not been adequately explored in primates compared to other auditory regions like the cortex and thalamus. This feedback reinforces our belief that our work contributes significantly to advancing the understanding of the IC's roles in cognitive processing.
  
  We look forward to addressing any further points the reviewers may have and refining our manuscript accordingly. Thank you for your constructive feedback and for recognizing the strengths of our research approach.
  
  Weaknesses:
  
  (1) The authors cited several papers focusing on dopaminergic inputs in the IC to suggest the involvement of this brain region in cognitive functions. However, all of the cited work was done in rodents. Whether the monkey IC shares similar inputs is not clear.
  
  We appreciate the reviewer's insightful comment on the limitations of extrapolating findings from rodent models to monkeys, particularly concerning dopaminergic inputs to the Inferior Colliculus (IC). While it is true that most studies on dopaminergic inputs to the IC have been conducted in rodents, to our knowledge, no studies have been conducted specifically in primates. To address the reviewer's concern, we have added a statement in both the introduction and discussion sections of our manuscript:
  
  Introduction: "However, these studies were conducted in rodents, and the existence and role of dopaminergic inputs in the primate IC remain underexplored." (P.5, Line. 16-17)
  
  Discussion: "However, the exact mechanisms and functions of dopamine modulation in the inferior colliculus are still not fully understood, particularly in primates. " (P.21, Line. 7-9)
  
  (2) The authors confused the two terms, novelty and deviation. According to their behavioral paradigm, deviation rather than novelty should be used in the paper because all the stimuli were presented to the monkeys during training. Therefore, there are actually no novel stimuli, only deviant stimuli. This reflects that the authors have misunderstood the basic concept.
  
  We appreciate the reviewer's clarification regarding the distinction between "novelty" and "deviation" in the context of our behavioral paradigm. We agree that, given the nature of our experimental design where all stimuli were familiar to the monkeys during training, the term "deviation" more accurately describes the stimuli used in our study rather than "novelty."
  
  To address this, we have revised the manuscript to replace the term "novelty" with "deviation" wherever applicable. This change has been made to ensure accurate terminology is used throughout the paper, thereby eliminating any potential misunderstanding of the concepts involved in our study.
  
  We thank the reviewer for pointing out this important distinction, which has improved the clarity and precision of our manuscript.
  
  (3) Most of the conclusions were based on correlational analysis or speculation, without causal evidence.
  
  We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. Indeed, we acknowledge that the conclusions drawn primarily reflect correlations between neuronal activity and behavioral outcomes, rather than direct causal evidence. This limitation is common in many electrophysiological studies, particularly those conducted in behaving primates, where directly manipulating specific neural circuits to establish causality presents significant challenges, especially in comparison to research in mice.
  
  This complexity is further compounded when considering the IC’s role as a key lower-level relay station in the auditory pathway. Manipulating IC activity could have a widespread impact on auditory responses in downstream pathways, potentially influencing sensory prediction and decision-making processes.
  
  Despite this limitation, our study provides novel evidence suggesting that the IC may exhibit multiple facets of cognitive signaling, which could inspire future research aimed at exploring the underlying mechanisms and broader functional implications of these signals.
  
  To address the reviewer's concerns, we have made the following adjustments to the manuscript:
  
  (1) Clarified the Scope of Conclusions: We have revised the language in the Results and Discussion sections to explicitly state that our findings represent correlational relationships rather than causal mechanisms. For example, we have referred to the associations observed between IC activity and behavioral outcomes as "correlational" and have refrained from making definitive causal claims without supporting experimental evidence.
  
  “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)
  
  (2) Proposed Future Directions: In the Discussion section, we have included suggestions for future studies to directly test the causality of the observed relationships.
  
  “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)
  
  We believe these revisions provide a more balanced interpretation of our findings while emphasizing the importance of future research to build on our results and establish causal relationships. Thank you for raising this critical point, which has led to a more rigorous and transparent presentation of our study.
  
  (4) Results are presented in a very "straightforward" manner, with too many detailed descriptions of phenomena but a lack of summary and information synthesis. For example, the first section of Results is very long but does not convey clear information.
  
  We appreciate the reviewer’s feedback regarding the presentation of our results. We understand that the detailed descriptions of phenomena may have made it difficult to discern the key findings and overarching themes in the study. We recognize the importance of balancing detailed reporting with clear summaries and synthesis to effectively communicate our findings.
  
  To address this concern, we have made the following revisions to the manuscript:
  
  (1) Condensed and Synthesized Key Findings: We have streamlined the presentation of the Results section by condensing overly detailed descriptions and focusing on the most critical aspects of the data. Key findings are now summarized at the end of each subsection to ensure that the main points are clearly conveyed.
  
  “The accumulation of the climbing effect alongside repetitive sound presentations suggests a potential linkage to reward prediction or sensory prediction, reflecting an increased probability of receiving a reward and the strengthening of sound prediction as the sound sequence progresses.” (P.10, Line. 17-20)
  
  “The distinct response in the control condition, where the reward was unpredictable, contrasted sharply with the predictable reward scenario in the deviant condition, underscoring the ability of auditory IC neurons to encode reward prediction errors.” (P.13, Line. 21-22; P.14, Line. 1-2)
  
  (2) Improved Flow and Clarity: We have revised the structure and organization of the Results section to improve the flow of information. By rearranging certain paragraphs and refining the language, we aim to present the results in a more cohesive and coherent manner.
  
  “Deviant Response dynamics in duration deviation detection” (P.6, Line. 12)
  
  “Standard Response dynamics in duration deviation detection” (P.9, Line. 4)
  
  We believe these changes will make the Results section more accessible and informative, allowing readers to more easily grasp the significance of our findings. Thank you for your valuable suggestion, which has significantly improved the clarity and impact of our manuscript.
  
  (5) The logic between different sections of Results is not clear.
  
  We appreciate the reviewer’s observation regarding the lack of clear logical connections between different sections of the Results. We acknowledge that a coherent flow is essential for effectively communicating the progression of findings and their implications.
  
  To address this concern, we have made the following revisions:
  
  (1) Enhanced Transitions Between Sections: We have introduced clearer transitional statements between sections of the Results. These transitions explicitly state how each new section builds upon or relates to the previous findings, creating a more cohesive narrative.
  
  “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)
  
  “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)
  
  “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)
  
  “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)
  
  (2) Integration of Findings: In several places within the Results, we have added brief synthesis paragraphs that integrate findings across sections. These integrative summaries help to tie together the different aspects of our study, demonstrating how they collectively contribute to our understanding of the Inferior Colliculus’s (IC) role in sensory prediction, decision-making, and reward processing.
  
  “These results demonstrate that reward anticipation does not drive the climbing effect, thereby reinforcing the idea that sensory prediction is the primary factor influencing the accumulation of the climbing effect in the IC.” (P.12, Line. 4-7)
  
  “The distinct response in the control condition, where the reward was unpredictable, contrasted sharply with the predictable reward scenario in the deviant condition, underscoring the ability of auditory IC neurons to encode reward prediction errors.” (P.13, Line. 21-22; P.14, Line. 1-2)
  
  (3) Clarified Rationale: At the beginning of each major section, we have clarified the rationale behind why certain experiments were conducted, connecting them more clearly to the overarching goals of the study. This should help the reader understand the purpose of each set of results in the context of the broader research objectives.
  
  “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)
  
  “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)
  
  “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)
  
  “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)
  
  We believe these changes improve the overall coherence and readability of the Results section, allowing readers to better follow the logical progression of our study. We are grateful for this constructive feedback and believe it has significantly enhanced the manuscript.
  
  (6) In the Discussion, there is excessive repetition of results, while comparison with, and discussion of, potentially related work is insufficient. For example, Metzger, R.R., et al. (J Neurosci, 2006) have shown similar firing patterns of IC neurons and correlated their findings with reward.
  
  We appreciate the reviewer's insightful critique regarding the excessive repetition in the Discussion and the lack of sufficient comparison with related work. We acknowledge that a well-balanced Discussion should not only interpret findings but also place them in the context of existing literature to highlight the novelty and significance of the study.
  
  To address these concerns, we have made the following revisions:
  
  (1) Reduction of Repetition: We have carefully revised the Discussion to minimize redundant repetition of the Results. Instead of restating the findings, we now focus more on their implications, limitations, and how they advance the current understanding of the Inferior Colliculus (IC) and its broader cognitive roles.
  
  “We demonstrated that the climbing effect is dynamically modulated (Figure 2D-G), and this modulation is driven primarily by sensory prediction rather than reward anticipation, as controlling for reward effects showed minimal impact on the response profile (Figure 3D, E). This modulation by preceding sensory experiences indicates that the IC is more than merely a relay station, suggesting a more intricate role in auditory processing influenced by both ascending and descending neural pathways.” (P.17, Line. 1-5)
  
  (2) Incorporation of Related Work: We have expanded the Discussion to include a more comprehensive comparison with existing literature, specifically highlighting studies that have reported similar findings. For example, we now discuss the work by Metzger et al. (2006), which demonstrated similar firing patterns of IC neurons and correlated these with reward-related processes. This comparison helps contextualize our results and emphasizes the novel contributions our study makes to the field.
  
  “Metzger and colleagues reported a gradual increase in neural activity—termed late-trial ramping—in the IC during an auditory saccade task. Similar to our results, they observed no climbing effect in the absence of a behavioral task. Both studies support the idea that the climbing effect depends on both behavioral engagement and reward. While both pieces of research emphasize the IC's complex role in integrating auditory processing with cognitive functions related to reward and behavior, our findings provide further insight by distinguishing between the effects of sensory prediction and reward anticipation on IC neuronal activity.” (P.16, Line. 16-24)
  
  We believe these revisions have significantly improved the quality of the Discussion by reducing unnecessary repetition and providing a more thorough engagement with the relevant literature. We are grateful for the reviewer's valuable feedback, which has helped us refine and strengthen the manuscript.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play roles beyond simple sensory transmission. The authors revealed the climbing effect of neurons in the IC during decision-making tasks, and tried to explore the reward effect in this condition.
  
  Strengths:
  
  Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchic structures no less complex than those areas in charge of complex functions. Meanwhile, IC receives projections from higher areas, such as auditory cortex, which implies IC is involved in complex behaviors. Experiments in behavioral monkeys are always time-consuming works with hardship, and this will offer more approximate knowledge of how the human brain works.
  
  We greatly appreciate the reviewer's positive summary of our work and recognition of the effort involved in conducting experiments on behaving monkeys. We agree with the reviewer that the inferior colliculus (IC) plays a significant role beyond mere sensory transmission, particularly in integrating sensory inputs with higher cognitive functions. Our study aims to shed light on these complex functions by revealing the climbing effect of IC neurons during decision-making tasks and exploring how reward influences this dynamic.
  
  We are encouraged that the reviewer acknowledges the importance of investigating the IC's role within the broader framework of complex cognitive behaviors and appreciates the hierarchical nature of the auditory system. The reviewer's comments reinforce the value of our research in contributing to a more nuanced understanding of how the IC might contribute to sensory-cognitive integration.
  
  We thank the reviewer for highlighting the significance of using behavioral monkey models to approximate human brain function. We are hopeful that our findings will serve as a stepping stone for further research exploring the multifaceted roles of the IC in cognition and behavior.
  
  We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.
  
  Weaknesses:
  
  These findings concern correlation rather than causality of IC function in behaviors. I also have a few major concerns.
  
  We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. We fully acknowledge the importance of distinguishing between correlation and causality. As outlined in our response to Question 3 from Reviewer #1, we recognize the limitations of relying on correlational data and the inherent challenges in establishing direct causal links, particularly in electrophysiological studies involving behaving primates, and given the lower-level role of the IC in the auditory pathway.
  
  We have taken steps to clarify this distinction throughout our manuscript. Specifically, we have revised the Results and Discussion sections to ensure that the findings are presented as correlational, not causal, and we have proposed future studies utilizing more direct manipulation techniques to assess causality. We hope these revisions adequately address your concerns.
  
  “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)
  
  “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)
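
  As an illustration only, a correlation between neuronal activity and behavioral choice of the kind mentioned above is often summarized with an ROC-style index. The sketch below is not the analysis used in the manuscript: the function name `choice_auc`, the split of trials into two choice categories, and the example firing rates are all assumptions made for this example.

```python
import numpy as np


def choice_auc(rates_choice_a, rates_choice_b):
    """Area under the ROC curve separating two sets of firing rates.

    Hypothetical helper: returns the probability that a randomly drawn
    trial from choice A has a higher firing rate than a randomly drawn
    trial from choice B, counting ties as one half.
    """
    a = np.asarray(rates_choice_a, dtype=float)
    b = np.asarray(rates_choice_b, dtype=float)
    # Compare every A trial with every B trial (pairwise comparison).
    greater = (a[:, None] > b[None, :]).sum()
    ties = (a[:, None] == b[None, :]).sum()
    return (greater + 0.5 * ties) / (a.size * b.size)


# Made-up firing rates (spikes/s) for two behavioral outcomes.
auc_separated = choice_auc([5, 6, 7], [1, 2, 3])
auc_identical = choice_auc([1, 2], [1, 2])
```

  A value near 0.5 means firing rates do not separate the two choices, while values near 0 or 1 indicate strong separation. Such an index remains correlational and does not by itself establish a causal role.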
  
  Comparing neurons' spike activities in different tests, a 'climbing effect' was found in the oddball paradigm. The effect is clearly related to the training and learning process, but more exploration is required to rule out a few explanations. First, repeated white noise bursts with a fixed inter-stimulus interval of 0.6 seconds were presented, so monkeys might remember the sounds by rhythm, which is a form of learned auditory response. It would be interesting to know the monkeys' responses and the neurons' activities if the inter-stimulus interval were variable. Second, the task only asked monkeys to press one button, and the reward ratio (the ratio of correct response trials) was around 78% (based on the number from Line 302). Thus, in the sessions with reward, monkeys had a high expectation of reward; does this expectation cause the climbing effect?
  
  We thank the reviewer for raising these insightful points regarding the 'climbing effect' observed in the oddball paradigm and its potential relationship with training, learning processes, and reward expectation. Below, we address each of the reviewer's specific concerns:
  
  (1) Inter-Stimulus Interval (ISI) and Rhythmic Auditory Response:
  
  The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds might lead to a rhythmic auditory response, where monkeys could anticipate the sounds. We appreciate this perspective and recognize its relevance. However, we believe that rhythm is unlikely to be a significant contributor to the 'climbing effect' for two key reasons:
  
  a) The 'climbing effect' begins as early as the second sound in the block (as shown in Fig. 2D and Fig. 3B), before any rhythm or pattern could be fully established, since rhythm generally requires at least three repetitions to form.
  
  b) In our reward experiment (Figs. 4-5), the sounds were also presented at regular ISIs, which could have facilitated rhythmic learning, yet the observed climbing effect was comparatively small in those conditions.
  
  Unfortunately, we did not explore variable ISIs in this current study, so we cannot directly address this concern with the available data.
  
  (2) Reward Expectation and Climbing Effect:
  
  The reviewer raises a valid concern regarding whether the 'climbing effect' might be influenced by the monkeys' high reward expectation, especially given the high reward ratio (~78%) in the sessions. While it is plausible that reward expectation could contribute to the observed increase in neuronal firing rates, we believe the results from our reward experiment (Fig. 4) suggest otherwise.
  
  In this experiment, even though reward expectation was likely formed due to the consistent pairing of sounds with rewards (100% reward delivery), we did not observe a significant climbing effect in the auditory response. Additionally, the presence of reward prediction error (Fig. 4D) further supports the idea that while the monkeys may indeed form reward expectations, these expectations do not directly drive the climbing effect in the IC.
  
  To make this distinction clearer, we have added sentences in the revised manuscript explicitly discussing the relationship between reward expectation and the climbing effect.
  
  “Within the oddball paradigm, both sensory and reward predictions intensify alongside the recurrence of standard sounds, suggesting that the strength of these predictions could significantly influence neuronal responses. Our experimentation with rewards has effectively dismissed the role of reward prediction (Figures 3 and 4), highlighting the potential significance of sensory prediction in molding the climbing effect.” (P.17, Line. 14-19)
  
  We believe these revisions provide a clearer understanding of the factors contributing to the climbing effect and effectively address the reviewer's concerns. We sincerely thank the reviewer for these valuable suggestions, which have allowed us to improve the clarity and depth of our manuscript.
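
  To make the notion of a climbing effect concrete, the following minimal sketch quantifies how the mean firing rate changes with the ordinal position of a sound within a sequence. It is an illustration under stated assumptions, not the procedure used in the manuscript: the function name `climbing_slope`, the trial-by-position array layout, and the example numbers are hypothetical.

```python
import numpy as np


def climbing_slope(rates_by_position):
    """Slope of mean firing rate against ordinal sound position.

    rates_by_position: array-like of shape (n_trials, n_positions), where
    each entry is the firing rate (spikes/s) evoked by the sound at that
    position in the sequence. This layout is an assumption for the example.
    """
    rates = np.asarray(rates_by_position, dtype=float)
    mean_rate = rates.mean(axis=0)  # average across trials
    positions = np.arange(1, rates.shape[1] + 1)  # 1st, 2nd, 3rd ... sound
    # Least-squares straight-line fit; the first coefficient is the slope.
    slope, _intercept = np.polyfit(positions, mean_rate, deg=1)
    return slope, mean_rate


# Made-up data: two trials, four sounds each, rate rising with position.
example_rates = [[10, 12, 14, 16],
                 [12, 14, 16, 18]]
slope, mean_rate = climbing_slope(example_rates)
```

  A positive slope indicates that responses grow as the sequence progresses. Comparing slopes between task and reward-only conditions is one possible way to express the contrast described above, although a linear fit is only one of several reasonable summaries.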
  
  The "reward effect" on IC neurons' responses was shown in Fig. 4. Is this auditory response caused by the physical reward action or not? In reward sessions, IC neurons have an obvious response related to the onset of the water reward. An electromagnetic valve is often used in water-reward systems and emits a loud click every time the reward is triggered. IC neurons' responses may simply be caused by the click sound if an electromagnetic valve is used. It is important to find a way to rule out this simple possibility.
  
  We appreciate the reviewer’s concern regarding the potential confounding factor introduced by the electromagnetic valve’s click sound during water reward delivery, which could be misinterpreted as an auditory response rather than a response to the reward itself. Anticipating this possibility, we took measures to eliminate it by placing the electromagnetic valve outside the soundproof room where the neuronal recordings were performed.
  
  To address your concern more explicitly, we have added sentences in the Methods section of the revised manuscript detailing this setup, ensuring that readers are aware of the steps we took to eliminate this potential confound. By doing so, we believe that the observed reward-related neural activity in the IC is attributable to the reward processing itself rather than an auditory response to the valve click. We appreciate you bringing this important aspect to our attention, and we hope our clarification strengthens the interpretation of our findings.
  
  “The reward was controlled electronically by a valve located outside the sound-proof room to prevent any noise interference from the valve.” (P.24, Line. 6-7)
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors aimed to investigate the multifaceted roles of the Inferior Colliculus (IC) in auditory and cognitive processes in monkeys. Through extracellular recordings during a sound duration-based novelty detection task, the authors observed a "climbing effect" in neuronal firing rates, suggesting an enhanced response during sensory prediction. Observations of reward prediction errors within the IC further highlight its complex integration in both auditory and reward processing. Additionally, the study indicated IC neuronal activities could be involved in decision-making processes.
  
  Strengths:
  
  This study has the potential to significantly impact the field by challenging the traditional view of the IC as merely an auditory relay station and proposing a more integrative role in cognitive processing. The results provide valuable insights into the complex roles of the IC, particularly in sensory and cognitive integration, and could inspire further research into the cognitive functions of the IC.
  
  We appreciate the reviewer’s positive summary of our work and recognition of its potential impact on the field. We are pleased that the reviewer acknowledges the significance of our findings in challenging the traditional view of the Inferior Colliculus (IC) as merely an auditory relay station and in proposing its integrative role in cognitive processing.
  
  Our study indeed aims to provide new insights into the multifaceted roles of the IC, particularly in the context of sensory and cognitive integration. We believe that this research could pave the way for future studies that further explore the cognitive functions of the IC and its involvement in complex behavioral processes.
  
  We are encouraged by the reviewer’s positive assessment and are committed to continuing to refine our work in response to the constructive feedback provided. We hope that our findings will contribute to advancing the understanding of the IC’s role in the broader context of neuroscience.
  
  We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.
  
  Weaknesses:
  
  Major Comments:
  
  (1) Structural Clarity and Logic Flow:
  
  The manuscript investigates three intriguing functions of IC neurons: sensory prediction, reward prediction, and cognitive decision-making, each of which is a compelling topic. However, the logical flow of the manuscript is not clearly presented and needs to be reorganized. For instance, Figure 3 should be merged into Figure 2 to present population responses to the order of sounds, thereby focusing on sensory prediction. Given the current arrangement of results and figures, the title could be more aptly phrased as "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making."
  
  We appreciate the reviewer’s detailed feedback on the structural clarity and logical flow of the manuscript. We understand the importance of presenting our findings in a clear and cohesive manner, especially when addressing multiple complex topics such as sensory prediction, reward prediction, and cognitive decision-making.
  
  To address the reviewer's concerns, we have made the following revisions:
  
  (1) Reorganization of Figures and Results:
  
  We agree with the suggestion to merge Figure 3 into Figure 2. By doing so, we can present the population responses to the order of sounds more effectively, thereby streamlining the focus on sensory prediction. This will allow readers to more easily follow the progression of the results related to this key function of the IC.
  
  We have reorganized the Results section to ensure a smoother transition between the different aspects of IC function that we are investigating. The new structure will better guide the reader through the narrative, aligning with the themes of sensory prediction, reward prediction, and cognitive decision-making.
  
  “Deviant Response dynamics in duration deviation detection” (P.6, Line. 12)
  
  “Standard Response dynamics in duration deviation detection” (P.9, Line. 4)
  
  (2) Revised Title:
  
  In line with the reviewer's suggestion, we have revised the title to "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making." We believe this title more accurately reflects the scope and focus of our study, as it highlights the three core functions of the IC that we are investigating.
  
  (3) Improved Logic Flow:
  
  We have added introductory statements at the beginning of each section within the Results to clarify the rationale behind the experiments and the logical connections between them. This should help to improve the overall flow of the manuscript and make the progression of our findings more intuitive for readers.
  
  “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)
  
  “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)
  
  “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)
  
  “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)
  
  We believe these changes significantly enhance the clarity and logical structure of the manuscript, making it easier for readers to understand the sequence and importance of our findings. Thank you for your valuable suggestion, which has led to a more coherent and focused presentation of our work.
  
  (2) Clarification of Data Analysis:
  
  Key information regarding data analysis is dispersed throughout the results section, which can lead to confusion. Providing a more detailed and cohesive explanation of the experimental design would significantly enhance the interpretation of the findings. For instance, including a detailed timeline and reward information for the behavioral paradigms shown in Figures 1C and D would offer crucial context for the study. More importantly, clearly presenting the analysis temporal windows and providing comprehensive statistical analysis details would greatly improve reader comprehension.
  
  We appreciate the reviewer’s insightful comment regarding the need for clearer and more cohesive explanations of the data analysis and experimental design. We recognize that a well-structured presentation of this information is essential for the reader to fully understand and interpret our findings. To address this, we have made the following revisions:
  
  (1) Detailed Explanation of Experimental Design:
  
  We have included a more detailed explanation of the experimental design, particularly for the behavioral paradigms shown in Figures 1C and 1D. This includes a comprehensive timeline of the experiments, along with explicit information about the reward structure and timing. By providing this context upfront, we aim to give readers a clearer understanding of the conditions under which the neuronal recordings were obtained.
  
  (2) Cohesive Presentation of Data Analysis:
  
  Key information regarding data analysis, which was previously dispersed throughout the Results section, has been consolidated and moved to a dedicated subsection within the Methods. This subsection now provides a step-by-step description of the analysis process, including the temporal windows used for examining neuronal activity, as well as the specific statistical methods employed.
  
  We have also ensured that the temporal windows used for different analyses (e.g., onset window, late window, etc.) are clearly defined and consistently referenced throughout the manuscript. This will help readers track the use of these windows across different figures and analyses.
  
  (3) Enhanced Statistical Analysis Details:
  
  We have expanded the description of the statistical analyses performed in the study, including the rationale behind the choice of tests, the criteria for significance, and any corrections for multiple comparisons. This relevant information is highlighted in the Results section or figure legends to facilitate understanding.
  
  We believe these changes will significantly improve the clarity and comprehensibility of the manuscript, allowing readers to better follow the experimental design, data analysis, and the conclusions drawn from our findings. Thank you for this valuable feedback, which has helped us to enhance the rigor and transparency of our presentation.
  
  (3) Reward Prediction Analysis:
  
  The conclusion regarding the IC's role in reward prediction is underdeveloped. While the manuscript presents evidence that IC neurons can encode reward prediction, this is only demonstrated with two example neurons in Figure 6. A more comprehensive analysis of the relationship between IC neuronal activity and reward prediction is necessary. Providing population-level data would significantly strengthen the findings concerning the IC's complex functionalities. Additionally, the discussion of reward prediction in lines 437-445, which describes IC neuron responses in control experiments, does not sufficiently demonstrate that IC neurons can encode reward expectations. It would be valuable to include the responses of IC neurons during trials with incorrect key presses or no key presses to better illustrate this point.
  
  We deeply appreciate the detailed feedback provided regarding the conclusions on the inferior colliculus (IC)'s role in reward prediction within our manuscript. We acknowledge the importance of a robust and comprehensive presentation of our findings, particularly when discussing complex neural functionalities.
  
  In response to the reviewers' concerns, we have made the following revisions to strengthen our manuscript:
  
  (1) Inclusion of Population-Level Data for IC Neurons:
  
  In the revised manuscript, we have included population-level results for IC neurons in a supplementary figure. Initially, we focused on two example neurons that did not exhibit motor-related responses to key presses to isolate reward-related signals. However, most IC neurons exhibit motor responses during key presses (as indicated in Fig.6), which can complicate distinguishing between reward-related activity and motor responses. This complexity is why we initially presented neurons without motor responses. To clarify this point, we have added sentences in the Results section to explain the rationale behind our selection of neurons and to address the potential overlap between motor and reward responses in the IC.
  
  “This phenomenon was further supported by examining the responses in the duration deviation detection task. Since most IC neurons exhibit motor responses during key presses (Supplementary Figure 6), which can complicate distinguishing between reward-related activity and motor responses, we specifically selected two neurons without motor responses during key presses (Figure 5).” (P.13, Line. 10-15)
  
  (2) Addition of Data on Key Press Errors and No-Response Trials:
  
  In response to the reviewer’s suggestion, we now show peri-stimulus time histograms (PSTHs) for two example neurons during error trials, including incorrect key presses and no-response trials, in the figure below. Given that the monkeys performed the task with high accuracy, the number of error trials is relatively small, especially for the control condition (as shown in the top row of the figure below). While we remain cautious about drawing definitive conclusions from this limited number of trials, we observed no clear reward signals during the corresponding window (typically centered around 150 ms after the end of the sound). It is important to note that the experiment was initially designed to explore decision-making signals in the IC, rather than focusing specifically on reward processing. However, the data in Fig. 6 demonstrated intriguing signals of reward prediction error, which is why we believe it is important to present them.
  
  When combined with the results from our reward experiment (Fig. 5), we believe these findings provide compelling evidence of reward prediction errors being processed by IC neurons.
  
  Author response image 1.
  
  (A) PSTH of the neuron from Figure 5A during a key press trial under control condition. The number in the parentheses in the legend represents the number of trials for control condition. (B) PSTHs of the neuron from Figure 5A during non-key press trials under experimental conditions. The numbers in the parentheses in the legend represent the number of trials for experimental conditions. (C-D) Equivalent PSTHs as in A-B but from the neuron in Figure 5B.
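  For readers unfamiliar with PSTHs, the following is a minimal sketch of how such a histogram and a window-averaged firing rate could be computed from spike times aligned to sound offset. It is an illustration only, not the authors' analysis code: the function names, the 50 ms bin width, the 100-200 ms window, and the spike times are all hypothetical.

```python
from typing import List, Sequence


def psth_ms(trials: Sequence[Sequence[int]], t_start: int, t_stop: int,
            bin_ms: int) -> List[float]:
    """Trial-averaged firing rate (spikes/s) per bin.

    Spike times are in ms relative to the alignment event (here assumed
    to be sound offset). Bins cover [t_start, t_stop) in steps of bin_ms.
    """
    if not trials:
        raise ValueError("need at least one trial")
    n_bins = (t_stop - t_start) // bin_ms
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = (t - t_start) // bin_ms
            if 0 <= idx < n_bins:  # ignore spikes outside the histogram
                counts[idx] += 1
    # Convert counts to spikes per second, averaged over trials.
    return [c * 1000 / (len(trials) * bin_ms) for c in counts]


def mean_rate_in_window(trials: Sequence[Sequence[int]], lo: int,
                        hi: int) -> float:
    """Mean firing rate (spikes/s) in the window [lo, hi) ms."""
    if not trials:
        raise ValueError("need at least one trial")
    n_spikes = sum(1 for spikes in trials for t in spikes if lo <= t < hi)
    return n_spikes * 1000 / (len(trials) * (hi - lo))


# Hypothetical spike times (ms after sound offset) for two trials.
example_trials = [[120, 160], [140]]
rates = psth_ms(example_trials, t_start=-100, t_stop=300, bin_ms=50)
window_rate = mean_rate_in_window(example_trials, lo=100, hi=200)
```

  With few error trials, as in the control condition here, each bin rests on very few spikes, which is one reason to read such histograms cautiously.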
  
  We are grateful for the reviewer's insightful suggestions, which have allowed us to improve the depth and rigor of our analysis. We believe these revisions significantly enhance our manuscript's conclusions regarding the complex functionalities of IC.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  One of the major issues of this work is that its writing fails to convey the focus and significance of the work. Sentences are too long and multiple pieces of information are often integrated in one sentence, causing great confusion.
  
  We appreciate the reviewer's feedback regarding the clarity and structure of the manuscript. We agree that scientific writing should be clear and concise to effectively communicate the significance of the work. In response to this comment, we have undertaken the following revisions to improve the readability and focus of the manuscript:
  
  (1) Simplified Sentence Structure:
  
  We have revisited the manuscript and revised sentences that were overly complex or contained multiple pieces of information. Long sentences have been broken into shorter, more digestible statements to improve clarity and readability. Each sentence now conveys a single, focused idea.
  
  (2) Improved Flow and Focus:
  
  We have restructured certain paragraphs to ensure that the narrative flows logically and highlights the key findings. This restructuring includes placing the most significant results in prominent positions within paragraphs and ensuring that each section begins with a clear statement of purpose.
  
  “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)
  
  “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)
  
  “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)
  
  “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)
  
  (3) Refined Significance of the Work:
  
  In response to the reviewer's concern that the manuscript fails to clearly convey the significance of the work, we have revised the Introduction and Discussion sections to better emphasize the focus and impact of our findings. We now explicitly highlight the novel contributions of this research to the understanding of the multifaceted role of the IC in sensory prediction, decision-making, and reward processing.
  
  “In this research, we embarked on a deviation detection task centered around sound duration with trained monkeys, performing extracellular recordings in the IC. Our observations unveiled a 'climbing effect'—a progressive increase in firing rate after sound onset, not attributable to reward but seemingly linked to sensory experience such as sensory prediction. Moreover, we identified signals of reward prediction error and decision-making. These findings propose that the IC's role in auditory processing extends into the realm of complex perceptual and cognitive tasks, challenging previous assumptions about its functionality.” (P.6, Line. 1-8)
  
  “Overall, our results strongly suggest that the inferior colliculus is actively engaged in sensory experience, reward prediction and decision making, shedding light on its intricate functions in these processes.” (P.16, Line. 10-12)
  
  We believe these revisions address the reviewer's concern and will make the manuscript more accessible to readers. Thank you for the valuable suggestion, which has led to a more precise and effective presentation of our work.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) In the oddball paradigm, an inter-stimulus interval of 0.6 seconds was used. Varying the inter-stimulus interval should show whether this effect is rhythm learning. It would be better to choose random inter-stimulus and inter-trial intervals throughout each experiment in case the monkeys try to remember the rhythm.
  
  The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds may lead to a rhythmic auditory response, allowing monkeys to anticipate sounds. This is a valuable suggestion, and we appreciate this perspective. However, we believe that rhythm is unlikely to play a significant role in driving the 'climbing effect.' The 'climbing effect' starts as early as the second sound in the block (as shown in Fig. 2D and Fig. 3B), which is before any rhythm or pattern could be fully established. Typically, rhythm learning requires at least three repetitions to form a predictable sequence.
  
  Unfortunately, we did not vary the inter-stimulus interval in the current study, so we cannot test this hypothesis directly with the current dataset. However, we agree with the reviewer that using random ISIs would be an effective way to directly rule out any contribution of rhythm learning to the climbing effect.
  
  (2) Regarding "reward effect" on IC neurons' responses, we should rule out the possibility of simple auditory response to the switching of electromagnetic valve.
  
  We appreciate the reviewer’s concern about the potential confounding factor of the electromagnetic valve's click sound during water reward delivery, which could be interpreted as an auditory response rather than a true reward-related response. Anticipating this issue, we took measures to eliminate this possibility by placing the electromagnetic valve outside the soundproof room where neuronal recordings were conducted. This setup ensured that any potential auditory noise from the valve was minimized and unlikely to influence the IC neuronal activity.
  
  To address this concern more explicitly, we have added a description in the Methods section detailing this setup. This revision clarifies the steps we took to rule out this potential confound, strengthening the validity of our claim that the observed IC activity is genuinely related to reward processing and not a simple auditory response to the valve's operation.
  
  We thank the reviewer for bringing attention to this critical aspect of our experimental design, and we hope this clarification enhances the interpretation of our findings.
  
  “The reward was controlled electronically by a valve located outside the sound-proof room to prevent any noise interference from the valve.” (P.24, Line. 6-7)
  
  (3) Since monkeys are smart, simple Go/NoGo design is not a good strategy. The task with more buttons to press, such as 2-AFC or 4-AFC task, may prevent artificial effect of unwanted behaviors and offer us more reliable and useful data.
  
  We appreciate the reviewer’s suggestion to implement a more complex behavioral task, such as a 2-Alternative Forced Choice (2-AFC) or 4-AFC design, to reduce the possibility of unwanted behaviors and to gather more reliable data. We agree that such paradigms could offer additional insights and help control the monkeys’ decision-making processes by reducing potential confounding factors related to the simplicity of Go/NoGo responses.
  
  In our current study, we chose the Go/NoGo task because it aligns with our primary experimental goal: investigating the relationship between IC activity and sensory prediction, decision-making, and reward processing in a simplified manner. This task allowed us to focus on reward prediction and sensory responses without introducing additional complexity that could increase the cognitive load on the monkeys and affect their performance. It is worth noting that training monkeys to perform auditory tasks is generally more challenging compared to visual tasks, though they are indeed capable of complex learning.
  
  Moreover, this novelty detection task was initially designed as an oddball paradigm to explore predictive coding along the auditory pathway. Our lab has concentrated on this topic for several years, with the majority of current research focusing on non-behaving subjects such as rodents. Implementing a more advanced paradigm like 2-AFC would have increased training time and shifted the study away from our core objective.
  
  That said, we agree that future studies would benefit from using more sophisticated tasks, such as 2-AFC or 4-AFC paradigms, as they could offer a more refined understanding of decision-making processes while enhancing the quality of data by minimizing unwanted behaviors. We believe that incorporating more advanced behavioral paradigms in future work will further enhance the rigor and reliability of our findings.
  
  (4) Line 52, "challenges...", sounds a little bit too much. The authors tried to sell the idea that the IC is more than a simple sensory relay point. I agree with that, and I know that comprehensive data are not easy to obtain from experiments on monkeys. But to support the authors' bolder claims, more analysis needs to be done.
  
  We appreciate the reviewer’s feedback on the tone of the statement in Line 52, where we describe the findings as “challenging” conventional views of the IC as a simple sensory relay point. We agree that this wording was too strong: our data provide intriguing insights into the multifunctionality of the IC, especially in sensory prediction, decision-making, and reward processing, but they do not by themselves overturn existing views.
  
  To address this, we have toned down the language in the revised manuscript to better reflect the current state of our findings. Rather than presenting the results as a direct challenge to existing knowledge, we now describe them as contributing to a growing body of evidence that suggests the IC plays a more integrative role in auditory processing and cognitive functions.
  
  “This research highlights a more complex role for the IC than traditionally understood, showcasing its integral role in cognitive and sensory processing and emphasizing its importance in integrated brain functions.” (Abstract, P.3, Line.12-15)
  
  “This modulation by preceding sensory experiences indicates that the IC is more than merely a relay station, suggesting a more intricate role in auditory processing influenced by both ascending and descending neural pathways.” (P.17, Line. 3-5)
  
  (5) Line 143, "peak response": it is better not to refer to this transient response as a "peak response". How about "transient response" or "transient peak response"?
  
  Thank you for your suggestion regarding the terminology used in Line 143. We agree with the reviewer that referring to this as simply a "peak response" could be misleading. To improve clarity and precision, we have revised the term to "transient peak response" as recommended.
  
  We believe this adjustment better captures the nature of the neuronal activity observed and avoids confusion. The manuscript has been updated accordingly, and we appreciate the reviewer’s valuable input.
  
  (6) Is it possible to manipulate the IC and check the effect on the behavioral task?
  
  We appreciate the reviewer’s suggestion to manipulate the IC area and observe its effect on behavior during the task. Indeed, this would provide valuable causal evidence regarding the role of the IC in sensory prediction, decision-making, and reward processing, which would complement the correlational findings we have presented.
  
  However, in this particular study, we focused on electrophysiological recordings to observe naturally occurring neuronal activity in behaving monkeys. While it is certainly feasible to manipulate IC activity, such as through pharmacological inactivation, optogenetics, or electrical stimulation, these techniques pose technical challenges in primates. Moreover, manipulating the IC, given its role as a lower-level relay station in the auditory pathway, could potentially disrupt auditory processing more broadly, complicating the interpretation of behavioral outcomes.
  
  That said, we agree that introducing such manipulations in future studies would significantly enhance our understanding of the causal role of the IC in cognitive and sensory functions. We have now emphasized this as a key future research direction in the revised manuscript’s discussion section. Thank you for this insightful suggestion.
  
  “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)
  
  Reviewer #3 (Recommendations for the authors):
  
  Minor Comments:
  
  (1) Figure Labeling:
  
  The figures require more precise labeling, particularly concerning the analysis time windows, to facilitate reader understanding of the results.
  
  We thank the reviewer for highlighting the importance of precise figure labeling, particularly regarding the analysis time windows. We understand that clear labeling is critical for conveying our findings effectively.
  
  In response to your suggestion, we have revised the figures to include more precise and detailed labels, especially for the analysis time windows. These changes will help guide readers through the experimental design and clarify the interpretation of the results. We hope these improvements enhance the overall clarity and accessibility of the figures.
  
  (2) Discrepancies in Figures and Text:
  
  There are discrepancies in the manuscript that could confuse readers. For example, on line 154, what was referred to as Supplementary Figure 1 seemed to actually be Supplementary Figure 2. Similar issues were noted on lines 480 and 606.
  
  We appreciate the reviewer bringing this issue to our attention. We apologize for the discrepancies between the figures referenced in the text and their actual labels in the manuscript, as this could indeed confuse readers.
  
  We have carefully reviewed the entire manuscript and corrected all discrepancies between the figures and their corresponding references in the text, including the issues noted on lines 154, 480, and 606. We have ensured that the figure and supplementary figure references are now consistent and accurate throughout the manuscript.
  
  (3) Inconsistent Formatting in Figure legends:
  
  Ensuring a more professional and uniform presentation throughout the manuscript would be appreciated. There was inconsistent use of uppercase and lowercase letters in legends.
  
  We appreciate the reviewer’s attention to detail regarding the formatting of figure legends. Ensuring a professional and consistent presentation is crucial for enhancing the readability and overall quality of the manuscript.
  
  We have carefully reviewed all figure legends and made the necessary corrections to ensure consistent use of uppercase and lowercase letters, as well as uniform formatting throughout the manuscript. This includes ensuring that all abbreviations and terminology are used consistently across the text and legends.
  

URL

biorxiv.org/content/10.1101/2024.07.16.603747v2
www.biorxiv.org www.biorxiv.org

Structural mechanisms of PIP2 activation and SEA0400 inhibition in human cardiac sodium-calcium exchanger NCX1

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1 (Public Review):
  
  (1) This study uses structural and functional approaches to investigate the regulation of the Na/Ca exchanger NCX1 by an activator, PIP2, and an inhibitor, SEA0400. State-of-the-art methods are employed, and the data are of high quality and presented very clearly. The manuscript combines two rather different studies (one on PIP2; and one on SEA0400) neither of which is explored in the depth one might have hoped to form robust conclusions and significantly extend knowledge in the field.
  
  We combined the study of PIP2 and SEA0400 in this manuscript because both ligands modulate NCX1 by affecting the Na<sup>+</sup>-dependent inactivation of the exchanger: SEA0400 promotes inactivation by stabilizing the cytosolic inactivation assembly, whereas PIP2 mitigates inactivation by destabilizing the assembly. The current study aims to provide structural insights into the binding of these ligands. We did not perform extensive electrophysiological analysis, as the functional effects of both ligands have been extensively characterized over the last thirty years.
  
  (2) The novel aspect of this work is the study of PIP2. Unfortunately, technical limitations precluded structural data on binding of the native PIP2, so an unnatural short-chained analog, diC8 PIP2, was used instead. This raises the question of whether these two molecules, which have similar but very distinctly different profiles of activation, actually share the same binding pocket and mode of action. In an effort to address this, the authors mutate key residues predicted to be important in forming the binding site for the phosphorylated head group of PIP2. However, none of these mutations prevent PIP2 activation. The only ones that have a significant effect also influence the Na-dependent inactivation process independently of PIP2, thus casting doubt on their role in PIP2 binding, and thus identification of the PIP2 binding site. A more extensive mutagenic study, based on the diC8 PIP2 binding site, would have given more depth to this work and might have been more revealing mechanistically.
  
  The reviewer raises the important question of whether the short-chain PIP2 diC8 and long-chain native PIP2 share the same binding site. We have performed a pilot experiment to address this question. The data indicate that PIP2 diC8 competes with native brain PIP2 for its binding site (Author response image 1). We believe that the mild effects of diC8 on the biophysical properties of NCX1 are due to its decreased affinity as compared to the long-chain PIP2. We have included this competition assay in the revised manuscript.
  
  The acyl-chain length-dependent PIP2 activation is consistent with some previous studies. Before PIP2 was demonstrated to regulate NCX1, some earlier studies showed that negatively charged long-chain lipids such as phosphatidylserine (PS) or phosphatidic acid (PA) could have the same potentiation effects on NCX1 as PIP2 (PMID: 1474504; PMID: 3276350). A later study showed that long-chain acyl-CoAs could also have the same potentiation effects on NCX1 as PIP2 (PMID: 16977318). All these studies demonstrated that activation of NCX by the anionic lipids depends on their chain length with the short chain being ineffective or less effective. These findings have two implications. First, it is the negative surface charge rather than the specific IP3 head group of the lipid that is important for stimulating NCX1 activity. This would imply non-specific electrostatic interactions between the negatively charged lipids and those positively charged residues at the binding site. Second, a longer acyl chain is required for the high-affinity binding of PIP2 or negatively charged lipids. As further discussed in the revised manuscript (Discussion section), we suspect the tail of the long acyl chain from the native anionic lipids can enter the same binding pocket for SEA0400 thereby rendering higher affinity lipid binding than shorter chain lipids.
  
  As the interactions between PIP2 and NCX1 are both electrostatic, involving multiple charged residues, and hydrophobic, involving the long lipid acyl chain, single amino acid substitutions likely only decrease the affinity of PIP2 rather than completely disrupt its binding. Our data demonstrated that mutants R220A, K225A, and R220A/K225A do show a significantly decreased potentiation effect of PIP2 (Figure 3 in the manuscript). We also conducted an experiment with a mutant exchanger in which all four amino acids were mutated. This K164A/R167A/R220A/K225A mutant is insensitive to PIP2 and shows no Na<sup>+</sup>-dependent inactivation (Figure 3A). The unresponsiveness to PIP2 and the lack of Na<sup>+</sup>-dependent inactivation in this mutant are consistent with previous studies demonstrating that PIP2 activates NCX by tuning the amount of Na<sup>+</sup>-dependent inactivation and that any mutation that decreases NCX sensitivity to PIP2 will affect the extent of Na<sup>+</sup>-dependent inactivation (PMID: 10751315). Such studies show that the two processes cannot be dissected from each other, making more extensive mutagenesis investigation unlikely to provide new mechanistic insights. A brief discussion related to this quadruple mutant has been added in the revised manuscript.
  
  Author response image 1.
  
  Giant patch recording of the human WT exchanger. Currents were first activated by intracellular application of 10 µM brain PIP2. Afterwards, a solution containing 100 mM Na<sup>+</sup> and 12 µM Ca<sup>2+</sup> was perfused for about 5 min (washout). The PIP2 effect was not reversible during this time. The same patch was then perfused internally with the same solution in the presence of 10 µM diC8. Application of the short-chain diC8 partially decreased the current, suggesting that PIP2 and diC8 compete for the binding site.
  
  (3) The SEA0400 aspect of the work does not integrate particularly well with the rest of the manuscript. This study confirms the previously reported structure and binding site for SEA0400 but provides no further information. While interesting speculation is presented regarding the connection between SEA0400 inhibition and Na-dependent inactivation, further experiments to test this idea are not included here.
  
  Our SEA0400-bound NCX structure was determined and deposited in 2023, along with our previous study on the apo NCX published in 2023 (PMID: 37794011). We decided to combine the SEA0400-bound structure with the later study of PIP2 binding because both represent ligand modulation of NCX by affecting the Na<sup>+</sup>-dependent inactivation of the exchanger. The SEA0400 inhibition of NCX1 has been extensively investigated previously, which demonstrated a strong connection between SEA0400 and the Na<sup>+</sup>-dependent inactivation. As discussed in the manuscript, SEA0400 is ineffective in an exchanger lacking Na<sup>+</sup>-dependent inactivation. Conversely, enhancing the extent of Na<sup>+</sup>-dependent inactivation increases the affinity for SEA0400. Our structural analysis provides explanations for these pharmacological features of SEA0400 inhibition.
  
  Reviewer #2 (Public review):
  
  (1) The study by Xue et al. reports the structural basis for the regulation of the human cardiac sodium-calcium exchanger, NCX1, by the endogenous activator PIP2 and the small molecule inhibitor SEA0400. This well-written study contextualizes the new data within the existing literature on NCX1 and the broader NCX family. This work builds upon the authors' previous study (Xue et al., 2023), which presented the cryo-EM structures of human cardiac NCX1 in both inactivated and activated states. The 2023 study highlighted key structural differences between the active and inactive states and proposed a mechanism where the activity of NCX1 is regulated by the interactions between the ion-transporting transmembrane domain and the cytosolic regulatory domain. Specifically, in the inward-facing state and at low cytosolic calcium levels, the transmembrane (TM) and cytosolic domains form a stable interaction that results in the inactivation of the exchanger. In contrast, calcium binding to the cytosolic domain at high cytosolic calcium levels disrupts the interaction with the TM domain, leading to active ion exchange.
  
  In the current study, the authors present two mechanisms explaining how PIP2 stimulates NCX1 activity by destabilizing the protein's inactive state (i.e., by disrupting the interaction between the TM domain and the cytosolic domain) and how SEA0400 stabilizes this interaction, thereby acting as a specific inhibitor of the system.
  
  The first part of the results section addresses the effect of PIP2 and PIP2 diC8 on NCX1 activity. This is pertinent as the authors use the diC8 version of this lipid (which has a shorter acyl chain) in their subsequent cryo-EM structure due to the instability of native PIP2. I am not an electrophysiology expert; however, my main comment would be to ask whether there is sufficient data here to characterise fully the differences between PIP2 and PIP2 diC8 on NCX1 function. It appears from the text that this study is the first to report these differences, so perhaps this data needs to be more robust. The spread of the data points in Figure 1B is possibly a little unconvincing given that only six measurements were taken. Why is there one outlier in Figure 1A? Were these results taken using the same batch of oocytes? Are these technical or biological replicates? Is the convention to use statistical significance for these types of experiments?
  
  Oocytes were isolated from at least 3 different frogs, and each data point shown in Fig. 1A or 1B of the manuscript represents a recording obtained from a single oocyte. For clarity, we have added this information to the Methods section. We understand that 6 observations (Fig. 1B) are a small sample size, but electrophysiological recordings of NCX currents are technically very challenging due to the low transport activity of the exchanger. Because of these circumstances, this type of study relies on a small sample of observations. Nevertheless, our data clearly show that native PIP2 and the short-chain PIP2 diC8 can both activate NCX, although with different affinities. The spread of the steady-state current data points is due to the variability in the extent of Na<sup>+</sup>-dependent inactivation within each patch, likely due to slightly different levels of endogenous PIP2 or other regulatory mechanisms that control this allosteric process. As PIP2 acts on the Na<sup>+</sup>-dependent inactivation, this will lead to varying levels of potentiation. Because of that, we did occasionally observe some outliers in our recordings. Rather than cherry-picking in data analysis, we presented all the data points from patches with measurable NCX1 currents. Despite this variability, a t-test indicates that the effects of PIP2 are more pronounced on the steady-state current than on the peak current. The differences between native PIP2 and PIP2 diC8 on NCX1 function are consistent with previous investigations showing that both PIP2 and anionic lipids enhance NCX current by antagonizing the Na<sup>+</sup>-dependent inactivation and that long-chain lipids are more effective in potentiating NCX1 activity (PMID: 1474504; PMID: 3276350; PMID: 16977318). A discussion related to the chain length-dependent lipid activation of NCX1 has been added to the Discussion of the revised manuscript.
  
  (2) I am also somewhat skeptical about the modelling of the PIP2 diC8 molecule. The authors state, "The density of the IP3 head group from the bound PIP2 diC8 is well-defined in the EM map. The acyl chains, however, are flexible and could not be resolved in the structure (Fig. S2)."
  
  However, the density appears rather ambiguous to me, and the ligand does not fit well within the density. Specifically, there is a large extension in the volume near the phosphate at the 5' position, with no corresponding volume near the 4' phosphate. Additionally, there is no bifurcation of the volume near the lipid tails. I attempted to model cholesterol hemisuccinate (PDB: Y01) into this density, and it fits reasonably well - at least as well as PIP2 diC8. I am also concerned that if this site is specific for PIP2, then why are there no specific interactions with the lipid phosphates? How can the authors explain the difference between PIP2 and PIP2 diC8 if the acyl chains don't make any direct interactions with the TM domain? In short, the structures do not explain the functional differences presented in Figure 1.
  
  The side chain densities for Arg167 and Arg220 are also quite weak. While there is some density for the side chain of Lys164, it is also very weak. I would expect that if this site were truly specific for PIP2, it should exhibit greater structural rigidity - otherwise, how is this specific?
  
  Given this observation, have the authors considered using other PIP2 variants to determine if the specificity lies with PI4,5P<sub>2</sub> as opposed to PI3,5P<sub>2</sub> or PI3,4P<sub>2</sub>? A lack of specificity may explain the observed poor density.
  
  The map we provided to the editor in the initial submission is the overall map for PIP2-bound NCX1. Due to the relative flexibility between the cytosolic CBD and TM regions, we also performed local refinement on each region in data processing to improve the map quality as illustrated in Fig. S2. The local-refined map focused on the TM domain provides a much better density for PIP2 diC8 and its surrounding residues than the overall map. The map quality allowed us to unambiguously identify the lipid as PIP2 with the IP3 head group having phosphate groups at the 4,5 positions. Furthermore, no lipid density is observed at the equivalent location in the local-refined map from the apo NCX1 TM region as shown in Fig. S3 in the revision. In the revised manuscript, the density for the bound PIP2 is shown in Fig. 2A. Those local-refined maps for PIP2-bound NCX1 were also deposited as additional maps along with the overall map in the Electron Microscopy Data Bank under accession number EMD-60921. The local-refined maps for the apo-NCX1 were deposited in the Electron Microscopy Data Bank under accession number EMD-40457 in our previous study (https://www.ebi.ac.uk/emdb/EMD-40457?tab=interpretation).
  
  As discussed in our response to reviewer #1, the acyl-chain length-dependent PIP2 activation is consistent with some previous studies. Before PIP2 was identified as a physiological regulator of NCX1, some earlier studies showed that negatively charged long-chain lipids such as phosphatidylserine (PS) or phosphatidic acid (PA) could have the same potentiation effects on NCX as PIP2 (PMID: 1474504; PMID: 3276350). A later study showed that acyl-CoA could also have the same potentiation effects on NCX as PIP2 (PMID: 16977318). All these studies demonstrated that activation of NCX1 by the anionic lipids depends on their chain length with the short chain being ineffective. These findings have two implications. First, it is the negative surface charge rather than the specific IP3 head group of the lipid that is important for stimulating NCX activity. This would imply non-specific electrostatic interactions between the negatively charged lipids and those positively charged residues at the binding site. Second, a longer acyl chain is required for the high-affinity binding of PIP2 or negatively charged lipids. As further discussed in the revised manuscript (Discussion section), we suspect the tail of the long acyl chain can enter the same binding pocket for SEA0400 thereby rendering higher affinity lipid binding than shorter chain lipids. In light of the equivalent potentiating effect of various anionic lipids on NCX1, PI(4,5)P2 activation of NCX1 is likely non-specific and PI(3,5)P2 or PI(3,4)P2 may also activate the exchanger. However, as a key player in membrane signaling, PI(4,5)P2 has been demonstrated to be a physiological regulator of NCX1 in many studies.
  
  (3) I also noticed many lipid-like densities in the maps for this complex. Is it possible that the authors overlooked something? For instance, there is a cholesterol-like density near Val51, as well as something intriguing near Trp763, where I could model PIP2 diC8 (though this leads to a clash with Trp763). I wonder if the authors are working with mixed populations in their dataset. The accompanying description of the structural changes is well-written (assuming it is accurate).
  
  Densities from endogenous lipids and cholesterols are commonly observed in membrane protein structures. Other than the bound PIP2, those lipid and cholesterol densities are present in both the apo and PIP2-bound structures, including the density around Trp763 and Val53. Whether those bound lipids/cholesterols play any functional roles or just stabilize the protein is beyond the scope of this study. We have added a supporting figure (Fig. S3) showing a side-by-side comparison of the density at the PIP2 binding site between the PIP2-bound and apo structures.
  
  I would recommend that the authors update the figures associated with this section, as they are currently somewhat difficult to interpret without prior knowledge of NCX architecture. My suggestions include:
  
  - Including the density for the PIP2 diC8 in Figure 2A.
  
  As suggested, we have included the density of PIP2 diC8 in Figure 2A.
  
  - Adding membrane boundaries (cytosolic vs. extracellular) in Figure 2B.
  
  - Labeling the cytosolic domains in Figure 2B.
  
  - Adding hydrogen bond distances in Figure 2A.
  
  We have added and labeled the boundaries for the TM and cytosolic domains in Figure 2B as suggested. Although we can identify those positively charged residues in the vicinity of the PIP2 head group and observe local structural changes, the poorly defined side-chain densities of these residues do not allow us to properly determine the hydrogen bond distances.
  
  - Detailing the domain movements in Figure 2B (what is the significance of the grey vs. blue structures?).
  
  There is a rigid-body downward swing movement at CBDs between the apo (grey) and PIP2-bound (cyan) structures. The movement at the TM region is subtle. We have added the description in the legend for Figure 2B and also marked the movement at the tip of CBD1 in the figure.
  
  The section on the mechanism of SEA0400-induced inactivation is strong. The maps are of better quality than those for the PIP2 diC8 complex, and the ligand fits well. However, I noticed a density peak below F02 on SEA0400 that lies within the hydrogen bonding distance of Asp825. Is this a water molecule? If so, is this significant?
  
  The structure of SEA0400-bound NCX1 was determined at a higher resolution, likely because the drug stabilizes the exchanger in the inactivated state. The mentioned density could be an ordered water molecule. We don’t know if it is functionally significant.
  
  Furthermore, there are many unmodeled regions that are likely cholesterol hemisuccinate or detergent molecules, which may warrant further investigation.
  
  We consistently observed partial densities from bound lipids, cholesterols, or detergents in our structures. Most of them are difficult to identify and model unambiguously. Whether they play any functional roles is beyond the scope of this study.
  
  The authors introduce SEA0400 as a selective inhibitor of NCX1; however, there is little to no comparison between the binding sites of the different NCX proteins. This section could be expanded. Perhaps Fig. 4C could include sequence conservation data.
  
  SEA0400 is more specific for NCX1 than NCX2 and NCX3 as demonstrated in an early study (PMID: 14660663). The lack of structure information for NCX2 or NCX3 makes it difficult to make a direct comparison to reveal the structural basis of SEA0400 specificity.
  
  Additionally, is the fenestration in the membrane physiological, or is it merely a hole forced open by the binding of SEA0400? I was unclear as to whether the authors were suggesting a physiological role for this feature, similar to those observed in sodium channels.
  
  The fenestration likely serves as the portal for SEA0400 binding as discussed in the manuscript. As further discussed in the revised manuscript, we suspect this fenestration also allows the tail of a long-chain lipid to enter the same binding pocket for SEA0400 and results in higher affinity binding of a long-chain lipid than a short-chain lipid.
  
  Reviewer #3 (Public review):
  
  NCXs are key Ca<sup>2+</sup> transporters located on the plasma membrane, essential for maintaining cellular Ca<sup>2+</sup> homeostasis and signaling. The activities of NCX are tightly regulated in response to cellular conditions, ensuring precise control of intracellular Ca<sup>2+</sup> levels, with profound physiological implications. Building upon their recent breakthrough in determining the structure of human NCX1, the authors obtained cryo-EM structures of NCX1 in complex with its modulators, including the cellular activator PIP2 and the small molecule inhibitor SEA0400. Structural analyses revealed mechanistically informative conformational changes induced by PIP2 and elucidated the molecular basis of inhibition by SEA0400. These findings underscore the critical role of the interface between the transmembrane and cytosolic domains in NCX regulation and small molecule modulation. Overall, the results provide key insights into NCX regulation, with important implications for cellular Ca<sup>2+</sup> homeostasis.
  
  We appreciate this reviewer’s positive comments.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The manuscript would be strengthened enormously by a much deeper focus on the novel and very interesting PIP2 work, as noted above, and perhaps the removal of the SEA0400 data.
  
  If that is beyond the scope of the authors' options, then a more robust discussion of limitations of the current work, perhaps speculation regarding other future experiments, a clearer presentation of how these data on SEA0400 are different from/extend from the previously published work, and a better effort to link the two disparate aspects of the work into a more cohesive manuscript should be attempted.
  
  As discussed in our response to this reviewer’s public review, we combined the study of PIP2 and SEA0400 in this manuscript because both ligands activate or inhibit NCX1 by affecting the Na<sup>+</sup>-dependent inactivation of the exchanger. The functional effects of both ligands on NCX1 have been extensively characterized over the last thirty years. Thus the current study is focused on providing structural explanations for some unique pharmacological features of these ligands. In the revised manuscript, we have added an extra paragraph of discussion that provides a plausible explanation for chain length-dependent PIP2 activation.
  
  Reviewer #3 (Recommendations for the authors):
  
  A few comments to consider:
  
  (1) The short-chain PIP2 appears to have lower potency, but the mechanism remains unclear. Based on structural analyses, are there potential binding sites for the acyl chains of PIP2 that could contribute to this difference?
  
  As discussed in our response to other reviewers, long-chain anionic lipids can have the same potentiation effect on NCX1 activity as PIP2, but the short-chain ones are ineffective just like short-chain PIP2 diC8. We suspect the tail of a long acyl chain from the native PIP2 can enter the same binding pocket for SEA0400 thereby rendering higher affinity binding for a long-chain lipid than a short-chain lipid. A discussion related to this point has been added to the revised manuscript.
  
  (2) It is unclear why mutating residues that interact with the IP3 head group retain PIP2 activation. Would it be possible to assess PIP2 and C8 PIP2 binding to these NCX1 variants? Identifying a mutant that abolishes C8 PIP2 binding would be valuable in interpreting those results.
  
  As the interactions between PIP2 and NCX1 are both electrostatic involving multiple charged residues and hydrophobic involving the long lipid acyl chain, single amino acid substitutions likely only decrease the affinity of PIP2 rather than completely disrupt its binding. Individual mutants R220A and K225A show a 5-fold decrease in their response to PIP2 application indicating that their replacement alters the affinity of NCX for PIP2. We have added a new experiment showing that an exchanger with all four residues mutated is insensitive to PIP2 in the revision.
  
  (3) What are the functional effects of mutating Y226 and R247, residues that seem to play an important role in PIP2-mediated activation?
  
  In a previous study, mutation at Y226 (Y226T), which is found within the XIP region of NCX, has been shown to have enhanced Na<sup>+</sup>-dependent inactivation (PMID: 9041455). To our knowledge, the R247 mutation has not been investigated. Because R247 is also positioned in the XIP region, we suspect its mutation could directly affect Na<sup>+</sup>-dependent inactivation. This would make it difficult to determine if the functional effect of the mutation is caused by changing the stability of the XIP region or by changing the binding of PIP2.
  
  (4) Is there any overlap between the PIP2 and SEA0400 binding regions? Both appear to involve TM4, TM5, and TMD-beta hub interfaces. It might be interesting to discuss any shared mechanisms and why this region might serve as a hotspot for modulation.
  
  As mentioned in our previous response, we suspect the tail of a long acyl chain from the native PIP2 can enter the same binding pocket for SEA0400 thereby rendering higher affinity binding for a long-chain lipid than a short-chain lipid. A more detailed discussion related to this point has been included in the revision.
  
  (5) It would be helpful to show the density at the PIP2-binding site in the apo and PIP2-bound structures side by side
  
  This figure has been added in the revision as Fig. S3.
  
URL

biorxiv.org/content/10.1101/2024.12.05.627058v2
www.biorxiv.org

Causal Role of the Frontal Eye Field in Attention-induced Ocular Dominance Plasticity

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This important study combines psychophysics, fMRI, and TMS to reveal a causal role of FEF in generating an attention-induced ocular dominance shift, with potential relevance for clinical applications. The evidence supporting the claims of the authors is solid, but the theoretical and mechanistic interpretation of results and experimental approaches need to be strengthened. The work will be of broad interest to perceptual and cognitive neuroscience.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Based on a "dichoptic-backward-movie" paradigm that modulates ocular dominance, the present study combines fMRI and TMS to examine the role of the frontoparietal attentional network in ocular dominance shifts. The authors claimed a causal role of FEF in generating the attention-induced ocular dominance shift.
  
  Strengths:
  
  A combination of fMRI, TMS, and the "dichoptic-backward-movie" paradigm is used to reveal the causal role of the frontoparietal attentional network in ocular dominance shifts. The conclusions of this paper are mostly well supported by data.
  
  Weaknesses:
  
  (1) The relationship between eye dominance, eye-based attention shift, and cortical functions remains unclear and merits further delineation. The rationale of the experimental design related to the hemispheric asymmetry in the FEF and other regions should be clarified.
  
  Thanks for the reviewer’s comments! We have further clarified the relationship between eye dominance shift, eye-based attention, and cortical functions in the Introduction and Discussion. In the Introduction, we introduce the modulating effects of eye-based attention on eye dominance. On one hand, eye-based attention can enhance eye dominance of the attended eye in real time (see page 3 first paragraph or below):
  
  “For instance, presenting top-down attentional cues to one eye can intensify the competition strength of input signals in the attended eye during binocular rivalry (Choe & Kim, 2022; Zhang et al., 2012) and shift the eye balance towards the attended eye (Wong et al., 2021).”
  
  On the other hand, prolonged eye-based attention can induce a shift of eye dominance to the unattended eye (see page 3 second paragraph or below):
  
  “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants are presented with regular movie images in one eye (i.e., attended eye) while the other eye (i.e., unattended eye) received the backward movie images of the same episode. They were also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, the goal-directed eye-based attention was predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).”
  
  Moreover, we discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or below, which also respond to this reviewer’s comment of Weakness #2):
  
  “Then how does FEF regulate the attention-induced ocular dominance shift? Our previous work has found that the aftereffect (for simplicity, hereafter we use aftereffect to denote the attention-induced ocular dominance shift) can be produced only when the adapting stimuli involve adequate interocular competition, and is measurable only when the testing stimuli are not binocularly fused (Song et al., 2023). Given the indispensability of interocular competition, we explained those findings in the framework of the ocular-opponency-neuron model of binocular rivalry (Said & Heeger, 2013). The model suggests that there are some opponency neurons which receive excitatory inputs from monocular neurons for one eye and inhibitory inputs from monocular neurons for the other eye (e.g. AE-UAE opponency neurons receive excitatory inputs from the attended eye (AE) and inhibitory inputs from the unattended eye (UAE)). Then a difference signal is computed so that the opponency neurons fire if the excitatory inputs surpass the inhibitory inputs. Upon activation, the opponency neurons will in turn suppress the monocular neurons which send inhibitory signals to them.
  
  Based on this model, we proposed an ocular-opponency-neuron adaptation account to explain the aftereffect, and pointed out that the attentional system likely modulated the AE-UAE ocular opponency neurons (Song et al., 2023). So why would FEF modulate the AE-UAE opponency neurons? The reason may be twofold. Firstly, understanding the logic during the dichoptic-backward-movie viewing may require filtering out the distracting information (from the unattended eye) and sustaining attention (to the attended eye), which is exactly the role of FEF (Esterman et al., 2015; Lega et al., 2019).
  
  Secondly, due to the special characteristics of binocular vision system, filtering the distracting input from the unattended eye may have to rely on the interocular suppression mechanism. According to the ocular-opponency-neuron model, this is achieved by the firing of the AE-UAE opponency neurons that send inhibitory signals to the UAE monocular neurons.
  
  As mentioned previously, the firing of the AE-UAE opponency neurons requires stronger activity for the AE monocular neurons than for the UAE monocular neurons. This is confirmed by the results shown in Figure 8 of Song et al. (2023) that monocular response for the attended eye during the entire adaptation phase was slightly stronger than that for the unattended eye. Accordingly, during adaptation the AE-UAE opponency neurons were able to activate for a longer period thus adapted to a larger extent than the UAE-AE opponency neurons. This would cause the monocular neurons for the unattended eye to receive less inhibition from the AE-UAE opponency neurons in the post-test as compared with the pre-test, leading to a shift of ocular dominance towards the unattended eye. In this vein, the magnitude of this aftereffect should be proportional to the extent of adaptation of the AE-UAE relative to UAE-AE opponency neurons. Attentional enhancement on the AE-UAE opponency neurons is believed to strengthen this aftereffect, as it has been found that attention can enhance adaptation (Dong et al., 2016; Rezec et al., 2004). Inhibition of FEF likely led such attentional modulation to be much less effective. Consequently, the AE-UAE opponency neurons might not have the chance to adapt to a sufficiently larger extent than the UAE-AE opponency neurons, leading to a statistically non-detectable aftereffect in Experiment 2. Therefore, the results of Experiments 2-4 in the present study suggest that within the context of the ocular-opponency-neuron adaptation account, FEF might be the core area to fulfill the attentional modulations on the AE-UAE opponency neurons.”
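The adaptation account quoted above can be illustrated with a deliberately simplified sketch. It is not the Said & Heeger (2013) model and not the authors' analysis: the half-wave rectified difference, the linear accumulation of adaptation, the function names, and all numeric values (`rate`, `attention_gain`, the monocular responses) are hypothetical choices made only to show the direction of the predicted effect.

```python
def opponency_drive(excitatory, inhibitory):
    """Half-wave rectified difference: the neuron fires only if excitation exceeds inhibition."""
    return max(0.0, excitatory - inhibitory)


def adapt(drive, attention_gain, steps, rate=0.01):
    """Accumulate adaptation in proportion to the attention-scaled drive (toy rule)."""
    adaptation = 0.0
    for _ in range(steps):
        adaptation += rate * attention_gain * drive
    return adaptation


def aftereffect(r_ae, r_uae, attention_gain, steps=100):
    """Adaptation of AE-UAE neurons minus adaptation of UAE-AE neurons."""
    ae_uae = adapt(opponency_drive(r_ae, r_uae), attention_gain, steps)
    # Attention is assumed to act only on the AE-UAE population
    uae_ae = adapt(opponency_drive(r_uae, r_ae), 1.0, steps)
    return ae_uae - uae_ae


# Hypothetical values: a slightly stronger monocular response for the attended eye
intact_attention = aftereffect(r_ae=1.1, r_uae=1.0, attention_gain=2.0)
reduced_attention = aftereffect(r_ae=1.1, r_uae=1.0, attention_gain=1.0)
```

In this toy version, lowering `attention_gain` (standing in for inhibition of FEF) shrinks the difference in adaptation between the two opponency populations, which is the qualitative pattern the account predicts.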
  
  We used the experimental design with hemispheric asymmetry in the FEF and other regions for two reasons. First, many studies have shown that the dorsal attentional network has a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010). This was also indicated by the results of Experiment 1 (Figure 3). Second, a recent study applying TMS to FEF and IPS stimulated only the right hemisphere (Gallotto et al., 2022). Therefore, we selected the right FEF and right IPS as the target regions for cTBS. In the Methods section of Experiment 2, we have elucidated the reasons for the selection of cTBS target regions (see page 35, first paragraph or below):
  
  “Given that the dorsal attentional network primarily consists of the FEF and the IPS (Corbetta & Shulman, 2002; Mayrhofer et al., 2019), with a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010), we selected the right FEF and right IPS from the four clusters identified in Experiment 1 as the target regions for cTBS (Gallotto et al., 2022).”
  
  (2) Theoretically, how the eye-related functions in this area could be achieved, and how it interacts with the ocular representation in V1 warrant further clarification.
  
  Thanks for the reviewer’s comment! In the revised manuscript, we have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or the quoted paragraphs under this reviewer’s first Public comment).
  
  Reviewer #2 (Public Review):
  
  Summary
  
  Song et al investigate the role of the frontal eye field (FEF) and the intraparietal sulcus (IPS) in mediating the shift in ocular dominance (OD) observed after a period of dichoptic stimulation during which attention is selectively directed to one eye. This manipulation has been previously found to transiently shift OD in favor of the unattended eye, similar to the effect of short-term monocular deprivation. To this aim, the authors combine psychophysics, fMRI, and transcranial magnetic stimulation (TMS). In the first experiment, the authors determine the regions of interest (ROIs) based on the responses recorded by fMRI during either dichoptic or binocular stimulation, showing selective recruitment of the right FEF and IPS during the dichoptic condition, in line with the involvement of eye-based attention. In a second experiment, the authors investigate the causal role of these two ROIs in mediating the OD shift observed after a period of dichoptic stimulation by selectively inhibiting with TMS (using continuous theta burst stimulation, cTBS), before the adaptation period (50 min exposure to dichoptic stimulation). They show that, when cTBS is delivered on the FEF, but not the IPS or the vertex, the shift in OD induced by dichoptic stimulation is reduced, indicating a causal involvement of the FEF in mediating this form of short-term plasticity. A third control experiment rules out the possibility that TMS interferes with the OD task (binocular rivalry), rather than with the plasticity mechanisms. From this evidence, the authors conclude that the FEF is one of the areas mediating the OD shift induced by eye-selective attention.
  
  Strengths
  
  (1) The experimental paradigm is sound and the authors have thoroughly investigated the neural correlates of an interesting form of short-term visual plasticity combining different techniques in an intelligent way.
  
  (2) The results are solid and the appropriate controls have been performed to exclude potential confounds.
  
  (3) The results are very interesting, providing new evidence both about the neural correlates of eye-based attention and the involvement of extra-striate areas in mediating short-term OD plasticity in humans, with potential relevance for clinical applications (especially in the field of amblyopia).
  
  Weaknesses
  
  (1) Ethics: more details about the ethics need to be included in the manuscript. It is only mentioned for experiment 1 that participants "provided informed consent in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences". (Which version of the Declaration of Helsinki? The latest version requires the pre-registration of the study. The code of the approved protocol together with the code and date of the approval should be provided.) There is no mention of informed consent procedures or ethics approval for the TMS experiments. This is a huge concern, especially for brain stimulation experiments!
  
  Response: Thanks for the reviewer’s comment! In the revised manuscript, we have provided the code of the approved protocol and date of the approval (see page 25 second paragraph or below):
  
  “This study was approved (H21058, 11/01/2021) by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences.”
  
  Indeed, ethics approval and informed consent were obtained for each experiment. To avoid duplication in the text, we only presented the ethics instructions in the Methods section of Experiment 1. We have now clarified in that section that all the experiments in this study were approved by the IRB in our Institute.
  
  (2) Statistics: the methods section should include a sub-section describing in detail all the statistical analyses performed for the study. Moreover, in the results section, statistical details should be added to support the fMRI results. In the current version of the manuscript, the claims are not supported by statistical evidence.
  
  Response: Thanks for the reviewer’s suggestion! In the Methods section of revised manuscript, we have added a section to describe the detailed statistical analyses for each experiment (see page 37 last paragraph for Experiment 2 and page 38 last paragraph for Experiment 3 or below):
  
  “Statistical analyses were performed using MATLAB. A 3 (stimulation site: Vertex, FEF, IPS) × 2 (test phase: pre-test and post-test) repeated measures ANOVA was used to investigate the effect of cTBS delivery on ocular dominance shift. Moreover, for the blob detection test, the target detection rate of each experimental condition was calculated by dividing the summed number of detected blob targets by the total number of blob targets. Then, a 2 (eye: attended eye, unattended eye) × 3 (stimulation site: Vertex, FEF, IPS) repeated measures ANOVA on the detection performance was performed. Post-hoc tests were conducted using paired t-tests (2-tailed significance level at α = 0.05), and the resulting p-values were corrected for multiple comparisons using the false discovery rate (FDR) method (Benjamini & Hochberg, 1995).”
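As an illustration of the multiple-comparison step named in the quoted Methods text, the following is a minimal sketch of the Benjamini-Hochberg adjustment. The authors report using MATLAB; this Python version, the function name, and the example p-values are assumptions for illustration only and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from three post-hoc paired t-tests
raw = [0.004, 0.030, 0.200]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than its neighbour.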
  
  “In addition to the data analysis in Experiment 2, we complemented the standard inferential approach with the Bayes factor (van den Bergh et al., 2023; van Doorn et al., 2021; Wagenmakers et al., 2018), which allows quantifying the relative evidence that the data provide for the alternative (H1) or null hypothesis (H0). We conducted the Bayesian repeated measures ANOVA using JASP with default priors and computed inclusion Bayes factors (BFincl) which suggest the evidence for the inclusion of a particular effect calculated across matched models. A BF greater than 1 provides support for the alternative hypothesis. Specifically, a BF between 1 and 3 indicates weak evidence, a BF between 3 and 10 indicates moderate evidence, and a BF greater than 10 indicates strong evidence (van Doorn et al., 2021). In contrast, a BF below 1 provides evidence in favor of the null hypothesis.”
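The evidence categories quoted above can be written as a small lookup. This is only a sketch: the function name and the wording of the labels are hypothetical, and because the quoted text does not say which category the boundary values 3 and 10 belong to, assigning both to "moderate" is an assumption.

```python
def interpret_bayes_factor(bf):
    """Map a Bayes factor (H1 relative to H0) to the evidence labels in the quoted text."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf > 10:
        return "strong evidence for H1"
    if bf >= 3:
        # Assumption: the boundary values 3 and 10 are treated as moderate
        return "moderate evidence for H1"
    if bf > 1:
        return "weak evidence for H1"
    if bf == 1:
        return "no evidence either way"
    return "evidence for H0"


# Hypothetical inclusion Bayes factors, not values from the study
labels = [interpret_bayes_factor(bf) for bf in (0.3, 2.0, 5.0, 25.0)]
```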
  
  Furthermore, in the Results section of revised manuscript, we have added the statistical details to support the fMRI results (see page 9 last paragraph or below):
  
  “To seek these brain regions, we used the AFNI program “3dttest++” to assess the difference of ‘dichoptic-binocular’ contrast between the experimental and control runs. The AFNI program “ClustSim” was then applied for multiple comparison correction, yielding a minimum significant cluster size of 21 voxels (voxel wise p = .001; cluster threshold α = 0.05). We found 4 clusters showing stronger responses to the dichoptic movies than to the binocular movies especially in the experimental runs.”
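To make the cluster-extent criterion concrete, here is a minimal sketch of how voxels that pass a voxel-wise threshold can be grouped into clusters and filtered by a minimum size of 21. It does not reproduce AFNI's implementation: the function name, the use of face connectivity (six neighbours), and the example coordinates are all assumptions.

```python
from collections import deque


def clusters_above_size(voxels, min_size):
    """Group face-connected voxels and keep clusters with at least min_size members."""
    remaining = set(voxels)
    # Assumption: two voxels are neighbours only if they share a face
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    kept = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        # Breadth-first search over suprathreshold neighbours
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        if len(cluster) >= min_size:
            kept.append(cluster)
    return kept


# Hypothetical suprathreshold voxels: one 3x3x3 block plus two stray voxels
block = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
stray = {(10, 10, 10), (10, 10, 11)}
surviving = clusters_above_size(block | stray, min_size=21)
```

Here only the 27-voxel block survives, while the two-voxel cluster is discarded as too small to be counted as significant under the stated extent threshold.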
  
  (3) Interpretation of the results: the TMS results are very interesting and convincing regarding the involvement of the FEF in the build-up of the OD shift induced by dichoptic stimulation, however, I am not sure that the authors can claim that this effect is related to eye-based attention, as cTBS has no effect on the blob detection task during dichoptic stimulation. If the FEF were causally involved in eye-based attention, one would expect a change in performance in this task during dichoptic stimulation, perhaps a similar performance for the unattended and attended eye. The authors speculate that the sound could have an additional role in driving eye-based attention, which might explain the lack of effect for the blob discrimination task, however, this hypothesis has not been tested.
  
  Response: Thanks for the reviewer’s comment! Following this reviewer’s insightful suggestion, we have conducted a new experiment to examine the effect of sound on the blob detection task (see Experiment 4 in the revised manuscript). The procedure was similar to that of Experiment 2 except that the sound was no longer presented during the dichoptic-backward-movie adaptation. The results showed that the interocular difference in blob detection rate after sound elimination remained unaffected by the cTBS, which disagreed with our explanation in the previous version of the manuscript. Based on the new data, we now question the validity of using the blob detection rate to precisely quantify eye-based attention, and have tried to explain why the blob detection results do not contradict our account of the functional role of FEF in modulating the aftereffect in the Discussion of the revised manuscript (see page 23 second paragraph to page 24 first paragraph or below):
  
  “An unresolved issue is why inhibiting the cortical function of FEF did not impair the performance of blob detection task. One potential explanation is that the synchronized audio in Experiment 2 might help increase the length of time that the regular movie dominated awareness. However, the results of Experiment 4 did not support this explanation, in which the performance of blob detection survived from the inhibition of FEF even when silent movies were presented. Although this issue remains to be explored in future work, it does not contradict with our notion of FEF modulating AE-UAE opponency neurons. It should be noted that our notion merely states that FEF is the core area for attentional modulations on activities of AE-UAE opponency neurons. No other role of FEF during the adaptation is assumed here (e.g. boosting monocular responses or increasing conscious level of stimuli in the attended eye). In contrast, according to the most original definition, the blob detection performance serves as an estimation of visibility (or consciousness level) of the stimuli input from each eye, despite the initial goal of adopting this task is to precisely quantify eye-based attention (which might be impractical). Thus, according to our notion, inhibition of FEF does not necessarily lead to deteriorate performance of blob detection. Furthermore, our findings consistently indicated that the visibility of stimuli in the attended eye was markedly superior to that of stimuli in the unattended eye, yet the discrepancy in the SSVEP monocular responses between the two eyes was minimal though it had reached statistical significance (Song et al., 2023). Therefore, blob detection performance in our work may only faithfully reflect the conscious level in each monocular pathway, but it is probably not an appropriate index tightly associated with the attentional modulations on monocular responses in early visual areas. 
  Indeed, previous work has argued that attention but not awareness modulates neural activities in V1 during interocular competition (Watanabe et al., 2011), but see (Yuval-Greenberg & Heeger, 2013). We have noticed and discussed the counterintuitive results of blob detection performance in our previous work (Song et al., 2023). Here, with the new counterintuitive finding that inhibition of FEF did not impair the performance of blob detection, we suspect that blob detection performance in the “dichoptic-backward-movie” adaptation paradigm may not be an ideal index that can be used to accurately quantify eye-based attention.”
  
  (4) Writing: in general, the manuscript is well written, but clarity should be improved in certain sections.
  
  (a) fMRI results: the first sentence is difficult to understand at first read, but it is crucial to understand the results, please reformulate and clarify.
  
  Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have reformulated this sentence (see page 9 last paragraph or below):
  
  “It was only in the dichoptic condition of experimental runs that participants had to selectively pay more attention to one eye (i.e., eye-based attention). Therefore, we speculate that if certain brain regions exhibit greater activities in the dichoptic condition as compared to the binocular condition in the experimental runs but not in the control runs, the activation of these brain regions could be attributable to eye-based attention.”
  
  (b) Experiment 3: the rationale for experiment one should be straightforward, without a long premise explaining why it would not be necessary.
  
  Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have streamlined the lengthy premise explaining to make the rationale of Experiment 3 more straightforward (see page 15 last two paragraphs or below):
  
  “The results of Experiment 2 support the notion that eye-based attention was the cause for attention-induced ocular dominance plasticity. However, an alternative account is that the significant two-way interaction between test phase and stimulation site did not stem from any persistent malfunction of FEF in modulating ocular dominance, but rather it was due to some abnormality of binocular rivalry measures in the post-test that occurred after stimulation at the FEF only (and not at the other two brain sites). For instance, stimulation at the FEF might simply reduce the ODI measured in the binocular rivalry post-test.
  
  Therefore, we conducted Experiment 3 to examine how suppression of the three target sites would impact binocular rivalry performance, in case that any unknown confounding factors, which were unrelated to adaptation but related to binocular rivalry measures, contributed to the results.”
  
  (c) Discussion: the language is a bit familiar here and there, a more straightforward style should be preferred (one example: p.19 second paragraph).
  
  Response: Thanks for the reviewer’s suggestion! We have carefully revised the language in the discussion. The discussion following the example paragraph has been largely rewritten.
  
  (5) Minor: the authors might consider using the term "participant" or "observer" instead of "subject" when referring to the volunteers who participated in the study.
  
  Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have replaced the term “subject” with “participant”.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This study studied the neural mechanisms underlying the shift of ocular dominance induced by "dichoptic-backward-movie" adaptation. The study is self-consistent.
  
  Strengths:
  
  The experimental design is solid and progressive (relationship among three studies), and all of the raised research questions were well answered.
  
  The logic behind the neural mechanisms is solid.
  
  The findings regarding the cTMS (especially the position/site can be useful for future medical implications).
  
  Weaknesses:
  
  Why does the "dichoptic-backward-movie" adaptation matter? This part is severely missing. This kind of adaptation is neither intuitive like the classical (Gibson) visual adaptation, nor practical as adaptation as a research paradigm as well as the fundamental neural mechanism. If this part is not clearly stated and discussed, this study is just self-consistent in terms of its own research question. There are tons of "cool" phenomena in which the neural mechanisms are apparent as "FEF controls vision-attention" but never tested using TMS & fMRI, but we all know that this kind of research is just of incremental implications.
  
  Response: Thanks for the reviewer’s comment! We designed the "dichoptic-backward-movie" adaptation to study the perceptual consequence and mechanisms of sustained attention to a monocular pathway. Since the overall visual input to both eyes during adaptation was identical, any effect (i.e. the change of ocular dominance in our study) after adaptation can be easily ascribed to unbalanced eye-based attention between the two eyes rather than unbalanced input energy across the eyes. In typical short-term monocular deprivation, input signal from one eye is blocked. Accordingly, attention is undoubtedly distributed to the non-deprived eye. The fact that in a short-term monocular deprivation paradigm the deprived eye is also the unattended eye prevents researchers from ascertaining whether unbalanced eye-based attentional allocation contributes to the shift of ocular dominance just like unbalanced visual input across the two eyes. That is why the “dichoptic-backward-movie” adaptation was adopted in the present study. This new paradigm balances the input energy across the eyes but leaves attention unbalanced across the eyes. In the revised manuscript, we have added the description of the “dichoptic-backward-movie” adaptation (see page 3 last paragraph and page 4 first paragraph or below). We hope this complementary information improves the clarity.
  
  “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants are presented with regular movie images in one eye (i.e., attended eye) while the other eye (i.e., unattended eye) received the backward movie images of the same episode. They were also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, the goal-directed eye-based attention was predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).” In short-term monocular deprivation, input signal from one eye is blocked. Accordingly, attention is biased towards the non-deprived eye. However, it is difficult to tease apart the potential contribution of unbalanced eye-based attention from the consequence of the unbalanced input energy, as the deprived eye is also the unattended eye. Therefore, the advantage of the “dichoptic-backward-movie” adaptation paradigm is to balance the input energy across the eyes but leave attention unbalanced across the eyes.
  
  Our previous work (Song et al., 2023) has shown that eye-based attention plays a role in the formation of ocular dominance shift following adaptation to dichoptic backward movie. However, because the “dichoptic-backward-movie” adaptation paradigm is new, to our knowledge, no literature has ever discovered the brain areas that are responsible for eye-based attention. Our fMRI experiment for the first time resolves this issue, which, we believe, is one of the novelties of the present study. Attention is a pretty general definition of our ability to select limited information for preferential or privileged processing, yet it includes numerous aspects (e.g. spatial attention for spatial locations, feature-based attention for visual features, object-based attention for objects, social attention for social cues, and eye-based attention for monocular pathways etc). Are we 100% sure that the same brain network always underlies every aspect of attention including eye-based attention? No test, no answer. Maybe the answer is Yes, but we are not aware of any evidence for that from literature. It is not unlikely that attention is like an elephant while researchers are like blind people touching the elephant from different angles. Even if all previous researchers have touched the side of the elephant and state that an elephant is no different from a wall, as long as one researcher grabs the elephant’s tail, the “wall” knowledge will be falsified. From this perspective of the essence of science (falsifiable), we have the confidence to say that our fMRI experiment on eye-based attention is novel, because to our knowledge our experiment is the first one to explore the issue. On the basis of the fMRI experiment (otherwise we would have no idea on which precise brain site to apply the cTBS), we could successfully complete the subsequent TMS experiments.
  
  Of course, if the reviewer can kindly point out any previous neuroimaging work we missed that has already disclosed the neural mechanisms underlying eye-based attention in humans, we would truly appreciate it. But even so, we would like to emphasize that the purpose of the current study was actually not to use TMS & fMRI to confirm that “FEF controls visual attention”. As we mentioned in the Abstract and expanded on in the last two paragraphs of the Introduction, the goal of the TMS experiments is to examine the causal role of eye-based attention in producing the aftereffect of “dichoptic-backward-movie” adaptation. This research question is also new, thus we do not think the TMS experiments are incremental, either. Our findings provided direct causal evidence for the effect of FEF on modulating ocular dominance through eye-based attention. Please see the last two sentences in the first paragraph on page 20 in the revised manuscript or below,
  
  “Interestingly, in our Experiment 2 this aftereffect was significantly attenuated after we temporarily inhibited the cortical function of FEF via cTBS. This finding indicates the crucial role of FEF in the formation of attention-induced ocular dominance shift.”
  
  as well as the last sentence of the Abstract,
  
  “…and in this network, FEF plays a crucial causal role in generating the attention-induced ocular dominance shift.”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The hemispheric asymmetry in the eye-based attention-related cortex should be further examined and discussed. For example, IPS in both hemispheres was identified in the fMRI experiment. It is not clear why only the right IPS was stimulated in the TMS experiment.
  
  Response: Thanks for the comment. We have elucidated the reasons for the experimental design with hemispheric asymmetry in FEF and IPS. Please see our response to the Weakness #1 raised by Reviewer #1 in the Public Review section.
  
  (2) It is known that the frontoparietal cortex plays a role in the contralateral shift of attentional allocation. Meanwhile, the latest stage of ocular-specific representation is V1. The authors should discuss how the eye-related function can be achieved in FEF.
  
  Response: Thanks for the comment. We have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph in the revised manuscript, and our response to the Weakness #2 raised by Reviewer #1 in the Public Review section).
  
  (3) To further validate the role of FEF in eye-related attention shifts, the authors may consider using the traditional monocular deprivation paradigm with fMRI and TMS. It would be valuable to compare the neural mechanisms related to the classical monocular deprivation paradigm with the current findings.
  
  Response: Thanks for the reviewer’s suggestion! That is indeed an interesting research topic that we are currently exploring. The current study investigated the attention-induced ocular dominance shift with the “dichoptic-backward-movie-adaptation” paradigm. This paradigm is substantially different from traditional short-term monocular deprivation. In our Neuroscience Bulletin paper (Song et al. 2023), we discuss the reason as follows.
  
  “An alternative account of our results is the homeostatic plasticity mechanism. The function of this mechanism is to stabilize neuronal activity and prevent the neuronal system from becoming hyperactive or hypoactive. For this goal, the mechanism moves the neuronal system back toward its baseline after a perturbation [51, 52]. In our case, the aftereffect can be explained such that the visual system boosts the signals from the unattended eye to maintain the balance of the network’s excitability. However, this account cannot easily explain why the change of neural ocular dominance led by prolonged eye-based attention was observed here using the binocular rivalry testing stimuli, but absent in the previous research using the binocularly fused stimuli [11]. In contrast, a recent SSVEP study also using the binocularly fused stimuli has successfully revealed a shift of neural ocular dominance after two hours of monocular deprivation [31], which is in line with the homeostatic plasticity account. Therefore, the mechanisms underlying the “dichoptic-backward-movie” adaptation and monocular deprivation are probably not fully overlapped with each other; and the binocular rivalry mechanism described in the ocular-opponency-neuron model seems to be more preferable than the homeostatic plasticity mechanism in accounting for the present findings.”
  
  Therefore, before asking whether FEF plays a role in the attention-induced ocular dominance shift in a traditional monocular deprivation paradigm, one should probably first examine whether attention also plays a role in traditional monocular deprivation, and whether the ocular-opponency-neuron adaptation account can also be used to explain the traditional monocular deprivation effect. Our newly accepted paper “Negligible contribution of adaptation of ocular opponency neurons to the effect of short-term monocular deprivation” (https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1282113/full) gives a generally negative answer to the second question. And as to the first question, we have one manuscript under review and another ongoing study. In other words, to get a satisfactory answer to this particular comment of this reviewer, we need to first obtain clear answers to the two above questions. We think this is far beyond the scope of one single manuscript.
  
  (4) The authors only presented regular movies to the dominant eye to maximize the ocular dominance shift. This critical information of design should be clarified, not only in the method section.
  
  Response: Thanks for the reviewer’s suggestion! In the Results section of Experiment 2, we have added a description of this critical information of design (see page 11 last paragraph to page 12 first paragraph or below):
  
  “Then, participants adapted to the “dichoptic-backward-movie” in which regular movie images were presented to the dominant eye to maximize the effect of eye dominance shift (Song et al., 2023). Meanwhile they were asked to detect some infrequent blob targets presented on the movie images in one eye at the same time.”
  
  (5) The frame rate of the movie is 30 fps, which is much lower than a typical 60 fps visual presentation, does this have an effect on the adaptation outcome?
  
  Response: To the best of our knowledge, there is no evidence that the frame rate of the movie influences the aftereffect of attention-induced ocular dominance shift. In our previous research, the frame rate of the movie during adaptation was 25 fps, which still produced a stable adaptation aftereffect (Song et al., 2023). The frame rate of the movie was 30 fps in our monocular deprivation work (Lyu et al., 2020), which showed a monocular deprivation effect similar to the one we previously observed in an altered-reality study (Bai et al., 2017). The frame rate of the altered-reality video in Bai et al.’s (2017) work was 60 fps. All these clues suggest that the frame rate does not have an effect on the adaptation outcome.
  
  (6) Figure 5: The ODSE derived from ODI in Experiment 3 should also be illustrated, for a better comparison with results from Experiment 2.
  
  Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have added the results of ODSE in Experiment 3 to Figure 5 (see page 15 or below):
  
  Author response image 1.
  
  Figure 5. The results of (A) the ocular dominance index (ODI), (B) the ocular dominance shift effects (ODSE) in Experiment 2, (C) the ODI and (D) the ODSE in Experiment 3. The bars show the grand average data for each condition. The individual data are plotted with gray lines or dots. The dashed gray line represents the absolute balance point for the two eyes (ODI = 0.5). Error bars indicate standard errors of means. * p < .05; ** p < .01; n.s. p > .05.
  
  (7) Spelling issues: "i.e." → "i.e.,"
  
  Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have changed “i.e.” to “i.e.,”.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Linked to weakness 3: Ideally, a control experiment with cTBS and dichoptic stimulation without sound but with the blob discrimination task should be performed to be able to make important claims about the neural mechanisms involved in eye-based attention.
  
  Response: Thanks for the comment. We have performed a new experiment as the reviewer suggested. Please see our response to the Weakness #3 raised by Reviewer #2 in the Public Review section.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) The neural mechanisms are so apparent. We all know the FEF\IPS\SC matter in vision and attention and gaze. This is not groundbreaking.
  
  Response: As we addressed in our response to Reviewer #3’s public comment, the current study aimed at investigating the causal mechanism for eye-based attentional modulation of ocular dominance plasticity rather than simply the role of FEF\IPS\SC in visual attention. Moreover, eye-based attention is a less investigated aspect of visual attention. The neural mechanism underlying eye-based attention is still largely unknown, and seeking the brain areas for controlling eye-based attention is the necessary preparation work for applying the cTBS. We have responded in detail to Reviewer #3’s public comment on why we think both the fMRI and TMS experiments are novel to the field, which we will not reiterate here to avoid redundancy.
  
  (2) Why does the "dichoptic-backward-movie" adaptation matter? Is playing a backward movie to one eye realistic? Does that follow the efficient coding? Is that a mere consequence of information theory?
  
  Response: Thanks for the comments. We have added the description of the “dichoptic-backward-movie” adaptation paradigm in the revised manuscript (see page 3 last paragraph and page 4 first paragraph or our response to this reviewer’s Public comment).
  
  Is it realistic to play a backward movie to one eye? We find this question somewhat ambiguous. If the reviewer means the technical feasibility of such stimulus presentation, we can confirm it, since we have used this paradigm in both the current and previously published studies. To be more specific, we made the video stimuli in advance. The left half of the video was the regular movie and the right half was the backward version of the same movie (or vice versa). When viewing such video stimuli through stereoscopes, participants could only see the left half of the video with the left eye and the right half of the video with the right eye. In other words, the regular movie and backward movie were viewed dichoptically. Alternatively, if the reviewer means that such dichoptic presentation rarely happens in the real world and is thus not realistic, we agree with the reviewer on the one hand. On the other hand, we have explained on page 3 last paragraph and page 4 first paragraph why it is a particularly useful paradigm for the main purpose of the present study. Let us give a similar example. The phenomenon of binocular rivalry rarely happens in everyday life, so people may say binocular rivalry is not realistic. However, our visual system does have the ability to deal with such conflicting visual inputs across the eyes, even if binocular rivalry is unrealistic! Sometimes it is fun to investigate those seemingly unrealistic functions of our brains, since they may also reveal the mysteries of our neural system. Although binocular rivalry is uncommon in daily life, it is frequently used to investigate awareness. And in our work, we use binocular rivalry to measure perceptual ocular dominance.
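  The side-by-side stimulus construction described in this response can be sketched as follows. This is only an illustration under stated assumptions: "backward" is taken to mean the same frames in reverse temporal order, frames are stood in for by strings, and the function name `dichoptic_frames` is hypothetical rather than part of the authors' stimulus code.

```python
def dichoptic_frames(frames, regular_on_left=True):
    """Pair each frame of a clip with the frame of its time-reversed copy.

    frames is a sequence of frames in normal playback order. The result is a
    list of (left_half, right_half) tuples, one per displayed video frame.
    """
    backward = list(reversed(frames))  # assumed meaning of "backward movie"
    pairs = []
    for regular, reverse in zip(frames, backward):
        if regular_on_left:
            pairs.append((regular, reverse))
        else:
            pairs.append((reverse, regular))
    return pairs


# Toy clip of four frames; strings stand in for image data.
clip = ["f0", "f1", "f2", "f3"]
pairs = dichoptic_frames(clip)
```

  Each tuple holds what the left and right eye would see through the stereoscope at one moment, so both eyes receive the same set of images overall and differ only in playback order.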
  
  Finally, the reviewer asked whether the "dichoptic-backward-movie" adaptation paradigm follows efficient coding and information theory. Information theory and efficient coding assume that messages with low expectedness or of rare occurrence attract more attention and induce larger neural responses than those with high expectedness. In the "dichoptic-backward-movie" adaptation paradigm, the backward movie should be less expected, since the actions of the characters in the backward movie appear illogical. Thus, according to information theory and efficient coding, more attention would be expected to be paid to the backward movie, and the backward movie might therefore dominate awareness for a longer period during adaptation (Zhang et al., 2012). However, we instructed participants to follow the regular movie during adaptation. The results of the blob detection task also showed better task performance when the targets appeared in the eye presented with the regular movie, which contradicted the prediction of information theory and efficient coding. Thus, it seems unlikely that the "dichoptic-backward-movie" adaptation followed efficient coding and information theory.
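  The notion of "low expectedness" in this argument is commonly quantified in information theory as surprisal, the negative base-2 logarithm of an event's probability. The sketch below only illustrates that standard quantity; the function name and the probabilities are made-up values for illustration, not estimates from the study.

```python
import math


def surprisal_bits(p):
    """Return the surprisal -log2(p), in bits, of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -math.log2(p)


# Hypothetical probabilities: an expected event versus a rarer one.
expected_event = surprisal_bits(0.5)
unexpected_event = surprisal_bits(0.125)
```

  The rarer event carries more bits of surprisal, which is the sense in which a less expected stimulus would be predicted to attract more attention under this account.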
  
  References
  
  Bai, J., Dong, X., He, S., & Bao, M. (2017). Monocular deprivation of Fourier phase information boosts the deprived eye’s dominance during interocular competition but not interocular phase combination. Neuroscience, 352, 122-130. https://doi.org/10.1016/j.neuroscience.2017.03.053
  
  Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
  
  Choe, E., & Kim, M.-S. (2022). Eye-specific attentional bias driven by selection history. Psychonomic Bulletin & Review, 29(6), 2155-2166. https://doi.org/10.3758/s13423-022-02121-0
  
  Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201-215. https://doi.org/10.1038/nrn755
  
  Dong, X., Gao, Y., Lv, L., & Bao, M. (2016). Habituation of visual adaptation. Sci Rep, 6, 19152. https://doi.org/10.1038/srep19152
  
  Duecker, F., Formisano, E., & Sack, A. T. (2013). Hemispheric differences in the voluntary control of spatial attention: direct evidence for a right-hemispheric dominance within frontal cortex. Journal of Cognitive Neuroscience, 25(8), 1332-1342. https://doi.org/10.1162/jocn_a_00402
  
  Esterman, M., Liu, G., Okabe, H., Reagan, A., Thai, M., & DeGutis, J. (2015). Frontal eye field involvement in sustaining visual attention: evidence from transcranial magnetic stimulation. Neuroimage, 111, 542-548. https://doi.org/10.1016/j.neuroimage.2015.01.044
  
  Gallotto, S., Schuhmann, T., Duecker, F., Middag-van Spanje, M., de Graaf, T. A., & Sack, A. T. (2022). Concurrent frontal and parietal network TMS for modulating attention. iScience, 25(3), 103962. https://doi.org/10.1016/j.isci.2022.103962
  
  Lega, C., Ferrante, O., Marini, F., Santandrea, E., Cattaneo, L., & Chelazzi, L. (2019). Probing the neural mechanisms for distractor filtering and their history-contingent modulation by means of TMS. Journal of Neuroscience, 39(38), 7591-7603. https://doi.org/10.1523/JNEUROSCI.2740-18.2019
  
  Lunghi, C., Burr, D. C., & Morrone, C. (2011). Brief periods of monocular deprivation disrupt ocular balance in human adult visual cortex. Curr Biol, 21(14), R538-539. https://doi.org/10.1016/j.cub.2011.06.004
  
  Lyu, L., He, S., Jiang, Y., Engel, S. A., & Bao, M. (2020). Natural-scene-based Steady-state Visual Evoked Potentials Reveal Effects of Short-term Monocular Deprivation. Neuroscience, 435, 10-21. https://doi.org/10.1016/j.neuroscience.2020.03.039
  
  Mayrhofer, H. C., Duecker, F., van de Ven, V., Jacobs, H. I., & Sack, A. T. (2019). Hemifield-specific correlations between cue-related blood oxygen level dependent activity in bilateral nodes of the dorsal attention network and attentional benefits in a spatial orienting paradigm. Journal of Cognitive Neuroscience, 31(5), 625-638. https://doi.org/10.1162/jocn_a_01338
  
  Rezec, A., Krekelberg, B., & Dobkins, K. R. (2004). Attention enhances adaptability: evidence from motion adaptation experiments. Vision Res, 44(26), 3035-3044. https://doi.org/10.1016/j.visres.2004.07.020
  
  Sack, A. T. (2010). Using non-invasive brain interference as a tool for mimicking spatial neglect in healthy volunteers. Restorative Neurology and Neuroscience, 28(4), 485-497. https://doi.org/10.3233/RNN-2010-0568
  
  Said, C. P., & Heeger, D. J. (2013). A model of binocular rivalry and cross-orientation suppression. PLoS Computational Biology, 9(3), e1002991. https://doi.org/10.1371/journal.pcbi.1002991
  
  Song, F., Lyu, L., Zhao, J., & Bao, M. (2023). The role of eye-specific attention in ocular dominance plasticity. Cerebral Cortex, 33(4), 983-996. https://doi.org/10.1093/cercor/bhac116
  
  van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2023). Bayesian Repeated-Measures Analysis of Variance: An Updated Methodology Implemented in JASP. Advances in Methods and Practices in Psychological Science, 6(2), 25152459231168024. https://doi.org/10.1177/25152459231168024
  
  van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Gupta, A., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E. J. (2021). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review, 28(3), 813–826. https://doi.org/10.3758/s13423-020-01798-5
  
  Wagenmakers, E. J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Selker, R., Gronau, Q. F., Dropmann, D., Boutin, B., Meerhoff, F., Knight, P., Raj, A., van Kesteren, E. J., van Doorn, J., Šmíra, M., Epskamp, S., Etz, A., Matzke, D., de Jong, T., van den Bergh, D., Sarafoglou, A., Steingroever, H., Derks, K., Rouder, J. N., & Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review, 25(1), 58–76. https://doi.org/10.3758/s13423-017-1323-7
  
  Watanabe, M., Cheng, K., Murayama, Y., Ueno, K., Asamizuya, T., Tanaka, K., & Logothetis, N. (2011). Attention but not awareness modulates the BOLD signal in the human V1 during binocular suppression. Science, 334(6057), 829-831. https://doi.org/10.1126/science.1203161
  
  Wong, S. P., Baldwin, A. S., Hess, R. F., & Mullen, K. T. (2021). Shifting eye balance using monocularly directed attention in normal vision. J Vis, 21(5), 4. https://doi.org/10.1167/jov.21.5.4
  
  Yuval-Greenberg, S., & Heeger, D. J. (2013). Continuous flash suppression modulates cortical activity in early visual cortex. J Neurosci, 33(23), 9635-9643. https://doi.org/10.1523/jneurosci.4612-12.2013
  
  Zhang, P., Jiang, Y., & He, S. (2012). Voluntary attention modulates processing of eye-specific visual information. Psychol Sci, 23(3), 254-260. https://doi.org/10.1177/0956797611424289
  
  Zhou, J., Reynaud, A., & Hess, R. F. (2014). Real-time modulation of perceptual eye dominance in humans. Proc Biol Sci, 281(1795). https://doi.org/10.1098/rspb.2014.1717
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.10.10.561439v2
www.biorxiv.org www.biorxiv.org

A cryo-ET study of ciliary rootlet organization

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public reviews):
  
  Summary:
  
  Ciliary rootlet is a structure associated with the ciliary basal body (centriole) with beautiful striation observed by electron microscopy. It has been known for more than a century, but its function and protein arrangement are still unknown. This work reconstructed the near-atomic resolution 3D structure of the rootlet using cryo-electron tomography, discovered a number of interesting filamentous structures inside, and built a molecular model of the rootlet.
  
  Strengths:
  
  The authors exploited the currently possible ability of cryo-ET and used it appropriately to describe the 3D structure of the rootlet. They carefully conducted subtomogram averaging and classification, which enabled an unprecedented detailed view of this structure. The dual use of (nearly) intact rootlets from cilia and extracted (demembraned) rootlets enabled them to describe with confidence how D1/D2/A bands form periodic structures and cross with longitudinal filaments, which are likely coiled-coil.
  
  Weaknesses:
  
  Some more clarifications are needed. This reviewer believes that the authors can address them.
  
  Reviewer #1 (Recommendations for the authors):
  
  Recommendation 1: According to Fig.1B, the rootlet was mechanically pulled out from the visual cell for a long distance by vortexing. Is there no artifact? Can the authors comment on it by referring to old literature, for example, with EM of resin-embedded and sectioned basal bodies?
  
  Response: A previous study (Gilliam et al., 2012) compared cryoET of purified rootlets with resin-embedded ultrathin sections of mouse eyecups. They reported no changes in striation repeat or rootlet morphology, suggesting there is no artifact of purification. Our rootlet data are consistent with those of Gilliam et al., suggesting the tomograms we report are representative of rootlets prior to purification.
  
  We have clarified this in the text: pg 2: “As previously described (Gilliam et al., 2012), rootlet striation-repeat and morphology appear unaltered by the purification method. Moreover, …”
  
  Recommendation 2: Fig.1F: It is not clear how to distinguish striation-membrane joints indicated by grey and white arrows. It seems relatively straight striation is indicated by a white arrow, while in the case of the bulky feature it is shown by a grey arrow (and the bulk is colored in blue). But there is no clear border between these features. How were they distinguished? Are they based on classification?
  
  Response: The membrane-associated densities (colored in blue) were assigned according to the TomoSeg neural network. It was trained on a small set of globular densities closely associated with a membrane. This training set included examples both close to and far away from the rootlet. We trained a separate network on recognizing rootlet striations. Both networks competed on assigning pixels in the tomogram as either striations or membrane-associated proteins. The different membrane connections were therefore defined by the probability within the TomoSeg network rather than classification.
  
  We clarified this in the main text: pg 3: “All the striations partially or fully spanned the width of the rootlet and extended beyond the outermost longitudinal filaments. These rootlet-protruding striation-densities frequently contacted the membrane (Fig 1E). Close examination suggested some make a direct contact, whereas others contact a subset of globular membrane-associated densities that are a striking feature of the tomograms. These densities are ~7 nm in diameter and cover almost every membrane surface. Where two membranes come into proximity, the intervening space is filled with two layers of these membrane-associated proteins, one layer associated with each membrane (Fig 1C, S1A, blue arrowheads). We trained a TomoSeg neural network to assign these densities and let this network compete with one that assigned striations. This resulted in a final segmentation with membrane-associated densities indicated in blue and striations in yellow (Fig 1E, F and S1D–F).”
  
  We also clarified this in the methods:
  
  pg 12/13: “The tomograms were then preprocessed in EMAN2.2 for training of the TomoSeg CNN (Chen et al., 2017). Here, the features (filaments, D-bands, A-bands, gold fiducials, actin, membranes, membrane-associated densities and ice contaminations) were individually trained. Segmented maps were allowed to compete for the assignment of pixels in the tomograms, cleaned up in Amira (Thermo Fisher Scientific), and converted to object files. The object files and corresponding tomograms were displayed in ChimeraX (Pettersen et al., 2021). Assignment of direct and indirect striation-membrane connections was done manually by assessing whether TomoSeg-segmented striations and membranes were connected directly or via membrane-associated densities. The automated segmentation of amorphous striations picked up mostly dense amorphous features. The fainter densities that we observed to laterally connect the amorphous features were manually drawn by dotted lines.”
  
  Recommendation 3: p.3 "All the striations partially or fully spanned the width of the rootlet before protruding from its surface." This reviewer would read the last part of this sentence as "before protruding from the surface of the rootlet membrane toward inside". Is this correct?
  
  Response: This was not what we had intended to imply.
  
  We have changed this sentence in the text to avoid confusion: pg 3: “All the striations partially or fully spanned the width of the rootlet and extended beyond the outermost longitudinal filaments. These rootlet-protruding striation-densities frequently contacted the membrane (Fig 1E).”
  
  Recommendation 4: Same for p.4 "The protrusions from the rootlets were flexible". This means the protrusions from the membrane if this reviewer understands correctly.
  
  We also clarified this sentence in the text: pg 4: “The proteinaceous protrusions that extended from the rootlets were flexible and did not induce a regular spacing in the membrane-associated proteins they contacted (Fig 1F, S1D–F).”
  
  Recommendation 5: p.4 "Due to the thickness of the sample and the presence of membranes": How thick is the typical sample?
  
  Response: We typically collected data on samples thicker than 300nm. We initially tried making thinner samples, for better contrast, but observed this led to sample disruption. We changed “sample” to “ice” to clarify that we refer to the prepared sample and not the biological object.
  
  Changes in text:
  
  pg 4: “Due to the ice-thickness and the presence of membranes, the tomograms had limited contrast.”
  
  Recommendation 6: p.4 "We were also able to see these bands with cryo-ET." It would be nice if the comparison between tomograms of the native and purified rootlets was done. This reviewer could not get where the D1/D2/A bands are in Fig.1E.
  
  Response: Due to the noise in the native tomograms it is difficult to see the regular striation pattern in Fig 1E. However, we see it better when we project the native rootlet onto a single image. We added the projection image, the corresponding Fourier transform, and repeat measurements to the supplement (Fig S1B, C). We updated all figure references in the text.
  
  We updated the text accordingly:
  
  pg 4: “We were also able to see these bands with cryo-ET. The striations in the purified rootlets appeared more ordered and clearer than in the cellular tomograms due to the improved contrast. In the cellular rootlets, we identified the bands in a tomogram projection (Fig S1B), with an average distance of 79.52 ± 0.26 nm between each repeat (Fig S1C). The repeat distance for the purified rootlets is 80.1 ± 0.03 nm based on a sine fit to A and D-bands of 10 Fourier-filtered tomogram projections (Fig 2D, Fig S2E–I).”
  
  We updated the figure legend of Fig S1:
  
  pg 18: “(B) Projection image of a 53 nm thick slice through the tomogram and the corresponding Fast Fourier Transform (FFT). Measured frequencies are indicated with red lines. (C) Quantification of the distance measured between pairs of discrete striations. (D–F) …”
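
The repeat distances quoted above come from a tomogram projection and its Fourier transform. As a rough illustration only, the following Python sketch estimates a striation repeat from a one-dimensional intensity profile by locating the strongest peak in its discrete Fourier transform. It is not the authors' procedure (they report pairwise distance measurements and a sine fit); the function name `dominant_period`, the 1 nm pixel size and the synthetic 80 nm profile are assumptions made for the example.

```python
import cmath
import math


def dominant_period(profile, pixel_size_nm):
    """Return the period (in nm) of the strongest non-zero frequency in a 1D profile."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [value - mean for value in profile]  # remove the DC component

    best_k, best_power = None, -1.0
    # Scan the positive frequencies of a plain discrete Fourier transform.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * math.pi * k * i / n)
            for i, value in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    # Frequency bin k corresponds to k cycles over the whole profile.
    return n * pixel_size_nm / best_k


# Hypothetical test signal: an 80 nm repeat sampled at 1 nm per pixel.
profile = [math.sin(2 * math.pi * x / 80.0) for x in range(800)]
period = dominant_period(profile, pixel_size_nm=1.0)
print(f"estimated repeat: {period:.1f} nm")
```

The resolution of such an estimate is limited by the length of the profile, which is one reason a sine fit over several projections can give a tighter value than a single FFT peak.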
  
  Recommendation 7: Fig.2E-I: Could the authors explain how these bands were tracked? It is very difficult for this reviewer to trace, for example, the A-band in Fig.2g.
  
  Response: We trained the neural network of TomoSeg to pick up discrete and amorphous striations. The TomoSeg segmentation of the amorphous striations often only picked up dense features marked in green. However, we could see densities by eye in the tomograms that connect these dense features. These connecting densities were manually drawn with a dotted line.
  
  We clarified this in the methods:
  
  pg 13: “The automated segmentation of amorphous striations picked up mostly dense amorphous features. The fainter densities that we observed to laterally connect the amorphous features were manually drawn by dotted lines.”
  
  We also changed the figure legend of Fig2:
  
  pg 5: “(F,G,I) fainter features not picked up by the automated segmentation were drawn with dotted lines.”
  
  Recommendation 8: Fig.2: The caption of Fig.2I is missing.
  
  We have edited the legend of Fig 2 to include this caption: pg 5: “(I) Segmentation that shows amorphous features occur as two bands and connect to the rootlet surface densities.”
  
  Recommendation 9: p.6 "Additionally, the surface densities show evidence of connecting to the A-bands (Fig 2I and S3I)." Does the author mean Fig.2J and S3I?
  
  Response: This is most clearly visible in figure 2I and S3I (S3J after revisions), but it is also visible in 2J.
  
  We therefore edited this figure reference:
  
  pg 6: (Fig 2I, J and S3J)
  
  Recommendation 10: p.8 "The metazoan rootlet is a cilium-associated fiber that is characterized by regular cross-striations." In this reviewer's memory, Tetrahymena also has a rootlet. Are they different in structure?
  
  Response: Tetrahymena and other protists have striated rootlets (known as kinetodesmal fibres or System-I fibres) that are classified as being different from mammalian rootlets (Andersen et al., 1991). Tetrahymena rootlets have a 32 nm repeat (Munn, 1970), which is less than half of the 80 nm repeat observed for mammalian rootlets. While the protein composition of Tetrahymena rootlets is unknown, a 250 kDa protein was proposed to be their main component (Williams et al., 1979). Tetrahymena rootlet proteins were proposed to span a minimum of 4-5 striation repeats, based on early thin-sectioning EM (Munn, 1970), while we show that rootletin predictions span at most ~3.3 repeats in mammalian rootlets. Since the early proposal of Tetrahymena rootlet protein organisation, more components have been identified: DisAp (Galati et al., 2014) with a predicted length of ~37 nm (0.15 nm/residue), and proteins of 170 kDa that cross-react with the Naegleria gruberi major rootlet component (Dingle & Larson, 1981). Thus, the available data suggest that Tetrahymena rootlets are different in structure from mammalian ones.
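
The numbers in this response can be put side by side with a few lines of arithmetic. The Python sketch below only recombines the figures stated above (32 nm and 80 nm repeats, 4-5 and ~3.3 repeats spanned, ~37 nm at 0.15 nm/residue); the helper names are made up for the example and nothing here is a new measurement.

```python
NM_PER_RESIDUE = 0.15          # rise per residue quoted in the response
MAMMALIAN_REPEAT_NM = 80.0     # mammalian rootlet repeat
TETRAHYMENA_REPEAT_NM = 32.0   # Tetrahymena rootlet repeat (Munn, 1970)


def span_nm(repeats, repeat_nm):
    """Length covered by a protein that spans a given number of repeats."""
    return repeats * repeat_nm


def residues_for_length(length_nm, rise_nm=NM_PER_RESIDUE):
    """Number of residues implied by a length at a fixed rise per residue."""
    return length_nm / rise_nm


repeat_ratio = TETRAHYMENA_REPEAT_NM / MAMMALIAN_REPEAT_NM        # 0.4
rootletin_max_span = span_nm(3.3, MAMMALIAN_REPEAT_NM)            # about 264 nm
tetrahymena_min_span = span_nm(4, TETRAHYMENA_REPEAT_NM)          # 128 nm
tetrahymena_max_span = span_nm(5, TETRAHYMENA_REPEAT_NM)          # 160 nm
disap_residues = residues_for_length(37.0)                        # about 247

print(f"repeat ratio: {repeat_ratio:.2f}")
print(f"rootletin spans at most ~{rootletin_max_span:.0f} nm")
print(f"Tetrahymena proteins span {tetrahymena_min_span:.0f}-{tetrahymena_max_span:.0f} nm")
print(f"~37 nm corresponds to ~{disap_residues:.0f} residues")
```

Expressed in absolute length rather than in repeats, the two proposals differ in both the repeat size and the distance each protein would cover.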
  
  Reviewer #2 (Public reviews):
  
  Summary:
  
  This work performs structural analysis on isolated or purified rootlets.
  
  Strengths:
  
  To date, most studies of this cellular assembly have been from fluorescence microscopy, conventional TEM methods, or through biochemical analysis of constituents. It is clearly a challenging target for structural analysis due to its complexity and heterogeneity. The authors combine observations from cryo-electron tomograms, automated segmentations, subtomogram averaging, and previous data from the literature to present an overall model of how the rootlet is organised.
  
  Their model will serve as a jumping-off point for future studies, and as such it is something of considerable value and interest.
  
  Weaknesses:
  
  It is speculative but is presented as such, and is well-reasoned, plausible, and thorough.
  
  Reviewer #2 (Recommendations for the authors):
  
  Recommendation 1: My suggestions to improve the manuscript lie in some of the technical details:
  
  The subtomogram averaging methods are overly brief - I am not convinced that someone could replicate the process from the text in the methods (and results sections).
  
  We have now extended our description of the subtomogram averaging methods:
  
  pg 13: “For particle picking, the tomograms were deconvolved using the TOM package (Tegunov & Cramer, 2019). Dynamo was used for particle extraction using the Dynamo surface model (Castaño-Díez et al., 2012, 2017): each D2 band was traced in multiple slices per rootlet to define Dynamo surfaces. Surface triangulation was set to result in extraction coordinates approximately 4 times the number of expected filaments. The coordinates were extracted as a Dynamo table that was subsequently converted to the motl-format using subTOM scripts, available at https://github.com/DustinMorado/subTOM/ (Leneva et al., 2021). Particles were extracted from tomograms reconstructed using novaCTF (Turoňová et al., 2017).
  
  An initial reference was obtained by in-plane randomizing and averaging all particles prior to alignments. Initial alignments were performed to centre filaments by using a 10 nm wide cylindrical mask, limited to 4 nm shifts in X and Y with respect to the reference orientation. A spherical mask with a large diameter was used for alignment of the D-bands; these alignments were restricted to the reference Z direction. Cluster- and careful per-tomogram cross-correlation cleaning were applied to remove particle duplicates, particles with no filaments, and particles with disordered D-bands. This resulted in a cleaned particle dataset.
  
  Prior to classification in subTOM, alignments with limited X/Y/Z shifts and increasingly finer in-plane rotations were performed. 20 eigenvolumes were generated by K-means classification over 20 eigenvectors. The eigenvolumes and particles clustered per eigenvector were assessed to identify which vectors described the missing wedge or structural features (Leneva et al., 2021). The structural eigenvectors were used to cluster particles into the final class averages that described particle heterogeneity.
  
  For the final subtomogram class-average that contained the twist, the cleaned particle dataset motl was converted to a STAR file compatible with RELION 4.0 alpha (Zivanov et al., 2022). Gold beads were removed from the preprocessed tomogram frames by converting the aligned tomogram gold coordinates initially obtained by Etomo bead-finder during preprocessing steps (Kremer et al., 1996). Particles were then extracted in RELION 4.0 alpha. The initial reference was an in-plane randomized average of the cleaned particle dataset. Instead of refinement, which resulted in anisotropic structures due to a lack of features for the alignment, we used simultaneous alignment and classification. We restricted the alignments to full in-plane rotations with respect to the reference Z-axis.”
  
  Recommendation 2: I find it difficult to assess the quality of the final subtomogram averages as presented in the manuscript. One potential worry is the fact that the authors state that nothing is visible outside the mask, which can be a sign of overfitting (though, as the authors state, can just be a sign of heterogeneity). I would suggest that the authors include FSC curves, as well as 2D slices through the unmasked subtomogram averages - it is easier to judge the impact of the mask when viewing it this way and not at the isosurface.
  
  Response: We understand the reviewer’s concern for overfitting and masking. To clarify our approach, the class averages we show in Fig3G and FigS5C are the result of simultaneous classification with alignment and not a gold-standard refined average. The classification does not produce an FSC since it does not work with half sets. We initially tried a refinement approach, but the filaments did not have enough features to align and resulted in anisotropic structures. The FSC of such a refinement is shown below. However, because of the anisotropy, we did not include these structures or FSCs in the manuscript and we make no claims about the resolution.
  
  Author response image 1.
  
  Instead, we presented the data from simultaneous classification with alignment, which revealed the twist in the filament. Like the reviewer, we were initially concerned that the filament twist could be an artefact of the narrow masks and reference we used. However, we only used rotationally symmetric references and masks that do not contain any features. We therefore realized that this asymmetric twist feature could not have arisen from imposed alignment regimens, reference biases or overfitting.
  
  To make our approach clearer, we have updated the main text:
  
  pg 8: “To ensure unbiased alignment of any coiled-coil features we generated a smooth reference by randomizing the in-plane rotational orientation of the particles (Fig S5B). Initial refinement of the data resulted in an anisotropic structure since the filaments did not have enough features to align to. Therefore, we performed classification with alignment in RELION 4.0 alpha (Zivanov et al., 2022), and used a narrow 3.3 nm-wide mask with a smooth edge up to 7.7 nm (Fig S5B). This was the narrowest mask that still resulted in an isotropic structure and revealed features that were absent in the smooth reference. The resulting class averages contained a twist along the filament length in classes 2, 3 and 4 but most prominently in class 5 (Fig S5C). Class 5 contained a filament 2 nm thick by 5 nm wide with a groove along its length (Fig 3G).”
  
  We also clarified this in the methods:
  
  pg 13: “The initial reference was an in-plane randomized average of the cleaned particle dataset. Instead of refinement, which resulted in anisotropic structures due to a lack of features for the alignment, we used simultaneous alignment and classification. We restricted the alignments to full in-plane rotations with respect to the reference Z-axis.”
  
  Recommendation 3: The authors should include the version of Alphafold that they used to perform the structural predictions. Predictions, especially for multimers, have improved in the newest version, and it could be expected that further improvements will occur in the future. Including the version used here will act as a timestamp.
  
  We have now updated the methods to include the version:
  
  pg 14: “AlphaFold predictions of 300 AA long dimer fragments with 50 AA overlap were generated using ColabFold 4, which uses a modified version of AlphaFold2. To run the large number of sequences we used a customized script called alphascreen (version 1.15) available at https://github.com/samichaaban/alphascreen.”
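
To make the fragmentation scheme concrete, the sketch below tiles a sequence into 300-residue windows that overlap by 50 residues, as described in the quoted methods. It is an illustration under stated assumptions, not the alphascreen implementation: the function name `tile_fragments`, the 1000-residue example length and the handling of the final (possibly shorter) window are all hypothetical.

```python
def tile_fragments(seq_len, size=300, overlap=50):
    """Return (start, end) index pairs (0-based, end exclusive) covering a sequence.

    Consecutive windows share `overlap` residues. The last window is simply
    truncated at the end of the sequence, which is an assumption of this sketch.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    step = size - overlap
    fragments = []
    start = 0
    while True:
        end = min(start + size, seq_len)
        fragments.append((start, end))
        if end == seq_len:
            break
        start += step
    return fragments


# Hypothetical 1000-residue sequence.
fragments = tile_fragments(1000)
for start, end in fragments:
    print(f"residues {start + 1}-{end} ({end - start} AA)")
```

With these defaults each window starts 250 residues after the previous one, so every residue away from the termini falls into at least one window together with its neighbours on either side.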
  
  Recommendation 4: Figure 2G is not so clear in depicting two offset D bands. The authors could include a more zoomed-out image to make it clearer.
  
  Response: We have now included a more zoomed out image in the supplement (Fig S3A).
  
  We updated the figure legend of Fig 2G and Fig S3A: pg 5: “(G) Example where D1 aligns with D2 of a neighboring sub-fiber. Larger view in Fig S3A.”
  
  pg 20: “(A) Tomogram slice and segmentation where D1 aligns with D2 of a neighboring sub-fiber. The dotted square marks the location of Fig 2G. (B)”
  
  Recommendation 5: Did the authors attempt to predict the structure of rootletin oligomers? i.e. folding four rootletin fragments at once instead of two? This could be interesting.
  
  Response: We attempted to predict interactions between all combinations of rootletin fragments. We did this for two-fragment (e.g. CC1+CC1 or CC1+CC2) and four-fragment (e.g. CC1+CC1+CC1+CC1 or CC1+CC1+CC2+CC2) combinations.
  
  Homodimer combinations (e.g. CC1+CC1) were predicted with most confidence. We did not identify any higher oligomerization. AlphaFold did not identify interactions that were previously proposed in the literature, for example between two CC3 dimers (Ko et al., 2020) or weak interactions between CC2 and CC3 (Yang et al., 2002). These interactions were either not properly predicted or may require additional proteins other than the ones we tested (CCDC102B, CEP68, beta-catenin, ARL2, centlein).
  
  We have updated our methods to include our AlphaFold attempts:
  
  pg 14: “This setup was used to predict interactions for dimeric and oligomeric combinations of rootletin fragments (e.g. CC2+CC2, CC3+CC4, CC1+CC1+CC1+CC1, CC3+CC3+CC4+CC4 etc). Homodimeric and oligomeric combinations were tested with other proteins identified as putative rootletin-binding: CCDC102B, CEP68, beta-catenin, ARL2, centlein. In our hands, only homodimeric rootletin fragment combinations resulted in confident predictions.”
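
The kind of combination screen described here can be enumerated in a few lines. The sketch below is a guess at how such a list might be built, not the authors' script: it assumes four fragments named CC1 to CC4 (the names that appear in the examples above), treats the order of chains as irrelevant, and builds four-chain inputs as pairs of homodimers, matching examples such as CC1+CC1+CC2+CC2.

```python
from itertools import combinations_with_replacement

# Fragment names taken from the examples in the text; that there are exactly
# four of them is an assumption of this sketch.
FRAGMENTS = ["CC1", "CC2", "CC3", "CC4"]

# All unordered two-chain inputs, including homodimers such as CC1+CC1.
dimers = list(combinations_with_replacement(FRAGMENTS, 2))

# Four-chain inputs built as two homodimers, e.g. CC3+CC3+CC4+CC4.
tetramers = [
    (a, a, b, b) for a, b in combinations_with_replacement(FRAGMENTS, 2)
]

for combo in dimers + tetramers:
    print("+".join(combo))
```

Each printed line corresponds to one multimer prediction job; adding a putative binding partner would amount to appending its name to the relevant tuples.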
  
  Reviewer #3 (Public reviews):
  
  Summary:
  
  The study offers a compelling molecular model for the organization of rootlets, a critical organelle that links cilia to the basal body. Striations have been observed in rootlets, but their assembly, composition, and function remain unknown. While previous research has explored rootlet structure and organization, this study delivers an unprecedented level of resolution, valuable to the centrosome and cilia field. The authors isolated rootlets from mouse eyes. They applied EM to partially purified rootlets (first negative stain, then cryoET). From these micrographs, they observed striations contacting the membranes along the rootlet, but no regular spacing was observed.
  
  The thickness of the sample and membranes prevented good contrast in the tomograms. Thus they further purified the rootlets using detergent, which allowed them to obtain cryoET micrographs of the rootlets with greater details. The tomograms were segmented and further processed to improve the features of the rootlet structures. From their analysis, they described 3 regular cross-striations and amorphous densities, which are connected perpendicularly to filaments along the length of the rootlets. They propose that various proteins provide the striations and rootletin (mouse homolog of human cnap1) forms parallel coiled coils that run along the rootlet. Overall their data provide a detailed model for the molecular organization of the rootlet.
  
  The major strength is that this high-quality study uses state-of-the-art cryo-electron tomography, subtomogram averaging, and image analysis to provide a model of the molecular organization of rootlets. The micrographs are exceptional, with excellent contrast and details, which also implies the sample preparation was well optimized to provide excellent samples for cryo-ET. The manuscript is also clear and accessible.
  
  To further validate their model, it would have been useful to identify some components in the EM maps through complementary approaches (mass spectrometry, mutants disrupting certain features, CLEM). Some potential candidates are mentioned in the discussion.
  
  This research marks a significant step forward in our understanding of rootlets' molecular organization.
  
  Response: We agree with the reviewer that it would be ideal to identify rootlet components in the EM densities using complementary approaches. Prior to submitting the manuscript, we attempted several approaches, the details of which are described below:
  
  We performed mass spectrometry on our purified rootlets. This identified the rootlet components rootletin and CCDC102B and various axonemal components, due to the association between the rootlet and axoneme. However, due to the limitations in quantifying components using mass spectrometry, we were unable to confidently identify novel rootlet constituents present in quantities comparable to rootletin.
  
  We further attempted cross-linking mass spectrometry on the rootlets to gain deeper insights into the interactions between rootletin molecules. Unfortunately, this effort resulted in a completely insoluble sample despite extended digestion times, leading to issues with mass spectrometry column clogging and rendering our results inconclusive.
  
  We attempted to express rootlet components recombinantly and were able to purify fibres, but they did not contain the characteristic repeat pattern seen in native rootlets. We also considered purifying native rootlets from cultured cells, but we were unable to obtain sufficient sample for cryoET imaging.
  
  We therefore regret that other approaches to validate our model are outside the scope of this current work.
  
  Reviewer #3 (Recommendations for the authors):
  
  Recommendation 1: There are some problems with spaces in references in the methods.
  
  Response: We have thoroughly checked the methods and manuscript for double spaces and corrected this.
  
  Recommendation 2: Figure 1A, the figure would benefit from more labelling, to show the reader the basal body and nucleus.
  
  Response: We have now added the labels "basal bodies" and "Nucleus" to the cartoon in Fig 1A.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.03.556114v2
www.biorxiv.org www.biorxiv.org

New submission 13/10/2023, 09:31:58

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This study has uncovered some important initial findings about how certain extracellular vesicles (EVs) from the mother might impact the energy usage of an embryo. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The study's title might be a bit too assertive as the evidence linking maternal mtDNA transmission to changes in embryo energy use is still correlative.
  
  We would like to express our sincere gratitude to the editors and reviewers for their invaluable comments on this work. Their feedback has been instrumental in enhancing the quality of our manuscript; we have incorporated their suggestions to the best of our abilities.
  
  Reviewer #1 (Public Review):
  
  Q1. Bolumar et al. isolated and characterized EV subpopulations, apoptotic bodies (AB), Microvesicles (MV), and Exosomes (EXO), from endometrial fluid through the female menstrual cycle. By performing DNA sequencing, they found the MVs contain more specific DNA sequences than other EVs, and specifically, more mtDNA were encapsulated in MVs. They also found a reduction of mtDNA content in the human endometrium at the receptive and post-receptive period that is associated with an increase in mitophagy activity in the cells, and a higher mtDNA content in the secreted MVs was found at the same time. Last, they demonstrated that the endometrial Ishikawa cell-derived EVs could be taken by the mouse embryos and resulted in altered embryo metabolism.
  
  This is a very interesting study and is the first one demonstrating the direct transmission of maternal mtDNA to embryos through EVs.
  
  A1. Thank you for your kind comments.
  
  Reviewer #2 (Public Review):
  
  Q2. In Bolumar, Moncayo-Arlandi et al. the authors explore whether endometrium-derived extracellular vesicles contribute mtDNA to embryos and therefore influence embryo metabolism and respiration. The manuscript combines techniques for isolating different populations of extracellular vesicles, DNA sequencing, embryo culture, and respiration assays performed on human endometrial samples and mouse embryos.
  
  Vesicle isolation is technically difficult and therefore collection from human samples is commendable. Also, the influence of maternally derived mtDNA on the bioenergetics of embryos is unknown and therefore novel. However, several experiments presented in the manuscript fail to reach statistical significance, likely due to the small sample sizes. Additionally, the experiments do not demonstrate a direct effect of mtDNA transfer on embryo bioenergetics. This has the unfortunate consequence of making several of the authors' conclusions speculative.
  
  In my opinion the manuscript supports the following of the authors' claims:
  
  1) Different amounts of mtDNA are shed in human endometrial extracellular vesicles during different phases of the menstrual cycle
  
  2) Endometrial microvesicles are more enriched for mitochondrial DNA sequences compared to other types of microvesicles present in the human samples
  
  3) Fluorescently labelled DNA from extracellular vesicles derived from an endometrial adenocarcinoma cell line can be incorporated into hatched mouse embryos.
  
  4) Culture of mouse embryos with endometrial extracellular vesicles can influence embryo respiration and the effect is greater when cultured with isolated exosomes compared to other isolated microvesicles
  
  A2. Thank you for your detailed feedback. We have made every effort to enhance the manuscript in this revised version, ensuring that our conclusions are grounded in solid evidence and that they avoid any speculation.
  
  My main concerns with the manuscript:
  
  Q3. The authors demonstrate that microvesicles contain the most mtDNA, however, they also demonstrate that only isolated exosomes influence embryo respiration. These are two separate populations of extracellular vesicles.
  
  A3. This manuscript focuses on the DNA content secreted by the endometrium and captured by the embryo. We identified both mitochondrial DNA and genomic DNA. We have found that mitochondrial DNA is predominantly secreted and encapsulated within microvesicles, while all three types of vesicles encapsulate genomic DNA. Specifically, based on the results we presented in Response A8 to the reviewers and included in the latest version of the manuscript, we observed that exosomes contain the highest amount of genomic DNA. Furthermore, exosomes have the greatest impact on embryo bioenergetics, suggesting that this DNA content may primarily exert this effect. We have thoroughly revised the manuscript, focusing our message on DNA content.
  
  Q4. mtDNA is not specifically identified as being taken up by embryos, only DNA.
  
  A4. We agree with the reviewer; as we mention in answer A9, EdU does not specifically label mitochondrial DNA. To solve this issue, we incubated a synthetic molecule of labeled mtDNA with embryos and analyzed mtDNA incorporation using confocal microscopy. We co-cultured hatched mouse embryos (3.5 days) with an ATP8 sequence conjugated with biotin overnight at 37 °C and 5% CO2. We then permeabilized embryos, incubated them with Streptavidin-Cy3 for 45 min, and visualized the results using an SP8 confocal microscope (Leica). We observed mtDNA internalization by cells of the hatched embryos; please see new supplementary Figure 7 and lines 234-237 on page 9 and lines 583-592 M&M on page 21.
  
  Q5. The authors do not rule out that other components packaged in extracellular vesicles could be the factors influencing embryo metabolism.
  
  A5. The vesicular subtypes contain molecules beyond DNA, such as microRNAs, proteins, or lipids. Our laboratory has studied the transmission of vesicles and their relationship with their contents (particularly microRNAs) and their connection to maternal-fetal communication. In this study, we focused on genomic/mitochondrial DNA. We cannot exclude the possibility that other molecules may influence metabolism; this statement is already noted in the discussion section on lines 328-331 on page 12.
  
  Q6. Taken together, these concerns seem to contradict the implication of the title of the manuscript – the authors do not demonstrate that inheritance of maternal mtDNA has a direct causative effect on embryo metabolism.
  
  A6. We have modified the title to better align with the manuscript’s results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”
  
  Reviewer #1 (Recommendations for The Authors):
  
  Q7. Would it be possible to validate the mtDNA content and mitophagy activity in different periods using the Ishikawa cells?
  
  A7. Unfortunately, this validation cannot be achieved with in vitro cultures of cell lines, especially with a cell line such as the endometrial adenocarcinoma-derived Ishikawa cell line. While mimicking the menstrual cycle (as observed in Figure 3 of the manuscript) is entirely artificial, we believe that the statistically significant results obtained in human samples faithfully represent the biological processes involved. Using a cell line, in our opinion, would not provide us with novel information.
  
  Q8. Characterization of the EVs subpopulations from Ishikawa cells and direct evidence to show the EdU labeled DNA is contained in the EVs are necessary.
  
  A8. To address this concern, we designed a novel experiment. We cultured Ishikawa cells in the presence of EdU, isolated the three types of vesicles, and evaluated labeled DNA content by flow cytometry (as illustrated in Supplementary Figure 5). All three types of vesicles exhibited positive EdU-DNA labeling; notably, the exosomal fraction demonstrated substantially higher DNA content than the other vesicle populations. Please see new supplementary Figure 5 and lines 217-218 on page 9, and lines 576-582 of the M&M on pages 20-21.
  
  Q9. Would EdU incorporate into the genomic DNA or mitochondrial DNA?
  
  A9. EdU (5-ethynyl-2′-deoxyuridine) is a nucleoside analog of thymidine and becomes incorporated into DNA during active DNA synthesis. EdU labels all newly synthesized DNA, both genomic and mitochondrial; however, we cannot differentiate between them with this technique.
  
  Q10. It is difficult to assess whether the EV-derived DNA was taken by the TE or ICM without immunostaining of cell lineage markers in mouse embryos.
  
  A10. We did not aim to label the inner cell mass, as the vesicles primarily enter through trophectodermal cells. The images presented in Figure 4 and Supplementary Figure 5 depict trophectoderm cells.
  
  Q11. It is also valuable to perform co-staining with Mitotracker to show the co-localization of EdU-labelled DNA and mitochondria.
  
  A11. Per the reviewer's suggestion, we conducted the experiment described below. We isolated MVs from the culture media of EdU-treated Ishikawa cells and co-incubated them with embryos overnight. The resulting images (see Author response image 1) show an embryo in which EdU-tagged DNA is labeled with Alexa Fluor 488 (green), mitochondria with Mitotracker Deep Red (red), and nuclei in blue. Detailed views of the embryo are presented in panels A and B. Notably, we observed co-localization of mitochondria and EdU-tagged DNA, as indicated by the white arrows. Despite this intriguing finding, we chose not to include these results in the initial version of the manuscript; however, if the editor deems it appropriate, we would be delighted to incorporate them into the final version. The experimental procedure for co-localization of EdU-tagged DNA with mitochondria was as follows: Mitotracker Deep Red FM (Thermo Fisher Scientific, M22426) was added to the embryo media at a final concentration of 200 nM, and the embryos were subsequently incubated for 45-60 minutes prior to fixation.
  
  Author response image 1.
  
  Co-localization of mitochondria and EdU-tagged DNA in mouse embryos. Representative micrograph of an embryo co-incubated with MVs isolated from the culture media of Ishikawa cells treated with EdU. EdU-tagged DNA was labeled with Alexa Fluor 488 (green); mitochondria were stained with Mitotracker Deep Red (red), and nuclei are shown in blue. A and B) Magnified images of the embryo showing detailed co-localization of mitochondria and EdU-tagged DNA (white arrows). Negative control) Embryos incubated with MVs isolated from control Ishikawa cells (without EdU incubation) and stained with the click-it reaction cocktail. A and B show magnified images of the embryo. Note the absence of EdU-Alexa Fluor 488 signal (green).
  
  Reviewer #2 (Recommendations for The Authors):
  
  Q12. It would be helpful if the authors could provide citations and rationale for why they chose specific molecular markers to validate the different population of extracellular vesicles.
  
  A12. Different extracellular vesicle populations are defined by molecular marker signatures that reflect their origin. VDAC1 forms ionic channels in the mitochondrial membrane, has a role in triggering apoptosis, and has been described as characteristic of ABs.[1]
  
  The ER protein Calreticulin has also been used as an AB marker;[2] however, other studies have noted the presence of Calreticulin in MVs.[1] This apparent non-specificity may derive from apoptotic processes, during which the ER membrane fragments and forms vesicles smaller than ABs, which would contain Calreticulin and sediment at higher centrifugal forces.[3,4] In fact, proteomic studies have linked the presence of Calreticulin with vesicular fractions of a size range relevant for MVs[5] and ABs.[6]
  
  ARF6, a GTP-binding protein implicated in cargo sorting and promoting MV formation, has been proposed as an MV marker.[7,8]
  
  Classic markers of EXOs include molecules involved in biogenesis, such as tetraspanins (CD63, CD9, CD81), Alix, TSG101, and flotillin-1.[9,10] Nonetheless, recent studies have reported that such markers are widespread among various EV populations, although with different relative abundances (as is the case for CD9, CD63, HSC70, and flotillin-1[11]). Notably, certain molecular markers (such as TSG101[1,11]) have been confirmed as specific to EXOs.
  
  References
  
  1. D. K. Jeppesen, M. L. Hvam, B. Primdahl-Bengtson, A. T. Boysen, B. Whitehead, L. Dyrskjøt, T. F. Orntoft, K. A. Howard, M. S. Ostenfeld, J. Extracell. Vesicles. 2014, 3, 25011, doi: 10.3402/jev.v3.25011.
  
  2. J. van Deun, P. Mestdagh, R. Sormunen, V. Cocquyt, K. Vermaelen, J. Vandesompele, M. Bracke, O. De Wever, A. Hendrix, J. Extracell. Vesicles. 2014, 3, 24858, doi: 10.3402/jev.v3.24858.
  
  3. L. Abas, C. Luschnig, Anal. Biochem. 2010, 401, 217-227, doi: 10.1016/j.ab.2010.02.030.
  
  4. C. Lavoie, J. Lanoix, F. W. Kan, J. Paiement, J. Cell Sci. 1996, 109(6), 1415-1425.
  
  5. M. Tong, T. Kleffmann, S. Pradhan, C. L. Johansson, J. DeSousa, P. R. Stone, J. L. James, Q. Chen, L. W. Chamley, Hum. Reprod. 2016, 31(4), 687-699, doi: 10.1093/humrep/dew004.
  
  6. P. Pantham, C. A. Viall, Q. Chen, T. Kleffmann, C. G. Print, L. W. Chamley, Placenta. 2015, 36, 1463e1473, doi: 10.1016/j.placenta.2015.10.006.
  
  7. V. Muralidharan-Chari, J. Clancy, C. Plou, M. Romao, P. Chavrier, G. Raposo, C. D'Souza-Schorey, Curr. Biol. 2009, 19, 1875-1885.
  
  8. C. Tricarico, J. Clancy, C. D'Souza-Schorey, Small GTPases. 2016, 0(0), 1-13.
  
  9. M. Colombo, G. Raposo, C. Théry, Annu. Rev. Cell. Dev. Biol. 2014, 30, 255-289, doi: 10.1146/annurev-cellbio-101512-122326.
  
  10. S. Mathivanan, H. Ji, R. J. Simpson, J. Proteomics. 2010, 73(10), 1907-1920.
  
  11. J. Kowal, G. Arras, M. Colombo, M. Jouve, J. P. Morath, B. Primdal-Bengtson, F. Dingli, D. Loew, M. Tkach, C. Théry, Proc. Natl. Acad. Sci. U. S. A. 2016, 113(8), E968-77.
  
  Q13. The PCA analysis in supplementary figure 4 A&B needs more explanation for why they think separation of the two conditions based on principal component 1 is sufficient. The small number of replicates makes me concerned because principal component 2 does not show similarity of replicates for the DNase treated samples. Also, 4C has no description in the figure legend.
  
  A13. The PCA results show a clear separation between the two conditions; we believe this separation is primarily driven by the differences observed in principal component 1 (PC1). We would like to address the concerns raised by the reviewer with the following points:
  
  Interpretation of PCs: In PCA, the principal components represent orthogonal axes capturing the highest variance in the data. PC1 accounts for 56% and 57% of the variance in the two conditions, respectively. The significant variance explained by PC1 suggests that it effectively captures the major sources of variation between the samples.
  
  Sample Replicates and Variability: The concern regarding the small number of replicates is acknowledged, and we understand its impact on the analysis. Despite the limited number of replicates, the consistent pattern of separation in PC1 between the two conditions provides confidence in the observed separation. We also agree that PC2 does not show an apparent similarity among the DNase-treated samples; however, this does not diminish the significance of PC1, which robustly separates the two conditions.
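
As a rough illustration of the PC1 argument above, the following sketch computes the fraction of total variance captured by the first principal component and checks whether PC1 scores separate two groups of replicates. The sample values are invented toy numbers, not data from the study, and the function names (`pc1_explained_ratio`, `pc1_scores`) are hypothetical. Power iteration is used here only because it needs no external libraries; it is not necessarily how the original analysis was computed.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (features x features)."""
    x = center(rows)
    n, d = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]


def pc1_explained_ratio(rows, iters=500):
    """Return (share of variance on PC1, PC1 axis) via power iteration."""
    cov = covariance(rows)
    d = len(cov)
    rng = random.Random(0)  # fixed seed so the result is reproducible
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    norm_v = math.sqrt(sum(c * c for c in v))
    v = [c / norm_v for c in v]
    # Rayleigh quotient gives the eigenvalue that belongs to v.
    top = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total = sum(cov[i][i] for i in range(d))  # trace = total variance
    return top / total, v


def pc1_scores(rows, axis):
    """Project centered samples onto the PC1 axis."""
    return [sum(a * b for a, b in zip(r, axis)) for r in center(rows)]


if __name__ == "__main__":
    # Hypothetical toy data: 3 replicates per condition, 3 features each.
    untreated = [[10.0, 2.0, 1.0], [11.0, 3.0, 1.0], [10.5, 2.5, 2.0]]
    treated = [[2.0, 2.5, 1.0], [1.0, 3.0, 2.0], [1.5, 2.0, 1.5]]
    ratio, axis = pc1_explained_ratio(untreated + treated)
    print("PC1 share of variance:", round(ratio, 3))
    print("PC1 scores:", [round(s, 2) for s in pc1_scores(untreated + treated, axis)])
```

If the two conditions fall on opposite sides of zero along PC1 while PC1 carries most of the variance, the separation argument holds even when replicates scatter along PC2. With very few replicates, however, such a sketch cannot by itself establish statistical robustness.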
  
  We have added the legend for Supplementary Figure 4C: “C) Principal component analysis shows EV sample grouping due to specificity in coding-gene sequences.”
  
  Q14. I am confused by the phrasing in the last two sentences of the top paragraph on page 7. Why would apoptotic bodies all have similar content if they encapsulate a greater amount of material making their contents less specific? Please clarify.
  
  A14. This sentence was intended to convey that apoptotic bodies (ABs) form from apoptotic cells, are larger in size, and carry less specific content. This non-specificity arises because, unlike the other two types of vesicles, ABs do not encapsulate molecules selectively. For more detailed information on ABs in human reproduction, we published an extensive review in 2018 (see below).
  
  Simon C, Greening DW, Bolumar D, Balaguer N, Salamonsen LA, Vilella F. Extracellular Vesicles in Human Reproduction in Health and Disease. Endocr. Rev. 2018 Jun 1;39(3):292-332. doi: 10.1210/er.2017-00229. PMID: 29390102.
  
  Q15. The first and last sentences of the last paragraph of page 8 seem to contradict each other. Please clarify.
  
  A15. We observe an enrichment in the amount of mitochondrial DNA in samples during the receptive and post-receptive phases. While the data may not show statistical significance, we observed a trend towards greater enrichment in receptivity compared to pre-receptivity. The lack of significant differences could be attributed to inherent variability among patients. We have also altered the text on page 8 to avoid confusion.
  
  Q16. Quantification of the rates of DNA incorporation into embryos would strengthen Figure 4 and Supplementary Figure 5.
  
  A16. We acknowledge the reviewer's feedback, and in response, we conducted an assay to quantify the total DNA incorporated into the embryos. To this end, we isolated EVs from control Ishikawa cell culture media and from EdU-treated Ishikawa cell culture media. Subsequently, we co-incubated both types of EVs with ten embryos overnight in G2 plus media at 37 °C and 5% CO2.
  
  After co-incubation, we collected embryos and the culture media containing co-incubated EVs. We then isolated total DNA using the QIAamp® DNA Mini kit (Qiagen; 51304). To label the EdU-DNA particles, we performed a click-it reaction using the Click-iT™ EdU Alexa Fluor™ 488 flow cytometry assay Kit (Thermo Fisher Scientific, ref: C10420) per the manufacturer's instructions. Subsequently, we cleaned and purified DNA using AMPure XP beads (Beckman Coulter, A63882) and eluted DNA in 150 μL of 0.1 M Tris-EDTA. Finally, we measured the fluorescence of each sample using a Victor3 plate reader (PerkinElmer). To ensure accuracy, we subtracted from each sample the background signal from non-labeled DNA-derived EVs and from embryos incubated without EVs. Despite conducting the experiment twice, we did not obtain clear results, possibly because of the limited resolution of the technique.
  
  Q17. If mtDNA is most enriched in MVs but only embryos cultured with Exos demonstrated differences in respiration the authors need to comment on this discrepancy.
  
  A17. We ask the reviewer to refer to Answer A3; we have thoroughly revised the manuscript, focusing our message on DNA content.
  
  Q18. The authors should change the definitive language in the title of the manuscript because all evidence presented is correlative.
  
  A18. We have modified the title to better align with the manuscript's results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”
  
  Q19. I realize this is beyond what the authors intend for the scope of this paper, however, on page 6 the authors describe membranous structures within the ABs but say they couldn't study their presence with organelle-specific markers. Why? Presence of organelles in these vesicles is very interesting!
  
  A19. As the reviewer rightly points out, we did not study ABs in this manuscript. Analysis of the electron microscopy images suggests the presence of fragments of organelles, most likely originating from apoptotic processes; however, we did not use any specific markers to confirm our assertion. We have modified the text to avoid any confusion. Please see Page 6, Lines 120-121, for further details.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.21.537765v3
www.biorxiv.org www.biorxiv.org

Life-cycle-related gene expression patterns in the brown algae

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors have examined gene expression between life cycle stages in a range of brown macroalgae to examine whether there are conserved aspects of biological features.
  
  Strengths:
  
  The manuscript incorporates large gene expression datasets from 10 different species and therefore enables a comprehensive assessment of the degree of conservation of different aspects of gene expression and underlying biology.
  
  The findings represent an important step forward in our understanding of the core aspects of cell biology that differ between life cycle phases and provide a substantial resource for further detailed studies in this area. Convincing evidence is provided for the conservation of lifecycle-specific gene expression between species, particularly in core housekeeping gene modules.
  
  Weaknesses:
  
  I found a few weaknesses in the methodology and experimental design. I think the manuscript could have been clearer when linking the findings to the biology of the brown algae.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript by Ratchinski et al presents a comprehensive analysis of developmental and life history gene expression patterns in brown algal species. The manuscript shows that the degree of generation bias or generation-specific gene expression correlates with the degree of dimorphism. It also reports conservation of life cycle features within generations and marked changes in gene expression patterns in Ectocarpus in the transition between gamete and early sporophyte. The manuscript also reports considerable conservation of gene expression modules between two representative species, particularly in genes associated with conserved functional characteristics.
  
  Strengths:
  
  The manuscript represents a considerable "tour de force" dataset and analytical effort. While the data presented is largely descriptive, it is likely to provide a very useful resource for studies of brown algal development and for comparative studies with other developmental and life cycle systems.
  
  Weaknesses:
  
  Notwithstanding the well-known issues associated with inferring function from transcriptomics-only studies, no major weaknesses were identified by this reviewer.
  
  Reviewing Editor Comments:
  
  The overall assessment of the reviewers does not contain major aspects of concern. We nevertheless recommend that the authors carefully consider the constructive comments, as this will further improve their manuscript.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Line 32: The abstract states 'considerable conservation of co-expressed gene modules', but the degree of conservation between Ectocarpus and D. dichotoma appeared limited to specific subsets of genes with highly conserved housekeeping functions, e.g., translation. I think the wording of the abstract should be rephrased to better reflect this.
  
  We agree that genes with housekeeping functions figure strongly in the gene modules that showed strong conservation between Ectocarpus species 7 and D. dichotoma (and we highlight this point in the manuscript), but we do not believe that this invalidates the conservation. In the analysis shown in Figure 6A, for example, high scores were obtained for both connectivity and density for about a third of the gene modules, and these modules cover a broad range of cellular functions. This is a significant result given the large phylogenetic distance, and we feel that "considerable conservation" is appropriate as a description of the level of correlation.
  
  (2) Introduction - The Introduction needs a better explanation of the biology of the life cycle phases. Some of this information is present in the 1st paragraph of Materials and Methods, although it would be preferable to include this information within the main text, ideally within the Introduction before the Results are described. For example, when are flagella present? The presence of flagella could be indicated in Figure 3. The ecology of the life cycle is also not described. Are life cycles present in the same ecological niche? Do they co-exist or occupy distinct environments? It would be useful to understand how the observed genotypes could relate to this wider aspect of the brown algal biology.
  
  We have added a sentence to explain that zoids (gametes and spores) are the only flagellated stages of the life cycle (line 678). In addition, in the legend for Figure 3, we have indicated which of the life cycle stages analysed in panel 3A consisted entirely or partially of flagellated cells. We have also added information about phenology to the Introduction.
  
  (3) Line 127. 'The proportion of generation specific genes was positively correlated with the level of dimorphism'. The level of dimorphism between species was not clear to me. This needs to be clearly displayed in Figure 1B.
  
  We had attempted to illustrate the level of dimorphism, using the size of each generation as a measurable proxy, in Figure S1 but we agree that the information was not very clearly presented. To improve clarity, we now provide independent size scales for each generation of the life cycle in this figure and state in the legend that "Size bars indicate the approximate sizes of each generation of each life cycle, providing an indication of the degree of dimorphism between the two generations.". In the text, Figure S1 is cited earlier in the paragraph but we now repeat the citation of the figure at the end of the sentence "The proportion of generation-specific genes (...) was positively correlated with the level of dimorphism" so that the reader can specifically consult the supplementary figure for this phenotypic parameter.
  
  (4) Line 267. Are there known differences in cell wall composition between life cycle phases or within each generation as individual life cycle phases mature (e.g., differences between unicellular and multicellular stages)?
  
  Detailed comparative analyses of cell wall composition at different stages of the life cycle have not been carried out for brown algae. However, Congo red stains Ectocarpus gametophytes but not sporophytes (Coelho et al., 2011), indicating a difference in cell wall composition between the two generations. Zoids (spores and gametes) do not have a cell wall and calcofluor white staining of meio-spores has indicated that a cell wall only starts to be deposited 24-48 hours post-release (Arun et al., 2013).
  
  (5) Line 388. The authors should comment on the accuracy of OrthoFinder for different gene types across this degree of divergence (250 MYA). The best conservation was found in genes with housekeeping characteristics (line 401). It may be that these gene modules show the highest degree of conservation in expression patterns, but I also wonder whether this pattern may also emerge because finding true orthologues is easier for highly conserved gene families.
  
  We do not believe that this is the case because, as mentioned above, the "housekeeping" modules cover quite a broad range of cellular functions. Note also that the modules were given functional labels based on their being clearly enriched in genes corresponding to a particular class of function but not all the genes in a module have a predicted function that corresponds to the functional classification.
  
  However, we have carried out an analysis to look for evidence of the bias proposed by the reviewer. For this, we used BLASTp identity scores as an approximate proxy for pairwise identity between Ectocarpus species 7 and D. dichotoma one-to-one orthologues in each module and plotted the mean identity score for each module against the p-value from the Fisher's exact test applied to the contingency table in Figure 6C (Author response image 1).
  
  Author response image 1.
  
  Plot of the estimated mean percent shared identity between the orthologues within each module (based on mean BLASTp identity scores) against the log10(p-value) obtained with the Fisher's exact test applied in Figure 6C to determine whether pairs of modules shared a greater number of one-to-one orthologues than expected from a random distribution. Error bars indicate the standard deviation.
  
  This analysis did not detect any correlation between the degree of sequence conservation of orthologues in a module and the degree of conservation of the module between Ectocarpus species 7 and D. dichotoma.
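
For readers unfamiliar with the test mentioned above, the following is a minimal sketch of a one-sided Fisher's exact test on a 2x2 contingency table, which asks whether two modules share more one-to-one orthologues than expected by chance. The layout of the table and all counts are assumptions made for illustration; the response does not specify how the tables in Figure 6C were built, and the function name `fisher_exact_greater` is hypothetical.

```python
from math import comb, log10


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout (illustrative only):
      a = orthologue pairs found in both module X and module Y
      b = pairs in module X but not in module Y
      c = pairs in module Y but not in module X
      d = pairs in neither module
    Returns P(overlap >= a) under the hypergeometric null.
    """
    row1 = a + b          # pairs in module X
    col1 = a + c          # pairs in module Y
    n = a + b + c + d     # all one-to-one orthologue pairs
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the manuscript.
    p = fisher_exact_greater(30, 70, 50, 4850)
    print("p-value:", p)
    if p > 0:
        print("log10(p-value):", log10(p))
```

Each module pair yields one such p-value, so the plot described in the caption would pair the log10 of that value with the mean BLASTp identity of the orthologues in the module.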
  
  Minor comments
  
  (1) Line 650 loose should be lose.
  
  The error has been corrected.
  
  (2) Line 695 filtered through a 1 μm filter to remove multicellular gametophyte fractions. Is this correct? It seems too small to allow gametes to pass through.
  
  Yes, the text is correct, a 1 μm filter was used. The gametes do pass through this filter, presumably because they do not have a rigid cell wall, allowing them to squeeze through the filter when a light pressure is applied.
  
  (3) Line 709 - DDT should be DTT
  
  The error has been corrected.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) It is not clear why the chosen species for analysis do not include fucoid algae, which display a high degree of dimorphism between generations and which are relatively well studied with respect to gene expression patterns during early development. Indeed, it was recently shown that gene expression patterns in developing embryos of Fucus spp. obey the "hourglass" pattern whereby gene expression shows a minimum of the transcriptome age index (i.e., higher expression of evolutionarily older genes) associated with differentiation at the phylotypic stage. I am somewhat surprised that the manuscript does not consider this feature in the analysis or discussion.
  
  Brown algae of the order Fucales have diploid life cycles and therefore do not alternate between a sporophyte and gametophyte generation. It is for this reason that we thought that it was more interesting to compare Ectocarpus species 7 with D. dichotoma, which has a haploid-diploid life cycle.
  
  (2) In Discussion, the comparison of maternal to zygote transition in animals and land plants, which show a high degree of dimorphism, with Ectocarpus would be strengthened by data/discussion from other brown algae that show a high degree of dimorphism.
  
  Animals have diploid life cycles and dimorphism in that lineage generally refers to sexual rather than generational dimorphism. Land plants do have highly dimorphic haploid-diploid life cycles but it is unclear how this characteristic relates to events that occur during the maternal to zygote transition. In Ectocarpus, the transition from gamete to the first stages of sporophyte development involved more marked changes in gene expression than we observed when comparing the mature sporophyte and gametophyte generations (Figure 3C). At present, there is no evidence that events during these two transitions are correlated. The relationship between changes in gene expression during very early sporophyte development and during alternation of life cycle generations could be investigated further using a highly dimorphic kelp model system such as Saccharina latissima but we are not aware of any studies that have specifically addressed this point.
  
  (3) Since marked changes were observed during the transition from gamete to early sporophyte in Ectocarpus, it would be interesting to know how gene expression patterns change during the transition from gamete to partheno-sporophyte. Would the same patterns of downregulation and upregulation be expected?
  
  The sporophyte individuals derived from gamete parthenogenesis (parthenosporophytes) are indistinguishable morphologically and functionally from diploid sporophytes derived from gamete fusions (see line 76). They also express generation marker genes in a comparable manner (Peters et al., 2008). Based on these observations, we have treated partheno-sporophytes and diploid sporophytes as equivalent in our experiments. For clarity, we have now distinguished partheno-sporophyte from diploid sporophyte samples in Table S1.
  
  (4) The authors show a correlation between the degree of dimorphism and generation-biased or generation-specific expression. How was the degree of dimorphism quantified?
  
  The degree of dimorphism is illustrated in Figure S1 using the relative size of the two generations as a proxy. Size estimations are approximate because the size of an individual of a particular species is quite variable, but the ten species nonetheless represent a very clear gradient of dimorphism due to the extreme differences in size between generations of species at each end of the scale, with the sporophyte generation being several orders of magnitude larger than the gametophyte generation or vice versa.
  
  References
  
  Arun A, Peters NT, Scornet D, Peters AF, Cock JM, Coelho SM. 2013. Non-cell autonomous regulation of life cycle transitions in the model brown alga Ectocarpus. New Phytol 197:503–510. doi:10.1111/nph.12007
  
  Coelho SM, Godfroy O, Arun A, Le Corguillé G, Peters AF, Cock JM. 2011. OUROBOROS is a master regulator of the gametophyte to sporophyte life cycle transition in the brown alga Ectocarpus. Proc Natl Acad Sci USA 108:11518–11523. doi:10.1073/pnas.1102274108
  
  Peters AF, Scornet D, Ratin M, Charrier B, Monnier A, Merrien Y, Corre E, Coelho SM, Cock JM. 2008. Life-cycle-generation-specific developmental processes are modified in the immediate upright mutant of the brown alga Ectocarpus siliculosus. Development 135:1503–1512. doi:10.1242/dev.016303
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.25.649966v3
www.biorxiv.org www.biorxiv.org

Deletion of the moeA gene in Flavobacterium IR1 drives structural color shift from green to blue and alters polysaccharide metabolism

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Structural colors (SC) are based on nanostructures reflecting and scattering light and producing optical wave interference. All kinds of living organisms exhibit SC. However, understanding the molecular mechanisms and genes involved may be complicated due to the complexity of these organisms. Hence, bacteria that exhibit SC in colonies, such as Flavobacterium IR1, can be good models.
  
  Based on previous genomic mining and co-occurrence with SC in flavobacterial strains, this article focuses on the role of a specific gene, moeA, in SC of Flavobacterium IR1 strain colonies on an agar plate. moeA is involved in the synthesis of the molybdenum cofactor, which is necessary for the activity of key metabolic enzymes in diverse pathways.
  
  The authors clearly showed that the absence of moeA shifts SC properties in a way that depends on the nutritional conditions. They further bring evidence that this effect was related to several properties of the colony, all impacted by the moeA mutant: cell-cell organization, cell motility and colony spreading, and metabolism of complex carbohydrates. Hence, by linking SC to a single gene in appearance, this work points to cellular organization (as a result of cell-cell arrangement and motility) and metabolism of polysaccharides as key factors for SC in a gliding bacterium. This may prove useful for designing molecular strategies to control SC in bacterial-based biomaterials.
  
  Strengths:
  
  The topic is very interesting from a fundamental viewpoint and has great potential in the field of biomaterials.
  
  Thank you for this.
  
  The article is easy to read. It builds on previous studies with already established tools to characterize SC at the level of the flavobacterial colony. Experiments are well described and well executed. In addition, the SIBR-Cas method for chromosome engineering in Flavobacteria is the most recent and is a leap forward for future studies in this model, even beyond SC.
  
  We appreciate these comments.
  
  Weaknesses:
  
  The paper appears a bit too descriptive and could be better organized. Some of the results, in particular the proteomic comparison, are not well exploited (not explored experimentally). In my opinion, the problem originates from the difficulty in explaining the link between the absence of moeA and the alterations observed at the level of colony spreading and polysaccharide utilization, and the variation in proteomic content.
  
  We have looked at the organisation of the manuscript carefully in this revision, as suggested. In terms of the proteomics, there are a large number of proteins affected by the moeA deletion and not all could be followed up. We chose spreading, structural colour formation and starch degradation to follow up phenotypically, as the most likely to be relevant. For example, in L615-617 we discuss the downregulation of GldL (which is known to be involved in flavobacterial gliding motility [Shrivastava et al., 2013]) in the moeA KO as a possible explanation for the reduced colony spreading of this mutant. Changes in polysaccharide (starch) utilization were seen on solid medium, as well as in the proteomic profile where we observed the upregulation of carbohydrate metabolism proteins linked to PUL (polysaccharide utilisation locus) operons (Terrapon et al., 2015), such as PAM95095-90 (Figure 8), and other carbohydrate metabolism-related proteins, including a pectate lyase (Table S7) which is involved in starch degradation (Aspeborg et al., 2012). As noted in L555-566 and Figure 9, alterations in starch metabolism were investigated experimentally.
  
  First, the effect of moeA deletion on molybdenum cofactor synthesis should be addressed.
  
  MoeA is the last enzyme in the MoCo synthesis pathway; thus, if only MoeA is absent, the cell would accumulate MPT-AMP (molybdopterin-adenosine monophosphate) (Iobbi-Nivol & Leimkühler, 2013), and the expressed molybdoenzymes would not be functional. In L582-585, we commented on how the lack of molybdenum cofactor may affect the synthesis of molybdoenzymes. If the reviewer meant an analysis of the small molecules themselves, i.e. the cofactors involved in these pathways, that was an assay we were not able to perform. However, in L585-587, we addressed how the deletion of moeA affected the proteins encoded by the rest of the genes in the operon, which is relevant to the question.
  
  Second, as I was reading the entire manuscript, I kept asking myself if moeA (and by extension molybdenum cofactor) was really involved in SC or if it was an indirect effect. For example, what if the absence of moeA alters the cell envelope because the synthesis of its building blocks is perturbed, and this subsequently perturbs all related processes, including gliding motility and protein secretion? It would help to know if the effects on colony spreading and polysaccharide metabolism can be uncoupled. I don't think the authors discussed that clearly.
  
  The message of the paper is that the moeA gene, as predicted from a previous genomics analysis, is important in SC. This is based on the representation of the moeA gene in genomes of bacteria that display SC. This analysis does not predict the mechanism. When the gene was knocked out, a significant change in structural colour occurred, supporting this hypothesis. Whether this effect is direct or indirect is difficult to assess, as this referee rightly suggests. In order to follow up this central result, we performed proteomics (both intra- and extracellular). As we observed, the deletion of a single gene generated many changes in the proteomic profile, and therefore in biological processes. Based on the known functions of molybdenum cofactor, we could only hypothesize that pterin metabolism is important for SC, not exactly how.
  
  We have discussed the links between gliding/spreading and polysaccharide metabolism more clearly, with reference to the literature, as quite a bit is known here including possible links to SC.
  
  “Polysaccharide metabolism in IR1 has been linked to changes in colony color and motility through the study of fucoidan metabolism (van de Kerkhof et al., 2022). Polysaccharide degradation and gliding motility are coupled to the same mechanism: the phylum-specific type IX secretion system, used for the secretion of enzymes and proteins involved in both functions (McKee et al., 2021).” [L622-626]
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors constructed an in-frame deletion of moeA gene, which is involved in molybdopterin cofactor (MoCo) biosynthesis, and investigated its role in structural colors in Flavobacterium IR1. The deletion of moeA shifted colony color from green to blue, reduced colony spreading, and increased starch degradation, which was attributed to the upregulation of various proteins in polysaccharide utilization loci. This study lays the ground for developing new colorants by modifying genes involved in structural colors.
  
  Major strengths and weaknesses:
  
  The authors conducted well-designed experiments with appropriate controls and the results in the paper are presented in a logical manner, which supports their conclusions.
  
  We appreciate these comments.
  
  Using statistical tests to compare the differences between the wild type and moeA mutant, and adding a significance bar in Figure 4B, would strengthen their claims regarding differences in cell motility.
  
  Thank you. Figure 4B contains the significance bars that represent the standard deviation of the mean value of the three replicates, but we have modified it to make them clearer.
  
  Additionally, in the result section (Figure 6), the authors suggest that the shift in blue color is "caused by cells which are still highly ordered but narrower", which to my knowledge is not backed up by any experimental evidence.
  
  Thanks. We mentioned that the mutant cells are narrower than the wild type based on the estimated periodicity resulting from the goniometry analysis (L427-430). We will now say “likely to be narrower based on the estimated periodicity from the optical analysis” rather than just “narrower”.
  
  “This optical analysis aligns with visual observations, confirming the blue shift in ΔmoeA, and suggests that this change in SC is caused by cells which are likely to be narrower based on the estimated periodicity from the optical analysis.” [L409-411]
  
  Overall, this is a well-written paper in which the authors effectively address their research questions through proper experimentation. This work will help us understand the genetic basis of structural colors in Flavobacterium and open new avenues to study the roles of additional genes and proteins in structural colors.
  
  Much appreciated.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  As you will see, the reviewers were rather positive about the paper but suggested a number of points to improve it, including a discussion of the direct role of moeA as well as specific editorial comments.
  
  Reviewer #1 (Recommendations for the authors):
  
  More specific comments to the authors:
  
  (1) Line 300, Paragraph on bioinformatic analysis of molybdopterin operon: As written, it is not clear whether this operon is crucial for pterin cofactor synthesis or only some genes are involved. And what is the contribution of moeA?
  
  Based on the bioinformatic analysis done in Zomer et al., 2024, we know the scores of the genes of the molybdopterin cofactor synthesis operon that may be more relevant to the display of SC, in addition to moeA. We chose moeA to KO as it had the highest score, being careful to delete the coding sequence and not any upstream promoter. The other genes in the predicted operon are moaE, moaC2, and moaA. Then in the proteomic analysis (L435-442), we analysed how the encoded proteins from this operon were upregulated (MoaA, MoaC2, and MobA), indicating also the unaltered proteins (MoeZ and MoaE) and the undetected proteins (MoaD and SumT). Nevertheless, the operon is crucial for pterin cofactor synthesis because it contains all the genes involved in the pathway, and moeA encodes the enzyme for the last reaction of the pathway; the molecule produced in the mutated pathway is therefore the adenylated molybdopterin (MPT-AMP) instead of molybdenum cofactor (MoCo).
  
  (2) Paragraph line 342 on moeA mutant phenotyping :
  
  Is the reduction in colony spreading caused by a defect in single-cell gliding motility or is the cause more complex? This can be quantified.
  
  We believe the cause is more complex. As mentioned above, for example, in (L615-617) we discuss the downregulation of GldL (which is known to be involved in Flavobacterial gliding motility [Shrivastava et al., 2013]) in the moeA KO as a possible explanation for the reduced colony spreading of this mutant. This cannot be explained simply by spreading, but must (from the optical analysis) indicate changes in cell organisation/dimensions.
  
  (3) During the description of the moeA mutant phenotype (associated with Figures 2 and 4) and throughout the article, the optical properties are « functions » of colony spreading and moeA-dependent metabolism. However it is not quite clear if these two effects are independent or if one may be a consequence of the other.
  
  As noted above, colony spreading alone does not explain the blue-shift in SC observed. Given the function of MoeA (molybdate insertion into MPT-AMP [adenylated molybdopterin], MoMPT [molybdenum-molybdopterin] formation) for the synthesis of MoCo (molybdenum cofactor), the primary effect seems to be on metabolism but as we are dealing with an influential enzymatic cofactor a number of secondary effects are likely, and indeed the proteomics supports this. It is likely that the effect on spreading is secondary as seen with the downregulation of GldL (see above), but we cannot be sure.
  
  (4) Paragraph starting line 381 and Figure 5 on gliding motility:
  
  Gliding motility has to be tested at the level of single cells, allowing a more thorough characterization of the spreading defects. In addition, since gliding is entangled with Type IX-dependent secretion in Flavobacteria, the authors should test if Type IX-dependent secretion was perturbed in the absence of moeA.
  
  Based on the intracellular and extracellular proteomic analyses, the T9SS proteins regulated in the absence of moeA are GldL and SprT (downregulated) and PorU (upregulated). The table shows the log2 FC (moeA/WT) of each of these extracellular proteins:
  
  Author response table 1.
  
  <-1: downregulated in moeA KO, -1<X<1: no significant regulation, >1: upregulated in moeA KO, -: not detected
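  
  The thresholds in this legend amount to a simple rule on the log2 fold change. A minimal sketch, assuming a hypothetical helper `classify_log2_fc` (not part of the authors' analysis) and treating values of exactly -1 or 1, which the legend does not cover, as unregulated:
  
```python
from typing import Optional


def classify_log2_fc(log2_fc: Optional[float]) -> str:
    """Map a log2 FC (moeA KO / WT) to the categories of the table legend.

    None stands for a protein that was not detected ("-" in the legend).
    """
    if log2_fc is None:
        return "not detected"
    if log2_fc < -1:
        return "downregulated in moeA KO"
    if log2_fc > 1:
        return "upregulated in moeA KO"
    # The legend leaves exactly -1 and 1 undefined; grouping them here
    # with the unregulated proteins is an assumption of this sketch.
    return "no significant regulation"
```
  
  For instance, a hypothetical value of -1.5 would fall in the downregulated class, while 0.3 would count as no significant regulation.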
  
  (5) L401: In my opinion, the section "Quantification of the optical responses of IR1 WT and ΔmoeA colonies" should be moved up, before the characterization of motility.
  
  We have done this, as suggested. The section was moved from L401-423 to L388-411.
  
  (6) L475: Proteome comparison: « Of the total known proteins in IR1, 27.5% (1,504 proteins) extracellular proteins were identified » Are some of these proteins also found in the cell fraction? Wouldn't it be more accurate to write that « 1504 proteins were found in the extracellular fraction"?
  
  We have done this, as suggested.
  
  “Of the total known proteins in IR1, 27.5% (1,504 proteins) were detected in the extracellular fraction, 60.4% (909) were statistically significant (p<0.01), with 20.5% (186) considered downregulated, and 20% (182) upregulated in ΔmoeA (Figure 7B).” [L484-486]
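  
  The percentages in this sentence use different denominators, which is easy to miss on a first read. The short check below is a sketch based only on the counts quoted above; the reading that 60.4% is relative to the 1,504 detected proteins and that 20.5% and 20% are relative to the 909 significant ones is inferred from the arithmetic, not stated by the authors:
  
```python
# Counts quoted in the revised sentence.
detected = 1504     # proteins detected in the extracellular fraction
significant = 909   # statistically significant (p < 0.01)
down = 186          # considered downregulated in the moeA deletion strain
up = 182            # considered upregulated in the moeA deletion strain


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


pct_significant = pct(significant, detected)  # share of detected proteins
pct_down = pct(down, significant)             # share of significant proteins
pct_up = pct(up, significant)                 # share of significant proteins
```
  
  Under this reading the three values reproduce the quoted 60.4%, 20.5%, and 20%.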
  
  How can the authors exclude contamination of the extracellular fraction? This could easily explain the number of proteins lacking secretion signals: "29.6% (55) were likely secreted through a non-classical way, lacking typical secretion sequence motifs in their N-terminus."
  
  Based on the results from SecretomeP and SignalP, we excluded contamination, reducing the significant downregulated proteins from 186 (L476) to 69 (L486), and the upregulated ones from 182 (L477) to 111 (L500).
  
  (7) L490: if the protein misannotated flagellin is highly downregulated, why not push the analysis a bit further and ask what true function may be perturbed? In addition, it should not be classified as a motility protein in Table S6 and considered as a motility protein in the article.
  
  We reconsidered the information given by this and decided to remove it because, after checking the homology of the polypeptide by BLAST searching, we feel it is probably due to a misannotation.
  
  As is, the whole proteomic section is not that useful. Too many functions are evoked and the reader is not directed toward any particular conclusion. The most convincing hits from the proteomic analysis should be confirmed using another method. Transcriptional regulation could be easily probed by RT-qPCR. Or, since genetics is possible, proteins could be tagged and levels compared by western blot maybe? Do knock-out of the encoding genes generate any phenotype on SC? This would bring weight to the proteomic analysis.
  
  We have revised the proteomics section and removed functions that are not directly relevant to our conclusion.
  
  We feel the most important observation suggested by proteomics was the possible link between moeA and starch metabolism, because the metabolism of complex polysaccharides is important in the Flavobacteriia and known to be linked to SC (van de Kerkhof et al., 2022). It was not possible to follow up every pathway suggested by the proteomics, but the study is appropriately performed with the correct statistics.
  
  (8) Figure 9 : Does the absence of moeA affect the spreading of ASWS? Were colony sizes similar during the starch degradation assay? How can the authors rule out the idea that starch degradation is impacted by the difference in spreading rather than an independent function of moeA in starch metabolism? Slower spreading could lead to the accumulation of amylases, hence stronger activity. Why does starch degradation only accumulate at the center of the colony in the WT case?
  
  The colonies of the WT and moeA had similar sizes during the starch degradation assay (2 days). However, after day 3, only WT colonies kept expanding in diameter.
  
  Starch degradation is logically in the centre of the colony as it is where the greatest concentration of cells exists, secreting degradative enzymes, for the longest time. Presumably starch degradation at the colony edge is not yet seen as the action of extracellular enzymes is low and has not had time to degrade the starch to the point that there is no iodine staining.
  
  “In contrast to other media where ΔmoeA colony expansion was less than WT, the ΔmoeA showed similar colony spreading and stronger starch degradation, supporting a role of moeA in complex polysaccharides metabolism.” [L562-565]
  
  (9) Finally, I am not quite sure what the authors mean by « a role of moeA in complex polysaccharides metabolism ». Are they referring to enzymes secreted in the medium to degrade starch? or to the incorporation and use of starch degradation products?
  
  We meant that the deletion of moeA showed an increase of extracellular starch degradation as seen in the iodine assay (Figure 9), as well as the upregulation of three different PUL operons (Figure 8).
  
  Reviewer #2 (Recommendations for the authors):
  
  The paper in general is well written with proper experimentation. However, here are a few recommendations for improving the writing and presentation, including minor corrections to the text and figures.
  
  Thank you.
  
  (1) It would be helpful for the readers if you could expand on "some metabolic pathways" in line 71. Please provide examples of metabolic pathways that are linked to SC.
  
  We have done this.
  
  “A recent bioinformatic study has shown the possible link of some metabolic pathways, such as carbohydrate, pterin, and acetolactate metabolism, to bacterial SC (Zomer et al., 2024).”[L70-72]
  
  (2) "Line 79 : a bioinformatics analysis", please mention what kind of bioinformatics analysis was done and by whom to provide clarity for the readers: Either mention bio info analysis or give more details on what kind of bio info analysis and study done by whom"
  
  We have clarified this, as suggested.
  
  “A large-scale, genomic-based analysis of 117 bacteria strains (87 with SC and 30 without) identified genes potentially involved in SC by comparing gene presence/absence, providing a SC-score (Zomer et al., 2024). By this method, pterin pathway genes were strongly predicted to be involved in SC.” [L80-83]
  
  (3) Please correct "Bacteria strains used in this study" to "bacterial" strains in Line 122.
  
  We have done so.
  
  (4) Please indicate in "Lines 394-396" that there were no vortex patterns observed in the moeA mutant.
  
  We have done so.
  
  “In contrast, ΔmoeA exhibited limited motility, with a more tightly packed cell organization and a fine, slow-moving layer at the edge (Figure 6, blue arrows), and did not show a ‘vortex’ pattern. This suggests that moeA deletion significantly impairs cell motility and colony expansion.” [L428-431]
  
  (5) In Figure 4 it looks like with a different carbon source (ASWB with agar and Fucoidan (ASWBF)) the moeA mutant and wild type exchanges its phenotype compared to ASWBKC. Could you explain why this happens in the discussion by highlighting the differences between fucose and Kappa-Carrageenan or confirm if there are any differences in the carbohydrate utilization between the wild type and moeA mutant using biolog assays?
  
  We have explained the differences. Biolog would not be appropriate as we are looking for metabolic processes of bacteria on surfaces (agar) and this is not necessarily appropriate to biolog, which we understand uses liquid cultivation in microplates.
  
  “On different polysaccharide media, the ΔmoeA strain showed varied SC and colony expansion patterns: green/blue SC and low colony expansion on agar, intense blue SC and low colony expansion on kappa-carrageenan, dull green SC and low colony expansion on fucoidan, and blue/green SC with higher colony expansion on starch. Interestingly, the color phenotypes of the WT and ΔmoeA were exchanged on kappa-carrageenan (a simple linear sulfated polysaccharide of D-galactopyranose) and fucoidan (a complex sulfated polysaccharide of fucose and other sugars such as galactose, xylose, arabinose and rhamnose), showing the importance of the polysaccharide metabolism in SC. While reduced motility has been associated with dull or absent SC, and reduced polysaccharide metabolism (Kientz et al., 2012a; Johansen et al., 2018), ΔmoeA showed reduced motility, but an intense blue SC, and high polysaccharide metabolism. Based on these results, we established a link among polysaccharide metabolism, MoCo biosynthesis, and SC, showing that intense SC is not strictly dependent on motility.” [L636-648]
  
  (6) In the discussion "Line 632" it is unclear what loss is being limited, and it would help strengthen your discussion if you could add references for lines: 633-636. There are a lot of hypotheses in lines 637-642, it would help the readers if you could clearly mention that these are hypotheses and will need experimental evidence or provide appropriate evidence to support these claims.
  
  We have done this.
  
  “Ecologically, we hypothesize that dense, highly structured bacterial colonies, such as necessary for the SC phenotype, can enhance the uptake of metabolic degradation products from complex polysaccharides. These large macromolecules are often partially hydrolyzed extracellularly because they are too large to pass through bacterial cell membranes. For example, marine Vibrionaceae strains that produce lower levels of extracellular alginate lyases tend to aggregate more strongly, potentially facilitating localized degradation and uptake of polysaccharides (D’Souza et al., 2023). Additionally, certain marine bacteria employ a "selfish" mechanism to internalize large polysaccharide fragments into their periplasmic space, minimizing loss to the environment and enhancing substrate utilization (Reintjes et al., 2017). Bacteria secrete enzymes into the surrounding environment to break these polysaccharides down into more easily absorbable monosaccharides or oligosaccharides. This mechanism suggests that the colony structure could create a physical barrier that keeps these products concentrated and near the cells, allowing the colony to efficiently access and utilize these products, preventing the leakage into the surrounding environment. While SC may also yield other ecological benefits associated with growth in biofilms, the highly structured colonies that characterize SC may be more resistant against invasion by competitor species scavenging for degradation products, than an unstructured biofilm. This model is consistent with the observation that SC is associated with polysaccharide metabolism genes, and with the recent observation that SC is mainly localized on surface and interface environments such as air-water interfaces, tidal flats, and marine particles (Zomer et al., 2024).” [L650-670]
  
  (7) It would help the readers if you could expand on how polysaccharide metabolism is linked to motility in Line 610.
  
  As indicated previously, this is known and we will clarify.
  
  “Polysaccharide metabolism in IR1 has been linked to changes in colony color and motility through the study of fucoidan metabolism (van de Kerkhof et al., 2022).” [L622-623]
  
URL

biorxiv.org/content/10.1101/2025.01.13.632688v3
www.biorxiv.org www.biorxiv.org

Progesterone induces meiosis through two obligate co-receptors with PLA2 activity

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment:
  
  “…However, the findings are reliant on high concentrations of inhibitor drugs, and mechanistic details about the molecular interaction and respective functions of ABHD2 and mPRb are incomplete.”
  
  As discussed below in the response to Reviewers, the drug concentrations used span the full dose response of the active range of each drug. In cases where the drug concentrations required to block oocyte maturation were significantly higher than those reported in the literature, we considered those drugs ineffective. In terms of the molecular details of the mechanistic interaction between mPRb and ABHD2, we now provide additional data confirming their molecular interaction to produce PLA2 activity where each protein alone is insufficient. Although these new studies provide more mechanistic insights, there remain details of the ABHD2-mPR interactions that would need to be addressed in future studies, which are beyond the scope of the current, already extensive study.
  
  Public Reviews:
  
  Reviewer 1
  
  (1) The mechanism governing the molecular assembly of mPRbeta and ABHD2 remains unclear. Are they constitutively associated or is their association ligand-dependent? Does P4 bind not only to mPRbeta but also to ABHD2, as indicated in Figure 6J? In the latter case, the reviewer suggests that the authors conduct a binding experiment using labeled P4 with ABHD2 to confirm this interaction and assess any potential positive or negative cooperativity with a partner receptor.
  
  The co-IP experiments presented in Figure 5E argue that the two receptors are constitutively associated at rest before exposure to P4, but at low levels, since addition of P4 increases the association between mPRβ and ABHD2 by ~2-fold. Importantly, we know from previous work (Nader et al., 2020) and from imaging experiments in this study that mPR recycles in immature oocytes between the PM and the endosomal compartment. It is not clear at this point within which subcellular compartment the basal association of mPR and ABHD2 occurs. We have tried to elucidate this point but have not been able to generate a functional tagged ABHD2. We generated GFP-tagged ABHD2 at both the N- and C-terminus, but these constructs were not functional in terms of their ability to rescue ABHD2 knockdown. This prevented us from testing the association dynamics between ABHD2 and mPR.
  
  Regarding whether ABHD2 in the oocyte directly binds P4, we had no data directly supporting this in the initial submission; rather, we based the cartoon in Fig. 6J on the findings of Miller et al. (Science 2016), who showed that ABHD2 in sperm binds biotinylated P4. With the use of a new expression system to produce ABHD2 in vitro (please see below), we were able to try the experiment suggested by the Reviewer. In vitro expressed ABHD2 was incubated with biotinylated P4, and binding was tested on a streptavidin column. Under these conditions we could not detect any specific binding of P4 to ABHD2; however, these experiments remain somewhat preliminary and would require validation using additional approaches to conclusively test whether Xenopus ABHD2 binds P4. The discrepancy with the Miller et al. findings could be species-specific, as they tested mammalian ABHD2.
  
  (2) The authors have diligently determined the metabolite profile using numerous egg cells. However, the interpretation of the results appears incomplete, and inconsistencies were noted between Figure 2B and Supplementary Figure 2C. Furthermore, PGE2 and D2 serve distinct roles and have different elution patterns by LC-MS/MS, thus requiring separate measurements. In addition, the extremely short half-life of PGI2 necessitates the measurement of its stable metabolite, 6-keto-PGF1a, instead. The authors also need to clarify why they measured PGF1a but not PGF2a.
  
  We believe the Reviewer meant to indicate discrepancies between Fig. 2E (not 2B) and Supp. Fig. 2C. Indeed, the Reviewer is correct, and this is because Fig. 2E shows pooled normalized data on a per-PG-species and per-frog basis, whereas Supp. Fig. 2E shows an example of absolute raw levels from a single frog to illustrate the relative basal abundance of the different PG species. We had failed to clarify this in the Supp. Fig. 2E figure legend, which we have now corrected in the revised manuscript. So, the discrepancies are due to variation between different donor animals, which is highlighted in Supp. Fig. 2A. Furthermore, to minimize confusion, in the revised manuscript we revised Supp. Fig. 2C to show only PG levels at rest, to illustrate basal levels of the different PG species relative to each other, which is the goal of this supplemental figure.
  
  (3) Although they propose PGs, LPA, and S1P are important downstream mediators, the exact roles of the identified lipid mediators have not been clearly demonstrated, as receptor expression and activation were not demonstrated. While the authors showed S1PR3 expression and its importance by genetic manipulation, there was no observed change in S1P levels following P4 treatment (Supplementary Figure 2D). It is essential to identify which receptors (subtypes) are expressed and how downstream signaling pathways (PKA, Ca, MAPK, etc.) relate to oocyte phenotypes.
  
  We agree conceptually with the Reviewer that identifying the details of the signaling of the different GPCRs involved in oocyte maturation would be interesting. However, our lipidomic data argue that the activation of a PLA2 early in the maturation process in response to P4 leads to the production of multiple lipid messengers that would activate GPCRs and branch out the signaling pathway to activate various pathways required for the proper and timely progression of oocyte maturation. Preparing the egg for fertilization is complex; so, it is not surprising that a variety of pathways are activated simultaneously to properly initiate both cytoplasmic and nuclear maturation to transition the egg from its meiotic arrest state to be ready to support the rapid growth during early embryogenesis. We focus on the S1P signaling pathway specifically because, as pointed out by the Reviewer, we could not detect an increase in S1P even though our metabolomic data collectively argued for an increase. Our results on the S1P pathway -as well as a plethora of other studies historically in the literature that we allude to in the manuscript- argue that these different GPCRs support and regulate oocyte maturation, but they are not essential for the early maturation signaling pathway. For example, for S1P, as shown in Figure 4, the delay/inhibition of oocyte maturation due to S1PR3 knockdown can be reversed at high levels of P4, which presumably leads to higher levels of other lipid mediators that would bypass the need for signaling through S1PR3. This is reminiscent of the kinase cascade driving oocyte maturation where there is significant redundancy and feedback regulation. Therefore, analyzing each receptor subtype that may regulate the different PG species, LPA, and S1P would be a tedious and time-consuming undertaking that goes beyond the scope of the current manuscript. 
  More importantly, based on the above arguments, we suggest that findings from such an analysis, similar to the conclusions from the S1PR3 studies (Fig. 4), would show a modulatory role on oocyte maturation rather than a core requirement for the maturation process as observed with mPR and ABHD2. Thus, they would provide relatively little insight into the core signaling pathway driving P4-mediated oocyte maturation.
  
  Reviewer 2:
  
  (1) The ABHD2 knockdown and rescue, presented in Fig 1, is one of the most important findings. It can and should be presented in more detail to allow the reader to understand the experiments better. E.g.: the antisense oligos hybridize to both ABHD2.S and ABHD2.L, and they knock down both (ectopically expressed) proteins. Do they hybridize to either or both of the rescue constructs? If so, wouldn't you expect that both rescue constructs would rescue the phenotype since they both should sequester the AS oligo? Maybe I'm missing something here.
  
  For the ABHD2 rescue experiment, the ABHD2 constructs (S or L) were expressed 48 hrs before the antisense was injected. The experiment was conducted in this way to avoid the potential confounding issue of both constructs sequestering the antisense. The assumption is that the injected RNA after protein expression would be degraded thus allowing the injected antisense to target endogenous ABHD2. The idea is to confirm that ABHD2.S expression alone is sufficient to rescue the antisense knockdown as confirmed experimentally.
  
  However, to further confirm the rescue, we performed the experiment in a different chronological order, where we started with injecting the antisense to knock down endogenous ABHD2 and this was followed 24 hrs later by expressing wild type ABHD2.S. As shown in Author response image 1 this also rescues the knockdown.
  
  Author response image 1.
  
  ABHD2 knockdown and rescue. Oocytes were injected with control antisense (Ctrl AS) or specific ABHD2 antisense (AS) oligonucleotides and incubated at 18 °C for 24 hours. Oocytes were then injected with mRNA to overexpress ABHD2.S for 48 hours and then treated with P4 overnight. The histogram shows % GVBD in naïve oocytes and in oocytes injected with control or ABHD2 antisense, with or without mRNA to overexpress ABHD2.S.
  
  In addition, it is critical to know whether the partial rescue (Fig 1E, I, and K) is accomplished by expressing reasonable levels of the ABHD2 protein, or only by greatly overexpressing the protein. The authors' antibodies do not appear to be sensitive enough to detect the endogenous levels of ABHD2.S or .L, but they do detect the overexpressed proteins (Fig 1D). The authors could thus start by microinjecting enough of the rescue mRNAs to get detectable protein levels, and then titer down, assessing how low one can go and still get rescue. And/or compare the mRNA levels achieved with the rescue construct to the endogenous mRNAs.
  
  The dose response of ABHD2 protein expression in correlation with rescue of the ABHD2 knockdown is shown indirectly in Figure 1I and 1J. In these experiments, ABHD2 knockdown was rescued using either the WT protein or two mutants (H120A and N125A). All three constructs rescued ABHD2 KD with equal efficiency (Fig. 1I), even though their expression levels varied (Fig. 1J). The WT protein was expressed at significantly higher levels than both mutants, and N125A was expressed at higher levels than H120A (Fig. 1J; note the similar tubulin loading control). Crude estimation from the WBs argues for WT protein expression being ~3x that of H120A and ~2x that of N125A, yet all three rescue the ABHD2 knockdown similarly (Fig. 1I). This argues that low levels of ABHD2 expression are sufficient to rescue the knockdown, consistent with the catalytic enzymatic nature of the ABHD2 PLA2 activity.
  
  Finally, please make it clear what is meant by n = 7 or n = 3 for these experiments. Does n = 7 mean 7 independently lysed oocytes from the same frog? Or 7 groups of, say, 10 oocytes from the same frog? Or different frogs on different days? I could not tell from the figure legends, the methods, or the supplementary methods. Ideally one wants to be sure that the knockdown and rescue can be demonstrated in different batches of oocytes, and that the experimental variability is substantially smaller than the effect size.
  
  The n reflects the number of independent female frogs. We have added this information to the figure legends. For each donor frog at each time point 10-30 oocytes were used.
  
  (2) The lipidomics results should be presented more clearly. First, please drop the heat map presentations (Fig 2A-C) and instead show individual time course results, like those shown in Fig 2E, which make it easy to see the magnitude of the change and the experiment-to-experiment variability. As it stands, the lipidomics data really cannot be critically assessed.
  
  [Even as heat map data go, panels A-C are hard to understand. The labels are too small, especially on the heat map on the right side of panel B. The 25 rows in panel C are not defined (the legend makes me think the panel is data from 10 individual oocytes, so are the 25 rows 25 metabolites? If so, are the individual oocyte data being collapsed into an average? Doesn't that defeat the purpose of assessing individual oocytes?) And those readers with red-green colorblindness (8% of men) will not be able to tell an increase from a decrease. But please don't bother improving the heat maps; they should just be replaced with more informative bar graphs or scatter plots.]
  
  We have revised the lipidomics data as requested by the Reviewer. The Reviewer asked that we show the data as a time course with each individual frog as in Fig. 2E. This turns out to be confusing and not a good way to present the data (please see Author response image 2).
  
  Author response image 2.
  
  Metabolite levels from 5 replicates of 10 oocytes each at each time point were measured and averaged per frog and per time point. Fold change was measured as the ratio at the 5- and 30-min time points relative to untreated oocytes (T0). FCs that are not statistically significant are shown as faded. Oocytes with mPR knockdown (KD) are boxed in green and ABHD2-KD in purple.
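  
  The fold-change calculation described in this legend can be sketched as follows. This is an illustration only: the function name `fold_changes`, the time-point labels, and the numeric levels are hypothetical, and the order of operations (average the replicates first, then take the ratio to T0) is one reading of the legend rather than a confirmed description of the authors' pipeline:
  
```python
from statistics import mean
from typing import Dict, List


def fold_changes(levels_by_time: Dict[str, List[float]],
                 baseline: str = "T0") -> Dict[str, float]:
    """Average replicate levels per time point, then divide by the baseline mean."""
    means = {t: mean(values) for t, values in levels_by_time.items()}
    return {t: m / means[baseline] for t, m in means.items() if t != baseline}


# Hypothetical levels of one metabolite in one frog (5 replicates per time point).
example = {
    "T0": [10.0, 12.0, 11.0, 9.0, 8.0],
    "T5": [5.0, 6.0, 4.0, 5.0, 5.0],
    "T30": [20.0, 22.0, 18.0, 21.0, 19.0],
}
fc = fold_changes(example)
```
  
  With these made-up numbers the metabolite halves at 5 min and doubles at 30 min relative to untreated oocytes; whether a given change is shown as faded would depend on a separate significance test that is not sketched here.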
  
  We therefore revised the metabolomics data as follow to improve clarity. The changes in the glycerophospholipids and sphingolipids determined on the Metabolon CLP platform (specific for lipids) are now shown as single metabolites clustered at the levels of species and pathways and arranged for the 5- and 30-min time points sequentially on the same heatmap as requested (Fig. 2B). This allows for a quick visual overview of the data that clearly shows the decrease in the lipid species following P4 treatment in the control oocytes and not in the mPR-KD or ABHD2-KD cells (Fig. 2B). The individual species are listed in Supplemental Tables 1 and 2. We also revised the Supplemental Tables to include the values for the non-significant changes, which were omitted from the previous submission.
  
  We revised the metabolomics data from the HD4 platform in a similar fashion, but because the lipid data were complementary and less extensive than those from the CLP platform, we moved that heatmap to Supplemental Fig. 2B.
  
  For the single oocyte metabolomics, we now show the data as the correlation between FC and p value, which clearly shows the upregulated (including LPA) and downregulated metabolites at T30 relative to T0 (Fig. 2C). The raw data is now shown in a new Supplemental Table 7.
  
  (3) The reticulocyte lysate co-expression data are quite important and are both intriguing and puzzling. My impression had been that to express functional membrane proteins, one needed to add some membrane source, like microsomes, to the standard kits. Yet it seems like co-expression of mPR and ABHD2 proteins in a standard kit is sufficient to yield progesterone-regulated PLA2 activity. I could be wrong here - I'm not a protein expression expert - but I was surprised by this result, and I think it is critical that the authors make absolutely certain that it is correct. Do you get much greater activities if microsomes are added? Are the specific activities of the putative mPR-ABHD2 complexes reasonable?
  
  We thank the Reviewer for this insightful comment. We agree that this is a critical result that would benefit from cross validation, especially given the low level of PLA2 activity detected in the reticulocyte lysate expression system. We have therefore expanded these studies using another in vitro expression system with microsomal membranes based on tobacco extracts (ALiCE® Cell-Free Protein Synthesis System, Sigma Aldrich) to enhance production and stability of the expressed receptors as suggested by the Reviewer. We further prepared virus-like particles (VLPs) from cells expressing each receptor individually or both receptors together. However, we could not detect any PLA2 activity from the VLPs. We thus focused on the coupled in vitro transcription/translation tobacco extracts that allow the expression of difficult-to-produce membrane proteins in microsomes. This kit targets membrane proteins directly to microsomes using a microsome-targeting melittin signal peptide. This system took significant time and effort to troubleshoot and adapt to mPR and ABHD2 expression. We were, however, ultimately able to produce significantly higher amounts of both ABHD2 and mPRβ, which were readily detected by WBs (Supplemental Fig. 4I). In contrast, we could not reliably detect mPR or ABHD2 using WBs from reticulocyte lysates given the limited amounts produced.
  
  Similarly to our previous findings with proteins produced in reticulocytes, expression of ABHD2 or mPRβ alone was not associated with an increase in PLA2 activity over a two-hour incubation period (Fig. 5C). It is worth noting here that the tobacco lysates had high endogenous PLA2 activity. However, co-expression of both mPRβ and ABHD2 produced robust PLA2 activity that was significantly higher than that detected in the reticulocyte lysate system (Fig. 5C). Surprisingly, however, this PLA2 activity was P4 independent, as it was observed when both receptors were co-expressed in the absence of P4.
  
  These results validate our earlier conclusion that PLA2 activity requires both mPR and ABHD2, so their interaction is needed for enzymatic activity. It is interesting, however, that in the tobacco expression system this mPR-ABHD2 PLA2 activity becomes for the most part P4 independent. As the tobacco expression system forces both ABHD2 and mPR into microsomes using a signal sequence, the two receptors are enriched in the same vesicular compartment. As they can interact independently of P4, as shown in the co-IP experiments in immature oocytes (Fig. 5D), their forced co-expression in the same microsomal compartment could lead to their association and thus PLA2 activity. This is an attractive possibility that fits the current data, but it would need independent validation.
  
  Reviewer 3:
  
  There were concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. In addition, the use of an available ABHD2 small molecule inhibitor was lacking in these studies.
  
  For the inhibitors used we performed a full dose response to define the active concentrations. So, inhibitors were not used at one high dose. We then compared the EC50 for each active inhibitor to the reported EC50 in the literature (Table 1). The inhibitors were deemed effective only if they inhibited oocyte maturation within the range reported in the literature. This despite the fact that frog oocytes are notorious in requiring higher concentrations of drug given their high lipophilic yolk content, which acts as a sponge for drugs. So our criteria for an effective inhibitor are rather stringent.
  
  Based on these criteria, only 3 inhibitors were ‘effective’ in inhibiting oocyte maturation: Ibuprofen, ACA and MP-A08, with IC50s relative to those reported in the literature of 0.7, 1.1, and 1.6, respectively. Ibuprofen targets Cox enzymes, which produce prostaglandins. We independently confirmed an increase in PGs in response to P4 in oocytes, thus validating the drug's inhibitory effect. ACA blocks PLA2 and inhibits maturation, a role supported by the metabolomics analyses that show a decrease in the PE/PE/LPE/LPC species, and by the ABHD2-mPR PLA2 activity following in vitro expression. Finally, MP-A08 blocks sphingosine kinase activity, a role supported by the metabolomics showing a decrease in sphingosine levels in response to P4, and by our functional studies validating a role for the S1P receptor 3 in oocyte maturation.
  
  As pointed out by the Reviewer, other inhibitors did block maturation at very high concentrations, but we do not consider these as effective and have not implicated the blocked enzymes in the early steps of oocyte maturation. To clarify this point, we edited the summary panel (now Fig. 2D) to simplify it and highlight the inhibitors with an effect in the reported range in red and those that don’t inhibit based on the above criteria in grey. Those with intermediate effects are shown in pink. We hope these edits clarify the inhibitor studies.
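
The criterion described above (comparing the IC50 measured for oocyte maturation with the value reported in the literature) can be sketched as a simple ratio followed by a three-way classification. This is a minimal sketch under stated assumptions: the function names and the numerical cutoffs `effective_max` and `intermediate_max` are hypothetical, since the response does not state exact thresholds; only the red/pink/grey colour scheme and the ratios 0.7, 1.1 and 1.6 come from the text.

```python
def relative_ic50(observed_ic50, reported_ic50):
    """Ratio of the IC50 measured in oocytes to the literature IC50 (same units)."""
    return observed_ic50 / reported_ic50

def classify(ratio, effective_max=2.0, intermediate_max=10.0):
    """Assign a category from the relative IC50.

    The two cutoffs are hypothetical placeholders for illustration only.
    """
    if ratio <= effective_max:
        return "effective"        # shown in red in the summary panel
    if ratio <= intermediate_max:
        return "intermediate"     # shown in pink
    return "not effective"        # shown in grey

# Relative IC50s quoted in the response for Ibuprofen, ACA and MP-A08.
quoted = {"Ibuprofen": 0.7, "ACA": 1.1, "MP-A08": 1.6}
categories = {name: classify(r) for name, r in quoted.items()}
```

With these placeholder cutoffs, the three quoted ratios all fall in the "effective" category, whereas an inhibitor requiring a concentration orders of magnitude above its literature IC50 would not.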
  
  Recommendations For the Authors
  
  Reviewer 2:
  
  (1) Introduction, para 1. Please change "mPRs mediated" to "mPR-mediated".
  
  Done
  
  (2) Introduction, para 2. Please change "cyclin b" to "cyclin B".
  
  Done
  
  (3) Introduction, para 2. Please change "that serves" to "which serves".
  
  Done
  
  (4) Introduction, para 4. I know that the authors have published evidence that "a global decrease in cAMP levels is not detectable" (2016), but old work from Maller and Krebs (JBC 1979) did see an early, transient decrease after P4 treatment, and subsequent work from Maller said that there was both a decrease in adenylyl cyclase activity and an increase in cAMP activity. Perhaps it would be better to say something like "early work showed a transitory drop in cAMP activity within 1 min of P4 treatment (Maller), although later studies failed to detect this drop and showed that P4-dependent maturation proceeds even when cAMP is high (25)".
  
  We agree and thank the Reviewer for this recommendation. The text was revised accordingly.
  
  (5) Results, para 1. Based on the results in Fig 1B, one should probably not assert that ABHD2 is expressed "at levels similar to those of mPRβ in the oocyte"-with different mRNAs and different PCR primers, it's hard to say whether they are similar or not. The RNAseq data from Xenbase in Supp Fig 1 supports the idea that the ABHD2 and mPRβ mRNAs are expressed at similar levels at the message level, although of course mRNA levels and protein levels do not correlate well when different gene products are compared (Wuhr's 2014 Curr Biol paper reported correlation coefficients of about 0.3).
  
  We agree and have changed the text as follows to refer specifically to RNA: “we confirmed that ABHD2 RNA is expressed in the oocyte at levels similar to those of mPRβ RNA (Fig. 1B).”
  
  (6) Results, para 2. It would be worth pointing out that since an 18 h incubation with microinjected antisense oligos was sufficient to substantially knock down both the ABHD2 mRNAs (Fig 1C) and the ectopically-expressed proteins (Fig 1D), the mRNA and protein half-lives must be fairly short, on the order of a few hours or less.
  
  Done
  
  (7) Figure 1. Please make the western blots (especially Fig 1D) and their labeling larger. These are key results and as it stands the labeling is virtually unreadable on printed copies of the figures. I'm not sure about eLife's policy, but many journals want the text in figures to be no smaller than 5-7 points at 100% size.
  
  Likewise for many of the western blots in subsequent figures.
  
  As requested by the Reviewer we have increased the font and size of all Western blots in the Figures.
  
  (8) Figure 1E, G. I am not sure one should compare the effectiveness of the ABHD2 rescue (Fig 1E) and the mPRβ rescue (Fig 1G). Even if these were oocytes from the same frog, we do not know how the levels of the overexpressed ABHD2 and mPRβ proteins compare. E.g. maybe ABHD2 was highly overexpressed and mPRβ was overexpressed by a tiny amount.
  
  Although this is a possibility, the expression levels of the proteins here are not of much concern because we previously showed that mPRβ expression effectively rescues mPRβ antisense knockdown, which inhibits maturation (please see (Nader et al., 2020)). This argues that, at the levels of mRNA injected, mPR is functional to support maturation, yet it does not rescue ABHD2 knockdown to the same levels (Fig. 1G). It is therefore fair to argue that mPRβ is not as effective at rescuing maturation after ABHD2 KD.
  
  (9) Inhibitor studies: There are two likely problems in comparing the observed potencies with legacy data - in vitro vs in vivo data and frog vs. mammalian data. Please make it clear what is being compared to what when you are comparing legacy data.
  
  The legacy data are from the literature based on the early studies that defined the IC<sub>50</sub> for inhibition primarily using in vivo models (cell line mostly) but not oocytes. Typically, frog oocytes require significantly higher concentrations of inhibitors to mediate their effect because of the high lipophilic yolk content which acts as a sponge for some drugs. So, the fact that the drugs that are effective in inhibiting oocyte maturation (ACA, MP-A08, and Ibuprofen) work in a similar or lower concentration range to the published IC<sub>50</sub> gives us confidence as to the specificity of their effect. We have revised Table 1 to include the reference for each IC<sub>50</sub> value from the literature to allow the reader to judge the exact model and context used.
  
  (10) Isn't it surprising that Gas seems to promote maturation, given the Maller data (and data from others) that cAMP and PKA oppose maturation (see also the authors' own Fig 1A) and the authors' previous data sees no positive effect (minor point 7 above)?
  
  We show that a specific Gas inhibitor NF-449 inhibits maturation (although at relatively high concentrations), which is consistent with a positive role for Gas in oocyte maturation. We argue based on the lipidomics data and the inhibitors data that GPCRs play a modulatory role and not a central early signaling role in terms of releasing oocyte meiotic arrest. They are likely to have effects on the full maturation of the egg in preparation for embryonic development. The actions of the multiple lipid messengers generated downstream of mPRβ activation are likely to act through GPCRs and could signal through Gas or other Ga or even through Gβγ. Minor point 7 refers to the size of Western blots.
  
  (11) Page 9, bottom: "...one would predict activation of sphingosine kinases...." Couldn't it just be the activity of some constitutively active sphingosine kinase? Maybe replace "activation" with "activity".
  
  A constitutively active sphingosine kinase would not make sense, as the activity needs to be triggered by P4.
  
  (12) Sometimes the authors refer to concentrations in molar units plus a power of 10 (e.g. 10-5 M) and sometime in µM or nM, sometimes even within the same paragraph. This makes it unnecessarily difficult to compare. Please keep consistent.
  
  We changed all the concentrations throughout the text to M with scientific notation for consistency, as requested by the Reviewer.
  
  (13) Fig 3I: "Sphingosine kinase" is misspelled.
  
  This has been corrected. We thank the Reviewer for catching it.
  
  (14) Legend to Fig. 5: Please change "after P4 treatment in reticulocytes" to "after P4 treatment in reticulocyte lysates".
  
  Done
  
  (15) Fig 6J. Doesn't the MAPK cascade inhibit MYT1? I.e. shouldn't the arrow be -| rather than ->?
  
  Yes the Reviewer is correct. This has been changed. We thank the Reviewer for noticing this error.
  
  (16) Materials and Methods, second paragraph. Please change "inhibitor's studies" to "inhibitor studies".
  
  Corrected thanks.
  
  (17) Table 1: Please be consistent in how you write Cox-2.
  
  Done.
  
  Reviewer #3:
  
  The findings are of potential broad interest, but I have some concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. Importantly, several claims regarding lipid metabolism signaling in the context of oocyte maturation are made without critical validation that the intended target is inactivated with reasonable selectivity across the proteome. Several of the inhibitors used for pharmacology and metabolomics are known covalent inhibitors (JZL184 and MJN110) that can readily bind additional lipases depending on the treatment time and concentration.
  
  I did not find any data using the reported ABHD2 inhibitor (compound 183; PMID: 31525885). Is there a reason not to include this compound to complement the knockdown studies? I believe this is an important control given that not all lipid effects were reversed with ABHD2 knockdown. The proper target engagement and selectivity studies should be performed with this ABHD2 inhibitor.
  
  We obtained aliquots of the reported ABHD2 inhibitor compound 183 from Dr. Van Der Stelt and tested its effect on oocyte maturation at 10<sup>-4</sup> M using either a low (10<sup>-7</sup> M) or a high (10<sup>-5</sup> M) P4 concentration. Compound 183 partially inhibited P4-mediated oocyte maturation. The new data were added to the manuscript as Supplemental Figure 3D.
  
  Additional comments:
  
  (1) Pristimerin was tested at low P4 concentration for effects on oocyte maturation. Authors should also test JZL184 and MJN110 under this experimental paradigm.
  
  We have tested the effect of a high concentration (2×10<sup>-5</sup> M) of JZL184 or MJN110 on oocyte maturation at low P4 concentration (Author response image 3). MJN110 did not have a prominent effect on oocyte maturation at low P4, whereas JZL184 inhibited maturation by 50%. However, this inhibition of maturation required concentrations of JZL184 that are 10 times higher than those reported in rat and human cells (Cui et al., 2016; Smith et al., 2015), arguing against an important role for a monoacylglycerol enzymatic activity in inducing oocyte maturation.
  
  Author response image 3.
  
  The effect of MJN110 and JZL184 compounds on oocyte maturation at low P4 concentration. Oocytes were pre-treated for 2 hours with the vehicle or with the highest concentration (2×10<sup>-5</sup> M) of either JZL184 or MJN110, followed by overnight treatment with P4 at 10<sup>-7</sup> M. Oocyte maturation was measured as % GVBD normalized to control oocytes (treated with vehicle) (mean + SEM; n = 2 independent female frogs for each compound).
  
  (2) Figure 4A showed different ct values of ODC between Oocytes and spleen, please explain them in the text. There is not any description regarding spleen information in Figure 4A, please make it clear in the text.
  
  We thank the Reviewer for this recommendation. The text was revised accordingly.
  
  (3) For Figures 3A, E, and I, there are different concentration settings for comparing the activity, is it possible to get the curves based on the same set of concentrations? The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect. Please set more concentration points to improve the figures. And for the error bar, there are different display formats like Figure 4c and 4d, etc. Please uniform the format for all the figures. Additionally, for the ctrl. or veh., please add an error bar for all figures.
  
  Some of the drugs tested were toxic to oocytes at high concentrations so the dose response was adjusted accordingly. The graphs were plotted to encompass the entire tested dose response. We could have plotted the data on the same x-axis range but that would make the figures uneven and awkward.
  
  We are not clear what the Reviewer means by “The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect.”
  
  The error bars for all dose responses are consistent throughout all the Figures. They are different from those on bar graphs to improve clarity. If the Reviewer wishes to have the error bars on the bar graphs and dose response the same, we are happy to do so.
  
  For the inhibitor studies the data were normalized on a per frog basis to control for variability in the maturation rate in response to P4, which varies from frog to frog. It is thus not possible to add error bars for the controls.
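
The reason the normalized controls carry no error bars can be illustrated with a short sketch. It assumes that each frog's maturation values are divided by that frog's own vehicle control; the function name `normalize_to_control` and the % GVBD numbers are hypothetical and used only for illustration.

```python
from statistics import pstdev

def normalize_to_control(gvbd_by_condition, control="vehicle"):
    """Express each condition as a percentage of the same frog's control."""
    ctrl = gvbd_by_condition[control]
    return {cond: 100.0 * value / ctrl for cond, value in gvbd_by_condition.items()}

# Hypothetical raw % GVBD for two frogs (not data from the study).
frogs = [
    {"vehicle": 80.0, "inhibitor": 40.0},
    {"vehicle": 60.0, "inhibitor": 15.0},
]
normalized = [normalize_to_control(frog) for frog in frogs]

# After per-frog normalization every control equals 100 by construction,
# so its spread across frogs is zero and an error bar carries no information.
control_values = [n["vehicle"] for n in normalized]
control_spread = pstdev(control_values)
inhibitor_values = [n["inhibitor"] for n in normalized]
```

In this sketch the variability between frogs is retained only in the treated conditions, which is where the error bars are reported.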
  
  (4) Please check the sentence "However, the concentration of HA130...... higher that......"; Change "IC50" to "IC<sub>50</sub>" in the text and tables. Table 1 lists IC<sub>50</sub> values in the literature, but the references are not cited. Please include the references properly. For the IC<sub>50</sub> value obtained in the research, please include the standard deviation in the table. For reference parts, Ref 1, 27, 32, 46, doublecheck the title format.
  
  We edited the sentence as follows to be more clear: “However, this inhibition of maturation required high concentrations of HA130 -at least 3 orders of magnitude higher that the reported HA130 IC<sub>50</sub>-…”
  
  We changed IC50 to subscript in Table 1.
  
  We added the relevant references in Table 1 to provide context for the cited IC50 values for the different inhibitors used.
  
  We added SEM to the IC<sub>50</sub> for inhibition of oocyte maturation values in Table 1.
  
  We checked the titles on the mentioned references and cannot identify any problems.
  
  References
  
  Cui, Y., Prokin, I., Xu, H., Delord, B., Genet, S., Venance, L., and Berry, H. (2016). Endocannabinoid dynamics gate spike-timing dependent depression and potentiation. eLife 5, e13185.
  
  Nader, N., Dib, M., Hodeify, R., Courjaret, R., Elmi, A., Hammad, A.S., Dey, R., Huang, X.Y., and Machaca, K. (2020). Membrane progesterone receptor induces meiosis in Xenopus oocytes through endocytosis into signaling endosomes and interaction with APPL1 and Akt2. PLoS Biol 18, e3000901.
  
  Smith, M., Wilson, R., O'Brien, S., Tufarelli, C., Anderson, S.I., and O'Sullivan, S.E. (2015). The Effects of the Endocannabinoids Anandamide and 2-Arachidonoylglycerol on Human Osteoblast Proliferation and Differentiation. PLoS One 10, e0136546.
  
URL

biorxiv.org/content/10.1101/2023.09.09.556646v3
www.biorxiv.org

Peptidoglycan-tethered and free forms of the Braun lipoprotein are in dynamic equilibrium in Escherichia coli

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Weaknesses:
  
  - Only one mutant (YafK) is used to make the conclusion.
  
  The aim of the study is to determine the effect of the hydrolysis of the PG→Lpp bond on the dynamics of the tethering of Lpp to PG. Since YafK is the only enzyme catalyzing this reaction, it is appropriate to compare the wild-type strain to an isogenic yafK deletion mutant. Nonetheless, we carefully consider this comment and will investigate the dynamics of the tethering of Lpp to PG in mutants deficient in the production of the L,D-transpeptidases responsible for tethering Lpp to PG.
  
  Additional kinetic analyses were performed on strains relying on a single L,D-transpeptidase for Lpp tethering to PG. Escherichia coli produces three L,D-transpeptidases catalyzing the tethering of Lpp to PG (YbiS, YcfS, and ErfK). The corresponding genes were deleted from the chromosome of strain BW25113, thus generating strain BW25113Δ3. Plasmids encoding each one of these three enzymes were independently introduced in BW25113Δ3. Qualitatively, LC-MS analyses revealed similar kinetics for the four Tri-KR isotopologues purified from wild-type strain BW25113 and from the three BW25113Δ3 derivatives producing a single plasmid-encoded L,D-transpeptidase (YbiS, YcfS, or ErfK) under the control of a rhamnose-inducible promoter (Prha) of plasmid pHV30 (Voedts et al. EMBO J. 2021 40:e108126, doi: 10.15252/embj.2021108126) (see panel A in figure 1 below).
  
  Briefly, and as indicated in the first version of the main text, the old→new Tri→KR isotopologue was first synthesized. The new→new isotopologue was not detected 5 min after the medium switch. These results indicate that the newly synthesized PG disaccharide-peptide subunits and Lpp are independently incorporated into the expanding PG polymer. The proportion of the new→old isotopologue exceeded that of the old→new isotopologue at around 40 min (for the strain producing ErfK) or 20 min (for the strains producing YbiS or YcfS). This is the hallmark of the activity of the YafK hydrolase, which liberates existing (old) Lpp that can be tethered to a newly synthesized disaccharide-peptide subunit, thereby generating the new→old isotopologue. In the absence of the YafK hydrolase, the relative proportion of the new→old isotopologue is lower since this isotopologue can only result from the tethering of the preexisting free forms of Lpp to newly synthesized disaccharide-peptide units.
  
  The contribution of YafK to variations in the relative abundance of the four isotopologues was also investigated by combining the relative abundance of isotopologues containing either old versus new KR (panel B) or old versus new PG stem peptide (panel C) moieties. As discussed in the first version of the manuscript for strains BW25113 and BW25113ΔyafK, this analysis revealed that the existing (old) disaccharide-tripeptide moieties in the Tri→KR isotopologues disappear more rapidly than the existing (old) KR moieties due to the hydrolysis of the old→old Tri-KR isotopologue by YafK. These results indicate that the mode of tethering of Lpp to PG and the dynamic equilibrium between the PG-tethered and free forms of Lpp are similar for the YbiS, YcfS, and ErfK L,D-transpeptidases.
  
  Quantitatively, we also noticed that the overall decrease in the relative abundance of all Tri→KR isotopologues containing existing (old) moieties was slower for the strains producing only ErfK, YbiS, or YcfS than for the wild type and ΔyafK strains. This could be accounted for by an increase in the generation time of the former group of three strains. This is a limitation of our study because it precludes the comparison of the evolution of a particular isotopologue in several strains, as performed in Fig. 3 for strains BW25113 and BW25113ΔyafK. For this reason, we prefer to present these data in the rebuttal rather than in the manuscript. Indeed, presentation of the data in the main text would require introducing a new mode of presentation of the data (variations in the relative abundance of all four isotopologues in the same strain; see figure below) in addition to variations of the relative abundance of any one of the four isotopologues between strains (Fig. 3). Introduction of this additional mode of presentation of the data would complicate the initial manuscript in an unnecessary manner because the data obtained with mutants producing a single L,D-transpeptidase (ErfK, YbiS, or YcfS) confirmed the data obtained with the wild-type strains producing the three L,D-transpeptidases.
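
The two ways of summarizing the isotopologue data mentioned above (relative abundance of each of the four isotopologues, and pooling by old versus new KR or old versus new PG stem peptide) can be sketched as follows. This is a sketch under stated assumptions: each isotopologue is encoded as a hypothetical `(PG stem, KR)` pair, so that old→new means old stem peptide and new Lpp, and the intensities are made-up placeholders rather than measured values.

```python
def relative_abundance(intensities):
    """Convert raw isotopologue intensities into fractions of their total."""
    total = sum(intensities.values())
    return {key: value / total for key, value in intensities.items()}

def pool_by_moiety(rel, position):
    """Sum fractions by the age of one moiety.

    position 0 pools by the PG stem peptide (Tri), position 1 by KR (Lpp).
    """
    pooled = {"old": 0.0, "new": 0.0}
    for key, fraction in rel.items():
        pooled[key[position]] += fraction
    return pooled

# Hypothetical intensities keyed as (PG stem, KR); not data from the study.
intensities = {
    ("old", "old"): 60.0,
    ("old", "new"): 25.0,
    ("new", "old"): 10.0,
    ("new", "new"): 5.0,
}
rel = relative_abundance(intensities)
by_pg = pool_by_moiety(rel, 0)   # old versus new PG stem peptide
by_kr = pool_by_moiety(rel, 1)   # old versus new KR
```

Pooling in this way lets the disappearance of old stem peptides be followed separately from the disappearance of old KR moieties within a single strain.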
  
  Author response image 1.
  
  MS-based kinetic analysis of Lpp tethering to PG.
  
  -Time points to analyse Tri-KR isotopologues in Wt (0,10,20,40,60 min) and yafK mutant (0,15, 25, 40, 60 min) are not the same.
  
  The purpose of the experiments is to compare the kinetics of formation and hydrolysis of the PG→Lpp bond in the WT versus ΔyafK strains. Comparison of the kinetics is therefore possible even though the kinetics are not based on the exact same time points. Nonetheless, we will reproduce the kinetics experiment (see also answers to Reviewer 2) and use the same time points in these additional experiments.
  
  We have performed additional analyses to provide kinetic data for at least three biological repeats and for the same periods of incubation after the medium switch (0, 10, 20, 40, and 60 min). The full set of data, including means and standard deviations, appears in the additional Table S1. We have also updated Fig. 3 with the means calculated with these additional values. The conclusions of the first version of the manuscript are fully supported by the additional data requested by the reviewer. We have also revised Fig. 4 based on the full set of data appearing in Table S2.
  
  Reviewer #2 (Public Review):
  
  Weaknesses:
  
  - However, the authors make a few other conclusions from their data which are harder to understand the logic of, or to feel confident in based on the existing data. They claim that their 5-time point kinetic data indicates that new lpp is not substantially added to lipidII before it is added to the peptidoglycan, and that instead lpp is attached primarily to old peptidoglycan. I believe that this conclusion comes from the comparison of Fig.s 3A and 3C, where it appears that new lpp is added to old peptidoglycan a few minutes before new lpp is added to new peptidoglycan. However, the very small difference in the timing of this result, the minimal number of time points and the complete lack of any presentation of calculated error in any of the data make this conclusion very tenuous. In addition, the authors conclude that lpp is not significantly attached to septal peptidoglycan. The logic behind this conclusion appears to be based on the same data, but the authors do not provide a quantitative model to support this idea.
  
  The reviewer is correct in stating that we claim that Lpp is not substantially added to lipid II before incorporation of the disaccharide-pentapeptide subunit into the expanding PG network. This conclusion is based on the paucity of PG-Lpp covalent adducts containing light PG and Lpp moieties at the earliest time points. To substantiate this finding more thoroughly, we will reproduce the kinetic experiments with more early time points. The paucity of the new→new PG-Lpp isotopologues also implies that Lpp might not be extensively tethered to septal peptidoglycan since the latter is assembled from newly synthesized PG (see our previous publication Atze et al. 2021 and references therein). Quantitatively, septal synthesis roughly accounts for one third of the total PG synthesis. It is therefore expected that tethering of Lpp to septal PG would represent one third of the total number of newly synthesized Lpp molecules tethered to PG. We therefore proposed that the paucity of new→new PG-Lpp isotopologues at early time points of the kinetics implies that Lpp is preferentially tethered to the side wall. This is only one of several conclusions that we reach in the present study and we were very careful in the wording of our results.
  
  We would first like to stress that our claim that Lpp is primarily attached to old peptidoglycan rather than to lipid II is indeed supported by the results presented in the first version of the manuscript. In fact, the opposite mechanism, i.e. Lpp linking to Lipid II, as established for the linking of proteins to PG by sortases in Gram-positive bacteria, would result in the exclusive tethering of newly synthesized Lpp to newly synthesized PG stems (Fig. 3). This is clearly not the case since the new→new isotopologues are present in small amounts 10 min after the medium switch and are not detectable at 5 min (data appearing in Table S1 and new mass spectra added to Supplementary file 1). Instead, our data indicate that newly synthesized Lpp is tethered to existing PG. Thus, the relevant comparison is not the absolute value of the delay in the appearance of isotopologues in Figs 3A and 3C, as suggested by the reviewer. Rather, the relevant comparison should take into consideration the following two modes of Lpp tethering to PG: (i) tethering of Lpp to Lipid II versus (ii) tethering of Lpp to existing PG independently of the insertion of new subunits into the expanding PG. The former mode implies the exclusive formation of new→new isotopologues, which were not detected at early time points. The latter mode implies the prevalent formation of old→new isotopologues, which were indeed preponderant at early time points. Thus, our analysis clearly eliminates the first mode of Lpp tethering to PG (tethering of Lpp to Lipid II) and validates the second one (tethering of Lpp to existing PG). As stated in our answers to reviewer 1, we have generated additional repeats and the full set of data, including means and SD values, appears in the additional Supplementary Tables S1 and S2.
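
The comparison between the two tethering modes can be expressed as a simple partition of newly synthesized Lpp between old and new PG stems. The sketch below is illustrative only: the `(PG stem, KR)` key convention, the function name `new_lpp_partition`, and the early time point values are hypothetical and are not taken from Table S1.

```python
def new_lpp_partition(rel):
    """Split newly synthesized Lpp (new KR) between old and new PG stems.

    rel maps (PG stem, KR) pairs to relative abundances.
    Returns None when no isotopologue containing new KR is detected.
    """
    old_new = rel.get(("old", "new"), 0.0)
    new_new = rel.get(("new", "new"), 0.0)
    total = old_new + new_new
    if total == 0:
        return None
    return {"on_old_PG": old_new / total, "on_new_PG": new_new / total}

# Hypothetical early time point: new Lpp detected only on old stems.
early = {
    ("old", "old"): 0.92,
    ("old", "new"): 0.08,
    ("new", "old"): 0.0,
    ("new", "new"): 0.0,
}
partition = new_lpp_partition(early)
```

Under mode (i), tethering to Lipid II, `on_new_PG` would be expected to approach 1; under mode (ii), tethering to existing PG, `on_old_PG` is expected to dominate at early time points, which is the pattern described in the response.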
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  -All major reactions catalysed by L,D-transpeptidases must be studied using the labeling-mass spec technique and compared with YafK to strengthen the conclusions.
  
  As described above (Figure 1), we explored the dynamics of Lpp tethering in mutants producing a single L,D-transpeptidase.
  
  -Experiments on the effect of YafK on the bacterial envelope and production of vesicles should be concluded to support the claims.
  
  We have analyzed the extent of outer membrane vesicle (OMV) formation both in the wild type strain and in each one of the mutant strains characterized in this study by using a procedure described in detail in one of our previous publications (Hugonneau-Beaufet et al. Microbiol Spectr. 2023 11:e0521722, doi: 10.1128/spectrum.05217-22). Figure 2 below shows that loss of Lpp or of its tethering to PG, following deletion of genes encoding L,D-transpeptidases ErfK, YbiS, and YcfS, results in the formation of OMVs as revealed by the presence of the maltose-binding protein (MBP, 42 kDa) in the corresponding spent culture medium (as detected by immunoblotting). The RNA polymerase subunit RpoA (36 kDa), used as a control, was not detected in these spent culture media, indicating that loss of either Lpp alone or of ErfK, YbiS, and YcfS together was not associated with bacterial lysis. This analysis also showed that production of ErfK, YbiS, or YcfS alone was sufficient to prevent formation of OMVs. Finally, deletion of YafK, as expected, did not lead to OMV formation. These confirmatory results are outside the scope of the manuscript, which focuses on the dynamics of Lpp tethering to PG rather than on the role of that tethering in envelope stability.
  
  Author response image 2.
  
  Figure 2. Immuno-detection of OMV formation.
  
  Reviewer #2 (Recommendations For The Authors):
  
  - Why so much background about previous results in the abstract? Previous results don't seem required for understanding the description of new results here. Maybe put a sentence about importance at the end, instead.
  
  The background information is important for two reasons. First, it is important to stress that the method used to determine the structure and dynamics of the isotopologues is novel and has been validated in various ways, including the modeling of isotopic clusters, in a previous study (https://doi.org/10.7554/eLife.72863). Since the current study is an extension of this previous report, it is relevant to introduce the type of information that can be obtained by this approach. Second, it is also important to stress that kinetic analyses have been previously reported for the incorporation of disaccharide-peptide units into the expanding peptidoglycan (https://doi.org/10.7554/eLife.72863). In the current study, we focused on the mode of Lpp-to-PG tethering in the context of PG expansion, which therefore had to be introduced.
  
  - Abstract: tethering of lpp to septal pg is limited by what? Limited to what? Wording not clear.
  
  The unclear sentence has been rephrased. Revised version “Newly synthesized septum PG appears to contain small amounts of tethered Lpp.”
  
  - The figure legend for fig 1b - I only see one red double arrow?
  
  Black double arrows indicate the position of glycosidic bonds cleaved by the muramidases. Their size was increased so that they appear more distinctly in the image.
  
  - Fig 3 and Fig 4- these should be shown with error.
  
  The full set of data with means and standard deviations appear in Supplementary Tables S1 and S2.
  
  - This new-> old, old-> new annotation is confusing. Is the PG fragment or the lpp old or new? Are you distinguishing between which part is old and new by the ordering? Or, could either the PG fragment or the lpp be old to be annotated as old-> new? I think you are trying to explain it in the figure 3CD legend, but it could be presented more clearly. When you say respectively, do you mean that old->new means old muropeptide, new lpp? And new-> old means new muropeptide and old lpp? Why not just use the same annotation system you use in fig 2? Or, use subscripts to indicate old and new?
  
  The designation of isotopologues is correct and adequate to designate the products of transpeptidation catalyzed both by PBPs and L,D-transpeptidases. This nomenclature of transpeptidation products was introduced in the 1970s (see Schleifer and Kandler 1972 Bacteriological Reviews 36:407-477). In this bond designation, the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond. For the Tri→KR isotopologues, the peptide stem acts as the acyl donor whereas Lpp acts as the acyl acceptor. There is therefore no ambiguity in the annotation. This also applies to the old→new-type annotation, which designates an old (existing) PG stem linked to new (neosynthesized) Lpp. In the figures, we used a color code to identify old (red) and new (purple) in the Tri→KR moieties. Since a color code cannot be used in the main text, we used the old→new-type of annotation. A sentence has been added at the end of the legend to Fig. 1b to introduce this nomenclature: “Please note that we used the standard nomenclature for transpeptidation products in which the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond.”
  
  - Pg 5 - first paragraph. I'm struggling with the logic of your conclusion that lpp is not attached to lipid II - it seems that this conclusion is based on the timing of the appearance of the hybrid isotopes. You say you would expect the new-new ones to appear quickly, but how quickly would you expect that, and why? You do see new-new ones appearing fairly quickly, in 20 minutes, so I don't understand the logic of why that timing excludes the lipid II modification model. Please elaborate further.
  
  See answer above to reviewer 2 and analysis of samples collected shortly after the medium switch (Table S1). See also the revised version of Supplementary file 1 that shows mass spectra for peptidoglycan extracted 5 min after the medium switch.
  
  - The conclusion about tethering of lpp to septal PG also appears to be somewhat tenuous, which the authors concede when they use the word "might" in the section of the results. However, the language in the abstract is more definitive. Please tone down the language in the abstract, or provide more evidence to support this conclusion. At the least, you could add a little discussion of the numbers. At a given time in mixed culture, how much PG is being constructed at the septum? How does that percentage line up with the rate of PG label loss vs the rate of lpp label loss?
  
  - Pg 5, bottom paragraph. I don't know what you mean by "there was no loss of old->old in the ∆yafK strains, " when you just a sentence above described the decrease.
  
  The data of the MS analyses are presented as the relative abundance of isotopologues. If the old→old Tri→KR isotopologue present at the medium shift were not hydrolyzed by YafK, its absolute amount would remain constant over time. However, the relative abundance of the old→old isotopologue decreases by 50% in one generation because the total amount of the Tri→KR muropeptide doubles in one generation (as does any bacterial constituent). In Fig. 3B, we indeed observed that the relative amount of the old→old isotopologue is about 50% after one generation in the ΔyafK mutant, indicating the persistence of the isotopologue. In contrast, production of YafK in the strain BW25113 results in lower abundance of this isotopologue (on the order of 90%).
  
  To make the concept more explicit, we expanded the reasoning in the relevant paragraph of the revised version of the manuscript.
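  
  The dilution argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes exponential growth in which the total Tri→KR pool doubles every generation, and the function name and the `hydrolyzed_fraction` parameter are hypothetical and do not come from the manuscript.

```python
def relative_old_abundance(generations, hydrolyzed_fraction=0.0):
    """Relative abundance of the old->old isotopologue after the medium switch.

    Assumptions (not from the manuscript): the total Tri->KR pool doubles
    every generation, and `hydrolyzed_fraction` is the cumulative fraction of
    the initial old->old material lost by hydrolysis over the whole interval.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if not 0.0 <= hydrolyzed_fraction <= 1.0:
        raise ValueError("hydrolyzed_fraction must be between 0 and 1")
    old_amount = 1.0 - hydrolyzed_fraction  # absolute amount, initial pool = 1
    total_amount = 2.0 ** generations       # total pool after growth
    return old_amount / total_amount


if __name__ == "__main__":
    # No hydrolysis: the old->old material is diluted by growth alone.
    print(relative_old_abundance(1))
    # Hypothetical case in which 80% of the old->old material is hydrolyzed.
    print(relative_old_abundance(1, hydrolyzed_fraction=0.8))
```

  Under these assumptions, a relative abundance of one half after one generation corresponds to persistence of the isotopologue, as described for the ΔyafK mutant, whereas any value clearly below one half requires that part of the old→old material has been removed.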
  
  - Pg 6 - I don't understand how you are drawing a conclusion about the proteolytic degradation of lpp from these data. Please clarify your reasoning.
  
  In the analysis presented in Fig. 4, we investigated the relative abundance of old and new Lpp based on the relative abundance of old and new KR moieties in all four Tri-KR isotopologues. As stated in the preceding answer, the relative abundance of KR moieties should be 50% after one generation if no degradation of Lpp occurs. This is observed both for BW25113 (Fig. 4A) and for the ΔyafK mutant (Fig. 4B), thus supporting our claim that Lpp is not degraded. In contrast, the relative abundance of the old Tri moiety is lower than 50% for the wild type strain (Fig. 4C) but not for the ΔyafK mutant (Fig. 4D). This reflects the fact that YafK hydrolyzes the PG-Lpp bond and that Lpp released by this reaction can be cross-linked to neo-synthesized PG stems. Please note that, in this reaction, the substrate is a tetrapeptide donor stem (Fig. 1C).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.26.559517v2
www.biorxiv.org www.biorxiv.org

Sperm fertility restoration in mice suffering from oligo-astheno-teratozoospermia by in vivo injection and electroporation of naked mRNA

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The authors assess the effectiveness of electroporating mRNA into male germ cells to rescue the expression of proteins required for spermatogenesis progression in individuals where these proteins are mutated or depleted. To set up the methodology, they first evaluated the expression of reporter proteins in wild-type mice, which showed expression in germ cells for over two weeks. Then, they attempted to recover fertility in a model of late spermatogenesis arrest that produces immotile sperm. By electroporating the mutated protein, the authors recovered the motility of ~5% of the sperm, although the sperm regenerated was not able to produce offspring using IVF.
  
  We actually did not write that “sperm regenerated was not able to produce offspring using IVF” but rather that IVF was not attempted because the number of rescued sperm was too low. To address this important point, the ability of sperm to produce embryos was therefore challenged by two different assisted reproduction technologies, that are IVF and ICSI. To increase the number of motile sperm for IVF experiments, we have injected both testes from one male. We also conducted intracytoplasmic sperm injection (ICSI) experiments, using only rescued sperm, identified as motile sperm with a normal flagellum. The results of these new experiments have demonstrated that the rescued ARMC2 sperm successfully fertilized eggs and produced embryos at the two-cell stage by IVF and blastocysts by ICSI. These outcomes are presented in Figure 12.
  
  This is a comprehensive evaluation of the mRNA methodology with multiple strengths. First, the authors show that naked synthetic RNA, purchased from a commercial source or generated in the laboratory with simple methods, is enough to express exogenous proteins in testicular germ cells. The authors compared RNA to DNA electroporation and found that germ cells are efficiently electroporated with RNA, but not DNA. The differences between these constructs were evaluated using in vivo imaging to track the reporter signal in individual animals through time. To understand how the reporter proteins affect the results of the experiments, the authors used different reporters: two fluorescent (eGFP and mCherry) and one bioluminescent (Luciferase). Although they observed differences among reporters, in every case expression lasted for at least two weeks.
  
  The authors used a relevant system to study the therapeutic potential of RNA electroporation. The ARMC2-deficient animals have impaired sperm motility phenotype that affects only the later stages of spermatogenesis. The authors showed that sperm motility was recovered to ~5%, which is remarkable due to the small fraction of germ cells electroporated with RNA with the current protocol. The 3D reconstruction of an electroporated testis using state-of-the-art methods to show the electroporated regions is compelling.
  
  The main weakness of the manuscript is that although the authors manage to recover motility in a small fraction of the sperm population, it is unclear whether the increased sperm quality is substantial to improve assisted reproduction outcomes. The quality of the sperm was not systematically evaluated in the manuscript, with the endpoints being sperm morphology and sperm mobility.
  
  We would like to thank the reviewers for their comments. As stated above, we performed additional rescue experiments and carried out CASA, morphology observation, IVF and ICSI with the rescued sperm. The rescued ARMC2 sperm exhibited normal morphology (new Figure 11 and Supp Fig 8), motility (Figure 11), and fecundity (Figure 12). Whereas sperm from untreated KO males were unable to fertilize eggs by IVF, the rescued sperm fertilized eggs in vitro at a significant level (mean 62%, n=5), demonstrating that our strategy improves sperm quality and assisted reproduction outcomes (from 0 to 62%).
  
  Some key results, such as the 3D reconstruction of the testis and the recovery of sperm motility, are qualitative given the low replicate numbers or the small magnitude of the effects. The presentation of the sperm motility data could have been clearer as well. For example, on day 21 after Armc2-mRNA electroporation, only one animal out of the three tested showed increased sperm motility. However, it is unclear from Figure 11A what the percentage of sperm motility for this animal is since the graph shows a value of >5% and the reported aggregate motility is 4.5%. It would have been helpful to show all individual data points in Figure 11A.
  
  We now provide, in Figure 11A, a scatter dot plot showing the percentage of rescued sperm for all animals. Moreover, we performed additional CASA experiments to analyze sperm motility in detail (Figure 11A2-A3). Individual CASA parameters for motile sperm cells were extracted as requested by reviewer 3 and represented in a new graph (Fig 11 A2).
  
  The expression of the reporter genes is unambiguous; however, better figures could have been presented to show cell type specificity. The DAPI staining is diffused, and it is challenging to understand where the basement membranes of the tubules are. For example, in Figures 7B3 and 7E3, the spermatogonia seem to be in the middle of the seminiferous tubule. The imaging was better for Figure 8. Suboptimal staining appears to lead to mislabeling of some germ cell populations. For example, in Supplementary Figure 4A3, the round spermatid label appears to be labeling spermatocytes. Also, in some instances, the authors seem to be confusing elongating spermatids with spermatozoa, such as in the case of Supplementary Figures 4D3 and D4.
  
  Thanks for the comments; some spermatogenic cells were indeed mislabeled, as you mentioned. We have therefore adjusted the labeling accordingly. We also changed spermatozoa to mature spermatids. The new sentence is now: “At the cellular level, fluorescence was detectable in germ cells (B1-B3) including Spermatogonia (Sg), Spermatocytes (Scytes), round Spermatids (RStids), mature spermatids (m-Sptids) and Sertoli cells (SC)”. Moreover, to indicate the location of the basement membrane, we have also labelled myoid cells.
  
  The characterization of Armc2 expression could have been improved as well. The authors show a convincing expression of ARMC2 in a few spermatids/sperm using a combination of an anti-ARMC2 antibody and tubules derived from ARMC2 KO animals. At the minimum, one would have liked to see at least one whole tubule of a relevant stage.
  
  Thanks for the remark.
  
  We now present new images showing transverse sections of seminiferous tubules, as requested (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatids. We have also added to this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that Armc2 is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text.
  
  Overall, the authors show that electroporating mRNA can improve spermatogenesis as demonstrated by the generation of motile sperm in the ARMC2 KO mouse model.
  
  Thank you
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Here, the authors inject naked mRNAs and plasmids into the rete testes of mice to express exogenous proteins - GFP and later ARMC2. This approach has been taken before, as noted in the Discussion to rescue Dmc1 KO infertility. While the concept is exciting, multiple concerns reduce reviewer enthusiasm.
  
  Strengths:
  
  The approach, while not necessarily novel, is timely and interesting.
  
  Weaknesses:
  
  Overall, the writing and text can be improved and standardized - as an example, in some places in vivo is italicized, in others it's not; gene names are italicized in some places, others not; some places have spaces between a number and the units, others not. This lack of attention to detail in the preparation of the manuscript is a significant concern to this reviewer - the presentation of the experimental details does cast some reasonable concern with how the experiments might have been done. While this may be unfair, it is all the reviewers have to judge. Multiple typographical and grammatical errors are present, and vague or misleading statements.
  
  Thanks for the comment, we have revised the whole manuscript to remove all the mistakes. We have also added new experiments/figures to strengthen the message. Finally, we have substantially modified the discussion.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The authors used a novel technique to treat male infertility. In a proof-of-concept study, the authors were able to rescue the phenotype of a knockout mouse model with immotile sperm using this technique. This could also be a promising treatment option for infertile men.
  
  Strengths:
  
  In their proof-of-concept study, the authors were able to show that the novel technique rescues the infertility phenotype in vivo.
  
  Weaknesses:
  
  Some minor weaknesses, especially in the discussion section, could be addressed to further improve the quality of the manuscript.
  
  We have substantially modified the discussion, following the remarks of the reviewers.
  
  It is very convincing that the phenotype of Armc2 KO mice could (at least in part) be rescued by injection of Armc2 RNA. However, a central question remains about which testicular cell types have been targeted by the constructs. From the pictures presented in Figures 7 and 8, this issue is hard to assess. Given the more punctate staining of the DNA construct, targeting of Sertoli cells is more likely, whereas the broader staining of seminiferous tubules using RNA constructs points toward germ cells. Further, the staining for up to 119 days (Figure 5) would point toward an integration of the DNA construct into the genome of early germ cells such as spermatogonia and/or possibly to Sertoli cells.
  
  Thanks for the comment. We would like to recall the peculiar properties of the non-insertional Enhanced Episomal Vector (EEV) plasmid, a non-viral episome based on the Epstein-Barr virus (EBV). It allows the plasmid to persist for long periods of time without integration. Its maintenance within the cell is made possible by its ability to replicate synchronously with the host genome and to segregate into daughter cells. This is because EEV is composed of two distinct elements derived from EBV: an origin of replication (oriP) and an Epstein-Barr Nuclear Antigen 1 (EBNA1) expression cassette (Gil, Gallaher, and Berk, 2010). The oriP is a locus comprising two EBNA1-binding domains, designated as the Family of Repeats (FR) and Dyad Symmetry (DS). The FR is an array of approximately 20 high-affinity EBNA1-binding sites (20 repeats of 30 bp), while the DS comprises four lower-affinity sites operating in tandem (Ehrhardt et al., 2008).
  
  The 641-amino-acid EBNA1 protein contains numerous domains. The N-terminal domains are rich in glycines and alanines, which enable interaction with host chromosomes. The C-terminal region is responsible for binding to oriP (Hodin, Najrana, and Yates, 2013). The binding of EBNA1 to the DS element results in the recruitment of the origin of replication. This results in the synchronous initiation of extra-chromosomal EEV replication with host DNA at each S phase of the cell cycle (Düzgüneş, Cheung, and Konopka 2018). Furthermore, EBNA1 binding to the FR domain induces the formation of a bridge between metaphase chromosomes and the vector during mitosis. This binding is responsible for the segregation of the EEV episome in daughter cells (Düzgüneş, Cheung, and Konopka 2018). It is notable that EEV is maintained at a rate of 90-95% per cell division.
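  
  As a rough illustration of what a per-division maintenance rate implies, the sketch below compounds that rate over successive divisions. It assumes that retention is independent at each division; the function name and the numbers of divisions are hypothetical, and only the 90-95% range is taken from the text above.

```python
def episome_retention(divisions, rate_per_division):
    """Fraction of cells expected to still carry the episome.

    Simplifying assumption (not from the manuscript): the same retention
    rate applies independently at every cell division.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    if not 0.0 <= rate_per_division <= 1.0:
        raise ValueError("rate_per_division must be between 0 and 1")
    return rate_per_division ** divisions


if __name__ == "__main__":
    # Hypothetical division counts, evaluated at both ends of the quoted range.
    for rate in (0.90, 0.95):
        for divisions in (0, 1, 5, 10):
            print(rate, divisions, round(episome_retention(divisions, rate), 3))
```

  In this simple model, loss accumulates only when cells divide, so a cell that does not divide keeps a retained fraction of 1.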
  
  Because of the intrinsic properties of EEV described above, the presence of the reporter protein 119 days after injection was likely due to the maintenance of the plasmid, mostly in Sertoli cells, and not to integration of the plasmid into the genome.
  
  Of note, the specificity of EEV was already indicated in the introduction (lines 124-128 clean copy). Nevertheless, we have added more information about EEV to help the readers.
  
  Given the expression after RNA transfection for up to 21 days (Figure 4) and the detection of motile sperm after 21 days (Figure 11), this would point to either round spermatids or spermatocytes. These aspects need to be discussed more carefully (discussion section: lines 549-574).
  
  We added a sentence to highlight that spermatids are transfected and that the protein is synthesized at this stage; this question is discussed in detail (see lines 677-684 clean copy).
  
  It would also be very interesting to know in which testicular cell type Armc2 is endogenously expressed (lines 575-591).
  
  Thanks for the remarks. We now present new images showing full seminiferous tubules, as requested by reviewer 1 (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatids. We have also added to this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that Armc2 is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text (lines 570-579 clean copy).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  The article is well-structured and easy to read. Nonetheless, there are typos and mistakes in some places that are distracting to the reader, such as the capitalization of the word "Oligo-" in the title of the manuscript, the use of the word "Materiel" in the title of the Materials and methods and the presence of placeholders "Schorr staining was obtained from Merck (XXX)".
  
  Thank you, we corrected the misspelling of "Materials and Methods" and corrected our error: "obtained from Merck (Darmstadt, Germany)". We also carefully corrected the manuscript to remove typos and mistakes.
  
  The discussion is too lengthy, with much repetition regarding the methods used and the results obtained. For example, these are two sentences from the discussion. "The vector was injected via the rete testis into the adult Armc2 KO mice. The testes were then electroporated." I would recommend shortening these passages.
  
  Thanks for your comments, we removed the sentences and we have substantially modified the discussion, following the remarks of the reviewers.
  
  The work is extensive, and many experiments have been done to prove the points made. However, a more in-depth analysis of critical experiments would have benefited the manuscript significantly. A more thorough analysis of sperm mobility and morphology using the CASA system would have been an initial step.
  
  In response to these observations, additional CASA experiments and sperm motility analyses were conducted, as illustrated in Figure 11 (A2-A3). Individual CASA parameters for motile sperm cells were extracted as suggested and represented in a new graph (Fig 11 A2). We observed significant differences between WT and rescued sperm. In particular, the VSL and LIN parameters were lower for rescued sperm. Nevertheless, these differences were not sufficient to prevent IVF, possibly because the curvilinear velocity (VCL) was not modified.
  
  In the case of ARMC2 localization, an analysis of the different stages of spermatogenesis to show when ARMC2 starts to be expressed.
  
  Thanks for the remarks. This is an important point raised by all reviewers. As explained above, we have performed more experiments. We now present new images showing transverse sections of seminiferous tubules, as requested (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatid layers. We have also added to this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that Armc2 is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text (lines 575-579 clean copy).
  
  Finally, exploring additional endpoints to understand the quality of the sperm generated, such as the efficiency of ICSI or sperm damage, could have helped understand the degree of the recovery.
  
  This point was raised in the public review. We paste our answer here: “To address this important point, the ability of sperm to produce embryos was therefore challenged by two different assisted reproduction technologies, that are IVF and ICSI. To increase the number of motile sperm for IVF experiments, we have injected both testes from one male. We also conducted intracytoplasmic sperm injection (ICSI) experiments, using only rescued sperm, identified as motile sperm with a normal flagellum. The results of these new experiments have demonstrated that the rescued ARMC2 sperm successfully fertilized eggs and produced embryos at the two-cell stage by IVF and blastocysts by ICSI. These outcomes are presented in Figure 12.”
  
  Reviewer #2 (Recommendations For The Authors):
  
  38,74 intracellular
  
  Thanks, we changed it accordingly: "Intracytoplasmic sperm injection (ICSI) is required to treat such a condition, but it has limited efficacy and has been associated with a small increase in birth defects" and "such as intracytoplasmic sperm injection (ICSI)".
  
  39 "limited efficacy" Versus what? And for what reason? "small increase in birth defects" - compared to what?
  
  We changed to “… but it is associated with a small increase in birth defect with comparison to pregnancies not involving assisted conception.”
  
  40 Just thinking through the logic of the argument thus far - the authors lay out that there are people with OAT (true), ICSI must be used (true), ICSI is bad (not convincing), and therefore a new strategy is needed... so is this an alternative to ICSI? And this is to restore fertility, not "restore spermatogenesis" - because ICSI doesn't restore spermatogenesis. This logic flow needs to be cleaned up some
  
  Thanks, we changed it accordingly: “restore fertility.”
  
  45 "mostly"?
  
  Thank you, we removed the word: “We show that mRNA-coded reporter proteins are detected for up to 3 weeks in germ cells, making the use of mRNA possible to treat infertility.”
  
  65 Reference missing.
  
  We added the following reference Kumar, N. and A. K. Singh (2015). "Trends of male factor infertility, an important cause of infertility: A review of literature." J Hum Reprod Sci 8(4): 191-196.
  
  68 Would argue meiosis is not a reduction of the number of chromosomes - that happens at the ends of meiosis I and II - but the bulk of meiosis is doubling DNA and recombination; would re-word; replace "differentiation" with morphogenesis, which is much more commonly used:
  
  Thank you, we have changed the sentence accordingly: "proliferation (mitosis of spermatogonia), reduction of the number of chromosomes (meiosis of spermatocytes), and morphogenesis of sperm (spermiogenesis)".
  
  70 "almost exclusively" is an odd term, and a bit of an oxymoron - if not exclusively, then where else are they expressed? Can you provide some sense of scale rather than using vague words like "large", "almost", "several", "strongly" and "most...likely" - need some support for these claims by being more specific:
  
  Thanks for the comment, we changed the sentence: "The whole process involves around two thousand genes, 60% of which are expressed exclusively in the testes."
  
  73 "severe infertility" is redundant - if they are infertile, is there really any more or less about it? I think what is meant is patients with immotile sperm can be helped by ICSI - so just be more specific...
  
  We changed the transition : “Among infertility disorders, oligo-astheno-teratozoospermia (OAT) is the most frequent (50 % (Thonneau, Marchand et al. 1991); it is likely to be of genetic origin. Spermatocytograms of OAT patients show a decrease in sperm concentration, multiple morphological defects and defective motility. Because of these combined defects, patients are infertile and can only conceive by IntraCytoplasmic Sperm Injection (ICSI). IntraCytoplasmic Sperm Injection (ICSI) can efficiently overcome the problems faced. However, there are …”
  
  75 "some" is vague - how many concerns, and who has them? Be specific!
  
  Thanks for the comment, we removed the word.
  
  76-7 Again, be specific - "real" has little meaning - what is the increased risk, in % or fold? This is likely a controversial point, so make sure you absolutely support your contention with data.
  
  77 "these"? There was only one concern listed - increased birth defects; and "a number" is vague - what number, 1 or 1,000,000? A few (2-3), dozens, hundreds?
  
  Thanks for the comment, we have reworded the sentence: “Nevertheless, concerns persist regarding the potential risks associated with this technique, including blastogenesis defect, cardiovascular defect, gastrointestinal defect, musculoskeletal defect, orofacial defect, leukemia, central nervous system tumors, and solid tumors. Statistical analyses of birth records have demonstrated an elevated risk of birth defects, with a 30–40% increased likelihood in cases involving ICSI, and a prevalence of birth defects between 1% and 4%.” We have added a list of references to support these claims.
  
  79-81 So, basically transgenesis? Again, vague terms "widely" - I don't think it's all that widely used yet... and references are missing to support the statement that integration of DNA into patient genomes is widely used. Give specific numbers, and provide a reference to support the contention.
  
  Thanks for the comment, we removed the word "widely" and added references.
  
  81-5 Just finished talking about humans, but now it appears the authors have switched to talking about mice - got to let the readers know that! Unless you're talking about the Chinese group that deleted CCR5 in making transgenic humans?
  
  Your feedback is greatly appreciated. In response to your comments, the sentence in question has been amended to make it clearer. Indeed, the text refers to experiments carried out in mice. The revised wording is as follows: “Given the genetic basis of male infertility, the first strategy, tested in mice, was to overcome spermatogenic failure associated with monogenic diseases by delivery of an intact gene to deficient germ cells (Usmani, Ganguli et al. 2013).”
  
  84-5 "efficiently" and "high" - provide context so the reader can understand what is meant - do the authors mean the experiments work efficiently, or that a high percentage of cells are transfected? And give some numbers or range of numbers - you're asking the readers to take your word for things when you choose adjectives - instead, provide values and let the readers decide for themselves.
  
  Thanks for the comment, we have reworded the sentence: “Gene therapy is effective in germ cells, as numerous publications have shown that conventional plasmids can be transferred into spermatogonia in several species with success, allowing their transcription in all cells of the germinal lineage (Usmani, Ganguli et al. 2013, Michaelis, Sobczak et al. 2014, Raina, Kumar et al. 2015, Wang, Liu et al. 2022).”
  
  93 Reference at the end of the sentence "most countries"
  
  Thanks, we changed the sentence and added the reference. The new sentence is: “… to avoid any eugenic deviations, transmissible changes in humans are illegal in 39 countries (Liu 2020)” (Liu, S. (2020). "Legal reflections on the case of genome-edited babies." Glob Health Res Policy 5: 24).
  
  93-4 Odd to say "multiple" and then list only one.
  
  Thanks for the comment, we have reworded the sentence: “Furthermore, the genetic modification of germ cell lines poses biological risks, including the induction of cancer, off-target effects, and cell mosaicism. Errors in editing may have adverse effects on future generations. It is exceedingly challenging to anticipate the consequences of genetic mosaicism, for instance, in a single individual. (Sadelain, Papapetrou et al. 2011, Ishii 2017).”
  
  97 Is this really a "small" change? Again, would use adjectives carefully - to this reviewer, this is not a small change, but a significant one! And "should be" is not altogether convincing
  
  Thanks for the comment, we have reworded the sentence: “Thanks to this change, the risk of genomic insertion is avoided, and thus there is no question of heritable alterations.”
  
  What chance is there of retrotransposition? Is there any data in the literature for that, after injecting millions of copies of RNA one or more might be reverse transcribed and inserted into the genome?
  
  This is certainly possible and is the putative origin for multiple intronless spermatid-expressed genes:
  
  The expert poses an interesting question, but one that unfortunately remains unanswered at present. Most papers on mRNA therapy state that there is no risk of genomic integration, but give no reference (for instance, see mRNA-based therapeutics: looking beyond COVID-19 vaccines. Lancet. 2024 doi: 10.1016/S0140-6736(23)02444-3). This is an important question that deserves to be evaluated, but it is beyond the scope of this manuscript. Nevertheless, the issue remains under debate (Igyarto and Qin 2024).
  
  98 Odd to say "should be no risk" and then conclude with "there is no question" - so start the sentence with 'hedging', and then end with certainty - got to pick one or the other.
  
  Thanks for the comment, we have reworded the sentence
  
  99 "Complete" - probably not, would delete:
  
  We removed the word: “The first part of this study presents a characterization of the protein expression patterns obtained following transfection of naked mRNA coding for reporter genes into the testes of mice”
  
  101-2 Reference missing, as are numbers - what % of cases?
  
  Thank you, we changed the sentence and added the reference: “Among infertility disorders, oligoastheno-teratozoospermia (OAT) is the most frequent (50 % (Thonneau, Marchand et al. 1991))”. Reference: Thonneau, P., S. Marchand, A. Tallec, M. L. Ferial, B. Ducot, J. Lansac, P. Lopes, J. M. Tabaste and A. Spira (1991). "Incidence and main causes of infertility in a resident population (1,850,000) of three French regions (1988-1989)." Hum Reprod 6(6): 811-816.
  
  103 Once again, the reference is missing:
  
  We have added these references: (Colpi, Francavilla et al. 2018) (Cavallini 2006)
  
  104-5 Awkward transition.
  
  Thanks, we changed the transition: “The first part of this study presents a characterization of the protein expression patterns obtained following transfection of naked mRNA coding for reporter genes into the testes of mice. The second part is to apply the protocol to a preclinical mouse model of OAT.”
  
  105 Backslash is odd - never seen it used in that way before
  
  Removed
  
  108 "completely infertile" is redundant;
  
  Thank you, we changed it accordingly: “Patients and mice carrying mutations in the ARMC2 gene present a canonical OAT phenotype and are infertile”.
  
  and is a KO mouse really "preclinical"?
  
  Preclinical research is defined as research involving the use of animals to ascertain the potential efficacy of a drug, procedure, or treatment. Preclinical studies are conducted prior to any testing in humans. Our KO mouse model has been shown to mimic human infertility. Indeed, Armc2-/- mice exhibit a phenotype that is identical to that observed in humans. Our study is in line with this definition. For this reason, we have decided to maintain our current position and to use the term "preclinical" in the article.
  
  110 Delete "sperm".
  
  Thank you, we changed it accordingly: “The preclinical Armc2 deficient (Armc2 KO) mouse model is therefore a valuable model to assess whether in vivo injection of naked mRNA combined with electroporation can restore spermatogenesis”
  
  111 "Easy"? Really?
  
  We changed it accordingly: “We chose this model for several reasons: first, Armc2 KO mice are sterile and all sperm exhibit short, thick or coiled flagella [13].”
  
  112-3 "completely immobile" is redundant - either they are immobile or not.
  
  Thank you, we changed it accordingly: “As a result, 100 % of sperm are immobile, thus it should be easy to determine the efficacy of the technique by measuring sperm motility with a CASA system.”
  
  108-33 Condense this lengthy text into a coherent few sentences to give readers a sense of what you sought to accomplish, broadly how it was done, and what you found. This reads more like a Results section
  
  Thanks for the comment, we shortened the text.
  
  Materials and Methods
  
  The sections appear to have been written by different scientists - the authors should standardize so that similar detail and formatting are used - e.g., in some parts the source is in parentheses with catalog number, in others not, some have city, state, country, others do not... the authors should check eLife mandates for this type of information and provide.
  
  We are grateful for your feedback. We standardized the text. If we have missed any instances, we can finish formatting the article, as outlined on the eLife website, once it has been accepted for publication in the journal and before sending the VOR.
  
  134 Misspelling
  
  We corrected the misspelling
  
  142 Just reference, don't need to spell it out.
  
  Thanks, we changed it accordingly: “and the Armc2 KO mouse strain obtained by CRISPR-Cas9 (Coutton, Martinez et al. 2019). Experiments”
  
  150 What is XXX?
  
  We would like to express our gratitude for bringing this error to our attention. We have duly rectified the issue: “obtained from Merck (Darmstadt, Germany).”
  
  157-60 Are enough details provided for readers to repeat this if necessary? Doesn't seem so to this reviewer; if kits were followed, then can say "using manufacturer's protocol", or refer to another manuscript - but this is too vague.
  
  Thanks, we changed it accordingly: “After expansion, plasmids were purified with a NucleoBond Xtra Midi kit (740410-50; Macherey-Nagel, Düren, Germany) using manufacturer's protocol.”
  
  165 Again, too few details - how was it purified? What liquid was it in?
  
  Thanks for the comment; the EEV plasmids were purified like all other plasmids. We changed the text: “All plasmids, EEV CAGs-GFP-T2A-Luciferase (EEV604A-2, System Bioscience, Palo Alto, CA, USA), mCherry plasmid (given by Dr. Conti MD at UCSF, San Francisco, CA, USA) and EEV-Armc2-GFP plasmid (CUSTOM-S017188-R2-3, Trilink, San Diego, USA) were amplified by bacterial transformation”
  
  170 Seems some words are missing - and will everyone know Dr. Conti by last name alone? Would spell out, and the details of the plasmid must either be provided or a reference given; how was amplification done? Purification? What was it resuspended in?
  
  Thank you for the remark; the mCherry plasmid was purified like all other plasmids. We changed the text: “All plasmids, EEV CAGs-GFP-T2A-Luciferase (EEV604A-2, System Bioscience, Palo Alto, CA, USA), mCherry plasmid (given by Dr. Conti MD, UCSF, San Francisco, CA, USA) and EEV-Armc2-GFP plasmid (CUSTOM-S017188-R2-3, Trilink, San Diego, USA) were amplified by bacterial transformation”
  
  175 Again, for this plasmid provide more information - catalog number, reference, etc; how amplified and purified, what resuspension buffer?
  
  Thank you for the remark. As mentioned above, we added this sentence on plasmid preparation: “All plasmids, EEV CAGs-GFP-T2A-Luciferase (EEV604A-2, System Bioscience, Palo Alto, CA, USA), mCherry plasmid (given by Dr. Conti MD at UCSF, San Francisco, CA, USA) and EEV-Armc2-GFP plasmid (CUSTOM-S017188-R2-3, Trilink, San Diego, USA) were amplified by bacterial transformation”, and we added this sentence: “The EEV-Armc2-GFP plasmid used for in vivo testes microinjection and electroporation was synthesized and customized by Trilink (CUSTOM-S017188-R2-3, San Diego, USA).”
  
  183 What sequence, or isoform was used? Mouse or human?
  
  Thanks, we changed accordingly: “This non-integrative episome contains the mice cDNA sequences of Armc2 (ENSMUST00000095729.11)”
  
  186-7 Provide sequence or catalog number; what was it resolubilized in?
  
  Thanks, we changed it accordingly: “the final plasmid concentration was adjusted to 9 μg μL-1 in water.” We provided the sequence of EEV-Armc2-GFP in supp data 6.
  
  207-219 Much better, this is how the entire section needs to be written!
  
  237-240 Font
  
  Thanks for the comment, we changed it accordingly
  
  246 Cauda, and sperm, not sperm cells
  
  Thanks for the comment, we changed it accordingly
  
  255-6 Which was done first? Would indicate clearly.
  
  Thanks for the comment, we changed the sentence: “Adult mice were euthanized by cervical dislocation and then transcardiac perfused with 1X PBS”
  
  281-2 Provide source for software - company, location, etc:
  
  We changed it accordingly: “FIJI software (open-source software) was used to process and analyze images and Imaris software (Oxford Instruments, Tubney Woods, Abingdon, Oxon OX13 5QX, UK) for the 3D reconstructions.”
  
  323 um, not uM.
  
  Thanks for the comment, we corrected our mistake: “After filtration (100 µm filter)”
  
  Results
  
  369 Weighed.
  
  Thanks for the comment, we corrected our mistake: “the testes were measured and weighed”
  
  371 No difference in what, specifically?
  
  Thanks for the comment, we changed the sentence to: “No statistical differences in length and weight were observed between control and treated testes”
  
  375 "was respected"? What does this mean?
  
  Thanks for the comment, we changed the sentence to “The layered structure of germ cells were identical in all conditions”
  
  378 This is highly unlikely to be true, as even epididymal sperm from WT animals are often defective - the authors are saying there were ZERO morphological defects? Or that there was no difference between control and treated? Only showing 2-3 sperm for control vs treatment is not sufficient.
  
  Your observation that epididymal spermatozoa from wild-type animals often exhibit defective morphology is indeed correct. The prevalence of these defects varies by strain, with an average incidence of 20% to 40% (Kawai, Hata et al., 2006; Fan, Liu et al., 2015). To provide a more comprehensive representation, we conducted a Harris-Shorr staining procedure and included a histogram of the percentage of normal sperm in each condition (new figure 2F4). Harris-Shorr staining of the epididymal sperm cells revealed no discernible increase in morphological defects when mRNA and EEV were used, in comparison with the control. We added the sentence: “At last, Harris-Shorr staining of the epididymal sperm cells demonstrated that there were no increases in morphological defects when mRNA and EEV were used in comparison with the control”.
  
  379 "safe" is not the right word - better to say "did not perturb spermatogenesis".
  
  Thanks, we changed it accordingly: “these results suggest that in vivo microinjection and electroporation of EEV or mRNA did not perturb spermatogenesis”
  
  382-3 This sentence needs attention, doesn't make sense as written:
  
  Thanks for the remark, we changed the sentence to: “No testicular lesions were observed on the testes at any post injection time”
  
  389 How long after injection?
  
  Thanks for the comment, we changed the sentence to: “It is worth noting that both vectors induced GFP expression at one day post-injection”
  
  390 Given the duration of mouse spermatogenesis (~35 days), for GFP to persist past that time suggests that it was maintained in SSCs? How can the authors explain how such a strong signal was maintained after such a long period of time? How stable are the episomally-maintained plasmids, are they maintained 100% for months? And if they are inherited by progeny of SSCs, shouldn't they be successively diluted over time? And if they are inherited by daughter cells such that they would still be expressed 49 days after injection, shouldn't all the cells originating from that SSC also be positive, instead of what appear to be small subsets as shown in Fig. 3H2? Overall, this reviewer is struggling to understand how a plasmid would be inherited and passed through spermatogenesis in the manner seen in these results.
  
  Thanks for the comment.
  
  This point was already raised in the public review. We paste our answer here: “The non-insertional Enhanced Episomes Vector (EEV) plasmid is a non-viral episome based on the Epstein-Barr virus (EBV). Its maintenance within the cell is made possible by its ability to replicate in a synchronous manner with the host genome and to segregate into daughter cells. This is due to the fact that EEV is composed of two distinct elements derived from EBV: an origin of replication (oriP) and an Epstein-Barr Nuclear Antigen 1 (EBNA1) expression cassette (Gil, Gallaher, and Berk, 2010). The oriP is a locus comprising two EBNA1-binding domains, designated as the Family of Repeats (FR) and Dyad Symmetry (DS). The FR is an array of approximately 20 EBNA1-binding sites (20 repeats of 30 bp) with high affinity, while the DS comprises four lower-affinity sites operating in tandem (Ehrhardt et al., 2008).
  
  The 641-amino-acid EBNA1 protein contains numerous domains. The N-terminal domains are rich in glycines and alanines, which enable interaction with host chromosomes. The C-terminal region is responsible for binding to oriP (Hodin, Najrana, and Yates, 2013a). The binding of EBNA1 to the DS element results in the recruitment of the origin of replication. This results in the synchronous initiation of extra-chromosomal EEV replication with host DNA at each S phase of the cell cycle (Düzgüneş, Cheung, and Konopka 2018a). Furthermore, EBNA1 binding to the FR domain induces the formation of a bridge between metaphase chromosomes and the vector during mitosis. This binding is responsible for the segregation of the EEV episome in daughter cells (Düzgüneş, Cheung, and Konopka 2018b). It is notable that EEV is maintained at a rate of 90-95% per cell division.”
  
  Because of the intrinsic properties of EEV described above, the presence of the reporter protein at 119 days after injection was likely due to the maintenance of the plasmid, mostly in Sertoli cells, and not to integration of the plasmid into the host DNA.
  
  Of note, the specificity of EEV was already indicated in the introduction. Nevertheless, we have added more information about it to help the readers (lines 124-128 clean copy)
  
  398 Which "cell types"?
  
  Your feedback is greatly appreciated, and the sentence in question has been amended to make it clearer. The revised wording is as follows: “These results suggest that GFP-mRNA and EEV-GFP targeted different seminiferous cell types, such as Sertoli cells and all germline cells, or that there were differences in terms of transfection efficiency.”
  
  409 Why is it important to inject similar copies of EEV and mRNA? Wouldn't the EEV be expected to generate many, many more copies of RNA per molecule than the mRNAs when injected directly??
  
  We removed the word "importantly".
  
  415 How is an injected naked mRNA stably maintained for 3 weeks? What is the stability of this mRNA?? Wouldn't its residence in germ cells for 21 days make it more stable than even the most stable endogenous mRNAs? Even mRNAs for housekeeping genes such as actin, which are incredibly stable, have half-lives of 9-10 hours.
  
  We appreciate your inquiry and concur with your assessment that mRNA stability is limited. It is our hypothesis that the source of the confusion lies in the fact that we injected mRNA coding for the GFP protein, rather than mRNA tagged with GFP. After a three-week observation period, we did not observe the mRNA, but we observed the expression of the GFP protein induced by the mRNA. To draw the reader's attention to this point, we have added the following sentence to the text: “It is important to underline that the signal measured is the fluorescence emitted by the GFP. This signal is dependent of both the half-lives of the plasmid/mRNA and the GFP. Therefore, the kinetic of the signal persistence (which is called here expression) is a combination of the persistence of the vector and the synthetized protein.” See lines 469-472 (clean copy).
  
  This being said, it is difficult to compare the lifespan of a cellular mRNA with that of an mRNA that has been modified at different levels, including 5’ cap, mRNA body, and poly(A) tail modifications, all of which increase mRNA stability and translation (see The Pivotal Role of Chemical Modifications in mRNA Therapeutics (2022) https://doi.org/10.3389/fcell.2022.901510). This question is discussed in lines 687-698 (clean copy).
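  To illustrate the point above, that the measured fluorescence reflects both the persistence of the vector and that of the synthesized protein, the following is a minimal sketch of a two-step first-order model (mRNA decay, followed by protein synthesis and decay). The model itself, the function name `protein_signal`, and all half-life values are assumptions chosen for illustration only; they are not measurements from this study.

```python
import math


def protein_signal(t_days, mrna_half_life, protein_half_life,
                   synthesis_rate=1.0, mrna_initial=1.0):
    """Relative protein level at time t for a hypothetical first-order model.

    dM/dt = -k_m * M            (the injected mRNA decays)
    dP/dt = s * M - k_p * P     (protein is synthesized from mRNA and decays)
    with M(0) = mrna_initial and P(0) = 0.
    """
    k_m = math.log(2) / mrna_half_life
    k_p = math.log(2) / protein_half_life
    if math.isclose(k_m, k_p):
        # Limiting case when both decay rates are equal.
        return synthesis_rate * mrna_initial * t_days * math.exp(-k_m * t_days)
    return (synthesis_rate * mrna_initial / (k_p - k_m)
            * (math.exp(-k_m * t_days) - math.exp(-k_p * t_days)))


# Hypothetical half-lives (in days): same mRNA, short-lived vs long-lived protein.
for protein_half_life in (0.5, 5.0):
    levels = [protein_signal(t, 1.0, protein_half_life) for t in (1, 7, 21)]
    print(protein_half_life, [f"{level:.2e}" for level in levels])
```

  In this sketch, a longer protein half-life keeps the signal detectable well after the mRNA itself has decayed, which is the qualitative behaviour described above.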
  
  467 "safely" should be deleted
  
  Thanks, we removed the word: “To validate and confirm the capacity of naked mRNA to express proteins in the testes after injection and electroporation”
  
  470 Except that apoptotic cells were clearly seen in Figure 2:
  
  We would like to thank the reviewer for their comment. We agree that the staining of the provided sections was of heterogeneous quality. To address the remark, we carried out additional HE staining for all conditions, and we now present correctly stained testis sections obtained in the different conditions in Fig. 2 and Supp. 7. Our observations revealed that the number of apoptotic cells remained consistent across all conditions.
  
  471 "remanence"?
  
  We appreciate your feedback and have amended the sentence to provide clear meaning. The revised wording is as follows: “The assessment of the temporal persistence of testicular mCherry fluorescent protein expression revealed a robust red fluorescence from day 1 post-injection, which remained detectable for at least 15 days (Fig. Supp. 3 B2, C2, and D2).”
  
  489 IF measures steady-state protein levels, not translation; should say you determined when ARMC2 was detectable.
  
  Thanks for the remark, we changed the sentence to: “ By IF, we determined when ARMC2 protein was detectable during spermatogenesis.”
  
  491 Flagella
  
  Thanks for the comment, we changed our mistake: “in the flagella of the elongated spermatids (Fig 9A)”
  
  Discussion
  
  The Discussion is largely a re-hashing of the Methods and Results, with additional background.
  
  Message stability must be addressed - how is a naked mRNA maintained for 21 days?
  
  As previously stated, it is our hypothesis that the source of the confusion lies in the fact that we injected mRNA coding for the GFP protein, rather than mRNA tagged with GFP. After a three-week observation period, we did not observe the mRNA, but we observed the synthesized GFP protein. This point and the stability of proteins in the testis are now discussed in lines 677-684 (clean copy).
  
  556 How do the authors define "safe"?
  
  Thanks for the comment, we changed the sentence to be clearer: “Our results also showed that the combination of injection and electroporation did not perturb spermatogenesis when electric pulses are carefully controlled”
  
  563 Synthesized
  
  Thanks, we changed it accordingly
  
  602 Again, this was not apparent, as there were more apoptotic cells in Fig. 2 - data must be provided to show "no effect".
  
  As previously stated, we carried out additional HE staining for all conditions, as can be observed in Fig. 2 . Our observations revealed that the number of apoptotic cells remained consistent across all conditions.
  
  629-30 This directly contradicts the authors' contention in the Introduction that ICSI was unsafe - how is this procedure going to be an advancement over ICSI as proposed, if ICSI needs to be used?? Why not just skip all this and do ICSI then?? Perhaps if this technique was used to 'repair' defects in spermatogonia or spermatocytes, then that makes more sense. But if ICSI is required, then this is not an advancement when trying to rescue a sperm morphology/motility defect.
  
  In light of the latest findings (Fig 12), we have revised this part of the discussion and this paragraph no longer exists.
  
  Nevertheless, to specifically address the reviewer’s remark, we would like to underline that ICSI with sperm from a fertile donor is always more efficient than ICSI with sperm from a patient suffering from OAT. Our strategy, by improving sperm quality, will improve the efficiency of ICSI and ultimately increase the live birth rate resulting from the first fresh IVF cycle.
  
  640-2 What is meant by "sperm organelles" And what examples are provided for sperm proteins being required at or after fertilization?
  
  This paragraph was also strongly modified and the notion of protein persistence during spermatogenesis was discussed in the paragraph on fluorescent signal duration. See lines 698-705.
  
  651 "Dong team"??
  
  Thanks for the comment, we added the references.
  
  Figure 2D2 - tubule treated with EEV-GFP appears to have considerably more apoptotic cells - this reviewer counted ~10 vs 0 in control; also, many of the spermatocytes appear abnormal in terms of their chromatin morphology - the authors must address this by staining for markers of apoptosis - not fair to conclude there was no difference when there's a very obvious difference!
  
  We would like to thank the reviewer for their comment. This point was already addressed. As previously stated, we now provide new testis sections for all conditions (see Fig. 2). Our observations revealed that the number of apoptotic cells remained consistent across all conditions.
  
  Figure 2D3 staining is quite different than D1-2, likely a technical issue - looks like no hematoxylin was added? Need to re-stain so results can be compared to the other 2 figures
  
  As previously stated, we carried out additional HE staining for all conditions, and new images are provided, with similar staining.
  
  Figure 3 - the fluorescent images lack any context of tubule structure so it is nearly impossible to get a sense of what cells express GFP, or whether they're in the basal vs adluminal compartment - can the authors outline them? Indicate where the BM and lumen are.
  
  We would like to thank the reviewer for their comment. This figure actually provides a global view of green fluorescent protein (GFP) expression at the surface of the testis. The entire testis was placed under an inverted epifluorescence microscope, and a picture of the GFP signal was recorded. For this reason, it is impossible to delineate the BM and the lumen. It should be noted that the fluorescence likely originates from different seminiferous tubules.
  
  Author response image 1.
  
  So, for Figure 3 if the plasmid is being uptaken by cells and maintained as an episome, is it able to replicate? Likely not.
  
  Yes, it is an intrinsic property of the episome; see the detailed explanation provided above about the EEV plasmid.
  
  So, initially, it could be in spermatogonia, spermatocytes, and spermatids. As time progressed those initially positive spermatids and then spermatocytes would be lost - and finally, the only cells that should be positive would be the progeny of spermatogonia that were positive - but, as they proliferate shouldn't the GFP signal decline?
  
  Because EEV is able to replicate in a synchronous manner with the host genome and to segregate into daughter cells at a level of 90% of the mother cell, the expected decline is very slow.
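  As a worked illustration of this reasoning, the sketch below computes the fraction of cells expected to retain the episome after a given number of divisions, assuming that retention is constant and independent at each division. The function name `retained_fraction` and the numbers of divisions are hypothetical; only the 90-95% per-division retention range comes from this response.

```python
def retained_fraction(per_division_retention: float, divisions: int) -> float:
    """Fraction of a lineage still carrying the episome after `divisions`
    cell divisions, assuming a constant, independent retention probability
    at each division (a simplifying assumption)."""
    if not 0.0 <= per_division_retention <= 1.0:
        raise ValueError("per_division_retention must be between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return per_division_retention ** divisions


# Per-division retention values quoted in the response; division counts are
# hypothetical and only serve to show the shape of the decline.
for retention in (0.90, 0.95):
    for n in (1, 5, 10, 20):
        print(f"retention={retention:.2f} divisions={n:2d} "
              f"retained={retained_fraction(retention, n):.3f}")
```

  Under these assumptions the decline depends only on the number of divisions a cell undergoes, so the episome is lost more slowly in cell populations that divide rarely.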
  
  And, since clones of germ cells are connected throughout their development, shouldn't the GFP diffuse through the intercellular bridges so entire clones are positive? Was this observed?
  
  We did not perform IF experiments beyond 7 days after injection, a time too short to observe what the reviewer suggested. Moreover, although GFP synthesized from injected EEV was found in both germ cells and Sertoli cells at 1 day after injection (Fig 7), the reporter protein was only observable in Sertoli cells after one week. This result suggests that EEV is maintained only in Sertoli cells, thus preventing the observation of stained clones.
  
  Can these sections be stained for the ICB TEX14 so that clonality can be distinguished? Based on the apparent distance between cells, it appears some are clones, but many are not...
  
  We thank the reviewer for this suggestion but we are not able to perform testis sectioning and costaining experiments because the PFA treatment bleaches the GFP signal. We also tested several GFP antibodies, but all failed.
  
  Nevertheless, we were able to localize and identify transfected cells thanks to whole-testis optical clearing, combined with measurement of GFP fluorescence and three-dimensional image reconstructions.
  
  For Figure 4, with the mRNA-GFP, why does the 1-day image (which looks similar to the plasmid-transfected) look so different from days 7-21?
  
  And why do days 7-21 look so different from those days in Fig 3?
  
  Thank you for your feedback. It is an excellent question. Because of the low resolution of whole-testis epifluorescence imaging and light penetration issues, we decided to carry out whole-testis optical clearing and three-dimensional image reconstruction experiments, in order to gain insight into the transfection process. At day 1, GFP synthesized from injected EEV was found in spermatogonia, spermatocytes and Sertoli cells (Fig 7). After one week, the reporter protein synthesized from injected EEV was only observable in Sertoli cells.
  
  In contrast, for mRNA, on day 1 and day 7 post-injection, the GFP fluorescent signal was associated with both Sertoli cells and germ cells. This explains why the mRNA-GFP and EEV-GFP patterns are similar at day 1 and different at day 7.
  
  Why do the authors think the signal went from so strong at 21 to undetectable at 28? What changed so drastically over those 7 days?
  
  What is the half-life of this mRNA supposed to be? It seems that 21 days is an unreasonably long time, but then to go to zero at 28 seems also odd... Please provide some explanation, and context for whether the residence of an exogenous mRNA for 21 days is expected.
  
  As previously stated, it is our hypothesis that the source of the confusion lies in the fact that we injected mRNA coding for the GFP protein, rather than mRNA tagged with GFP. After a three-week observation period, we did not observe the mRNA, but we observed the GFP protein produced by the mRNA. The time of observation of the reporter proteins expressed by the respective mRNA molecules (mCherry, luciferase, or GFP) ranged from 15 to 21 days. Proteins have very different turnover rates, with half-lives ranging from minutes to days. Half-lives depend on proteins but also on tissues. As explained in the discussion, it has been demonstrated that proteins involved in spermatogenesis exhibit a markedly low turnover rate and this explains the duration of the fluorescent signal.
  
  The authors should immunostain testis sections from controls and those with mRNA and plasmid and immunostain with established germ cell protein fate markers to show what specific germ cell types are GFP+
  
  Thank you for your feedback. As previously mentioned, we were unable to perform testis sectioning and co-staining because the PFA treatment bleaches the GFP signal and because we were unable to reveal GFP with a GFP antibody, for unknown reasons.
  
  For the GFP signal to be maintained past 35 days, the plasmid must have integrated into SSCs - and for that to happen, the plasmid would have to cross the blood-testis-barrier... is this expected?
  
  We are grateful for your observation.
  
  First, as explained above, we do not think that the plasmid has been integrated.
  
  Concerning the blood-testis barrier, it bears noting that electroporation is a technique that is widely utilized in biotechnology and medicine for the delivery of drugs and the transfer of genes into living cells (Boussetta, Lebovka et al. 2009). This process entails the application of an electric current, which induces the formation of hydrophilic pores in the lipid bilayer of the plasma membrane (Kanduser, Miklavcic et al. 2009). The pores remain stable throughout the electroporation process and then close again once it is complete. Consequently, as electroporation destabilizes the cell membrane, it can also destabilize the gap junctions responsible for the blood-testis barrier. This was actually confirmed by several studies, which have observed plasmid transfection beyond the blood-testis barrier with injection into rete testis following electroporation (Muramatsu, Shibata et al. 1997, Kubota, Hayashi et al. 2005, Danner, Kirchhoff et al. 2009, Kanduser, Miklavcic et al. 2009, Michaelis, Sobczak et al. 2014).
  
  Figure 9 - authors should show >1 cell - this is insufficient; also, it's stated it's only in the flagella, but it also appears to be in the head as well. And is this just the principal piece?? And are the authors sure those are elongating vs condensing spermatids? Need to show multiple tubules, at different stages, to make these claims
  
  We have partly answered this question in the public review. We paste our answer here:
  
  “We present now new images showing the full seminiferous tubules as requested (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatids. We have also added in this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that ArmC2 expression is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text.”
  
  Concerning the localization of the protein in the head, we confirm that the base of the manchette is stained but we have no explanation so far. This point is now indicated in the manuscript.
  
  Figure 10B2 image - a better resolution is necessary
  
  We are grateful for your feedback. We concede that the quality of the image was not optimal. Consequently, we have replaced it with an alternative.
  
  Figure 11 - in control, need to show >1 sperm; and lower-mag images should be provided for all samples to show population-wide effects; showing 1 "normal" sperm per group (white arrows) is insufficient:
  
  We are grateful for your feedback. We conducted further experiments and now provide additional images in Supp. figure 8.
  
  Reviewer #3 (Recommendations For The Authors):
  
  In this study, Vilpreux et al. developed a microinjection/electroporation method in order to transfect RNA into testicular cells. The authors studied several parameters of treated testis and compared the injection of DNA versus RNA. Using the injection of Armc2 RNA into mice with an Armc2 knockout the authors were able to (partly) rescue the fertility phenotype.
  
  Minor points.
  
  Figure 6 + lines 553+554: might it be that the staining pattern primarily on one side of the testis is due to the orientation of the scissor electrode during the electroporation procedure and the migration direction of negatively charged RNA molecules (Figure 6)?
  
  Your input is greatly appreciated. We concur that the observed peripheral expression is due to both the electroporation and injection. Accordingly, we have amended the sentence as follows: "The peripheral expression observed was due to the close vicinity of cells to the electrodes, and to a peripheral dispersal of the injected solution, as shown by the distribution of the fluorescent i-particles NIRFiP-180."
  
  Discussion of the safety aspect (lines 601-608): The authors state several times that there are no visible tissue changes after the electroporation procedure. However, in order to claim that this procedure is "safe", it is necessary to examine the offspring born after microinjection/electroporation.
  
  Your input is greatly appreciated. Consequently, the term "safe" has been replaced with "did not perturb spermatogenesis" in accordance with the provided feedback. Your assertion is correct; an examination of the offspring born would be necessary to ascertain the safety of the procedure. Due to the quantity of motile sperm obtained, it was not possible to produce offspring through natural mating. However, novel Armc2-/--rescued sperm samples have been produced and in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) experiments have been conducted. The results demonstrate that the Armc2-/--rescued sperm can successfully fertilize eggs and produce two-cell embryos by IVF and blastocysts by ICSI. These outcomes are visually represented in Figure 12. The development of embryos up to the blastocyst stage is a step in the right direction.
  
  The discussion section could be shortened. Lines 632-646 are largely a repetition of the introductory section. In addition, the Dong paper (ref. 25) may be interesting; however, this part could also be shortened (lines 647-676). This reviewer would prefer the authors to focus on the technique (different application sites and applied nucleotides) and proof of concept for (partial) phenotype rescue in the knockout mice.
  
  Your contribution is highly valued. In light of your observations and the latest findings, we have substantially revised the discussion accordingly.
  
  Line 63: oocytes rather than eggs.
  
  We are grateful for your input, but we have decided to retain the term "eggs" rather than "oocytes" because an oocyte is defined as a female gametocyte or germ cell involved in reproduction. In other words, an oocyte is a germ cell inside the ovary, and it becomes an egg after ovulation.
  
  Boussetta, N., N. Lebovka, E. Vorobiev, H. Adenier, C. Bedel-Cloutour and J. L. Lanoiselle (2009). "Electrically assisted extraction of soluble matter from chardonnay grape skins for polyphenol recovery." J Agric Food Chem 57(4): 1491-1497.
  
  Cavallini, G. (2006). "Male idiopathic oligoasthenoteratozoospermia." Asian J Androl 8(2): 143-157.
  
  Colpi, G. M., S. Francavilla, G. Haidl, K. Link, H. M. Behre, D. G. Goulis, C. Krausz and A. Giwercman (2018). "European Academy of Andrology guideline Management of oligo-asthenoteratozoospermia." Andrology 6(4): 513-524.
  
  Coutton, C., G. Martinez, Z. E. Kherraf, A. Amiri-Yekta, M. Boguenet, A. Saut, X. He, F. Zhang, M. Cristou-Kent, J. Escoffier, M. Bidart, V. Satre, B. Conne, S. Fourati Ben Mustapha, L. Halouani, O. Marrakchi, M. Makni, H. Latrous, M. Kharouf, K. Pernet-Gallay, M. Bonhivers, S. Hennebicq, N. Rives, E. Dulioust, A. Toure, H. Gourabi, Y. Cao, R. Zouari, S. H. Hosseini, S. Nef, N. Thierry-Mieg, C. Arnoult and P. F. Ray (2019). "Bi-allelic Mutations in ARMC2 Lead to Severe Astheno-Teratozoospermia Due to Sperm Flagellum Malformations in Humans and Mice." Am J Hum Genet 104(2): 331-340.
  
  Danner, S., C. Kirchhoff and R. Ivell (2009). "Seminiferous tubule transfection in vitro to define postmeiotic gene regulation." Reprod Biol Endocrinol 7: 67.
  
  Gan, H., L. Wen, S. Liao, X. Lin, T. Ma, J. Liu, C. X. Song, M. Wang, C. He, C. Han and F. Tang (2013). "Dynamics of 5-hydroxymethylcytosine during mouse spermatogenesis." Nat Commun 4: 1995. Igyarto, B. Z. and Z. Qin (2024). "The mRNA-LNP vaccines - the good, the bad and the ugly?" Front Immunol 15: 1336906.
  
  Ishii, T. (2017). "Germ line genome editing in clinics: the approaches, objectives and global society." Brief Funct Genomics 16(1): 46-56.
  
  Kanduser, M., D. Miklavcic and M. Pavlin (2009). "Mechanisms involved in gene electrotransfer using high- and low-voltage pulses--an in vitro study." Bioelectrochemistry 74(2): 265-271.
  
  Kubota, H., Y. Hayashi, Y. Kubota, K. Coward and J. Parrington (2005). "Comparison of two methods of in vivo gene transfer by electroporation." Fertil Steril 83 Suppl 1: 1310-1318.
  
  Michaelis, M., A. Sobczak and J. M. Weitzel (2014). "In vivo microinjection and electroporation of mouse testis." J Vis Exp(90).
  
  Muramatsu, T., O. Shibata, S. Ryoki, Y. Ohmori and J. Okumura (1997). "Foreign gene expression in the mouse testis by localized in vivo gene transfer." Biochem Biophys Res Commun 233(1): 45-49.
  
  Raina, A., S. Kumar, R. Shrivastava and A. Mitra (2015). "Testis mediated gene transfer: in vitro transfection in goat testis by electroporation." Gene 554(1): 96-100.
  
  Sadelain, M., E. P. Papapetrou and F. D. Bushman (2011). "Safe harbours for the integration of new DNA in the human genome." Nat Rev Cancer 12(1): 51-58.
  
  Thonneau, P., S. Marchand, A. Tallec, M. L. Ferial, B. Ducot, J. Lansac, P. Lopes, J. M. Tabaste and A. Spira (1991). "Incidence and main causes of infertility in a resident population (1,850,000) of three French regions (1988-1989)." Hum Reprod 6(6): 811-816.
  
  Usmani, A., N. Ganguli, H. Sarkar, S. Dhup, S. R. Batta, M. Vimal, N. Ganguli, S. Basu, P. Nagarajan and S. S. Majumdar (2013). "A non-surgical approach for male germ cell mediated gene transmission through transgenesis." Sci Rep 3: 3430.
  
  Wang, L., C. Liu, H. Wei, Y. Ouyang, M. Dong, R. Zhang, L. Wang, Y. Chen, Y. Ma, M. Guo, Y. Yu, Q. Y. Sun and W. Li (2022). "Testis electroporation coupled with autophagy inhibitor to treat nonobstructive azoospermia." Mol Ther Nucleic Acids 30: 451-464.
  
URL

biorxiv.org/content/10.1101/2023.12.12.571239v3
www.biorxiv.org www.biorxiv.org

Large-scale characterization of drug mechanism of action using proteome-wide thermal shift assays

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This is an interesting and potentially important paper, which however has some deficiencies.
  
  Strengths:
  
  A significant amount of potentially useful data.
  
  Weaknesses:
  
  One issue is a confusion of thermal stability with solubility. The thermal stability of a protein is a thermodynamic parameter that can be described by the Gibbs-Helmholtz equation, which gives the free energy difference between the folded and unfolded states as a function of temperature, as well as the entropy of unfolding. What is actually measured in PISA, by contrast, is a change in protein solubility, which is an empirical parameter affected by a great many variables, including the presence and concentration of other ambient proteins and other molecules. One might possibly argue that in TPP, where one measures the melting temperature change ∆Tm, thermal stability plays a decisive or at least an important role, but no such assertion can be made in PISA analysis that measures the solubility shift.
  
  We completely agree with the insightful comment from the reviewer and we are very grateful that the point was raised. Our goal was to make this manuscript easily accessible to the entire scientific community, not just experts in the field. In an attempt to simplify the language, we likely also simplified the underlying physical principles that these assays exploit. In defense of our initial manuscript, we did state that PISA measures “a fold change in the abundance of soluble protein in a compound-treated sample vs. a vehicle-treated control after thermal denaturation and high-speed centrifugation.” Despite this attempt to accurately communicate the reviewer’s point, we seem to have not been sufficiently clear. Therefore, we tried to further elaborate on this point and made it clear that we are measuring differences in solubility and interpreting these differences as changes in thermal stability.
  
  In the revised version of the manuscript, we elaborated significantly on our original explanation. The following excerpt appears in the introduction (p. 3):
  
  “So, while CETSA and TPP measure a change in melting temperature (∆TM), PISA measures a change in solubility (∆SM). Critically, there is a strong correlation between ∆TM and ∆SM, which makes PISA a reliable, if still imperfect, surrogate for measuring direct changes in protein thermal stability (Gaetani et al., 2019; Li et al., 2020). Thus, in the context of PISA, a change in protein thermal stability (or a thermal shift) can be defined as a fold change in the abundance of soluble protein in a compound-treated sample vs. a vehicle-treated control after thermal denaturation and high-speed centrifugation. Therefore, an increase in melting temperature, which one could determine using CETSA or TPP, will lead to an increase in the area under the curve and an increase in the soluble protein abundance relative to controls (positive log2 fold change). Conversely, a decrease in melting temperature will result in a decrease in the area under the curve and a decrease in the soluble protein abundance relative to controls (negative log2 fold change).”
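
  The relationship described in this excerpt can be illustrated with a toy model. The sketch below is not the authors' analysis: it assumes a two-state sigmoidal melting curve, a hypothetical 48-58 °C temperature gradient, made-up melting temperatures, and an arbitrary slope, and it treats the summed soluble fraction across the gradient as a stand-in for the pooled PISA signal.

```python
import math

def soluble_fraction(temp_c, tm_c, slope=2.0):
    """Assumed two-state sigmoid: fraction of protein still soluble at temp_c."""
    return 1.0 / (1.0 + math.exp((temp_c - tm_c) / slope))

def pisa_log2_fold_change(tm_vehicle, tm_treated, temps):
    """log2 ratio of summed soluble fractions (treated vs. vehicle)."""
    auc_vehicle = sum(soluble_fraction(t, tm_vehicle) for t in temps)
    auc_treated = sum(soluble_fraction(t, tm_treated) for t in temps)
    return math.log2(auc_treated / auc_vehicle)

temps = [48 + i for i in range(11)]  # hypothetical gradient, 48-58 °C

# A compound that raises Tm (stabilizes) vs. one that lowers Tm (destabilizes).
stabilized = pisa_log2_fold_change(52.0, 55.0, temps)
destabilized = pisa_log2_fold_change(52.0, 49.0, temps)
unchanged = pisa_log2_fold_change(52.0, 52.0, temps)
```

  Under these assumptions, a higher melting temperature yields a positive log2 fold change and a lower one yields a negative value. The magnitude depends on the assumed slope and gradient, which is one reason the solubility shift is only a surrogate for ∆TM.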
  
  And the following excerpt appears in the results section (p. 4):
  
  “In a PISA experiment, a change in melting temperature or a thermal shift is approximated as a significant deviation in soluble protein abundance following thermal melting and high-speed centrifugation. Throughout this manuscript, we will interpret these observed alterations in solubility as changes in protein thermal stability. Most commonly this is manifested as a log2 fold change comparing the soluble protein abundance of a compound-treated sample to a vehicle-treated control (Figure 1 – figure supplement 1A).”
  
  We have now drawn a clear distinction between what we were actually measuring (changes in solubility) and how we were interpreting these changes (as thermal shifts). We trust that the Reviewer will agree with this point, as they rightly note that many of the observations presented in our work, which measures thermal stability indirectly, are consistent with previous studies that measured thermal stability directly. Again, we thank the reviewer for raising the point and feel that these changes have significantly improved the manuscript.
  
  Another important issue is that the authors claim to have discovered for the first time a number of effects well described in prior literature, sometimes a decade ago. For instance, they marvel at the differences between the solubility changes observed in lysate versus intact cells, while this difference has been investigated in a number of prior studies. No reference to these studies is given during the relevant discussion.
  
  We thank the reviewer for raising this point. Our aim with this paper was to test the proficiency of this assay in high-throughput screening-type applications. We considered these observations as validation of our workflow, but admit that our choice of wording was not always appropriate and that we should have included more references to previous work. It was certainly never our intention to take credit for these discoveries. Therefore, we were more than happy to include more references in the revised version. We think that this makes the paper considerably better and will help readers better understand the context of our study.
  
  The validity of statistical analysis raises concern. In fact, no calculation of statistical power is provided.
  
  As only two replicates were used in most cases, the statistical power must have been pretty limited. Also, there seems to be an absence of the multiple-hypothesis correction.
  
  We agree with the reviewer that a classical comparison using a t-test would be underpowered comparing all log2 normalized fold changes. We know from the data and our validation experiments that stability changes that generate log2 fold changes of 0.2 are indicative of compound engagement. When we use 0.2 to calculate power for a standard two-sample t-test with duplicates, we estimated this to have a power of 19.1%. Importantly, increasing this to n=3 resulted in a power estimate of only 39.9%, which would canonically still be considered to be underpowered. Thus, it is important to note that we instead use the distribution of all measurements for a single protein across all compound treatments to calculate standard deviations (nSD) as presented in this work. Thus, rather than a 2-by-2 comparison, we are comparing two duplicate compound treatments to 94 other compound treatments and 18 DMSO vehicle controls. Moreover, we are using this larger sample set to estimate the sampling distribution. Estimating this with a standard z-test would result in a p-value estimate <<< 0.0001 using the population standard deviation. Additionally, rather than estimate an FDR using, say, a Benjamini-Hochberg correction, we estimated an empirical FDR for target calls based on applying the same cutoffs to our DMSO controls and measuring the proportion of hits called in control samples at each set of thresholds. Finally, we note that several other PISA-based methods have used fold-change thresholds similar to, or less than, those employed in this work (PMID: 35506705, 36377428, 34878405, 38293219).
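
  The thresholding logic described in this response can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are hypothetical values for a single protein, the function names are invented, and the cutoffs (|nSD| > 3.5 and |log2 fold change| > 0.2) are the ones quoted in the reviews.

```python
import statistics

NSD_CUTOFF = 3.5       # cutoffs as quoted in the reviews
LOG2_FC_CUTOFF = 0.2

def call_hits(log2_fc):
    """Return treatments whose log2 fold change passes both cutoffs for one protein."""
    values = list(log2_fc.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD across all treatments
    return [name for name, fc in log2_fc.items()
            if abs((fc - mean) / sd) > NSD_CUTOFF and abs(fc) > LOG2_FC_CUTOFF]

def control_hit_rate(hits, control_names):
    """Fraction of vehicle controls called as hits (empirical false-call estimate)."""
    return sum(name in hits for name in control_names) / len(control_names)

# Hypothetical protein: 4 DMSO controls and 15 compounds near zero, one strong binder.
controls = [f"dmso_{i}" for i in range(4)]
log2_fc = {name: 0.02 * (-1) ** i for i, name in enumerate(controls)}
log2_fc.update({f"cmpd_{i}": 0.02 * (-1) ** i for i in range(15)})
log2_fc["binder"] = 1.2

hits = call_hits(log2_fc)
rate = control_hit_rate(hits, controls)
```

  In this toy example only the strong binder passes both cutoffs and no control is called, so the empirical false-call rate at these thresholds is zero. With real data the same cutoffs would be applied to every protein, and the control hit rate recomputed for each candidate pair of thresholds.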
  
  Also, the authors forgot that whatever results PISA produces, even at high statistical significance, represent just a prediction that needs to be validated by orthogonal means. In the absolute majority of cases such validation is missing.
  
  We appreciate this point and can assure the reviewer that it was not lost on us. We state throughout the paper that its primary purpose was to execute a chemical screen. Furthermore, we do not claim to present a definitive list of protein targets for each compound. Instead, our intention is to provide a framework for performing PISA studies at scale. In total, we quantified thousands of changes and feel that it would be unreasonable to validate the majority of these cases. Instead, as has been done for CETSA (PMID: 34265272), PISA (PMID: 31545609), and TPP (PMID: 25278616) experiments before, we chose to highlight a few examples and provide a reasonable amount of validation for these specific observations. In Figure 2, we show that two screening compounds, palbociclib and NVP-TAE-226, have a similar impact on PLK1 solubility as the two known PLK1 inhibitors. We then assay each of these compounds, alongside BI 2536, and show that the same compounds that impact the solubility of PLK1 also inhibit its activity in cell-based assays. Finally, we model the structure of palbociclib (which is highly similar to BI 2536) in the PLK1 active site. In Figure 4, we show that AZD-5438 causes a change in solubility of RIPK1 in cell- and lysate-based assays to a similar extent as other compounds known to engage RIPK1. We then test these compounds in cell-based assays and show that they are capable of inhibiting RIPK1 activity in vivo. Finally, in Figure 5, we show that treatment with tyrosine kinase inhibitors and AZD-7762 results in a decrease in the solubility of CRKL. We showed that these compounds, specifically, prevented the phosphorylation of CRKL at Y207. Next, we show that AZD-7762 impacts the thermal stability of tyrosine kinases in lysate-based PISA. Finally, we performed phosphoproteomic profiling of cells treated with bafetinib and AZD-7762 and found that the abundance of many pY sites is decreased after treatment with each compound. It is also worth stating that an important goal of this study was to determine the proficiency of these methods in identifying the targets of each compound. We do not feel that comprehensive validation of the “absolute majority of cases” would significantly improve this manuscript.
  
  Finally, to be a community-useful resource the paper needs to provide the dataset with a user interface so that the users can data-mine on their own.
  
  We agree and are working to develop an extensible resource for this. Owing to the size and complexity of the dataset, that work will need to be included in a follow-up manuscript. For now, we feel that the supplemental table we provide allows the full dataset to be navigated easily. Indeed, this has been the main resource that we have been emailed about since the preprint was first made public. We are glad that the Reviewer considers this dataset to be a highly valuable resource for the scientific community.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Using K562 (Leukemia) cells as an experimental model, Van Vracken et al. use Thermal Proteome Profiling (TPP) to investigate changes in protein stability after exposing either live cells or crude cell lysates to a library of anti-cancer drugs. This was a large-scale and highly ambitious study, involving thousands of hours of mass spectrometry instrument time. The authors used an innovative combination of TPP together with Proteome Integral Solubility Alteration (PISA) assays to reduce the amount of instrument time needed, without compromising on the amount of data obtained.
  
  The paper is very well written, the relevance of this work is immediately apparent, and the results are well-explained and easy to follow even for a non-expert. The figures are well-presented. The methods appear to be explained in sufficient detail to allow others to reproduce the work.
  
  We thank the reviewer. One of our major goals was to make these assays and the resulting data approachable, especially for non-experts. We are glad that this turned out to be the case.
  
  Strengths:
  
  Using CDK4/6 inhibitors, the authors observe strong changes in protein stability upon exposure to the drug. This is expected and shows their methodology is robust. Further, it adds confidence when the authors report changes in protein stability for drugs whose targets are not well-known. Many of the drugs used in this study - even those whose protein targets are already known - display numerous off-target effects. Although many of these are not rigorously followed up in this current study, the authors rightly highlight this point as a focus for future work.
  
  Weaknesses:
  
  While the off-target effects of several drugs could've been more rigorously investigated, it is clear the authors have already put a tremendous amount of time and effort into this study. The authors have made their entire dataset available to the scientific community - this will be a valuable resource to others working in the fields of cancer biology/drug discovery.
  
  We agree with the reviewer that there are more leads here that could be followed and we look forward to both exploring these in future work and seeing what the community does with these data.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This work aims to demonstrate how recent advances in thermal stability assays can be utilised to screen chemical libraries and determine the compound mechanism of action. Focusing on 96 compounds with known mechanisms of action, they use the PISA assay to measure changes in protein stability upon treatment with a high dose (10 µM) in live K562 cells and whole cell lysates from K562 or HCT116. They intend this work to showcase a robust workflow that can serve as a roadmap for future studies.
  
  Strengths:
  
  The major strength of this study is the combination of live and whole cell lysates experiments. This allows the authors to compare the results from these two approaches to identify novel ligand-induced changes in thermal stability with greater confidence. More usefully, this also enables the authors to separate the primary and secondary effects of the compounds within the live cell assay.
  
  The study also benefits from the number of compounds tested within the same framework, which allows the authors to make direct comparisons between compounds.
  
  These two strengths are combined when they compare CHEK1 inhibitors and suggest that AZD-7762 likely induces secondary destabilisation of CRKL through off-target engagement with tyrosine kinases.
  
  Weaknesses:
  
  One of the stated benefits of PISA compared to the TPP in the original publication (Gaetani et al., 2019) was that the reduced number of samples required allows more replicate experiments to be performed. Despite this, the authors of this study performed only duplicate experiments. They acknowledge this precludes the use of frequentist statistical tests to identify significant changes in protein stability. Instead, they apply an 'empirically derived framework' in which they apply two thresholds to the fold change vs DMSO: absolute z-score (calculated from all compounds for a protein) > 3.5 and absolute log2 fold-change > 0.2. They state that the fold-change threshold was necessary to exclude nonspecific interactors. While the thresholds appear relatively stringent, this approach will likely reduce the robustness of their findings in comparison to an experimental design incorporating more replicates. Firstly, the magnitude of the effect size should not be taken as a proxy for the importance of the effect. They acknowledge this and demonstrate it using their data for PIK3CB and p38α inhibitors (Figures 2B-C). They have thus likely missed many small, but biologically relevant changes in thermal stability due to the fold-change threshold. Secondly, this approach relies upon the fold-changes between DMSO and compound for each protein being comparable, despite them being drawn from samples spread across 16 TMT multiplexes. Each multiplex necessitates a separate MS run and the quantification of a distinct set of peptides, from which the protein-level abundances are estimated. Thus, it is unlikely the fold changes for unaffected proteins are drawn from the same distribution, which is an unstated assumption of their thresholding approach. The authors could alleviate the second concern by demonstrating that there is very little or no batch effect across the TMT multiplexes. However, the first concern would remain. The limitations of their approach could have been avoided with more replicates and the use of an appropriate statistical test. It would be helpful if the authors could clarify if any of the missed targets passed the z-score threshold but fell below the fold-change threshold.
  
  The authors use a single, high concentration of 10 µM for all compounds. Given that many of the compounds likely have low nM IC50s, this concentration will often be multiple orders of magnitude above the one at which they inhibit their target. This makes it difficult to assess the relevance of the off-target effects identified to clinical applications of the compounds or biological experiments. The authors acknowledge this and use ranges of concentrations for follow-up studies (e.g. Figure 2E-F). Nonetheless, this weakness is present for the vast bulk of the data presented.
  
  We agree that there is potential to drive off-target effects at such high concentrations. However, we note that the concentration we employ is in the same range as previous PISA/CETSA/TPP studies. For example, 10 µM treatments were used in the initial descriptions of TPP (Savitski et al., 2014) and PISA (Gaetani et al., 2019). We also note that temperature may affect off-rates and binding interactions (PMID: 32946682), which may necessitate higher compound concentrations to overcome these effects.
  
  Additionally, these compounds likely accumulate in human plasma/tissues at concentrations that far exceed the compound IC50 values. For example, in patients treated with a standard clinical dose of ribociclib, the concentration of the compound in the plasma fluctuates between 1 µM and 10 µM (Bao, X., Wu, J., Sanai, N., & Li, J. (2019). Determination of total and unbound ribociclib in human plasma and brain tumor tissues using liquid chromatography coupled with tandem mass spectrometry. Journal of pharmaceutical and biomedical analysis, 166, 197–204. https://doi.org/10.1016/j.jpba.2019.01.017).
  
  The authors' claim that combining cell-based and lysate-based assays increases coverage (Figure 3F) is not supported by their data. The '% targets' presented in Figure 3F have a different denominator for each bar. As it stands, all 49 targets quantified in both assays which have a significant change in thermal stability may be significant in the cell-based assay. If so, the apparent increase in % targets when combining reflects only the subsetting of the data. To alleviate this lack of clarity, the authors could update Figure 3F so that all three bars present the % targets figure for just the 60 compounds present in both assays.
  
  We spent much time debating the best way to present this data, so we are grateful for the feedback. Consistent with the Reviewer’s suggestion, we have included a figure that only considers the 60 compounds for which a target was quantified in both cell-based and lysate-based PISA (now Figure 3E). In addition, we included a pie chart that further illustrates our point (now Figure 3 – figure supplement 2A). Of the 60 compounds, there were 37 compounds that had a known target pass as a hit using both approaches, 6 compounds that had a known target pass as a hit in only cell-based experiments, and 6 compounds that had a known target pass as a hit in only lysate-based experiments.
  
  Within the Venn diagram, we also included a few examples of compounds that fit into each category. Furthermore, we highlighted two examples of compound-target pairs that pass as a hit with one approach, but not the other (Figure 3 – figure supplement 2B,C). We would also like to refer the reviewer to Figure 4D, which indicates that BRAF inhibitors cause a significant change in BRAF thermal stability in lysates but not cells.
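
  The counts reported in this response make the common-denominator comparison easy to check. The short calculation below uses only the numbers given above (60 compounds; 37 hits by both approaches, 6 cell-only, 6 lysate-only); the variable and function names are illustrative.

```python
TOTAL_COMPOUNDS = 60  # compounds with a known target quantified in both assays
both, cell_only, lysate_only = 37, 6, 6

def percent_of_total(count):
    """Express a compound count as a percentage of the shared denominator."""
    return round(100 * count / TOTAL_COMPOUNDS, 1)

coverage = {
    "cell-based": percent_of_total(both + cell_only),
    "lysate-based": percent_of_total(both + lysate_only),
    "combined": percent_of_total(both + cell_only + lysate_only),
}
```

  With a shared denominator, each approach alone recovers a known target for 43 of 60 compounds (about 71.7%), and the two together recover one for 49 of 60 (about 81.7%).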
  
  Aims achieved, impact and utility:
  
  The authors have achieved their main aim of presenting a workflow that serves to demonstrate the potential value of this approach. However, by using a single high dose of each compound and failing to adequately replicate their experiments and instead applying heuristic thresholds, they have limited the impact of their findings. Their results will be a useful resource for researchers wishing to explore potential off-target interactions and/or mechanisms of action for these 96 compounds, but are expected to be superseded by more robust datasets in the near future. The most valuable aspect of the study is the demonstration that combining live cell and whole cell lysate PISA assays across multiple related compounds can help to elucidate the mechanisms of action.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  More specifically:
  
  P 1 l 20, we quantified 1.498 million thermal stability measurements.
  
  It's a staggering assertion, and it takes some reading to realize that the authors mean the total number of proteins identified and quantified in all experiments. But far from all of these proteins were quantified with enough precision to provide meaningful solubility shifts.
  
  We can assure the reviewer that we were not trying to deceive the readers. We stated ‘1.498 million thermal stability measurements.’ We did not say ‘1.498 million compound-specific thermal stability shifts.’ We assume that most readers will appreciate that the overall quality of the measurements will be variable across the dataset, as in any work that describes quantitation of thousands of proteins in a proteomics dataset. In accordance with the Reviewer’s suggestion, we have weakened this statement. The revised version of the manuscript now reads as follows (p. 1):
  
  “Taking advantage of this advance, we quantified more than one million thermal stability measurements in response to multiple classes of therapeutic and tool compounds (96 compounds in living cells and 70 compounds in lysates).”
  
  P 7 l 28. We observed a large range of thermal stability measurements for known compound-target pairs, from a four-fold reduction in protein stability to a four-fold increase in protein stability upon compound engagement (Figure 2A).
  
  PISA-derived solubility shift cannot be interpreted simply as a "four-fold reduction/increase in protein stability".
  
  We thank the Reviewer for highlighting this specific passage and agree that it was worded poorly. As such, we have modified the manuscript to the following (p. 8):
  
  “We observed a large range of thermal stability measurements for known compound-target pairs, from a four-fold reduction in protein solubility after thermal denaturation to a four-fold increase in protein solubility upon compound engagement (Figure 2A).”
  
  P 8, l 6. Instead, we posit that maximum ligand-induced change in thermal stability is target-specific.
  
  Yes, that's right, but this has been shown in a number of prior studies.
  
  We agree with the reviewer and accept that we made a mistake in how we worded this sentence, which we regret upon reflection. As such, we have modified this sentence to the following:
  
  “Instead, our data appears to be consistent with the previous observation that the maximum ligand-induced change in thermal stability is target-specific (Savitski et al., 2014; Becher et al., 2016).”
  
  P 11 l 7. Combining the two approaches allows for greater coverage of the cellular proteome and provides a better chance of observing the protein target for a compound of interest. In fact, the main difference is that in-cell PISA provides targets in cases when the compound is a pro-drug that needs to be metabolically processed before engaging the intended target. This has been shown in a number of prior studies, but not mentioned in this manuscript.
  
  While our study was not focused on the issue of pro-drugs, this is an important point and we would be happy to re-iterate it in our manuscript. We thank the Reviewer for the suggestion and have modified the manuscript to reflect this point (p. 19):
  
  “Cell-based studies, on the other hand, have the added potential to identify the targets of pro-drugs that must be metabolized in the cell to become active and secondary changes that occur independent of direct engagement (Savitski et al., 2014; Franken et al., 2015; Almqvist et al., 2016; Becher et al., 2016; Liang et al., 2022).”
  
  While we are happy to make this change, we also would like to point out that the reviewer’s assertion that “the main difference is that in-cell PISA provides targets in cases when the compound is a prodrug that needs to be metabolically processed before engaging the intended target” may not fully capture the nuances of protein engagement effectors in the cellular context. Thus, we believe it is important to highlight the ability of cell-based assays to identify secondary changes in thermal stability.
  
  P 11 l 28. These data suggest that the thermal destabilization observed in cell-based experiments might stem from a complex biophysical rearrangement. That's right because it is not about thermal stability, but about protein solubility which is much affected by the environment.
  
  We agree that the readout of solubility is an important caveat for nearly every experiment in the family of assays associated with ‘thermal proteome profiling’. Inherently complex biophysical arrangements could affect the inherent stability and solubility of a protein or complex. Thus, we would be happy to make the following change consistent with the reviewer’s suggestion (p. 12):
  
  “These data suggest that the decrease in solubility observed in cell-based experiments might stem from a complex biophysical rearrangement.”
  
  P 12 l 7 A). Thus, certain protein targets are more prone to thermal stability changes in one experimental setting compared to the other. Same thing - it's about solubility, not stability.
  
  We thank the Reviewer for the recommendation and have modified the revised manuscript as follows (p. 13):
  
  “Thus, certain protein targets were more prone to solubility (thermal stability) changes in one experimental setting compared to the other (Huber et al., 2015).”
  
  P13 l 15. While the data suggests that cell- and lysate-based PISA are equally valuable in screening the proteome for evidence of target engagement... No, they are not equally valuable - cell-based PISA can provide targets of prodrugs, which lysate PISA cannot.
  
  We have removed this sentence to avoid any confusion. We will not place any value judgments on the two approaches.
  
  P 18 l 10. In general, a compound-dependent thermal shift that occurs in a lysate-based experiment is almost certain to stem from direct target engagement. That's true and has been known for a decade. Reference needed.
  
  We recognize this oversight and would be happy to include references. The revised manuscript reads as follows:
  
  “In general, a compound-dependent thermal shift that occurs in a lysate-based experiment is almost certain to stem from direct target engagement (Savitski et al., 2014; Becher et al., 2016). This is because cell signaling pathways and cellular structures are disrupted and diluted. Cell-based studies, on the other hand, have the added potential to identify the targets of pro-drugs that must be metabolized in the cell to become active and secondary changes that occur independent of direct engagement (Savitski et al., 2014; Franken et al., 2015; Almqvist et al., 2016; Becher et al., 2016; Liang et al., 2022).”
  
  P 18 l 29. the data seemed to indicate that the maximal PISA fold change is protein-specific. Therefore, a log2 fold change of 2 for one compound-protein pair could be just as meaningful as a log2 fold change of 0.2 for another. This is also not new information.
  
  We again appreciate the Reviewer for highlighting this oversight. The revised manuscript reads as follows:
  
  “Ultimately, the data seemed to be consistent with previous studies that indicate the maximal change in thermal stability is protein specific (Savitski et al., 2014; Becher et al., 2016; Sabatier et al., 2022). Therefore, a log2 fold change of 2 for one compound-protein pair could be just as meaningful as a log2 fold change of 0.2 for another.”
  
  P 19 l 5. Specifically, the compounds that most strongly impacted the thermal stability of targets, also acted as the most potent inhibitors. I wish this was true, but this is not always so. For instance, in Nat Meth 2019, 16, 894-901 it was postulated that large ∆Tm correspond to biologically most important sites ("hot spots") - the idea that was later challenged and largely discredited in subsequent studies.
  
  Indeed, we agree with the Reviewer that there may be no essential connection between these. Rather, we are simply drawing conclusions from observations within the presented dataset.
  
  Saying nothing about the work presented in the paper that the reviewer notes above, the referenced definition is also more nuanced “…we hypothesized that ‘hotspot’ modification sites identified in this screen (namely, those significantly shifted relative to the unmodified, bulk and even other phosphomodiforms of the same protein) may represent sites with disproportionate effects on protein structure and function under specific cellular conditions.” Indeed, in the response to that work, Potel et al. (https://doi.org/10.1038/s41592-021-01177-5) “agree with the premise of the Huang et al. study that phosphorylation sites that have a significant effect on protein thermal stability are more likely to be functionally relevant, for example, by modulating protein conformation, localization and protein interactions.”
  
  Anecdotally, we also speculate that if we observe proteome engagement for two compounds (let’s say two ATP-competitive kinase inhibitors) that bind in the same pocket (let’s say the ATP binding site) and one causes a greater change in solubility, then it is reasonable to assume that this constitutes stronger evidence, and we see support for this claim in Figure 2, Figure 3, Figure 4, and Figure 5.
  
  It is also important to point out that previous work has made similar points. This is highlighted in a review article by Mateus et al. (10.1186/s12953-017-0122-4). The authors state, “To obtain affinity estimates with TPP, a compound concentration range TPP (TPP-CCR) can be performed. In TPP-CCR, cells are incubated with a range of concentrations of compound and heated to a single temperature.” In support of this claim, the authors reference two papers: Savitski et al., 2014 and Becher et al., 2016. We have updated this section in the revised manuscript (p. 20):
  
  “While the primary screen was carried out at fixed dose, the increased throughput of PISA allowed for certain compounds to be assayed at multiple doses in a single experiment. In these instances, there was a clear dose-dependent change in thermal stability of primary targets, off-targets, and secondary targets. This not only helped corroborate observations from the primary screen, but also seemed to provide a qualitative assessment of relative compound potency in agreement with previous studies (Savitski et al., 2014; Becher et al., 2016; Mateus et al., 2017). Specifically, the compounds that most strongly impacted the thermal stability of targets, also acted as the most potent inhibitors. In order to be a candidate for this type of study, a target must have a large maximal thermal shift (magnitude of log2 fold change) because there must be a large enough dynamic range to clearly resolve different doses.”
  
  Also, the compound efficacy is strongly dependent upon the residence time of the drug, which may or may not correlate with the PISA shift. Also important is the concentration at which target engagement occurs (Anal Chem 2022, 94, 15772-15780).
  
  In our study, the time and concentration of treatment were fixed for all compounds at 30 minutes and 10 µM, respectively. Therefore, we do not believe these parameters will affect our conclusions.
  
  P 19 l 19. For example, we found that the clinically-deployed CDK4/6 inhibitor palbociclib is capable of directly engaging and inhibiting PLK1. This is a PISA-based prediction that needs to be validated by orthogonal means.
  
  As we demonstrate in this work, the PISA assays serve as powerful screening methods, thus we agree that validation is important for these types of studies. To this end, we show the following:
  
  • Proteomics: Palbociclib causes a decrease in solubility following thermal melting in cells.
  
  • Chemical informatics: Palbociclib is structurally similar to BI 2536.
  
  • Protein informatics: Modeling of palbociclib in empirical structures of the PLK1 active site generates negligible steric clashes.
  
  • Biochemical: Palbociclib inhibits PLK1 activity in cells.
  
  We have changed this text to the following to clarify these points:
  
  “For example, we found that the clinically-deployed CDK4/6 inhibitor palbociclib has a dramatic impact on PLK1 thermal stability in live cells, is capable of inhibiting PLK1 activity in cell-based assays, and can be modelled into the PLK1 active site.”
  
  Reviewer #2 (Recommendations For The Authors):
  
  I am wondering why the authors chose to use K562 (leukaemia) cells in this work as opposed to a different cancer cell line (HeLa? Panc1?). It would be helpful if the authors could present some rationale for this decision.
  
  This is a great question. Two reasons really. First, they are commonly used in various fields of research, especially previous studies using proteome-wide thermal shift assays (PMID: 25278616, 32060372) and large scale chemical perturbations screens (PMID: 31806696). Second, they are a suspension line that makes executing the experiments easier because they do not need to be detached from a plate prior to thermal melting. We think this is a valuable point to make in the manuscript, such that non-experts understand this concept. We tried to communicate this succinctly in the revised manuscript, but would be happy to elaborate further if the Reviewer would like us to.
  
  “To enable large-scale chemical perturbation screening, we first sought to establish a robust workflow for assessing protein thermal stability changes in living cells. We chose K562 cells, which grow in suspension, because they have been frequently used in similar studies and can easily be transferred from a culture flask to PCR tubes for thermal melting (Savitski et al., 2014; Jarzab et al., 2020).”
  
  I note that integral membrane proteins are over-represented among targets for anti-cancer therapeutics. To what extent is the membrane proteome (plasma membrane in particular) identified in this work? After examining the methods, I would expect at least some integral membrane proteins to be identified. Do the authors observe any differences in the behaviour of water-soluble proteins versus integral membrane proteins in their assays? It would be helpful if the authors could comment on this in a potential revision.
  
  We agree this is an important point when considering the usage of PISA and thermal stability assays in general for specific classes of therapeutics. To address this, we explored what effect the analysis of thermal stability/solubility had on the proportion of membrane proteins in our data (Author response image 1). Annotations were extracted from Uniprot based on each protein being assigned to the “plasma membrane” (07/2024). We quantified 1,448 (16.5% of total proteins) and 1,558 (17.3% of total proteins) membrane proteins in our cell and lysate PISA datasets, respectively. We also compared the proportion of annotated proteins in these datasets to a recent TMTpro dataset (Lin et al.; PMID: 38853901) and found that the PISA datasets recovered a slightly lower proportion of membrane proteins (~17% in PISA versus 18.9% in total proteome analysis). Yet, we note that we expect more membrane proteins in urea/SDS based lysis methods compared to 0.5% NP-40 extractions.
  
  Author response image 1.
  
  We were not able to find an appropriate place to insert these data into the manuscript, so we have left them here in the response. If the Reviewer feels strongly that these data should be included in the manuscript, we would be happy to include them.
  
  A final note: I commend the authors for making their full dataset publicly available upon submission to this journal. This data promises to be a very useful resource for those working in the field.
  
  We thank the Reviewer for this and note that we are excited for this data to be of use to the community.
  
  Reviewer #3 (Recommendations For The Authors):
  
  There is no dataset PDX048009 in ProteomeXchange Consortium. I assume this is because it's under an embargo which needs to be released.
  
  We can confirm that data was uploaded to ProteomeXchange.
  
  MS data added to the manuscript during revisions was submitted to ProteomeXchange with the identifier – PDX053138.
  
  Page 9 line 5 refers to 59 compounds quantified in both cell-based and lysate-based, but Figure 3E shows 60 compounds quantified in both. I believe these numbers should match.
  
  We thank the Reviewer for catching this. In response to critiques from this Reviewer in the Public Review, we re-worked this section considerably. Please see the above critique/response for more details.
  
  Page 10, lines 26-28: It would help the reader if some of the potential 'artefactual effects of lysate-based analyses' were described briefly.
  
  We thank the Reviewer for raising this point. The truth is that we are not exactly sure what is happening here, but we know that, at least for vorinostat, this excess of changes in lysate-based PISA is consistent across experiments. We also do not see pervasive issues within the plexes containing these compounds. Therefore, we do not think this is due to a mistake or other experimental error. We hypothesize that the effect might result from a change in pH or other similar property that occurs upon addition of the molecule, though we note that we have previously seen that vorinostat can induce large numbers of solubility changes in a related solvent shift assay (doi: 10.7554/eLife.70784). We have modified the text to indicate that we do not fully understand the reason for the observation (p. 11):
  
  “It is highly unlikely that these three molecules actively engage so many proteins and, therefore, the 2,176 hits in the lysate-based screen were likely affected in part by consistent, but artefactual effects of lysate-based analyses that we do not fully understand (Van Vranken et al., 2021).”
  
  Page 24, lines 29-30 appear to contain a typo. I believe the '>' should be '<' or the 'exclude' should be 'retain'.
  
  The Reviewer is completely correct. We appreciate the attention to detail. This mistake has been corrected in the revised manuscript.
  
  Page 25, lines 5-7: The methods need to explain how the trimmed standard deviation is calculated.
  
  We apologize for this oversight. To calculate the trimmed standard deviation, we used proteins that were measured in at least 30 conditions. For these, we then removed the top 5% of absolute log2 fold-changes (compared to DMSO controls) and calculated the standard deviation of the resulting set of log2 fold-changes. This is similar in concept to the utilization of “trimmed means” in proteomics data (https://doi.org/10.15252/msb.20145625), which helps to overcome issues due to extreme outliers in datasets. We have added the following statement to the methods to clarify this point (p. 27):
  
  “Second, for each protein across all cells or lysate assays, the number of standard deviations away from the mean thermal stability measurement (z-score) for a given protein was quantified based on a trimmed standard deviation. Briefly, the trimmed standard deviation was calculated for proteins that were measured in at least 30 conditions. For these, we removed the top 5% of absolute log2 fold-changes (compared to DMSO controls) and calculated the standard deviation of the resulting set of log2 fold-changes.”
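  
  The trimming procedure quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sample (n-1) standard deviation is assumed, and the number of trimmed values is assumed to be rounded down, none of which the response specifies.

```python
import math
import statistics


def trimmed_sd(log2_fcs, trim_fraction=0.05, min_conditions=30):
    """SD of log2 fold-changes after dropping the largest absolute values.

    Returns None when the protein was measured in too few conditions.
    Assumptions (not stated in the source): sample SD, trim count rounded down.
    """
    if len(log2_fcs) < min_conditions:
        return None
    n_drop = math.floor(len(log2_fcs) * trim_fraction)
    # Sort by absolute value so the most extreme changes sit at the end.
    ordered = sorted(log2_fcs, key=abs)
    kept = ordered[:len(ordered) - n_drop] if n_drop else ordered
    return statistics.stdev(kept)


def z_score(value, mean, sd):
    """Number of (trimmed) standard deviations between a value and the mean."""
    return (value - mean) / sd


# Hypothetical protein: 38 small changes plus two extreme outliers.
values = [0.1, -0.1] * 19 + [5.0, -6.0]
sd = trimmed_sd(values)
```

  Trimming before computing the spread keeps a few strong compound effects from inflating the standard deviation, which would otherwise shrink every z-score for that protein.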
  
  Page 25, lines 9-11 needs editing for clarity.
  
  We tested empirical hit rates to choose the mean and trimmed standard deviation (trimmedSD) thresholds, aiming to maximize sensitivity while minimizing the ‘False Hit Rate’, defined as the number of proteins in the DMSO control samples called as hits divided by the total number of proteins called as hits with a given threshold applied.
  
  Author response image 2.
  
  Hit calling threshold setting based on maximizing the total hits called and minimizing the False Hit Rate in cells (number of DMSO hits divided by the total number of hits).
  
  Author response image 3.
  
  Hit calling threshold setting based on maximizing the total hits called and minimizing the False Hit Rate in lysates (number of DMSO hits divided by the total number of hits).
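  
  The False Hit Rate defined above can be illustrated as follows. This is a minimal sketch rather than the authors' pipeline: the function name and the input scores are hypothetical, a single z-score cutoff stands in for the combined thresholds, and the total hit count is assumed to include hits called in the DMSO controls.

```python
def false_hit_rate(dmso_scores, treated_scores, threshold):
    """Return (total hits, False Hit Rate) for one z-score cutoff.

    False Hit Rate = hits called in DMSO controls / all hits called.
    Assumption: the denominator counts DMSO and treated hits together.
    """
    dmso_hits = sum(1 for z in dmso_scores if abs(z) >= threshold)
    treated_hits = sum(1 for z in treated_scores if abs(z) >= threshold)
    total = dmso_hits + treated_hits
    rate = dmso_hits / total if total else 0.0
    return total, rate


# Hypothetical z-scores; a threshold scan trades total hits against the rate.
dmso = [0.5, 1.2, 3.6]
treated = [4.0, 5.0, 0.2, 3.7]
scan = {t: false_hit_rate(dmso, treated, t) for t in (3.5, 3.9)}
```

  Scanning a range of cutoffs in this way shows the trade-off the figures describe: a stricter threshold lowers the False Hit Rate but also reduces the total number of hits.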
  
  Figure 1 supplementary 2a legend states: '32 DMSO controls'. Should that be 64?
  
  We thank the Reviewer for catching our mistake. This has been corrected in the revised manuscript.
  
  I suggest removing Figure 1 supplementary 3c, which is superfluous as the only number it presents is already stated in the text (page 5, line 9).
  
  We thank the Reviewer for the suggestion and agree that this panel is superfluous. It has been removed from the revised manuscript.
  
  New data and tables added during revisions:
  
  (1) Table 3 – All log2 fold change values for the cell-based screen. Using this table, protein-centric solubility profiles can be plotted (as in Figures 2D and others).
  
  (2) Table 4 – All log2 fold change values for the lysate-based screen. Using this table, protein-centric solubility profiles can be plotted (as in Figures 2D and others).
  
  (3) Figure 1 – Figure supplement 3H – Table highlighting proteins that pass log2 fold change cutoffs, but not nSD cutoffs and vice versa.
  
  (4) Figure 2 – Panels H and I were updated with a new color scheme.
  
  (5) Figure 3 – Updated main figure and supplement at the request of Reviewer 3.
  
  • Figure 3E – Compares on-target hits for the cell- and lysate-based screens for all compounds for which a target was quantified in both screens.
  
  • Figure 3 – Figure supplement 2 – Highlights on-target hits in both screens, exclusively in cells, and exclusively in lysates.
  
  (6) Figure 5 – PISA data for K562 lysates treated with AZD-7762 at multiple concentrations.
  
  • Figure 5F
  
  • Figure 5 – Figure supplement 3A-C
  
  • Figure 5 – Source data 2
  
  (7) Figure 5 – Phosphoproteomic profiling of K562 cells treated with AZD7762 or Bafetinib.
  
  • Figure 5G
  
  • Figure 5 – Figure supplement 4A-F
  
  • Figure 5 – Source data 3 (phosphoproteome)
  
  • Figure 5 – Source data 4 (associated proteome data)
  
URL

biorxiv.org/content/10.1101/2024.01.26.577428v4
www.biorxiv.org www.biorxiv.org

The Rapidly Evolving X-linked miR-506 Family Finetunes Spermatogenesis to Enhance Sperm Competition

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Wang et al. investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution. They provided evidence that miR-506 appeared to have originated from the MER91C DNA transposons. Human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analysis examined the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice showed no apparent defects in sperm production, motility, or morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size; such an effect would have been quite significant on an evolutionary time scale. The number of mouse mutants and sequencing analysis represent a tour de force. This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.
  
  The conclusions of this preprint are mostly supported by the data, except as noted below. Some descriptions need to be revised for accuracy.
  
  L219-L285: The conclusion that X-linked miR-506 family miRNAs are expanded via LINE1 retrotransposition is not supported by the data. LINE1s and SINEs are very abundant, accounting for nearly 30% of the genome. In addition, the LINE1 content of the mammalian X chromosome is twice that of the autosomes. One can easily find flanking LINE1/SINE repeats. Therefore, the analyses in Fig. 2G, Fig. 2H and Fig. S3 are not informative. In order to claim LINE1-mediated retrotransposition, it is necessary to show the hallmarks of LINE1 retrotransposition, which are only possible for new insertions. The X chromosome is known to be enriched for testis-specific multi-copy genes that are expressed in round spermatids (PMID: 18454149). The conclusion on the LINE1-mediated expansion of miR-506 family on the X chromosome is not supported by the data and does not add additional insights. I think that the LINE1 related figure panels and description (L219-L285) need to be deleted. In discussion (L557-558), "...and subsequently underwent sequence divergence via LINE1-mediated retrotransposition during evolution" should also be deleted. This section (L219-L285) needs to deal only with the origin of miR-506 from MER91C DNA transposons, which is both convincing and informative.
  
  Reply: Agreed, the corresponding sentences were deleted.
  
  Fig. 3A: can you speculate/discuss why the miR-506 expression in sperm is higher than in round spermatids?
  
  Reply: RNAs are much less abundant in sperm than in somatic or spermatogenic cells (~1/100). Sperm-borne small RNAs represent a small fraction of total small RNAs expressed in their precursor spermatogenic cells, including spermatocytes and spermatids. Therefore, when the same amount of total/small RNAs are used for quantitative analyses, sperm-borne small RNAs (e.g., miR-506 family miRNAs) would be proportionally enriched in sperm compared to other spermatogenic cells. We discussed this point in the text (Lines 550-556).
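  
  The normalization argument in this reply can be made concrete with a small worked example. All numbers below are hypothetical and chosen only to show the arithmetic: the ~1/100 ratio of total RNA comes from the reply, while the copy numbers and the fraction of miRNA retained in sperm are assumptions.

```python
def relative_abundance(mirna_copies, total_rna_copies):
    """Share of the RNA pool made up by one miRNA (what equal-input assays see)."""
    return mirna_copies / total_rna_copies


# Hypothetical round spermatid: 1,000 miRNA copies in a pool of 1,000,000 RNAs.
spermatid = relative_abundance(1_000, 1_000_000)

# Hypothetical sperm: total RNA is ~1/100 of the spermatid pool (from the reply),
# and we assume 10% of the miRNA copies are retained.
sperm = relative_abundance(100, 10_000)

# Although sperm hold fewer absolute copies, the miRNA makes up a larger share.
fold_enrichment = sperm / spermatid
```

  Under these assumed numbers, the per-cell copy number falls tenfold while the measured share of the RNA pool rises about tenfold, which is the sense in which equal-input quantification reports apparent enrichment in sperm.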
  
  Reviewer #2 (Public Review):
  
  In this paper, Wang and collaborators characterize the rapid evolution of the X-linked miR-506 cluster in mammals and characterize the functional relevance of depleting a few or most of the miRNAs in the cluster. The authors show that the cluster originated from the MER91C DNA transposon and provide some evidence that it might have expanded through the retrotransposition of adjacent LINE1s. Although the animals depleted of most miRNAs in the cluster show normal sperm parameters, the authors observed a small but significant reduction in litter size. The authors then speculate that the depletion of most miRNAs in the cluster could impair sperm competitiveness in polyandrous mating. Using a successive mating protocol, they show that, indeed, sperm lacking most X-linked miR-506 family members is outcompeted by wild-type sperm. The authors then analyze the evolution of the miR-506 cluster and its predicted targets. They conclude that the main difference between mice and humans is the expansion of the number of target sites per transcript in humans.
  
  The conclusions of the paper are, in most cases, supported by the data; however, a more precise and in-depth analysis would have helped build a more convincing argument in most cases.
  
  (1) In the abstract and throughout the manuscript, the authors claim that "... these X-linked miRNA-506 family miRNA [...] have gained more targets [...] " while comparing the human miRNA-506 family to the mouse. An alternative possibility is that the mouse has lost some targets. A proper analysis would entail determining the number of targets in the mouse and human common ancestor.
  
  Reply: This question alerted us that we did not describe our conclusion accurately, causing confusion for this reviewer. Our data suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis. In other words, mice never lost any targets compared to humans, but each miR-506 family miRNA tends to target more genes in humans than in mice.
  
  We revised the text to more accurately report our data. The pertaining text (lines 490-508) now reads: “Furthermore, we analyzed the number of all potential targets of the miR-506 family miRNAs predicted by the aforementioned four algorithms among humans, mice, and rats. The total number of targets for all the X-linked miR-506 family miRNAs among different species did not show significant enrichment in humans (Fig. S9C), suggesting the sheer number of target genes does not increase in humans. We then compared the number of target genes per miRNA. When comparing the number of target genes per miRNA for all the miRNAs (baseline) between humans and mice, we found that on a per miRNA basis, human miRNAs have more targets than murine miRNAs (p<0.05, t-test) (Fig. S9D), consistent with higher biological complexity in humans. This became even more obvious for the X-linked miR-506 family (p<0.05, t-test) (Fig. S9D). In humans, the X-linked miR-506 family, on a per miRNA basis, targets a significantly greater number of genes than the average of all miRNAs combined (p<0.05, t-test) (Fig. S9D). In contrast, in mice, we observed no significant difference in the number of targets per miRNA between X-linked miRNAs and all of the mouse miRNAs combined (mouse baseline) (Fig. S9D). These results suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis.”
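  
  The per-miRNA comparison described in the revised text can be sketched as below. This is an illustration only: the response says "t-test" without naming the variant, so Welch's unequal-variance statistic is an assumption here, the function names are hypothetical, and the target counts are invented for the example. Only the test statistic is computed; a p-value would require a t-distribution from a statistics library.

```python
import math
import statistics


def targets_per_mirna(targets_by_mirna):
    """Count distinct predicted target genes for each miRNA."""
    return [len(set(genes)) for genes in targets_by_mirna.values()]


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples (assumed variant, see lead-in)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical targets-per-miRNA counts for two species.
human_counts = [10, 12, 14]
mouse_counts = [4, 6, 8]
t_stat = welch_t(human_counts, mouse_counts)
```

  Comparing counts per miRNA, rather than the pooled number of target genes, is what allows the total to stay the same between species while the per-miRNA reach differs.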
  
  We also changed “have gained” to “have” throughout the text to avoid confusion.
  
  (2) The authors claim that the miRNA cluster expanded through L1 retrotransposition. However, the possibility of an early expansion of the cluster before the divergence of the species while the MER91C DNA transposon was active was not evaluated. Although L1 likely contributed to the diversity within mammals, the generalization may not apply to all species. For example, SINEs are closer on average than L1s to the miRNAs in the SmiR subcluster in humans and dogs, and the horse SmiR subcluster seems to have expanded by a TE-independent mechanism.
  
  Reply: Agreed. We deleted the data mentioned by this reviewer.
  
  (3) Some results are difficult to reconcile and would have benefited from further discussion. The miR-465 sKO has over two thousand differentially expressed transcripts and no apparent phenotype. Also, the authors show a sharp downregulation of CRISP1 at the RNA and protein level in the mouse. However, most miRNAs of the cluster increase the expression of Crisp1 on a reporter assay. The only one with a negative impact has a very mild effect. miRNAs are typically associated with target repression; however, most of the miRNAs analyzed in this study activate transcript expression.
  
  Reply: Both mRNA and protein levels of Crisp1 were downregulated in KO mice, and these results are consistent with the luciferase data showing overexpression of these miRNAs upregulated the Crisp1 3’UTR luciferase activity. We agree that miRNAs usually repress target gene expression. However, numerous studies have also shown that some miRNAs, such as human miR-369-3, Let-7, and miR-373, mouse miR-34/449 and the miR-506 family, and the synthetic miRNA miRcxcr4, activate gene expression both in vitro (1, 2) and in vivo (3-6). Earlier reports have shown that these miRNAs can upregulate their target gene expression, either by recruiting FXR1, targeting promoters, or sequestering RNA subcellular locations (1, 2, 6). We briefly discussed this in the text (Lines 605-611).
  
  (4) More information is required to interpret the results of the differential RNA targeting by the murine and human miRNA-506 family. The materials and methods section needs to explain how the authors select their putative targets. In the text, they mention the use of four different prediction programs. Are they considering all sites predicted by any method, all sites predicted simultaneously by all methods, or something in between? Also, what are they considering as a "shared target" between mice and humans? Is it a mRNA that any miR-506 family member is targeting? Is it a mRNA targeted by the same miRNA in both species? Does the targeting need to occur in the same position determined by aligning the different 3'UTRs?
  
  Reply: Since each prediction method has its merit, we included all putative targets predicted by any of the four methods. The "shared target" refers to a mRNA that any miR-506 family member targets because the miR-506 family is highly divergent among different species. We have added the information to the “Large and small RNA-seq data analysis” section in Materials and Methods (Lines 871-882).
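  
  The target-selection rule in this reply (any of the four methods, any family member) amounts to set unions followed by an intersection across species. The sketch below illustrates that logic under stated assumptions: the tool names, miRNA names, and gene identifiers are placeholders, and genes are assumed to be already mapped to a common orthologue identifier so that human and mouse sets can be compared directly.

```python
def union_of_predictions(predictions_by_tool):
    """Pool targets predicted by any tool, rather than requiring agreement."""
    pooled = set()
    for genes in predictions_by_tool.values():
        pooled.update(genes)
    return pooled


def family_targets(predictions_by_mirna):
    """Targets of any family member; input maps miRNA -> {tool -> genes}."""
    pooled = set()
    for by_tool in predictions_by_mirna.values():
        pooled |= union_of_predictions(by_tool)
    return pooled


def shared_targets(human_by_mirna, mouse_by_mirna):
    """Genes targeted by at least one family member in both species."""
    return family_targets(human_by_mirna) & family_targets(mouse_by_mirna)


# Placeholder data: names are hypothetical, not taken from the manuscript.
human = {
    "hsa-x": {"tool_a": ["G1", "G2"], "tool_b": ["G3"]},
    "hsa-y": {"tool_a": ["G4"]},
}
mouse = {
    "mmu-z": {"tool_c": ["G3", "G5"]},
    "mmu-w": {"tool_d": ["G1"]},
}
shared = shared_targets(human, mouse)
```

  Taking the union across tools favours sensitivity over specificity, and defining sharing at the family level avoids requiring one-to-one miRNA orthologues in a family that the reply describes as highly divergent among species.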
  
  (5) The authors highlight the particular evolution of the cluster derived from a transposable element. Given the tendency of transposable elements to be expressed in germ cells, the family might have originated to repress the expression of the elements while still active but then remained to control the expression of the genes where the element had been inserted. The authors did not evaluate the expression of transcripts containing the transposable element or discuss this possibility. The authors proposed an expansion of the target sites in humans. However, whether this expansion was associated with the expansion of the TE in humans was not discussed either. Clarifying whether the transposable element was still active after the divergence of the mouse and human lineages would have been informative to address this outstanding issue.
  
  Reply: Agreed. The MER91C DNA transposon is denoted as nonautonomous (7); however, whether it was active during the divergence of mouse and human lineages is unknown. To determine whether the expansion of the target sites in humans was due to the expansion of the MER91C DNA transposon, we analyzed the MER91C DNA transposon-containing transcripts and associated them with our DETs. Of interest, 28 human and 3 mouse mRNAs possess 3’UTRs containing MER91C DNA sequences, and only 3 and 0 out of those 28 and 3 genes belonged to DETs in humans and mice, respectively (Fig. S9E), suggesting a minimal effect of MER91C DNA transposon expansion on the number of target sites. We briefly discussed this in the text (Lines 511-518).
  
  Post-transcriptional regulation is exceptionally complex in male haploid cells, and the functional relevance of many regulatory pathways remains unclear. This manuscript, together with recent findings on the role of piRNA clusters, starts to clarify the nature of the selective pressure that shapes the evolution of small RNA pathways in the male germ line.
  
  Reply: Agreed. We appreciate your insightful comments.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this manuscript, the authors conducted a comprehensive study of the X-linked miR-506 family miRNAs in mice, covering their origin, evolution, expression, and function. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found these miRNAs are not required for spermatogenesis. However, on further examination, the mutant mice show a mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs finetune spermatogenesis to enhance sperm competition.
  
  Strengths:
  
  This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR-506 family, providing a holistic view of its evolution and function in mice. The finding that miRNAs of this family could enhance sperm competition is interesting and could explain their roles in finetuning germ cell gene expression to regulate reproductive fitness.
  
  Weaknesses:
  
  The authors specifically addressed the function of 5 clusters of the X-linked miR-506 family containing 19 miRNAs. There is another small cluster containing 3 miRNAs close to the Fmr1 locus. Would this small cluster act in concert with the 5 clusters to regulate spermatogenesis? In addition, any autosomal miR-506-like miRNAs may compensate for the loss of the X-linked miR-506 family. These possibilities should be discussed.
  
  Reply: The three FmiRs were not deleted in this study because the SmiRs are much more abundant than the FmiRs in WT mice (Author Response image 1, heatmap version of Fig. 5C). Based on small RNA-seq, some FmiRs, e.g., miR-201 and miR-547, were upregulated in the SmiRs KO mice, suggesting that this small cluster may act in concert with the other 5 clusters and is thus worth further investigation. To the best of our knowledge, all the miR-506 family miRNAs are located on the X chromosome. Although some other miRNAs were upregulated in the KO mice, they do not belong to the miR-506 family. We briefly discussed this point in the text (Lines 635-638).
  
  Author response image 1.
  
  sRNA-seq of WT and miR-506 family KO testis samples.
  
  Direct molecular link to sperm competitiveness defect remains unclear but is difficult to address.
  
  Reply: In this study, we identified a target of the miR-506 family, i.e. Crisp1. KO of Crisp1 in mice, or inhibition of CRISP1 in human sperm (7, 8), appears to phenocopy the quinKO mice, displaying largely normal sperm motility but compromised ability to penetrate eggs. The detailed mechanism warrants further investigation in the future.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Lines 84-85: "Several cellular events are unique to the male germ cells, e.g., meiosis, genetic recombination, and haploid male germ cell differentiation (also called spermiogenesis)". This statement is not accurate. Please revise. Meiosis and genetic recombination are common to both male and female germ cells. They are highly conserved in both sexes in many species including mouse.
  
  Reply: Agreed. We have revised the sentence and it now reads: “Several cellular events are unique to the male germ cells, e.g., postnatal formation of the adult male germline stem cells (i.e., spermatogonia stem cells), pubertal onset of meiosis, and haploid male germ cell differentiation (also called spermiogenesis) (9)” (Lines 83-86).
  
  Lines 163-164: "we found that Slitrk2 and Fmr1 were syntenically linked to autosomes in zebrafish and birds (Fig. 1A), but had migrated onto the X chromosome in most mammals". This description is not accurate. Chr 4 in zebrafish and birds is syntenic to the X chromosome in mammals. The term "migrated" is not appropriate. Suggestion: Slitrk2 and Fmr1 mapped to Chr 4 (syntenic with mammalian X chromosome) in zebrafish and birds but to the X chromosome in most mammals.
  
  Reply: Agreed. Revised as suggested.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) In the significance statement, the authors mention that the mutants are "functionally infertile," although the decrease in competitiveness is partial. I suggest referring to them as "functionally sub-fertile."
  
  Reply: Agreed. Revised as suggested.
  
  (2) I will urge the authors to explain in more detail how some figures are generated and what they mean. Some critical information needs to be included in various panels.
  
  (2a) Figure S1. The phastCons track does not seem to align as expected with the rest of the figure. The highest conservation peak is only present in humans, and the sequence conserved in the sea turtle has the lowest phastCons score. I was expecting the opposite from the explanation.
  
  Reply: The tracks for phyloP and phastCons are the scores for all 100 species, whereas the tracks with the species names on the left are the corresponding sequences aligned to the human genome. We have revised our figure to make it clearer.
  
  (2b) Figure 2A and Figure S2C. Although all the functional analysis of the manuscript has been done in mice, the alignments showing sequence conservation do not include the murine miRNAs. Please include the mouse miRNAs in these panels.
  
  Reply: The mouse has Mir-506-P7 with the conserved miRNA-3P seed region, which was included in the lower panel in Figure S2C. However, mice do not have Mir-506-P6, which may have been lost or become too divergent to be recognized during evolution, and thus mouse miRNAs were not included in Figure 2A and the upper panel in Figure S2C.
  
  (2c) Figure S7H. The panel could be easier to read.
  
  Reply: Agreed. We combined all the same groups and turned Figure S7H (now Figure S6H) into a heatmap.
  
  (2d) The legend of Figure 6G reads, "The number of target sites within individual target mRNAs in both humans and mice." Can the author explain why the value 1 of the human "Number of target sites" is connected to virtually all the "Number of target sites" values in mice?
  
  Reply: Sorry for the confusion. For example, for gene 1, we have 1 target site in the human and 1 target site in the mouse; but for gene 2, we have 1 target site in the human and multiple sites in the mouse; therefore, the value 1 is connected to more than one value in the mouse.
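
  The one-to-many connections described in this reply can be illustrated with a toy example. The gene names and site counts below are invented for illustration and are not taken from Figure 6G; the sketch only shows how a single human value ends up linked to several mouse values once genes are grouped by their site counts.

```python
from collections import defaultdict

# Hypothetical (human_sites, mouse_sites) counts per target gene.
site_counts = {
    "gene1": (1, 1),
    "gene2": (1, 3),
    "gene3": (1, 2),
    "gene4": (2, 2),
}

# Collect every mouse value that each human value is connected to.
links = defaultdict(set)
for human_sites, mouse_sites in site_counts.values():
    links[human_sites].add(mouse_sites)

for human_sites in sorted(links):
    print(human_sites, "->", sorted(links[human_sites]))
```

  In this toy data, the human value 1 is shared by three genes whose mouse counts differ, so it connects to three mouse values.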
  
  Reviewer #3 (Recommendations For The Authors):
  
  CRISP1 and EGR1 protein localization in WT and mutant sperm by immunostaining would be helpful.
  
  Reply: Agreed. We performed immunostaining for CRISP1 on WT sperm, and the new results are presented in Figure S8D. CRISP1 seems mainly expressed in the principal piece and head of sperm.
  
  The detailed description of the generation of various mutant lines should be included in the Methods.
  
  Reply: We added more details on the generation of knockout lines in the Materials and Methods (Lines 686-701).
  
  References:
  
  (1) S. Vasudevan, Y. Tong, J. A. Steitz, Switching from repression to activation: microRNAs can upregulate translation. Science 318, 1931-1934 (2007).
  
  (2) R. F. Place, L. C. Li, D. Pookot, E. J. Noonan, R. Dahiya, MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc Natl Acad Sci U S A 105, 1608-1613 (2008).
  
  (3) Z. Wang et al., X-linked miR-506 family miRNAs promote FMRP expression in mouse spermatogonia. EMBO Rep 21, e49024 (2020).
  
  (4) S. Yuan et al., Motile cilia of the male reproductive system require miR-34/miR-449 for development and function to generate luminal turbulence. Proc Natl Acad Sci U S A 116, 3584-3593 (2019).
  
  (5) S. Yuan et al., Oviductal motile cilia are essential for oocyte pickup but dispensable for sperm and embryo transport. Proc Natl Acad Sci U S A 118 (2021).
  
  (6) M. Guo et al., Uncoupling transcription and translation through miRNA-dependent poly(A) length control in haploid male germ cells. Development 149 (2022).
  
  (7) V. G. Da Ros et al., Impaired sperm fertilizing ability in mice lacking Cysteine-RIch Secretory Protein 1 (CRISP1). Dev Biol 320, 12-18 (2008).
  
  (8) J. A. Maldera et al., Human fertilization: epididymal hCRISP1 mediates sperm-zona pellucida binding through its interaction with ZP3. Mol Hum Reprod 20, 341-349 (2014).
  
  (9) L. Hermo, R. M. Pelletier, D. G. Cyr, C. E. Smith, Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73, 241-278 (2010).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.14.544876v2
www.biorxiv.org www.biorxiv.org

Specific Modulation of CRISPR Transcriptional Activators through RNA-Sensing Guide RNAs in Mammalian Cells and Zebrafish Embryos

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Below, we provide a detailed account of the changes we made. For clarity and ease of review:
  
  • Original reviewers' comments are included and highlighted in grey
  
  • Our responses to each comment are written in black text
  
  • Print screens illustrating the specific changes made to the manuscript are enclosed within black squares
  
  eLife assessment
  
  The authors aim to develop a CRISPR system that can be activated upon sensing an RNA. As an initial step to this goal, they describe RNA-sensing guide RNAs for controlled activation of CRISPR modification. Many of the data look convincing and while several steps remain to achieve the stated goal in an in vivo setting and for robust activation by endogenous RNAs, the current work will be important for many in the field.
  
  The eLife assessment summarises our ambition to create a CRISPR system controlled by RNA sensing. The synopsis provided encapsulates the essence of our research, emphasising both the progress we have made and the challenges that lie ahead. This assessment fully resonates with our views.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  This paper describes RNA-sensing guide RNAs for controlled activation of CRISPR modification. This works by having an extended guide RNA with a sequence that folds back onto the targeting sequence such that the guide RNA cannot hybridise to its genomic target. The CRISPR is "activated" by the introduction of another RNA, referred to as a trigger, that competes with this "back folding" to make the guide RNA available for genome targeting. The authors first confirm the efficacy of the approach using several RNA triggers and a GFP reporter that is activated by dCas9 fused to transcriptional activators. A major potential application of this technique is the activation of CRISPR in response to endogenous biomarkers. As these will typically be longer than the first generation triggers employed by the authors they test some extended triggers, which also work though not always to the same extent. They then introduce MODesign which may enable the design of bespoke or improved triggers. After that, they determine that the mode of activation by the RNA trigger involves cleavage of the RNA complexes. Finally, they test the potential for their system to work in a developmental setting - specifically zebrafish embryos. There is some encouraging evidence, though the effects appear more subtle than those originally obtained in cell culture.
  
  Overall, the potential of a CRISPR system that can be activated upon sensing an RNA is high and there are a myriad of opportunities and applications for it. This paper represents a reasonable starting point having developed such a system in principle.
  
  The weakness of the study is that it does not demonstrate that the system can be used in a completely natural setting. This would require an endogenous transcript as the RNA trigger with a clear readout. Such an experiment would clearly strengthen the paper and provide strong confidence that the method could be employed for one of the major applications discussed by the authors. The zebrafish data relied on exogenous RNA triggers whereas the major applications (as I understood them) would use endogenous triggers.
  
  Related, most endogenous RNAs are longer than the various triggers tested and may require extensive modification of the system to be detected or utilised effectively.
  
  While additional data would clearly be beneficial, there should nevertheless be a more detailed discussion of these caveats and/or the strengths and applications of the system as it is presented (i.e. utility with synthetic triggers).
  
  We agree with the observation regarding the subtler effects in the zebrafish embryos and the reliance on exogenous RNA triggers. Indeed, the utilisation of endogenous transcripts as triggers in a natural setting is a logical next step. We further acknowledge the need to delve deeper into the complexities and challenges of our system, particularly concerning the detection of endogenous RNA, thus offering valuable insights for researchers looking to adapt our system for various applications. In order to clarify these limitations, we made some changes in the final version of our paper. The following paragraphs have been therefore included in the manuscript discussion:
  
  “In their current iteration, iSBH-sgRNAs show considerable promise for mammalian synthetic biology applications. Specifically, their ability to detect synthetic triggers could be pivotal in the development of complex synthetic RNA circuits and logic gates, thereby advancing the field of cellular reprogramming. However, further work is required to achieve better ON/OFF activation ratios in vivo and more homogeneous activity across tissues in the presence of RNA triggers. Additional chemical modifications could improve iSBH-sgRNA properties, and we believe that chemical modification strategies adopted for siRNA drugs or antisense oligos (Khvorova and Watts (2017)) could also be essential for further iSBH-sgRNA technology development. As iSBH-sgRNAs might be targeted by endogenous nucleases, leading to their degradation, a strategy for preventing this could involve additional chemical modifications. When inserted at certain key positions, such modifications could prevent interaction between iSBH-sgRNAs and cellular enzymes by introducing steric clashes or inhibiting RNA hydrolysis.
  
  Once achieving superior dynamic ranges of iSBH-sgRNA activation in vivo, the next steps would involve understanding the classes of endogenous RNAs that could act as triggers. The chances that an iSBH-sgRNA encounters an endogenous RNA trigger inside a cell would depend on the relative concentrations of the two RNA species. Therefore, a first step towards determining potential endogenous RNA triggers will involve identifying RNA species with comparable expression levels as iSBH-sgRNAs. Then, iSBH-sgRNAs could be designed against these RNA species, followed by experimental validation. It is important to note that eukaryotic cells express a wide range of transcripts of varying sizes, expression levels, and subcellular localisations, all of which could greatly affect iSBH-sgRNA activation levels. Based on the data presented here, we speculate that RNA species up to 300nt that are also highly expressed might act as good triggers. Furthermore, as sgRNAs are involved in targeting Cas9 to genomic DNA in the nucleus, attempting to detect transcripts that are sequestered in the nucleus might also provide additional benefit.”
  
  Reviewer #3 (Public Review):
  
  In this work, the authors describe engineering of sgRNAs that render Cas9 DNA binding controllable by a second RNA trigger. The authors introduce several iterations of their engineered sgRNAs, as well as a computational pipeline to identify designs for user-specified RNA triggers which offers a helpful alternative to purely rational design. Also included is an investigation of the fate of the engineered sgRNAs when introduced into cells, and the use of this information to inform installation of modified nucleotides to improve engineered sgRNA stability. Engineered sgRNAs are demonstrated to be activated by trigger RNAs in both cultured mammalian cells and zebrafish.
  
  The conclusions made by the authors in this work are predominantly supported by the data provided. However, some claims are not consistent with the data shown and some of the figures would benefit from revision or further clarification.
  
  Strengths:
  
  - The sgRNA engineering in this paper is performed and presented in a systematic and logical fashion.
  
  - Inclusion of a computational method to predict iSBH-sgRNAs adds to the strength of the engineering.
  
  - Investigation into the cellular fate of the engineered sgRNAs and the use of this information to guide inclusion of chemically modified nucleotides is also a strength.
  
  - Demonstration of activity in both cultured mammalian cells and in zebrafish embryos increases the impact and utility of the technology reported in this work.
  
  Weaknesses:
  
  - While the methods here represent an important step forward in advancing the technology, they still fall short of the dynamic range and selectivity likely required for robust activation by endogenous RNA.
  
  - While the iSBH-sgRNAs where the RNA trigger overlaps with the spacer appear to function robustly, the modular iSBH-sgRNAs seem to perform quite a bit less well. The authors state that modular iSBH-sgRNAs show better activity without increasing background when the SAM system is added, but this is not supported by the data shown in Figure 3D, where in 3 out of 4 cases CRISPR activation in the absence of the RNA trigger is substantially increased.
  
  - There is very little discussion of how the performance of the technology reported in this work compares to previous iterations of RNA-triggered CRISPR systems, of which there are many examples.
  
  Concerning the methods falling short of the dynamic range and selectivity required for robust activation by endogenous RNA, we acknowledge this limitation and recognise the need for improvement in this area. In the resubmitted version of the manuscript, we provided a detailed discussion on how the selection of appropriate triggers might partially improve dynamic ranges and selectivity. This includes an exploration of various strategies and considerations that may enhance the robustness of our system (print screen above, also used for addressing Reviewer #1 comments).
  
  Regarding the inconsistent performance of the modular iSBH-sgRNAs, we acknowledge that modular iSBH-sgRNAs seem to perform slightly less well than first- and second-generation designs. To illustrate this, we modified the corresponding bar graphs to include fold turn-on iSBH-sgRNA activation in addition to significance (Figures 1, 2 and 3 of the manuscript). We also acknowledge this in the text, recognise the discrepancy in Figure 3.D, and provide further clarification. To help convey this message further, we introduced a new figure (Figure 3 - figure supplement 2) with bar graphs to accompany the heat map shown in Figure 3.D. These changes are documented below:
  
  “…promoters. We ran 11 MODesign simulations for each trigger, incrementally extending the loop size while keeping the sgRNA 2 spacer input constant. HEK293T validation experiments showed that choosing modular iSBH-sgRNAs that detect the 4 U6-expressed triggers is possible (Figure 3.D, Figure 3 - figure supplement 1.C). Despite not performing quite as well as second-generation designs (Figure 2.A, Figure 3.D), modular iSBH-sgRNAs still enable efficient RNA detection, especially for smaller RNAs such as triggers A and D. For highly efficient designs such as modular iSBH-sgRNA (D), addition of the SAM effector system (Konermann et al. (2015)) boosted ON-state activation with only a negligible increase in the OFF-state non-specific activation. Orthogonality tests suggested that activation of modular iSBH-sgRNA designs was specifically conditioned by complementary RNA triggers (Figure 3.E, Figure 3 - figure supplement 2), showing the exquisite specificity of the system.”
  
  Author response image 1.
  
  This supplementary figure reinterprets the data presented in Figure 3.E. using bar plots for enhanced clarity and comparison. It depicts the results of cotransfecting HEK293T cells with four modular iSBH-sgRNAs (A, B, C, and D) and examines all combinations of iSBH-sgRNA: RNA trigger pairings. The bar plots provide a visual representation of mean values with error bars indicating the standard deviation, based on three biological replicates.
  
  Regarding the concern about the lack of comparison with previous iterations of RNA-triggered CRISPR systems, we also acknowledged other similar technologies within the discussion. We also point readers to a literature review we recently published (doi/full/10.1089/crispr.2022.0052) where we describe other similar technologies in more detail.
  
  “To date, a variety of RNA-inducible gRNA designs have been developed (Hanewich-Hollatz et al. (2019); Hochrein et al. (2021); Jakimo et al. (2018); Jiao et al. (2021); Jin et al. (2019); Li et al. (2019); Liu et al. (2022); Lin et al. (2020); Siu and Chen (2019); Galizi et al. (2020); Hunt and Chen (2022b,a); Ying et al. (2020); Choi et al. (2023)). Nevertheless, there is a lack of direct, head-to-head comparisons of these designs under standardised experimental conditions. Some designs were evaluated in vitro, others in bacterial systems, and some in mammalian cells. Consequently, it is challenging to conclusively determine which design exhibits superior properties (Pelea et al. (2022)). Notably, to the best of our knowledge, the iSBH-sgRNA system is the first RNA-inducible gRNA design tested in vivo, and characterising the iSBH-sgRNA activation mechanism was essential for implementing iSBH-sgRNA technology in zebrafish embryos. In vivo, chemical modifications in the spacer sequence were vital for iSBH-sgRNA stability and function.”
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.05.08.539738v2
www.biorxiv.org www.biorxiv.org

New submission 22/11/2023, 08:51:42

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  In this study, the authors attempt to describe alterations in gene expression, protein expression, and protein phosphorylation as a consequence of chronic adenylyl cyclase 8 overexpression in a mouse model. This model is claimed to have resilience to cardiac stress.
  
  Major strengths of the study include 1) the large dataset generated which will have utility for further scientific inquiry for the authors and others in the field, 2) the innovative approach of using cross-analyses linking transcriptomic data to proteomic and phosphoproteomic data. One weakness is the lack of a focused question and clear relevance to human disease. These are all critical biological pathways that the authors are studying and essentially, they have compiled a database that could be surveyed to generate and test future hypotheses.
  
  Thank you for your efforts to review our manuscript. We are delighted to learn that you found our approach of linking transcriptomic, proteomic and phosphoproteomic data in our analysis to be innovative. Your comment that we have not focused on a question with clear relevance to human disease is “right on point!”
  
  During chronic pathophysiologic states, e.g., chronic heart failure (CHF) in humans, AC/cAMP/PKA/Ca2+ signaling increases progressively as the degree of heart failure progresses, leading to cardiac inflammation, mediated in part by cyclic-AMP-induced up-regulation of renin-angiotensin system (RAS) signaling. Standard therapies for CHF include β-adrenoreceptor blockers and RAS inhibitors, which, although effective, are suboptimal in amelioration of heart failure progression. One strategy to devise novel and better therapies for heart failure would be to uncover the full spectrum of concentric cardio-protective adaptations that become activated in response to severe, chronic AC/cAMP/PKA/Ca2+-induced cardiac stress.
  
  We employed unbiased omics analyses in our prior study (https://elifesciences.org/articles/80949v1) of the mouse harboring cardiac-specific overexpression of adenylyl cyclase type 8 (TGAC8), and identified more than 2,000 transcripts and proteins, comprising a broad array of biological processes across multiple cellular compartments, that differed in TGAC8 left ventricle compared to WT. These bioinformatic analyses revealed that marked overexpression of AC8 engages complex, concentric adaptation "circuitry" that has evolved in mammalian cells to confer resilience to stressors that threaten health or life. The main human disease category identified in these analyses was Organismal Injury and Abnormalities, suggesting that defenses against stress were activated, as would be expected, in response to cardiac stress. Specific concentric signaling pathways that were enriched and activated within the TGAC8 protection circuitry included cell survival initiation, protection from apoptosis, proliferation, prevention of cardiac-myocyte hypertrophy, increased protein synthesis and quality control, increased inflammatory and immune responses, facilitation of tissue damage repair and regeneration and increased aerobic energetics. These TGAC8 stress response circuits resemble many adaptive mechanisms that occur in response to the stress of disease states and may be of biological significance to allow for proper healing in disease states such as myocardial infarction or failure of the heart. The main human cardiac diseases identified in bioinformatic analyses were multiple types of cardiomyopathies, again suggesting that mechanisms that confer resilience to the stress of chronic increased AC-PKA-Ca2+ signaling are activated in the absence of heart failure in the super-performing TGAC8 heart at 3 months of age.
  
  In the present study, we performed a comprehensive in silico analysis of transcription, translation, and post-translational patterns, seeking to discover whether the coordinated transcriptome and proteome regulation of the adaptive protective circuitry within the AC8 heart that is common to many types of cardiac disease states identified in our previous study (https://elifesciences.org/articles/80949v1) extends to the phosphoproteome.
  
  Reviewer #2 (Public Review):
  
  In this study, the investigators describe an unbiased phosphoproteomic analysis of cardiac-specific overexpression of adenylyl cyclase type 8 (TGAC8) mice that was then integrated with transcriptomic and proteomic data. The phosphoproteomic analysis was performed using tandem mass tag-labeling mass spectrometry of left ventricular (LV) tissue in TGAC8 and wild-type mice. The initial principal component analysis showed differences between the TGAC8 and WT groups. The integrated analysis demonstrated that many stress-response, immune, and metabolic signaling pathways were activated at transcriptional, translational, and/or post-translational levels.
  
  The authors are to be commended for a well-conducted study with quality control steps described for the various analyses. The rationale for following up on prior transcriptomic and proteomic analyses is described. The analysis appears thorough and well-integrated with the group's prior work. Confirmational data using Western blot is provided to support their conclusions. Their findings have the potential of identifying novel pathways involved in cardiac performance and cardioprotection.
  
  Thank you for your efforts to review our manuscript and our approach of linking transcriptomic, proteomic and phosphoproteomic data. We are delighted that you found our work to be well conducted, our analysis thorough and well integrated with our prior work in this arena, and that our findings have the potential of identifying novel pathways involved in cardiac performance and cardioprotection.
  
  Reviewer #1 (Recommendations For The Authors):
  
  I humbly suggest that the authors reconsider the title, as it could be more clear as to what they are studying. Are the authors trying to highlight pathways related to cardiac resilience? Resilience might be a clearer word than "performance and protection circuitry".
  
  Thank you for this important comment. We have revised the title accordingly: Reprogramming of cardiac phosphoproteome, proteome and transcriptome confers resilience to chronic adenylyl cyclase-driven stress.
  
  Perhaps the text can be reviewed in detail by a copy-editor, as there are many grammatically 'awkward' elements (for example, line 56: "mammalians" instead of mammals), inappropriate colloquialisms (for example, line 73: "port-of-call"), and stylistic unevenness that make it difficult to read.
  
  We have reviewed the text in detail, with the assistance of a copy editor, in order to identify and correct awkward elements and to search for other colloquialisms. Finally, although “stylistic unevenness” to which you refer may be difficult for us to identify during our re-edits, we have tried our best to identify and revise them.
  
  The best-written sections are the first few paragraphs of the discussion section, which finally clarify why the TGAC8 mouse is important in understanding cardiac resilience to stress and how the present study leverages this model to disentangle the biological processes underlying the resilience. I wish this had been presented in this manner earlier in the paper, (in the abstract and introduction) so I could have had a clearer context in which to interpret the data. It would also be helpful to point out whether the TGAC8 mouse has any correlates with human disease.
  
  Thank you for this very important comment. Well put! In addition to recasting the title to include the concept of resilience, we have revised both the abstract and introduction to feature what you consider to be important to the understanding of cardiac resilience to stress, and how the present study leverages this model to disentangle the biological processes underlying the resilience.
  
  Reviewer #2 (Recommendations For The Authors):
  
  How were the cutoffs determined to distinguish between upregulated/downregulated phosphoproteins and phosphopeptides?
  
  Thank you for this important question. We used the same criteria to distinguish differences between TGAC8 and WT for unnormalized and normalized phosphoproteins, -log10(p-value) > 1.3, and log2FoldChange <= -0.4 (down) or log2FoldChange >= 0.4 (up), as stated in the methods section, main text and figure legend. The results were consistent across all analyses and selectively verified by experiments.
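
  The cutoff rule described in this reply can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `classify_change` and the example values are hypothetical, and only the thresholds (-log10(p-value) > 1.3 and log2FoldChange of at least 0.4 in either direction) come from the response.

```python
import math

# Thresholds quoted in the response; everything else here is illustrative.
NEG_LOG10_P_CUTOFF = 1.3   # roughly p < 0.05
LOG2_FC_CUTOFF = 0.4

def classify_change(p_value, log2_fc):
    """Return 'up', 'down', or 'unchanged' for one phosphoprotein (hypothetical helper)."""
    if p_value <= 0:
        raise ValueError("p_value must be positive")
    # A molecule must pass the significance cutoff before direction is considered.
    if -math.log10(p_value) <= NEG_LOG10_P_CUTOFF:
        return "unchanged"
    if log2_fc >= LOG2_FC_CUTOFF:
        return "up"
    if log2_fc <= -LOG2_FC_CUTOFF:
        return "down"
    return "unchanged"

# Made-up (p-value, log2FoldChange) pairs, not data from the study.
examples = [(0.01, 0.8), (0.01, -0.5), (0.01, 0.2), (0.2, 1.5)]
labels = [classify_change(p, fc) for p, fc in examples]
print(labels)
```

  Note that a large fold change alone is not enough under this rule: the last example has a log2FoldChange of 1.5 but fails the p-value cutoff, so it is left as unchanged.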
  
  Were other models assessed for correlation between transcriptome and phosphoproteome other than a linear relationship of log2 fold change?
  
  Thank you for this comment. In addition to a linear relationship of log2 fold change of molecule expression, we also compared protein activities, e.g., Fig 4F, and pathways enriched from different omics, e.g., Fig 3D, 5J, 6B and 6F.
  
  Figures 1A and 5G seem to show outliers. How many biological and technical replicates would be needed to minimize error?
  
  Thank you for the question. Figures 1A and 5G were PCA plots which, as expected, manifested some genetic variability among the same genotypes. The PCA plots, however, are useful in determining how the identified items separated, both within and among genotypes. For bioinformatics analysis such as ours, 4-5 samples are sufficient to accomplish this, as demonstrated by separation, by genotype, of samples in PCA. Thus, in addition to discovery of true heterogeneity among the samples, our results are still able to robustly discover the true differences between the genotypes.
  
  Were the up/downregulated genes more likely to be lowly expressed (which would lead to larger log2 changes identified)?
  
  In response to your query, we calculated the average phosphorylation level across all samples to observe whether the changed molecules were of low abundance. We also generated MA plots, an application of a Bland–Altman plot, to create a visual representation of omics data. The MA plots in Author response image 1 illustrate that the target molecules with significantly changed phosphorylation levels did not aggregate within the very low abundance range. To confirm this conclusion, we adopted two sets of cutoffs: (1) change: -log10(p-value) > 1.3, and log2FoldChange < 0 (down) or log2FoldChange > 0 (up); and (2) change_2: -log10(p-value) > 1.3, and log2FoldChange <= -0.4 (down) or log2FoldChange >= 0.4 (up).
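
  An MA plot places each molecule at A (the mean log2 abundance across the two groups) and M (the log2 fold change between them). The sketch below uses made-up group means and p-values, not data from the study, to show how M and A can be computed and how one might check whether significant hits sit at the low-abundance end. The function names are hypothetical; the two cutoff settings are meant to mirror the `change` and `change_2` rules quoted in the reply.

```python
import math
from statistics import median

def ma_values(mean_tg, mean_wt):
    """Return (M, A) for one molecule from two positive group means."""
    log_tg, log_wt = math.log2(mean_tg), math.log2(mean_wt)
    return log_tg - log_wt, 0.5 * (log_tg + log_wt)

def is_changed(p_value, m, min_abs_log2_fc=0.0):
    """min_abs_log2_fc=0.0 mirrors 'change'; 0.4 mirrors 'change_2'."""
    if -math.log10(p_value) <= 1.3:
        return False
    if min_abs_log2_fc == 0.0:
        return m != 0.0
    return abs(m) >= min_abs_log2_fc

# Made-up records: (name, mean in TGAC8, mean in WT, p-value).
records = [
    ("site_a", 8.0, 2.0, 0.001),
    ("site_b", 1024.0, 256.0, 0.01),
    ("site_c", 64.0, 64.0, 0.5),
    ("site_d", 16.0, 32.0, 0.03),
    ("site_e", 128.0, 120.0, 0.02),
]

points = [(name, *ma_values(tg, wt), p) for name, tg, wt, p in records]
all_a = [a for _, _, a, _ in points]

for label, cutoff in (("change", 0.0), ("change_2", 0.4)):
    hit_a = [a for _, m, a, p in points if is_changed(p, m, cutoff)]
    # If hits clustered at low abundance, their median A would fall well below the overall median.
    print(label, "hits:", len(hit_a), "median A (hits):", median(hit_a),
          "median A (all):", median(all_a))
```

  Comparing the median A of the hits with the overall median is only one simple summary; the authors' own assessment rests on the MA plots themselves.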
  
  Author response image 1.
  
  "We verified some results through wet lab experiments" in the abstract is vague.
  
  Thank you for the good suggestion. What we meant to indicate here was that identified genotypic differences in selected proteins, phosphoproteins and RNAs discovered in omics were verified by western blots, protein synthesis detection, proteasome activity detection, and detection of soluble and insoluble protein fractions. However, we have deleted the reference to the wet lab experiments in the revised manuscript.
  
  There are minor syntactical errors throughout the text.
  
  Thank you very much for the suggestion. As noted in our response, we have edited and revised those errors throughout the text.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.04.29.488779v8
www.biorxiv.org www.biorxiv.org

Mapping Spatial Patterns to Energetic Benefits in Groups of Flow-coupled Swimmers

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The study seeks to establish accurate computational models to explore the role of hydrodynamic interactions on energy savings and spatial patterns in fish schools. Specifically, the authors consider a system of (one degree-of-freedom) flapping airfoils that passively position themselves with respect to the streamwise direction, while oscillating at the same frequency and amplitude, with a given phase lag and at a constant cross-stream distance. By parametrically varying the phase lag and the cross-stream distance, they systematically explore the stability and energy costs of emergent configurations. Computational findings are leveraged to distill insights into universal relationships and clarify the role of the wake of the leading foil.
  
  We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.
  
  Strengths:
  
  (1) The use of multiple computational models (computational fluid dynamics, CFD, for full Navier-Stokes equations and computationally efficient inviscid vortex sheet, VS, model) offers an extra degree of reliability of the observed findings and backing to the use of simplified models for future research in more complex settings.
  
  (2) The systematic assessment of the stability and energy savings in multiple configurations of pairs and larger ensembles of flapping foils is an important addition to the literature.
  
  (3) The discovery of a linear phase-distance relationship in the formation attained by pairs of flapping foils is a significant contribution, which helps compare different experimental observations in the literature.
  
  (4) The observation of a critical size effect for in-line formations of larger, above which cohesion and energetic benefits are lost at once, is a new discovery in the field.
  
  Thank you for this list of strengths. We are delighted that these ideas were clearly communicated in our manuscript.
  
  Note that Newbolt et al., PNAS, 2019 reported distance as a function of phase for pairs of flapping hydrofoils, and Li et al., Nat. Comm., 2020 also reported a phase-distance relationship in robotic and biological fish (calling it Vortex Phase Matching). We compiled their results, together with our own and other numerical and experimental results, showing that the linear distance-phase relationship is universal.
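  As a rough illustration of what such a linear distance-phase relationship implies, the sketch below lists candidate leader-follower spacings for a given phase lag. The slope of 1/(2π) per wavelength, the integer offsets n, the `offset` intercept, and the function name are all assumptions made for illustration; they are not values taken from the manuscript.

```python
import math


def equilibrium_spacings(phase_lag, swim_speed, period, n_max=3, offset=0.0):
    """Candidate spacings under an ASSUMED linear phase-distance law.

    d_n = (phase_lag / (2*pi) + offset + n) * swim_speed * period
    where swim_speed * period is the distance travelled per flapping cycle.
    """
    wavelength = swim_speed * period
    # Fractional part of the spacing, in units of the wavelength.
    base = (phase_lag / (2.0 * math.pi) + offset) % 1.0
    return [(base + n) * wavelength for n in range(n_max)]


# Hypothetical example: antiphase flapping (phase lag of pi).
spacings = equilibrium_spacings(math.pi, swim_speed=2.0, period=0.5)
```

  Under these assumptions, changing the phase lag shifts every candidate spacing by the same fraction of a wavelength, which is the sense in which distance is linear in phase.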
  
  Weaknesses:
  
  (1) The extent to which observations on one-degree-of-freedom flapping foils could translate to real fish schools is presently unclear so some of the conclusions on live fish schools are likely to be overstated and would benefit from some more biological framing.
  
  Thank you for bringing up this point. Indeed, flapping foils that are free to translate in both the x- and y-directions and rotate in the x-y plane could drift apart in the y-direction. However, this drift occurs on a much longer time scale than the forward swimming motion. For this reason, we feel justified in ignoring it for the purpose of this study, especially since the pairwise equilibria in the swimming x-direction are reached on a faster time scale.
  
  Below, we include two snapshots taken from published work from the group of Petros Koumoutsakos (Gazzola et al, SIAM 2014). The figures show, respectively, a pair and a group of five undulating swimmers, free to move and rotate in the x-y plane. The evolution of the two and five swimmers is computed in the absence of any control. The lateral drift is clearly sub-dominant to the forward motion. Similar results were reported in Verma et al, PNAS 2018.
  
  These results are independent of the details of the flow interaction model. For example, similar lateral drift is observed using the dipole model (Kanso & Tsang, FDR 2014; Tsang & Kanso, JNLS 2013).
  
  Another reason why we feel justified to ignore these additional degrees of freedom is the following: we assume a live fish or robotic vehicle would have feedback control mechanisms that correct for such drift. Given that it is a slowly-growing drift, we hypothesize that the organism or robot would have sufficient time to respond and correct its course.
  
  Indeed, in Zhu et al. 2022, an RL controller, which drives an individual fish-like swimmer to swim at a given speed and direction, when applied to pairs of swimmers, resulted in the pair "passively" forming a stable school without any additional information about each other.
  
  We edited page 4 of the main manuscript to include references to the work cited here and to explain the reasons for ignoring the lateral drift.
  
  Citations:
  
  Gazzola, M., Hejazialhosseini, B., & Koumoutsakos, P. (2014). Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM Journal on Scientific Computing, 36(3), B622-B639. DOI: https://doi.org/10.1137/130943078
  
  Verma, S., Novati, G., & Koumoutsakos, P. (2018). Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23), 5849-5854. DOI: https://doi.org/10.1073/pnas.1800923115
  
  Tsang, A. C. H., & Kanso, E. (2013). Dipole interactions in doubly periodic domains. Journal of Nonlinear Science, 23, 971-991. DOI: https://doi.org/10.1007/s00332-013-9174-5
  
  Kanso, E., & Tsang, A. C. H. (2014). Dipole models of self-propelled bodies. Fluid Dynamics Research, 46(6), 061407. DOI: https://doi.org/10.1088/0169-5983/46/6/061407
  
  Zhu, Y., Pang, J. H., & Tian, F. B. (2022). Stable schooling formations emerge from the combined effect of the active control and passive self-organization. Fluids, 7(1), 41. DOI: https://doi.org/10.3390/fluids7010041
  
  Author response image 1.
  
  Antiphase self-propelled anguilliform swimmers. (a) – (d) Wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ centre of mass trajectories.
  
  Author response image 2.
  
  Parallel schooling formation. (a) – (d) wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 7T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ center of mass trajectories.
  
  (2) The analysis of non-reciprocal coupling is not as novel as the rest of the study and potentially not as convincing due to the chosen linear metric of interaction (that is, the flow agreement).
  
  We thank the referee for this candid and constructive feedback. In fact, we view this aspect of the study as most “revolutionary” because it provides a novel approach to pre-computing the locations of stable equilibria even without doing expensive all-to-all coupled simulations or experiments.
  
  Basically, the idea is the following: you give me a flow field, it doesn’t matter how you obtained it, whether from simulations or experimentally, and I can tell you at what locations in this flow field a virtual flapping swimmer would be stable and save hydrodynamic energy!
  
  In the revised version, we changed pages 3 and 7 of the main text and added a new section, “Diagnostic tools”, to the SI to better illustrate this.
  
  Overall, this is a rigorous effort on a critical topic: findings of the research can offer important insight into the hydrodynamics of fish schooling, stimulating interdisciplinary research at the interface of computational fluid mechanics and biology.
  
  We thank the referee again for their careful read of the manuscript and their constructive feedback.
  
  Reviewer #2 (Public Review):
  
  The document "Mapping spatial patterns to energetic benefits in groups of flow-coupled swimmers" by Heydari et al. uses several types of simulations and models to address aspects of stability of position and power consumption in few-body groups of pitching foils. I think the work has the potential to be a valuable and timely contribution to an important subject area. The supporting evidence is largely quite convincing, though some details could raise questions, and there is room for improvement in the presentation. My recommendations are focused on clarifying the presentation and perhaps spurring the authors to assess additional aspects:
  
  We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.
  
  (1) Why do the authors choose to set the swimmers free only in the propulsion direction? I can understand constraining all the positions/orientations for investigating the resulting forces and power, and I can also understand the value of allowing the bodies to be fully free in x, y, and their orientation angle to see if possible configurations spontaneously emerge from the flow interactions. But why constrain some degrees of freedom and not others? What's the motivation, and what's the relevance to animals, which are fully free?
  
  We would like to thank the referee for raising this point. It is similar to the point raised above by the first referee. As explained above, the reason is the following: in freely swimming, hydrodynamically interacting “fish,” the lateral drift is sub-dominant to the forward swimming motion. Therefore, we ignore it in the model. Please see our detailed response above for further clarification, and see the changes on page 4 of the main manuscript.
  
  (2) The model description in Eq. (1) and the surrounding text is confusing. Aren't the authors computing forces via CFD or the VS method and then simply driving the propulsive dynamics according to the net horizontal force? It seems then irrelevant to decompose things into thrust and drag, and it seems irrelevant to claim that the thrust comes from pressure and the drag from viscous effects. The latter claim may in fact be incorrect since the body has a shape and the normal and tangential components of the surface stress along the body may be complex.
  
  Thank you for pointing this out! It is indeed confusing.
  
  In the CFD simulations, we compute the net force in the swimming x-direction by integrating the force density, defined in terms of the stress tensor, over the body. There is no ambiguity here.
  
  In the VS simulations, however, we compute the net force in the swimming x-direction by integrating the pressure jump across a plate of zero thickness. There is no viscous drag, so viscous drag is added by hand, so to speak. This method for adding viscous drag in the context of the VS model is not new; it has been used before in the literature, as explained in the SI section “Vortex sheet (VS) model” (pages 30 and 31).
  
  
  (3) The parameter taudiss in the VS simulations takes on unusual values such as 2.45T, making it seem like this value is somehow very special, and perhaps 2.44 or 2.46 would lead to significantly different results. If the value is special, the authors should discuss and assess it. Otherwise, I recommend picking a round value, like 2 or 3, which would avoid distraction.
  
  Response: The dissipation time is chosen both to model viscous effects and to reduce computational complexity. Introducing it does indeed add a forcing to the simulation. A round value such as 2 or 3 is an integer multiple of the flapping period, which is normalized to T = 1. An integer value of taudiss would therefore force the system at the resonant frequency and lead to computational blow-up. To avoid this effect, a choice of taudiss = 2.45, 2.44, or 2.46 would be fine and would introduce only a small perturbation to the overall simulation compared to no dissipation at all. This effect is studied in detail in the following published work from our group:
  
  Huang, Y., Ristroph, L., Luhar, M., & Kanso, E. (2018). Bistability in the rotational motion of rigid and flexible flyers. Journal of Fluid Mechanics, 849, 1043-1067. DOI: https://doi.org/10.1017/jfm.2018.446
  
  (4) Some of the COT plots/information were difficult to interpret because the correspondence of beneficial with the mathematical sign was changing. For example, DeltaCOT as introduced on p. 5 is such that negative indicates bad energetics as compared to a solo swimmer. But elsewhere, lower or more negative COT is good in terms of savings. Given the many plots, large amounts of data, and many quantities being assessed, the paper needs a highly uniform presentation to aid the reader.
  
  Thank you for pointing this out! We updated Figures 3 and 6 as suggested.
  
  (5) I didn't understand the value of the "flow agreement parameter," and I didn't understand the authors' interpretation of its significance. Firstly, it would help if this and all other quantities were given explicit definitions as complete equations (including normalization). As I understand it, the quantity indicates the match of the flow velocity at some location with the flapping velocity of a "ghost swimmer" at that location. This does not seem to be exactly relevant to the equilibrium locations. In particular, if the match were perfect, then the swimmer would generate no relative flow and thus no thrust, meaning such a location could not be an equilibrium. So, some degree of mismatch seems necessary. I believe such a mismatch is indeed present, but the plots such as those in Figure 4 may disguise the effect. The color bar is saturated to the point of essentially being three tones (blue, white, red), so we cannot see that the observed equilibria are likely between the max and min values of this parameter.
  
  Thank you for pointing this out! You are correct in your understanding of the flow agreement parameter, but not in your interpretation.
  
  Basically, “if the match were perfect, then the swimmer would generate no relative flow and thus no thrust” means that such a location is an equilibrium, not that it “could not be” one. Let me elaborate. An equilibrium is a location at which the net thrust force is zero. The equilibrium is stable if the slope of the thrust force is negative. Ideally, this is what maximizing the flow agreement parameter would produce.
  
  For example, consider an ideal fluid where the flow velocity is form in vertical direction. Consider a “ghost swimmer” heaving at a velocity . Under this scenario, flow agreement and thrust parameters are
  
  Let’s now consider a balance of forces on the “ghost swimmer.” The ghost swimmer is in relative equilibrium if and only if:
  
  It gives us
  
  We then consider stability at this equilibrium by calculating the derivative of thrust parameter over phase
  
  The corresponding values at equilibria are
  
  Thus, when taking the positive which means the equilibria is a stable fixed point. We included this analysis in a new section in the SI page 32.
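  The two criteria stated in this response (zero net force at an equilibrium, negative force slope for stability) can be illustrated numerically. The sketch below scans a sampled force curve F(d) for sign changes and labels each crossing by the sign of the local slope. The function name and the sinusoidal force curve are hypothetical stand-ins chosen for illustration; they are not the thrust parameter or data from the manuscript.

```python
import math


def classify_equilibria(distances, forces):
    """Find sign changes of a sampled force curve and label each crossing."""
    equilibria = []
    for i in range(len(distances) - 1):
        d0, d1 = distances[i], distances[i + 1]
        f0, f1 = forces[i], forces[i + 1]
        if f0 * f1 < 0.0:  # the force changes sign inside this interval
            slope = (f1 - f0) / (d1 - d0)
            d_eq = d0 - f0 / slope  # linear interpolation of the zero crossing
            label = "stable" if slope < 0.0 else "unstable"
            equilibria.append((d_eq, label))
    return equilibria


# Hypothetical periodic force curve with zeros at d = 0.12, 0.62, 1.12, 1.62.
distances = [0.05 * i for i in range(41)]
forces = [-math.sin(2.0 * math.pi * (d - 0.12)) for d in distances]
equilibria = classify_equilibria(distances, forces)
```

  A negative slope means that a small displacement away from the equilibrium produces a restoring force, which is why only those crossings are labeled stable.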
  
  (6) More generally, and related to the above, I am favorable towards the authors' attempts to find approximate flow metrics that could be used to predict the equilibrium positions and their stability, but I think the reasoning needs to be more solid. It seems the authors are seeking a parameter that can indicate equilibrium and another that can indicate stability. Can they clearly lay out the motivation behind any proposed metrics, and clearly present complete equations for their definitions? Further, is there a related power metric that can be appropriately defined and which proves to be useful?
  
  Thank you – these are excellent suggestions. Indeed, we needed to better explain the motivation and equations. Perhaps the main idea for these metrics can be best understood when explained in the context of the simpler particle model, which we now do in the SI and explain in the main text.
  
  (7) Why do the authors not carry out CFD simulations on the larger groups? Some explanations should be given, or some corresponding CFD simulations should be carried out. It would be interesting if CFD simulations were done and included, especially for the in-line case of many swimmers. This is because the results seem to be quite nuanced and dependent on many-body effects beyond nearest-neighbor interactions. It would certainly be comforting to see something similar happen in CFD.
  
  We are using an open-source version of the Immersed Boundary Method that is not specifically optimized for many interacting swimmers, so the computational cost of CFD simulations with more swimmers is high. We therefore used CFD simulations sparingly, with few swimmers (2 or 3), and performed systematic simulations with the VS model.
  
  For the same Reynolds number as in Figure 1, we simulated three and four swimmers in CFD: three swimmers form a stable formation, whereas four swimmers do not, with the fourth swimmer colliding with the third one, consistent with the VS model. Results are included in the SI figure 8 of the main text.
  
  (8) Related to the above, the authors should discuss seemingly significant differences in their results for long in-line formations as compared to the CFD work of Peng et al. [48]. That work showed apparently stable groups for numbers of swimmers quite larger than that studied here. Why such a qualitatively different result, and how should we interpret these differences regarding the more general issue of the stability of tandem groups?
  
  Thank you for bringing up this important comparison. Peng et al. [48] (Hydrodynamic schooling of multiple self-propelled flapping plates) studied inline configurations of flapping airfoils at a Reynolds number of 200. There are several differences between their work and ours. The most important one is that they used a flexible plate, which makes the swimmer more adaptive to changes in the flow field (e.g., through changes in tailbeat amplitude and in phase along its body) and diverts some of the hydrodynamic energy to elastic energy. We edited page 10 of the main text, at the end of the section “Critical size of inline formations beyond which cohesion is lost”, to explain this distinction.
  
  (9) The authors seem to have all the tools needed to address the general question about how dynamically stable configurations relate to those that are energetically optimal. Are stable solutions optimal, or not? This would seem to have very important implications for animal groups, and the work addresses closely related topics but seems to miss the opportunity to give a definitive answer to this big question.
  
  Indeed, that is exactly the point – in pairwise formations, stable configurations are also energetically optimal! In larger groups, there is no unique stable configuration – each stable configuration is associated with a different degree of energy savings. Interestingly, when exploring various equilibrium configurations in a school of four, we found the diamond formation of D. Weihs, Nature, 1972 to be both stable and most optimal among the configurations we tested. However, claiming this as a global optimum may be misleading – our standpoint is that fish schools are always dynamic and that there are opportunities for energy savings in more than one stable configuration.
  
  We added a new section, “Mapping emergent spatial patterns to energetic benefits”, a new figure in the main text (Fig. 10), and a new figure in the SI (Fig. S8).
  
  (10) Time-delay particle model: This model seems to construct a simplified wake flow. But does the constructed flow satisfy basic properties that we demand of any flow, such as being divergence-free? If not, then the formulation may be troublesome.
  
  The simplified wake flow captures the hydrodynamic trail left by the swimmer in a very simplified manner. In the limit of small amplitude, it should be consistent with the inviscid vortex sheet shed in T. Wu’s waving swimmer model (Wu TY. 1961).
  
  The model was compared to experiments and used in several recent publications from the Courant Institute (Newbolt et al. 2019, 2022, 2024).
  
  Citations:
  
  Wu, T. Y. T. (1961). Swimming of a waving plate. Journal of Fluid Mechanics, 10(3), 321-344. DOI: https://doi.org/10.1017/S0022112061000949
  
  Newbolt, J. W., Lewis, N., Bleu, M., Wu, J., Mavroyiakoumou, C., Ramananarivo, S., & Ristroph, L. (2024). Flow interactions lead to self-organized flight formations disrupted by self-amplifying waves. Nature Communications, 15(1), 3462. DOI: https://doi.org/10.1038/s41467-024-47525-9
  
  Newbolt, J. W., Zhang, J., & Ristroph, L. (2022). Lateral flow interactions enhance speed and stabilize formations of flapping swimmers. Physical Review Fluids, 7(6), L061101. DOI: https://doi.org/10.1103/PhysRevFluids.7.L061101
  
  Newbolt, J. W., Zhang, J., & Ristroph, L. (2019). Flow interactions between uncoordinated flapping swimmers give rise to group cohesion. Proceedings of the National Academy of Sciences, 116(7), 2419-2424. DOI: https://doi.org/10.1073/pnas.1816098116
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Congratulations on such a comprehensive and well-thought-out study; I truly enjoyed reading it and have only a couple of suggestions that I believe will help further strengthen the paper. I am including a bunch of references here that are very familiar to me without the expectation of you to include them all, just to point at areas that I feel you might consider useful.
  
  We thank the referee again for their careful read of the manuscript and for their constructive feedback. We appreciate it.
  
  First, I believe that some more rationale is needed to justify the chosen modeling framework. I am fully aware of how difficult it is to run these simulations, but I see some critical assumptions that need to be at least spelled out for the reader to appreciate the limitations of the study: (1) Constraining the cross-stream coordinate (a stability analysis should include perturbations on the cross-stream coordinate as well, see, for example, https://doi.org/10.1017/flo.2023.25 -- I know this is much simpler as it discards any vortex shedding) and (2) Assuming equal frequency and amplitude (there are studies showing variation of tail beat frequency in animals depending on their position in the school, see, for example, https://doi.org/10.1007/s00265-014-1834-4).
  
  Thank you for these suggestions. These are indeed important and interesting points to discuss in the manuscript. See response above regarding point 1. Regarding point 2, this is of course important and will be pursued in future extensions of this work. We edited the intro and discussion of the main text to explain this.
  
  In the paper “Stability of schooling patterns of a fish pair swimming against a flow”, the authors considered a pair of swimmers in a channel. They analyzed the stability of the system and found multiple equilibria, including inline and staggered formations and a special formation perpendicular to the wall. Studying fish schools in confined domains and analyzing their stability is very interesting. We added a citation to this paper in the discussion section at the end of page 10.
  
  In the paper “Fish swimming in schools save energy regardless of their spatial position”, the authors measured the reduction in power of schooling fish through tail beat frequency and oxygen consumption and compared these with measurements in solitary fish. They found that individuals in a school always save power compared to swimming alone. However, there is one important caveat in this study: they considered a larger school of fish and expressed the results in terms of pairwise configurations (see the schematic we draw below). This is misleading because it may suggest that the two fish in a pairwise formation benefit each other, while in fact the data were obtained from a larger school with many neighbors. They only consider a fish’s relationship to its nearest neighbor, but in a large school, other neighbors also influence its energy consumption. In the schematic below, we highlight several focal fish, marked in red, green, and blue, and mark their nearest neighbors in a lighter shade of the same color. The nearest neighbor is what the authors use to characterize the neighbor relationship. A problematic example is the red fish, whose nearest neighbor is behind it, although its power saving may actually come from other neighbors that are beside or ahead of it.
  
  Author response image 3.
  
  Second, I would like to see more biology context with respect to limitations that are inherent to a purely mechanical model, including, neglecting vision that we know plays a synergistic role in determining schooling patterns. For example, a recent study https://doi.org/10.1016/j.beproc.2022.104767 has presented experiments on fish swimming in the dark and in bright conditions, showing that it is unlikely that hydrodynamics alone could explain typically observed swimming patterns in the literature.
  
  Thank you for this suggestion and for sharing with us the paper “Collective response of fish to combined manipulations of illumination and flow”. This is a great study, and we are sorry to have missed it.
  
  In this paper, the authors found that fish swim more cohesively under illumination, which is consistent with another paper we already cited, “The sensory basis of schooling by intermittent swimming in the rummy-nose tetra (Hemigrammus rhodostomus)”. Another important conclusion of this paper is that under brighter illumination and with flow, fish schools spend more time side by side. This connects well to the conclusion of another paper we cited, “Simple phalanx pattern leads to energy saving in cohesive fish schooling,” where at lower flow speed in a water channel fish tended to form a dynamic school, while at higher flow speed they organized in a side-by-side (phalanx) configuration. This conclusion is consistent with our finding that fish share power savings in side-by-side formations.
  
  Importantly, it is well known that both vision and flow sensing play important roles in fish schooling. This study aimed to merely explore what is possible through passive hydrodynamic interactions, without visual and flow sensing and response. We clarify this in the revised version of the manuscript.
  
  Third, I am not too convinced about the flow agreement metric, which only accounts for linear interactions between the foils. More sophisticated approaches could be utilized as the one proposed here https://doi.org/10.1017/jfm.2018.369, based on a truly model-agnostic view of the interaction - therein, the authors show non-reciprocal (in strength and time-scale) coupling between two in-line flapping foils using information theory. I also would like to mention this older paper https://doi.org/10.1098/rsif.2012.0084, where an equivalent argument about the positioning of a trailing fish with respect to a leading robotic fish is made from experimental observations.
  
  Thank you for these remarks and for sharing these two interesting papers.
  
  The flow agreement metric is not specific to two fish, as we show in Fig. 6 of the manuscript. We edited the manuscript and SI to better explain the motivation and implementation of the flow agreement parameter. We edited the main text (see revisions on page 7) and added a new section called “Diagnostic tools.”
  
  In the paper “An information-theoretic approach to study fluid–structure interactions”, the authors calculate the transfer entropy between two oscillating airfoils when they are hydrodynamically coupled. This is an interesting study! We will apply this approach to analyzing larger schools in the future. We cited this paper in the introduction.
  
  In the paper “Fish and robots swimming together: attraction towards the robot demands biomimetic locomotion”, the authors found that fish will swim behind an artificial fish robot, especially when the robot is beating its tail rather than static. Under specific conditions, the fish hold station behind the robot, which may be due to the hydrodynamic advantage obtained by swimming in the robot’s wake. DPIV resolved the wake behind a static or beating fish robot, but did not visualize the flow field when the fish is there. This study is similar to a paper we already cited, “In-line swimming dynamics revealed by fish interacting with a robotic mechanism”, which considered fish-foil interaction. In the revised manuscript, we cite both papers.
  
  Regarding the reviewer’s comment that the flow agreement metric only accounts for linear interactions between the foils, we would like to clarify. The flow agreement parameter is a nonlinear metric that considers the interaction between a virtual swimmer and an arbitrary unsteady flow field. Although the metric is a linear function of the swimmer’s speed, it is a nonlinear function of spacing and phase, which are the quantities we care about. Moreover, the flow field can be generated by either experiment or CFD simulation, behind one or more swimmers. It is true that the system is one-way coupled, since the virtual swimmer does not perturb the flow field.
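  A toy calculation may help make the linear-versus-nonlinear distinction concrete. The sketch below assumes one plausible form of a flow agreement measure: the time average of the product of the local transverse flow velocity and the velocity of a virtual ("ghost") swimmer, normalized by a fixed reference speed squared. This form, the normalization, and all names are assumptions for illustration and are not the definition given in the manuscript or SI.

```python
import math


def flow_agreement(flow_v, swimmer_v, ref_speed):
    """ASSUMED measure: time-averaged product of the two velocity signals,
    normalized by a fixed reference speed squared."""
    if len(flow_v) != len(swimmer_v):
        raise ValueError("signals must have the same number of samples")
    mean_product = sum(v * u for v, u in zip(flow_v, swimmer_v)) / len(flow_v)
    return mean_product / ref_speed ** 2


def sampled_sine(amplitude, phase, n_samples=200):
    """One period of amplitude * sin(2*pi*t - phase), uniformly sampled."""
    return [
        amplitude * math.sin(2.0 * math.pi * k / n_samples - phase)
        for k in range(n_samples)
    ]


flow = sampled_sine(1.0, 0.0)
in_phase = flow_agreement(flow, sampled_sine(1.0, 0.0), ref_speed=1.0)
anti_phase = flow_agreement(flow, sampled_sine(1.0, math.pi), ref_speed=1.0)
faster = flow_agreement(flow, sampled_sine(2.0, 0.0), ref_speed=1.0)
```

  In this toy setting the measure doubles when the swimmer's velocity amplitude doubles, but it varies as the cosine of the phase difference, so it is linear in speed and nonlinear in phase. No feedback from the swimmer to the flow appears anywhere, which mirrors the one-way coupling noted above.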
  
  Again, this is great work and I hope these suggestions are of help.
  
  Thank you again! We are delighted to receive such positive and constructive feedback.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) About Figure 1: Panel C should be made to match between CFD and VS with regard to the swimmer positions. Also, if the general goal of the figure is to compare CFD and VS, then how about showing a difference map of the velocity fields as a third column of panels across A-D?
  
  Thank you for pointing this out. Figure 1 C is updated accordingly.
  
  The general goal is to show that the CFD and VS simulations produce qualitatively similar results. Some quantities are not the same across models: for example, the swimming speeds differ, but the scaled distance is the same.
  
  (2) Figure 3: In A, it would be nice to keep the y-axis the same across all plots, which would aid quick visual comparison. In B, the legend labels for CFD and VS should be filled in with color so that the reader can more easily connect to the markers in the plot.
  
  Thank you for pointing this out. We have updated Figures 3 and 6.
  
  (3) Figures 4, 9, and Supplementary Figures too: As mentioned previously, the agreement parameter plots are saturated in the color map, possibly obscuring more detailed information.
  
  Thank you for pointing this out. The goal is to show that there is a large region with positive flow agreement parameter.
  
  We took the flow agreement behind a single swimmer in the VS simulation (Fig. 4B) and added contour lines to it (at 0.25 and 0.5). Not many details are hidden by the saturated colormap.
  
  Author response image 4.
  
  We also updated Fig 4 and Fig 9 accordingly.
  
  (4) Figure 6: Is this CFD or VS? Why show one or the other and not both? In B, it seems that there are only savings available and no energetically costly positions. This seems odd. In C, it seems the absolute value on dF/dd is suppressing some important information about stability - the sign of this seems important. In E, the color bar seems to be reflected from what is standard, i.e. 0 on the left and 100 on the right, as in F.
  
  Thank you for asking. Fig. 6 is based only on VS simulations. This figure comprises hundreds of simulations, so to save computational effort we did not run the corresponding CFD simulations. Representative CFD simulations are shown in Figures 1, 2, and 3 for comparison. We added a sentence in the figure caption for clarification.
  
  In C, since dF/dd is always negative for emergent formations (only stable equilibria can appear during forward-time simulation), we show its absolute value for comparison.
  
  In E, we flip the color bar because a larger flow agreement parameter corresponds to more power saving, in other words, to negative changes in COT.
  
  (5) Fig. 8: For cases such as in D that have >100% power savings, does this mean that the swimmer has work done by the flow? How to interpret this physically for a flapping foil and biologically for a fish?
  
  Yes, it means the hydrofoil/fish gets a free ride and is even able to harvest energy from the incoming flow. A similar phenomenon has been reported in the biology and engineering literature. For example, Liao et al. 2003 and Beal et al. 2006 found that live or dead fish can harvest energy from incoming vortical flow by modulating their body curvature.
  
  In engineering, Chen et al. 2018 and Ribeiro et al. 2021 found that the following airfoil in a tandem/inline formation can harvest energy from the wake of the leading swimmer in both simulations and experiments.
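  The arithmetic behind a saving above 100% can be sketched as follows. The sign convention (positive means the group member spends less than a solo swimmer) and the function name are assumptions chosen to match the reviewer's description of DeltaCOT; the numbers are made up for illustration.

```python
def cot_saving_percent(cot_solo, cot_group):
    """Relative saving in cost of transport, in percent.

    ASSUMED convention: positive values mean the swimmer in the group spends
    less than a solo swimmer; negative values mean it spends more.
    """
    if cot_solo <= 0.0:
        raise ValueError("solo cost of transport must be positive")
    return 100.0 * (cot_solo - cot_group) / cot_solo


half_cost = cot_saving_percent(2.0, 1.0)    # group member spends half as much
harvesting = cot_saving_percent(2.0, -0.5)  # negative cost: flow does net work
penalty = cot_saving_percent(2.0, 3.0)      # group member spends more
```

  With this convention, a saving above 100% can only occur when the group member's cost of transport is negative, that is, when the flow does net work on the swimmer over a cycle.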
  
  Citations:
  
  Liao, J. C., Beal, D. N., Lauder, G. V., & Triantafyllou, M. S. (2003). Fish exploiting vortices decrease muscle activity. Science, 302(5650), 1566-1569. DOI: https://doi.org/10.1126/science.1088295
  
  Beal, D. N., Hover, F. S., Triantafyllou, M. S., Liao, J. C., & Lauder, G. V. (2006). Passive propulsion in vortex wakes. Journal of Fluid Mechanics, 549, 385-402. DOI: https://doi.org/10.1017/S0022112005007925
  
  Chen, Y., Nan, J., & Wu, J. (2018). Wake effect on a semi-active flapping foil based energy harvester by a rotating foil. Computers & Fluids, 160, 51-63. DOI: https://doi.org/10.1016/j.compfluid.2017.10.024
  
  Ribeiro, B. L. R., Su, Y., Guillaumin, Q., Breuer, K. S., & Franck, J. A. (2021). Wake-foil interactions and energy harvesting efficiency in tandem oscillating foils. Physical Review Fluids, 6(7), 074703. DOI: https://doi.org/10.1103/PhysRevFluids.6.074703
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.15.580536v2
www.biorxiv.org www.biorxiv.org

Spectral decomposition unlocks ascidian morphogenesis

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer 1:
  
  (1) Figure 2 is mentioned before Figure 1
  
  We thank the reviewer for pointing this out. This was a mistake: the reference to Figure 2 should have been to Figure 1. This has been corrected in the manuscript.
  
  (2) Figure 1c: red is used to indicate cell junctions on raw data, but also the error.
  
  Red is used to indicate cell junctions on the raw data in Figure 1c (left), while it is used to indicate the error in Figure 1c (right).
  
  The Lagrangian error can be negative right? This is not reflected by the error scale which goes from 0% to 100%
  
  A negative Lagrangian error would mean that the distance between real and simulated cellular junctions decreased over time. We effectively treat this case as if there was no displacement, and the error is hence 0%.
  
  Why do you measure the error in percent?
  
  The error is measured in percentages because it is relative to the apical length of a cell.
  
  (3) Figure 2: The distinction between pink and red in e_2(t) is very difficult. What do the lines indicate?
  
  The lines indicate the directions of the eigenvectors of the strain rate tensor at every material particle of the embryo.
  
  (4) L156 "per unit length": Rather per unit time?
  
  We thank the reviewer for pointing this out and apologize for this mistake. "per unit length" has been changed to "per unit time".
  
  (5) L159 "Eigen vectors in this sense": is there another sense?
  
  "In this sense" refers to the geometric description of eigenvectors. The phrase has been removed.
  
  (6) L164 "magnitude of the rate of change underwent by a particle at the surface of the embryo in the three orthogonal spatial directions of most significant rate of change."
  
  Would a decomposition in two directions within the surface's tangent plane and one perpendicular to it not be better?
  
  We also performed the decomposition of the strain rate tensor as suggested within the surface's tangent plane and one perpendicular to it, but did not notice any tangible differences in the overall analysis, especially after derivation of the scalar field.
  
  (7) L174 "morphological activity": I think this notion is never defined
  
  By morphological activity we mean any noticeable shape change.
  
  (8) L177: I did not quite understand this part
  
  This part tries to convey that the scalar strain rate field evidences coordinated cell behaviors by highlighting wide regions of red that traverse cell boundaries (e.g. fig.2b, $t=5.48hpb$). At the same time, the strain rate field preserves cell boundaries, highlighted by bands of red at cellular intersections, when coordinated cell behaviors are not preponderant (e.g. fig.2b, $t=4hpb$).
  
  (9) Ll 194 "Unsurprisingly, these functions play an important role in many branches of science including quantum mechanics and geophysics Knaack and Stenflo (2005); Dahlen and Tromp (2021)." Does this really help in understanding spherical harmonics?
  
  This comment was made with the aim of showing to the reader that Spherical Harmonics have proved to be useful in other fields. Although it does not help in understanding spherical harmonics, it establishes that they can be effective.
  
  (10) Figure 3a: I do not find this panel particularly helpful. What does the color indicate? What are the prefactors of the spherical harmonics?
  
  This panel showcases the restriction of the strain rate scalar field to the spherical harmonics with the l and m specified. Each material particle of the embryo surface at the time is colored with respect to the value of . The values are computed according to equation 2 and are showcased in figure 3c.
  
  (11) L 265: Please define "scalogram" as opposed to a spectrogram.
  
  Scalograms are the result of wavelet transforms applied to a signal. Although spectrogram can specifically refer to the spectrum of frequencies resulting for example from a Fourier transform, the term can also be used in a broader sense to designate any time-frequency representation. In the context of this paper, we used it interchangeably with scalogram. We have changed all occurrences of spectrogram to scalogram in the revised manuscript.
  
  (12) L 299 "the analysis was carried out the 64-cell stage.": Probably 'the analysis was carried out at the 64-cell stage'
  
  We thank the reviewer for pointing this out. The manuscript was revised to reflect the suggested change.
  
  (13) L 340 "Another outstanding advantage over traditional is": Something seems to be missing in this sentence.
  
  We thank the reviewer for pointing this out. We have modified the sentence in the revised manuscript. It now reads “Another outstanding advantage of our workflow over traditional methods is that our workflow is able to compress the story of the development ... ”.
  
  (14) Ll 357 "on the one hand, the overall spatial resolution of the raw data, on the other hand, the induced computational complexity.": Is there something missing in this sentence
  
  The sentence tries to convey the idea that, in implementing our method, there is a compromise to be made between the number of particles chosen for the constructed mesh and the computational complexity induced by this choice. There is also a compromise to be made between this number of particles and the spatial resolution of the original dataset.
  
  Reviewer 2:
  
  (1) The authors should clearly state to which data this method has been applied in this paper. Also, to what kind of data can this method be applied? For instance, should the embryo surface be segmented?
  
  The method has been applied to 3D+time imaging data of ascidian embryonic development hosted on the MorphoNet (morphonet.org) platform. The data on the MorphoNet platform come in two formats: closed surface meshes of segmented cells spatially organized into the embryo, and 3D voxelated images of the embryo. The method was first designed for the former format and then extended to the latter. There is no requirement for the embryo surface to be segmented.
  
  (2) In this paper, it is essential to understand the way that the authors introduced the Lagrangian markers on the surface of the embryo. However, understanding the method solely based on the description in the main text was difficult. I recommend providing a detailed explanation of the methodology including equations in the main text for clarity.
  
  We believe that adding the mathematical details of the method to the main text would clutter it and make it more difficult to understand. Interested readers can refer to the supplementary material for a detailed explanation of the method.
  
  (3) In eq.(1) of the supplementary information, d(x,S_2(t)) could be a distance function between S_1 and S_2 although it was not stated. How was the distance function between the surfaces defined?
  
  What was meant here was d(x,S_1(t)), where x is a point of S_2(t) and d(x,S_1(t)) is the distance between the point x and S_1(t). The definition of the distance function has been clarified in the supplementary information.
  
  (4) In the section on the level set scheme of supplementary information, the derivation of eq.(4) from eq.(3) was not clear.
  
  We added an intermediary equation for clarification.
  
  (5) Why is a reference shape S_1(0) absent at t=0?
  
  A reference shape S_1(0) is absent at t=0 precisely because that is what we are trying to achieve: construct an evolving Lagrangian surface S_2(t) matching S_1(t) at all times.
  
  (6) In Figure 2(a), it is unclear what was plotted. What do the colors mean? A color bar should be provided.
  
  The caption of the figure describes the colors: “a) Heatmap of the eigenvector fields of the strain rate tensor. Each row represents a vector field distinguished by a distinct root color (yellow, pink, white). The gradient from the root color to red represents increasing magnitudes of the strain rate tensor.”
  
  (7) With an appropriate transformation, it would be possible to create a 2D map from a 3D representation shown in for instance Figure 2. Such a 2D representation would be more tractable for looking at the overall activities.
  
  We thank the reviewer for pointing this out. In Figure 4b of the supplementary information, we provide a 2D projection of the scalar strain rate field.
  
  (8) The strain rate is a second-order tensor that contains rich information. In this paper, the information in the tensor has been compressed into a scalar field by taking the square root of the sum of the squares of the eigenvalues. However, such a representation may not distinguish important events such as stretching and compression of the tissue. The authors should provide appropriate arguments regarding the limitations of this analysis.
  
  The tensor form of the strain rate field does carry more information than the scalar eigenvalue field derived from it. However, our objective in this project was not to exhaust the richness of the strain rate tensor field but rather to provide a proof of concept that our global approach to studying morphogenesis could unveil sufficiently rich information on the dynamical processes at play. Although outside the scope of this project, a more thorough exploration of the strain rate tensor field could be the object of future investigations.
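  As a minimal illustration of the compression discussed in this exchange, the sketch below computes the scalar described by the reviewer (the square root of the sum of the squared eigenvalues) for a symmetric 3x3 tensor. It assumes the tensor is symmetric, in which case this scalar equals the Frobenius norm and no eigen-decomposition is needed. The function names and the example tensors are hypothetical and are not taken from the manuscript.

```python
import math


def scalar_strain_rate(D):
    """Return sqrt(l1^2 + l2^2 + l3^2) for a symmetric 3x3 tensor D.

    For a symmetric tensor, the sum of the squared eigenvalues equals
    trace(D @ D), i.e. the squared Frobenius norm, so the entries can be
    used directly.
    """
    return math.sqrt(sum(D[i][j] ** 2 for i in range(3) for j in range(3)))


def trace(D):
    """Sum of the eigenvalues; its sign separates net stretching from net compression."""
    return D[0][0] + D[1][1] + D[2][2]


# Hypothetical tensors: pure stretching and pure compression of equal magnitude.
stretch = [[0.2, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.0]]
compress = [[-0.2, 0.0, 0.0], [0.0, -0.1, 0.0], [0.0, 0.0, 0.0]]

same_scalar = math.isclose(scalar_strain_rate(stretch), scalar_strain_rate(compress))
```

  Here `same_scalar` is true although the two tensors describe opposite deformations, which is the limitation raised by the reviewer; a signed quantity such as the trace would tell them apart.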
  
  (9) The authors claimed that similarities emerge between the spatiotemporal distribution of morphogenesis processes in the previous works and the heatmaps in this work. Some concrete data should be provided to support this claim.
  
  All claims have been backed with references to previous works. For instance, looking at the two middle panels on the lower row of figure 2b (5.48hpf, 6.97hpf), we explained that the concentration of red corresponds respectively to endoderm invagination during gastrulation and to zippering during neurulation [we cited Hashimoto et al. (2015)]. Here, we relied on visual inspection to spot the similarities. The rest of the paper provides substantial and robust additional support for these claims using spectral decomposition in space and time.
  
  (10) The authors also claimed that "A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant and those where coordinated cell behaviours dominate." The authors should provide specific examples and analysis to support this argument.
  
  Here, we relied on visual inspection to make this claim. This whole section of the paper, “Strain rate field describes ascidian morphogenesis”, is about computing, plotting and observing the strain rate field.
  
  However, specific examples were provided. This paragraph was building towards this statement, and the evidence was scattered through the paragraph. We have now revised the sentence to ensure that we highlight specific examples:
  
  “A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant (e.g. fig.2b, $t=4hpb$) and those where coordinated cell behaviours dominate (e.g. fig.2b, $t=5.48hpb$).”
  
  (11) The authors should provide the details of the analysis method used in Figure 3b, including relevant equations. In particular, it would be helpful to clarify the differences that cause the observed differences between Figure 3b and Figure 3c.
  
  Figure 3b was introduced with the sentence: “In analogy to Principal Components Analysis, we measure the average variance ratio over time of each harmonic with respect to the original signal (Fig.3b).” explaining the origin of variance ratio values used in figure 3b. We have now added the mathematical expression to further clarify.
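  One possible reading of this variance ratio is sketched below. It is an assumption, not the manuscript's expression: it supposes real coefficients on an orthonormal spherical harmonic basis, so that the energy of the signal at each time point is the sum of the squared coefficients, and it averages each harmonic's share of that energy over time. The function name and the input layout are hypothetical.

```python
def average_variance_ratio(coeffs):
    """Average over time of each harmonic's share of the signal energy.

    coeffs: list over time points of dicts {(l, m): coefficient}.
    Returns a dict {(l, m): mean ratio}. Time points with zero energy are skipped.
    """
    totals = {}
    n_used = 0
    for frame in coeffs:
        energy = sum(c * c for c in frame.values())
        if energy == 0.0:
            continue  # no signal at this time point
        n_used += 1
        for lm, c in frame.items():
            totals[lm] = totals.get(lm, 0.0) + c * c / energy
    return {lm: s / n_used for lm, s in totals.items()}


# Hypothetical coefficients at two time points.
frames = [
    {(0, 0): 2.0, (1, 0): 1.0, (1, 1): 1.0},
    {(0, 0): 3.0, (1, 0): 0.0, (1, 1): 0.0},
]
ratios = average_variance_ratio(frames)
```

  Under these assumptions the ratios sum to one across harmonics, which is what makes the analogy with explained variance in Principal Components Analysis natural.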
  
  (12) The authors found that the variance ratio of Y_00 was 64.4%. Y_00 is a sphere, indicating that most of the activity can be explained by a uniform activity. Which actual biological process explains this symmetrical activity?
  
  The reviewer makes a good point, which also gave us a lot to think about during the analysis. Observing that the contribution of Y00 peaks during synchronous divisions, which are interestingly restricted to the animal pole, we conjecture that localized morphological ripples can be felt throughout the embryo.
  
  (13) The contribution of other spherical harmonics than Y_00 and Y_10 should be shown.
  
  The other spherical harmonics each contributed less than 1%, and we did not find it important to include them in the main figure. We will add supplementary material.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.22.554368v3
www.biorxiv.org www.biorxiv.org

Is tumor mutational burden predictive of response to immunotherapy?

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  In their manuscript entitled: "Is tumor mutational burden predictive of response to immunotherapy?", Gurjao and colleagues discuss the use of tumor mutational burden (TMB) as a predictive biomarker for cancer patients to respond to immune checkpoint blockage (ICB). By analyzing a large cohort of 882 patient samples across different tumor types they find either little or no association of TMB to the response of ICB. In addition, they showed that finding the optimal cutoff for patient stratification lead to a severe multiple testing problem. By rigorously addressing this multiple testing problem only non-small cell lung cancer out of 10 cancer types showed a statistically significant association of TMB and response to ICB. Nevertheless, it is clearly shown that in any case the rate of misclassification is too high that TMB alone would qualify as a clinically suitable biomarker for ICB response. Finally, the authors demonstrate with a simple mathematical model that only a few strong immunogenic mutations would be sufficient for an ICB response, thereby showing that also patients with a low TMB score could benefit from immunotherapy. The manuscript is clearly written, the results are well presented and the applied methods are state-of-the-art.
  
  We would like to thank the reviewer for their thoughtful suggestions and efforts towards improving our manuscript. We address below the reviewer’s recommendations.
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The method used for mutation calling can also influence the TMB score. Mutation data was downloaded from public databases and not re-called for this study, so a potential caller bias could be present. What was the calling strategy of the used data sets? For the present study, I don't think that this is crucial because different callers or post-call processing would be used at different sites to determine TMB. I think the mutation calling bias should also be discussed in the manuscript as another shortcoming of TMB as a biomarker for ICB response.
  
  We thank the reviewer for this comment. Mutational data was not aggregated across studies, and caller bias would thus not have any impact on the results of this manuscript. In addition, we further clarified the role of mutation calling bias in the Discussion section.
  
  “Although attractive and scalable, TMB does not consider the effect of specific mutations (missense, frameshift etc), their presentation and clonality (19), nor the state of the tumour, its microenvironment, and interactions with the immune system that can be integrated into potentially better predictors of response to ICB (43, 44). In addition, another major limitation of TMB is the lack of standardized measures. This includes the lack of standard sequencing methods to assess TMB: TMB can be measured from Whole-Exome sequencing, Whole-Genome sequencing, targeted panel and even RNA sequencing. This also includes biases introduced by using different mutation calling pipelines resulting in different TMB, sequencing depth and different characteristics of the samples (e.g. low purity samples typically yield lower TMB).”
  
  (2) In their mathematical model of neoantigens and immunogenicity it is assumed that the probability of a mutation to be immunogenic is constant for all mutations. In reality this is certainly not satisfied. However, the central conclusion from the model still holds. I think that this is important to discuss in the manuscript.
  
  We thank the reviewer for this suggestion and now consider the case where each mutation has its own probability p(i) of being immunogenic.
  
  “Our model shows that achieving about constant 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} for 𝑁 > 10 − 20 mutations, requires and . The same argument holds when each mutation has its own probability to be immunogenic 𝑝(𝑖), then , where is the mean probability of a mutation to be immunogenic. Thus only the average probability of a mutation to be immunogenic matters. In summary, we find that the model agrees with clinical data if individual non-synonymous mutations have, on average, 𝑝~10 − 20% chance for triggering an immune response.”
  
  (3) In the mathematical formula on page 8, C_N^k is the binomial coefficient. This should be stated or written out.
  
  Thank you for pointing this out. Corrected.
  
  “Due to immunodominance, only a few ($k_{crit}$) immunogenic mutations are sufficient to elicit a full immune response. Hence, the probability for a cancer with $N$ (=TMB) mutations to elicit an immune response is then the probability of having $k_{crit}$ or more immunogenic mutations among $N$:

  $$P\{\text{immune response}\} = \sum_{k=k_{crit}}^{N} \binom{N}{k} p^{k} (1-p)^{N-k},$$

  which is the CDF of a binomial distribution.”
  
  (4) The mathematical model provides an explanation that tumors with a low TMB can also respond on ICB. It cannot explain tumors with high TMB lacking ICB response. An explanation of this phenomenon is discussed in the paper but I think also the impact of the tumor immune microenvironment should be mentioned here.
  
  As we explained in the presentation of the model, even immunogenic tumors elicit response to ICB with some probability. In the revision we write:
  
  “𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} = 𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒|𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} · 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒}, where 𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒|𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} is the probability of clinical response, given that cancer elicits an immune response which is complex and depends on many factors including tumor immune microenvironment. Yet the prerequisite for the clinical response is the immune response 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} that we focus on.”
  
  Reviewer #2 (Public Review):
  
  The manuscript points out that TMB cut-offs are not strong predictors of response to immunotherapy or overall survival. By randomly shuffling TMB values within cohorts to simulate a null distribution of log-rank test p-values, they show that under correction, the statistical significance of previously reported TMB cut-offs for predicting outcomes is questionable.
  
  We would like to thank the reviewer for their thoughtful suggestions and efforts towards improving our manuscript.
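  The shuffling procedure summarized in the reviewer's comment can be sketched as follows. This is a simplified illustration under stated assumptions, not the manuscript's pipeline: instead of a log-rank test on survival data, it uses the largest absolute difference in response rate between the groups above and below a cutoff, and it re-optimizes the cutoff on every shuffle so that the null distribution accounts for the cutoff search. All function names and the toy cohort are hypothetical.

```python
import random


def best_cutoff_statistic(tmb, responded):
    """Largest absolute difference in response rate over all candidate cutoffs."""
    best = 0.0
    # Skipping the smallest value keeps both groups non-empty for every cutoff.
    for cut in sorted(set(tmb))[1:]:
        high = [r for t, r in zip(tmb, responded) if t >= cut]
        low = [r for t, r in zip(tmb, responded) if t < cut]
        diff = abs(sum(high) / len(high) - sum(low) / len(low))
        best = max(best, diff)
    return best


def permutation_p_value(tmb, responded, n_perm=1000, seed=0):
    """P-value of the optimized cutoff against a null built by shuffling TMB."""
    rng = random.Random(seed)
    observed = best_cutoff_statistic(tmb, responded)
    shuffled = list(tmb)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # The cutoff is re-optimized on each shuffle, mirroring the search on real data.
        if best_cutoff_statistic(shuffled, responded) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy cohort in which TMB separates responders perfectly (hypothetical data).
toy_tmb = list(range(1, 21))
toy_response = [0] * 10 + [1] * 10
toy_p = permutation_p_value(toy_tmb, toy_response, n_perm=200)
```

  Because the statistic is maximized over cutoffs in both the observed and the shuffled data, the resulting p-value is already corrected for the cutoff search, which is the point of the comparison against randomized data.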
  
  There is a clinical need for a better prediction of treatment response than TMB alone can provide. However, no part of the analysis challenges the validity of the well-known pan-cancer correlation between TMB and immunotherapy response.
  
  We address the pan-cancer correlation in the supplemental text and Figure S3. We realized that the supplemental text was missing from the eLife submission and was included only in the bioRxiv version. We apologize for this oversight. In particular, we show that the “well-known pan-cancer correlation” is largely based on a few outlier cancer subtypes: MSI colorectal cancers and uveal/ocular melanomas. We show that when we remove these cancer types from the pan-cancer dataset, the correlation becomes non-significant for the remaining 15 cancer types.
  
  The failure to detect significant TMB cut-offs may be due to insufficient power, as the examined cohorts have relatively low sample sizes. A power analysis would be informative of what cohort sizes are needed to detect small to modest effects of TMB on immune response.
  
  Since we see no effect, we cannot perform a power analysis. Moreover, increasing cohort sizes cannot increase the effect -- dramatic misclassification of responders (the fraction of responders below the treatment cutoff) would remain the same, making TMB unsuitable for clinical decision-making.
  
  The manuscript provides a simple model of immunogenicity that is tailored to be consistent with a claimed lack of relationship between TMB and response to immunotherapy. Under the model, if each mutation that a tumor has acquired has a relatively high probability of being immunogenic (~10%, they suggest), and if 1-2 immunogenic mutations is enough to induce an immune response, then most tumors produce an immune response, and TMB and response should be uncorrelated except in very low-TMB tumors.
  
  Contrary to the reviewer’s suggestion, our modeling is not tailored to be consistent with the lack of association between TMB and response. Rather, we found that the model has two regimes: the first regime (where p<<1), in which higher TMB leads to a higher probability of response, which doesn’t agree with the data, and the second regime (p~0.1), in which cancers with TMB>10-20 are immunogenic, consistent with the clinical data.
  
  We further expanded on these key points in the Results:
  
  “The model shows two different behaviors. If individual mutations are unlikely to be immunogenic (𝑝 ≪ 1) , e.g. due to a low probability of being presented, the probability of response increases gradually with TMB (Figure 5B). The neoantigen theory generally expects such gradual increase in immunogenicity of cancer with TMB. Yet, available data (Figure 2) don’t show such a trend.
  
  On the contrary, if mutations are more likely to be immunogenic 𝑝~0.1, the probability of response quickly saturates (Figure 5C), making such tumors respond to ICB irrespective of TMB, as we observed in clinical data.”
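  The two regimes can be illustrated numerically with a minimal sketch, assuming that each of the N mutations is independently immunogenic with the same probability p and that k_crit immunogenic mutations suffice, so that the response probability is a binomial tail. The function name and the parameter values are chosen for illustration only.

```python
from math import comb


def p_immune_response(n_mut, p, k_crit=1):
    """P(X >= k_crit) for X ~ Binomial(n_mut, p)."""
    if k_crit <= 0:
        return 1.0
    # Probability of fewer than k_crit immunogenic mutations.
    below = sum(
        comb(n_mut, k) * p**k * (1 - p) ** (n_mut - k)
        for k in range(min(k_crit, n_mut + 1))
    )
    return 1.0 - below


# Regime with p << 1: the probability keeps growing with TMB.
gradual = [p_immune_response(n, 0.001) for n in (50, 200)]
# Regime with p ~ 0.1: the probability is already close to 1 at moderate TMB.
saturated = [p_immune_response(n, 0.1) for n in (50, 200)]
```

  With `k_crit=1` the expression reduces to 1 - (1 - p)^N, which makes the saturation for p~0.1 easy to check by hand.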
  
  We also expanded on these key points in the Introduction:
  
  “We develop a simple model that is based on the neoantigen theory and find that it has two regimes. In one regime, the probability of response increases gradually with TMB, as commonly believed. Yet in the other, the probability of response saturates after a few mutations, making a chance to respond independent of TMB. Our analysis of the clinical data is consistent with the latter regime. Thus our model shows that the neoantigen theory is fully consistent with the lack of association between TMB and response.”
  
  The question then becomes whether the response is sufficient to wipe out tumor cells in conjunction with immunotherapy, which is essentially the same question of predicting response that motivated the original analysis. While TMB alone is not an excellent predictor of treatment response, the pan-cancer correlation between TMB and response/survival is highly significant, so the model's only independent prediction is wrong.
  
  Our study indicates that TMB is a very poor predictor (writing that it’s “not an excellent predictor of treatment response” is an understatement). Moreover, we show that the widely believed “pan-cancer correlation” is shaky as well (Supplemental text and Figure S3). So we don’t see any contradiction between the model and the data.
  
  Additionally, experiments to predict and validate neoepitopes suggest that a much smaller fraction of nonsynonymous mutations produce immune responses1,2.
  
  We agree with the reviewer. That’s exactly what the model suggests.
  
  A key idea that is overlooked in this manuscript is that of survivorship bias: self-evidently, none of the mutations found at the time of sequencing have been immunogenic enough to provoke a response capable of eliminating the tumor. While the authors suggest that immunoediting "is inefficient, allowing tumors to accumulate a high TMB," the alternative explanation fits the neoepitope literature better: most mutations that reach high allele frequency in tumor cells are not immunogenic in typical (or patient-specific) tumor environments. Of course, immunotherapies sometimes succeed in overcoming the evolved immune evasion of tumors. Higher-TMB tumors are likely to continue to have higher mutation rates after sequencing; increased generation of new immunogenic mutations may partially explain their modestly improved responses to therapy.
  
  We disagree with the reviewer's assertion that survivorship bias could explain the observed phenomena. If immunogenic mutations that arise during cancer development were eliminated (by purifying selection, i.e. reduced fitness or cellular death), then the observed mutations would carry noticeable signatures of purifying selection. On the contrary, cancer genomic data show incredibly weak signals of purifying selection on non-synonymous mutations (Weghorn and Sunyaev, Nature Genetics 2017), and observed passenger mutations are practically indistinguishable from random in their effect on proteins (McFarland et al PNAS 2013).
  
  We do agree with the statement that “most mutations … in tumor cells are not immunogenic”. In fact, that’s exactly what our model predicts: (1-p)~90% of mutations in the model are non-immunogenic, while the remaining p~10% are sufficient to trigger an immune response. We clarify this in the text of the paper: “On the contrary, if mutations are more likely to be immunogenic 𝑝~0.1, the probability of response quickly saturates (Figure 5C), making such tumors respond to ICB irrespective of TMB, as we observed in clinical data.”
  
  Reviewer #2 (Recommendations For The Authors):
  
  Abstract
  
  Defining TMB as "number of non-synonymous mutations": while TMB is not consistently defined throughout the literature, it is usually given as a rate rather than a total count, and sometimes synonymous mutations are included. Consider adopting the definition used by the TMB Harmonization Project: "number of somatic mutations per megabase of interrogated genomic sequence.3"
  
  We thank the reviewer for their comment.
  
  Be more specific about your findings, so that abstract readers can get some understanding of your proposed explanation for the "immunogenicity of neoantigens and the lack of association between TMB and response."
  
  We thank the reviewer for their comment. We modified the abstract to explain that the theory we developed expands the neoantigen theory yet can be consistent with the observed lack of association between TMB and response:
  
  "Second, we develop a model that expands the neoantigen theory and can be consistent with both immunogenicity of neoantigens and the lack of association between TMB and response. Our analysis shows that the use of TMB in clinical practice is not supported by available data and can deprive patients of treatment to which they are likely to respond.”
  
  Introduction
  
  Again, consider using a more standard definition of TMB.
  
  We thank the reviewer for their comment. Our study did not seek to harmonize TMB across the datasets and we thus used the total number of mutations rather than the mutational rate often used for comparison across different datasets.
  
  Expand the introduction to provide a preview of the purpose and direction of your analysis. The current draft reveals only that the analysis will relate to TMB.
  
  We expanded the introduction providing the motivation, the approach, and the summary of main findings.
  
  “Using a biomarker to stratify and prioritize patients for treatment runs a risk of depriving patients who have a chance to respond to a life-saving treatment. High variability of response makes relying on a predictor particularly risky. Hence, we revisit original data that were used to establish correlation between TMB and response. We tested TMB as a predictor of both binary responder/non-responder labels from original clinical studies, as well as continuous survival data. We also investigated whether a TMB threshold could distinguish patients with high and low survival after multiple hypothesis testing. We find that no TMB threshold performs better on the clinical data than on randomized ones.
  
  We further show that irrespective of the strategy to choose the threshold, even if we were to employ the optimal TMB cutoff, it would still lead to about 25% of responders falling below the treatment prioritization threshold. In addition, we re-examine the pan-cancer association of TMB with response rate to ICB.
  
  “Finally we revisit the neoantigen theory that was the rationale for using TMB as a predictor of response to immunotherapy. The theory stipulates that non-synonymous mutations can lead to the production of unique antigens (_neo_antigens) that are recognized by the immune system as foreign, triggering the immune response to cancer. The theory further assumes that the more mutations a cancer has, the more likely it triggers the immune system, and the more likely it will benefit from immunotherapy. We develop a simple model that is based on the neoantigen theory and find that it has two regimes. In one regime, the probability of response increases gradually with TMB, as commonly believed. Yet in the other, the probability of response saturates after a few mutations, making a chance to respond independent of TMB. Our analysis of the clinical data is consistent with the latter regime. Thus our model shows that the neoantigen theory is fully consistent with the lack of association between TMB and response.”
  
  Section: Is TMB associated with response after treatment?
  
  The claim that after excluding melanoma and some colorectal cancers, there is no relationship between TMB and response rates in pan-cancer studies cites references 12 and 14. In reference 12 (Yarchoan et al.), it is clear from glancing at their Figure 1 that a pan-cancer correlation between TMB and response would remain with these cancer types excluded. This discrepancy requires explanation. "Supplementary text" is cited for this claim, but it was not included in the file that I received.
  
  We address the pan-cancer correlation in the supplemental text and Figure S3. While the figure was available, we realized that the supplemental text was missing from the eLife submission. We apologize for this oversight.
  
  Plots of survival and TMB do not show "visible correlation": Please strengthen this claim with an appropriate statistical test.
  
  We expand the figure caption to explain the following:
  
  “Plots of progression-free survival and TMB for melanoma and lung cancer ICB cohorts show the lack of correlation or of an obvious TMB cutoff. Computing a simple correlation for survival and censored data cannot correctly represent the dependence since patients who are alive live longer than the reported survival, and limiting correlation to patients who are dead would bias the analysis. Thus other survival statistics are used through the paper.”
  
  Section: Model reconciles neoantigen theory and data
  
  Page 8: In the probability formula, the C term is not defined. My guess is that it means choose(N, k).
  
  Please clarify.
  
  Thank you for pointing this out. Corrected using more conventional notation.
  
  which is the CDF of a binomial distribution.
  
  Page 8: Assuming the above, P(immune response) = P(X >= k_crit); where X~Bin(N, p). The formula should be explicitly introduced in terms of the CDF of the binomial distribution to prevent readers from thinking the wheel is being re-invented.
  
  We thank the reviewer for pointing this out; we modified the equation in the text to make this point easier to see (see above). We refrain from going further since the CDF of a binomial distribution doesn’t have a closed form and can only be written as the regularized incomplete beta function.
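
  As an illustration of the quantity the reviewer describes, P(X >= k_crit) with X ~ Bin(N, p), the upper tail can be evaluated as one minus the binomial CDF at k_crit - 1. The sketch below is not the authors' code; the function name and the numerical values in the usage note are hypothetical, and only the symbols N, p and k_crit come from the exchange above.

```python
from math import comb


def prob_immune_response(n_mutations, p_immunogenic, k_crit):
    """Return P(X >= k_crit) for X ~ Binomial(n_mutations, p_immunogenic).

    Computed as 1 - CDF(k_crit - 1) by summing the binomial pmf directly.
    """
    if k_crit <= 0:
        return 1.0  # zero or fewer required events: always satisfied
    if k_crit > n_mutations:
        return 0.0  # cannot reach the threshold with fewer trials
    below_threshold = sum(
        comb(n_mutations, k)
        * p_immunogenic ** k
        * (1 - p_immunogenic) ** (n_mutations - k)
        for k in range(k_crit)
    )
    return 1.0 - below_threshold
```

  With a hypothetical k_crit = 1 and p = 0.5, the probability is already above 0.99 at N = 10, so additional mutations change it very little; with a small p the same function rises gradually with N. These are the two qualitative regimes the response refers to.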
  
  Page 9: Missing word in "allowing cancers with as little as mutations to be"
  
  We thank the reviewer for pointing this out; we modified the text accordingly.
  
  See comments in public review. In brief, I think a convincing case is made regarding the significance of TMB cut-offs as predictors of survival within cancer types, but frankly this elementary model is not compelling.
  
  Section: Materials and Methods
  
  In the manuscript, it is stated that TMB is accepted as reported by data sources. Since most of the comparisons in the manuscript are within-data-source, that is acceptable. However, it should be ensured that TMB measurements are comparable between samples within each source. For example, when TMB is reported as a total mutation count, it can be verified that all samples have the same coverage, or measurement can be converted to mutations per megabase of coverage. In the same vein, if this manuscript's definition of TMB only includes nonsynonymous mutations, it should be confirmed that the TMB reported by data sources excludes synonymous mutations.
  
  We thank the reviewer for their comment. We leverage total TMB as reported in the original studies claiming an association between TMB and response/ survival.
  
  Figure S2: Instead of writing "the Youden index associated cutoffs is also plotted," it can be stated that the asterisk represents the Youden index cutoff, or a legend can be added that provides this information.
  
  We thank the reviewer for pointing this out; we modified the text accordingly.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2020.09.03.260265v4
www.medrxiv.org www.medrxiv.org

Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Tiedje et al. investigated the transient impact of indoor residual spraying (IRS) followed by seasonal malaria chemoprevention (SMC) on the Plasmodium falciparum parasite population in a high transmission setting. The parasite population was characterized by sequencing the highly variable DBL$\alpha$ tag as a proxy for var genes, a method known as varcoding. Varcoding presents a unique opportunity due to the extraordinary diversity observed as well as the extremely low overlap of repertoires between parasite strains. The authors also present a new Bayesian approach to estimating individual multiplicity of infection (MOI) from the measured DBL$\alpha$ repertoire, addressing some of the potential shortcomings of the approach that have been previously discussed. The authors also present a new epidemiological endpoint, the so-called "census population size", to evaluate the impact of interventions. This study provides a nice example of how varcoding technology can be leveraged, as well as the importance of using diverse genetic markers for characterizing populations, especially in the context of high transmission. The data are robust and clearly show the transient impact of IRS in a high transmission setting; however, some aspects of the analysis are confusing.
  
  (1) Approaching MOI estimation with a Bayesian framework is a well-received addition to the varcoding methodology that helps to address the uncertainty associated with not knowing the true repertoire size. It's unfortunate that while the authors clearly explored the ability to estimate the population MOI distribution, they opted to use only MAP estimates. Embracing the Bayesian methodology fully would have been interesting, as the posterior distribution of population MOI could have been better explored.
  
  We thank the reviewer for appreciating the extension of _var_coding we present here. We believe the comment on maximum _a posteriori_ (MAP) refers to the way we obtained population-level MOI from the individual MOI estimates. We would like to note that reliance on MAP was only one of two approaches we described, although we then presented only MAP. Having calculated both, we did not observe major differences between the two for this data set. Nonetheless, we revised the manuscript to include the result based on the mixture distribution, which considers all the individual MOI distributions, in Figure supplement 6.
  
  (2) The "census population size" endpoint has unclear utility. It is defined as the sum of MOI across measured samples, making it sensitive to the total number of samples collected and genotyped. This means that the values are not comparable outside of this study, and are only roughly comparable between strata in the context of prevalence where we understand that approximately the same number of samples were collected. In contrast, mean MOI would be insensitive to differences in sample size, why was this not explored? It's also unclear in what way this is a "census". While the sample size is certainly large, it is nowhere near a complete enumeration of the parasite population in question, as evidenced by the extremely low level of pairwise type sharing in the observed data.
  
  We consider the quantity a census in that it is a total enumeration or count of infections in a given population sample and over a given time period. In this sense, it gives us a tangible notion of the size of the parasite population, in an ecological sense, distinct from the formal effective population size used in population genetics. Given the low overlap between var repertoires of parasites (as observed in monoclonal infections), the population size we have calculated translates to a diversity of strains or repertoires. But our focus here is on a measure of population size itself. The distinction between population size in terms of infection counts and effective population size from population genetics has been made before for pathogens (see for example Bedford et al. for the seasonal influenza virus and for the measles virus (Bedford et al., 2011)), and it is also clear in the ecological literature for non-pathogen populations (Palstra and Fraser, 2012).
  
  We completely agree with the dependence of our quantity on sample size. We used it for comparisons across time of samples of the same depth, to describe the large population size characteristic of high transmission which persists across the IRS intervention. Of course, one would like to be able to use this quantity across studies that differ in sampling depth, and the reviewer makes an insightful and useful suggestion. It is true that we can use mean MOI, and indeed there is a simple map between our population size and mean MOI (as we just need to divide or multiply by sample size, respectively) (Table supplement 7). We can go further, as with mean MOI we can presumably extrapolate to the full sample size of the host population, or to the population size of another sample in another location. What is needed for this purpose is a stable mean MOI relative to sample size. We can show that in our study mean MOI is indeed stable in that way, by subsampling our original sample to different depths (Figure supplement 8 in the revised manuscript). We now include a discussion of this point in the revision, which allows an extrapolation of the census population size to the whole population of hosts in the local area.
  
  We have also clarified the time denominator: Given the typical duration of infection, we expect our population size to be representative of a per-generation measure.
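
  The relationship described above between census population size, mean MOI and sample size can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `n_infected_hosts` argument and the default number of replicates are assumptions made for the example, and the input is taken to be a list of per-isolate MOI estimates for infected hosts.

```python
import random


def census_population_size(moi_values):
    """Sum of MOI over all sampled infected hosts."""
    return sum(moi_values)


def mean_moi(moi_values):
    """Census population size divided by the number of sampled hosts."""
    return sum(moi_values) / len(moi_values)


def extrapolate_census(moi_values, n_infected_hosts):
    """Scale mean MOI to a (hypothetical) number of infected hosts."""
    return mean_moi(moi_values) * n_infected_hosts


def subsample_mean_moi(moi_values, depth, n_replicates=200, seed=0):
    """Mean MOI of random subsamples (without replacement) of a given depth.

    A narrow spread of these values across depths suggests that mean MOI
    is stable with respect to sampling effort.
    """
    rng = random.Random(seed)
    return [
        mean_moi(rng.sample(moi_values, depth)) for _ in range(n_replicates)
    ]
```

  Comparing the spread of `subsample_mean_moi` at several depths is one simple way to check whether the extrapolation is reasonable for a given data set.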
  
  (3) The extraordinary diversity of DBL$\alpha$ presents challenges to analyzing the data. The authors explore the variability in repertoire richness and frequency over the course of the study, noting that richness rapidly declined following IRS and later rebounded, while the frequency of rare types increased, and then later declined back to baseline levels. The authors attribute this to fundamental changes in population structure. While there may have been some changes to the population, the observed differences in richness as well as frequency before and after IRS may also be compatible with simply sampling fewer cases, and thus fewer DBL$\alpha$ sequences. The shift back to frequency and richness that is similar to pre-IRS also coincides with a similar total number of samples collected. The authors explore this to some degree with their survival analysis, demonstrating that a substantial number of rare sequences did not persist between timepoints and that rarer sequences had a higher probability of dropping out. This might also be explained by the extreme stochasticity of the highly diverse DBL$\alpha$, especially for rare sequences that are observed only once, rather than any fundamental shifts in the population structure.
  
  We thank the reviewer for raising this question, which led us to consider whether the change in the number of DBLα types over the course of the study (and intervention) follows from simply sampling fewer P. falciparum cases. We interpreted this question as basically meaning that one can predict the former from the latter in a simple way, and that therefore, tracking the changes in DBLα type diversity would be unnecessary. A simple map would be for example a linear relationship (a given proportion of DBLα types lost given genomes lost), and even more trivially, a linear loss with a slope of one (same proportion). Note, however, that for such expectations, one needs to rely on some knowledge of strain structure and gene composition. In particular, we would need to assume a complete lack of overlap and no gene repeats in a given genome. We have previously shown that immune selection leads to selection for minimum overlap and distinct genes in repertoires at high transmission (see, for example, He et al., 2018, for theoretical and empirical evidence of both patterns). Also, since the size of the gene pool is very large, even random repertoires would lead to limited overlap (even though the empirical overlap is even smaller than that expected at random (Day et al., 2017)). Despite these considerations, we cannot a priori assume a pattern of complete non-overlap and distinct genes, and ignore plausible complexities introduced by the gene frequency distribution.
  
  To examine this insightful question, we simulated the loss of a given proportion of genomes from baseline in 2012 and examined the resulting loss of DBLα types. We specifically accumulated the loss of infections in individuals until it reached a given proportion (we can do this on the basis of the estimated individual MOI values). We repeated this procedure 500 times for each proportion, as the random selection of individual infections to be removed introduces some variation. Author response image 1 below shows that the relationship is nonlinear, and that one quantity is not a simple proportion of the other. For example, the loss of half the genomes does not result in the loss of half the DBLα types.
  
  Author response image 1.
  
  Non-linear relationship between the loss of DBLα types and the loss of a given proportion of genomes. The graph shows that the removal of parasite genomes from the population through intervention does not lead to the loss of the same proportion of DBLα types, as the initial removal of genomes mostly involves the loss of rare DBLα types, whereas common DBLα types persist until a high proportion of genomes are lost. The survey data (pink dots) used for this subsampling analysis were sampled at the end of the wet/high transmission season in Oct 2012 from Bongo District in northern Ghana. We used the Bayesian formulation of the _var_coding method proposed in this work to calculate the multiplicity of infection of each isolate to further obtain the total number of genomes. The randomized surveys (black dots) were obtained with the “curveball algorithm” (Strona et al., 2014), which preserves isolate lengths and the type frequency distribution.
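
  The subsampling procedure described above can be sketched roughly as follows. This is one reading of the description and not the authors' code: it assumes that whole isolates are removed in random order, with their estimated MOI accumulated until the target proportion of genomes is reached, and that each isolate is given as a hypothetical `(moi, set_of_types)` pair.

```python
import random


def simulate_type_loss(isolates, proportion, n_replicates=500, seed=0):
    """Fraction of DBLα types lost when a proportion of genomes is removed.

    isolates: non-empty list of (moi, set_of_types) pairs.
    Returns one fraction of types lost per replicate.
    """
    rng = random.Random(seed)
    total_genomes = sum(moi for moi, _ in isolates)
    all_types = set().union(*(types for _, types in isolates))
    target = proportion * total_genomes
    losses = []
    for _ in range(n_replicates):
        order = list(range(len(isolates)))
        rng.shuffle(order)
        removed_genomes = 0
        cut = 0
        # Remove isolates one by one until enough genomes are gone.
        while cut < len(order) and removed_genomes < target:
            removed_genomes += isolates[order[cut]][0]
            cut += 1
        surviving = set()
        for idx in order[cut:]:
            surviving |= isolates[idx][1]
        losses.append(1 - len(surviving) / len(all_types))
    return losses
```

  Plotting the mean of the returned values against `proportion` gives a curve of the kind shown in the image; types shared by many isolates survive until most genomes are removed, which is what makes the relationship nonlinear.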
  
  We also investigated whether the resulting pattern changed significantly if we randomized the composition of the isolates. We performed such randomization with the “curveball algorithm” (Strona et al., 2014). This algorithm randomizes the presence-absence matrix, with rows corresponding to the isolates and columns to the different DBLα types; importantly, it preserves the DBLα type frequency and the length of isolates. We generated 500 randomizations and repeated the simulated loss of genomes as above. The data presented in Author response image 1 above show that the pattern is similar to that obtained for the empirical data presented in this study in Ghana. We interpret this to mean that the number of genes is so large that the reduced overlap relative to random due to immune selection (see (Day et al., 2017)) does not play a key role in this specific pattern.
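
  For readers unfamiliar with the curveball algorithm, a minimal sketch of its pair-trading step is given below. It is an illustration under stated assumptions rather than the implementation used in the response: each isolate is represented as a set of type labels (assumed to be sortable, e.g. strings), and `n_trades` is a hypothetical parameter controlling how many trades are performed.

```python
import random
from collections import Counter


def curveball(rows, n_trades, seed=0):
    """Randomize a presence-absence matrix given as a list of sets.

    Each trade picks two rows, pools the items unique to either row,
    and redistributes them at random while keeping both row sizes.
    Row sizes and item (column) frequencies are preserved.
    """
    rng = random.Random(seed)
    rows = [set(r) for r in rows]
    for _ in range(n_trades):
        i, j = rng.sample(range(len(rows)), 2)
        shared = rows[i] & rows[j]
        only_i = rows[i] - rows[j]
        only_j = rows[j] - rows[i]
        pool = sorted(only_i | only_j)  # sorted for reproducibility
        rng.shuffle(pool)
        rows[i] = shared | set(pool[: len(only_i)])
        rows[j] = shared | set(pool[len(only_i):])
    return rows


def type_frequencies(rows):
    """Number of rows (isolates) in which each type occurs."""
    return Counter(t for r in rows for t in r)
```

  Because shared items stay in both rows and unique items are only swapped between the two rows, isolate lengths and type frequencies are unchanged after any number of trades.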
  
  Reviewer #2 (Public Review):
  
  In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. Overall, I found these results clear, convincing, and well-presented. They add to a growing literature that demonstrates the relevance of asymptomatic reservoirs. There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric. However, I am not fully convinced the current implementation will be applied meaningfully across additional studies.
  
  (1) I find the term "census population size" problematic as the groups being analyzed (hosts grouped by age at a single time point) do not delineate distinct parasite populations. Separate parasite lineages are not moving through time within these host bins. Rather, there is a single parasite population that is stochastically divided across hosts at each time point. I find this distinction important for interpreting the results and remaining mindful that the 2,000 samples at each time point comprise a subsample of the true population. Instead of "census population size", I suggest simplifying it to "census count" or "parasite lineage count". It would be fascinating to use the obtained results to model absolute parasite numbers at the whole population level (taking into account, for instance, the age structure of the population), and I do hope this group takes that on at some point even if it remains outside the scope of this paper. Such work could enable calculations of absolute---rather than relative---fitness and help us further understand parasite distributions across hosts.
  
  Lineages moving exclusively through a given type of host or “patch” are not a necessary requirement for enumerating the total number of infections in such a subset. It is true that what we have is a single parasite population, but we are enumerating for the season the respective size in host classes (children and adults). This is akin to enumerating subsets of a population in ecological settings where one has multiple habitat patches, with individuals able to move across patches.
  
  Remaining mindful that the count is relative to sample size is an important point. Please see our response to comment (2) of reviewer 1, also for the choice of terminology. We prefer not to adopt “census count” as a census in our mind is a count, and we are not clear on the concept of lineage for these highly recombinant parasites. Also, census population size has been adopted already in the literature for both pathogens and non-pathogens, to make a distinction with the notion of effective population size in population genetics (see our response to reviewer 1) and is consistent with our usage as outlined in the introduction.
  
  Thank you for the comment on an absolute number which would extrapolate to the whole host population. Please see again our response to comment (2) of reviewer 1, on how we can use mean MOI for this purpose once the sampling is sufficient for this quantity to become constant/stable with sampling effort.
  
  (2) I'm uncertain how to contextualize the diversity results without taking into account the total number of samples analyzed in each group. Because of this, I would like a further explanation as to why the authors consider absolute parasite count more relevant than the combined MOI distribution itself (which would have sample count as a denominator). It seems to me that the "per host" component is needed to compare across age groups and time points---let alone different studies.
  
  Again, thank you for the insightful comment. We provide this number as a separate quantity and not a distribution, although it is clearly related to the mean MOI of such distribution. It gives a tangible sense for the actual infection count (different from prevalence) from the perspective of the parasite population in the ecological sense. The “per host” notion which enables an extrapolation to any host population size for the purpose of a complete count, or for comparison with another study site, has been discussed in the above responses for reviewer 1 and now in the revision of the discussion.
  
  (3) Thinking about the applicability of this approach to other studies, I would be interested in a larger treatment of how overlapping DBLα repertoires would impact MOIvar estimates. Is there a definable upper bound above which the method is unreliable? Alternatively, can repertoire overlap be incorporated into the MOI estimator?
  
  This is a very good point and one we now discuss further in our revision. There is no predefined upper bound one can present a priori. Intuitively, the approach to estimate MOI would appear to break down as overlap moves away from extremely low values, and therefore for locations with low transmission intensity. Interestingly, we have observed that this is not the case in our paper by Labbé et al. (Labbé et al., 2023), where we used model simulations in a gradient of three transmission intensities, from high to low values. The original _var_coding method performed well across the gradient. This robustness may arise from a nonlinear and fast transition from low to high overlap that is accompanied by MOI changing rapidly from primarily multiclonal (MOI > 1) to monoclonal (MOI = 1). This matter clearly needs to be investigated further, including ways to extend the estimation to explicitly include the distribution of overlap.
  
  Smaller comments:
  
  - Figure 1 provides confidence intervals for the prevalence estimates, but these aren't carried through on the other plots (and Figure 5 has lost CIs for both metrics). The relationship between prevalence and diversity is one of the interesting points in this paper, and it would be helpful to have CIs for both metrics when they are directly compared.
  
  Based on the reviewer’s advice we have revised both Figure 4 and Figure 5, to include the missing uncertainty intervals. The specific approach for each quantity is described in the corresponding caption.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions.
  
  Strengths:
  
  This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age-stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population.
  
  Census population size is complementary to parasite prevalence where the former gives a measure of the “parasite population size”, and the latter describes the “proportion of infected hosts”. The reason we see similar trends for the “genetic information” (i.e., census population size) and “age-specific parasite prevalence” is because we identify all samples for _var_coding based on the microscopy (i.e., all microscopy positive _P. falciparum_ isolates). But what is more relevant here is the relative percentage change in parasite prevalence and census population size following the IRS intervention. To make this point clearer in the revised manuscript we have updated Figure 4 and included additional panels plotting this percentage change from the 2012 baseline, for both census population size and prevalence (Figure 4EF). Overall, we see a greater percentage change in 2014 (and 2015), relative to the 2012 baseline, for census parasite population size vs. parasite prevalence (Figure 4EF) as a consequence of the significant changes in distributions of MOI following the IRS intervention (Figure 3). As discussed in the Results, following the deployment of IRS in 2014 census population size decreased by 72.5% relative to the 2012 baseline survey (pre-IRS) whereas parasite prevalence only decreased by 54.5%.
  
  With respect to the reviewer’s comment on “practicalities and cost”, _var_coding has been used to successfully amplify _P. falciparum_ DNA collected as DBS that have been stored for more than 5 years from both clinical and lower density asymptomatic infections, without the additional step and added cost of sWGA ($8 to $32 USD per isolate, for costing estimates see (LaVerriere et al., 2022; Tessema et al., 2020)), which is currently required by other molecular surveillance methods (Jacob et al., 2021; LaVerriere et al., 2022; Oyola et al., 2016). _Var_coding involves a single PCR per isolate using degenerate primers, where a large number of isolates can be multiplexed into a single pool for amplicon sequencing. Thus, the overall costs for incorporating molecular surveillance with _var_coding are mainly driven by the number of PCRs/clean-ups, the number of samples indexed per sequencing run, and the NGS technology used (discussed in more detail in our publication Ghansah et al. (Ghansah et al., 2023)). Previous work has shown that _var_coding can be used both locally and globally for molecular surveillance, without the need to be customized or updated; thus it can be fairly easily deployed in malaria endemic regions (Chen et al., 2011; Day et al., 2017; Rougeron et al., 2017; Ruybal-Pesántez et al., 2022, 2021; Tonkin-Hill et al., 2021).
  
  Weaknesses:
  
  Overall the manuscript is well-written and generally comprehensively explained. Some terms could be clarified to help the reader and I had some issues with a section of the methods and some of the more definitive statements given the evidence supporting them.
  
  Thank you for the overall positive assessment. It is impossible, however, to address the “issues with a section of the methods” and “some of the more definitive statements given the evidence supporting them” without an explicit indication of which methods and statements the reviewer is referring to. Hopefully, the answers to the detailed comments and questions of reviewers 1 and 2 address any methodological concerns (i.e., in the Materials and Methods and Results). To the issue of “definitive statements”, etc. we are unable to respond without further information.
  
  Recommendations For The Authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Line 273: there is a reference to a figure which supports the empirical distribution of repertoire given MOI = 1, but the figure does not appear to exist.
  
  We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al. (Labbé et al., 2023)). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this to our attention.
  
  Line 299: while this likely makes little difference, an insignificant result from a Kolmogorov-Smirnov test doesn't tell you if the distributions are the same, it only means there is not enough evidence to determine they are different (i.e. fail to reject the null). Also, what does the "mean MOI difference" column in supplementary table 3 mean?
  
  The mean MOI difference is the difference in the mean value between the pairwise comparison of the true population-level MOI distribution, that of the population-level MOI estimates from either pooling the maximum a posteriori (MAP) estimates per individual host or the mixture distribution, or that of the population-level MOI estimates from different prior choices. This is now clarified as requested in the Table supplements 3 - 6.
  
  Figure 4: how are the confidence intervals for the estimated number of var repertoires calculated? Also should include horizontal error bars for prevalence measures.
  
  The confidence intervals were calculated based on a bootstrap approach. We re-sampled 10,000 replicates from the original population-level MOI distribution with replacement. Each resampled replicate is the same size as the original sample. We then derive the 95% CI based on the distribution of the mean MOI of those resampled replicates. This is now clarified as requested in the Figure 4 caption (as well as Table supplement 7 footnotes). In addition, we have also updated Figure 4AB and have included the 95% CI for all measures for clarity.
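
  The bootstrap described above can be sketched as a percentile interval on the mean MOI. This is a minimal illustration and not the authors' code: the function name, the percentile indexing and the fixed seed are assumptions, while the 10,000 replicates, resampling with replacement at the original sample size, and the 95% level follow the description.

```python
import random


def bootstrap_mean_moi_ci(moi_values, n_replicates=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean MOI.

    Each replicate resamples the MOI values with replacement, keeping the
    original sample size, and records the mean of the resampled values.
    """
    rng = random.Random(seed)
    n = len(moi_values)
    means = sorted(
        sum(rng.choices(moi_values, k=n)) / n for _ in range(n_replicates)
    )
    lo_idx = max(0, round((1 - level) / 2 * n_replicates))
    hi_idx = min(n_replicates - 1, round((1 + level) / 2 * n_replicates) - 1)
    return means[lo_idx], means[hi_idx]
```

  Multiplying both bounds by the number of sampled isolates converts the interval for mean MOI into an interval for the census population size of that sample.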
  
  Reviewer #2 (Recommendations For The Authors):
  
  - I would like to see a plot like Supplemental Figure 8 for the upsA DBLα repertoire size.
  
  The upsA repertoire size for each survey and by age group has now been provided as requested in Figure supplement 5AB.
  
  - Supplemental Table 2 is cut off in the pdf.
  
  We have now resolved this issue so that the Table supplement 2 is no longer cut off.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The manuscript coins the phrase "census population size". To me, the census is all about the number of individuals, not necessarily their diversity. I appreciate that there is no simple term for this, and I imagine the authors have considered many alternatives, but could it be clearer to say the "genetic census population size"? For example, I found the short title not particularly descriptive "Impact of IRS and SMC on census population size", which certainly didn't make me think of parasite diversity.
  
  Please see our response to comment (2) of reviewer 1. We prefer not to add “genetic” to the phrase as the distinction from effective population size from population genetics is important, and the quantity we are after is an ecological one.
  
  The authors do not currently say much about the potential biases in the genetic data and how this might influence results. It seems likely that because (i) patients with sub-microscopic parasitaemia were not sampled and (ii) because a moderate number of (likely low density) samples failed to generate genetic data, that the observed MOI is an overestimate. I'd be interested to hear the authors' thoughts about how this could be overcome or taken into account in the future.
  
  We thank the reviewer for this comment and agree that this is an interesting area for further consideration. However, based on research from the Day Lab that is currently under review (Tan et al. 2024, under review), the estimated MOI using the Bayesian approach is likely not an “overestimate” but rather an “underestimate”. In this research by Tan et al. (2024), isolate MOI was estimated and compared using different initial whole blood volumes (e.g., 1, 10, 50, 100 μL) for the gDNA extraction. Using _var_coding and comparing these different volumes, it was found that MOI was significantly “underestimated” when small blood volumes were used for the gDNA extraction, i.e., there was a ~3-fold increase in median MOI between 1 μL and 100 μL blood. Ultimately these findings will allow us to make computational corrections so that more accurate estimates of MOI can be obtained from the DBS in the future.
  
  The authors do not make much of LLIN use and for me, this can explain some of the trends. The first survey was conducted soon after a mass distribution whereas the last was done at least a year after (when fewer people would have been using the nets which are older and less effective). We have also seen a rise in pyrethroid resistance in the mosquito populations of the area which could further diminish the LLIN activity. This difference in LLIN efficacy between the first and last survey could explain similar prevalence, yet lower diversity (in Figures 4B/5). However, it also might mean that statements such as Line 478 "This is indicative of a loss of immunity during IRS which may relate to the observed loss of var richness, especially the many rare types" need to be tempered as the higher prevalence observed in this age group could be caused by lower LLIN efficacy at the time of the last survey, not loss of immunity (though both could be true).
  
  We thank the reviewer for this question and agree that (i) LLIN usage and (ii) pyrethroid resistance are important factors to consider.
  
  (i) Over the course of this study, self-reported LLIN usage the previous night remained high across all age groups in each of the surveys (≥ 83.5%); in fact, more participants reported sleeping under an LLIN in 2017 (96.8%) following the discontinuation of IRS compared to the 2012 baseline survey (89.1%). This increase in LLIN usage in 2017 is likely a result of several factors including a rebound in the local vector population making LLINs necessary again, increased community education and/or awareness on the importance of using LLINs, among others. Information on the LLINs (i.e., PermaNet 2.0, Olyset, or DawaPlus 2.0) distributed and participant reported usage the previous night has now been included in the Materials and Methods as requested by the reviewer.
  
  (ii) As to the reviewer’s question on the increase in pyrethroid resistance in Ghana over the study period, research undertaken by our entomology collaborators (Noguchi Memorial Institute for Medical Research: Profs. S. Dadzie and M. Appawu; and Navrongo Health Research Centre: Dr. V. Asoala) has shown that pyrethroid resistance is a major problem across the country, including the Upper East Region. Preliminary studies from Bongo District (2013 - 2015) were undertaken to monitor for mutations in the voltage gated sodium channel gene that have been associated with knockdown resistance to pyrethroids and DDT in West Africa (kdr-w). Through this analysis the homozygote resistance kdr-w allele (RR) was found in 90% of An. gambiae s.s. samples tested from Bongo, providing evidence of high pyrethroid resistance in Bongo District dating back to 2013, i.e., prior to the IRS intervention (S. Dadzie, M. Appawu, personal communication). Although we do not have data in Bongo District on kdr-w from 2017 (i.e., post-IRS), we can hypothesize that pyrethroid resistance likely did not decline in the area, given the widespread deployment and use of LLINs.
  
  Thus, given this information that (i) self-reported LLIN usage remained high in all surveys (≥ 83.5%), and that (ii) there was evidence of high pyrethroid resistance in 2013 (i.e., kdr-w (RR) ~90%), the rebound in prevalence observed for the older age groups (i.e., adolescents and adults) in 2017 is therefore best explained by a loss of immunity.
  
  I must confess I got a little lost with some of the Bayesian model section methods and the figure supplements. Line 272 reads "The measurement error is simply the repertoire size distribution, that is, the distribution of the number of non-upsA DBLα types sequenced given MOI = 1, which is empirically available (Figure supplement 3)." This does not appear correct as this figure is measuring kl divergence. If this is not a mistake in graph ordering please consider explaining the rationale for why this graph is being used to justify your point.
  
  We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al. (Labbé et al., 2023)). This figure was accidentally omitted when the manuscript was submitted for review, and we thank the reviewer for bringing this matter to our attention. We hope that the inclusion of this figure, as well as a more detailed description of the Bayesian approach, helps to make this section of the Materials and Methods clearer for the reader.
  
  I was somewhat surprised that the choice of prior for estimating the MOI distribution at the population level did not make much difference. To me, the negative binomial distribution makes much more sense. I was left wondering, as you are only measuring MOI in positive individuals, whether you used zero truncated Poisson and zero truncated negative binomial distributions, and if not, whether this was a cause of a lack of difference between uniform and other priors.
  
  Thank you for the relevant question. We have indeed considered different priors and the robustness of our estimates to this choice, and have now better described this in the text. We focused on individuals who had a confirmed microscopic asymptomatic P. falciparum infection for our MOI estimation, as median P. falciparum densities were overall low in this population during each survey (i.e., median ≤ 520 parasites/µL, see Table supplement 1). Thus, we used either a uniform prior excluding zero or a zero-truncated negative binomial distribution when exploring the impact of priors on the final population-level MOI distribution. A uniform prior and a zero-truncated negative binomial distribution with parameters within the range typical of high-transmission endemic regions (higher mean MOI with tails around higher MOI values) produce similar MOI estimates at both the individual and population level. However, when the parameter range of the zero-truncated negative binomial is set to that of low-transmission endemic regions, where the empirical MOI distribution centers around monoclonal infections with the majority of MOI = 1 or 2 (mean MOI ≈ 1.5, no tail around higher MOI values), the final population-level MOI distribution deviates more from the one obtained under the aforementioned prior and parameter choices. The final individual- and population-level MOI estimates are therefore not sensitive to the specifics of the prior MOI distribution, as long as this distribution captures the tail around higher MOI values with above-zero probability.
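
The two kinds of prior compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the cap `max_moi`, and the negative binomial parameters `r` and `p` are hypothetical and chosen only to show how a prior that excludes MOI = 0 can be constructed and normalised.

```python
from math import exp, lgamma, log


def nbinom_pmf(k, r, p):
    """Negative binomial pmf: Gamma(k+r) / (k! Gamma(r)) * p**r * (1-p)**k."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1.0 - p))


def zero_truncated_nbinom_prior(max_moi, r, p):
    """Prior over MOI = 1..max_moi from a negative binomial with zero excluded.

    Renormalising over 1..max_moi removes the mass at zero (and, as an
    assumption of this sketch, any mass above the cap max_moi).
    """
    raw = [nbinom_pmf(k, r, p) for k in range(1, max_moi + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def uniform_prior(max_moi):
    """Uniform prior over MOI = 1..max_moi (zero excluded)."""
    return [1.0 / max_moi] * max_moi
```

A prior with a heavier tail (for example a smaller hypothetical `p`) keeps above-zero probability on higher MOI values, which is the property the response identifies as mattering for the final estimates.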
  
  The high MOI in children <5yrs in 2017 (immediately after SMC) is very interesting. Any thoughts on how/why?
  
  This result indicates that although the prevalence of asymptomatic P. falciparum infections remained significantly lower for the younger children targeted by SMC in 2017 compared to 2012, they still carried multiclonal infections, as the reviewer has pointed out (Figure 3B). Importantly, this upward shift in the MOI distributions (and median MOI) was observed in all age groups in 2017, not just the younger children, and provides evidence that transmission intensity in Bongo had rebounded in 2017, 32 months after the discontinuation of IRS. This increase in MOI for younger children may at first glance seem surprising, but it likely shows the limitations of SMC in clearing and/or suppressing the establishment of newly acquired infections, particularly at the end of the transmission season following the final cycle of SMC (i.e., end of September 2017 in Bongo District; NMEP/GHS, personal communication), when the post-treatment prophylactic effects of SMC would have waned (Chotsiri et al., 2022).
  
  Line 521 in the penultimate paragraph says "we have analysed only low density...." should this not be "moderate" density, as low density infections might not be detected? The density range itself is not reported in the manuscript so could be added.
  
  In Table supplement 1 we have provided the median, including the inter-quartile range, across each survey by age group. For the revision we have now provided the density min-max range, as requested by the reviewer. Finally, we have revised the statement in the discussion so that it now reads “….we have analysed low- to moderate-density, chronic asymptomatic infections (see Table supplement 1)……”.
  
  Data availability - From the text the full breakdown of the epidemiological survey does not appear to be available, just a summary of defined age bounds in the SI. Provision of these data (with associated covariates such as parasite density and host characteristics linked to genetic samples) would facilitate more in-depth secondary analyses.
  
  To address this question, we have updated the “Data availability statement” section with the following statement: “All data associated with this study are available in the main text, the Supporting Information, or upon reasonable request for research purposes to the corresponding author, Prof. Karen Day (karen.day@unimelb.edu.au).”
  
  REFERENCES
  
  Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11. doi:10.1186/1471-2148-11-220
  
  Chen DS, Barry AE, Leliwa-Sytek A, Smith T-AA, Peterson I, Brown SM, Migot-Nabias F, Deloron P, Kortok MM, Marsh K, Daily JP, Ndiaye D, Sarr O, Mboup S, Day KP. 2011. A molecular epidemiological study of var gene diversity to characterize the reservoir of Plasmodium falciparum in humans in Africa. PLoS One 6:e16629. doi:10.1371/journal.pone.0016629
  
  Chotsiri P, White NJ, Tarning J. 2022. Pharmacokinetic considerations in seasonal malaria chemoprevention. Trends Parasitol. doi:10.1016/j.pt.2022.05.003
  
  Day KP, Artzy-Randrup Y, Tiedje KE, Rougeron V, Chen DS, Rask TS, Rorick MM, Migot-Nabias F, Deloron P, Luty AJF, Pascual M. 2017. Evidence of Strain Structure in Plasmodium falciparum Var Gene Repertoires in Children from Gabon, West Africa. PNAS 114:E4103–E4111. doi:10.1073/pnas.1613018114
  
  Ghansah A, Tiedje KE, Argyropoulos DC, Onwona CO, Deed SL, Labbé F, Oduro AR, Koram KA, Pascual M, Day KP. 2023. Comparison of molecular surveillance methods to assess changes in the population genetics of Plasmodium falciparum in high transmission. Frontiers in Parasitology 2:1067966. doi:10.3389/fpara.2023.1067966
  
  He Q, Pilosof S, Tiedje KE, Ruybal-Pesántez S, Artzy-Randrup Y, Baskerville EB, Day KP, Pascual M. 2018. Networks of genetic similarity reveal non-neutral processes shape strain structure in Plasmodium falciparum. Nat Commun 9:1817. doi:10.1038/s41467-018-04219-3
  
  Jacob CG, Thuy-nhien N, Mayxay M, Maude RJ, Quang HH, Hongvanthong B, Park N, Goodwin S, Ringwald P, Chindavongsa K, Newton P, Ashley E. 2021. Genetic surveillance in the Greater Mekong subregion and South Asia to support malaria control and elimination. Elife 10:1–22.
  
  Labbé F, He Q, Zhan Q, Tiedje KE, Argyropoulos DC, Tan MH, Ghansah A, Day KP, Pascual M. 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19:e1010816. doi:10.1101/2022.06.27.497801
  
  LaVerriere E, Schwabl P, Carrasquilla M, Taylor AR, Johnson ZM, Shieh M, Panchal R, Straub TJ, Kuzma R, Watson S, Buckee CO, Andrade CM, Portugal S, Crompton PD, Traore B, Rayner JC, Corredor V, James K, Cox H, Early AM, MacInnis BL, Neafsey DE. 2022. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: A malaria case study. Mol Ecol Resour 2285–2303. doi:10.1111/1755-0998.13622
  
  Oyola SO, Ariani C V., Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, Rutledge GG, Redmond S, Manske M, Jyothi D, Jacob CG, Otto TD, Rockett K, Newbold CI, Berriman M, Kwiatkowski DP. 2016. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J 15:1–12. doi:10.1186/s12936-016-1641-7
  
  Palstra FP, Fraser DJ. 2012. Effective/census population size ratio estimation: A compendium and appraisal. Ecol Evol 2:2357–2365. doi:10.1002/ece3.329
  
  Rougeron V, Tiedje KE, Chen DS, Rask TS, Gamboa D, Maestre A, Musset L, Legrand E, Noya O, Yalcindag E, Renaud F, Prugnolle F, Day KP. 2017. Evolutionary structure of Plasmodium falciparum major variant surface antigen genes in South America: Implications for epidemic transmission and surveillance. Ecol Evol 7:9376–9390. doi:10.1002/ece3.3425
  
  Ruybal-Pesántez S, Sáenz FE, Deed S, Johnson EK, Larremore DB, Vera-Arias CA, Tiedje KE, Day KP. 2021. Clinical malaria incidence following an outbreak in Ecuador was predominantly associated with Plasmodium falciparum with recombinant variant antigen gene repertoires. medRxiv.
  
  Ruybal-Pesántez S, Tiedje KE, Pilosof S, Tonkin-Hill G, He Q, Rask TS, Amenga-Etego L, Oduro AR, Koram KA, Pascual M, Day KP. 2022. Age-specific patterns of DBLa var diversity can explain why residents of high malaria transmission areas remain susceptible to Plasmodium falciparum blood stage infection throughout life. Int J Parasitol 20:721–731.
  
  Strona G, Nappo D, Boccacci F, Fagorini S, San-Miguel-Ayanz J. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nat Commun 5. doi:10.1038/ncomms5114
  
  Tessema SK, Hathaway NJ, Teyssier NB, Murphy M, Chen A, Aydemir O, Duarte EM, Simone W, Colborn J, Saute F, Crawford E, Aide P, Bailey JA, Greenhouse B. 2020. Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. Journal of Infectious Diseases 225:1227–1237.
  
  Tonkin-Hill G, Ruybal-Pesántez S, Tiedje KE, Rougeron V, Duffy MF, Zakeri S, Pumpaibool T, Harnyuganakorn P, Branch OH, Ruiz-Mesía L, Rask TS, Prugnolle F, Papenfuss AT, Chan Y, Day KP. 2021. Evolutionary analyses of the major variant surface antigen-encoding genes reveal population structure of Plasmodium falciparum within and between continents. PLoS Genet 7:e1009269. doi:10.1371/journal.pgen.1009269
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2023.05.18.23290210v3
www.biorxiv.org www.biorxiv.org

PUFA stabilizes a conductive state of the selectivity filter in IKs channels

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  This study makes an interesting finding: a polyunsaturated fatty acid, Lin-Glycine, increases the conductance of KCNQ1/KCNE1 channels by stabilizing a state of the selectivity filter that allows K+ conduction. The stabilization of a conducting state appears well supported by single-channel analysis, though some method details are missing. The linkage to PUFA action through the selectivity filter is supported by the disruption of PUFA effects by mutation of residues which change conformation in two KCNQ1 structures from the literature. Claims about differences in Lin-Glycine binding to these two structural conformations seem to lack clear support, thus the claim seems speculative that PUFAs increase Gmax by binding to a crevice in the pore domain. A potentially definitive functional experiment is conducted by single-channel recordings with selectivity filter domain mutation Y315F which ablates the Lin-Glycine effect on Gmax. However, this appears to be an n=1 experiment. Overall, the major claim of the abstract is supported: "... that the selectivity filter in KCNQ1 is normally unstable ... and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state." However, the claim in the abstract that selectivity filter instability "explains the low open probability" seems too general.
  
  We thank the reviewer for the comments, and we would like to address the main concern regarding the single channels. We now state the number of experiments used for the single-channel analysis. We agree that the claim in the abstract seems too general, and we have now made it more specific to our findings.
  
  Reviewer #2 (Public Review):
  
  Golluscio et al. address one of the mechanisms of IKs (KCNQ1/KCNE1) channel upregulation by polyunsaturated fatty acids (PUFA). PUFA is known to upregulate KCNQ1 and KCNQ1/KCNE1 channels by two mechanisms: one shifts the voltage dependence to the negative direction, and the other increases the maximum conductance (Gmax). While the first mechanism is known to affect the voltage sensor equilibrium by charge effect, the second mechanism is less known. By applying the single-channel recordings and mutagenesis on the putative binding sites (most of them related to the selectivity filter), they concluded that the selectivity filter is stabilized to a conductive state by PUFA binding.
  
  Strengths:
  
  They mainly used single-channel recordings and directly assessed the behavior of the selectivity filter. The method is straightforward and convincing enough to support their claims.
  
  Weaknesses:
  
  The structural model they used is the KCNQ1 channel without KCNE1 because KCNQ1/KCNE1 channel complex is not available yet. As the binding site of PUFAs might overlap with KCNE1, it is not very clear how PUFA binds to the KCNQ1 channel in the presence of KCNE1.
  
  Using other previous PUFA-related KCNQ1 mutants will strengthen their conclusions. For example, the Gmax of the K326E mutant is reduced by PUFA binding. Examining whether K326E shows reduced numbers of non-empty sweeps in the single-channel recordings will be a good addition.
  
  We thank the reviewer for the public review. We would like to address the main weak points raised in the comments. As a structure of KCNQ1/KCNE1 in complex is not available yet, we used KCNQ1 alone. We believe that the PUFA and KCNE1 binding sites will not overlap, as we previously presented data in agreement with the idea that KCNE1 rotates the VSD relative to the PD (Wu et al., 2021). This would leave enough space for both PUFA and KCNE1, so that PUFA can bind to the crevice (K326 and D301) without competing with KCNE1. We appreciate the suggestion of adding single-channel recordings of the K326E mutant, and we agree it would make a valuable addition to strengthen our conclusions. However, single-channel recordings for KCNQ1 are very challenging and time-consuming to obtain, so we would like to keep this in consideration for future studies.
  
  Reviewer #3 (Public Review):
  
  This manuscript reveals an important mechanism of KCNQ1/IKs channel gating such that the open state of the pore is unstable and undergoes intermittent closed and open conformations. PUFA enhances the maximum open probability of IKs by binding to a crevice adjacent to the pore and stabilizing the open conformation. This mechanism is supported by convincing single-channel recordings that show empty and open channel traces and the ratio of such traces is affected by PUFA. In addition, mutations of the pore residues alter PUFA effects, convincingly supporting that PUFA alters the interactions among these pore residues.
  
  Strengths:
  
  The data are of high quality and the description is clear.
  
  Weaknesses:
  
  Some comments about the presentation.
  
  (1) The structural illustrations in this manuscript in general need to be more clarified.
  
  (2) The manuscript heavily relies on the comparison between the S4-down and S4-up structures (Figures 3, 4, and 7) to illustrate the difference between the extracellular side of the pore and to lead to the hypothesis of open-state stability being affected by PUFA. This may mislead the readers to think that the closed conformation of the channel in the up-state is the same as that in the down-state.
  
  We thank the reviewer for the public review, and we would like to address the comments about the presentation. We agree that the structural illustrations need to be more detailed, and we amended our previous illustrations. We have now included a new Figure 3 with a more detailed legend and a new Figure 4 that includes more information, such as the main chain of the whole selectivity filter and surrounding peptide.
  
  We have now added some clarification regarding the structures of KCNQ1 with S4-down and S4-up to clarify that the closed conformation of the channel in the up-state is different from that in the down-state. We also emphasize this difference in the Discussion.
  
  Recommendations for the authors:
  
  Reviewer #1:
  
  (1) Explain more thoroughly how the single-channel recordings were done:
  
  - How was Lin-Glycine applied in these experiments? The patch configuration is unclear. Was Lin-Glycine added to the patch pipette? If not, why is Lin-Glycine expected to reach the proposed binding site in the outer leaflet? Were controls time-matched applications of vehicles with ethanol?
  
  Data were collected using the cell-attached patch configuration to minimize disruption to the patch and avoid rundown problems due to the loss of PIP2. Lin-Glycine was solubilized in DMSO and the desired concentration was added directly to the bath. We had no a priori reason to know whether the PUFA would reach the proposed binding site, but the consistency with which channel activity increased 5-10 minutes after addition to the bath convinced us that it was indeed reaching the binding site. This time frame fits with our prior experience with mefenamic acid effects on single channels (Wang et al., 2020). The mefenamic acid binding site is external to the membrane, so the drug must enter the cell and cross the patch membrane to affect channel activity. In addition, shown below is a previous recording from our lab, where nothing was added to the bath over a 55-minute period while recording consecutive files. This shows the typical behavior of IKs, with activity tending to cluster in a few active sweeps in between many blank sweeps. The behavior in this patch contrasts with that seen in the presence of Lin-Glycine, where the clusters of activity spread over an increasing number of sweeps.
  
  In addition, we have previously shown that 0.1% DMSO (the concentration used in the present study) does not affect the GV of KCNQ1, but there is a non-significant decrease in tail current amplitudes of about 14% (Eldstrom et al., 2021). As such, we do not think that the increase in activity we see with Lin-Glycine can be explained by vehicle effects alone.
  
  Author response image 1.
  
  We added some more details to the Materials and Methods section.
  
  - How well the replicates match the representative data in Figures 1, S1, and 6 is unclear (except for average current and Po in the last second of the traces from Figure 1). Are the results in Fig 6 n=1?
  
  We now show in a data supplement that 3 replicates were used to assess the change in channel activity upon addition of Lin-Glycine.
  
  - Diary plots (as in Werry et al. 2013) and additional descriptions of the timeline of Lin-Glycine application and analyses could add credibility to interpretations.
  
  We added a diary plot of the first latency to open in Supplementary Figure S1.
  
  - Amounts of plasmids and lipofectamine that were used in transfections are missing.
  
  We added the information to the Materials and Methods section as follows:
  
  “Single channel currents were recorded from transiently transfected mouse ltk- fibroblast cells (LM cells) using 1.5 mL Lipofectamine 2000 (Thermo Fisher Scientific). Cells were transfected with 1.5 mg of pcDNA3 containing a linked KCNE1-KCNQ1 construct 20, to ensure fully KCNE1-saturated complexes, in addition to a plasmid containing green fluorescent protein (GFP) to identify transfected cells”
  
  - Inclusion/exclusion criteria for patches analyzed are missing.
  
  We added the information to the Materials and Methods section as follows:
  
  “Only patches that were largely free of endogenous currents and had few channels, such that there were several blank sweeps to average for use for leak subtraction, were analyzed.”
  
  - Whether blinding, randomization, or pre-determined n values were employed is not mentioned.
  
  No blinding, randomization or pre-determined n values were employed.
  
  - Analysis methods are sometimes unclear: How was Po calculated? Representative sweeps appear to have been leak and capacitance subtracted. How was that done?
  
  Po was estimated from the all-points amplitude histogram as follows: $P_o = \sum_i \frac{i \, N_i}{i_{\mathrm{estimate}} \, N_{\mathrm{total}}}$, where $N_i$ is the number of points at a specific current $i$ in the histogram, $i_{\mathrm{estimate}}$ = 0.4 pA, taken from the peak of the histogram, and $N_{\mathrm{total}}$ = 10,000 is the total number of points in the last second of the trace. Po = 0.75 ± 0.12 (n = 8) and Po = 0.87 ± 0.04 (n = 3) for Control and Lin-Glycine, respectively.
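
The Po estimate described above can be sketched in a few lines. This is an illustrative reading of the formula, not the authors' analysis code: the function name `open_probability`, the binning step `bin_width`, and the assumption that the trace is a leak-subtracted list of current samples in pA are all hypothetical.

```python
from collections import Counter


def open_probability(trace_pA, i_estimate=0.4, bin_width=0.01):
    """Estimate Po from an all-points amplitude histogram.

    Each sample is assigned to a histogram bin of width bin_width (pA).
    Po is the sum over bins of (bin current * bin count), divided by
    (i_estimate * total number of points).
    """
    n_total = len(trace_pA)
    if n_total == 0:
        raise ValueError("trace must contain at least one sample")
    # All-points histogram: bin index -> number of samples in that bin.
    histogram = Counter(round(x / bin_width) for x in trace_pA)
    weighted_sum = sum(index * bin_width * count
                       for index, count in histogram.items())
    return weighted_sum / (i_estimate * n_total)


# Hypothetical trace: 10,000 points, open (0.4 pA) for three quarters of them.
example_trace = [0.4] * 7500 + [0.0] * 2500
example_po = open_probability(example_trace)
```

Written this way, the estimate is the mean current in the analysed window divided by the estimated single-channel amplitude, so it is sensitive to the quality of the leak subtraction and to the choice of `i_estimate`.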
  
  Leak and capacitance were subtracted with averaged empty sweeps.
  
  (2) The change of cells used for whole cell vs single channel (oocytes vs mouse ltk- fibroblast cells) could be discussed. These cells likely have different lipids in their membranes. Is there any other evidence that PUFAs have the same effects on KCNE1-KCNQ1 in these cells? Does the V0.5 shift?
  
  A similar effect on Gmax in both oocytes and mouse ltk- fibroblast cells is shown in Figures 1 and 2. In Figure 2, the shift in latency suggests a shift in V0.5, consistent with the binding of PUFA to Site I.
  
  (3) The manuscript associates selectivity filter changes with S4 being up or down. It would help to clarify whether there was a change in [K+] in the two KCNQ1 structures used for modeling, as Mandala and MacKinnon (2023) state: "We note that one interesting difference between the two up structures regards the occupancy of K+ ions in the selectivity filter (SI Appendix, Fig. S5 C and D). In the polarized sample, due to the low extravesicular concentration of K+, density is only visible at the first and third positions in the selectivity filter, while density is present at all four positions in the unpolarized sample. Similar differences were observed in our previous study on Eag (20) and are qualitatively consistent with crystal structures of KcsA solved under symmetrical high and low K+ concentrations (45)."
  
  Our study states that there are some differences between the two structures with S4 in the up-state and S4 in the down-state, including a reorganization of the pore. As for the change in K+ occupancy in the two structures, we are not sure, as our knowledge comes only from what is stated in Mandala and MacKinnon (2023). Mandala and MacKinnon did not discuss the selectivity filter in the down-state structure in their paper, and there are no K+ ions in any of their pdb files. So, we don’t know how many K+ ions there are in the down state.
  
  (4) The manuscript states " PUFAs increase Gmax by binding to a crevice in the pore domain" and "we elucidated that Lin-Glycine binds to a crevice between K326 and D301", this seems speculative without any actual binding studies or concrete structural evidence. A quantitative structural modeling analysis of whether changes in the crevice change the theoretical binding of Lin-Glycine might provide a stronger basis for speculation.
  
  We toned down these statements in Results and Discussion to:
  
  “Crevice residues affect PUFA ability to increase Gmax"
  
  And
  
  Discussion: “We tested the hypothesis that the effect of Lin-Glycine involved conformational changes in the selectivity filter following PUFA binding to two residues K326 and D301 at the pore domain. Those residues delimit a small crevice that seems to change in size in different structures with S4 up or S4 down (Figure 3, D-F).”
  
  (5) The several figures detailing differences in selectivity filter conformation in the KCNQ1 structures are interesting and relevant in that they identify the movement of residues such as Y315 that, when mutated, ablate Lin-Glycine effect on Gmax. It would help to clarify whether T312 and I313 also move between the two selectivity filter conformations.
  
  From the morph of the selectivity filter between the two conformations, it is noticeable that the changes and residue movements involve only residues in the upper part of the selectivity filter (including Y315 and D317). T312 and I313 are in the lower part of the selectivity filter and do not seem to move or rotate from their position between the two conformations of the selectivity filter.
  
  We now include Supplementary Figures S3 and S4, which show the extent of movement of each residue in the pore region, and a short description of this in the Results section.
  
  (6) The claim in the abstract that selectivity filter instability "explains the low open probability" seems too general. Lin-Glycine seems to increase the likelihood of conduction by 2.5-fold, but it was not clear whether open probability ceases to be low or whether other mechanisms also keep Po low.
  
  We reworded this sentence to “Our results suggest that the selectivity filter in KCNQ1 is normally unstable, contributing to the low open probability, and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state.”
  
  Reviewer #2:
  
  (1) While all the electrophysiological recordings used KCNQ1/KCNE1 channels, all the structural models they used are KCNQ1 channels (without KCNE1). I know it is because the KCNQ1/KCNE1 complex structure is unavailable. However, according to their previous results, KCNQ1 alone is also upregulated by PUFAs. I am curious about what the single-channel recordings of KCNQ1 alone look like in the presence and absence of PUFAs.
  
  We would love to include single-channel recordings of KCNQ1, but they are extremely hard to measure due to the small size and flickering nature of the channel.
  
  (2) As mentioned above, we do not have the KCNQ1/KCNE1 structure yet have the KCNQ1/KCNE3 structures (Sun and MacKinnon, Cell, 2020). According to the PDBs (6V00 or 6V01), the clevis (K326 and D301) looks covered by KCNE3. Is it true that PUFAs do not upregulate KCNQ1/KCNE3? If true, KCNE1 may not cover the clevis, so the binding mode should differ from the KCNQ1/KCNE3 structures. Please discuss the possible blocking of the clevis by KCNE proteins.
  
  We previously presented data consistent with the idea that KCNE1 rotates the VSD towards the PD (Wu et al., 2021). This mechanism would leave room for both PUFA and KCNE1, so that PUFA can bind to the crevice (K326 and D301). We therefore think that this rotation will prevent PUFA and KCNE1 from competing for the same space. As for KCNQ1/KCNE3, we currently do not have any evidence about a possible upregulation by PUFA.
  
  (3) In the cryoEM structure with S4 resting (Figure 3F), the clevis looks too narrow for PUFA to bind. Is there any (either previous or current) evidence supporting that PUFA binding is state-dependent?
  
  Because PUFAs integrate first into the bilayer and then diffuse towards their binding site on the channel, it would be hard to test a state-dependence of the binding. In addition, once PUFAs are in the bilayer, the rate of binding/unbinding is quite fast (within the ns range according to our previous MD simulations), whereas the opening/closing rate is very slow (100 ms-s). So, the combination of slow wash-in/washout, fast binding/unbinding, and slow opening/closing would make it very difficult to test the state-dependence of the binding by using fast perfusion or different voltage protocols.
  
  (4) In the previous report (Liin et al. Cell Reports, 2018), K326 is the most critical site for PUFA binding. Why the K326 mutants are not included in the current study? I also would like to see the single-channel recordings of the K326E mutant, which showed a smaller Gmax. Does the PUFA application reduce the probability of non-empty traces in this mutant?
  
  As Liin et al. reported, mutations of K326 reduce the ability of PUFA to increase the Gmax. In this work, we wanted to gain further biophysical information on the mechanism that leads to an increase in Gmax, building on knowledge from previous work conducted in our lab. We therefore focused here on residues downstream of K326 that we think are important for inducing the conformational changes at the selectivity filter. We agree that single-channel experiments on K326E would be very interesting, but that will have to be for a future study.
  
  Minor points
  
  (1) Liin et al. used S209F (Po of 0.4) and I204F (Po of 0.04) mutants. Their single-channel recordings would be a good addition.
  
  We thank the reviewer for the suggestion. However, single-channel analyses of S209F and I204F were shown previously (Eldstrom et al., 2010).
  
  (2) I would like to see how the Site I mutations (R2Q/Q3R) affect (or do not affect) the single-channel recordings (open probability and latency).
  
  Thank you for the excellent suggestion. It would be interesting to assess the behavior of the channel when mutations occur at Site I. However, we think this information will not add any more detail to this study, as we focus here on the mechanism for the Gmax increase. Single-channel recordings are extremely hard to obtain, so we chose to include only mutations at Site II in this study.
  
  (3) I would like the G-V curves for all the mutations at 0 and 20 uM of Lin-Glycine (Figure 3C and Figures 5A and B).
  
  We have now added the G-V curves in Supplementary Figure S7.
  
  (4) I assume all the PUFAs have a similar effect on the selectivity filter, but a few other examples of PUFAs would be nice to see.
  
  We anticipate that PUFAs and analogues with similar properties to Lin-Glycine would increase the Gmax by a similar mechanism, because other PUFAs have previously been shown to increase the Gmax (Bohannon et al., 2020).
  
  (5) Although the probabilities of non-empty sweeps are written in the manuscript, bar graph presentations would be a nice addition to Figures 2 and 6.
  
  We have added bar graphs of non-empty sweeps to Figures 2 and 6.
  
  (6) Is there no statistical significance for D317E and T309S in Figure 5A?
  
  There is no statistical significance for D317E and T309S.
  
  (7) There is no reference to Figure 7 in the manuscript.
  
  A reference to Figure 7 has been added to the manuscript in the following paragraph.
  
  “Taken together, our results suggest that the binding of PUFA to Site II increases Gmax by promoting a series of interactions that stabilize the channel pore in the conductive state. For instance, we speculate that in the conductive state, hydrogen bonds between W304-D317 and W305-Y315, which are likely absent in the non-conductive conformation of KCNQ1, are created and that PUFA binding to Site II favors the transition towards the conductive state of the channel (Figure 7)”
  
  Reviewer #3:
  
  (1) Clarify the structural figures. Figures 3 D, E, and F - explain what the colors indicate.
  
  A more detailed description of Figure 3 has been added to the legend.
  
  “D, E and F) Structure of crevice between S5 and S6 in KCNQ1 with S4 up (D and E) and S4 down (F). Residues that surround the crevice from S6 shown in blue (K326, T327, S330, V334) and from S5 in red (D301, A300, L303, F270). Remaining KCNQ1 residues shown in purple…, linoleic acid (LIN: gold color)”
  
  Fig 4. Only side chains of the residues are shown, making it hard to relate the figure to the familiar K channel selectivity filter. The main chain of the entire selectivity filter should be shown to orient readers to the familiar view of the K channel selectivity filter. In addition, the structures shown are only part of the selectivity filter; it should be specified which part is shown. These changes will also help the discussion at the bottom of page 10 and subsequent text.
  
  We now provide a new Figure 4 with more details such as the main chain of the whole selectivity filter and surrounding peptide.
  
  (2) Cautions should be stated clearly when the structural comparison between the S4-up and S4-down is made that the structure of the pore when it is closed with S4-up may differ from the structure of the pore with S4-down.
  
  We now state in addition “Clearly, there will be other differences in the pore domain between structures with activated and resting VSDs, for example the state of the activation gate.”
  
URL

biorxiv.org/content/10.1101/2024.01.11.575247v2
www.biorxiv.org www.biorxiv.org

Distractor effects in decision making are related to the individual’s style of integrating choice attributes

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  Reviewer #1 (Public Review):
  
  The authors did a great job addressing the weaknesses I raised in the previous round of review, except on the generalizability of the current result in the larger context of multi-attribute decision-making. It is not really a weakness of the manuscript but more of a limitation of the studied topic, so I want to keep this comment for public readers.
  
  The reward magnitude and probability information are displayed using rectangular bars of different colors and orientations. Would that bias subjects to choose an additive rule instead of the multiplicative rule? Also, could the conclusion be extended to other decision contexts such as quality and price, where a multiplicative rule is hard to formulate?
  
  We thank the reviewer for the comment. With regard to whether the current type of stimuli may have biased participants to use an additive rule rather than a multiplicative one, we believe many other forms of stimuli for representing choice attributes would be equally likely to cause a similar bias. This is because the additive strategy is an inherently simple and natural way to integrate different pieces of non-interacting information. More importantly, even though it is easy to employ an additive strategy, most participants still used the multiplicative rule to some degree. However, it would indeed be interesting for future studies to explore whether the current composite model remains dominant in situations where the optimal solutions require an additive or subtractive rule, such as those concerning quality and price.
  
  “The same would apply even with a different choice of cues as long as the information is conveyed by two independent visual features.”
  
  “While the additive strategy is a natural and simple approach for integrating non-interacting pieces of information, to some extent, participants also used the multiplicative strategy that was optimal in the current experiment. A general question for such composite models is whether people mix two strategies in a consistent manner on every trial or whether there is some form of probabilistic selection occurring between the two strategies on each trial such that only one strategy is used on any given trial while, on average, one strategy is more probable than the other. It would also be interesting to examine whether a composite model is appropriate in contexts where the optimal solution is additive or subtractive, such as those concerning quality and price.”
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The current study provided a follow-up analysis using published datasets focused on the individual variability of both the distraction effect (size and direction) and the attribute integration style, as well as the association between the two. The authors tried to answer the question of whether the multiplicative attribute integration style concurs with a more pronounced and positively oriented distraction effect.
  
  Strengths:
  
  The analysis extensively examined the impacts of various factors on decision accuracy, with a particular focus on using two-option trials as control trials, following the approach established by Cao & Tsetsos (2022). The statistical significance results were clearly reported.
  
  The authors meticulously conducted supplementary examinations, incorporating the additional term HV+LV into GLM3. Furthermore, they replaced the utility function from the expected value model with values from the composite model.
  
  We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.
  
  Reviewer #1 Comment 1
  
  Weaknesses:
  
  There are several weaknesses in terms of theoretical arguments and statistical analyses.
  
  First, the manuscript suggests in the abstract and at the beginning of the introduction that the study reconciled the "different claims" about "whether distraction effect operates at the level of options' component attributes rather than at the level of their overall value" (see line 13-14), but the analysis conducted was not for that purpose. Integrating choice attributes in either an additive or multiplicative way only reflects individual differences in combining attributes into the overall value. The authors seemed to assume that the multiplicative way generated the overall value ("Individuals who tended to use a multiplicative approach, and hence focused on overall value", line 20-21), but such implicit assumption is at odds with the statement in line 77-79 that people may use a simpler additive rule to combine attributes, which means overall value can come from the additive rule.
  
  We thank the reviewer for the comment. We have made adjustments to the manuscript to ensure that the message delivered within this manuscript is consistent. Within this manuscript, our primary focus is on the different methods of value integration in which the overall value is computed (i.e., additive, multiplicative, or both), rather than the interaction at the individual level of attributes. However, we do not exclude the possibility that the distractor effect may occur at multiple levels. Nevertheless, in light of the reviewer’s comment, we agree that we should focus the argument on whether distractors facilitate or impair decision making and downplay the separate argument about the level at which distractor effects operate. We have now revised the abstract:
  
  “It is widely agreed that people make irrational decisions in the presence of irrelevant distractor options. However, there is little consensus on whether decision making is facilitated or impaired by the presence of a highly rewarding distractor or whether the distraction effect operates at the level of options’ component attributes rather than at the level of their overall value. To reconcile different claims, we argue that it is important to incorporate consideration of the diversity of people’s ways of decision making. We focus on a recent debate over whether people combine choice attributes in an additive or multiplicative way. Employing a multi-laboratory dataset investigating the same decision making paradigm, we demonstrated that people used a mix of both approaches and the extent to which approach was used varied across individuals. Critically, we identified that this variability was correlated with the effect of the distractor on decision making. Individuals who tended to use a multiplicative approach to compute value, showed a positive distractor effect. In contrast, in individuals who tended to use an additive approach, a negative distractor effect (divisive normalisation) was prominent. These findings suggest that the distractor effect is related to how value is constructed, which in turn may be influenced by task and subject specificities. Our work concurs with recent behavioural and neuroscience findings that multiple distractor effects co-exist.” (Lines 12-26)
  
  Furthermore, we acknowledge that the current description of the additive rule could be interpreted in several ways. The current additive utility model is described as:
  
  $$U_i = \eta X_i + (1 - \eta) P_i$$
  
  where $U_i$ is the option's utility, $X_i$ is the reward magnitude, $P_i$ is the probability, and $\eta$ is the magnitude/probability weighing ratio. If we compare values according to this model (i.e., HV against LV), we arrive at the following comparison:
  
  $$\eta X_{HV} + (1 - \eta) P_{HV} > \eta X_{LV} + (1 - \eta) P_{LV} \quad (1)$$
  
  If we rearrange (1), we arrive at:
  
  $$\eta (X_{HV} - X_{LV}) + (1 - \eta)(P_{HV} - P_{LV}) > 0 \quad (2)$$
  
  While equations (1) and (2) are mathematically equivalent, equation (1) illustrates the interpretation in which the utilities are compared after value integration has formed an overall value for each option. On the other hand, equation (2) can be broadly interpreted as the comparison of individual attributes in the absence of an overall value estimate for each option. Nonetheless, while we do not exclude the possibility that the distractor effect may occur at multiple levels, we have modified the main manuscript to employ more consistently a terminology referring to different methods of value estimation, while recognizing that our empirical results are compatible with both interpretations.
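The equivalence of the two readings can be checked numerically. The sketch below is illustrative only: it assumes the additive utility is a weighted sum of magnitude and probability with a single weighting ratio (called `eta` here), and the function names and example values are hypothetical rather than taken from the manuscript.

```python
def additive_utility(x, p, eta):
    """Weighted sum of magnitude x and probability p (assumed form)."""
    return eta * x + (1.0 - eta) * p


def prefers_hv_integrated(hv, lv, eta):
    """Reading (1): form an overall value per option, then compare values."""
    return additive_utility(hv[0], hv[1], eta) > additive_utility(lv[0], lv[1], eta)


def prefers_hv_attributewise(hv, lv, eta):
    """Reading (2): weight the attribute differences, no per-option value."""
    magnitude_diff = hv[0] - lv[0]
    probability_diff = hv[1] - lv[1]
    return eta * magnitude_diff + (1.0 - eta) * probability_diff > 0.0


# Hypothetical options given as (magnitude, probability), both scaled to [0, 1].
option_a = (0.8, 0.4)
option_b = (0.5, 0.6)
```

Apart from floating-point rounding at exact ties, both functions return the same choice for any `eta`, which is why choice data alone cannot separate the two interpretations unless further mechanisms are added to the attribute-wise comparison.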
  
  Reviewer #1 Comment 2
  
  The second weakness is sort of related but is more about the lack of coherent conceptual understanding of the "additive rule", or "distractor effect operates at the attribute level". In an assertive tone (lines 77-80), the manuscript suggests that a weighted sum integration procedure of implementing an "additive rule" is equal to assuming that people compare pairs of attributes separately, without integration. But they are mechanistically distinct. The additive rule (implemented using the weighted sum rule to combine probability and magnitude within each option and then applying the softmax function) assumes value exists before comparing options. In contrast, if people compare pairs of attributes separately, preference forms based on the within-attribute comparisons. Mathematically these two might be equivalent only if no extra mechanisms (such as inhibition, fluctuating attention, evidence accumulation, etc) are included in the within-attribute comparison process, which is hardly true in the three-option decision.
  
  We thank the reviewer for the comment. As described in our response to Reviewer #1 Comment 1, we acknowledge that there may be multiple possible interpretations of the additive rule. We also agree with the reviewer that additional mechanisms may be involved in three-option or even two-option decisions, but these would require additional studies to tease apart. Another motivation for the approach used here, which does not explicitly model the extra mechanisms the reviewer refers to, was the intention to address and integrate findings from previous studies using the same dataset [i.e. (Cao & Tsetsos, 2022; Chau et al., 2020)]. Lastly, regardless of the mechanistic interpretation, our results show a systematic difference in the process of value estimation. Modifications to the manuscript text have been made consistent with our motivation (please refer to the reply and the textual changes proposed in response to the reviewer’s previous comment: Reviewer #1 Comment 1).
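For readers who want the value-first reading of the additive rule in concrete form, the following is a minimal sketch of a weighted sum followed by a softmax. The parameter names (`eta`, `inv_temp`) and the example options are assumptions made for illustration; the sketch does not include any of the extra mechanisms (inhibition, fluctuating attention, evidence accumulation) mentioned by the reviewer.

```python
import math


def softmax(utilities, inv_temp=1.0):
    """Convert utilities into choice probabilities (numerically stable)."""
    highest = max(utilities)
    exps = [math.exp(inv_temp * (u - highest)) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def additive_choice_probs(options, eta, inv_temp):
    """Integrate each option's attributes first, then compare via softmax.

    options: list of (magnitude, probability) pairs scaled to [0, 1].
    eta: assumed weight on magnitude; (1 - eta) is the weight on probability.
    """
    utilities = [eta * x + (1.0 - eta) * p for x, p in options]
    return softmax(utilities, inv_temp)


probs = additive_choice_probs([(0.8, 0.4), (0.5, 0.6)], eta=0.5, inv_temp=5.0)
```

Here the overall value of each option exists before any comparison takes place, which is the feature that distinguishes this reading from a purely within-attribute comparison.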
  
  Reviewer #1 Comment 3
  
  Could the authors comment on the generalizability of the current result? The reward magnitude and probability information are displayed using rectangular bars of different colors and orientations. Would that bias subjects to choose an additive rule instead of the multiplicative rule? Also, could the conclusion be extended to other decision contexts such as quality and price, where a multiplicative rule is hard to formulate?
  
  We thank the reviewer for the comment. We agree with the observation that the stimulus space, with colour linearly correlated with magnitude and orientation linearly correlated with probability, may bias subjects towards an additive rule. But that is indeed the point: in order to maximise reward, subjects should have focused on the outcome space without being driven by the stimulus space. In practice, people are more or less successful in such an endeavour. Nevertheless, we argue that the specific choice of visual stimuli we used is no more biased towards an additive rule than any other. In fact, as long as two or more pieces of information are provided for each option, as opposed to a single cue whose value was previously learned, there will always be a bias towards an additive heuristic (a linear combination), regardless of whether the cues are shapes, colours, graphs, numbers, or words.
  
  As the reviewer suggested, the dataset analyzed in the current manuscript suggests that the participants were leaning towards the additive rule. Although there was a general tendency to use the additive rule when choosing between the rectangular bars, we can still observe a spread of individuals using either or both of the additive and multiplicative rules, suggesting that there was indeed diversity in participants’ decision making strategies in our data.
  
  In previous studies, it was observed that human and non-human individuals used a mix of multiplicative and additive rules when they were tested on experimental paradigms different from ours (Bongioanni et al., 2021; Farashahi et al., 2019; Scholl et al., 2014). It was also observed that positive and negative distractor effects can both be present in the same data set when human and non-human individuals made decisions about food and social partners (Chang et al., 2019; Louie et al., 2013). It was less clear in the past whether the precise way a distractor affects decision making (i.e., positive/negative distractor effect) is related to the decision strategy used (i.e., multiplicative/additive rules), and this is exactly what we are trying to address in this manuscript. A follow-up study looking at neural data (such as functional magnetic resonance imaging data) could provide a better understanding of the mechanistic nature of the relationship between distractor effects and decision strategy that we identified here.
  
  We agree with the reviewer that a multiplicative strategy may not be applicable to some decision contexts. Here it is important to look at the structure of the optimal solution (the one maximizing value in the long run). Factors modulating value (such as probability and temporal delay) require a non-linear (e.g., multiplicative) solution, while factors of the cost-benefit form (such as effort and price) require a linear solution (e.g., subtraction). In the latter scenario the additive heuristic would coincide with the optimal solution, and the effect addressed in this study may not be revealed. Nonetheless, the present data support the notion of distinct neural mechanisms at least for probabilistic decision-making, and this is likely applicable to decision-making in general.
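A small worked example may help show why the structure of the optimal solution matters for probabilistic choices. The sketch below uses two hypothetical options and an assumed equal weighting (`eta = 0.5`) for the additive heuristic; none of the numbers come from the datasets.

```python
def expected_value(x, p):
    """Multiplicative rule: optimal when probability modulates magnitude."""
    return x * p


def additive_value(x, p, eta=0.5):
    """Additive heuristic: linear combination of the two attributes."""
    return eta * x + (1.0 - eta) * p


def best_option(options, value_fn):
    """Index of the option with the highest value under value_fn."""
    values = [value_fn(x, p) for x, p in options]
    return values.index(max(values))


# Option 0: large but unlikely reward; option 1: moderate on both attributes.
options = [(0.9, 0.2), (0.5, 0.5)]
ev_choice = best_option(options, expected_value)        # EV: 0.18 vs 0.25
additive_choice = best_option(options, additive_value)  # sum: 0.55 vs 0.50
```

In this example the two rules pick different options, so they can be distinguished from choices. For a cost-benefit problem of the form benefit minus cost, a linear rule is itself the optimal solution, and no such dissociation would be expected.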
  
  Our findings, in conjunction with the literature, also suggest that a positive distractor effect could be a general phenomenon in decision mechanisms that involve the medial prefrontal cortex. For example, it has been shown that the positive distractor effect is related to a decision mechanism linked to medial prefrontal cortex [especially the ventromedial prefrontal cortex (Chau et al., 2014; Noonan et al., 2017)]. It is also known that a similar brain region is involved not only when individuals are combining information using a multiplicative strategy (Bongioanni et al., 2021), but also when they are combining information to evaluate new experience or generalize information (Baram et al., 2021; Barron et al., 2013; Park et al., 2021). We have now revised the Discussion to explain this:
  
  “In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 260-274)
  
  Reviewer #1 Comment 4
  
  The authors did careful analyses on quantifying the "distractor effect". While I fully agree that it is important to use the matched two-option trials and examine the interaction terms (DV-HV)T as a control, the interpretation of the results becomes tricky when looking at the effects in each trial type. Figure 2c shows a positive DV-HV effect in two-option trials whereas the DV-HV effect was not significantly stronger in three-option trials. Further in Figure 5b,c, in the Multiplicative group, the effect of DV-HV was absent in the two-option trials and present in the three-option trials. In the Additive group, however, the effect of DV-HV was significantly positive in the two-option trials but was significantly lowered in the three-option trials. Hence, it seems the different distractor effects were driven by the different effects of DV-HV in the two-option trials, rather than the three-option trials?
  
  We thank the reviewer for the comment. While it may be a bit more difficult to interpret, the current method of examining the (DV−HV)T term rather than (DV−HV) term was used because it was the approach used in a previous study (Cao & Tsetsos, 2022).
  
  During the design of the original experiments, trials were generated pseudo-randomly until the DV was sufficiently decorrelated from HV−LV. While this method allows for better group-level examination of behaviour, Cao and Tsetsos were concerned that this approach may have introduced unintended confounding covariations to some trials. In theory, one of the unintended covariations could occur between the DV and specific sets of reward magnitude and probability of the HV and LV. The covariation between parameters can lead to an observable positive distractor effect in the DV−HV as a consequence of the attraction effect or an unintended byproduct of using an additive method of integrating attributes [for further elaboration, please refer to Figure 1 in (Cao & Tsetsos, 2022)]. While it may have some limitations, the approach suggested by Cao and Tsetsos has the advantage of leveraging the DV−HV term to absorb any variance contributed by possible confounding factors such that true distractor effects, if any, can be detected using the (DV−HV)T term.
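The trial-generation procedure described above can be sketched as a simple rejection loop. This is an assumption-laden illustration, not the original experiment code: the sampling distributions, the correlation threshold `max_abs_r`, and all function names are hypothetical.

```python
import random


def pearson_r(a, b):
    """Pearson correlation of two equal-length lists (nan if degenerate)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / (var_a * var_b) ** 0.5


def random_value(rng):
    """Value of a hypothetical option: magnitude times probability."""
    return rng.random() * rng.random()


def generate_trials(n_trials, max_abs_r=0.05, seed=0, max_attempts=10000):
    """Resample whole trial sets until DV is decorrelated from HV - LV."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        trials = []
        for _ in range(n_trials):
            a, b, dv = random_value(rng), random_value(rng), random_value(rng)
            trials.append((max(a, b), min(a, b), dv))  # (HV, LV, DV)
        r = pearson_r([t[2] for t in trials], [t[0] - t[1] for t in trials])
        if abs(r) < max_abs_r:  # a nan correlation fails this check
            return trials, r
    raise RuntimeError("no sufficiently decorrelated trial set found")


trials, r = generate_trials(50)
```

Note that the loop only constrains the correlation between overall values. It places no constraint on how the distractor covaries with the individual magnitudes and probabilities of HV and LV, which is the kind of residual covariation Cao and Tsetsos were concerned about.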
  
  Reviewer #1 Comment 5
  
  Note that the pattern described above was different in Supplementary Figure 2, where the effect of DV-HV on the two-option trials was negative for both Multiplicative and Additive groups. I would suggest considering using Supplementary Figure 2 as the main result instead of Figure 5, as it does not rely on multiplicative EV to measure the distraction effect, and it shows the same direction of DV-HV effect on two-option trials, providing a better basis to interpret the (DV-HV)T effect.
  
  We thank the reviewer for the comments and suggestion. However, as mentioned in the response to Reviewer #1 Comment 4, the current method of analysis adopted in the manuscript and the interpretation of only (DV−HV)T is aimed to address the possibility that the (DV−HV) term may be capturing some confounding effects due to covariation. Given that the debate that is addressed specifically concerns the (DV−HV)T term, we elected to display Figure 5 within the main text and keep the results of the regression after replacing the utility function with the composite model as Supplementary Figure 5 (previously labelled as Supplementary Figure 2).
  
  Reviewer #2 (Public Review):
  
  This paper addresses the empirical demonstration of "distractor effects" in multi-attribute decision-making. It continues a debate in the literature on the presence (or not) of these effects, which domains they arise in, and their heterogeneity across subjects. The domain of the study is a particular type of multi-attribute decision-making: choices over risky lotteries. The paper reports a re-analysis of lottery data from multiple experiments run previously by the authors and other laboratories involved in the debate.
  
  Methodologically, the analysis assumes a number of simple forms for how attributes are aggregated (additively, multiplicatively, or both) and then applies a "reduced form" logistic regression to the choices with a number of interaction terms intended to control for various features of the choice set. One of these interactions, modulated by ternary/binary treatment, is interpreted as a "distractor effect."
  
  The claimed contribution of the re-analysis is to demonstrate a correlation in the strength/sign of this treatment effect with another estimated parameter: the relative mixture of additive/multiplicative preferences.
  
  We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.
  
  Reviewer #2 Comment 1
  
  Major Issues
  
  (1) How to Interpret GLM 1 and 2
  
  This paper, and others before it, have used a binary logistic regression with a number of interaction terms to attempt to control for various features of the choice set and how they influence choice. It is important to recognize that this modelling approach is not derived from a theoretical claim about the form of the computational model that guides decision-making in this task, nor an explicit test for a distractor effect. This can be seen most clearly in the equations after line 321 and its corresponding log-likelihood after 354, which contain no parameter or test for "distractor effects". Rather the computational model assumes a binary choice probability and then shoehorns the test for distractor effects via a binary/ternary treatment interaction in a separate regression (GLM 1 and 2). This approach has already led to multiple misinterpretations in the literature (see Cao & Tsetsos, 2022; Webb et al., 2020). One of these misinterpretations occurred in the datasets the authors studied, in which the lottery stimuli contained a confound with the interaction that Chau et al., (2014) were interpreting as a distractor effect (GLM 1). Cao & Tsetsos (2022) demonstrated that the interaction was significant in binary choice data from the study, therefore it can not be caused by a third alternative. This paper attempts to address this issue with a further interaction with the binary/ternary treatment (GLM 2). Therefore the difference in the interaction across the two conditions is claimed to now be the distractor effect. The validity of this claim brings us to what exactly is meant by a "distractor effect."
  
  The paper begins by noting that "Rationally, choices ought to be unaffected by distractors" (line 33). This is not true. There are many normative models that allow for the value of alternatives (even low-valued "distractors") to influence choices, including a simple random utility model. Since Luce (1959), it has been known that the axiom of "Independence of Irrelevant Alternatives" (that the probability ratio between any two alternatives does not depend on a third) is an extremely strong axiom, and only a sufficiency axiom for a random utility representation (Block and Marschak, 1959). It is not a necessary condition of a utility representation, and if this is our definition of rational (which is highly debatable), not necessary for it either. Countless empirical studies have demonstrated that IIA is falsified, and a large number of models can address it, including a simple random utility model with independent normal errors (i.e. a multivariate Probit model). In fact, it is only the multinomial Logit model that imposes IIA. It is also why so much attention is paid to the asymmetric dominance effect, which is a violation of a necessary condition for random utility (the Regularity axiom).
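The IIA property of the multinomial logit can be illustrated with a few lines of code. The sketch below compares a plain logit with a logit applied to divisively normalised values; the normalisation form (each value divided by `sigma` plus the summed values) is a simplified assumption chosen for illustration, as are the parameter values.

```python
import math


def logit_probs(values, inv_temp=10.0):
    """Multinomial logit (softmax) choice probabilities."""
    highest = max(values)
    exps = [math.exp(inv_temp * (v - highest)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def normalised_logit_probs(values, inv_temp=10.0, sigma=1.0):
    """Logit on divisively normalised values (assumed, simplified form)."""
    denom = sigma + sum(values)
    return logit_probs([v / denom for v in values], inv_temp)


def odds_a_over_b(probs):
    """Ratio of choice probabilities for the first two options."""
    return probs[0] / probs[1]


low_distractor = [0.6, 0.4, 0.1]   # values of A, B, and a weak distractor
high_distractor = [0.6, 0.4, 0.5]  # same A and B, stronger distractor

logit_odds = (odds_a_over_b(logit_probs(low_distractor)),
              odds_a_over_b(logit_probs(high_distractor)))
norm_odds = (odds_a_over_b(normalised_logit_probs(low_distractor)),
             odds_a_over_b(normalised_logit_probs(high_distractor)))
```

Under the plain logit the odds of A over B are identical for both distractor values, which is IIA. Under the normalised variant the odds shrink as the distractor value grows, so IIA is violated even though the choice rule itself is still a logit.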
  
  So what do the authors even mean by a "distractor effect." It is true that the form of IIA violations (i.e. their path through the probability simplex as the low-option varies) tells us something about the computational model underlying choice (after all, different models will predict different patterns). However we do not know how the interaction terms in the binary logit regression relate to the pattern of the violations because there is no formal theory that relates them. Any test for relative value coding is a joint test of the computational model and the form of the stochastic component (Webb et al, 2020). These interaction terms may simply be picking up substitution patterns that can be easily reconciled with some form of random utility. While we can not check all forms of random utility in these datasets (because the class of such models is large), this paper doesn't even rule any of these models out.
  
  We thank the reviewer for the comment. In this study, one objective is to address an issue raised by Cao and Tsetsos (2022), suggesting that the distractor effect claimed in the Chau et al. (2014) study was potentially confounded by unintended correlation introduced between the distractor and the chooseable options. They suggested that this could be tested by analyzing the control binary trials and the experimental ternary trials in a single model (i.e., GLM2) and introducing an interaction term (DV−HV)T. The interaction term can partial out any unintended confound and test the distractor effect that was present specifically in the experimental ternary trials. We adopted these procedures in our current studies and employed the interaction term to test the distractor effects. The results showed that overall there was no significant distractor effect in the group. We agree with the reviewer’s comment that if we were only analysing the ternary trials, a multinomial probit model would be suitable because it allows noise correlation between the choices. Alternatively, had a multinomial logistic model been applied, a Hausman-McFadden Test could be run to test whether the data violates the assumption of independence of irrelevant alternatives (IIA). However, in our case, a binomial model is preferred over a multinomial model because of: (1) the inclusion of the binary trials, and (2) the small number of trials in which the distractor was chosen (the median was 4% of all ternary trials).
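As a rough illustration of how the interaction term isolates effects specific to ternary trials, the sketch below builds one design-matrix row per trial. It includes only the regressors named in this response; the full GLM2 contains further terms, and the function name and dictionary keys are hypothetical. For binary trials, `dv` is assumed to be the value of the distractor from the matched ternary trial.

```python
def glm2_row(hv, lv, dv, is_ternary):
    """Regressors for one trial in a combined binary/ternary logistic model.

    T is 1 for ternary (distractor present) and 0 for matched binary trials,
    so the (DV-HV)T column is zero whenever no distractor was shown.
    """
    t = 1.0 if is_ternary else 0.0
    return {
        "intercept": 1.0,
        "HV-LV": hv - lv,
        "DV-HV": dv - hv,            # absorbs covariation present in both trial types
        "T": t,
        "(DV-HV)T": (dv - hv) * t,   # distractor effect specific to ternary trials
    }


binary_row = glm2_row(hv=0.6, lv=0.4, dv=0.3, is_ternary=False)
ternary_row = glm2_row(hv=0.6, lv=0.4, dv=0.3, is_ternary=True)
```

Because the `DV-HV` column is shared by both trial types, any effect that appears in binary trials as well cannot be attributed to the distractor, and only the coefficient on `(DV-HV)T` is interpreted as a distractor effect.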
  
  However, another main objective of this study is to consider the possibility that the precise distractor effect may vary across individuals. This is exactly why we employed the composite model to estimate each individual’s decision making strategy and investigated how that varied with the precise way the distractor influenced decision making.
  
  In addition, we think that the reviewer here is raising a profound point and one with which we are in sympathy; it is true that random noise utility models can predict deviations from the IIA axiom. Central to these approaches is the notion that the representations of the values of choice options are noisy. Thus, when the representation is accessed, it might have a certain value on average but this value might vary from occasion to occasion as if each sample were being drawn from a distribution. As a consequence, the value of a distractor that is “drawn” during a decision between two other options may be larger than the distractor’s average value and may even have a value that is larger than the value drawn from the less valuable choice option’s distribution on the current trial. On such a trial it may become especially clear that the better of the two options has a higher value than the alternative choice option. Our understanding is that Webb, Louie and colleagues (Louie et al., 2013; Webb et al., 2020) suggest an explanation approximately along these lines when they reported a negative distractor effect during some decisions, i.e., they follow the predictions of divisive normalization suggesting that decisions become more random as the distractor’s value is greater.
  
  An alternative approach, however, assumes that rather than noise in the representation of the option itself, there is noise in the comparison process when the two options are compared. This is exemplified in many influential decision making models including evidence accumulation models such as drift diffusion models (Shadlen & Shohamy, 2016) and recurrent neural network models of decision making (Wang, 2008). It is this latter type of model that we have used in our previous investigations (Chau et al., 2020; Kohl et al., 2023). However, these two approaches are linked both in their theoretical origin and in the predictions that they make in many situations (Shadlen & Shohamy, 2016). We therefore clarify that this is the case in the revised manuscript as follows:
  
  “In the current study and in previous work we have used or made reference to models of decision making that assume that a noisy process of choice comparison occurs such as recurrent neural networks and drift diffusion models (Shadlen & Shohamy, 2016; Wang, 2008). Under this approach, positive distractor effects are predicted when the comparison process becomes more accurate because of an impact on the noisy process of choice comparison (Chau et al., 2020; Kohl et al., 2023). However, it is worth noting that another class of models might assume that a choice representation itself is inherently noisy. According to this approach, on any given decision a sample is drawn from a distribution of value estimates in a noisy representation of the option. Thus, when the representation is accessed, it might have a certain value on average but this value might vary from occasion to occasion. As a consequence, the value of a distractor that is “drawn” during decision between two other options may be larger than the distractor’s average value and may even have a value that is larger than the value drawn from the less valuable choice option’s distribution on the current trial. On such a trial it may become especially clear that the better of the two options has a higher value than the alternative choice option. Louie and colleagues (Louie et al., 2013) suggest an explanation approximately along these lines when they reported a positive distractor effect during some decisions. Such different approaches share theoretical origins (Shadlen & Shohamy, 2016) and make related predictions about the impact of distractors on decision making.” (Lines 297-313)
  
  Reviewer #2 Comment 2
  
  (2) How to Interpret the Composite (Mixture) model?
  
  On the other side of the correlation are the results from the mixture model for how decision-makers aggregate attributes. The authors report that most subjects are best represented by a mixture of additive and multiplicative aggregation models. The authors justify this with the proposal that these values are computed in different brain regions and then aggregated (which is reasonable, though raises the question of "where" if not the mPFC). However, an equally reasonable interpretation is that the improved fit of the mixture model simply reflects a misspecification of two extreme aggregation processes (additive and EV), so the log-likelihood is maximized at some point in between them.
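To make the mixture concrete, a composite utility can be sketched as a convex combination of the additive and multiplicative (EV) values. The parametrisation below is an assumption for illustration: the names `omega` and `eta`, and the convention that `omega = 1` means purely additive, are hypothetical and may not match the manuscript's notation.

```python
def composite_utility(x, p, omega, eta):
    """Mixture of additive and multiplicative value (assumed parametrisation).

    x, p: magnitude and probability scaled to [0, 1].
    omega: weight on the additive component (0 = pure EV, 1 = pure additive).
    eta: weight on magnitude within the additive component.
    """
    additive = eta * x + (1.0 - eta) * p
    multiplicative = x * p
    return omega * additive + (1.0 - omega) * multiplicative


pure_ev = composite_utility(0.8, 0.5, omega=0.0, eta=0.5)        # 0.40
pure_additive = composite_utility(0.8, 0.5, omega=1.0, eta=0.5)  # 0.65
mixed = composite_utility(0.8, 0.5, omega=0.5, eta=0.5)
```

Because the mixed value always lies between the two extremes, a fitted mixing weight strictly between 0 and 1 is compatible both with a genuine blend of strategies and with the reviewer's alternative that neither extreme model is correctly specified.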
  
  One possibility is a model with utility curvature. How much of this result is just due to curvature in valuation? There are many reasonable theories for why we should expect curvature in utility for human subjects (for example, limited perception: Robson, 2001; Khaw, Li & Woodford, 2019; Netzer et al., 2022) and of course many empirical demonstrations of risk aversion for small stakes lotteries. The mixture model, on the other hand, has parametric flexibility.
  
  There is also a large literature on testing expected utility jointly with stochastic choice, and the impact of these assumptions on parameter interpretation (Loomes & Sugden, 1998; Apesteguia & Ballester, 2018; Webb, 2019). This relates back to the point above: the mixture may reflect the joint assumption of how choice departs from deterministic EV.
  
  We thank the reviewer for the comment. They are indeed right to mention the vast literature on curvature in subjective valuation; however, it is important to stress that the predictions of the additive model with linear basis functions are quite distinct from the predictions of a multiplicative model with non-linear basis functions. We tested the possibility that participants’ behaviour was better explained by the latter and showed that this was not the case. Specifically, we fitted an additional model with utility curvature based on prospect theory (Kahneman & Tversky, 1979) with the weighted probability function suggested by Prelec (1998):
  
  where and represent the reward magnitude and probability (both rescaled to the interval between 0 and 1), respectively. is the weighted magnitude and is the weighted probability, while and are the corresponding distortion parameters. This prospect theory (PT) model is included along with the four previous models (please refer to Figure 3) in a Bayesian model comparison. Results indicate that the composite model remains the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720). We have now included these results in the main text and Supplementary Figure 2:
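
  For readers who want a concrete picture of this kind of model, the following is a minimal sketch rather than the authors' exact specification. It assumes a power function for the weighted magnitude and the one-parameter Prelec function for the weighted probability; the names `alpha` and `gamma` for the two distortion parameters, and all function names, are hypothetical.

```python
import math


def prelec_weight(p, gamma):
    """One-parameter Prelec weighting: w(p) = exp(-(-ln p) ** gamma)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** gamma))


def power_value(x, alpha):
    """Assumed power-function distortion of a magnitude rescaled to [0, 1]."""
    return x ** alpha


def pt_utility(x, p, alpha, gamma):
    """Prospect-theory style utility: weighted magnitude times weighted probability."""
    return power_value(x, alpha) * prelec_weight(p, gamma)
```

  With `alpha = 1` and `gamma = 1` both distortions vanish and the utility reduces to expected value (magnitude times probability), so the expected value model is nested within this sketch.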
  
  “Supplementary Figure 2 reports an additional Bayesian model comparison performed while including a model with nonlinear utility functions based on Prospect Theory (Kahneman & Tversky, 1979) with the Prelec formula for probability (Prelec, 1998). Consistent with the above finding, the composite model provides the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).” (Lines 193-198)
  
  Reviewer #2 Comment 3
  
  (3) So then how should we interpret the correlation that the authors report?
  
  On one side we have the impact of the binary/ternary treatment which demonstrates some impact of the low value alternative on a binary choice probability. This may reflect some deep flaws in existing theories of choice, or it may simply reflect some departure from purely deterministic expected value maximization that existing theories can address. We have no theory to connect it to, so we cannot tell. On the other side of the correlation, we have a mixture between additive and multiplicative preferences over risk. This result may reflect two distinct neural processes at work, or it may simply reflect a misspecification of the manner in which humans perceive and aggregate attributes of a lottery (or even just the stimuli in this experiment) by these two extreme candidates (additive vs. EV). Again, this would entail some departure from purely deterministic expected value maximization that existing theories can address.
  
  It is entirely possible that the authors are reporting a result that points to the more exciting of these two possibilities. But it is also possible (and perhaps more likely) that the correlation is more mundane. The paper does not guide us to theories that predict such a correlation, nor reject any existing ones. In my opinion, we should be striving for theoretically-driven analyses of datasets, where the interpretation of results is clearer.
  
  We thank the reviewer for their clear comments. Based on our responses to the previous comments, it should be apparent that our results are consistent with several existing theories of choice, so we are not claiming that there are deep flaws in them. Rather, our results reveal distinct (additive and multiplicative) neural processes, and this does not reflect a misspecification in the modelling. We have revised our manuscript in the light of the reviewer’s comments in the hope of clarifying the theoretical background which informed both our data analysis and our data interpretation.
  
  First, we note that there are theoretical reasons to expect a third option might impact on choice valuation. There is a large body of work suggesting that a third option may have an impact on the values of two other options (indeed Reviewer #2 refers to some of this work in their Reviewer #2 Comment 1), but the body of theoretical work originates partly in neuroscience and not just in behavioural economics. In many sensory systems, neural activity changes with the intensity of the stimuli that are sensed. Divisive normalization in sensory systems, however, describes the way in which such neural responses are altered also as a function of other adjacent stimuli (Carandini & Heeger, 2012; Glimcher, 2022; Louie et al., 2011, 2013). The phenomenon has been observed at neural and behavioural levels as a function not just of the physical intensity of the other stimuli but as a function of their associated value (Glimcher, 2014, 2022; Louie et al., 2011, 2015; Noonan et al., 2017; Webb et al., 2020).
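
  The core idea of divisive normalization can be sketched in a few lines. This is one common textbook form, not a model fitted in the study: each option's value is divided by a constant plus the weighted sum of all option values. The parameter names `sigma` (semi-saturation constant) and `omega` (normalization weight) are illustrative assumptions.

```python
def divisive_normalization(values, sigma=1.0, omega=1.0):
    """Normalize each value by sigma plus the weighted sum of all values."""
    total = sum(values)
    return [v / (sigma + omega * total) for v in values]


# Hypothetical option values: HV = 0.8, LV = 0.5, distractor = 0.4.
binary = divisive_normalization([0.8, 0.5])
ternary = divisive_normalization([0.8, 0.5, 0.4])

# The normalized HV - LV difference shrinks when the distractor is present.
gap_binary = binary[0] - binary[1]
gap_ternary = ternary[0] - ternary[1]
```

  Because the distractor enlarges the denominator, the normalized difference between the two choosable options is smaller on the ternary trial, which is the signature of a negative distractor effect.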
  
  Analogously there is an emerging body of work on the combinatorial processes that describe how multiple representational elements are integrated into new representations (Barron et al., 2013; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). These studies have originated in neuroscience, just as was the case with divisive normalization, but they may have implications for understanding behaviour. For example, they might be linked to behavioural observations that the values assigned to bundles of goods are not necessarily the sum of the values of the individual goods (Hsee, 1998; List, 2002). One neuroscience fact that we know about such processes is that, at an anatomical level, they are linked to the medial frontal cortex (Barron et al., 2013; Fellows, 2006; Hunt et al., 2012; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). A second neuroscientific fact that we know about medial frontal cortex is that it is linked to any positive effects that distractors might have on decision making (Chau et al., 2014; Noonan et al., 2017). Therefore, we might make use of these neuroscientific facts and theories to predict a correlation between positive distractor effects and non-additive mechanisms for determining the integrated value of multi-component choices. This is precisely what we did; we predicted the correlation on the basis of this body of work and when we tested to see if it was present, we found that indeed it was. It may be the case that other behavioural economics theories offer little explanation of the associations and correlations that we find. However, we emphasize that this association is predicted by neuroscientific theory and in the revised manuscript we have attempted to clarify this in the Introduction and Discussion sections:
  
  “Given the overlap in neuroanatomical bases underlying the different methods of value estimation and the types of distractor effects, we further explored the relationship. Critically, those who employed a more multiplicative style of integrating choice attributes also showed stronger positive distractor effects, whereas those who employed a more additive style showed negative distractor effects. These findings concur with neural data demonstrating that the medial prefrontal cortex (mPFC) computes the overall values of choices in ways that go beyond simply adding their components together, and is the neural site at which positive distractor effects emerge (Barron et al., 2013; Bongioanni et al., 2021; Chau et al., 2014; Fouragnan et al., 2019; Noonan et al., 2017; Papageorgiou et al., 2017), while divisive normalization was previously identified in the posterior parietal cortex (PPC) (Chau et al., 2014; Louie et al., 2011).” (Lines 109-119)
  
  “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016). The additive heuristics for combining choice attributes are closer to a perceptual evaluation because distances in this subjective value space correspond linearly to differences in physical attributes of the stimuli, whereas normative (multiplicative) value has a non-linear relation with them (cf. Figure 1c). It is well understood that many sensory mechanisms, such as in primates’ visual systems or fruit flies’ olfactory systems, are subject to divisive normalization (Carandini & Heeger, 2012). Hence, the additive heuristics that are more closely based on sensory mechanisms could also be subject to divisive normalization, leading to negative distractor effects in decision making.
  
  In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 250-274)
  
  Reviewer #2 Comment 4
  
  (4) Finally, the results from these experiments might not have external validity for two reasons. First, the normative criterion for multi-attribute decision-making differs depending on whether the attributes are lotteries or not (i.e. multiplicative vs additive). Whether it does so for humans is a matter of debate. Therefore if the result is unique to lotteries, it might not be robust for multi-attribute choice more generally. The paper largely glosses over this difference and mixes literature from both domains. Second, the lottery information was presented visually and there is literature suggesting this form of presentation might differ from numerical attributes. Which is more ecologically valid is also a matter of debate.
  
  We thank the reviewer for the comment. Indeed, they are right that the correlation we find between value estimation style and distractor effects may not be detected in all contexts of human behaviour. What the reviewer suggests goes along the same lines as our response to Reviewer #1 Comment 3: multi-attribute value estimation may have different structures. In some cases, the optimal solution may require a non-linear (e.g., multiplicative) response, as in probabilistic or delayed decisions, but in other cases (e.g., when estimating the value of a snack based on its taste, size, healthiness, and price) a linear integration would suffice. In the latter kind of scenario, both the optimal and the heuristic solutions may be additive and people’s value estimation “style” may not be teased apart. However, if different neural mechanisms associated with different estimation processes are observed in certain scenarios, it suggests that these mechanisms are always present, even in scenarios where they do not alter the predictions. Probabilistic decision-making is also pervasive in many aspects of daily life and not just limited to the case of lotteries.
  
  While behaviour has been found to differ depending on whether lottery information is presented graphically or numerically, there is insufficient evidence to suggest biases towards additive or multiplicative evaluation, or towards positive or negative distractor effects. As such, we may expect that the correlation that we reveal in this paper, grounded in distinct neural mechanisms, would still hold even under different circumstances.
  
  Taking previous literature as examples, similar patterns of behaviour have been observed in humans when making decisions during trinary choice tasks. In studies conducted by Louie and colleagues (Louie et al., 2013; Webb et al., 2020), human participants performed a snack choice task where their behaviour could be modelled by divisive normalization with biphasic response (i.e., both positive and negative distractor effects). While these two studies only use a single numerical value of price for behavioural modelling, these prices should originate from an internal computation of various attributes related to each snack that are not purely related to lotteries. Expanding towards the social domain, studies of trinary decision making have considered face attractiveness and averageness (Furl, 2016), desirability of hiring (Chang et al., 2019), as well as desirability of candidates during voting (Chang et al., 2019). These choices involve considering various attributes unrelated to lotteries or numbers and yet still display a combination of positive distractor and negative distractor (i.e., divisive normalization) effects, as in the current study. In particular, the experiments carried out by Chang and colleagues (Chang et al., 2019) involved decisions in a social context that resemble real-world situations. These findings suggest that both types of distractor effects can co-exist in other value-based decision making tasks (Li et al., 2018; Louie et al., 2013) as well as decision making tasks in social contexts (Chang et al., 2019; Furl, 2016).
  
  Reviewer #2 Comment 5
  
  Minor Issues:
  
  The definition of EV as a normative choice baseline is problematic. The analysis requires that EV is the normative choice model (this is why the HV-LV gap is analyzed and the distractor effect defined in relation to it). But if the binary/ternary interaction effect can be accounted for by curvature of a value function, this should also change the definition of which lottery is HV or LV for that subject!
  
  We thank the reviewer for the comment. While the initial part of the paper discussed results that were defined by the EV model, the results shown in Supplementary Figure 2 were generated by replacing the utility function based on values obtained by using the composite model. Here, we have also redefined HV and LV for each subject according to the updated values generated by the composite model prior to the regression.
  
  References
  
  Apesteguia, J. & Ballester, M. Monotone stochastic choice models: The case of risk and time preferences. Journal of Political Economy (2018).
  
  Block, H. D. & Marschak, J. Random Orderings and Stochastic Theories of Responses. Cowles Foundation Discussion Papers (1959).
  
  Khaw, M. W., Li, Z. & Woodford, M. Cognitive Imprecision and Small-Stakes Risk Aversion. Rev. Econ. Stud. 88, 1979-2013 (2020).
  
  Loomes, G. & Sugden, R. Testing Different Stochastic Specifications of Risky Choice. Economica 65, 581-598 (1998).
  
  Luce, R. D. Individual Choice Behaviour. (John Wiley and Sons, Inc., 1959).
  
  Netzer, N., Robson, A. J., Steiner, J. & Kocourek, P. Endogenous Risk Attitudes. SSRN Electron. J. (2022) doi:10.2139/ssrn.4024773.
  
  Robson, A. J. Why would nature give individuals utility functions? Journal of Political Economy 109, 900-914 (2001).
  
  Webb, R. The (Neural) Dynamics of Stochastic Choice. Manage Sci 65, 230-255 (2019).
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The way an unavailable (distractor) alternative impacts decision quality is of great theoretical importance. Previous work, led by some of the authors of this study, had converged on a nuanced conclusion wherein the distractor can both improve (positive distractor effect) and reduce (negative distractor effect) decision quality, contingent upon the difficulty of the decision problem. In very recent work, Cao and Tsetsos (2022) reanalyzed all relevant previous datasets and showed that once distractor trials are referenced to binary trials (in which the distractor alternative is not shown to participants), distractor effects are absent. Cao and Tsetsos further showed that human participants heavily relied on additive (and not multiplicative) integration of rewards and probabilities.
  
  The present study by Wong et al. puts forward a novel thesis according to which interindividual differences in the way of combining reward attributes underlie the absence of a detectable distractor effect at the group level. They re-analysed data from 144 human participants and classified participants into a "multiplicative integration" group and an "additive integration" group based on a model parameter, the "integration coefficient", that interpolates between the multiplicative utility and the additive utility in a mixture model. They report that participants in the "multiplicative" group show a positive distractor effect while participants in the "additive" group show a negative distractor effect. These findings are extensively discussed in relation to the potential underlying neural mechanisms.
  
  Strengths:
  
  - The study is forward-looking, integrating previous findings well, and offering a novel proposal on how different integration strategies can lead to different choice biases.
  
  - The authors did an excellent job of connecting their thesis with previous neural findings. This is a very encompassing perspective that is likely to motivate new studies towards a better understanding of how humans and other animals integrate information in decisions under risk and uncertainty.
  
  - Although some aspects of the paper are very technical, methodological details are well explained and the paper is very well written.
  
  We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.
  
  Reviewer #3 Comment 1
  
  Weaknesses:
  
  The authors quantify the distractor variable as "DV - HV", i.e., the relative distractor variable. Do the conclusions hold when the distractor is quantified in absolute terms (as "DV", see also Cao & Tsetsos, 2023)? Similarly, the authors show in Suppl. Figure 1 that the inclusion of a HV + LV regressor does not alter their conclusions. However, the (HV + LV)*T regressor was not included in this analysis. Does including this interaction term alter the conclusions considering there is a high correlation between (HV + LV)*T and (DV - HV)*T? More generally, it will be valuable if the authors assess and discuss the robustness of their findings across different ways of quantifying the distractor effect.
  
  We thank the reviewer for the comment. In the original manuscript we had already demonstrated that the distractor effect was related to the integration coefficient using a number of complementary analyses. They include Figure 5 based on GLM2, Supplementary Figure 3 based on GLM3 (i.e., adding the HV+LV term to GLM2), and Supplementary Figure 4 based on GLM2 but applying the utility estimate from the composite model instead of expected value (EV). These three sets of analyses produced comparable results. We elected not to include the (HV+LV)T term in GLM3 (Supplementary Figure 3) because of the collinearity between the regressors in the GLM. If this term is included in GLM3, the variance inflation factor (VIF) would exceed an acceptable level of 4 for some regressors. In particular, the VIF for the (HV+LV) and (HV+LV)T regressors is 5.420, while the VIF for (DV−HV) and (DV−HV)T is 4.723.
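
  To make the collinearity criterion concrete, the sketch below shows one generic way of computing variance inflation factors; it is not the authors' analysis code. It relies on the standard identity that the VIF of regressor j equals the j-th diagonal entry of the inverse of the regressors' correlation matrix, which is the same as 1 / (1 - R²) from regressing that regressor on the others. All function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    k = len(columns)
    return [[sum(a * b for a, b in zip(scaled[i], scaled[j])) for j in range(k)]
            for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per regressor (diagonal of the inverse correlation matrix)."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

  Passing the regressor columns of a design matrix (without the intercept) to `variance_inflation_factors` returns one value per regressor, which can then be compared against a threshold such as the value of 4 mentioned above.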
  
  Here, however, we consider the additional analysis suggested by the reviewer and test whether similar results are obtained. We constructed GLM4 including the (HV+LV)T term but replacing the relative distractor value (DV-HV) with the absolute distractor value (DV) in the main term and its interactions, as follows:
  
  GLM4:
  
  A significant negative (DV)T effect was found for the additive group [t(72)=−2.0253, p=0.0465] while the multiplicative group had a positive trend despite not reaching significance. Between the two groups, the (DV)T term was significantly different [t(142)=2.0434, p=0.0429]. While these findings suggest that the current conclusions could be partially replicated, simply replacing the relative distractor value with the absolute value in the previous analyses resulted in non-significant findings. Taking these results together with the main findings, it is possible to conclude that the positive distractor effect is better captured using the relative DV-HV term rather than the absolute DV term. This would be consistent with the way in which option values are envisaged to interact with one another in the mutual inhibition model (Chau et al., 2014, 2020) that generates the positive distractor effect. The model suggests that evidence is accumulated as the difference between the excitatory input from the option (e.g. the HV option) and the pooled inhibition contributed partly by the distractor. We have now included these results in the manuscript:
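
  As a reading aid, the verbal description of GLM4 can be written out as follows. This is a sketch under assumptions, not the authors' stated specification: it assumes a logistic model of choice accuracy, takes T to denote the binary/ternary trial-type regressor, and uses z(·) for z-scoring. The β indexing and any interaction terms beyond (HV+LV)T and (DV)T, which are the two named in the text, are assumptions and are left elided.

```latex
\begin{aligned}
\operatorname{logit}(\text{accuracy}) ={}& \beta_0 + \beta_1\, z(HV-LV) + \beta_2\, z(HV+LV) + \beta_3\, z(DV) + \beta_4\, z(T) \\
&+ \beta_5\, z(HV-LV)\, z(T) + \beta_6\, z(HV+LV)\, z(T) + \beta_7\, z(DV)\, z(T) + \dots + \varepsilon
\end{aligned}
```

  In this notation, the (DV)T effect discussed above corresponds to the coefficient on the z(DV) z(T) interaction.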
  
  “Finally, we performed three additional analyses that revealed comparable results to those shown in Figure 5. In the first analysis, reported in Supplementary Figure 3, we added an HV+LV term to the GLM, because this term was included in some analyses of a previous study that used the same dataset (Chau et al., 2020). In the second analysis, we added an (HV+LV)T term to the GLM. We noticed that this change led to inflation of the collinearity between the regressors and so we also replaced the (DV−HV) term by the DV term to mitigate the collinearity (Supplementary Figure 4). In the third analysis, reported in Supplementary Figure 5, we replaced the utility terms of GLM2. Since the above analyses involved using HV, LV, and DV values defined by the normative Expected Value model, here, we re-defined the values using the composite model prior to applying GLM2. Overall, in the Multiplicative Group a significant positive distractor effect was found in Supplementary Figures 3 and 4. In the Additive Group a significant negative distractor effect was found in Supplementary Figures 3 and 5. Crucially, all three analyses consistently showed that the distractor effects were significantly different between the Multiplicative Group and the Additive Group.” (Lines 225-237)
  
  Reviewer #3 Comment 2
  
  The central finding of this study is that participants who integrate reward attributes multiplicatively show a positive distractor effect while participants who integrate additively show a negative distractor effect. This is a very interesting and intriguing observation. However, there is no explanation as to why the integration strategy covaries with the direction of the distractor effect. It is unlikely that the mixture model generates any distractor effect as it combines two "context-independent" models (additive utility and expected value) and is fit to the binary-choice trials. The authors can verify this point by quantifying the distractor effect in the mixture model. If that is the case, it will be important to highlight that the composite model is not explanatory; and defer a mechanistic explanation of this covariation pattern to future studies.
  
  We thank the reviewer for the comment. Indeed, the main purpose of applying the mixture model was to identify the way each participant combined attributes and, as the reviewer pointed out, the mixture model per se is context independent. While we acknowledge that the mixture model is not a mechanistic explanation, there is a theoretical basis for the observation that these two factors are linked.
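
  The context independence of the mixture model can be made explicit with a small sketch. The parameterisation here is an assumption for illustration: `eta` plays the role of the integration coefficient that interpolates between multiplicative and additive utility, `w` weights magnitude against probability in the additive part, and a softmax with a hypothetical `temperature` converts the utility difference into a choice probability.

```python
import math


def additive_utility(x, p, w):
    """Weighted sum of magnitude x and probability p."""
    return w * x + (1.0 - w) * p


def multiplicative_utility(x, p):
    """Expected value: magnitude times probability."""
    return x * p


def composite_utility(x, p, eta, w):
    """Interpolate between multiplicative (eta = 1) and additive (eta = 0)."""
    return eta * multiplicative_utility(x, p) + (1.0 - eta) * additive_utility(x, p, w)


def p_choose_hv(hv, lv, eta, w, temperature=0.1, distractor=None):
    """Softmax probability of choosing HV over LV.

    `distractor` is accepted but deliberately unused: each option's utility
    depends only on its own attributes, so the model is context independent.
    """
    u_hv = composite_utility(*hv, eta, w)
    u_lv = composite_utility(*lv, eta, w)
    return 1.0 / (1.0 + math.exp(-(u_hv - u_lv) / temperature))
```

  Because the distractor never enters the utility computation, the predicted accuracy is identical on binary and ternary trials, so any distractor effect in the data has to come from a mechanism outside this model.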
  
  Firstly, studies that have examined the processes involved when humans combine and integrate different elements to form new representations (Barron et al., 2013; Papageorgiou et al., 2017; Schwartenbeck et al., 2023) have implicated the medial frontal cortex as a crucial region (Barron et al., 2013; Fellows, 2006; Hunt et al., 2012; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). Meanwhile, previous studies have also identified that positive distractor effects are linked to the medial frontal cortex (Chau et al., 2014; Noonan et al., 2017). Therefore, the current study utilized these two facts to establish the basis for a correlation between positive distractor effects and non-additive mechanisms for determining the integrated value of multi-component choices. Nevertheless, we agree with the reviewer that it will be an important future direction to look at how the covariation pattern emerges in a computational model. We have revised the manuscript in an attempt to address this issue.
  
  “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016). The additive heuristics for combining choice attributes are closer to a perceptual evaluation because distances in this subjective value space correspond linearly to differences in physical attributes of the stimuli, whereas normative (multiplicative) value has a non-linear relation with them (cf. Figure 1c). It is well understood that many sensory mechanisms, such as in primates’ visual systems or fruit flies’ olfactory systems, are subject to divisive normalization (Carandini & Heeger, 2012). Hence, the additive heuristics that are more closely based on sensory mechanisms could also be subject to divisive normalization, leading to negative distractor effects in decision making.
  
  In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 250-274)
  
  Reviewer #3 Comment 3
  
  - Correction for multiple comparisons (e.g., Bonferroni-Holm) was not applied to the regression results. Is the "negative distractor effect in the Additive Group" (Fig. 5c) still significant after such correction? Although this does not affect the stark difference between the distractor effects in the two groups (Fig. 5a), the classification of the distractor effect in each group is important (i.e., should future modelling work try to capture both a negative and a positive effect in the two integration groups? Or just a null and a positive effect?).
  
  We thank the reviewer for the comment. We have performed Bonferroni-Holm correction and as the reviewer surmised, the negative distractor effect in the additive group becomes non-significant. However, we have to emphasize that our major claim is that there was a covariation between decision strategy (of combining attributes) and distractor effect (as seen in Figure 4). That analysis does not imply multiple comparisons. The analysis in Figure 5 that splits participants into two groups was mainly designed to illustrate the effects for an easier understanding by a more general audience. In many cases, the precise ways in which participants are divided into subgroups can have a major impact on whether each individual group’s effects are significant or not. It may be possible to identify an optimal way of grouping, but we refrained from taking such a trial-and-error approach, especially for the analysis in Figure 5 that simply supplements the point made in Figure 4. The key notion we would like the readers to take away is that there is a spectrum of distractor effects (ranging from negative to positive) that will vary depending on how the choice attributes were integrated.
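
  For reference, the Bonferroni-Holm procedure itself is short enough to sketch. The function below computes Holm-adjusted p-values with the standard step-down rule; the function name and the p-values in the usage note are hypothetical and are not the values from the analyses reported here.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

  For example, `holm_adjust([0.01, 0.04, 0.03])` multiplies the sorted p-values by 3, 2 and 1 and then enforces monotonicity, so a p-value just under 0.05 before correction can exceed 0.05 afterwards.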
  
  Reviewer #1 (Recommendations For The Authors):
  
  Reviewer #1 Recommendations 1
  
  Enhancements are necessary for the quality of the scientific writing. Several sentences have been written in a negligent manner and warrant revision to ensure a higher level of rigor. Moreover, a number of sentences lack appropriate citations, including but not restricted to:
  
  - Line 39-41.
  
  - Line 349-350 (also please clarify what is meant by "parameter estimate is very accurate": correlation?).
  
  We thank the reviewer for the comment. We have made revisions to various parts of the manuscript to address the reviewer’s concerns.
  
  “Intriguingly, most investigations have considered the interaction between distractors and chooseable options either at the level of their overall utility or at the level of their component attributes, but not both (Chau et al., 2014, 2020; Gluth et al., 2018).” (Lines 40-42)
  
  “Additional simulations have shown that the fitted parameters can be recovered with high accuracy (i.e., with a high correlation between generative and recovered parameters).” (Lines 414-416)
  
  Reviewer #1 Recommendations 2
  
  Some other minor suggestions:
  
  - Correlative vs. Causality: the manuscript exhibits a lack of attentiveness in drawing causal conclusions from correlative evidence (manuscript title, Line 91, Line 153-155).
  
  - When displaying effect size on accuracy, there is no need to show the significance of intercept (Figure 2,5, & supplementary figures).
  
  - Adding some figure titles on Figure 2 so it is clear what each panel stands for.
  
  - In Figure 3, the dots falling on zero values are not easily seen. Maybe increasing the dot size a little?
  
  - Line 298: binomial linking function (instead of binomial distribution).
  
  - Line 100: composite, not compositive.
  
  - Line 138-139: please improve the sentence, if it's consistent with previous findings, what's the point of "surprisingly"?
  
  We thank the reviewer for the suggestions. We have made revisions to the title and various parts of the manuscript to address the reviewer’s concerns.
  
  - Correlative vs. Causality: the manuscript exhibits a lack of attentiveness in drawing causal conclusions from correlative evidence (manuscript title, Line 91, Line 153-155).
  
  We have now revised the manuscript:
  
  “Distractor effects in decision making are related to the individual’s style of integrating choice attributes” (title of the manuscript)
  
  “More particularly, we consider whether individual differences in combination styles could be related to different forms of distractor effect.” (Lines 99-100)
  
  “While these results may seem to suggest that a distractor effect was not present at an overall group level, we argue that the precise way in which a distractor affects decision making is related to how individuals integrate the attributes.” (Lines 164-167)
  
  - When displaying effect size on accuracy, there is no need to show the significance of intercept (Figure 2,5, & supplementary figures).
  
  We have also modified all Figures to remove the intercept.
  
  - Adding some figure titles on Figure 2 so it is clear what each panel stands for.
  
  We have added titles accordingly.
  
  - In Figure 3, the dots falling on zero values are not easily seen. Maybe increasing the dot size a little?
  
  In conjunction with addressing Reviewer #3 Recommendation 6, we have adapted the violin plots into histograms for a better representation of the values.
  
  - Line 298: binomial linking function (instead of binomial distribution).
  
  - Line 100: composite, not compositive.
  
  - Line 138-139: please improve the sentence, if it's consistent with previous findings, what's the point of "surprisingly"?
  
  We have made revisions accordingly.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Reviewer #2 Recommendations 1
  
  Line 294. The definition of DV, HV, LV is not sufficient. Presumably, these are the U from the following sections? Or just EV? But this is not explicitly stated, rather they are vaguely referred to as "values." The computational modelling section refers to them as utilities. Are these the same thing?
  
  We thank the reviewer for the suggestion. We have clarified the exact method for calculating each of the values and updated the section accordingly.
  
  “where HV, LV, and DV refer to the values of the chooseable higher value option, chooseable lower value option, and distractor, respectively. Here, values (except those in Supplementary Figure 5) are defined as Expected Value (EV), calculated by multiplying magnitude and probability of reward.” (Lines 348-350)
  
  Reviewer #2 Recommendations 2
  
  The analysis drops trials in which the distractor was chosen. These trials are informative about the presence (or not) of relative valuation or other factors because they make such choices more (or less) likely. Ignoring them is another example of the analysis being misspecified.
  
  We thank the reviewer for the suggestion and this is related to Major Issue 1 raised by the same reviewer. In brief, we adopted the same methods implemented by Cao and Tsetsos (Cao and Tsetsos, 2022) and that constrained us to applying a binomial model. Please refer to our reply to Major Issue 1 for more details.
  
  Reviewer #2 Recommendations 3
  
  Some questions and suggestions on statistics and computational modeling:
  
  Have the authors looked at potential collinearity between the regressors in each of the GLMs?
  
  We thank the reviewer for the comment. For each of the following GLMs, the average variance inflation factor (VIF) has been calculated as follows:
  
  GLM2 using the Expected Value model:
  
  Author response table 1.
  
  GLM2 after replacing the utility function based on the normative Expected Value model with values obtained by using the composite model:
  
  Author response table 2.
  
  GLM3:
  
  Author response table 3.
  
  As the average VIF values indicate, none exceeds 4, suggesting that the estimated coefficients were not inflated by collinearity between the regressors in each of the GLMs.
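
As an illustration of the collinearity check described above, here is a minimal sketch of how variance inflation factors can be computed. It relies on the standard identity that the VIF of each regressor equals the corresponding diagonal element of the inverse of the regressors' correlation matrix. The function names and the example columns are hypothetical, and this is not the authors' code.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length regressor columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each regressor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical regressors: two uncorrelated columns give VIFs of 1.
uncorrelated = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]
print(variance_inflation_factors(uncorrelated))
```

A VIF of 1 means a regressor is uncorrelated with the others; values below the threshold of 4 cited in the response are conventionally read as acceptable.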
  
  Reviewer #2 Recommendations 4
  
  - Correlation results in Figure 4. What is the regression line displayed on this plot? I suspect the regression line came from Pearson's correlation, which would be inconsistent with the Spearman's correlation reported in the text. A reasonable way would be to transform both x and y axes to the ranked data. However, I wonder why it makes sense to use ranked data for testing the correlation in this case. Those are both scalar values. Also, did the authors assess the influence of the zero integration coefficient on the correlation result? Importantly, did the authors redo the correlation plot after defining the utility function by the composite models?
  
  We thank the reviewer for the suggestion. The plotted line in Figure 4 was based on Pearson’s correlation, and we have modified the text to report the Pearson’s correlation result as well.
  
  If we were to exclude the 32 participants with integration coefficients smaller than 1×10⁻⁶ from the analysis, we still observe a significant positive Pearson’s correlation [r(110)=0.202, p=0.0330].
  
  Author response image 1.
  
  Figure 4 after excluding 32 participants with integration coefficients smaller than 1×10⁻⁶.
  
  “As such, we proceeded to explore how the distractor effect (i.e., the effect of (DV−HV)T obtained from GLM2; Figure 2c) was related to the integration coefficient (η) of the optimal model via a Pearson’s correlation (Figure 4). As expected, a significant positive correlation was observed [r(142)=0.282, p=0.000631]. We noticed that there were 32 participants with integration coefficients that were close to zero (below 1×10⁻⁶). The correlation remained significant even after removing these participants [r(110)=0.202, p=0.0330].” (Lines 207-212)
  
  The last question relates to results already included in Supplementary Figure 5, in which the analyses were conducted using the utility function of the composite model. We notice that although there was a difference in integration coefficient between the multiplicative and additive groups, a correlational analysis did not generate significant results [r(142)=0.124, p=0.138]. It is possible that the relationship became less linear after applying the composite model utility function. However, it is noticeable that in a series of complementary analyses (Figure 5: r(142)=0.282, p=0.000631; Supplementary Figure 3: r(142)=0.278, p=0.000746) comparable results were obtained.
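
The reviewer's point about Pearson versus Spearman correlation can be made concrete with a short sketch. Spearman's coefficient is simply Pearson's coefficient computed on ranks, which is why a regression line fitted to raw values corresponds to the Pearson statistic and not the Spearman one. The data below are hypothetical and the function names are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
print(pearson_r(x, y), spearman_rho(x, y))
```

For a monotonic but nonlinear relationship such as this one, the rank-based coefficient is at its maximum while the Pearson coefficient is lower, so the two statistics can disagree on the same data.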
  
  Reviewer #2 Recommendations 5
  
  - From lines 163-165, were the models tested on only the three-option trials or both two- and three-option trials? It is ambiguous from the description here. It might be worth checking the model comparison based on different trial types, and the current model fitting results do not give an absolute sense of the goodness of fit. I would suggest including the correctly predicted trial proportions in each trial type from different models.
  
  We thank the reviewer for the suggestion. We have only modeled the two-option trials, and the key reason is that the two-option trials can arguably provide a better estimate of participants’ style of integrating attributes as they are independent of any distractor effects. This was also the reason why Cao and Tsetsos applied the same approach when they were re-analyzing our data (Cao and Tsetsos, 2022). We have clarified the statement accordingly.
  
  “We fitted these models exclusively to the Two-Option Trial data and not the Distractor Trial data, such that the fitting (especially that of the integration coefficient) was independent of any distractor effects, and tested which model best describes participants’ choice behaviours.” (Lines 175-178)
  
  Reviewer #2 Recommendations 6
  
  - Along with displaying the marginal distributions of each parameter estimate, a correlation plot of these model parameters might be useful, given that some model parameters are multiplied in the value functions.
  
  We thank the reviewer for the suggestion. We have also generated the correlation plot of the model parameters. The Pearson’s correlation between the magnitude/probability weighting and integration coefficient was significant [r(142)=−0.259, p=0.00170]. The Pearson’s correlation between the inverse temperature and integration coefficient was not significant [r(142)=−0.0301, p=0.721]. The Pearson’s correlation between the inverse temperature and magnitude/probability weighting was not significant [r(142)=−0.0715, p=0.394].
  
  “Our finding that the average integration coefficient was 0.325 coincides with previous evidence that people were biased towards using an additive, rather than a multiplicative rule. However, it also shows that, rather than being fully additive (η=0) or multiplicative (η=1), people’s choice behaviour is best described as a mixture of both. Supplementary Figure 1 shows the relationships between all the fitted parameters.” (Lines 189-193)
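
To make the role of the integration coefficient concrete, the sketch below shows one way a composite utility of this kind could be written. It is an assumption, not the authors' exact specification: the utility is taken to be a mixture of a multiplicative term and a weighted additive term, with choices passed through a logistic (two-option softmax) rule. The names `eta`, `weight` and `beta` are hypothetical labels for the integration coefficient, the magnitude/probability weighting and the inverse temperature, and magnitude and probability are assumed to be rescaled to the 0-1 interval.

```python
from math import exp


def composite_utility(magnitude, probability, eta, weight):
    """Assumed composite utility: a mixture of multiplicative and additive value.

    eta = 1 gives a purely multiplicative rule, eta = 0 a purely additive one.
    weight is the share placed on magnitude in the additive term.
    """
    multiplicative = magnitude * probability
    additive = weight * magnitude + (1.0 - weight) * probability
    return eta * multiplicative + (1.0 - eta) * additive


def choice_probability(option_a, option_b, eta, weight, beta):
    """Probability of choosing option_a over option_b under a logistic choice rule.

    Each option is a (magnitude, probability) pair; beta is the inverse temperature.
    """
    utility_a = composite_utility(option_a[0], option_a[1], eta, weight)
    utility_b = composite_utility(option_b[0], option_b[1], eta, weight)
    return 1.0 / (1.0 + exp(-beta * (utility_a - utility_b)))


# Hypothetical options and parameter values, for illustration only.
high_magnitude = (0.9, 0.3)
high_probability = (0.4, 0.8)
print(choice_probability(high_magnitude, high_probability, eta=0.325, weight=0.5, beta=10.0))
```

Under this assumed form, an intermediate coefficient such as the reported average of 0.325 places more weight on the additive term than on the multiplicative one, which matches the authors' description of a bias towards the additive rule.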
  
  Reviewer #2 Recommendations 7
  
  Have the authors tried any functional transformations on amounts or probabilities before applying the weighted sum? The two attributes are on entirely different scales and thus may not be directly summed together.
  
  We thank the reviewer for the comment. Amounts and probabilities were indeed both rescaled to the 0-1 interval before being summed, as explained in the methods (Line XXX). Additionally, we have now added and performed model fitting on an additional model with utility curvature based on prospect theory (Kahneman & Tversky, 1979) and a weighted probability function (Prelec, 1998).
  
  In this model, the reward magnitude and probability (both rescaled to the interval between 0 and 1) are transformed into a weighted magnitude and a weighted probability, respectively, each with its own distortion parameter. This prospect theory (PT) model was included along with the four previous models (please refer to Figure 3) in a Bayesian model comparison. Results indicate that the composite model remains the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).
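
For orientation, the commonly used textbook forms of these two ingredients can be written as follows. This is a sketch under stated assumptions: a power utility function and the one-parameter Prelec weighting function are assumed, and the symbols $X$, $P$, $\alpha$ and $\gamma$ are labels introduced here, not necessarily the authors' notation or exact parameterisation.

```latex
% Assumed standard forms; X and P are magnitude and probability rescaled to [0, 1].
\begin{align}
  v(X) &= X^{\alpha}
    && \text{(power utility; } \alpha \text{ is the magnitude distortion parameter)} \\
  w(P) &= \exp\!\left(-\left(-\ln P\right)^{\gamma}\right)
    && \text{(Prelec weighting; } \gamma \text{ is the probability distortion parameter)} \\
  U &= v(X)\, w(P)
    && \text{(overall utility of an option)}
\end{align}
```

With $\alpha = 1$ and $\gamma = 1$ both transformations reduce to the identity, so $U$ reduces to the expected value $X \cdot P$.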
  
  “Supplementary Figure 2 reports an additional Bayesian model comparison performed while including a model with nonlinear utility functions based on Prospect Theory (Kahneman & Tversky, 1979) with the Prelec formula for probability (Prelec, 1998). Consistent with the above finding, the composite model provides the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).” (Lines 193-198)
  
  Reviewer #3 (Recommendations For The Authors):
  
  Reviewer #3 Recommendations 1
  
  - In the Introduction (around line 48), the authors make the case that distractor effects can co-exist in different parts of the decision space, citing Chau et al. (2020). However, if the distractor effect is calculated relative to the binary baseline this is no longer the case.
  
  - Relating to the above point, it might be useful for the authors to make a distinction between effects being non-monotonic across the decision space (within individuals) and effects varying across individuals due to different strategies adopted. These two scenarios are conceptually distinct.
  
  We thank the reviewer for the comment. Indeed, the ideas that distractor effects may vary across decision space and across different individuals are slightly different concepts. We have now revised the manuscript to clarify this:
  
  “However, as has been argued in other contexts, just because one type of distractor effect is present does not preclude another type from existing (Chau et al., 2020; Kohl et al., 2023). Each type of distractor effect can dominate depending on the dynamics between the distractor and the chooseable options. Moreover, the fact that people have diverse ways of making decisions is often overlooked. Therefore, not only may the type of distractor effect that predominates vary as a function of the relative position of the options in the decision space, but also as a function of each individual’s style of decision making.” (Lines 48-54)
  
  Reviewer #3 Recommendations 2
  
  - The idea of mixture models/strategies has strong backing from other Cognitive Science domains and will appeal to most readers. It would be very valuable if the authors could further discuss the potential level at which their composite model might operate. Are the additive and EV quantities computed and weighted (as per the integration coefficient) within a trial giving rise to a composite decision variable? Or does the integration coefficient reflect a probabilistic (perhaps competitive) selection of one strategy on a given trial? Perhaps extant neural data can shed light on this question.
  
  We thank the reviewer for the comment. The idea is related to whether the observed mixture in integration models derives from value being actually computed in a mixed way within each trial, or each trial involves a probabilistic selection between the additive and multiplicative strategies. We agree that this is an interesting question and to address it would require the use of some independent continuous measures to estimate the subjective values in quantitative terms (instead of using the categorical choice data). This could be done by collecting pupil size data or functional magnetic resonance imaging data, as the reviewer has pointed out. Although the empirical work is beyond the scope of the current behavioural study, it is worth bringing up this point in the Discussion:
  
  “The current finding involves the use of a composite model that arbitrates between the additive and multiplicative strategies. A general question for such composite models is whether people mix two strategies in a consistent manner on every trial or whether there is some form of probabilistic selection occurring between the two strategies on each trial such that only one strategy is used on any given trial while, on average, one strategy is more probable than the other. To test which is the case requires an independent estimation of subjective values in quantitative terms, such as by pupillometry or functional neuroimaging. Further understanding of this problem will also provide important insight into the precise way in which distractor effects operate at the single-trial level.” (Lines 275-282)
  
  Reviewer #3 Recommendations 3
  
  Line 80 "compare pairs of attributes separately, without integration". This additive rule (or the within-attribute comparison) implies integration, it is just not multiplicative integration.
  
  We thank the reviewer for the comment. We have made adjustments to the manuscript to ensure that the message delivered within this manuscript is consistent.
  
  “For clarity, we stress that the same mathematical formula for additive value can be interpreted as meaning that 1) subjects first estimate the value of each option in an additive way (value integration) and then compare the options, or 2) subjects compare the two magnitudes and separately compare the two probabilities without integrating dimensions into overall values. On the other hand, the mathematical formula for multiplicative value is only compatible with the first interpretation. In this paper we focus on attribute combination styles (multiplicative vs additive) and do not make claims on the order of the operations. More particularly, we consider whether individual differences in combination styles could be related to different forms of distractor effect.” (Lines 92-100)
  
  Reviewer #3 Recommendations 4
  
  - Not clear why the header in line 122 is phrased as a question.
  
  We thank the reviewer for the suggestion. We have modified the header to the following:
  
  “The distractor effect was absent on average” (Line 129)
  
  Reviewer #3 Recommendations 5
  
  - The discussion and integration of key neural findings with the current thesis are outstanding. It might help the readers if certain statements such as "the distractor effect is mediated by the PPC" (line 229) were further unpacked.
  
  We thank the reviewer for the suggestion. We have made modifications to the original passage to further elaborate the statement.
  
  “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016).” (Lines 250-253)
  
  Reviewer #3 Recommendations 6
  
  - In Fig. 3c, there seem to be many participants having the integration coefficient close to 0 but the present violin plot doesn't seem to best reflect this highly skewed distribution. A histogram would be perhaps better here.
  
  We thank the reviewer for the suggestion. We have modified the descriptive plots to use histograms instead of violin plots.
  
  “Figures 3c, d and e show the fitted parameters of the composite model: the integration coefficient (η) determining the relative weighting of the additive and multiplicative value; the magnitude/probability weighting ratio; and the inverse temperature. Our finding that the average integration coefficient was 0.325 coincides with previous evidence that people were biased towards using an additive, rather than a multiplicative rule.” (Lines 186-191)
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.20.554013v2
www.biorxiv.org www.biorxiv.org

Salmonella-induced SIRT1 and SIRT3 are crucial for maintaining the metabolic switch in bacteria and host for successful pathogenesis

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The current manuscript by Hajra et al deals with the role of the prominent Sirtuins SIRT1 and -3 during infection of macrophages with Salmonella Typhimurium (ST). Apparently, ST infection induces upregulation of host cell SRTs to aid its own metabolism during the intracellular lifestyle and to help reprogram macrophage polarization. The manuscript has two parts, namely one part that deals with Salmonella infection in cells, where RAW 264.7 murine macrophage-like cells, sharing some features with primary macrophages, were employed. Infected RAW cells displayed a tendency to polarize towards wound-healing M2 and not inflammatory M1 macrophages, which was dependent on SRT. Consequently, the inflammatory response in RAW was more robust in the absence of SRT. Moreover, loss of SRTs leads to impaired bacterial proliferation in these cells, which was attributed to defects in metabolic adaptation of the bacteria in the absence of SRT-activity and to the increased M1 inflammatory response.
  
  Unfortunately, the line of argumentation remains incomplete because corresponding assays in mice showed the opposite result as compared to the experiments using RAW 264.7 cells, i.e., loss of SRTs leads to increased bacterial load in animals (versus impaired proliferation in RAW 264.7 cells). The authors cannot explain this discrepancy.
  
  Strengths:
  
  Extensive analysis of Salmonella infection in RAW macrophage-like cells and mice in the context of SRT1/3 function.
  
  Weaknesses:
  
  Lack of connection between the cell-based and organismic data, which are not supportive of each other.
  
  We are highly grateful for your valuable and insightful comments. Thank you for appreciating the merit of our manuscript. We agree that the findings from the RAW264.7 cell line (Fig. 2A) and primary peritoneal macrophages (ex vivo) (Fig. 2B) oppose those from the in vivo mouse model (Fig. 8). Both RAW264.7 macrophage and peritoneal macrophage infection show attenuated intracellular bacterial proliferation owing to the heightened proinflammatory burst. This is in sharp contrast to our in vivo mouse model of infection, which shows increased organ burden and bacterial dissemination. The higher bacterial load in the organs, including the spleen (Fig. 8B), is attributed to increased pro-inflammatory cytokine burst and ROS production (Fig. 8F-H, Fig. S9) triggering bacterial dissemination. The pro-inflammatory arsenal (IL-6, IL-1β and ROS) that limits bacterial proliferation within the macrophages (F4/80+ macrophages within the spleen or in RAW264.7 macrophages or primary peritoneal macrophages) facilitates bacterial dissemination in blood and to the other organs (Fig. 8I-L, Fig. S3F-G). This is in line with the following previous findings:
  
  Klebsiella pneumoniae infection triggers an inflammatory response via secretion of IL-6 upon HIF-1α activation that induces bacterial dissemination (Holden VI, Breen P, Houle S, Dozois CM, Bachman MA. Klebsiella pneumoniae Siderophores Induce Inflammation, Bacterial Dissemination, and HIF-1α Stabilization during Pneumonia. mBio. 2016 Sep 13;7(5):e01397-16. doi: 10.1128/mBio.01397-16. PMID: 27624128; PMCID: PMC5021805.).
  
  Correlation analysis of immune responses to Salmonella infection revealed that increased innate immune “cassette” opposes the adaptive immune arm leading to increased bacterial load in mice (Hotson AN, Gopinath S, Nicolau M, Khasanova A, Finck R, Monack D, et al. Coordinate actions of innate immune responses oppose those of the adaptive immune system during Salmonella infection of mice. Science signaling. 2016;9(410):ra4).
  
  In our revised manuscript, we have assessed additional splenic populations including CD45+, Ly6C+, and CD11c+ populations. Our results show that the CD45+ splenic population shows increased bacterial loads like that of the total splenic population within the SIRT1/3 inhibited cohorts. However, CD45+ monocytes and the Ly6C-positive splenic population exhibit compromised burden within the SIRT1/3 inhibited cohorts. Moreover, within the CD11c+ population, CD45+ granulocytes or lymphocytes show comparable organ loads to that of the vehicle control or SIRT1 activator-treated mice group (Fig. 8M-S, Fig. S8). Overall, our data suggest heterogeneous bacterial burden in diverse splenic populations.
  
  Reviewer #2 (Public Review):
  
  Dipasree Hajra et al demonstrated that Salmonella was able to modulate the expression of Sirtuins (Sirt1 and Sirt3) and regulate the metabolic switch in both host and Salmonella, promoting its pathogenesis. The authors found Salmonella infection induced high levels of Sirt1 and Sirt3 in macrophages, which were skewed toward the M2 phenotype allowing Salmonella to hyper-proliferate. Mechanistically, Sirt1 and Sirt3 regulated the acetylation of HIF-1alpha and PDHA1, therefore mediating Salmonella-induced host metabolic shift in the infected macrophages. Interestingly, Sirt1 and Sirt3-driven host metabolic switch also had an effect on the metabolic profile of Salmonella. Counterintuitively, inhibition of Sirt1/3 led to increased pathogen burdens in an in vivo mouse model. Overall, this is a well-designed study. There are a few comments below that would further strengthen the current study.
  
  Major comments:
  
  In the in vivo study (lines 436-446) - the authors noticed increased pathogen burden in the EX-527 or the 3TYP-treated mice cohorts but decreased pathogen burden within the F4/80+ macrophage population. What are the other cell types that have increased pathogen burden in splenocytes from EX-527 or the 3TYP treated? Can this be further explored and explained?
  
  While the authors indicated that IL-6 cytokine storm and elevated ROS production could result in bacterial dissemination in vivo, one could also argue that Sirt1/3 inhibitors might have an impact on gut function and/or gut microbiota (PMID: 22115311). Did Sirt1/3 inhibitors also lead to increased pathogen burdens in the gut? If so, the potential effect of these in vivo treatments on gut microbiota/colonization resistance should be discussed.
  
  Minor comment:
  
  Sirt1 has been shown to be degraded during Salmonella infection (PMID: 28192515), which is different from the current study. An explanation should be provided for this.
  
  We thank you for your encouraging and gracious comments. We deeply appreciate your time and efforts in providing constructive feedback for the betterment of our work. As per your suggestions, we have assessed additional splenic populations including CD45+, Ly6C+, and CD11c+ populations apart from F4/80+ macrophage populations. Our analysis suggests that the CD45+ splenic population shows increased bacterial loads similar to the total splenic population within the SIRT1/3 inhibited cohorts. However, CD45+ monocytes and the Ly6C-positive splenic population exhibit compromised burden within the SIRT1/3 inhibited cohorts. Moreover, the CD11c+ population, CD45+ granulocytes or lymphocytes show comparable organ loads to that of the vehicle control or SIRT1 activator-treated mice group (Fig. 8M-S). Overall, our data suggest heterogeneous bacterial burden in diverse splenic populations.
  
  We immensely appreciate the reviewer for this insightful question about the effect of SIRT1/3 on the gut per se. To answer your question, we observed increased pathogen loads within the mesenteric lymph nodes of the gut in the SIRT1/3 inhibitor-treated mice groups (Fig. 8B). In our revised manuscript, we evaluated gut inflammation via IL-1β estimation in the mice's ileal tissues and observed heightened IL-1β production in the inhibitor-treated mice cohorts in comparison to the vehicle control (Fig. S3G). We have also examined gut epithelial pathology via Haematoxylin-Eosin (H&E) staining of the ileal sections to address the effect of in vivo treatment on gut microbiota and colonization resistance, which is appended here. However, the gut microbiota crosstalk and its effect on colonization resistance are part of another ongoing study and are being examined in detail there. Therefore, the appended H&E image has not been incorporated in the revised manuscript.
  
  Author response image 1.
  
  In line with the reference PMID: 28192515, where Sirt1 has been shown to be degraded during Salmonella infection at later time points of infection, our study also has shown that both SIRT1 mRNA (Fig. 1A) and protein levels (Fig. S1A) show an elevated expression at 2h and 6h post-infection and show a downregulation at 16h in comparison to the 6h time point. However, SIRT3 expression levels remain elevated even at later time points of infection. Therefore, we speculate that there is a shared role between SIRT1 and SIRT3 that facilitates the phenotypes reported in our study.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this paper, Hajra et al have attempted to identify the role of Sirt1 and Sirt3 in regulating metabolic reprogramming and macrophage host defense. They have performed gene knockdown experiments in RAW macrophage cell lines to show that depletion of Sirt1 or Sirt3 enhances the ability of macrophages to eliminate Salmonella Typhimurium. However, in mice, inhibition of Sirt1 resulted in dissemination of the bacteria but the bacterial burden was still reduced in macrophages. They suggest that the effect they have observed is due to increased inflammation and ROS production by macrophages. They also try to establish a weak link with metabolism. They present data to show that the switch in metabolism from glycolysis to fatty acid oxidation is regulated by acetylation of Hif1a, and PDHA1.
  
  Strengths:
  
  The strength of the manuscript is that the role of Sirtuins in host-pathogen interactions has not been previously explored in-depth making the study interesting. It is also interesting to see that depletion of either Sirt1 or Sirt3 results in a similar outcome.
  
  Weaknesses:
  
  The major weakness of the paper is the low quality of data, making it harder to substantiate the claims. Also, there are too many pathways and mechanisms being investigated. It would have been better if the authors had focussed on either Sirt1 or Sirt3 and elucidated how it reprograms metabolism to eventually modulate host response against Salmonella Typhimurium. Experimental evidence is also lacking to prove the proposed mechanisms. For instance, they show correlative data that the knockdown of Sirt1-mediated shift in metabolism is due to HIF1a acetylation but this needs to be proven with further experiments.
  
  We appreciate the reviewer’s critical analysis of our work. In the revised manuscript, we aimed to eliminate the low-quality data sets and have tried to substantiate them with better and conclusive ones, as directed in the recommendations for the author section. We agree with the reviewer that the inclusion of both Sirtuins 1 and 3 has resulted in too many pathways and mechanisms and focusing on one SIRT and its mechanism of metabolic reprogramming and immune modulation would have been a less complicated alternative approach. However, as rightly pointed out, our work demonstrated the shared and few overlapping roles of the two sirtuins, SIRT1 and SIRT3, together mediating the immune-metabolic switch upon Salmonella infection. As per the reviewer’s suggestion, we have performed additional experiments with HIF-1α inhibitor treatment in our revised manuscript to substantiate our correlative findings on SIRT1-mediated regulation of host glycolysis (Fig.7G).
  
  Reviewer #1 (Recommendations For The Authors):
  
  The authors state "SIRT1 and SIRT3 inhibition resulted in increased pathogen loads in organs and triggered enhanced bacterial dissemination, together leading to increased susceptibility of the mice to S. Typhimurium infection owing to increased ROS and IL-6 production." How can this be reconciled? To the reviewer, this is not a convincing explanation. The reviewer is not a mouse pathologist, so maybe did not understand the argument in full.
  
  However, in order to clarify whether these phenomena can be brought into context and explained by for instance cell-autonomous (in (RAW) macrophages) versus non-autonomous (in mice) mechanisms, it would be required to bring in context the organismic phenotype with a cellular phenotype, using more physiologic primary macrophages.
  
  (1) The authors show in Figure 8 that in general SRT inhibition leads to increased infection whereas SRT activation results in decreased infection. This is even true for the spleen (e.g. Figure 8B), which should be full of macrophages upon infection.
  
  (2) Only Figure 8L implies that endogenous primary, splenic macrophages show a higher infection rate upon pharmacologic SRT activation, which would potentially mirror the RAW results. This is however not supportive of their own explanation: Who would now produce more ROS and IL6 if these macrophages are more supportive of intracellular ST? Is there a difference in the roles of SRTs between different types of macrophages and/or neutrophils? And between macrophages and somatic cells concerning ST infection? The reviewer tends to believe that RAW cells display a defective killing response (such as ROS production) as they are highly transformed cells. Therefore, the authors should use cultured peritoneal macrophages or BMDMs in addition to RAW264.7 cells.
  
  The literature cited by the authors also implies that the inflammatory response in mice is higher in the absence of SRTs. This is in line with a role for SRTs in (negatively) regulating M1 inflammatory polarization but probably not with increased bacterial burden in mice. If it was, then increased dissemination could be explained by increased tissue damage. However, the flow cytometry experiments from infected organs then do not confirm that, as the infection of individual cells is higher upon SRT inhibition. Thus there seems to be a broad gap between the role of SRTs in ST infection in RAW264.7 cells versus non-transformed cells.
  
  I would not discard the RAW results, as I am convinced that they contain valuable data. However, it needs to be clarified what aspect of the host response RAW 264.7 cells represent. Primary macrophages might likely be more aggressive towards the bacteria. Finally, the question arises: what is the role of the metabolic switch in the in vivo setting?
  
  The reviewer recommends repeating some key experiments by in-vitro-infecting BMDMs or isolated peritoneal macrophages (after some days of culturing) to bridge between the present RAW-derived data and the mouse data. How is the bacterial load with and without SRT inhibitor/activator in primary macrophages, when infected outside of the body? Can ex-vivo infection also affect polarization of e.g. peritoneal macrophages or the metabolic switch? If it is possible to find a conclusive explanation for their data, then this story might really add to our understanding of another aspect of how ST manipulates the host to survive.
  
  In case the reviewer understands the mouse experiments correctly, all assays on peritoneal cells were performed after in-vivo-infection and/or treatment.
  
  Together, RAW 264.7 murine macrophage-like cells might not be the right model to understand the phenotypes in full. As far as the reviewer knows, these cells are not capable of killing bacteria as effectively as activated primary macrophages or neutrophils.
  
  A few of the key findings in RAW264.7 macrophages have been replicated in primary peritoneal macrophages (Fig. 2B, S3E-F, S6B, S7B-D). We wanted to clarify that the peritoneal macrophage experiments were performed ex vivo: peritoneal macrophages were isolated from mice and then subjected to SIRT1/3 inhibitor treatment and Salmonella infection, not isolated after in vivo treatment or infection. In the ex vivo setting, we have examined the effect of SIRTs on the metabolic switch during Salmonella infection (Fig. S7B-D), which resembled our RAW264.7 macrophage data. Additionally, in the in vivo setting, we have analyzed the transcript-level expression of host metabolic genes and corresponding bacterial metabolic genes in infected mouse liver and spleen tissue under SIRT1/3 inhibitor treatment (Fig. S7E-F, Fig. 6C-D). Our primary peritoneal macrophage data mirror the RAW264.7 macrophage findings, showing attenuated intracellular bacterial proliferation owing to the heightened proinflammatory burst upon SIRT1/3 knockdown or inhibition (Fig. 2A-B). This is opposite to our in vivo mouse model of infection, which shows increased organ burden and bacterial dissemination (Fig. 8A-H). The pro-inflammatory arsenals that limit bacterial proliferation within the macrophages (F4/80+ macrophages within the spleen, RAW264.7 macrophages, or primary peritoneal macrophages) facilitate bacterial dissemination in blood and to the other organs owing to tissue damage (Fig. 8E-L). This is in line with the following previous findings:
  
  Klebsiella pneumoniae infection triggers an inflammatory response via secretion of IL-6 upon HIF-1α activation that induces bacterial dissemination (Holden VI, Breen P, Houle S, Dozois CM, Bachman MA. Klebsiella pneumoniae Siderophores Induce Inflammation, Bacterial Dissemination, and HIF-1α Stabilization during Pneumonia. mBio. 2016 Sep 13;7(5):e01397-16. doi: 10.1128/mBio.01397-16. PMID: 27624128; PMCID: PMC5021805.).
  
  Correlation analysis of immune responses to Salmonella infection revealed that increased innate immune “cassette” opposes the adaptive immune arm leading to increased bacterial load in mice (Hotson AN, Gopinath S, Nicolau M, Khasanova A, Finck R, Monack D, et al. Coordinate actions of innate immune responses oppose those of the adaptive immune system during Salmonella infection of mice. Science Signaling. 2016;9(410):ra4).
  
  As per the reviewer’s suggestions, we have analyzed other populations apart from F4/80+ macrophages and have observed that the CD45+ splenic population depicts increased bacterial loads like that of the total splenic population within the SIRT1/3 inhibited cohorts. However, CD45+ monocytes and Ly6C positive splenic population exhibit compromised burden within the SIRT1/3 inhibited cohorts. Moreover, the CD1c+ population, CD45+ granulocytes, or lymphocytes show comparable organ loads to that of the vehicle control or SIRT1 activator-treated mice group (Fig.8M-S, Fig.S8). Overall, our data suggest heterogeneous bacterial burden in diverse splenic populations.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Abstract
  
  The authors state that perturbing Sirt1 and Sirt3 results in a shift in Salmonella's metabolism. On the contrary, the data reflects the metabolism in the host cell and not the bacteria. This statement is wrong. They only show increased expression of some of the glycolytic genes in Salmonella, which is not sufficient to make the claim that the switch to fatty acid oxidation in macrophages is due to utilisation of glucose by the bacteria.
  
  We value the reviewer’s response and have accordingly reframed our sentence in the abstract (Line 24-25).
  
  Fig 1: Expression of Sirt1 - The data needs to be supported with a western blot for Sirt1 and Sirt3 but the Western blots shown in the supplementary figure are of very poor quality and do not support the authors' claim.
  
  We have repeated the western blot and have supplemented the previous blot with an alternate blot in Fig. S1A as per your valuable input.
  
  Why haven't the authors shown any representative blots for Sirt1 and Sirt3 upon infection with Salmonella mutants? They need to italicize the genes when they describe mRNA expression.
  
  Previously we had only performed transcript-level expression of Sirt1 and Sirt3 upon infection with Salmonella mutants and therefore representative blot image was absent. The gene names have been duly italicized while describing mRNA expression (Line 126-154). We regret the inconvenience caused. We have performed the western blotting to assess the protein expression profile upon infection with Salmonella mutants as per the reviewer’s suggestion and the representative blot image has been duly appended in the revised manuscript (Fig. S1B).
  
  What is the rationale for examining Sirt1 and Sirt3 mRNA in M1 and M2 macrophages? Salmonella infection on its own will polarise the macrophages towards M1. How long were these macrophages infected? The time points are missing.
  
  The rationale for examining Sirt1 and Sirt3 mRNA in M1- and M2-polarized macrophages was to ascertain whether M1-polarized macrophages indeed exhibit decreased expression of Sirt1 or Sirt3, and whether polarization of macrophages toward the M2 state leads to upregulation of Sirt1 and Sirt3 upon Salmonella infection. After confirming these findings in this preliminary experiment, we asked whether Salmonella infection on its own polarises the macrophages toward an immunosuppressive M2 state at a later stage of infection, as infection drives the induction of SIRT expression, and whether this is mediated by Sirt1 and Sirt3 (Fig. 3). We apologize for not mentioning the 16h time point in the figure; the missing time point has been documented in the revised manuscript (Line 155).
  
  Fig S2 knockdown of Sirt1 and Sirt3 are not convincing.
  
  We are extremely sorry for the inconclusive knockdown blot. An alternative blot has been provided in the revised manuscript (Fig. S2C-D).
  
  Fig 2A and 2B the time point post infection has not been mentioned. Although it is stated that 2h and 16h post-infection samples were analysed. Only one time point has been shown.
  
  We are sorry for the confusion. We wanted to clarify that Fig.2A and Fig. 2B show the fold proliferation where fold proliferation was calculated as CFU at 16hr divided by CFU at 2hr as mentioned in the materials and methods section under the heading of Intracellular proliferation or gentamicin protection assay.
  
  Fold Proliferation= [CFU at 16h]/[CFU at 2h]
  
  The cytokine data are intriguing in that the increase in IL-6 relative to control is seen only at 2h and 20h but not at 6h. IL-6 at 20h in untransfected cells is comparable to uninfected cells. Did the authors investigate cell death? Salmonella induces various forms of cell death which could account for the decreased cytokine production at later time points.
  
  We have investigated cell death upon Salmonella infection via MTT assay. At later time points of infection, we indeed observed a decrease of around 16 percent in cell survival compared to the initial time point of 2h. The results have been appended here and they support the reviewer’s reasoning for the decreased cytokine production at later time points.
  
  Author response image 2.
  
  Additional cytokines such as IL-1b would be helpful. Also, not sure how uninfected macrophages produce nearly 200pg of IL-10.
  
  As per the reviewer’s critical suggestion, we have assessed IL-1b cytokine production at 16h post-infection in RAW264.7 macrophages and peritoneal macrophages, and in mouse serum samples on the 5th day post-infection (Fig. S3C, S3E-F). Our results indicate increased production of IL-1b in the infected SIRT1/3 knockdown RAW264.7 macrophages, SIRT1/3 inhibitor-treated peritoneal macrophages, and mouse serum samples under SIRT1/3 inhibitor treatment in comparison to the vehicle control. Additionally, we have quantified IL-1b in mouse ileal tissues under SIRT1/3 inhibitor treatment (Fig. S3G) and have observed heightened intestinal IL-1b production in the inhibitor-treated cohorts. We thank the reviewer for raising the concern about 200pg of IL-10 in the uninfected macrophages. We have repeated the experiment and have provided an alternative representative graph for the experiment wherein the IL-10 levels in the uninfected cohorts range between 20-40pg/ml (Fig. S3B).
  
  It is surprising that the authors have found increased Sirt1 binding to NFkB, however there is no change in acetylated NFkB upon infection (Fig 4B). Acetylated p65 is equally high in uninfected Scrambled siRNA, UI shSirt1, STM Scr, and STM shSirt1. Furthermore, increased binding of Sirt1 with NFkb would mean decreased acetylation hence decreased inflammation. However, Salmonella induces profound inflammation.
  
  We thank the reviewers for their insightful and critical questioning. We acknowledge that, due to oversaturation, there was no apparent change in the acetylated p65 among the different sample sets. Therefore, in the revised manuscript we have provided an image at lower exposure where the changes in the acetylation of the p65 subunit are apparent. Salmonella, like other pathogens, induces acute inflammatory responses upon challenge. This heightened acute inflammation at the initial phases of infection subsides at a later phase of infection. Here, we have assessed the Sirt1 interaction with NFκB at 16hr post-infection, where increased binding of Sirt1 with NFκB facilitates the resolution of the Salmonella-induced acute inflammation. This is in line with previous reports that suggest SIRT1 suppresses acute inflammation through the promotion of p65 deacetylation and inhibition of NFκB activity. (Yang H, Zhang W, Pan H, et al. SIRT1 activators suppress inflammatory responses through promotion of p65 deacetylation and inhibition of NF-κB activity. PLoS One. 2012;7(9):e46364. doi:10.1371/journal.pone.0046364; Liu TF, Yoza BK, El Gazzar M, Vachharajani VT, McCall CE. NAD+-dependent SIRT1 deacetylase participates in epigenetic reprogramming during endotoxin tolerance. J Biol Chem. 2011;286(11):9856–64.; Liu TF, Vachharajani V, Millet P, Bharadwaj MS, Molina AJ, McCall CE. Sequential actions of SIRT1-RELB-SIRT3 coordinate nuclear-mitochondrial communication during immunometabolic adaptation to acute inflammation and sepsis. J Biol Chem. 2015;290(1):396–408.)
  
  Please explain how the acetylated p65 was analysed.
  
  Total endogenous p65 subunit was immunoprecipitated using Anti-NFκB p65 antibody and the immunoprecipitated fraction was probed with Anti-Acetylated Lysine antibody to assess acetylated p65.
  
  An increase in ROS production is seen in a relatively small percentage of cells- not more than 4% of cells. How does this contribute to such a significant difference in intracellular bacterial burden? Also, it is not clear how the authors calculated the fold change in proliferation. It is better to show the actual bacterial burden logarithmically.
  
  We strongly agree with the reviewer’s concerns, and we have reanalyzed the flow cytometric data set. The revised data have been presented in Fig. S5 which shows a considerable increase in DCFDA positive population. For instance, the infected scrambled control shows around 2.44% of ROS-producing cells, however knockdown of SIRT1 and SIRT3 increases the ROS-producing cells to 27.34% and 28.64% respectively.
  
  Fold proliferation was calculated as CFU at 16hr divided by CFU at 2hr as mentioned in the materials and methods section under the heading of Intracellular proliferation or gentamicin protection assay. Fold proliferation has been calculated as opposed to absolute CFU values to nullify the differential phagocytosis of bacteria to the macrophages among the samples.
  
  Fold Proliferation= [CFU at 16h]/[CFU at 2h]
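The ratio above can be illustrated with a minimal Python sketch. The function name and the CFU counts are hypothetical and chosen only for illustration; they are not data from the study. The example shows why normalising to the 2h count matters: two samples with the same 16h burden can differ in fold proliferation if their initial uptake differs.

```python
def fold_proliferation(cfu_16h: float, cfu_2h: float) -> float:
    """Return CFU at 16 h divided by CFU at 2 h."""
    if cfu_2h <= 0:
        raise ValueError("CFU at 2 h must be positive")
    return cfu_16h / cfu_2h


# Hypothetical counts, for illustration only (not data from the study).
# Both samples reach the same burden at 16 h, but the second sample
# took up twice as many bacteria at 2 h.
samples = {
    "sample_a": {"cfu_2h": 1.0e4, "cfu_16h": 8.0e4},
    "sample_b": {"cfu_2h": 2.0e4, "cfu_16h": 8.0e4},
}

folds = {
    name: fold_proliferation(counts["cfu_16h"], counts["cfu_2h"])
    for name, counts in samples.items()
}

for name, fold in folds.items():
    print(f"{name}: fold proliferation = {fold:.1f}")
```

Comparing absolute 16h counts alone would suggest the two hypothetical samples are equivalent, whereas the ratio separates uptake from intracellular replication.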
  
  An increase in metabolic genes is not sufficient to show that the macrophages are metabolically reprogrammed.
  
  We thank the reviewer for the valuable comment. We agree that an increase in metabolic gene profile is not sufficient to claim metabolic reprogramming. Therefore, in addition to the metabolic gene profile, we have estimated lactate production (end-product of glycolysis) as an indicator of glycolysis (Fig. 5 C-E) and have performed the fatty acid β oxidation activity (Fig. 5G-H) to support our claims.
  
  Figure 5F the band intensities do not visually match the bands shown for PFK. For instance, shSIRT1 STM (1.00) and shSIRT3 STM (0.81).
  
  We are extremely sorry for the erroneous band intensity for shSIRT3. Upon reanalysis of the band intensities, we have corrected the band intensity for shSIRT3 to 2.28 (Fig.5F).
  
  It is surprising that HADHA is not expressed in uninfected samples.
  
  We are extremely apologetic for the inappropriate representative blot. We feel that the discrepancy might have arisen due to the usage of old antibodies. We have provided an alternate blot for the HADHA gene where fresh antibody staining solution was used for probing which shows expression even in the uninfected samples (Fig.5F).
  
  Figure 6A - What is the significance of PFA fixed samples (PI) compared to SI samples? This has not been discussed.
  
  PFA-fixed samples are paraformaldehyde-treated bacterial samples that harbor the immune signals or Pathogen-Associated Molecular Patterns (PAMPs). The rationale for using PI in addition to SI samples was to show whether the phenomenon is driven by live, metabolically active pathogens or is mediated by PAMPs.
  
  I understand that the hypothesis is that during the later phase of infection, there is an increase in fatty acid oxidation which correlates with a decrease in inflammation. However, at 6h there is no increase in genes regulating fatty acid oxidation. Why did the authors choose 6h when the previous experiments have been done at 16h?
  
  We indeed agree with the reviewer’s understanding of our hypothesis that there is an increase in fatty acid oxidation along the progression of infection which correlates with a decrease in inflammation. The Salmonella intracellular replication has been reported to commence at 6h post-internalization when SPI-2 effector expression is fully established (Helaine S, Thompson JA, Watson KG, Liu M, Boyle C, Holden DW. Dynamics of intracellular bacterial replication at the single cell level. Proc Natl Acad Sci U S A. 2010;107(8):3746-3751. doi:10.1073/pnas.1000041107). Therefore, we have assessed the 6h timepoint post-infection in addition to the initial and later timepoints of 2h and 16h respectively. Additionally, the nanostring gene profiling data of both host and bacterial genes indicate the onset of both metabolic (Fig. 5A, 6A) and immune genes (Fig. 3A) modulation at 6h post-infection. We have validated these results via qPCR studies and have observed an upregulation in the transcript level of fatty acid oxidation genes as depicted in Fig. S7A in RAW264.7 macrophages.
  
  Line 355 it is mentioned that Sirt1 and Sirt3 abrogate metabolic shift by reducing glycolytic flux. This is incorrect as experiments such as carbon chase assays have not been performed to investigate glycolytic flux.
  
  As per the reviewer’s valuable suggestion, we have removed the word ‘flux’ from the above-mentioned statement (Line 351, Line 353).
  
  Lines 392-393: "We immunoprecipitated PDHA1 and checked for its interaction with SIRT3 or SIRT1 under knockdown condition of SIRT3 or upon SIRT3 inhibitor treatment (Fig.7 G-H)"
  
  What is the rationale for checking PDHA1 interaction with Sirt under Sirt knockdown conditions?
  
  We are thankful to the reviewer for the critical comments. The rationale for checking PDHA1 interaction with Sirt was to ascertain that indeed Sirt interacted with PDHA1 under S. Typhimurium infection and abrogation of either protein expression (knockdown) or their enzymatic activity (inhibitor treatment) diminished the interaction.
  
  Moreover, the blots are very confusing and do not represent the authors' claims.
  
  (1) In the input blot I do not see Sirt3 depletion in shSirt3 knockdown sample.
  
  The knockdown has been quantified in the input blot as per your suggestion. A knockdown of 40% has been obtained in the uninfected dataset whereas a knockdown of 47.1% has been obtained in the infected data set at 16h post-infection (Fig.7H).
  
  (2) Why does Sirt1 interact with PDHA1 similar to Sirt3. Do both the proteins bind to PDHA1 at the same time/ competitively? If so do they both deacetylate?
  
  In literature, Sirt3 has been shown to interact with PDHA1 and deacetylate PDHA1. However, the interaction of Sirt1 with PDHA1 has not been reported previously and therefore we are unable to comment on the exact dynamics of the interaction. Future studies need to be performed to explore these phenomena in depth. However, SIRT1 agonist SRT1720 has been shown to impact PDH phosphorylation and its activity (Han Y, Sun W, Ren D, Zhang J, He Z, Fedorova J, Sun X, Han F, Li J. SIRT1 agonism modulates cardiac NLRP3 inflammasome through pyruvate dehydrogenase during ischemia and reperfusion. Redox Biol. 2020 Jul;34:101538).
  
  (3) Figure 7I in the IP: IgG samples Sirt3 seem to bind to IgG non-specifically, which questions the specificity of Sirt3 binding to PDHA1.
  
  We appreciate the reviewer for pointing out this concern. The immunoprecipitation experiment has been repeated and the same has been appended in the revised manuscript and we observe no non-specific binding of Sirt3 antibody to IgG.
  
  (4) In Figure 7I all the bands Ac PDHA1, PDHA1, and Sirt3 look similar with double bands, which has not been seen in other blots. How is this possible?
  
  This cannot explain the increase in beta-oxidation observed.
  
  We thank the reviewer for raising this concern. We have repeated the experiment and provided the alternative blot as per the reviewer’s suggestion.
  
  The rationale for performing this experiment was to show that SIRT plays an important role in the activation of downstream TCA cycle pathways via PDHA1 deacetylation during Salmonella infection. The deacetylation of PDHA1 has been previously reported to cause transcriptional activation of the downstream TCA cycle and oxidative phosphorylation (Zhang Y, Wen P, Luo J, et al., Cell Death Dis.,2021). Additionally, PDHA1 hyperacetylation has been reported to cause lactate overproduction (An, S., Yao, Y., Hu, H. et al. PDHA1 hyperacetylation-mediated lactate overproduction promotes sepsis-induced acute kidney injury via Fis1 lactylation. Cell Death Dis 14, 457 (2023)). In our study, increased lactate production and PDHA1 hyperacetylation have been observed during SIRT3 inhibition conditions upon Salmonella infection.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.11.21.517246v4
www.biorxiv.org www.biorxiv.org

Distinct neural bases of subcomponents of the attentional blink

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this study, the authors used a multi-alternative decision task and a multidimensional signal detection model to gain further insight into the cause of perceptual impairments during the attentional blink. The model-based analyses of behavioural and EEG data show that such perceptual failures can be unpacked into distinct deficits in visual detection and discrimination, with visual detection being linked to the amplitude of late ERP components (N2p and P3) and discrimination being linked to the coherence of fronto-parietal brain activity.
  
  Strengths:
  
  The main strength of this paper lies in the fact that it presents a novel perspective on the cause of perceptual failures during the attentional blink. The multidimensional signal detection modelling approach is explained clearly, and the results of the study show that this approach offers a powerful method to unpack behavioural and EEG data into distinct processes of detection and discrimination.
  
  Thank you.
  
  Weaknesses:
  
  (1.1) While the model-based analyses are compelling, the paper also features some analyses that seem misguided, or, at least, insufficiently motivated and explained. Specifically, in the introduction, the authors raise the suggestion that the attentional blink could be due to a reduction in sensitivity or a response bias. The suggestion that a response bias could play a role seems misguided, as any response bias would be expected to be constant across lags, while the attentional blink effect is only observed at short lags. Thus, it is difficult to understand why the authors would think that a response bias could explain the attentional blink.
  
  In the revision, we seek to better motivate the bias component. A deficit in T2 identification accuracy could arise from either sensitivity or criterion effects at short lags. For example, in short T1-T2 lag trials participants may adopt a more conservative choice criterion for reporting the presence of T2 thereby yielding lower accuracies for short lags. Criterion effects need not be uniform across lags: A participant could infer the T1-T2 lag on each trial based on various factors, such as trial length, and systematically adjust their choice criterion across lags, prior to making a response.
  
  Below, we present a simple schematic for how a conservative choice criterion impacts accuracy. Consider a conventional attentional blink paradigm where the task is to detect and report T2's presence. For simplicity, we assume that prior probabilities for T2’s occurrence are equal, such that the number of “T2 present” and “T2 absent” trials are equal.
  
  We model this task with a one-dimensional signal detection theory (SDT) model (left panel). Here, ψ represents the decision variable and the red and gray Gaussians represent the conditional density of ψ for the T2 present (“signal”) and T2 absent (“noise”) conditions, respectively. We increase the criterion from its optimal value (here, midpoint of signal and noise means), to reflect increasingly conservative choices. As the criterion increases and deviates further from its optimal value – here, reflecting a conservative bias – accuracy drops systematically (right panel).
  
  Author response image 1.
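The schematic described above can also be sketched numerically. The following is a minimal illustration, assuming an equal-variance Gaussian SDT model with unit variance, equal priors for "T2 present" and "T2 absent", and an arbitrary sensitivity of d' = 2; the function name and all parameter values are assumptions made for this sketch, not quantities fitted in the study.

```python
from statistics import NormalDist


def sdt_accuracy(d_prime: float, criterion: float) -> float:
    """Proportion correct in a yes/no detection task with equal priors.

    Noise is N(0, 1) and signal is N(d_prime, 1); the observer reports
    "present" whenever the decision variable exceeds the criterion.
    """
    noise = NormalDist(mu=0.0, sigma=1.0)
    signal = NormalDist(mu=d_prime, sigma=1.0)
    hit_rate = 1.0 - signal.cdf(criterion)
    correct_rejection_rate = noise.cdf(criterion)
    return 0.5 * (hit_rate + correct_rejection_rate)


d_prime = 2.0                      # assumed sensitivity, held fixed
optimal_criterion = d_prime / 2.0  # midpoint of noise and signal means

# Shift the criterion upward from its optimum (conservative bias).
criteria = [optimal_criterion + shift for shift in (0.0, 0.5, 1.0, 1.5)]
accuracies = [sdt_accuracy(d_prime, c) for c in criteria]

for c, acc in zip(criteria, accuracies):
    print(f"criterion = {c:.1f}  accuracy = {acc:.3f}")
```

With sensitivity held constant, accuracy is highest at the midpoint criterion and falls as the criterion becomes more conservative, which is the sense in which a criterion change alone can mimic a blink-like drop in proportion correct.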
  
  We have revised the Introduction as follows:
  
  “Distinguishing between sensitivity and criterion effects is crucial because a change in either of these parameters can produce a change in the proportion of correct responses[41,42]. A lower proportion of correct T2 detections may reflect not only a lower detection d’ at short lags but also a sub-optimal choice criterion corresponding, for instance, to a conservative detection bias (Fig. 1, right, top). Importantly, such criterion effects need not be uniform across intertarget lags: the lag on each trial could be inferred based on various factors, such as trial length, allowing participants to adopt different choice criteria for the different lags prior to making a response.”
  
  (1.2) A second point of concern regards the way in which the measures for detection and discrimination accuracy were computed. If I understand the paper correctly, a correct detection was defined as either correctly identifying T2 (i.e., reporting CW or CCW if T2 was CW or CCW, respectively, see Figure 2B), or correctly reporting T2's absence (a correct rejection).
  
  Here, it seems that one should also count a misidentification (i.e., incorrect choice of CW or CCW when T2 was present) as a correct detection, because participants apparently did detect T2, but failed to judge/remember its orientation properly in case of a misidentification. Conversely, the manner in which discrimination performance is computed also raises questions. Here, the authors appear to compute accuracy as the average proportion of T2-present trials on which participants selected the correct response option for T2, thus including trials in which participants missed T2 entirely. Thus, a failure to detect T2 is now counted as a failure to discriminate T2. Wouldn't a more proper measure of discrimination accuracy be to compute the proportion of correct discriminations for trials in which participants detected T2?
  
  Indeed, detection and discrimination accuracies were computed with precisely the same procedure, and under the same conditions, as described by the Reviewer. We regret our poor description. For clarity, we have revised the following line in the Results section; we have also updated the Methods (section on Behavioral data analysis: Measuring attentional blink effects on psychometric quantities).
  
  “Detection accuracies were calculated based on the proportion of trials in which T2 was correctly detected (Methods). Briefly, we computed the average proportion of hits, misidentifications, and correct rejections; misidentifications were included because, although incorrectly identified, the target was nevertheless correctly detected. In contrast, discrimination accuracies were derived from T2 present trials, based on the proportion of correct identifications alone (Methods).”
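One way to read the two definitions quoted above is sketched below in Python. This is an illustration under stated assumptions, not the authors' analysis code: the class and function names are hypothetical, the counts are invented, detection accuracy is assumed to be pooled over all trials, and discrimination accuracy is assumed to be computed over all T2-present trials (including misses), as the quoted text suggests.

```python
from dataclasses import dataclass


@dataclass
class TrialCounts:
    """Response counts for one condition (hypothetical structure)."""
    hits: int                # T2 present, orientation reported correctly
    misidentifications: int  # T2 present, wrong orientation reported
    misses: int              # T2 present, reported absent
    correct_rejections: int  # T2 absent, reported absent
    false_alarms: int        # T2 absent, an orientation reported


def t2_detection_accuracy(c: TrialCounts) -> float:
    """Hits, misidentifications and correct rejections over all trials."""
    total = (c.hits + c.misidentifications + c.misses
             + c.correct_rejections + c.false_alarms)
    return (c.hits + c.misidentifications + c.correct_rejections) / total


def t2_discrimination_accuracy(c: TrialCounts) -> float:
    """Correct identifications over all T2-present trials."""
    present = c.hits + c.misidentifications + c.misses
    return c.hits / present


# Invented counts, for illustration only.
counts = TrialCounts(hits=60, misidentifications=10, misses=30,
                     correct_rejections=90, false_alarms=10)

detection = t2_detection_accuracy(counts)
discrimination = t2_discrimination_accuracy(counts)
print(f"detection = {detection:.2f}, discrimination = {discrimination:.2f}")
```

Under these assumptions a misidentification raises detection accuracy but not discrimination accuracy, which is the distinction the revised text draws.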
  
  (1.3) My last point of critique is that the paper offers little if any guidance on how the inferred distinction between detection and discrimination can be linked to existing theories of the attentional blink. The discussion mostly focuses on comparisons to previous EEG studies, but it would be interesting to know how the authors connect their findings to extant, mechanistic accounts of the attentional blink. A key question here is whether the finding of dissociable processes of detection and discrimination would also hold with more meaningful stimuli in an identification task (e.g., the canonical AB task of identifying two letters shown amongst digits).
  
  There is evidence to suggest that meaningful stimuli are categorized just as quickly as they are detected (Grill-Spector & Kanwisher, 2005; Grill-Spector K, Kanwisher N. Visual recognition: as soon as you know it is there, you know what it is. Psychol Sci. 2005 Feb;16(2):152-60. doi: 10.1111/j.0956-7976.2005.00796.x. PMID: 15686582.). Does that mean that the observed distinction between detection and discrimination would only apply to tasks in which the targets consist of otherwise meaningless visual elements, such as lines of different orientations?
  
  Our results are consistent with previous literature suggested by the reviewer. Specifically, we model detection and discrimination not as sequential processes, but as concurrent computations (Figs. 3A-B). Yet, our results suggest that these processes possess distinct neural bases. We have further revised the Discussion in context of this literature in the revised manuscript.
  
  “…Interestingly, we found no evidence indicating that these two computations (detection and discrimination) were sequential; in fact, the modulation of beta coherence occurred almost immediately after T2 onset, and lasted well afterwards (>400 ms from T2 onset) (Fig. 5A-B) suggesting that an analysis of T2’s features proceeded in parallel with its detection and consolidation. We also modeled detection and discrimination as concurrent computations in our SDT model (Fig. 3A-B). Previous work suggests that while object detection and categorization processes proceed in parallel, detection and identification processes occur sequentially[77]. Our results are in line with this literature, if we consider T2’s discrimination judgement – clockwise versus counterclockwise of vertical – to be a categorization, rather than an identification judgement. Moreover, this earlier study[75] observed significant trial-wise correlations between detection and categorization responses, suggesting that the two processes involve the operation of the same perceptual filters (“analyzers”). Our study, on the other hand, reports distinct neural bases for detection and discrimination computations. Yet, the two sets of findings are not mutually contradictory.
  
  In many conventional attentional blink tasks[3,20,25], complex visual stimuli, like letters, must be detected among a stream of background distractors with closely similar features, such as digits. In this case, target detection would require the operation of shape-selective perceptual filters for feature analysis. These same shape-selective filters would be involved also for discriminating between distinct, but related target stimuli (e.g., two designated candidate letters). In our task, target gratings needed to be distinguished in a stream of plainly distinct background distractors (plaids), whereas the discrimination judgement involved analysis of grating orientation. As a result, our task design likely precludes the need for the same perceptual filters in the detection and the discrimination judgements. Absent this common feature analysis, our results suggest distinct electrophysiological correlates for the detection and discrimination of targets.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors had two aims: First, to decompose the attentional blink (AB) deficit into the two components of signal detection theory; sensitivity and bias. Second, the authors aimed to assess the two subcomponents of sensitivity; detection and discrimination. They observed that the AB is only expressed in sensitivity. Furthermore, detection and discrimination were doubly dissociated. Detection modulated N2p and P3 ERP amplitude, but not frontoparietal beta-band coherence, whereas this pattern was reversed for discrimination.
  
  Strengths:
  
  The experiment is elegantly designed, and the data - both behavioral and electrophysiological - are aptly analyzed. The outcomes, in particular the dissociation between detection and discrimination blinks, are consistently and clearly supported by the results. The discussion of the results is also appropriately balanced.
  
  Thank you.
  
  Weaknesses:
  
  (2.1) The lack of an effect of stimulus contrast does not seem very surprising from what we know of the nature of AB already. Low-level perceptual factors are not thought to cause AB. This is fine, as there are also other, novel findings reported, but perhaps the authors could bolster the importance of these (null) findings by referring to AB-specific papers, if there are indeed any, that would have predicted different outcomes in this regard.
  
  While there is consensus that the low-level perceptual factors are not affected by the attentional blink, other studies have suggested evidence to the contrary (e.g., Chua et al, Percept. Psychophys., 2005)[1]. We have mentioned the significance of our findings in the context of such conflicting evidence in literature, in the revised Discussion.
  
  “Surprisingly, we found no significant effect of contrast on either type of deficit (Figs. 2A-B). In other words, high (100%) contrast T2 stimuli were also strongly susceptible to the detection and discrimination bottlenecks associated with the attentional blink. Thus, despite a clear contrast-dependent encoding of T2 in early sensory cortex, the attentional blink produced a significant deficit with downstream processing, even for targets of high contrast. While at odds with some earlier work, which suggests an early-stage perceptual bottleneck[82–84], these results are largely consistent with findings from the majority of previous studies[3,7,9,11,19,20,82,85,86], which suggest a late-stage bottleneck.”
  
  (2.2) On an analytical note, the ERP analysis could be finetuned a little more. The task design does not allow measurement of the N2pc or N400 components, which are also relevant to the AB, but the N1 component could additionally be analyzed. In doing so, I would furthermore recommend selecting more lateral electrode sites for both the N1, as well as the P1. Both P1 and N1 are likely not maximal near the midline, where the authors currently focused their P1 analysis.
  
  We performed the suggested analyses. Whereas in the original submission we had used the O1, O2 and Oz electrodes, we now estimate the P1 and N1 with the more lateral P7 and P8 electrodes[2], as suggested by the reviewer.
  
  Even with these more lateral electrodes, we did not observe a significant N1 component in a 90-160 ms window[3] in the long lag trials (p=0.207, signed rank test for amplitude less than zero); a one-tailed Bayes factor (BF=1.35) revealed no clear evidence for or against an N1 component. Analysis of the P1 component with these more lateral electrodes also yielded no statistically significant blink-induced modulation (P1(short lag-long lag) = 0.25 ± 0.16 µV, p=0.231, BF=0.651) (SI Figure S3, revised).
  
  These updated analyses are now reported in the revised Results (lines 317-319) and Methods (lines 854-855). In addition, we have revised SI Table S2 with the new P1 component analysis.
  
  (2.3) Impact & Context:
  
  The results of this study will likely influence how we think about selective attention in the context of the AB phenomenon. However, I think its impact could be further improved by extending its theoretical framing. In particular, there has been some recent work on the nature of the AB deficit, showing that it can be discrete (all-or-none) and gradual (Sy et al., 2021; Karabay et al., 2022, both in JEP: General). These different faces of target awareness in the AB may be linked directly to the detection and discrimination subcomponents that are analyzed in the present paper. I would encourage the authors to discuss this potential link and comment on the bearing of the present work on these behavioural findings.
  
  Thank you. We have now discussed our findings in the context of these recent studies in the revised manuscript.
  
  “…In line with this hypothesis, we discovered that the attentional blink induced dissociable detection and discrimination deficits. There was no statistically significant correlation between these two types of deficits within and across participants and evidence for such a correlation was weak, at best. Unlike previous target identification designs that conflated attentional blink’s effect on detection versus discrimination performance[3,4,9,25,37], our 3-AFC task, and associated signal detection model enabled quantifying each of these deficits separately and identifying a double dissociation between their respective neural correlates. Our dissociation of the attentional blink into distinct subcomponents is complementary to recent studies, which examined whether the attentional blink reflects an all-or-none phenomenon[73,74]. For example, the T2 deficit induced by the attentional blink can be either all-or-none or graded, depending on whether T1 and T2 judgements involve distinct or common features, respectively[73]. While a graded change in precision could reflect sensitivity effects, an all-or-none change in guess rates – without a concomitant change in precision – may reflect a criterion increase (conservative detection bias) effect. Future experiments that incorporate a three-alternative response, with concurrent detection and discrimination, along with key task elements of these earlier studies, may further help resolve these findings.”
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In the present study, the authors aimed to achieve a better understanding of the mechanisms underlying the attentional blink, that is, a deficit in processing the second of two target stimuli when they appear in rapid succession. Specifically, they used a concurrent detection and identification task in- and outside of the attentional blink and decoupled effects of perceptual sensitivity and response bias using a novel signal detection model. They conclude that the attentional blink selectively impairs perceptual sensitivity but not response bias, and link established EEG markers of the attentional blink to deficits in stimulus detection (N2p, P3) and discrimination (fronto-parietal high-beta coherence), respectively. Taken together, their study suggests distinct mechanisms mediating detection and discrimination deficits in the attentional blink.
  
  Strengths:
  
  Major strengths of the present study include its innovative approach to investigating the mechanisms underlying the attentional blink, an elegant, carefully calibrated experimental paradigm, a novel signal detection model, and multifaceted data analyses using state-of-the art model comparisons and robust statistical tests. The study appears to have been carefully conducted and the overall conclusions seem warranted given the results. In my opinion, the manuscript is a valuable contribution to the current literature on the attentional blink. Moreover, the novel paradigm and signal detection model are likely to stimulate future research.
  
  Thank you.
  
  Weaknesses:
  
  Weaknesses of the present manuscript mainly concern the negligence of some relevant literature, unclear hypotheses, potentially data-driven analyses, relatively low statistical power, potential flaws in the EEG methods, and the absence of a discussion of limitations. In the following, I will list some major and minor concerns in detail.
  
  (3.1) Hypotheses: I appreciate the multifaceted, in-depth analysis of the given dataset including its high amount of different statistical tests. However, neither the Introduction nor the Methods contain specific statistical hypotheses. Moreover, many of the tests (e.g., correlations) rely on selected results of previous tests. It is unclear how many of the tests were planned a priori, how many more were performed, and how exactly corrections for multiple tests were implemented. Thus, I find it difficult to assess the robustness of the results.
  
  We hypothesized that neural computations associated with target detection would be characterized by regional (local) neuronal markers (e.g., parietal or occipital ERPs), whereas computations linked to feature discrimination would involve neural coordination across multiple brain regions (e.g. fronto-parietal coherence) (lines 135-138). We planned and conducted our statistical tests based on this hypothesis. All multiple comparison corrections (Bonferroni-Holm correction, see Methods) were performed separately for each class of analyses.
  
  Based on this overarching hypothesis, the following tests were planned and conducted.
  
  ERP analysis: Based on an extensive review of recent literature[4] (Zivony et al., 2022), we performed the following tests: i) We tested whether four ERP component amplitudes (parietal P1, fronto-central P2, occipito-parietal N2p, and parietal P3) were significantly different between short and long lags with a Wilcoxon signed rank test followed by Bonferroni-Holm multiple comparison correction; ii) We correlated the ERPs whose amplitudes showed a significant difference in analysis (i) with detection and discrimination d’ deficits (six correlations) using robust (bend) correlations[5]; again, this was followed by a Bonferroni-Holm multiple comparison correction. Note that there is no circularity with planning analysis (ii) based on the results of analysis (i) because the latter is agnostic to detection versus discrimination blink deficits. In case (i), where no a priori hypothesis about directionality was available, all p-values were based on two-tailed tests, but for case (ii), where we had an a priori directional hypothesis, p-values were computed from one-tailed tests. This has now been clarified in the revised Methods (lines 937-940 and 950-952).
  
  Coherence analysis: Based on a seminal study of long-range synchrony modulation by the attentional blink[6], we examined fronto-parietal coherence in the beta (13-30 Hz) band, separately for the left and right hemispheres, and performed the following comparisons. i) We computed differences in the fronto-parietal coherogram (time-frequency representation of coherence, Fig. 5A-D) between short-lag and long-lag conditions, and performed a two-dimensional cluster-based permutation test[7]; this method inherently corrects for multiple comparisons across time-frequency windows. ii) Because the analysis in (i) revealed the clearest evidence for coherence differences in the canonical high-beta (20-30 Hz) band in the left fronto-parietal electrodes (Figs. 5C-D; 0-300 ms following target onset), we correlated power in this band with detection and discrimination d’ deficits; this was followed by a Bonferroni-Holm multiple comparison correction. As before, there is no circularity with planning analysis (ii) based on the results of analysis (i) because the latter is agnostic to detection versus discrimination blink deficits. Again, in case (i), where no a priori hypothesis about directionality was made, all p-values were based on two-tailed tests, but for case (ii), where we had an a priori directional hypothesis, p-values were computed from one-tailed tests.
  
  For completeness, we performed all of the other correlations, for example, correlations with coherence in the low-beta band or with the right fronto-parietal electrodes (SI Table 3). These latter analyses were not planned, nor did they yield significant results.
  
  Neural distance analysis: This was a novel analysis designed to test the hypothesis that detection and discrimination deficits would be correlated with neural distances along distinct dimensions. i) First, we compared neural distances across lag conditions at different timepoints following target onset with a one-dimensional cluster-based permutation test[7]; ii) Next, we correlated the neural distances along the detection and discrimination dimension with the detection and discrimination d’ deficits (Fig. 6E-F, 6G-H), as well as with the ERP and coherence markers (Fig. 7A-B, 7C-D). For each of these analyses, we employed robust (bend) correlations[5] followed by a Bonferroni-Holm multiple comparison correction. As before, p-values were computed using two-tailed tests for case (i) and one-tailed tests for case (ii), based on the absence or presence of an a priori directional hypothesis.
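
  For readers unfamiliar with the Bonferroni-Holm procedure named above, the following is a minimal sketch of the step-down rule. It is an illustration only, not the authors' analysis code; the function name `holm_bonferroni` and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down correction.

    Returns a list of booleans in the input order: True where the null
    hypothesis is rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are retained.
            break
    return reject


# Illustrative (made-up) p-values for one class of analyses.
example = [0.001, 0.003, 0.021, 0.231]
print(holm_bonferroni(example))
```

  Applying the correction separately to each class of analyses, as described above, corresponds to calling such a function once per class rather than once on the pooled set of p-values.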
  
  (3.2) Power: Some important null findings may result from the rather small sample sizes of N = 24 for behavioral and N = 18 for ERP analyses. For example, the correlation between detection and discrimination d' deficits across participants (r=0.39, p=0.059) (p. 12, l. 263) and the attentional blink effect on the P1 component (p=0.050, no test statistic) (p. 14, 301) could each have been significant with one more participant. In my opinion, such results should not be interpreted as evidence for the absence of effects.
  
  We have modified these claims in the revised Results. In addition, we now compute and report Bayes factors, which enable evaluating evidence for the presence versus absence of effects.
  
  “Detection and discrimination d’ deficits were not statistically significantly correlated (r=0.39, t=2.28, p=0.059); Bayes factor analysis revealed no clear evidence for or against a correlation between these subcomponent deficits (BF=1.18) (SI Fig. S2, left).”
  
  “Discrimination accuracy deficits were not statistically significantly different between high and low detection accuracy deficit blocks (z=1.97, p=0.067), and the Bayes factor revealed no strong evidence for or against such a difference (BF=1.42) (Fig. 3G).”
  
  In addition, the results are interpreted as follows (lines 294-296):
  
  “Moreover, detection and discrimination d’ deficits were not significantly correlated both within and across participants, with no clear evidence for or against a correlation, based on the Bayes factor.”
  
  The null result on the P1 has changed because of the analysis with the alternative electrode set suggested by Reviewer #2 (see comment #2.2). We now report these results as follows:
  
  “By contrast, the P1, an early sensory component, showed no statistically significant blink-induced modulation (P1 = 0.25 ± 0.16 µV, z = 1.19, p=0.231, BF = 0.651) (SI Fig. S3).”
  
  (3.3) Neural basis of the attentional blink: The introduction (e.g., p. 4, l. 56-76) and discussion (e.g., p. 19, 427-447) do not incorporate the insights from the highly relevant recent review by Zivony & Lamy (2022), which is only cited once (p. 19, l. 428). Moreover, the sections do not mention some relevant ERP studies of the attentional blink (e.g., Batterink et al., 2012; Craston et al., 2009; Dell'Acqua et al., 2015; Dellert et al., 2022; Eiserbeck et al., 2022; Meijs et al., 2018).
  
  We have now cited these previous studies at the appropriate places in the revised Introduction.
  
  “The effect of the attentional blink on the processing of the second target is well studied. In particular, previous studies have investigated the stage at which attentional blink affects T2’s processing (early or late)[14–17] and the neural basis of this effect, including the specific brain regions involved[15,18–20]. Several theoretical frameworks characterize a sequence of phases of the attentional blink, including target selection based on relevance, detection, feature processing, and encoding into working memory[9,21]. Overall, there is little support for attentional blink deficits at an early, sensory encoding[14] stage; by contrast, the vast majority of literature suggests that T2’s processing is affected at a late stage[8,10]. Consistent with these behavioral results, scalp electroencephalography (EEG) studies have reported partial or complete suppression of late event-related potential (ERP) components, particularly those linked to attentional engagement (P2, N2, N2pc or VAN)[15,22–25], working memory (P3)[20,26–30] or semantic processing (N400)[31]; early sensory components (P1/N1) are virtually unaffected[20,24] (reviewed in detail in Zivony and Lamy, 2022[32]).”
  
  (3.4) Detection versus discrimination: Concerning the neural basis of detection versus discrimination (e.g., p. 6, l. 98-110; p. 18, l. 399-412), relevant existing literature (e.g., Broadbent & Broadbent, 1987; Hillis & Brainard, 2007; Koivisto et al., 2017; Straube & Fahle, 2011; Wiens et al., 2023) is not included.
  
  Thank you for these suggestions. We have now cited these studies in the revised Discussion.
  
  “It is increasingly clear that detection and discrimination are separable processes, each mediated by distinct neural mechanisms. Behaviorally, accurately identifying the first target, versus merely detecting it, produces stronger deficits with identifying the second target[59]. Moreover, dissociable mechanisms have been reported to mediate object detection and discrimination in visual adaptation contexts[60]. Neurally, shape detection and identification judgements produce activations in non-overlapping clusters in various brain regions in the visual cortex, inferior parietal cortex, and the medial frontal lobe[61]. Similarly, occipital ERPs associated with conscious awareness also show clear differences between detection and discrimination. For instance, an early posterior negative component (200-300 ms) was significantly modulated in amplitude by success in detection, but not in identification[62]. The closely related visual awareness negativity (VAN) was substantially stronger at the detection, compared to the discrimination, threshold[63].
  
  Furthermore, a significant body of previous work has reported dissociable behavioural and neural mechanisms underlying attention’s effects on target detection versus discrimination. Behavioral studies have reported distinct effects on target detection versus discrimination in both endogenous[64] and exogenous[65] attention tasks.”
  
  (3.5) Pooling of lags and lag-1 sparing: I wonder why the authors chose to include 5 different lags when they later pooled early (100, 300 ms) and late (700, 900 ms) lags, and whether this pooling is justified. This is important because T2 at lag 1 (100 ms) is typically "spared" (high accuracy) while T2 at lag 3 (300 ms) shows the maximum AB (for reviews, see, e.g., Dux & Marois, 2009; Martens & Wyble, 2010). Interestingly, this sparing was not observed here (p. 43, Figure 2). Nevertheless, considering the literature and the research questions at hand, it is questionable whether lag 1 and 3 should be pooled.
  
  Lag-1 sparing is not always observed in attentional blink studies; there are notable exceptions to reports of lag-1 sparing[8,9]. Our statistical tests revealed no significant difference in accuracies between short lag (100 and 300 ms) trials or between long lag (700 and 900 ms) trials but did reveal significant differences between the short and long lag trials (ANOVA, followed by post-hoc tests). To simplify the presentation of the findings, we pooled together the short lag (100 and 300 ms) and, separately, the long lag (700 and 900 ms) trials. We have presented these analyses, and clarified the motivation for pooling these lags in the revised Methods.
  
  “Based on these psychometric measures, we computed detection and discrimination accuracies as follows. Detection accuracies were computed as the average proportion of the hits, misidentification and correct rejection responses; misidentifications were included because not missing the target reflected accurate detection. By contrast, discrimination accuracies were computed based on the average proportion of the two correct identifications (hits) on T2 present trials alone. We performed 2-way ANOVAs on both detection and discrimination accuracies with the inter-target lag (5 values) and T2 contrast as independent factors. We found main effects of both lag (F(4,92)=18.81, p<0.001) and contrast (F(1,92)=21.78, p<0.001) on detection accuracy, but no interaction effect between lag and contrast (F(4,92)=1.92, p=0.113). Similarly, we found main effects of both lag (F(4,92)=25.08, p<0.001) and contrast (F(1,92)=16.58, p<0.001) on discrimination accuracy, but no interaction effect between lag and contrast (F(4,92)=0.93, p=0.450). Post-hoc tests based on Tukey’s HSD revealed a significant difference in accuracies between the two shortest lags (100 ms and 300 ms) and the two longest lags (700 ms and 900 ms) for both low and high contrast targets, and for both detection and discrimination accuracies (p<0.01). But they revealed no significant difference between the two shortest lags (p>0.25) or the two longest lags (p>0.40) for either target contrast or for either accuracy type. As a result, for subsequent analyses, we pooled together the “short lag” (100 ms and 300 ms) and the “long lag” (700 ms and 900 ms) trials. We quantified the effect of the attentional blink on each of the psychometric measures as well as detection and discrimination accuracies by comparing their respective, average values between the short lag and long lag trials, separately for the high and low T2 contrasts.”
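
  The accuracy definitions quoted above can be illustrated with a short sketch. This is one possible reading, not the authors' code: it assumes a 3x3 table of trial counts (stimulus by response), takes detection accuracy as the mean over the three stimulus conditions of the proportion of detection-correct responses, and takes discrimination accuracy as the mean of the two hit rates on T2-present trials. The labels `cw`, `ccw` and `absent`, the function name, and the counts are all hypothetical.

```python
def detection_discrimination_accuracy(counts):
    """counts[stimulus][response] -> number of trials.

    Stimulus and response keys are 'cw', 'ccw' (two grating orientations)
    and 'absent' (no T2). These labels are assumptions for illustration.
    """
    def proportion(stimulus, responses):
        total = sum(counts[stimulus].values())
        return sum(counts[stimulus][r] for r in responses) / total

    # Detection is correct on T2-present trials whenever the target is
    # reported (hit or misidentification), and on T2-absent trials when
    # it is correctly rejected.
    detection_terms = [
        proportion("cw", ("cw", "ccw")),
        proportion("ccw", ("cw", "ccw")),
        proportion("absent", ("absent",)),
    ]
    # Discrimination uses only correct identifications on T2-present trials.
    discrimination_terms = [
        proportion("cw", ("cw",)),
        proportion("ccw", ("ccw",)),
    ]
    detection = sum(detection_terms) / len(detection_terms)
    discrimination = sum(discrimination_terms) / len(discrimination_terms)
    return detection, discrimination


# Made-up counts for a single lag and contrast condition.
counts = {
    "cw": {"cw": 60, "ccw": 20, "absent": 20},
    "ccw": {"cw": 10, "ccw": 70, "absent": 20},
    "absent": {"cw": 5, "ccw": 5, "absent": 90},
}
det_acc, dis_acc = detection_discrimination_accuracy(counts)
print(round(det_acc, 3), round(dis_acc, 3))
```

  Under this reading, a misidentification raises detection accuracy but not discrimination accuracy, which is the asymmetry the quoted passage describes.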
  
  (3.6) Discrimination in the attentional blink. Concerning the claims that previous attentional blink studies conflated detection and discrimination (p. 6, l. 111-114; p. 18, l. 416), there is a recent ERP study (Dellert et al., 2022) in which participants did not perform a discrimination task for the T2 stimuli. Moreover, since the relevance of all stimuli except T1 was uncertain in this study, irrelevant distractors could not be filtered out (cf. p. 19, l. 437). Under these conditions, the attentional blink was still associated with reduced negativities in the N2 range (cf. p. 19, l. 427-437) but not with a reduced P3 (cf. p. 19, l 439-447).
  
  We have addressed the relationship between our findings and those of Dellert et al (2022)[10] in the revised Discussion.
  
  “… In the present study, we observed that the parietal P3 amplitude was correlated selectively with detection, rather than discrimination deficits. This suggests that the P3 deficit indexes a specific bottleneck with encoding and consolidating T2 into working memory, rather than an inability to reliably maintain its features. In this regard, a recent study[22] measured ERP correlates of the perceptual awareness of the T2 stimulus whose relevance was uncertain at the time of its presentation. In contrast to earlier work, this study observed no change in P3b amplitude across seen (detected) and unseen targets. Taken together with this study, our findings suggest that rather than indexing visual awareness, the P3 may index detection, but only when information about the second target, or a decision about its appearance, needs to be maintained in working memory. Additional experiments, involving targets of uncertain relevance, along with our behavioral analysis framework, may help further evaluate this hypothesis.”
  
  (3.7) General EEG methods: While most of the description of the EEG preprocessing and analysis (p. 31/32) is appropriate, it also lacks some important information (see, e.g., Keil et al., 2014). For example, it does not include the length of the segments, the type and proportion of artifacts rejected, the number of trials used for averaging in each condition, specific hypotheses, and the test statistics (in addition to p-values).
  
  We regret the lack of details. We have included these in the revised Methods, and expanded on the description of the trial rejection (SCADS) algorithm.
  
  The revised Methods section on EEG Preprocessing mentions the type and proportion of artifacts rejected:
  
  “We then epoched the data into trials and applied SCADS (Statistical Control of Artifacts in Dense Array EEG/MEG Studies[90]) to identify bad epochs and artifact contaminated channels. SCADS detects artifacts based on three measures: maximum amplitude over time, standard deviation over time, and first derivative (gradient) over time. Any electrode or trial exhibiting values outside the specified boundaries for these measures was excluded. The boundaries were defined as M ± n*λ, where M is the grand median across electrodes and trials for each of the three measures, and λ is the root mean square (RMS) of the deviation of medians across sensors relative to the grand median. We set n to 3, allowing data within three boundaries to be retained. The percentage of electrodes per participant rejected was 6.3 ± 0.43% (mean ± s.e.m. across participants), whereas the percentage of trials rejected per electrode and participant was 3.4 ± 0.33% (mean ± s.e.m.).”
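
  The boundary rule M ± n*λ described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SCADS implementation: it handles a single measure (for example, maximum amplitude) arranged as one list of per-trial values per sensor, and it flags individual values rather than whole electrodes or trials. The function names are hypothetical.

```python
import math
from statistics import median


def scads_bounds(measure, n=3.0):
    """Return (lower, upper) = M -/+ n * lambda for one artifact measure.

    measure[s][t] is the value of the measure for sensor s and trial t.
    M is the grand median over all sensors and trials; lambda is the RMS
    deviation of the per-sensor medians from M.
    """
    grand_median = median(v for row in measure for v in row)
    sensor_medians = [median(row) for row in measure]
    lam = math.sqrt(
        sum((m - grand_median) ** 2 for m in sensor_medians) / len(sensor_medians)
    )
    return grand_median - n * lam, grand_median + n * lam


def flag_outliers(measure, n=3.0):
    """Mark every sensor-by-trial value that falls outside the bounds."""
    lower, upper = scads_bounds(measure, n)
    return [[not (lower <= v <= upper) for v in row] for row in measure]


# Made-up values: three sensors, three trials, one obvious artifact.
measure = [[1, 2, 3], [2, 3, 4], [3, 4, 50]]
print(scads_bounds(measure))
print(flag_outliers(measure))
```

  In the procedure quoted above, the same rule is applied to each of the three measures, and exclusion operates at the level of electrodes and trials.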
  
  The revised Methods section on ERP analysis mentions the number of trials for averaging in each condition and the length of the segments:
  
  “First, trials were sorted based on inter-target lags (100, 300, 500, 700 and 900 ms). This yielded an average of 200 ± 13, 171 ± 9.71, 145 ± 7.54, 117 ± 5.43 and 87 ± 4.51 (mean ± s.e.m. across participants) trials for each of the 5 lags, respectively.”
  
  “Then, EEG traces were epoched from -300 ms before to +700 ms after either T1 onset or T2 onset and averaged across trials to estimate T1-evoked and T2-evoked ERPs, respectively.”
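
  The epoching and averaging step quoted above amounts to slicing a fixed window around each target onset and averaging sample by sample. The sketch below illustrates this for a single channel; it is not the authors' pipeline, and the function name, the sampling rate and the onset positions are hypothetical.

```python
def epoch_and_average(signal, onsets, fs, t_min=-0.3, t_max=0.7):
    """Average fixed-length epochs cut around each onset.

    signal : list of samples from one channel
    onsets : sample indices of target onsets
    fs     : sampling rate in Hz (an assumed value in the example below)
    The window runs from t_min to t_max seconds relative to onset.
    """
    start = round(t_min * fs)
    stop = round(t_max * fs)
    # Keep only epochs that lie entirely inside the recording.
    epochs = [
        signal[o + start : o + stop]
        for o in onsets
        if o + start >= 0 and o + stop <= len(signal)
    ]
    if not epochs:
        raise ValueError("no complete epochs available")
    n_trials = len(epochs)
    # zip(*epochs) groups the samples that share a latency across trials.
    return [sum(samples) / n_trials for samples in zip(*epochs)]


# Toy example: a ramp signal sampled at 10 Hz with two onsets.
erp = epoch_and_average(list(range(100)), onsets=[10, 20], fs=10)
print(erp)
```

  Calling such a function once with T1 onsets and once with T2 onsets would give the T1-evoked and T2-evoked averages, respectively.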
  
  Specific hypotheses are mentioned in response #3.1; we also now mention the test statistic associated with each test at the appropriate places in the Results. For example:
  
  “Among these ERP components, the N2p component and the P2 component were both significantly suppressed during the blink (∆amplitude, short-lag – long-lag: N2p=-0.47 ± 0.12 µV, z=-3.20, p=0.003, BF=40, P2=-0.19 ± 0.07 µV, z=-2.54, p=0.021, BF=4.83, signed rank test) (Fig. 4A, right). Similarly, the parietal P3 also showed a significant blink-induced suppression (P3=-0.45 ± 0.09 µV, z=-3.59, p<0.001, BF>10²) (Fig. 4B, right).”
  
  “Neural inter-class distances (||η||) along both the detection and discrimination dimensions decreased significantly during the blink (short lag-long lag: ∆||ηdet|| = -1.30 ± 0.70, z=-3.68, p=0.006, BF=20; ∆||ηdis|| = -1.23 ± 0.42, z=-3.54, p<0.001, BF>10²) (Figs. 6C-D).”
  
  (3.8) EEG filters: P. 31, l. 728: "The data were (...) bandpass filtered between 0.5 to 18 Hz (...). Next, a bandstop filter from 9-11 Hz was applied to remove the 10 Hz oscillations evoked by the RSVP presentation." These filter settings do not follow common recommendations and could potentially induce filter distortions (e.g., Luck, 2014; Zhang et al., 2024). For example, the 0.5 high-pass filter could distort the slow P3 wave. Mostly, I am concerned about the bandstop filter. Since the authors commendably corrected for RSVP-evoked responses by subtracting T2-absent from T2-present ERPs (p. 31, l. 746), I wonder why the additional filter was necessary, and whether it might have removed relevant peaks in the ERPs of interest.
  
  Thank you for this suggestion. Originally, the 9-11 Hz bandstop filter was added to remove the strong 10 Hz evoked oscillation from the EEG response, so as to obtain a cleaner signal for the other analyses, such as the analysis of neural dimensions (Fig. 6).
  
  We performed two control ERP analyses to address the reviewer’s concern:
  
  (1) We removed the bandstop filter and re-evaluated the P1, P2, N2p and P3 ERP amplitudes. We observed no statistically significant difference in the modulation of any of the 4 ERP components (P1: p=0.031, BF=0.692, P2: p=0.038, BF=1.21, N2p: p=0.286, BF=0.269, P3: p=0.085, BF=0.277). In particular, Bayes factor analysis revealed substantial evidence against a difference in the N2p and P3 amplitudes before versus after the bandstop filter removal (BF<0.3).
  
  (2) We removed the bandstop filter and repeated all of the same analyses as reported in the Results and summarized in SI Table S2. We observed a virtually identical pattern of results, summarized in an analogous table, below (compare with SI Table S2, revised, in the Supplementary Information).
  
  Author response table 2.
  
  We have now mentioned this control analysis briefly in the Methods (lines 863-865).
  
  (3.9) Coherence analysis: P. 33, l. 786: "For subsequent, partial correlation analyses of coherence with behavioral metrics and neural distances (...), we focused on a 300 ms time period (0-300 ms following T2 onset) and high-beta frequency band (20-30 Hz) identified by the cluster-based permutation test (Fig. 5A-C)." I wonder whether there were any a priori criteria for the definition and selection of such successive analyses. Given the many factors (frequency bands, hemispheres) in the analyses and the particular shape of the cluster (p. 49, Fig 5C), this focus seems largely data-driven. It remains unclear how many such tests were performed and whether the results (e.g., the resulting weak correlation of r = 0.22 in one frequency band and one hemisphere in one part of a complexly shaped cluster; p. 15, l. 327) can be considered robust.
  
  Please see responses to comments #3.1 and #3.2 (above). In addition to reporting further details regarding statistical tests, their hypotheses, and multiple comparisons corrections, we computed Bayes factors to quantify the strength of the evidence for correlations, as appropriate. Interpretations have been rephrased depending on whether the evidence for the null or alternative hypothesis is strong or equivocal. For example:
  
  “Bayes factor analysis revealed no clear evidence for or against a correlation between these subcomponent deficits (BF=1.18) (SI Fig. S2, left).”
  
  “Discrimination accuracy deficits were not statistically significantly different between high and low detection accuracy deficit blocks (z=1.97, p=0.067), and the Bayes factor revealed no strong evidence for or against such a difference (BF=1.42) (Fig. 3G).”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1.a) Line 76-79: "Despite this extensive literature, previous studies have essentially treated the attentional blink as a unitary, monolithic phenomenon. As a result, fundamental questions regarding the component mechanisms of the attentional blink remain unanswered." This statement seems antithetical to the fact that theories of the AB suggest a variety of different mechanisms as possible causes of the effect.
  
  The statement has been revised as follows:
  
  “Despite this extensive literature, many previous studies have studied the attentional blink as a unitary phenomenon. While some theoretical models[9,21,32] and experimental studies[38,39] have explored distinct mechanisms underlying the attentional blink, several fundamental questions about its distinct component mechanisms remain unanswered.”
  
  (1.b) Line 95-97: Here, the authors should explain in more detail how a response bias could fluctuate across lags.
  
  Addressed in response to public reviews, #1.1.
  
  (1.c) Line 98: I found this second question a much more compelling motivation for the study than the earlier stated question of whether the AB reflects a reduction in sensitivity or a fluctuation (?) of response bias.
  
  Thank you.
  
  (1.d) Line 143: What do the authors mean by "geometric" distribution of lags? In virtually all AB studies, the distribution of lags is uniform. Wasn't that the case in this study?
  
  We employed a geometric distribution for the trials of different lags, and verified that the sampled distribution of lags was well fit by this distribution (χ²(3, 312)=0.22, p=0.974). We chose a geometric distribution – with a flat hazard function[11] – over the uniform distribution to avoid conflating the effects of temporal expectation with those of the attentional blink on criterion[12] at different lags.
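
  The point about the flat hazard function can be made concrete with a small sketch. The discrete hazard at lag k is h(k) = P(X = k) / P(X ≥ k), the probability that the target appears at lag k given that it has not appeared earlier. The code below is an illustration only; the parameter p = 0.3 is hypothetical and is not taken from the study.

```python
def hazard(pmf):
    """Discrete hazard h(k) = P(X = k) / P(X >= k).

    `pmf` lists the probabilities of successive lags. Any probability
    mass not listed is treated as belonging to later outcomes.
    """
    hazards = []
    survival = 1.0  # P(X >= k) for the current k
    for p_k in pmf:
        hazards.append(p_k / survival)
        survival -= p_k
    return hazards


p = 0.3  # hypothetical geometric parameter
geometric = [p * (1 - p) ** k for k in range(5)]  # first five lags
uniform = [1 / 5] * 5                             # five equiprobable lags

print([round(h, 3) for h in hazard(geometric)])
print([round(h, 3) for h in hazard(uniform)])
```

  For the geometric case the hazard stays at p for every lag, so the passage of time carries no information about when the target will appear. For the uniform case the hazard rises with lag (1/5, 1/4, 1/3, 1/2, 1), so the target becomes increasingly predictable at longer lags.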
  
  (1.e) Line 158-160: Explain why incorrect discrimination responses were not counted as correct detection. Explain why failure to detect T2 was counted as a discrimination error.
  
  Addressed in response to public reviews, #1.2.
  
  (1.f) Line 167: The results do not show lag-1 sparing, which is a typical property of the AB. The authors should report this, and explain why their paradigm did not show a sparing effect.
  
  Addressed in response to public reviews, #3.5.
  
  (1.g) Line 262-263: With only 24 participants, the study appears to be underpowered to reliably detect correlations. This should be noted as a limitation.
  
  Addressed in response to public reviews, #3.2.
  
  (1.h) Line 399-412: This section could be moved to the introduction to explain and motivate the aim of examining the distinct contributions of detection and discrimination to the AB.
  
  We have revised the Introduction to better motivate the aims of the study.
  
  Reviewer #2 (Recommendations for the authors):
  
  (2.a) A small note about the writing: as a matter of style, I would advise editing the generic phrasing (e.g., "shedding new light", "complex interplay") in abstract and general discussion.
  
  These are now revised as follows (for example):
  
  Line 26 - “These findings provide detailed insights into the subcomponents of the attentional blink….”
  
  Line 596 - “More broadly, these findings contribute to our understanding of the relationship between attention and perception….”
  
  (2.b) Some references appear double and/or without volume or page numbers (e.g., 44/61).
  
  Thank you. Amended now.
  
  Reviewer #3 (Recommendations for the authors):
  
  (3.a) Suggestions for additional analyses:
  
  I appreciate that the authors have quantified the evidence for null effects in simple comparisons using Bayes factors. In my opinion, the study would additionally benefit from Bayesian ANOVAs, which can also easily be implemented in JASP (Keysers et al., 2020), which the authors have already used for the other tests. As a result, they could further substantiate some of their claims related to null effects (e.g., p. 9, l. 175; p. 12, l. 246).
  
  Thank you. We have added Bayes factor values for ANOVAs (implemented in JASP[13]) wherever applicable in the revised manuscript. For example:
  
  “While we found a main effect of both lag (detection: F(1,23)=29.8, p<0.001, BF>10<sup>3</sup>; discrimination: F(1,23)=54.1, p<0.001, BF>10<sup>3</sup>) and contrast (detection: F(1,23)=21.02, p<0.001, BF>10<sup>2</sup>; discrimination: F(1,23)=13.75, p=0.001, BF=1.22), we found no significant interaction effect between lag and contrast (detection: F(1,23)=1.92, p=0.113, BF=0.49; discrimination: F(1,23)=0.93, p=0.450, BF=0.4).”
  
  “A two-way ANOVA with inter-target lag and T2 contrast as independent factors revealed a main effect of lag on both d’<sub>det</sub> (F(1,23)=30.3, p<0.001, BF>10<sup>3</sup>) and d’<sub>dis</sub> (F(1,23)=100.3, p<0.001, BF>10<sup>3</sup>). Yet, we found no significant interaction effect between lag and contrast for d’<sub>det</sub> (F(1,23)=2.3, p=0.141, BF=0.44).”
  
  Minor points
  
  (3.b) Statistics: Many p-values are reported without the respective test statistics (e.g., p. 9, l. 164; p. 12, l. 241-244 and 252-258; p. 13, l. 271, etc.).
  
  Addressed in response to public reviews, #3.7.
  
  (3.c) P. 4, l. 58: It is not entirely clear how the authors define "early or late". For example, while they consider the P2/N2/N2pc complex as "late" (l. 62-64), these ERP components are considered "early" in the debate on "early vs. late" neural correlates of consciousness (for a review, see Förster et al., 2020).
  
  We appreciate the debate. Our naming convention follows these seminal works[3,14–16].
  
  (3.d) P. 5., l. 77: "previous studies have essentially treated the attentional blinks as a unitary, monolithic phenomenon": There are previous studies in which both the presence and identity of T2 were queried (e.g., Eiserbeck et al., 2022; Harris et al., 2013).
  
  Addressed in response to recommendations for authors, #1.a.
  
  (3.e) P. 9, l. 169-177: The detection and discrimination accuracies are analyzed using two-way ANOVAs with the factors lag and contrast. I wonder why the lag effects are additionally analyzed using Wilcoxon signed rank tests using data pooled across the T2 contrasts (p. 9, l. 161-168)? If I understand it correctly, these tests should correspond to the main effects of lag in the ANOVAs. Indeed, both analyses lead to the same conclusions (l. 167 and l. 176).
  
  Our motivation was to first establish the attentional blink effect, with data pooled across contrasts. The subsequent ANOVA allowed delving deeper into contrast and interaction effects. Indeed, the results were consistent across both tests.
  
  (3.f) P. 12, l. 242: I wonder why the T2 contrasts are pooled in the statistical tests (but plotted separately, p. 45, Figure 3C).
  
  Model selection analysis supported distinct d’<sub>det</sub> parameter values across contrasts, as reflected in Fig. 3C. As mentioned in response #3.e, contrast effects were analyzed with an ANOVA.
  
  (3.g) P. 13, l. 287: "high and low contrast T2 trials were pooled to estimate reliable ERPs". The amount of trials per condition is not provided.
  
  Addressed in response to public reviews, #3.7.
  
  (3.h) P. 45, Figure 3D/F: In my opinion, plotting the contrasts and lags separately (despite the results of the model selection) would have provided a better idea of the data.
  
  We appreciate the reviewer’s suggestion, but followed the results of model selection for consistency.
  
  (3.i) P. 21, l. 470: "the left index finger to report clockwise orientations and the right index finger to report counter-clockwise orientations": This left/right mapping seems counterintuitive to me, and the authors also used the opposite mapping in Figures 1 and 2. It is not described in the Methods (p. 25) and thus is unclear.
  
  We regret the typo. Revised as follows:
  
  “...the left index finger to report counter-clockwise orientations and the right index finger to report clockwise orientations.”
  
  (3.j) P. 22, l. 514: "Taken together, these results suggest the following, testable schema (SI Figure S5)." Figure S5 seems to be missing.
  
  Amended. This is Fig. 8 in the revised manuscript.
  
  (3.k) P. 25, l. 559: I do not understand why the circular placeholders around the stimuli were included, and they are not mentioned in Figure 2A (p. 43). When I saw the figure and read the inscription, I wondered whether they were actually part of the stimulus presentation or symbolized something else.
  
  The placeholder was described in the earlier Methods section. We have now also mentioned it in the caption for Fig. 2A.
  
  “All plaids were encircled by a circular placeholder. The fixation dot and the placeholder were present on the screen throughout the trial.”
  
  This avoided spatial uncertainty with estimating stimulus dimensions during the presentation.
  
  (3.l) P. 32, l. 754: The interval of interest for the P1 from 40 to 140 ms seems unusually early to me. The component usually peaks at 100 ms (e.g., at 96 ms in the cited study by Sergent et al., 2005), which also seems to be the case in the present study (Fig. S3, p. 57). I wonder how they were defined.
  
  For our analyses, we employed the peak value of the P1 ERP component in a window from 40-140 ms. The peak occurred around 100 ms (SI Fig. S3), which aligns with the literature.
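
  As a minimal sketch of this kind of windowed peak measure: the waveform below is synthetic (a Gaussian bump centred at 100 ms), and the function name, sampling step and amplitude are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def peak_in_window(times_ms, erp, t_min=40.0, t_max=140.0):
    """Return (amplitude, latency) of the largest positive value within [t_min, t_max] ms."""
    times_ms = np.asarray(times_ms, dtype=float)
    erp = np.asarray(erp, dtype=float)
    mask = (times_ms >= t_min) & (times_ms <= t_max)
    idx = np.argmax(erp[mask])  # P1 is a positive deflection, so take the maximum
    return erp[mask][idx], times_ms[mask][idx]

# Synthetic ERP sampled every 2 ms, with a positive bump peaking at 100 ms.
times_ms = np.arange(-100.0, 400.0, 2.0)
erp = 3.0 * np.exp(-0.5 * ((times_ms - 100.0) / 15.0) ** 2)

amplitude, latency = peak_in_window(times_ms, erp)
print(f"peak amplitude = {amplitude:.2f}, latency = {latency:.0f} ms")
```

  A search window wider than the expected peak latency tolerates variability in latency across participants, while the reported latency is that of the peak itself rather than of the window.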
  
  Additional minor comments:
  
  These comments have been all addressed, and typos corrected, by revising the manuscript at the appropriate places.
  
  3.m.1. L. 14: In my opinion, this sentence is difficult to read due to the nested combination of singular and plural forms. Importantly, as the authors also acknowledge (e.g., l. 83), perceptual sensitivity and choice bias could both be compromised, so I would suggest using plural and adding "or both" as a third option for clarity. See also p. 10, l. 204.
  
  3.m.2. L. 14: The comma before "As a result" should be replaced by a period.
  
  3.m.3. L. 45 "to guide Behavior" should be lowercase.
  
  3.m.4. L. 67: "Activity in the parietal, lateral prefrontal cortex and anterior cingulate cortex" could be read as if there was a "parietal, prefrontal cortex", so I would suggest removing the first "cortex".
  
  Revised/amended.
  
  3.m.5. L. 77: "fundamental questions regarding the component mechanisms of the attentional blink remain unanswered": The term "component mechanisms" is a bit unclear to me.
  
  We elaborate on this term in the very next set of paragraphs in the Introduction.
  
  3.m.6. L. 88: "a lower proportion of correct T2 detections can arise from a lower detection d'". "Arise from" sounds a bit off given that d' is a function of hits and false alarms.
  
  3.m.7. L. 95: I would suggest citing the updated edition of the classic "Detection Theory: A User's Guide" by Hautus, Macmillan & Creelman (2021).
  
  3.m.8. L. 102: "a oriented grating" should be "an".
  
  3.m.9. L. 126: "key neural markers - a local neural marker (event-related potentials) potentials" should be rephrased/corrected.
  
  3.m.10. L. 129: There are inconsistent tenses (mostly past tense but "we synthesize").
  
  3.m.11. L. 138: Perhaps the abbreviations (e.g., dva, cpd) should be introduced here (first mention) rather than in the Methods below.
  
  3.m.12. L. 148: "at the end of each trial participants first, indicated": The comma position should be changed.
  
  3.m.13. L. 176 "attentional blink-induced both a ...": The hyphen should be removed.
  
  3.m.14. L. 396: I think "but neither of them affects" would be better here.
  
  3.m.15. L. 383: "Detection deficits were signaled by ERP components such as the occipitoparietal N2p and the parietal P3": In my opinion, "such as" is too vague here.
  
  Revised/amended.
  
  3.m.16. L. 403: "Neurally, improved detection of attended targets is accompanied by (...) higher ERP amplitudes". Given the different mechanisms underlying the ERP, this section would benefit from more details.
  
  Addressed in response to public reviews, #3.4.
  
  3.m.17.    L. 924: References 18 and 46 seem to be the same.
  
  3.m.18.    L. 1181: I think d'det should be d'dis here.
  
  3.m.19.    L. 1284: "détection" should be "detection".
  
  3.m.20.    I found some Figure legends a bit confusing. For example, 5E refers to 4E, but 4E refers to 4C.
  
  3.m.21.    In Figures 4A/B and 6C/D, some conditions are hidden due to the overlap of CIs. Could they be made more transparent?
  
  Revised/amended.
  
  References:
  
  (1) Chua, F. K. The effect of target contrast on the attentional blink. Percept Psychophys 5, 770–788 (2005).
  
  (2) Chmielewski, W. X., Mückschel, M., Dippel, G. & Beste, C. Concurrent information affects response inhibition processes via the modulation of theta oscillations in cognitive control networks. Brain Struct Funct 221, 3949–3961 (2016).
  
  (3) Sergent, C., Baillet, S. & Dehaene, S. Timing of the brain events underlying access to consciousness during the attentional blink. Nat Neurosci 8, 1391–400 (2005).
  
  (4) Zivony, A. & Lamy, D. What processes are disrupted during the attentional blink? An integrative review of event-related potential research. Psychon Bull Rev 29, 394–414 (2022).
  
  (5) Pernet, C. R., Wilcox, R. & Rousselet, G. A. Robust Correlation Analyses: False Positive and Power Validation Using a New Open Source Matlab Toolbox. Front Psychol 3, (2013).
  
  (6) Gross, J. et al. Modulation of long-range neural synchrony reflects temporal limitations of visual attention in humans. Proceedings of the National Academy of Sciences 101, 13050–13055 (2004).
  
  (7) Maris, E. & Oostenveld, R. Nonparametric statistical testing of EEG and MEG data. J Neurosci Methods 164, 177–190 (2007).
  
  (8) Hommel, B. & Akyürek, E. G. Lag-1 sparing in the attentional blink: Benefits and costs of integrating two events into a single episode. The Quarterly Journal of Experimental Psychology Section A 58, 1415–1433 (2005).
  
  (9) Livesey, E. J. & Harris, I. M. Target sparing effects in the attentional blink depend on type of stimulus. Atten Percept Psychophys 73, 2104–2123 (2011).
  
  (10) Dellert, T. et al. Neural correlates of consciousness in an attentional blink paradigm with uncertain target relevance. Neuroimage 264, 119679 (2022).
  
  (11) Nobre, A., Correa, A. & Coull, J. The hazards of time. Curr Opin Neurobiol 17, 465–470 (2007).
  
  (12) Bang, J. W. & Rahnev, D. Stimulus expectation alters decision criterion but not sensory signal in perceptual decision making. Sci Rep 7, 17072 (2017).
  
  (13) JASP Team. JASP (version 0.19.0.) [Computer Software]. Preprint at (2022).
  
  (14) Luck, S. J. Electrophysiological Correlates of the Focusing of Attention within Complex Visual Scenes: N2pc and Related ERP Components. (Oxford University Press, 2011). doi:10.1093/oxfordhb/9780195374148.013.0161.
  
  (15) Brydges, C. R., Fox, A. M., Reid, C. L. & Anderson, M. Predictive validity of the N2 and P3 ERP components to executive functioning in children: a latent-variable analysis. Front Hum Neurosci 8, (2014).
  
  (16) Michalewski, H. J., Prasher, D. K. & Starr, A. Latency variability and temporal interrelationships of the auditory event-related potentials (N1, P2, N2, and P3) in normal subjects. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section 65, 59–71 (1986).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.04.583330v2
www.biorxiv.org www.biorxiv.org

New submission 26/06/2023, 18:33:56

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The expression and localization of Foxc2 strongly suggest that its role is mainly confined to As undifferentiated spermatogonia (uSPGs). Lineage tracing demonstrated that all germ cells were derived from the FOXC2+ uSPGs. Specific ablation of the FOXC2+ uSPGs led to the depletion of all uSPG populations. Full spermatogenesis can be achieved through the transplantation of Foxc2+ uSPGs. Male germ cell-specific ablation of Foxc2 caused Sertoli-only testes in mice. CUT&Tag sequencing revealed that FOXC2 regulates the factors that inhibit the mitotic cell cycle, consistent with its potential role in maintaining a quiescent state in As spermatogonia. These data made the authors conclude that the FOXC2+ uSPG may be the true SSCs, essential for maintaining spermatogenesis. The conclusion is largely supported by the data presented, but two concerns should be addressed: 1) terminology used is confusing: primitive SSCs, primitive uSPGs, transit amplifying SSCs... 2) the GFP+ cells used for germ cell transplantation should be better controlled using THY1+ cells.
  
  Thanks for your good comments. According to your suggestions, we have addressed your two concerns as follows:
  
  1> Overall, our work suggests that FOXC2+ SSCs are a subpopulation of SSCs in a quiescent state; thus, we have replaced the term ‘primitive’ with ‘quiescent’ in the revised manuscript. In general, ‘transient amplifying SSCs’ are considered to be ‘progenitors’; thus, we have replaced ‘transient amplifying SSCs’ with ‘progenitors’ in the revised manuscript.
  
  2> The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq using the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).
  
  Reviewer #2 (Public Review):
  
  The authors found FOXC2 is mainly expressed in As of mouse undifferentiated spermatogonia (uSPG). About 60% of As uSPG were FOXC2+ MKI67-, indicating that FOXC2 uSPG were quiescent. Similar spermatogonia (ZBTB16+ FOXC2+ MKI67-) were also found in human testis.
  
  The lineage tracing experiment using Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice demonstrated that all germ cells were derived from the FOXC2+ uSPG. Furthermore, specific ablation of the FOXC2+ uSPGs using Foxc2iCreERT2/+;Rosa26LSL-DTA/+ mice resulted in the depletion of all uSPG populations. In the regenerative condition created by busulfan injection, all FOXC2+ uSPG survived and began to proliferate at around 30 days after busulfan injection. The surviving FOXC2+ uSPGs eventually generated all germ cells. To examine the role of FOXC2 in the adult testis, spermatogenesis of Foxc2f/-;Ddx4Cre/+ mice was analyzed. From 2 months of age, degenerating seminiferous tubules increased in number and became Sertoli cell-only seminiferous tubules, indicating FOXC2 is required to maintain normal spermatogenesis in adult testes. To gain insight into the role of FOXC2 in the uSPG, CUT&Tag sequencing was performed in sorted FOXC2+ uSPG from Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice 3 days after TAM diet feeding. The results showed that some unique biological processes, including negative regulation of the mitotic cell cycle, were enriched, suggesting that FOXC2 maintains a quiescent state in spermatogonia.
  
  Lineage tracing experiments using transgenic mice of the TAM-inducing system were well designed and demonstrated interesting results. Based on all data presented, the authors concluded that the FOXC2+ uSPG are primitive SSCs, an indispensable subpopulation to maintain adult spermatogenesis.
  
  The conclusion of the mouse study is mostly supported by the data presented, but accepting some of the authors' claims requires additional information and explanation. Several terms used to define cell populations in the paper may mislead readers.
  
  1) "primitive spermatogonial stem cell (SSC)" is confusing. SSCs are considered the most immature subpopulation of uSPG. Thus, primitive uSPGs are likely SSCs. The naming, primitive SSCs, and transit-amplifying SSCs (Figure 7K) are weird. In general, the transit-amplifying cell is progenitor, not stem cell. In human and even mouse, there are several models for the classification of uSPG and SSCs, such as reserved stem cells and active stem cells. The area is highly controversial. The authors' definition of stem cells and progenitor cells should be clarified rigorously and should compare to existing models.
  
  Thanks for your good comments. Considering that our results showed that FOXC2+ SSCs are in a quiescent state and that, mechanistically, FOXC2 maintained the quiescent state of SSCs by promoting the expression of negative regulators of the cell cycle, we have replaced ‘primitive SSCs’ with ‘quiescent SSCs’ in the revised manuscript. We agree with the reviewer that ‘transient amplifying SSCs’ are considered to be ‘progenitors’; thus, we have replaced ‘transient amplifying SSCs’ with ‘progenitors’ in the revised manuscript. Further, from our point of view, the FOXC2+Ki67+ SSCs could be regarded as active stem cells, and the FOXC2+Ki67- SSCs could be regarded as reserved stem cells, although further evidence is still needed to confirm this.
  
  2) scRNA seq data analysis and an image of FOXC2+ ZBTB16+ MKI67- cells by fluorescent immunohistochemistry are not sufficient to conclude that they are human primitive SSCs as described in the Abstract. The identity of human SSCs is controversial. Although Adark spermatogonia are a candidate population of human SSCs, the molecular profile of the Adark spermatogonia seems to be heterogeneous. None of the molecular profiles was defined by a specific cell cycle phase. Thus, more rigorous analysis is required to demonstrate the identity of FOXC2+ ZBTB16+ MKI67- cells and Adark spermatogonia.
  
  We agree with the reviewer that the identity of human SSCs remains elusive, even though the Adark population demonstrates certain characteristics of SSCs. To acknowledge this, we have revised our conclusion to suggest only that FOXC2+ZBTB16+MKI67- cells represent a quiescent state of human SSCs.
  
  3) FACS-sorted GFP+ cells and MACS-THY1 cells were used for functional transplantation assay to evaluate SSC activity. In general, the purity of MACS is significantly lower than that of FACS. Therefore, FACS-sorted THY1 cells must be used for the comparative analysis. As uSPGs in adult testes express THY1, the percentage of GFP+ cells in THY1+ cells determined by flow cytometry is important information to support the transplantation data.
  
  Thanks for your good comments. According to your suggestions, we have addressed your concerns as follows:
  
  1> The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq using the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).
  
  2> We performed FACS analysis to determine the proportion of GFP+ cells in FACS-sorted THY1+ cells from Rosa26LSL-T/G/LSL-T/G or Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice at day 3 post TAM induction, and the result showed that GFP+ cells account for approximately 20.9±0.21% of THY1+ cells. See Author response image 1.
  
  Author response image 1.
  
  4) The lineage tracing experiments of FOXC2+-SSCs in Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G showed ~95% of spermatogenic cells and 100% progeny were derived from the FOXC2+ (GFP+) spermatogonia (Figure 2I, J) at month 4 post-TAM induction, although FOXC2+ uSPG were quiescent and a very small subpopulation (~ 60% of As, ~0.03% in all cells). This means that 40% of As spermatogonia and most of Apr/Aal spermatogonia, which were FOXC2 negative, did not contribute to spermatogenesis at all eventually. This is a striking result. There is a possibility that FOXC2CRE expresses more widely in the uSPG population although immunohistochemistry could not detect them.
  
  Thanks for your good comments. From our lineage tracing results, over 95% of the spermatogenic cells are derived from the FOXC2+ SSCs in the testes of 4-month-old mice, which means that FOXC2+ SSCs maintain long-term, stable spermatogenesis. In addition, previous studies have shown that only a portion of As spermatogonia are SSCs with complete self-renewal ability (PMID: 28087628, PMID: 25133429), which is consistent with our findings. Therefore, we speculate that the 40% of As spermatogonia and most of the Apr/Aal spermatogonia that were FOXC2 negative did contribute to spermatogenesis but cannot maintain long-term spermatogenesis because of their limited self-renewal ability.
  
  5) The CUT&Tag_FOXC2 analysis on the FACS-sorted FOXC2+ cells showed functional enrichment in biological processes such as DNA repair and mitotic cell cycle regulation (Figure 7D). In the sorted cells, Cre recombinase expression was induced by the TAM diet, excising the tdTomato cassette. The DNA repair process and negative regulation of the mitotic cell cycle could be induced by the Cre/lox recombination process. The cells analyzed were not FOXC2+ uSPG in a normal physiological state.
  
  We appreciate the reviewer’s concern that the functions enriched in this analysis might derive from Cre/lox recombination. However, we think it is unlikely that the Cre/lox recombination process, which should be local and specific, can trigger such a systemic and robust response by the DNA damage and cell cycle regulatory pathways. The reasons are as follows: First, as far as we are aware, there are insufficient data to support this suggested scenario. Second, we did not observe any alteration in either SSC behavior or spermatogenesis in general upon the TAM-induced genomic changes, suggesting that the impact of Cre/lox recombination on DNA damage or the cell cycle was not significant. Third, no factors associated with the DNA repair process were revealed in the differential analysis of single-cell transcriptomes of FOXC2-WT and FOXC2-KO.
  
  6) Wei et al (Stem Cells Dev 27, 624-636) have published that FOXC2 is expressed predominantly in As and Apr spermatogonia and is required for self-renewal of mouse SSCs; however, the authors did not mention this study in the Introduction, and referred to it only briefly at the end of the Discussion. Their finding should be referred to and evaluated in advance in the Introduction.
  
  Thanks for your good comments. According to your suggestion, we have revised the Introduction to refer to this recent parallel work on FOXC2. We are happy to see that our discoveries converge on the important role of FOXC2 in regulating SSCs in adult mammals.
  
  Reviewer #3 (Public Review):
  
  By popular single-cell RNA-seq, the authors identified FOXC2 as an undifferentiated spermatogonia-specific expressed gene. The FOXC2+-SSCs can sufficiently initiate and sustain spermatogenesis, the ablation of this subgroup results in the depletion of the uSPG pool. The authors provide further evidence to show that this gene is essential for SSCs maintenance by negatively regulating the cell cycle in adult mice, thus well-established FOXC2 as a key regulator of SSCs quiescent state.
  
  The experiments are well-designed and conducted, the overall conclusions are convincing. This work will be of interest to stem cell and reproductive biologists.
  
  Thanks for the positive feedback.
  
  Reviewer #1 (Recommendations for the Authors):
  
  The authors should address the following concerns:
  
  1) The most primitive uSPGs should be the true SSCs. The term "primitive SSCs" is very confusing.
  
  2) In addition to FACS-sorted GFP+ cells, FACS-sorted THY1+ cells should also be used for transplantation.
  
  Thanks for your good comments. According to your suggestions, we have addressed your two concerns as follows:
  
  1) Overall, our work suggests that FOXC2+ SSCs are a subpopulation of SSCs in a quiescent state; thus, we have replaced the term ‘primitive’ with ‘quiescent’ in the revised manuscript.
  
  2) The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq using the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).
  
  Reviewer #3 (Recommendations for the Authors):
  
  The experiments are well-designed and conducted, the overall conclusions are convincing. The only concerns are the writing, especially the introduction which was not well-rationalized. Sounds the three subtypes and three models for SSCs' self-renew are irrelevant to the major points of this manuscript. I don't think you need to talk too much about the markers of SSCs. Instead, I suggest you provide more background about the quiescent or activation states of the SSCs. In addition to that, as a nuclear-localized protein, it cannot be used to flow cytometric sorting, I don't think it should be emphasized as a marker. You identified a key transcription factor for maintaining the quiescent state of the primitive SSCs, that's quite important!
  
  We appreciate the positive feedback and constructive suggestions on the writing. We have substantially revised our manuscript to include the relevant advances and understanding from the field, as well as to highlight the importance of FOXC2 in regulating the quiescent state of SSCs.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.12.20.521179v2
www.biorxiv.org www.biorxiv.org

A Ctnnb1 enhancer transcriptionally regulates Wnt signaling dosage to balance homeostasis and tumorigenesis of intestinal epithelia

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  (1) One issue that needs to be considered is the nomenclature of the enhancer. The authors have presented data to show this enhancer controls the expression of Ctnnb1 in the stomach, intestine, and colon tissues. However, the name proposed by the authors, ieCtnnb1 (intestinal enhancer of Ctnnb1), doesn't represent its functions. It might be more appropriate to call it a different name, such as gieCtnnb1 (gastrointestinal enhancer of Ctnnb1).
  
  We thank the reviewer for the insightful suggestion and agree that whole-mount reporter assays indicated that ieCtnnb1 and ieCTNNB1 indeed display activity in the stomach. However, in the current study, we focused on the cellular distribution and function in intestinal epithelia. After careful consideration, we reasoned that the current designation, ieCtnnb1, more appropriately represents its expression pattern and functions based on the evidence provided. We hope the reviewer can understand our reasoning.
  
  (2) The writing of this manuscript can be improved in a few places.
  
  a) The definitions or full names for the abbreviations of some terms, e.g., Ctnnb1, ieCtnnb1, in both abstract and main text, are needed when they first appear. Specifically, Line 108 should be moved to Lines 26 and 95. Lines 125-126 are redundant. ieCtnnb1 in Line 130 needs to be defined.
  
  We appreciate the suggestion. In the revision, we have included the definition of Ctnnb1 and the full name of ieCtnnb1 when they first appear in the abstract and the main text. Lines 125-126 were deleted in the revision.
  
  b) Line 192-194, the description of the result needs to be rewritten to reflect the higher expression of LacZ transcript in eGFP+ cells.
  
  We would like to emphasize that the key point of this part is that the enhancer activity of ieCtnnb1 is present in both Lgr5-eGFP+ and Lgr5-eGFP- cells. This was validated by single-cell sequencing, which revealed the presence of LacZ transcripts in the Paneth cells. Moreover, we could not confidently conclude that eGFP+ cells have higher expression levels of LacZ, as these measurements were obtained from separate, semi-quantitative RT-qPCR experiments.
  
  c) More details are needed for how the data using human tumor samples were generated and how they were analyzed.
  
  We thank the reviewer for the suggestion. In the revision, we have provided additional details regarding the data and subsequent analyses of human CRC samples as follows: “We previously conducted paired analyses of chromatin immunoprecipitation sequencing (ChIP-seq) for H3K27ac and H3K4me3, alongside RNA-seq on 68 CRC samples and their adjacent normal (native) tissue (Li et al., 2021). In the current study, we performed analyses for the enrichment of H3K27ac and H3K4me3 at ieCTNNB1 and CTNNB1 promoter regions, as well as the expression levels of CTNNB1, followed by combined analyses (Figure 5A, Figure 5 - figure supplement 1).”
  
  d) The genomic structures from multiple species are presented at the bottom of Figure 1a. However, the description and explanation are lacking in both the main text and the figure legend.
  
  We apologize for not presenting this clearly. We have added a related description to the legend of Figure 1A: “The sequence conservation of the indicated species is shown at the bottom as vertical lines”. We also added an explanation in lines 162-163 of the main text: “Notably, unlike neCtnnb1, the primary sequence of ieCtnnb1 is not conserved among vertebrates (Figure 1A, bottom)”.
  
  Reviewer #2:
  
  (1) One of the main issues emerging during reading concerns the interpretation of the consequence of deleting the ieCtnnb1 enhancer. The authors write on line 235 that the deletion of ieCtnnb1 "undermined" Wnt signaling in the intestinal epithelium. This feels too strong, as the status of the pathway is only mildly affected, testified by the observation that mice with homozygous deletion on ieCtnnb1 are alive and well. The enhancer likely "only" drives higher Ctnnb1 expression, and it does not affect Wnt signaling by other mechanisms. The reduction of Wnt target gene expression upon its deletion is easily interpreted as the consequence of reduced β-catenin. Also the title, in my opinion, allows this ambiguity to stick in readers' minds. In other words, the authors present no evidence that the ieCtnnb1 enhancer controls Wnt signaling dosage via any mechanism other than its upregulation of Ctnnb1 expression in the intestinal epithelium. Reduced Ctnnb1, in turn, could explain the observed reduction of Wnt signaling output and the interesting downstream physiological consequences. Unless the authors think otherwise, I suggest they clarify this throughout the text, including necessary modifications to the title.
  
  We greatly appreciate the reviewer’s important comments and suggestion. We agree that ieCtnnb1’s direct effect on canonical Wnt signaling is to regulate the transcription of Ctnnb1 in the intestinal epithelia. Therefore, knockout of ieCtnnb1 leads to compromised expression of Ctnnb1 and, consequently, reduced Wnt signaling. The term “undermined” is indeed too strong and has been revised to “compromised” in the revision (line 237). Similar revisions have been made throughout the manuscript. In particular, the title was changed to “A Ctnnb1 enhancer transcriptionally regulates Wnt signaling dosage to balance homeostasis and tumorigenesis of intestinal epithelia”. However, as we state in the following point, decreased levels of β-catenin upon ieCtnnb1 loss could have indirect effects, including reduced expression of Bambi, which might cause a more pronounced decrease of nuclear β-catenin.
  
  (2) It is unclear how the reduction of Ctnnb1 mRNA caused by deletion of ieCtnnb1 in mice could lead to a preferential decrease of nuclear more than membranous β-catenin (Fig. 1K and L). This might reflect a general cell autonomous reduction in Wnt signaling activation; yet, it is not clear how this could occur. Do the authors have any explanations for this?
  
  This is a very important question. We observed that in ieCtnnb1 knockout epithelia, the expression of Bambi (BMP and activin membrane-bound inhibitor) was significantly downregulated. Since BAMBI has been reported to stabilize β-catenin and facilitate its nuclear translocation, it is likely that the reduced level of BAMBI resulting from the loss of ieCtnnb1 further decreased nuclear β-catenin. In the revision, the expression change of Bambi has been added in Figure 1M. Moreover, the related content is discussed extensively with proper citations: “We noticed that after knocking out ieCtnnb1, the level of β-catenin in the nuclei of small intestinal crypt cells of Ctnnb1Δi.enh mice decreased more significantly compared to that in the cytoplasm (49.5% vs. 29.8%). Although the loss of ieCtnnb1 should not directly lead to reduced nuclear translocation of β-catenin, RNA-seq results showed that the loss of ieCtnnb1 causes a reduction in the expression of Bambi (BMP and activin membrane-bound inhibitor), a target gene in the canonical Wnt signaling pathway (Figure 1M). BAMBI promotes the binding of Frizzled to Dishevelled, thereby stabilizing β-catenin and facilitating its nuclear translocation (Lin et al., 2008; Liu et al., 2014; Mai et al., 2014; Zhang et al., 2015). Thus, it is likely that the decreased level of BAMBI resulting from the loss of ieCtnnb1 further reduced nuclear β-catenin”.
  
  (3) In Figure 1 K-L the authors show β-catenin protein level. Why not show its mRNA?
  
  The mRNA levels of Ctnnb1 in small and large intestinal crypts were shown in Figure 1I and 1J, demonstrating reduced expression of Ctnnb1 upon ieCtnnb1 knockout. We hope the reviewer understands that it is unnecessary to measure the nuclear and cytosolic levels of Ctnnb1 transcripts, as the total mRNA level generally reflects the protein level.
  
  (4) Concerning the GSEA of Figure 1 that includes the Wnt pathway components: a) it would be interesting to see which components and to what extent is their expression affected; b) why should the expression of Wnt components that are not Wnt target genes be affected in the first place? It is odd to see this described uncritically and used to support the idea of downregulated Wnt signaling.
  
  We appreciate the suggestion and apologize for any lack of clarity. The affected components of the Wnt signaling pathway and the extent of their changes are summarized in Figure 1 – figure supplement 3. Additionally, we have provided explanations for their downregulation. For instance, the reduced expression of Wnt3 and Wnt2b ligands in ieCtnnb1-KO crypts may be attributed to the decreased numbers of Paneth cells.
  
  (5) In lines 251-252 the authors refer to "certain technical issues" in the isolation of cell type from the intestinal epithelium. Why this part should be obscure in the characterization of a tissue for which there are several established protocols of isolation and analysis is not clear. I would rather describe what these issues have been and how they might have affected the data presented.
  
  We thank the reviewer for pointing this out. The single-cell preparation and sequencing of small intestinal cryptal epithelial cells were carried out largely according to reported protocols with slight modification. The enrichment of live crypt epithelial cells (EpCAM+DAPI-) by flow cytometry and cell filtering after single-cell sequencing were appropriate (Figure 2 – figure supplement 1A-1C). We would like to emphasize a few points: 1) Unlike other protocols, we did not exclude immune cells, erythrocytes, or endothelial cells using negative sorting antibodies. 2) When defining cell populations, we focused exclusively on epithelial cell types and did not consider other cell types, such as immune cells. As a result, the so-called “undefined” cells include a mixture of non-epithelial cells. Indeed, markers for erythrocytes (AY036118/Erf1, PMID:12894589) and immune cells (Gm42418 and Lars2, PMID:30940803, PMID: 35659337) were the top three enriched genes in the “undefined” cluster (Figure 2 – figure supplement 1D). 3) Nonetheless, the overall findings remain robust, as key observations such as the loss of Paneth cells and reduced cell proliferation were validated through histological studies. This information has been incorporated into the revised manuscript with related references cited (lines 254-259).
  
  (6) It is interesting that human SNPs exist that seem to fall within the ieCTNNB1 enhancer and affect the gastrointestinal expression of CTNNB1. Could the author report or investigate whether this SNP is present in human populations that have been considered in large-scale studies for colorectal cancer susceptibility? It seems to me a rather obvious next step of extreme importance to be ignored.
  
  (7) From Figure 5A a reader could conclude that colorectal tumor cells have a higher expression of CTNNB1 mRNA than in normal epithelium. This is the first time I have seen this observation which somewhat undermines our general understanding of Wnt-induced carcinogenesis exclusively initiated by APC mutations whereby it is β-catenin's protein level, not expression of its mRNA, of crucial importance. I find this to be potentially the most interesting observation of the current study, which could be linked to the activity of the enhancer discovered, and I suggest the authors elaborate more on this and perhaps consider it for future experimental follow-ups.
  
  We appreciate the comments and suggestions. We therefore added related content in the revision (lines 470-475): “Importantly, ieCTNNB1 displayed higher enhancer activity in most CRC samples collected in the study. Moreover, the SNP rs15981379 (C>T) within ieCTNNB1 is associated with the expression of CTNNB1 in the GI tract. Future population studies could investigate how the enhancer activity of ieCTNNB1 and this particular SNP are associated with CRC susceptibility and prognosis”.
  
  (8) I am surprised that the authors, who seem to have dedicated lots of resources to this study, are satisfied by analyzing their ChIP experiments with qPCR rather than sequencing (Figure 6). ChIP-seq would produce a more reliable profile of the HNF4a and CREB1 binding sites on these loci and in other control regions, lending credibility to the whole experiment and binding site identification. Sequencing would also take care of the two following conceptual problems in primer design.
  
  First: while the strategy to divide enhancer and promoter in 6 regions to improve the resolution of their finding is commendable, I wonder how the difference in signal reflects primers' efficiency rather than HNF4/CREB1 exact positioning. The possibility of distinguishing between regions 2 and 3, for example, in a ChIP-qPCR experiment, also depends on the average DNA fragment length after sonication, a parameter that is not specified here.
  
  Second: what are the primers designed to detect the ieCtnnb1 enhancer amplifying in the yellow-columns samples of Figure 6G? In this sample, the enhancer is deleted, and no amplification should be possible, yet it seems that a value is obtained and set to 1 as a reference value.
  
  This is indeed a crucial point, and we fully agree with the reviewer that “ChIP-seq would produce a more reliable profile of the HNF4a and CREB1 binding sites on these loci and in other control regions”. However, we believe that our current ChIP-qPCR experiments have adequately addressed the potential concerns raised by the reviewer. (1) We have ensured that the DNA fragment length after sonication falls within the range of 200 bp to 500 bp, with an average length of approximately 300 bp (Author response image 1A). We have stated this point in the revised methods section (line 633). (2) We have randomly inspected 14 out of 26 primer sets used in Figure 6 and its supplemental figure (Author response image 1B-E), confirming that all inspected primer sets show comparable amplification efficiency (ranging from 90% to 110%). This information has also been included in the revised methods section (line 650). (3) Figures 6G and 6H show reduced enrichment of HNF4α (6G) and p-S133-CREB1 (6H) at the Ctnnb1 promoter in ieCtnnb1 knockout ApcMin/+ tumor tissues. The ChIP-qPCR primers used were positioned at the Ctnnb1 promoter, not at ieCtnnb1, with IgG control enrichment serving as the reference values on the Y-axes.
  
  Author response image 1.
  
  (A) Agarose gel electrophoresis of sonicated DNA. (B-E) Tests of amplification efficiency for primer sets used in ChIP-qPCR.
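  The response does not state how the 90% to 110% amplification efficiencies were computed. A common approach is to fit a standard curve of Cq against log10 template quantity over a dilution series and convert the slope to an efficiency with E = 10^(-1/slope) - 1. The following is a minimal sketch under that assumption; the function names and the dilution data are hypothetical and are not taken from the study.

```python
import math


def standard_curve_slope(log10_quantity, cq):
    """Least-squares slope of Cq against log10(template quantity)."""
    n = len(log10_quantity)
    mean_x = sum(log10_quantity) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantity, cq))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantity)
    return sxy / sxx


def amplification_efficiency(slope):
    """Efficiency as a fraction: 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Slope that corresponds to exactly 100% efficiency (about -3.32).
ideal_slope = -1.0 / math.log10(2.0)

# Hypothetical 10-fold dilution series (illustrative values only).
log10_quantity = [0.0, -1.0, -2.0, -3.0]
cq = [20.0, 23.3, 26.7, 30.0]

slope = standard_curve_slope(log10_quantity, cq)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
print("within 90-110%:", 0.90 <= efficiency <= 1.10)
```

  A primer set whose fitted efficiency falls outside the accepted window would make differences in ChIP-qPCR signal between adjacent regions harder to attribute to factor positioning, which is the concern the reviewer raised.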
  
  (9) The ChIP-qPCR showing preferential binding of pS133-CREB1 in small intestinal crypts and CHT15 cells (line 393) should be shown.
  
  The ChIP-qPCR results demonstrating preferential binding of p-S133-CREB1 over CREB1 have been added in revised Figure 6C, 6D and Figure 6 – Supplement 1C.
  
  (10) It is not entirely clear what the blue tracks represent at the bottom of Figures 6C-D and Figure 6 - Figure Supplement 1C-D. The ChIP-seq profiles of both CREB1 and HNF4a shown in Figures 6A and Figure 6 - Figure Supplement 1A do not seem to match. Taking HNF4a, for example from Figure 6 - Figure Supplement 1A it seems to bind on the Ctnnb1 promoter, while in Figure 6 - Figure Supplement 1D the peaks are within the first intron. I realize this might all be a problem with a different scale across figure panels, but I suggest producing a clearer figure.
  
  We apologize for the confusion. We have revised Figure 6C-6D, Figure 6 - figure supplement 1C-D, and the corresponding legends to enhance clarity. (1) The top panels of Figures 6C and 6D respectively highlight shaded regions of ieCTNNB1 (pink) and the CTNNB1 promoter (grey) in Figure 6A, emphasizing the enrichment of p-S133-CREB1. (2) The top panels of Figure 6 – figure supplement 1C and 1D respectively highlight shaded regions of ieCtnnb1 (pink) and the Ctnnb1 promoter (grey) in Figure 6 – figure supplement 1A, emphasizing the enrichment of HNF4α. (3) Because Figures 6C-6D and Figure 6 - figure supplement 1C-1D respectively correspond to human and mouse genomes, the positions of peaks and scales differ.
  
  (11) In the intro the authors refer to "TCF-4". I suggest they use the more recent unambiguous nomenclature for this family of transcription factors and call it TCF7L2.
  
  TCF-4 has been changed to TCF7L2 in the revision (line 81).
  
  (12) In lines 121-122, the authors write "Although numerous putative enhancers...only a fraction of them were functionally annotated". To what study/studies are the authors referring? Please provide references.
  
  References were added in the revision (line 124).
  
  (13) In some parts the authors use strong words that should in my opinion be attenuated. Examples are: (i) at line 224, "maintains" would be better substituted with "contribute", as in the absence of ieCtnnb1, Ctnnb1 is still abundantly expressed; (ii) at line 266 "compromised" when the proliferative capacity of CFCs and TACs seems to be only mildly reduced; (iii) at line 286 "disrupts", the genes are simply downregulated.
  
  We thank the reviewer for these great suggestions. 1) On lines 224-225, the sentence was revised to: “These data suggest that ieCtnnb1 plays a specific role in regulating the transcription of Ctnnb1 in intestinal epithelia”. 2) On line 271, “compromised” was replaced with “mildly reduced”. 3) In ieCtnnb1 knockout epithelial cells of the small intestine, genes related to secretory functions were decreased, while genes related to absorptive functions were increased. Therefore, we consider the term 'disrupts' more appropriate than 'downregulates'.
  
  Reviewer #3:
  
  Line 81, c-Myc should be human MYC (italics) to agree with the other human gene names in this sentence.
  
  c-Myc has been changed into MYC in the revision (line 82)
  
  Line 215, wildtype should be wild-type.
  
  “wildtype” has been changed into “wild-type” in the revision (line 215)
  
  Line 224, Elimination of the enhancer did not abolish expression of Ctnnb1; therefore, it would be better to say that it "helps to maintain Ctnnb1 transcription"
  
  The sentence was changed into “These data suggest that ieCtnnb1 plays a specific role in regulating the transcription of Ctnnb1 in intestinal epithelia” in revision (lines 224-225)
  
  Line 228, perhaps "to activate transcription" is meant.
  
  “active” has been changed into “activate” in the revision (line 228)
  
  Line 235, consider "reduced" instead of "undermined".
  
  “undermined” has been replaced with “compromised” in the revision (line 237)
  
  Line 262, "em" dashes should be at both ends of this insertion.
  
  Line 298, "dysfunctional" would be better.
  
  Line 356, "samples were".
  
  Line 481, 12-hr (add hyphen).
  
  All of the above points have been revised according to the reviewer’s suggestions.
  
  Line 712, Is "poly-N" meant?
  
  “Poly-N” indicates undetected bases during sequencing. This explanation was added in the revision (lines 759-760).
  
  Figure 1K, the GAPDH signal is not visible and that panel is unnecessary as there is an H3 control.
  
  Figure 1K and 1L respectively show levels of nuclear and cytoplasmic β-catenin. GAPDH and H3 were used as internal references for the cytoplasmic and nuclear fractions, respectively, confirming both robust fractionation and equal loading.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.21.600033v2
www.biorxiv.org www.biorxiv.org

Glial ferritin maintains neural stem cells via transporting iron required for self-renewal in Drosophila

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #3 (Public Review):
  
  The iron manipulation experiments are in the whole animal and it is likely that this affects general feeding behaviour, which is known to affect NB exit from quiescence and proliferative capacity. The loss of ferritin in the gut and iron chelators enhancing the NB phenotype are used as evidence that glia provide iron to NB to support their number and proliferation. Since the loss of NB is a phenotype that could result from many possible underlying causes (including low nutrition), this specific conclusion is one of many possibilities.
  
  We assessed the feeding behavior of flies using Brilliant Blue (Sigma, 861146)[1]. The amount of dye in the fly body was similar between the control group and the BPS group, suggesting that BPS had little effect on feeding behavior (Figure 3—figure supplement 1A).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  There was a gap between the Pros nuclear localization and downstream targets of ferritin, particularly NADH dehydrogenase and biosynthesis. Could overexpression of Ndi1 restore Pros localization in NBs?
  
  A ferritin defect lowers the iron level, which leads to cell cycle arrest of NBs via ATP shortage, and cell cycle arrest of NBs probably results in NB differentiation[2, 3]. We have added this experiment in Figure 5—figure supplement 2. The result showed that overexpression of Ndi1 could significantly restore Pros localization in NBs.
  
  The abstract requires revision to cover the major findings of the manuscript, particularly the second half.
  
  We revised the abstract to add more major findings of the manuscript in the second half as follows:
  
  “Abstract
  
  Stem cell niche is critical for regulating the behavior of stem cells. Drosophila neural stem cells (Neuroblasts, NBs) are encased by glial niche cells closely, but it still remains unclear whether glial niche cells can regulate the self-renewal and differentiation of NBs. Here we show that ferritin produced by glia, cooperates with Zip13 to transport iron into NBs for the energy production, which is essential to the self-renewal and proliferation of NBs. The knockdown of glial ferritin encoding genes causes energy shortage in NBs via downregulating aconitase activity and NAD+ level, which leads to the low proliferation and premature differentiation of NBs mediated by Prospero entering nuclei. More importantly, ferritin is a potential target for tumor suppression. In addition, the level of glial ferritin production is affected by the status of NBs, establishing a bicellular iron homeostasis. In this study, we demonstrate that glial cells are indispensable to maintain the self-renewal of NBs, unveiling a novel role of the NB glial niche during brain development.”
  
  In Figure 2B Mira appeared to be nuclear in NBs, which is inconsistent with its normal localization. Was it Dpn by mistake?
  
  We confirmed that the staining in Figure 2B is Mira. We also provide a magnified picture in Figure 2B’, showing that Mira mainly localizes to the cortex or the cytoplasm, as previously reported.
  
  Figure 2C, Fer1HCH-GFP/mCherry localization was non-uniform in the NBs revealing 1-2 regions devoid of protein localization potentially corresponding to the nucleus and Mira crescent enrichment. It is important to co-label the nucleus in these cells and discuss the intracellular localization pattern of Ferritin.
  
  We have revised the picture in Figure 2C to include the nuclear marker DAPI. The result showed that Fer1HCH-GFP/Fer2LCH-mCherry did not co-localize with DAPI, indicating that Drosophila ferritin is predominantly distributed in the cytosol[4, 5]. Regarding the reviewer's concern, the GFP/mCherry signal in NBs came from ferritin overexpressed in glia, which probably resulted in the non-uniform signal.
  
  In Figure 3-figure supplement 3F, glial cells in Fer1HCH RNAi appeared to be smaller in size. This should be quantified. Given the significance of ferritin in cortex glial cells, examining the morphology of cortex glial cells is essential.
  
  In Figure 3—figure supplement 3F, we did not label single glial cells, so it was difficult to determine whether their size had changed. However, the chamber formed by the cellular processes of glial cells appears to become smaller in Fer1HCH RNAi. The glial chamber undergoes remodeling during neurogenesis, responding to NB signals to enclose the NB and its progeny[6]. Thus, the size of the glial chamber is regulated by NB lineage size. In our study, the ferritin defect leads to low proliferation and hence a smaller lineage for each NB, which likely makes the chamber smaller.
  
  Since the authors showed that the reduced NB number was not due to apoptosis, a time-course experiment for glial ferritin KD is recommended to identify the earliest stage when the phenotype in NB number /proliferation manifests during larval brain development.
  
  We observed brains at different larval stages upon glial ferritin KD. The result showed that NB proliferation decreased significantly, but NB number declined slightly at the second-instar larval stage (Figure 5—figure supplement 1E and F), suggesting that brain defect of glial ferritin KD manifests at the second-instar larval stage.
  
  Transcriptome analysis on ferritin glial KD identified genes in mitochondrial functions, while the in vivo EM data suggested no defects in mitochondria morphology. A short discussion on the inconsistency is required.
  
  When examining mitochondrial morphology in the in vivo EM data, we focused on whether cristae were visible, a feature used to determine whether ferroptosis occurs[7]. Other details of mitochondrial morphology may have changed, but we did not examine them. To describe this result more accurately, we replaced “However, our observation revealed no discernible defects in the mitochondria of NBs after glial ferritin knockdown” with “However, our result showed that the mitochondrial double membrane and cristae were clearly visible whether in the control group or glial ferritin knockdown group, which suggested that ferroptosis was not the main cause of NB loss upon glial ferritin knockdown” in lines 207-209.
  
  The statement “we found no obvious defects of brain at the first-instar larval stage (0-4 hours after larval hatching) when knocking down glial ferritin (Figure 5-figure supplement 1C).” lacks quantification of NB number and proliferation, making it challenging to conclude.
  
  We have provided the quantification of NB number and proliferation rate of the first-instar larval stage in Figure 5—figure supplement 1C and D. The data showed that there is no significant change in NB number and proliferation rate when knocking down ferritin, suggesting that no brain defect manifests at the first-instar larval stage.
  
  A wild-type control is necessary for Figure 6A-C as a reference for normal brain sizes.
  
  We have added Insc>mCherry RNAi as a reference in Figure 6A-D, which showed that the brain of the tumor model is larger than a normal brain. Moreover, we moved the brat RNAi data from Figure 6A-D to Figure 6—figure supplement 1A-D for a better layout.
  
  In Figures 6B, D, “Tumor size” should be corrected to “Larval brain volume”.
  
  Here, we measured the brain area via ImageJ to assess the severity of the tumor, rather than using 3D data of the brain volume. We therefore think “Larval brain size” is more appropriate than “Larval brain volume”, and we have corrected “Tumor size” to “Larval brain size” in Figure 6B and D and in Figure 6—figure supplement 1B and D.
  
  Considering that asymmetric division defects in NBs may lead to premature differentiation, it is advisable to explore the potential involvement of ferritin in asymmetric division.
  
  aPKC is a classic marker for detecting asymmetric division defects in NBs. We performed aPKC staining and found that it formed a crescent at the apical cortex, as judged by the position of the daughter cell, in both control and glial ferritin knockdown brains (Figure 5—figure supplement 3A). This result indicated that there was no obvious asymmetric division defect after glial ferritin knockdown.
  
  In the statement "Secondly, we examined the apoptosis in glial cells via Caspase-3 or TUNEL staining, and found the apoptotic signal remained unchanged after glial ferritin knockdown (Figure 3-figure supplement 3A-D).", replace "the apoptosis in glial cells" with "the apoptosis in larval brain cells".
  
  We have replaced "the apoptosis in glial cells" with "the apoptosis in larval brain cells" in line 216.
  
  Include a discussion on the involvement of ferritin in mammalian brain development and address the limitations associated with considering ferritin as a potential target for tumor suppression.
  
  We have added a discussion of ferritin in mammalian brain development in lines 428-430 and of the limitations of ferritin as a target for tumor suppression in lines 441-444.
  
  Indicate Insc-GAL4 as BDSC#8751, even if obtained from another source. Additionally, provide information on the extensively used DsRed fly stock used in this study within the methods section.
  
  We provided the stock information for Insc-GAL4 and DsRed in lines 673-674.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Major points:
  
  The number of NBs differs a lot between experiments. For example, in Fig 1B and 1K controls present less than 100 NBs whereas in Figure 1 Supplementary 2B it can be seen that controls have more than 150. Then, depending on which control you compare the number of NBs in flies silencing Fer1HCH or Fer2LCH, the results might change. The authors should explain this.
  
  Figure 1 Supplementary 2B (Figure 1 Supplementary 3B in the revised version) shows the NB number in the VNC region, while Fig 1B and 1K show the NB number in the CB region. We first described the general phenotype by showing the NB number in the CB and the VNC separately (Fig 1 and Fig 1-Supplementary 1 and 3 in the revised version); the NB number is consistent within each region. Thereafter, we focused on the NB number in the CB for convenience.
  
  This reviewer encourages the authors to use better Gal4 lines to describe the expression patterns of ferritins and Zip13 in the developing brain. On the one hand, the authors do not state which lines they are using (including supplementary table). On the other hand, new Trojan GAL4 (or at least InSite GAL4) lines are a much better tool than classic enhancer trap lines. The authors should perform this experiment.
  
  All stock sources and numbers are documented in Table 2. The Ferritin GAL4 and Zip13 GAL4 lines in this study are InSite GAL4 lines. In addition, we used another Fer2LCH enhancer-trap GAL4 (DGRC104255) to verify our result and provide the result in Figure 2—figure supplement 1. Our data showed that DsRed driven by Fer2LCH-GAL4 co-localized with the glial nuclear protein Repo rather than the NB nuclear protein Dpn, which was consistent with the result for Fer1HCH/Fer2LCH GAL4. We will also try to obtain the Trojan GAL4 lines (Fer1HCH/Fer2LCH GAL4 and Zip13 GAL4) and validate this result in the future.
  
  The authors exclude very rapidly the possibility of ferroptosis based only on some mitochondrial morphological features without analysing the other hallmarks of this iron-driven cell death. The authors should at least measure lipid peroxidation levels in their experimental scenario, either by a kit to quantify by-products of lipid peroxidation such as malondialdehyde (MDA) or using an anti-4-HNE antibody.
  
  We combined multiple experiments to exclude the possibility of ferroptosis. Firstly, ferroptosis can be terminated by an iron chelator. We fed flies with an iron chelator upon glial ferritin knockdown, but NB number and proliferation were not restored, which suggested that ferroptosis was probably not the cause of the NB loss induced by glial ferritin knockdown (Figure 3B and C). Secondly, Zip13 transports iron into the secretory pathway and further out of the cells in the Drosophila gut[8]. Our data showed that knocking down the iron transporter Zip13 in glia resulted in a decline of NB number and proliferation, which was consistent with the phenotype upon glial ferritin knockdown (Figure 3E-G). More importantly, simultaneous knockdown of Zip13 and ferritin aggravated the phenotype in NB number and proliferation (Figure 3E-G). These results suggested that the phenotype was induced by iron deficiency in NBs, which argues against iron overload or ferroptosis being the main cause of NB loss upon glial ferritin knockdown. Finally, we examined the mitochondrial double membrane and cristae, whose damage is a critical hallmark of ferroptosis, but found no significant damage (Figure 3-figure supplement 2E and F).
  
  In addition, we have added the 4-HNE determination in Figure 3—figure supplement 2G and H. The 4-HNE level did not change significantly, suggesting that lipid peroxidation was stable, which further argues against ferroptosis as the cause of NB loss upon glial ferritin knockdown.
  
  All of the above results together indicate that ferroptosis is not the cause of NB loss after ferritin knockdown.
  
  A major flaw of the manuscript is related to the chapter Glial ferritin defects result in impaired Fe-S cluster activity and ATP production and the results displayed in Figure 4. The authors talk about the importance of FeS clusters for energy production in the mitochondria. Surprisingly, the authors do not analyse the genes involved in this process; instead, they present the interaction with the cytosolic FeS machinery, which has a role in some extramitochondrial proteins but no role in the synthesis of FeS clusters incorporated in the enzymes of the TCA cycle and the respiratory chain. The authors should repeat the experiments incorporating the genes Nfs1 (CG12264), ISCU (CG9836), ISD11 (CG3717), and fh (CG8971) or remove (or at least rewrite) this entire section.
  
  We thank the reviewer for this constructive advice and have revised Figure 4B and C accordingly. We repeated the experiment while blocking mitochondrial Fe-S cluster biosynthesis by knocking down Nfs1 (CG12264), ISCU (CG9836), ISD11 (CG3717), and fh (CG8971), respectively. Nfs1 knockdown in NBs led to low proliferation, which was consistent with CIA knockdown. However, we did not observe an obvious brain defect upon knockdown of ISCU (CG9836), ISD11 (CG3717), or fh (CG8971) in NBs. Our interpretation of these results is that Nfs1 is probably a necessary core component in Fe-S cluster assembly while the others are dispensable[9].
  
  The presence and aim of the mouse model is unclear to this reviewer. On the one hand, it is not used to corroborate the fly findings regarding iron needs from neuroblasts. On the other hand, and without further explanation, authors migrate from a fly tumor model based on modifying all neuroblasts to a mammalian model based exclusively on a glioma. The authors should clarify those issues.
  
  Although the iron transporters probably differ between Drosophila and mammals, the function of iron as an essential nutrient for cell growth and proliferation is conserved from Drosophila to mammals. The fly data suggested that iron is critical for brain tumor growth, and we therefore tested this in a mammalian model. Glioma is the most common form of central nervous system neoplasm and originates from neuroglial stem or progenitor cells[10]. We therefore tested the effect of the iron chelator DFP on glioma in mice and found that DFP could suppress glioma growth and prolong the survival of tumor-bearing mice.
  
  Minor points
  
  Although referred to adult flies, the authors did not include either in the introduction or in the discussion existing literature about expression of ferritins in glia or alterations of iron metabolism in fly glia cells (PMID: 21440626 and 25841783, respectively) or usage of the iron chelator DFP in Drosophila (PMID: 23542074). The authors should check these manuscripts and consider the possibility of incorporating them into their manuscript.
  
  Thank you for the reminder. We have incorporated all recommended papers into our manuscript (lines 65-67 and 168).
  
  The number of experiments in each figure is missing.
  
  All experiments were repeated at least three times. We have stated this in the Quantifications and Statistical Analysis section of Materials and methods.
  
  If graphs are expressed as mean +/- sem, it is difficult to understand the significance stated by the authors in Figure 2E.
  
  We apologize for this mistake and have revised this in Quantifications and Statistical Analysis. All statistical results were presented as means ± SD.
  
  When authors measure aconitase activity, are they measuring all (cytosolic and mitochondrial) or only one of them? This is important to better understand the experiments done by the authors to describe any mitochondrial contribution (see above in major points).
  
  In this experiment, we measured the total aconitase activity. We also tried to measure mitochondrial aconitase activity, but this failed, possibly because of the low biomass of the tissue samples.
  
  In this line, why do controls in aconitase and ATP lack an error bar? Are the statistical tests applied the correct ones? It is not the same to have paired or unpaired observations.
  
  This is due to the normalization. We repeated these experiments at least three times, in different weeks, because the whole process (collection of brains, protein determination, and ATP or aconitase determination) was time- and energy-consuming. The efficiency of the aconitase and ATP kits also changed over time, so we could not keep the experimental conditions identical across batches. Therefore, we normalized each batch separately to present a more accurate result: the control group was set to 1 by dividing it by itself, and the other groups were divided by the control. This normalization was repeated for each of the three batches, so there is no error bar in the control group. We think it is appropriate to apply ANOVA with a Bonferroni test to the three groups.
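
  The per-batch normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the group names and raw readings are hypothetical, not the authors' data, and the helper `normalize_to_control` is introduced here for the example.

```python
from statistics import mean, stdev


def normalize_to_control(batch, control="control"):
    """Divide every group's reading in one batch by that batch's control reading."""
    reference = batch[control]
    return {group: value / reference for group, value in batch.items()}


# Hypothetical raw kit readings (arbitrary units) from three independent batches.
batches = [
    {"control": 2.0, "knockdown": 1.0, "rescue": 1.8},
    {"control": 4.0, "knockdown": 2.4, "rescue": 3.6},
    {"control": 1.0, "knockdown": 0.6, "rescue": 1.0},
]

normalized = [normalize_to_control(batch) for batch in batches]

# Mean and SD of the normalized values across batches, per group.
summary = {}
for group in batches[0]:
    values = [batch[group] for batch in normalized]
    summary[group] = (mean(values), stdev(values))

for group, (avg, sd) in summary.items():
    print(f"{group}: mean={avg:.3f}, SD={sd:.3f}")
```

  Because every control value becomes exactly 1 after division, its SD across batches is zero, which is why no error bar can be drawn for it. The batch-to-batch variability of the control is absorbed into the other groups' ratios.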
  
  In some cases, further rescue experiments would be appreciated. For example, expression of Ndi restores control NAD+ levels or number of NBs, it would be interesting to know if this is accompanied by restoring mitochondrial integrity and its ability to produce ATP.
  
  We have determined ATP production after overexpressing Ndi1 and provided this result in Figure 4—figure supplement 1B. The data showed that expression of Ndi1 could restore ATP production upon glial Fer2LCH knockdown, which was consistent with our conclusion.
  
  Lines 293-299 on page 7 are difficult to understand.
  
  According to our results above, the decrease in NB number and proliferation upon glial ferritin knockdown (KD) was caused by energy deficiency. As shown in the schematic diagram (Author response image 1), “T” represents the total energy used for NB maintenance and proliferation, “N” the energy for maintaining NB number, and “P” the energy for NB proliferation, so that “T” equals “N” plus “P”. When ferritin was knocked down in glia, “T”, “N” and “P” all declined in “Ferritin KD” compared to “wildtype (WT)”. Knockdown of pros can prevent the differentiation of NBs, but it cannot supply energy to NBs, which probably explains why NB number, but not proliferation, was rescued. Specifically, NB number increased significantly in “Ferritin KD Pros KD” compared to “Ferritin KD”, so more energy was consumed for NB maintenance in “Ferritin KD Pros KD”. As shown in the schematic diagram, “T” did not change between “Ferritin KD Pros KD” and “Ferritin KD”, whereas “N” increased in “Ferritin KD Pros KD” compared to “Ferritin KD”. Thus, “P” decreased, meaning that less energy remained for proliferation, which led to the failure to rescue NB proliferation. Indeed, the level of proliferation in “Ferritin KD Pros KD” appeared even lower than in “Ferritin KD”.
  
  Author response image 1.
  
  The schematic diagram of relationship between energy and NB function in different groups. “T” represents total energy for NB maintenance and proliferation. “N” represents the energy for NB maintenance. “P” represents the energy for NB proliferation. T=N+P
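
  The energy-budget argument in this response can be written compactly. The subscripts are shorthand introduced here for the two genotypes; the relations themselves are the ones stated in the response.

```latex
\begin{align}
T &= N + P \\
T_{\text{Fer KD, Pros KD}} &= T_{\text{Fer KD}}, \qquad
N_{\text{Fer KD, Pros KD}} > N_{\text{Fer KD}} \\
\Rightarrow\quad P_{\text{Fer KD, Pros KD}}
  &= T_{\text{Fer KD}} - N_{\text{Fer KD, Pros KD}}
  < T_{\text{Fer KD}} - N_{\text{Fer KD}} = P_{\text{Fer KD}}
\end{align}
```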
  
  Line 601 should indicate that Tables 2 and 3 are part of the supplementary material.
  
  We have revised this in line 678.
  
  Figure 4-supplement 1. Only validation of 2 genes from a RNAseq seems too little.
  
  We dissected hundreds of brains for sorting NBs because of the low biomass of the fly brain. This is difficult and energy-consuming work. Most NBs were used for RNA-seq, so only a small amount of sample was left for validation, which was not enough to test more genes.
  
  Figure 6E, the authors indicate that 10 mg/ml DFP injection could significantly prolong the survival time. Which increase in % is produced by DFP?
  
  We have provided the bar graph in Author response image 2. DFP injection increased the survival time by about 16.67%.
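
  For readers who want to see how such a percentage is obtained, here is a minimal sketch. The group means are not given in the response, so the two survival times below are hypothetical values chosen only because they yield a one-sixth increase.

```python
def percent_increase(control_mean, treated_mean):
    """Relative change of the treated mean with respect to the control mean, in percent."""
    return (treated_mean - control_mean) / control_mean * 100.0


# Hypothetical mean survival times in days (not the authors' data).
control_days = 30.0
treated_days = 35.0

increase = percent_increase(control_days, treated_days)
print(f"Increase in survival time: {increase:.2f}%")
```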
  
  Author response image 2.
  
  The bar graph of survival time of mice treated with DFP. (The unpaired two-sided Student’s t test was employed to assess statistical significance. Statistical results were presented as means ± SD. n=7,6; *: p<0.05)
  
  Reviewer #3 (Recommendations For The Authors):
  
  As I read the initial results that built the story (glia make ferritin>release it> NBs take them up>use it for TCA and ETC) I kept thinking about what it meant for NBs to be 'lost'. This led me to consider alternate possibilities that the results might point to, other than the ones the authors were suggesting. It was only in Figure 5 that the authors ruled out some of those possibilities. I would suggest that they first illustrate how NBs are lost upon glial ferritin loss of function before they delve into the mechanism. This would also be a place to similarly address that glial numbers and general morphology are unchanged upon ferritin loss.
  
  This recommendation provides a valuable guideline for building the story, especially for researchers who are interested in neural stem cell studies. We did try this logic to present our study but found that it left several gaps in the middle of the manuscript, such as the relationship between glial ferritin and Pros localization in NBs, so that the story could not be presented fluently. Therefore, we decided to present this study in the current way.
  
  More details of the screen would be useful to know. How many lines did they screen, what was the assay? This is not mentioned anywhere in the text.
  
  We have added this in the Screen section of Materials and methods. We screened about 200 lines targeting components of classical signaling pathways, genes highly expressed in glial cells, or genes encoding secretory proteins. UAS-RNAi lines were crossed with repo-Gal4, and third-instar F1 larvae were then dissected. We collected the brains from the F1 larvae and performed immunostaining for Dpn and PH3. Finally, we observed the brains under a confocal microscope.
  
  Many graphs seem to be repeated in the main figures and the supplementary data. This is unnecessary, or at least should be mentioned.
  
  We appreciate your kind reminder. However, we carefully went through all the figures and did not find the repeated graphs, though some of them look similar.
  
  The authors mention that they tested which glial subtypes ferritin is needed in, but don't show the data. Could they please show the data? Same with the other iron transport/storage/regulation. Also, in both this and later sections, the authors could mention which Gal4 was used to label what cell types. The assumption is that the reader will know this information.
  
  We have added the result of ferritin knockdown in glial subpopulations in Figure 1—figure supplement 2. However, given the number of iron-related genes, we did not take pictures for them but recorded the results in Table 3.
  
  For all their images showing colocalisation, magnified, single-colour images shown in grayscale will be useful. For example, without the magnification, it is not possible to see the NB expression of the protein trap line in Figure 2B. A magnified crop of a few NBs (not a single one like in 2C) would be more useful.
  
  We have provided Figure 2A’, B’, D’ and Figure 3D’ as suggested.
  
  There are a lot of very specific assays used to detect ROS, NAD, aconitase activity, among others. It would be nice to have a brief but clear description of how they work in the main text. I found myself having to refer to other sources to understand them. (I believe SoNAR should be attributed to Zhao et al 206 and not Bonnay et al 2020.)
  
  We have added a brief description of the ROS, aconitase activity, and NAD assays in lines 198-199, 229-231, and 269 as suggested.
  
  I did not understand the normalisation done with respect to SoNAR. Is this standard practice? Is the assumption that 'overall protein levels will be higher in slowly proliferating NBs' reasonable? This is why they state the need to normalise.
  
  The SoNar normalization is not standard practice. However, we think that our normalization of SoNar is reasonable. According to our results, the expression levels of Dpn and Mira seemed higher upon glial ferritin knockdown, so we speculated that some proteins accumulate in slowly proliferating NBs. We therefore used Insc-GAL4 to drive DsRed as an indicator of the Insc expression level and found that DsRed rose after glial ferritin knockdown, suggesting that Insc expression was indeed increased. Consequently, we have to normalize SoNar driven by Insc-GAL4 to DsRed driven by Insc-GAL4, which eliminates the effect of increased Insc expression upon glial ferritin knockdown.
  
  FAC is mentioned as a chelator? But the authors seem to use it oppositely. Is there an error?
  
  FAC is a type of iron salt, which is used to supply iron. We have also indicated that in line 156 according to your advice.
  
  The lack of any cell death in the L3 brain surprised me. There should be plenty of hemilineages that die, as do many NBs, particularly in the abdominal segments. Is the stain working? Related to this, P35 is not the best method for rescuing cell death. H99 might be a better way to go.
  
  We were also surprised to see this result and repeated this experiment several times with both negative and positive controls. Moreover, we also used TUNEL to validate this result, which led to the same result. We will try to use H99 to rescue NB loss in the future, because it first needs to be integrated and recombined with our current genetic tools.
  
  It would be nice to see the aconitase activity signal as opposed to just the quantification.
  
  This method can only determine the absorbance for indicating aconitase activity, so our result is just the quantification.
  
  Glia are born after NBs are specified. In fact, they arise from NBs (and glioblasts). So, it's unlikely that the knockdown of ferritin in glia can at all affect initial NB specification.
  
  We completely agree with this statement.
  
  The section on tumor suppression seems out of place. The fly data on which the authors base this as an angle to chase is weak. Dividing cells will be impaired if they have inadequate energy production. As a therapeutic, this will affect every cell in the body. I'm not sure that cancer therapeutics is pursuing such broadly acting lines of therapies anymore.
  
  Our data suggested that iron/ferritin is more critical for highly proliferative cells. Tumor cells have a high expression of TfR (Transferrin Receptor)[11], which can bind to both transferrin and ferritin[12], and ferritin specifically targets tumor cells[11]. Thus, we think iron/ferritin is essential for tumor cells. If we can find an appropriate dose of an iron/ferritin inhibitor that suppresses tumor growth while maintaining normal cell growth, iron/ferritin might be an effective target for tumor treatment.
  
  The feedback from NB to glial ferritin is also weak data. The increased cell numbers (of unknown identity) could well be contributing to the increase in ferritin. I would omit the last two sections from the MS.
  
  In brat RNAi and numb RNAi, increased cells are NB-like cells, which cannot undergo further differentiation and are not expected to produce ferritin. More importantly, we used Repo (glia marker) as the reference and quantified the ratio of ferritin level to Repo level, which can exclude the possibility that increased glial cells lead to the increase in ferritin.
  
  References
  
  (1) Tanimura T, Isono K, Takamura T, et al. Genetic Dimorphism in the Taste Sensitivity to Trehalose in Drosophila-Melanogaster. J Comp Physiol, 1982,147(4):433-7
  
  (2) Myster DL, Duronio RJ. Cell cycle: To differentiate or not to differentiate? Current Biology, 2000,10(8):R302-R4
  
  (3) Dalton S. Linking the Cell Cycle to Cell Fate Decisions. Trends in Cell Biology, 2015,25(10):592-600
  
  (4) Nichol H, Law JH, Winzerling JJ. Iron metabolism in insects. Annu Rev Entomol, 2002,47:535-59
  
  (5) Pham DQ, Winzerling JJ. Insect ferritins: Typical or atypical? Biochim Biophys Acta, 2010,1800(8):824-33
  
  (6) Speder P, Brand AH. Systemic and local cues drive neural stem cell niche remodelling during neurogenesis in Drosophila. Elife, 2018,7
  
  (7) Mumbauer S, Pascual J, Kolotuev I, et al. Ferritin heavy chain protects the developing wing from reactive oxygen species and ferroptosis. PLoS Genet, 2019,15(9):e1008396
  
  (8) Xiao G, Wan Z, Fan Q, et al. The metal transporter ZIP13 supplies iron into the secretory pathway in Drosophila melanogaster. Elife, 2014,3:e03191
  
  (9) Marelja Z, Leimkühler S, Missirlis F. Iron Sulfur and Molybdenum Cofactor Enzymes Regulate the Life Cycle by Controlling Cell Metabolism. Front Physiol, 2018,9
  
  (10) Morgan LL. The epidemiology of glioma in adults: a "state of the science" review. Neuro-Oncology, 2015,17(4):623-4
  
  (11) Fan K, Cao C, Pan Y, et al. Magnetoferritin nanoparticles for targeting and visualizing tumour tissues. Nat Nanotechnol, 2012,7(7):459-64
  
  (12) Li L, Fang CJ, Ryan JC, et al. Binding and uptake of H-ferritin are mediated by human transferrin receptor-1. Proc Natl Acad Sci U S A, 2010,107(8):3505-10
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.09.566380v2
www.biorxiv.org www.biorxiv.org

Accelerated signal propagation speed in human neocortical dendrites

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The propagation of electrical signals within neuronal circuits is tightly regulated by the physical and molecular properties of neurons. Since neurons vary in size across species, the question arises whether propagation speed also varies to compensate for it. The present article compares numerous speed-related properties in human and rat neurons. They found that the larger size of human neurons seems to be compensated by a faster propagation within dendrites but not the axons of these neurons. The faster dendritic signal propagation was found to arise from wider dendritic diameters and greater conductance load in human neurons. In addition, the article provides a careful characterization of human dendrites and axons, as the field has only recently begun to characterize post-operative human cells. There are only a few studies reporting dendritic properties and these are not all consistent, hence there is the added value of reporting these findings, particularly given that the characterization is condensed in a compartmental model.
  
  Strengths:
  
  The study was performed with great care using standard techniques in slice electrophysiology (pharmacological manipulation with somatic patch-clamp) as well as some challenging ones (axonal and dendritic patch-clamp). Modeling was used to parse out the role of different features in regulating dendritic propagation speed. The finding that propagation speed varies across species is novel as previous studies did not find a large change in membrane time constant or axonal diameters (a significant parameter affecting speed). A number of possible, yet less likely factors were carefully tested (Ih, membrane capacitance). The main features outlined here are well-known to regulate speed in neuronal processes. The modeling was also carefully done to verify that the magnitude of the effects is consistent with the difference in biophysical properties. Hence, the findings appear very solid to me.
  
  Weaknesses:
  
  The role of diameter in regulating propagation speed is well-known in the axon literature.
  
  We thank the reviewer for this comment. This is indeed true. The paper does not claim that this is new; we just referred to Waxman’s book to acknowledge this established effect. Our main emphasis is on the impact of dendritic (rather than axonal) diameter: we highlight the faster EPSP speed near the input synapse, converging to a steady-state value further away from the soma, and use this to explore the impact of differences in dendritic diameter of rat vs. human on EPSP latency and velocity. We have now made this point clearer in the revised text.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this paper, Oláh and colleagues introduce new research data on the cellular and biophysical elements involved in transmission within the pyramidal circuits of the human neocortex. They gathered a comprehensive set of patch-clamp recordings from human and rat pyramidal neurons to compare how the temporal aspect of neuronal processing is maintained in the larger human neocortex. A broad range of experimental, theoretical, and computational methods are used, including two-photon guided dual whole-cell recordings, electron microscopy, and computational simulations of reconstructed neurons.
  
  Recordings from synaptically connected pyramidal neurons revealed longer intercellular path lengths within the human neocortex. Further, by using dual whole-cell recordings from soma-dendrite and soma-axon locations, they found that short latencies from soma to soma can be partly attributed to an increased propagation speed for synaptic potentials, but not for the propagation of action potentials along the axon.
  
  Next, in a series of extensive computational modeling studies focusing on the synaptic potentials, the authors observe that the short-latency within large human pyramidal neural circuits may have a passive origin. For a wide array of local synaptic input sites, the authors show that the conductance load of the dendrites, electrically coupled to a large diameter apical dendrite, affects the cable properties. The result is a relatively faster propagation of EPSPs in the human neuron.
  
  The manuscript is well-written and the physiological experiments and biophysical arguments are very well explained. I appreciated the in-depth theoretical steps for the simulations. That passive cable properties of the dendrites are causing a higher velocity in human dendrites is interesting but there is a disconnect between the experimental findings and the model simulations. Based on the present data the contribution of active membrane properties cannot be dismissed and deserves further experiments.
  
  See our response below
  
  Strengths:
  
  The authors present state-of-the-art 2P-guided dual whole-cell recordings in human neurons. In combination with detailed reconstructions, these approaches represent the next steps in unravelling the information processing in human circuits.
  
  The computational modeling based on cable theory and experimentally constrained simulations provides an excellent integrated view of the passive membrane properties.
  
  Weaknesses:
  
  There are smaller and larger issues with the statistical analyses of the experimental data which muddle the interim conclusions.
  
  That the cable properties alone are the main explanation for speeding the electrical signaling in human pyramidal neurons appears inconsistent with the experimental data.
  
  This is an excellent point. We indeed performed the analysis only on passive cases, highlighting (and now also ranking) the impact of the various morpho-electrical properties of the neurons on the differences in signal latency in humans vs. rats. We did explore (not shown) the effect of active channels in the dendrites (including the h-current); as expected, the results strongly depend on channel density and spatial distribution over the dendritic tree. As we do not know these parameters for the modelled cells, we decided to remain focused on the impact of passive/morphological parameters. We also note that the experimental results (pages 4-5 in the manuscript) show a minor contribution of the h-current, emphasizing that the passive properties have the main role in the differences between human and rat.
  
  Some of the electrophysiological experiments require further control experiments to make robust conclusions.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This study indicates that connections across human cortical pyramidal cells have identical latencies despite a larger mean dendritic and axonal length between somas in the human cortex. A precise demonstration combining detailed electrophysiology and modeling indicates that this property is due to faster propagation of signals in proximal human dendrites. This faster propagation is itself due to a slightly thicker dendrite, a larger capacitive load, and stronger hyperpolarizing currents. Hence, the biophysical properties of human pyramidal cells are adapted such that they do not compromise information transfer speed.
  
  Strengths:
  
  The manuscript is clear and very detailed. The authors have experimentally verified a large number of aspects that could affect propagation speed and have pinpointed the most important one. This paper provides an excellent comparison of biophysical properties between rat and human pyramidal cells. Thanks to this approach, a comprehensive description of the mechanisms underlying the acceleration of propagation in human dendrites is provided.
  
  Weaknesses:
  
  Several aspects having an impact on propagation speed are highlighted (dendritic diameter, ionic channels, capacitive load) and there is no clear ranking of their impact on signal propagation speed. It seems that the capacitive load plays a major role, much more than dendritic diameter for which only a 10% increase is observed across species. Both aspects actually indicate that there is an increase in passive signal propagation speed with bigger cells at least close to the soma. This suggests that bigger cells are mechanically more rapid. An intuitive reason why capacitive load increases speed would also help the reader follow the demonstration.
  
  We thank the referee for both these excellent points. In response to them:
  
  (i) We have now performed a new comprehensive statistical analysis and show the ranking of the effect of the different morphological/cable factors on EPSP propagation. This analysis appears in Suppl. Tables 5 & 6, Fig. S16, and also in the main text as follows:
  
  To rank the impact of the various factors affecting EPSP propagation latency in human and rat neurons, we conducted a comprehensive statistical analysis using two complementary approaches: the generalized linear model (GLM) (Kiebel & Holmes, 2007) as well as SHAP (SHapley Additive exPlanations) (Lundberg & Lee, 2017) based on fitting a Gradient Tree Boosting (Friedman, 2002) model. We began by fitting a GLM without interaction terms among the factors affecting EPSP latency (Suppl. Table 5). This enables us to quantify the primary individual factors affecting EPSP propagation. Our analysis revealed the following ranking order: 1) physical distance of synapses from soma had the strongest effect; 2) species differences; 3) conductance load, as demonstrated by our “hybrid cells” manipulation; 4) radii of the apical dendrite, affecting the cables’ space constant, λ; and 5) the specific cable parameters, whose effect, as revealed when using per-cell fitted parameters versus uniform cable parameters, was minimal. We next performed GLM analysis with interaction terms showing that, as expected, there are significant interactions between the factors affecting EPSP latency (Suppl. Table 6). To further validate the above ranking while incorporating the interactions between the various factors affecting EPSP latency, we performed a SHAP analysis. Notably, even with interactions included, the ranking of the factors affecting signal propagation is aligned with the results from the analysis based on the GLM without interaction terms (see Fig. S16).
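
  To illustrate the idea behind a Shapley-based ranking, the sketch below computes exact Shapley values by enumeration for a toy latency function with three factors and one interaction. It is not the authors' analysis: the function `toy_latency`, its coefficients, and the factor names are hypothetical, and no boosted-tree model is fitted here.

```python
from itertools import combinations
from math import factorial

FEATURES = ("distance", "load", "radius")


def toy_latency(active):
    """Hypothetical latency (ms) when only the factors in `active` are switched
    from a baseline cell to a comparison cell."""
    latency = 1.0
    if "distance" in active:
        latency += 0.8
    if "load" in active:
        latency -= 0.3
    if "radius" in active:
        latency -= 0.1
    if "distance" in active and "load" in active:
        latency -= 0.2  # interaction between two factors
    return latency


def shapley_values(value, features):
    """Exact Shapley attribution: weighted average marginal contribution of each
    feature over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {feature}) - value(s))
        phi[feature] = total
    return phi


phi = shapley_values(toy_latency, FEATURES)
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
print(phi)
print("ranking:", ranking)
```

  The attributions sum to the difference between the full and baseline outputs, and the interaction term is shared equally between the two interacting factors, which is why a ranking can remain stable even when interactions are present.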
  
  (ii) As for the intuitive explanation requested by the referee, we added the following paragraph in the Discussion:
  
  The intuitive reason for this enhancement is that the large conductance load (the “leaky end” boundary conditions) more effectively “steals” the synaptic (axial) current (like water pouring faster into a large pool). The more mathematical intuition is that the large soma (sink) adds fast time constants to the system (see also related explanation in Fig. 4 in Eyal et al., 2014).
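
  As a complement to the reviewer's remark that dendritic diameter differs by only about 10% across species, the sketch below evaluates the standard passive-cable space constant, λ = sqrt((d/4)·(Rm/Ri)), for two diameters. The membrane and axial resistivities and the 2 µm diameter are hypothetical round numbers, not values fitted in the study.

```python
from math import sqrt


def space_constant_cm(diameter_cm, rm_ohm_cm2, ri_ohm_cm):
    """Space constant of an infinite passive cylinder: sqrt((d / 4) * (Rm / Ri))."""
    return sqrt((diameter_cm / 4.0) * (rm_ohm_cm2 / ri_ohm_cm))


# Hypothetical cable parameters (illustrative only).
RM = 15000.0  # specific membrane resistivity, ohm * cm^2
RI = 150.0    # axial resistivity, ohm * cm

d_thin = 2e-4           # 2 micrometres, expressed in cm
d_thick = 1.1 * d_thin  # 10% thicker dendrite

lambda_thin = space_constant_cm(d_thin, RM, RI)
lambda_thick = space_constant_cm(d_thick, RM, RI)
ratio = lambda_thick / lambda_thin

print(f"lambda (thin)  = {lambda_thin * 1e4:.0f} um")
print(f"lambda (thick) = {lambda_thick * 1e4:.0f} um")
print(f"ratio          = {ratio:.4f}")
```

  Because λ scales with the square root of the diameter, a 10% thicker cylinder lengthens λ by only about 5% under these assumptions, consistent with the reviewer's impression that diameter alone is a modest contributor.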
  
  We thank the editors for considering and revising our manuscript for publication in eLife. We appreciate the positive assessment of the work and the critical points raised by the reviewers. We have responded in detail to all the excellent comments from all reviewers. We believe that these revisions have significantly improved the quality of our study.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  There are two points that could improve the reading experience of this nice manuscript. These should be easily addressed with minor re-phrasing.
  
  Credit to conduction velocity literature. Less widely known in the dendrite literature, in the axon literature, the relationship between propagation speed and process diameter is well established. I thought the two articles cited (Jack Noble Tsien and Agmon-Snir & Segev) were not as direct in the treatment of this relationship. The work of Stephen Waxman, for instance, made clear how axon diameter tightly controls propagation speed (see for instance the Scholarpedia entry by Swadlow and Waxman). In my opinion, this is a widely known piece of work, that is part of some introductory books to neuroscience. While the article does not claim they found this relationship, parts of the presentation are better understood if we ignore this well-known fact. I am referring to the abstract, intro, and the beginning of results where 'larger' is presented as synonymous with 'slower'. For instance 'to compensate for the increase neurons' size' (abstract) or 'the increase in size of dendrites and axons might come with a cost of longer signal propagation times' only makes sense if 'size' refers to spatial extent, not diameter.
  
  We thank the reviewer for this valid point; leaving out axon diameter references was not intentional. We have now added the suggested reference to our manuscript. In the size comparisons, we had only pointed out the obvious size differences between the body and the dendritic processes. We have reworded the sentences with size comparisons.
  
  In Abstract (lines 1-6):
  
  Human-specific cognitive abilities depend on information processing in the cerebral cortex, where neurons are significantly larger, and their processes are longer and sparser compared to rodents. We found that, in synaptically-connected layer 2/3 pyramidal cells (L2/3 PCs), soma-to-soma signal propagation delay is similar in humans and rodents. Thus, to compensate for the neurons’ longer processes, membrane potential changes must propagate faster in human axons and/or dendrites.
  
  In section “Effect of dendritic thickness” in Results we have modified it as follows:
  
  The relationship between conduction velocity and axon diameter is well known for small myelinated and unmyelinated axons (Waxman and Bennett, 1972). Anatomical features of dendrites also have a major influence on signal propagation properties 5,19, thus …
  
  Waxman, S. G. and Bennett, M. V. L. Relative conduction velocity of small myelinated and nonmyelinated fibres in the central nervous system. Nature New Biol., 238:217-219, 1972.
  
  Two or four dendritic factors? The study identifies two major dendritic factors influencing the propagation speed (diameter and load), however the end of the results highlights four factors. I did not understand how factor 2 was different than factor 1. Neither did I understand how factor 4 was different from the other factors. There seemed to be a little redundancy here that could be streamlined.
  
  We thank the reviewer for pointing this out. We have now changed the respective text and added the ranking statistics (see above) to assess the effect of the different parameters on signal propagation in dendrites.
  
  Microcircuits? The study found that the changes in speed arise from the dendrites rather than the axons, as such it seems it would be more precise to replace 'microcircuits' with 'dendrites'.
  
  We are thankful for this suggestion. We have changed the title to Accelerated signal propagation speed in human neocortical dendrites.
  
  Typos
  
  P3 line 24 'find significant difference the propagation'.
  
  P6 line 35 'how morphological differences' it would be useful to specify which morphological difference here.
  
  Corrected.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The statistical analyses should be changed. T-testing populations and comparing visual differences of differences ("human minus rats") is a common but egregious error in the field of neurosciences (see doi:10.1038/nn.2886). The conclusion that HCN channels "... do not by themselves explained the differences between the two species" (lines 174-176) is not compelling. The design of the experiments presented in Figure 3 is paired recordings and the addition of a blocker (ZD7288 or TTX cocktail). These are classic 2 x 2 factorial designs (species x drug). The authors will need to perform a repeated-measured analysis of variance (RM-ANOVA) and provide information on the interaction significance. Please revise the figures and improve statistical reporting. Post-hoc comparisons of the velocity populations are required to support the idea of whether h-channels are explaining the observed differences.
  
  Thank you for drawing our attention to this error. The statistical analysis of the pharmacological experiments was re-performed as suggested. After two-way ANOVA with repeated measures and Bonferroni post-hoc correction, we indeed find significant differences only in the control group, namely that the propagation speed of bAPs in human dendrites was significantly higher. The proposed statistical analysis shows that the administration of ZD has no statistically significant effect on the propagation speed in human or rat dendrites. The treatment with the TTX cocktail resulted in a significant difference in signal propagation in humans but not in rodents. However, the trend is discernible and the P = 0.0588 value is close to the widely accepted 0.05 threshold. After the TTX cocktail treatment, the speed of signal propagation did not differ significantly between the two species, although on average the human dendrites remained faster. These changes in P-values do not affect our primary conclusions. The MS text has been modified accordingly.
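
  The Bonferroni step mentioned in this response can be sketched as below. Only the correction itself is shown; the two-way repeated-measures ANOVA is not reproduced, and the comparison labels and raw p-values are hypothetical placeholders rather than the authors' results.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three post-hoc comparisons (not the authors' data).
labels = ["comparison A", "comparison B", "comparison C"]
raw_p = [0.004, 0.03, 0.2]

ALPHA = 0.05
adjusted_p = bonferroni(raw_p)
significant = [p < ALPHA for p in adjusted_p]

for label, raw, adj, sig in zip(labels, raw_p, adjusted_p, significant):
    print(f"{label}: raw p={raw:.3f}, adjusted p={adj:.3f}, significant={sig}")
```

  A comparison that looks significant before correction (here the second one) can lose significance afterwards, which is the kind of shift in P-values the response describes.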
  
  (2) Although ZD7288, in my opinion, influences the bAP (see point #1) the authors subsequently leave the h-current unblocked in the experiments in Figures 3D, E. Here, they use sodium, potassium, and calcium currents as well as synaptic conductances. I am puzzled why (in line 188) they claim the dendrites are "passive" although the data show h-currents are contributing to the shape of the bAP in human neurons. In line 196 they conclude voltage-gated conductances have a "minor" contribution and passive properties a main role. Please revise conclusions or provide better experimental support.
  
  Thank you for this point. We meant to refer to the state in which no action potential can be generated, although the word 'passive' might be misleading in this context; we have rephrased these sentences in the MS accordingly.
  
  (3) A major concern is the injection of an AP in voltage-clamp mode. Although this is the right choice and I'm in support of the experiment, it is technically challenging to space clamp the soma and fully recapitulate the speed and amplitude of a 100 mV depolarization. The voltage drop in peak amplitude as well as the increased delay between the baseline AP (current clamp) and AP in blocker conditions (voltage clamp) could be fully explained by switching between current- and voltage-clamp modes. In additional control experiments, the authors should add a second voltage follower electrode (CC) at the soma showing whether the authors can preserve the original AP (from CC) in VC/blocker condition. It may well be they need to adjust the injection protocol.
  
  Our experiments were designed to replicate the work of Stuart et al. (1994), in which they compared the attenuation of active and passive backpropagating signals. When they blocked Na+ channels with TTX they injected simulated action potentials in voltage-clamp mode. They concluded that TTX-sensitive Na+ channels cause somatic action potential entry into the dendritic compartment. They found a comparable attenuation of the backward propagating action potential in the dendrites in control conditions (~70 %).
  
  We performed control recordings based on the reviewer’s suggestion (Author response image 1).
  
  Author response image 1.
  
  Injection of the previously recorded AP (blue) in VC mode produced a closely matching somatic AP in CC mode (orange). The slight temporal delay between the two signals is caused by the different positions of the pipettes on the cell body. The right panel plots the two peak-aligned APs against each other; the points lie close to the blue ‘equality’ line. We concluded that the original AP is well preserved in the VC/blocker condition.
  
  (5) From the paragraph entitled "Modeling EPSP propagation in dendrites" and onwards the authors make countless conclusions based on theory and modelling results but without any statistical support. Multiple neurons are used thus it is rather straightforward to provide numerical support for the assertions. For example, but this is not an exhaustive list, how should we interpret that latency ranges are different (line 240, line 253) etc.? Or were the estimated Cm values of human and rat neurons (0.6 versus 1.1) significantly different? And if so, how does this align with the Cm estimates in the nucleated patch experiments?
  
  We thank the referee for this comment and have now added a set of statistical analyses. The results now appear throughout the theoretical part of the revised article, in particular with respect to Figs. 6 and 7, where we now show that our various manipulations (e.g., hybrid vs. original cells) as well as the cable parameters (Cm, Rm) are indeed significantly different between human and rat, whereas the membrane time constant is not. As for Cm, our limited sample shows a significant difference between human and rat. Yet the range of Cm values that we found in our modeling study does fall within the experimental range reported in the present study.
  
  Minor
  
  Line 44. The "simulated EPSP" example in Figure 2C is not a command waveform for an EPSC. Line 526 in the methods states that also ramp currents were used. Please revise to clarify the main text.
  
  Thank you for bringing this discrepancy to our attention. In the experiments, we used ramp injections. We have made this clear in the main text as follows: “... we tested orthodromic or forward propagating signal propagation velocity by injecting short-duration current ramps to simulate EPSP (sEPSP) signals in the dendrites and recorded the resultant subthreshold voltage response in the soma”.
  
  Line 522. The authors state the recordings were all carried out "in current clamp mode" but detailed VC method information is lacking. Did they use series resistance compensation?
  
  We did not use series resistance compensation.
  
  Line 479 From which region(s) were human "neocortical slices" sampled? Please add this information.
  
  We have added regions of origin to the Methods section: frontal (n = 21), temporal (n = 20), parietal (n = 20), and occipital (n = 1).
  
  Please show higher temporal resolution example traces, for example in Figure 3. Differences are at the micrometer scale, but APs are shown at the millisecond scale. Hard to judge the quality of the data. Showing the command potentials (inset Figure 3D, E) is misleading (see major point #3).
  
  In response to the reviewer's request, we have redrawn the example traces in Figure 3.
  
  Please check the labeling of figures. There is information missing. For example, in Figure 5 A to C I am missing information and the units of the axes.
  
  In the black plots on the right side of panels B and C, the y-axis shows the thickness measurements for the given dendrite stacked on top of each other, and the x-axis shows the measurement values. The units for the x-axis are µm, as mentioned in the figure legend.
  
  Line 981 "scalebars" should read "scale bars".
  
  Line 986 "bootstraped" should read "bootstrapped".
  
  Done.
  
  Are the dendritic diameters increased for all basal and apical higher-order branches? It is unclear how the model simulations were built on diameters of primary and higher-order branches.
  
  In our modelling study we took the actual diameter of the reconstructed PCs in both proximal and higher order branches. We did compare per-distance differences in diameter – but it is automatically incorporated into the computation of the basal load (“equivalent cables” in Figs 6&8).
  
  The velocity calculation for axonal propagation (yielding a ~0.9 m/s conduction velocity, Figure 2B) is incorrect. Using the peak of the action potentials between soma and axon misses the fact that action potentials start earlier and spatially distally from the soma in the axon. Please revise the calculation to include the temporal delay and actual distance travelled by the forward propagating action potential.
  
  Thank you for this question. We are aware that the AP is generated at the AIS, which lies between the two recording electrodes, and that the signal therefore also propagates from the AIS back to the soma, which may shorten the measured delay. To the best of our knowledge, there is no experimental evidence on the location of the AP generation site on the AIS in layer 2-3 pyramidal cells of the human neocortex, so we assumed that it is located 35 microns from the soma and that the propagation speed from the AIS is the same in both directions. Consequently, we have corrected our propagation velocity values as follows:
  
  “For the axon bleb recordings we assumed that the axon initial segment (AIS) of the cells are 35 µm from the axon hillock, and the APs propagate to forward (to the bleb) and backward (to the soma) at the same speed. For the correction of the AIS we used the following formula: (2)
  
  where vcorr is the corrected propagation speed for AIS position, l is the axonal distance between the soma and the axon bleb, t is the latency between the two measuring point, ais is the assumed position of the AIS alongside the axon (35 µm).”
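
  The formula itself is not reproduced in this copy of the response. As a sketch only, the assumptions stated above suggest one consistent form: if the AP starts at the AIS (a distance ais from the soma along the axon) and travels at the same speed in both directions, it reaches the soma after ais/v and the bleb after (l - ais)/v, so the measured latency is t = (l - 2·ais)/v and the corrected speed would be v = (l - 2·ais)/t. The function name, argument names, and unit conversion below are illustrative assumptions, not the authors' code.

```python
def corrected_velocity(l_um, t_ms, ais_um=35.0):
    """Propagation speed (m/s) corrected for an assumed AP initiation site.

    Assumes the AP starts ais_um from the soma along the axon and travels
    at the same speed towards the soma and towards the bleb, so the measured
    soma-to-bleb latency reflects a path difference of l_um - 2 * ais_um.
    """
    if t_ms <= 0:
        raise ValueError("latency must be positive")
    path_difference_um = l_um - 2.0 * ais_um
    if path_difference_um <= 0:
        raise ValueError("bleb must lie beyond twice the assumed AIS distance")
    # 1 um/ms equals 1e-3 m/s
    return path_difference_um / t_ms * 1e-3


# Hypothetical numbers: bleb 170 um from the soma, 0.1 ms peak-to-peak latency
print(corrected_velocity(170.0, 0.1))
```

  With these hypothetical numbers the uncorrected estimate l/t would be 1.7 m/s, so under these assumptions the correction lowers the estimate.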
  
  What explains the strongly attenuated axonal action potential at the bleb? Is this representative?
  
  The strongly attenuated axonal action potential at the bleb can be explained by a few key factors:
  
  (1) Membrane Integrity: Bleb formation often indicates some level of membrane damage or alteration. This can disrupt the normal ionic gradients across the membrane, leading to a failure in generating or propagating action potentials effectively.
  
  (2) Current Leakage: Bleb formation may create additional pathways for ion leakage, which can dissipate the electrical current that would normally propagate the action potential. This leakage reduces the overall amplitude of the action potential.
  
  Line 275 "To our delight", please rephrase.
  
  Corrected.
  
  Reviewer #3 (Recommendations For The Authors):
  
  - In Figure 1, the number of cells used to assess intersomatic distance is quite low. A larger number of neuron pairs should be analyzed to be more representative. Or at least an explanation of why such a low sampling can be conclusive.
  
  We appreciate the reviewer’s concerns on sample sizes of the first set of experiments, where the anatomical pathways were measured through the synapses of coupled cells with electrophysiological recordings. We acknowledge that this is a limitation of our study. However, in this series of experiments, we simply wanted to experimentally confirm already known results which consisted of two parts: first, that in humans the dendrites and axons of neurons are longer, and second, that they have the same time delay in terms of synaptic latency.
  
  The reported similarity in synaptic latencies is consistent with the results of a recent study by Campagnola et al. (2022) showing that EPSP latencies of local connections between layer 2/3 pyramidal cells are in the same range in humans and mice (human median latency = 1.73 ms vs. mouse median latency = 1.49 ms). We came to the same conclusion in our previous work, where we compared synaptically coupled pyramidal cell to basket cell pairs in human and rat (Molnár et al. 2016).
  
  On the other hand, we report interspecific differences in cable pathways from soma to soma, again consistent with the literature suggesting that the length of pyramidal neural processes is longer in humans than in rodents (see Supplementary Figure 1 and e.g. Berg et al. 2021).
  
  From a practical point of view, collecting experimental data in this hard-won experiment is particularly difficult. The first step is the electrophysiological recording of a connected pair with appropriate pre- and postsynaptic series resistance, in a setting where human tissue samples are limited. To obtain information about the path of the signals between pre- and postsynaptic cells, an anatomical reconstruction is required. This requires a) a high-quality recovery of postsynaptic dendrites and presynaptic axons, and b) successful tracing of all potential contact points between presynaptic axons and postsynaptic dendrites back to the pre- and postsynaptic soma. The difficulty of the latter point in particular arises from the fact that parts of the presynaptic axonal arbor are myelinated, and the success of biocytin-based tracing depends on the length of the myelinated axon branches. The success or failure of complete axonal tracing only becomes apparent at the end of these efforts.
  
  - The author should provide an intuitive explanation of why capacitive load accelerates propagation in the dendrite.
  
  See answer above
  
  - The author should more clearly rank the contribution of each difference between rat and human neurons. The 10% increase in dendritic diameter which affects velocity only via a square root seems a very weak contribution. This should be clarified.
  
  We have now added a set of statistical methods to perform such a ranking in the theoretical part of this study, as described above (and in a new paragraph, attached above) in the revised article.
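
  As a rough worked example of the reviewer's square-root point, assuming the textbook cable-theory scaling in which propagation velocity grows with the square root of the diameter (an approximation used here for illustration, not a result of this study), a 10% larger diameter alone would give:

```latex
\frac{v_{\text{human}}}{v_{\text{rat}}}
  \approx \sqrt{\frac{d_{\text{human}}}{d_{\text{rat}}}}
  = \sqrt{1.10}
  \approx 1.049
```

  That is, roughly a 5% increase in velocity from diameter alone under this assumption.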
  
  References
  
  Eyal, G., Mansvelder, H. D., de Kock, C. P. J., & Segev, I. (2014). Dendrites impact the encoding capabilities of the axon. Journal of Neuroscience, 34(24), 8063–8071. https://doi.org/10.1523/JNEUROSCI.5431-13.2014
  
  Friedman, J. H. (2002). Stochastic gradient boosting. In Computational Statistics & Data Analysis (Vol. 38). www.elsevier.com/locate/csda
  
  Kiebel, S. J., & Holmes, A. P. (2007). The General Linear Model. In K. Friston, J. Ashburner, S. Kiebel, T. Nichols, & P. William (Eds.), Statistical Parametric Mapping (pp. 101–125). Academic Press.
  
  Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–4777.
  
URL

biorxiv.org/content/10.1101/2022.09.30.510270v3
www.biorxiv.org www.biorxiv.org

Assemblies, synapse clustering and network topology interact with plasticity to explain structure-function relationships of the cortical connectome

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  General response
  
  Our modeling study integrates recent experimental advances on dendritic physiology, biophysical plasticity rules, and network connectivity motifs into a single model, aiming to clarify their hypothesized inseparable functional roles in neocortical learning. By modelling excitatory plasticity in multi-synaptic connections on dendrites within a network with biologically constrained higher-order structure, we show these aspects are sufficient to account for a wide range of interesting phenomena: First, the calcium-based plasticity rule acted sparsely and specifically, keeping the network stable without requiring homeostatic mechanisms or inhibitory plasticity, as usually employed for models based on STDP rules. Most importantly, simulations of the network initiated in a recurrent-excitation induced synchronous state transitioned to an in vivo-like asynchronous state, and remained there. Second, plastic changes were stimulus-dependent and could be predicted by neurons’ membership in functional assemblies, spatial clustering of synapses on dendrites, and the topology of the network’s connectivity. Several of our predictions could be confirmed by comparison to the MICrONS dataset.
  
  Our study thus aims to provide a first broad exploration of these phenomena and their interactions in a model, as well as a foundation for future studies that examine specific aspects more deeply. Specific concerns of the reviewers about parameter choices (reviewer 2’s 2nd point - 2.2), claims about stability (2.1 and 3.1), the STDP control (1.5), and the motivation behind network metrics (1.8, 2.3) are addressed in detail below and in the revised manuscript.
  
  Reviewer #1 (Public review):
  
  This paper investigates the dynamics of excitatory synaptic weights under a calcium-based plasticity rule, in long (up to 10 minutes) simulations of a 211,000-neuron biophysically detailed model of a rat cortical network.
  
  Strengths
  
  (1) A very detailed network model, with a large number of neurons, connections, synapses, etc., and with a huge number of biological considerations implemented in the model.
  
  (2) A carefully developed calcium-based plasticity rule, which operates with biologically relevant variables like calcium concentration and NMDA conductances.
  
  (3) The study itself is detailed and thorough, covering many aspects of the cellular and network anatomy and properties and investigating their relationships to plasticity.
  
  (4) The model remains stable over long periods of simulations, with the plasticity rule maintaining reasonable synaptic weights and not pushing the network to extremes.
  
  (5) The variety of insights the authors derive in terms of relationships between the cellular and network properties and dynamics of the synaptic weights are potentially interesting for the field.
  
  (6) Sharing the model and the associated methods and tools is a big plus.
  
  We thank the reviewer for their comments.
  
  Weaknesses
  
  (1) Conceptually, there seems to be a missed opportunity here in that it is not clear what the network learns to do. The authors present 10 different input patterns, the network does some plasticity, which is then analyzed, but we do not know whether the learning resulted in anything functionally significant. Did the network learn to discriminate the patterns much better than at the beginning, to capture or anticipate the timing of pattern presentation, detect similarities between patterns, etc.? This is important to understand if one wants to assess the significance of synaptic changes due to plasticity. For example, if the network did not learn much new functionally, relative to its initial state, then the observed plasticity could be considered minor and possibly insufficient. In that case, were the network to learn something substantial, one would potentially observe much more extensive plasticity, and the results of the whole study could change, possibly including the stability of the network. While this could be a whole separate study, this issue is of central importance, and it is hard to judge the value of the results when we do not know what the network learned to do, if anything.
  
  (1.1) The reviewer raises a very interesting point of discussion. As they remarked, it is very hard to judge what the network learned to do. However, our model was not designed to solve a specific task, and even defining precisely what "learning" entails in a primary sensory region is still an open question. Like many before us, we hypothesized that one of the roles of the primary somatosensory cortex would be to represent stimulus features and that most of the learning process would happen in an unsupervised manner. This is indeed what we have demonstrated by showing the stimulus-specificity of changes as well as an increase in the reliability of assembly sequences between repetitions after plasticity. We have added this to the Discussion in lines 523-525.
  
  (2) In this study, plasticity occurs only at E-to-E connections but not at others. However, it is well known that inhibitory connections in the cortex exhibit at the very least a substantial short-term plasticity. One would expect that not including these phenomena would have substantial consequences on the results.
  
  (1.2) This is indeed well known. Please consider that we do have short-term plasticity (called synapse dynamics in the manuscript) at all connections, including inhibitory ones. We thank the reviewer for pointing out this potential confusion in the wording. We have now clarified this in the Methods in lines: 691-697. Furthermore, we have listed not having long-term plasticity at inhibitory connections in the limitations part of the Discussion in line: 593.
  
  (3) Lines 134-135: "We calibrated layer-wise spontaneous firing rates and evoked activity to brief VPM inputs matching in vivo data from Reyes-Puerta et al. (2015)."
  
  (4) Can the authors show these results? It is an important comparison, and so it would be great to see firing rates (ideally, their distributions) for all the cell types and layers vs. experimental data, for the evoked and spontaneous conditions.
  
  (1.3) The layer- and cell-type-specific spontaneous firing rates were indeed hidden in the Methods and in Supplementary Figure S3. We now reference that figure in the Results in line: 136. Furthermore, we have amended Supplementary Figure S3 (panel A2) to show these rates in the evoked state as well.
  
  (5) That being said, the Reyes-Puerta et al. paper reports firing rates for the barrel cortex, doesn't it? Whereas here, the authors are simulating a non-barrel cortex. Is such a comparison appropriate?
  
  (1.4) As correctly pointed out by the reviewer, we made the assumption that these rates would generalize to the whole S1 because of the sparsity of experimental data. This assumption is discussed at length in Isbister et al. (2023) and now in the limitations part of the Discussion in lines: 564-568.
  
  (6) Comparison with STDP on pages 5-7 and Figure 2: if I got this right, the authors applied STDP to already generated spikes, that is, did not run a simulation with STDP. That seems strange. The spikes they use here were generated by the system utilizing their calcium-based plasticity rule. Obviously, the spikes would be different if STDP was utilized instead. The traces of synaptic weights would then also be different. The comparison therefore is not quite appropriate, is it?
  
  (1.5) Yes, the reviewer's understanding is correct. However, considering the findings of Morrison et al. 2007 [PMID: 17444756], and Zenke et al. 2017 [PMID: 28431369] (cited in the manuscript in lines: 165-166), running STDP in a closed loop simulation would most likely make the network “blow up” because of the positive feedback loop. Thus, we argue that our comparison is more conservative, since by using pre-generated spikes, we opened the loop and avoided positive feedback. This is now further explained in lines: 166-167.
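
  The open-loop idea can be sketched as follows: a pair-based STDP rule is evaluated on spike times that are already fixed, so the resulting weight change never feeds back into the spiking. This is a generic textbook pair-based rule with all-to-all spike pairing; the function name, amplitudes, and time constants are hypothetical and are not the STDP control used in the manuscript.

```python
import math


def stdp_weight_change(pre_spikes, post_spikes,
                       a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0):
    """Total weight change from pair-based STDP on fixed spike times (ms).

    Open loop: the spike trains are inputs only, so the returned weight
    change cannot alter the spikes and no positive feedback can build up.
    """
    dw = 0.0
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            dt = t_post - t_pre
            if dt > 0:
                # pre before post: potentiation
                dw += a_plus * math.exp(-dt / tau_plus)
            elif dt < 0:
                # post before pre: depression
                dw -= a_minus * math.exp(dt / tau_minus)
    return dw


# Hypothetical spike times recorded from an earlier simulation
print(stdp_weight_change([10.0, 50.0], [15.0, 45.0]))
```

  In a closed-loop simulation the updated weight would change subsequent spiking, which is the feedback path the response argues against running.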
  
  (7) Section 2.3 and Figure 5: I am not sure this analysis adds much. The main finding is that plasticity occurs more among cells in assemblies than among all cells. But isn't that expected given what was shown in the previous figures? Specifically, the authors showed that for cells that fire more, plasticity is more prominent. Obviously, cells that fire little or not at all won't belong to any assemblies. Therefore, we expect more plasticity in assemblies.
  
  (1.6) We thank the reviewer for this comment. We added additional panels (G1 and G2) to Figure 5 (and describe their content in lines: 329-337) showing that this is not the case. Firing rate alone is indeed predictive of plastic changes, but co-firing in assemblies is even more so.
  
  (8) Section 2.4 and Figure 6: It is not clear that the results truly support the formulation of the section's title ("Synapse clustering contributes to the emergence of cell assemblies, and facilitates plasticity across them") and some of the text in the section. What I can see is that the effect on rho is strong for non-clustered synapses (Figure 6C and Figure S8A). In some cases, it is substantially higher than what is seen for clustered synapses. Furthermore, the wording "synapse clustering contributes to the emergence of cell assemblies" suggests some kind of causal role of clustered synapses in determining which neurons form specific cell assemblies. I do not see how the data presented supports that. Overall, it appears that the story about clustered synapses is quite complicated, with both clustered and non-clustered synapses driving changes in rho across the board.
  
  (1.7) We agree with the reviewer that it is “quite complicated”, and we also see that the writing could have been more precise and better supported by the data shown in the figure. We updated both the section title and a large part of the text to take the suggestions into account in lines: 361-373.
  
  (9) Section 2.5 and Figure 7: Can we be certain that it is the edge participation that is a particularly good predictor of synaptic changes and/or strength, as opposed to something simpler? For example, could it be the overall number of synapses, excitatory synapses, or something along these lines, that the source and/or target neurons receive, that determine the rho dynamics? And then, I do not understand the claim that edge participation allows one to "delineate potentiation from depression". The only related data I can find is in Figure 7A3, about which the authors write "this effect was stronger for potentiation than depression". But I don't see what they mean. For both depression and facilitation, the changes observed are in the range of ~12% of probability values. And even if the effect is stronger, does it mean one can "delineate" potentiation from depression better? What does it mean, to "delineate"? If it is some kind of decoding based on the edge participation, then the authors did not show that.
  
  (1.8) We thank the reviewer for this comment. We have included an analysis of the predictive power of the indegree of the pre- and postsynaptic neuron of a connection on the rho dynamics in Figure 7 (panel B). Please consider that the rho dynamics are described on the level of connections, while properties like indegree are on the level of nodes. Any procedure transferring a node-based property to an edge-based property involves choices: for example, should the values be added or multiplied, should one be preferred over the other, or should they be considered independently? As edge-based metrics avoid these arbitrary choices, we would argue that they are - ultimately - the simpler and more natural choice in this context.
  
  Though we believe that the metric of edge participation is simple, we recognize it is perhaps not common. Thus, we have switched to a version of it that is perhaps more intuitive for the community at large, i.e., a metric of common innervation. Moreover, we have changed the name “(k+2) edge participation” to “(k)-edge indegree”, to make it even more accessible. For k=0, this is the number of neurons that commonly innervate the connection, i.e., a common neighbour. For k=1, this is the number of connections that commonly innervate the connection. This is equivalent to edge participation from the next to last to the last neuron in a simplex. Furthermore, in lines: 391-418 we have added text and references explaining the intuition of why we think this metric is relevant, as it has been shown to affect correlated activity of pairs of neurons, as well as assembly formation.
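
  For the k=0 case described above, the count can be sketched directly from a list of directed connections: for a connection from a presynaptic to a postsynaptic neuron, count the other neurons that project to both of them. This is a minimal illustration under that reading of the definition; the function name and the toy network are hypothetical, and the k=1 case is not covered.

```python
def zero_edge_indegree(edges, connection):
    """Number of neurons that innervate both endpoints of a connection.

    edges: iterable of (source, target) pairs for directed connections.
    connection: the (pre, post) pair whose common innervation is counted.
    """
    presynaptic = {}
    for source, target in edges:
        presynaptic.setdefault(target, set()).add(source)
    pre, post = connection
    shared = presynaptic.get(pre, set()) & presynaptic.get(post, set())
    # The endpoints themselves are not counted as common neighbours
    return len(shared - {pre, post})


# Toy network: neurons 0 and 3 both project to neurons 1 and 2
toy_edges = [(0, 1), (0, 2), (3, 1), (3, 2), (1, 2), (4, 2)]
print(zero_edge_indegree(toy_edges, (1, 2)))  # neurons 0 and 3 -> 2
```

  Because the count is defined on the connection itself, no rule for combining two node-level values is needed, which is the design argument made in (1.8).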
  
  Furthermore, we have clarified the language referring to potentiation and depression in lines: 420-422 and 448.
  
  (10) "test novel predictions in the MICrONS (2021) dataset, which while pushing the boundaries of big data neuroscience, was so far only analyzed with single cells in focus instead of the network as a whole (Ding et al., 2023; Wang et al., 2023)." That is incorrect. For example, the whole work of Ding et al. analyzes connectivity and its relation to the neuron's functional properties at the network level.
  
  (1.9) We thank the reviewer for pointing this out. Indeed, the sentence was improperly worded. We have appropriately changed this phrasing in lines: 616-618.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper aims to understand the effects of plasticity in shaping the dynamics and structure of cortical circuits, as well as how that depends on aspects such as network structure and dendritic processing.
  
  Strengths:
  
  The level of biological detail included is impressive, and the numerical simulations appear to be well executed. Additionally, they have done a commendable job in open-sourcing the model.
  
  We thank the reviewer for their comments.
  
  Weaknesses:
  
  The main result of this work is that activity in their network model remains stable without the need for a homeostatic mechanism. However, as the authors acknowledge, this has been demonstrated in previous studies (e.g., Higgins et al. 2014). In those studies, stability was attributed to calcium-based rules combined with calcium concentrations at in vivo levels and background neuronal activity. Since the authors use the same calcium-based rule, it is unclear what new result, if any, is being presented. If the authors are suggesting that the mechanism in their simulations differs, that should be stated clearly, and evidence supporting that claim should be provided.
  
  (2.1) We do not see this as the main result of our study, but rather as a critical validation step, since our calcium rule, while similar to previous ones, is not exactly the same (see equations (1) and especially (2) in Methods). This has been clarified in the text in lines: 150-151. Note in particular that one of the main differences is the stochastic synaptic transmission and the role of calcium concentration in the release probability. Furthermore, our model involves multicompartmental neurons instead of point neuron models, which to our knowledge has never been tested before with calcium-based plasticity rules at the network level. Moreover, determining the time required for stability to be reached is a necessary step to set up the simulation parameters to test the main hypotheses about rules governing the plastic changes.
  
  The other findings discussed in the paper are related to a characterization of the dependency of plastic changes on network structure. While this analysis is potentially interesting, it has the following limitations.
  
  First, I believe the authors should include an analysis of the generality and specificity of their results. All the findings seem to be derived from a single run of the simulation. How do the results vary with different network initializations, simulation times, or parameter choices?
  
  (2.2) All simulations were run with 3 different random seeds (mentioned in the Methods), and this is now shown in Supplementary Figure S8 for some selected analyses. The maximum duration of our simulations was limited by our hardware constraints. However, from the long (10 minutes) simulation we concluded that most changes happen within the first minute. This is how we determined 2 minutes as the simulation time for all other experiments. Parameters determining both the spontaneous and evoked network state are discussed at length in Isbister et al. (2023), and while we acknowledge that they are only shown in Supplementary Figure S3, we did not want to lengthen the manuscript with redundant details but rather refer the reader to the manuscript where this is discussed at large.
  
  Crucially, we tried slightly different parameters of the plasticity model in the early phases of the research, and while they changed the exact numerical values of our results, the main trends (i.e., stabilization time, assemblies, synapse clustering, and network topology influencing plastic changes) remained unchanged. This is now shown in Supplementary Figure S13 and referenced in the Discussion in lines: 572-575.
  
  Second, the presentation of the results is difficult to follow. The characterization comes across as a long list of experiments, making it hard to identify a central message or distinguish key findings from minor details. The authors provide little intuition about why certain outcomes arise, and the complexity of the simulation makes it challenging - if not impossible - to determine which model elements are essential for specific results and which mechanisms drive emergent properties. Additionally, the text often lacks crucial details. For instance, the description of k-edge participation should be expanded, and an explanation of what this method quantifies should be included. Overall, I believe the authors should focus on a smaller set of significant results and provide a more in-depth discussion.
  
  (2.3) We acknowledge the complexity of these large-scale simulations and the interpretation of their results. We appreciate the reviewer's feedback on the areas that needed more detail. To address this, we have extended the Results section describing k-edge indegree with more background and intuition in lines: 391-418. See also our reply to reviewer 1 (1.8) above.
  
  While the manuscript may appear to be "a long list of experiments," it is actually guided by the following logic: We chose a calcium-based rule because it was the natural choice in a multicompartmental model which already included calcium dynamics and NMDA receptors. After setting up the main network state, verifying stability (Figure 2), doing traditional basic analysis (Figure 3), and verifying that the changes are non-random (Figure 4), we elaborated on long-standing ideas about co-firing in cell assemblies (Figure 5) and spatial clustering of synapses on dendrites (Figure 6) interacting with plasticity. Finally, as we had access to the network’s non-random connectivity, we tried to link the network's topology to the observed plastic changes. This was done with a higher-order perspective, given that there was previous evidence for the relevance of these structures to co-firing and correlated activity.
  
  While we understand the frustration, we would highlight that the study is the first of its kind at this scale and level of biological detail. Our goal was to offer a broad exploration of the factors influencing plasticity and their interactions at this scale, thus laying the groundwork for future studies to investigate specific aspects more deeply.
  
  The comparison of the model with the MICrONS dataset could be improved. In Figure 7B, the authors should show how the same quantification looks in a network model without plasticity. In Figure 8B, the data aligns with the model before plasticity, so it's unclear how this serves as a verification of the theoretical predictions.
  
  (2.4) Our only claim is that, because we are used to working with both functional and structural data, we were able to develop a metric (k-edge indegree) that could also be used to study the non-random, high-order topology of the MICrONS connectivity. In Figure 8, spike correlations in MICrONS more or less align with both cases (before vs. after plasticity); spike correlations in the model looked different enough between the two cases that we thought they were worth showing for both. Moreover, as the changes are sparse (Figures 2 and 3), the synapse strength panel of Figure 7(D) looks almost exactly the same before plasticity (see first two panels of Author response image 1). In line with our results, the small but significant changes increase as k-edge indegree increases (last panel of Author response image 1). As the first two panels look almost the same and the third one is shown in a slightly different way (Figure 7C2), we would prefer not to include this in the manuscript, but only in our response.
  
  Author response image 1.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Ecker et al. utilized a biologically realistic, large-scale cortical model of the rat's non-barrel somatosensory cortex, incorporating a calcium-dependent plasticity rule to examine how various factors influence synaptic plasticity under in vivo-like conditions. Their analysis characterized the resulting plastic changes and revealed that key factors, including the co-firing of stimulus-evoked neuronal ensembles, the spatial organization of synaptic clusters, and the overall network topology, play an important role in affecting the extent of synaptic plasticity.
  
  Strengths:
  
  The detailed, large-scale model employed in this study enables the evaluation of diverse factors across various levels that influence the extent of plastic changes. Specifically, it facilitates the assessment of synaptic organization at the subcellular level, network topology at the macroscopic level, and the co-activation of neuronal ensembles at the activity level. Moreover, modeling plasticity under in vivo-like conditions enhances the model's relevance to experiments.
  
  We thank the reviewer for their comments.
  
  Weaknesses:
  
  (1) The authors claimed that, under in vivo-like conditions and in the presence of plasticity, firing rates and weight distributions remain stable without additional homeostatic mechanisms during a 10-minute stimulation period. However, the weights do not reach the steady state immediately after the 10-minute stimulation. Therefore, extended simulations are necessary to substantiate the claim.
  
  (3.1) We thank the reviewer for this comment, as it gave us the opportunity to clarify in the text our stabilization criteria. Indeed, the dynamical system of weight changes has not reached a zero-change steady state because the changes, while small, are non-zero. However, in a stochastic system with ongoing activity (stimulus- or noise-driven), non-zero changes are expected. Thus, we consider the system to be at steady state when changes become negligible relative to a null model given by a random walk. Our results show that this condition is met around the 2-minute mark, with negligible changes in the subsequent 8 minutes.
  
  Moreover, for spontaneous activity, we showed that an unstable network exhibiting synchronous activity can be stabilized into an asynchronous regime by the calcium-based plasticity rule within 10 minutes. These results show that the system reaches a stochastic steady state within 10 minutes without requiring homeostatic mechanisms. Our work reveals that incorporating more biological detail (i.e. calcium-based plasticity) reduces the need for additional mechanisms to stabilize network activity (e.g. fast homeostatic mechanisms).
  
  Interestingly, one might argue that after 10 minutes of stimulation the network might transition to a different weight configuration if the stimuli change or cease. We agree this is an intriguing question, which we added to the Discussion in lines 611-613. However, this scenario concerns continuous learning, not the system’s steady-state dynamics.
  
  (2) Another major limitation of the paper lies in its lack of mechanistic insights into the observed phenomena (particularly on aspects that are typically impossible to assess in traditional simplified models, like layer-specific and layer-to-layer pathways-specific plasticity changes), as well as the absence of discussions on the potential computational implications of the corresponding observed plastic changes.
  
  (3.2) Our study integrates recent experimental advances, aiming to clarify their hypothesized inseparable functional roles in neocortical learning. In particular, we study three different kinds of mechanistic insight: co-firing in assemblies (Figure 5), synapse clustering on postsynaptic dendrites (Figure 6), and high-order network topology (Figure 7). Furthermore, layer specificity is shown (Figure 3A1, B1, B2, D1) and so is layer-to-layer specificity (Figure 4A2). In addition, we describe synapse clustering on postsynaptic dendrites (Figure 6), which is not available in simplified models either.
  
  As such, the mechanistic insights provided in our work are integrative in nature and aim to provide a first broad exploration of these phenomena and their interactions, which are rarely considered together in experimental or modelling studies. This foundation paves the way for future studies that examine specific aspects more deeply at this level of biological detail.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) I would suggest the authors explain more explicitly that their study uses plasticity for E-to-E connections and not others. Doing so in multiple places in the paper, but certainly in Methods and early in Results, would be helpful. This is stated in lines 117-119 ("To simulate long-term plasticity, we integrated our recently published calcium-based plasticity model that was used to describe functional long-term potentiation and depression between pairs of pyramidal cells"), but could be highlighted more.
  
  We have added it to several lines in the Methods: 621, 648, 649.
  
  (2) "Simulations were always repeated at least three times to assess the consistency of the results." This sounds important. How is this used for the analysis? Do the results reported combine the data from the 3 simulations? How did the authors check the "consistency of the results"? Did they run any statistical tests comparing the results between the 3 simulations or was it more of a visual check?
  
  The reported results come from a single simulation. Three simulations were run to check that no obvious qualitative differences could be found, such as a change of network regime or of the association between stimuli and assemblies. No statistical tests can be run with samples of size three. These are now shown in Supplementary Figure S8, and additional clarifying text has been added in Methods line: 722.
  
  (3) "We needed 12M core hours to run the simulation presented in this manuscript." The Methods section mentions ~2.4 M core hours for a 10-minute simulation, which may be confusing. It might be helpful to provide a table with all the simulations run for this study.
  
  We wanted to provide a rough estimate of the runtime, but did not run a deep profiling of all campaigns. The results depend on the actual hardware and configurations used (e.g., temporal resolution of synapse reporting). We understand the potential source of confusion and have clarified this in the Methods in lines 719-721 (and took it out from the Discussion).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) I found the paper somewhat challenging to follow, as there are many small points, making it unclear what the main message is. It sometimes feels like a list of 'we did this and found that.' It might be helpful if the authors focused on a smaller number of key results with more in-depth discussion. For instance, the discussion of network topology on page 9 is intriguing but condensed into a single, dense paragraph that is hard to follow. Clarifying how the random control is generated would also be beneficial.
  
  See our response to the public review’s third point (2.3).
  
  (2) Line 245: typo? "Furthermore, the maximal simplex dimension found in the subgraph was two higher than expected by chance.".
  
  We changed the grammar in line: 249.
  
  (3) Line 410: typo? "It has been previously shown before that assemblies have many edges".
  
  Noted and fixed in line: 463.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) The authors claimed that plasticity operates in a sparse and specific manner, with firing rates and weight distributions remaining stable without additional homeostatic mechanisms. However, as shown in Figure 2D inset, the weights do not reach their steady-state values immediately after the 10-minute stimulation. A similar issue is observed in Figure 2G. It would be necessary to show the claim is indeed true as the weights reach the steady states.
  
  See our response to the public review’s first point (3.1).
  
  (2) In the model, synapses undergo both short- and long-term plasticity, but the contribution of short-term plasticity to the stated claim is unclear. It would be helpful to demonstrate how the results of Figure 2 are affected when short-term plasticity is excluded.
  
  STP is needed to achieve the asynchronous in vivo-like firing state in our model (and is intimately linked to the fitting procedure of the plasticity rules - mean-field approximation is not possible due to the important role of synaptic failures in thresholded plasticity outcomes), thus it cannot be excluded. We have added this to the Methods in lines: 691-697.
  
  (3) It would be helpful to include a supplementary plot, similar to Figure 2F, illustrating the corresponding results for STDP.
  
  This is not possible as we did not run a different simulation with STDP; we only evaluated the changes in connections with an STDP model using spikes from our simulation. We did not incorporate the STDP equations into our detailed network, as there is no canonical or unambiguous way of doing so (e.g., one would need to handle the fact that the connections are multi-synaptic). Note, however, that considering the findings of Morrison et al. 2007 [PMID: 17444756] and Zenke et al. 2017 [PMID: 28431369] (cited in the manuscript in lines: 165-166), running STDP in a closed-loop simulation would most likely make the network “blow up” because of the positive feedback loop.
  
  (4) It would be helpful to provide mechanistic insights into the current observations and to discuss the potential computational implications of the observed plastic changes. Particularly on aspects that are typically impossible to examine in traditional models, like layer-specific plastic changes presented in Fig. 3A1, B1, B2, D1, and layer-to-layer pathways-specific plastic changes illustrated in Figure 4A2.
  
  See our response to the public review’s second point (3.2).
  
  (5) The use of the term 'assembly' in most places of the manuscript may cause confusion. To enhance clarity and foster effective discussions in the field, I would recommend replacing it with 'ensemble,' as suggested in Miehl et al. (2023), 'Formation and computational implications of assemblies in neural circuits' (The Journal of Physiology, 601(15), 3071-3090), which should also be cited.
  
  We read the mentioned manuscript when it was published (and appreciated it a lot), now reference it, and explain why we did not exactly follow the suggestion in lines: 293-299.
  
  (6) The title of Figure 5 is not directly supported by the current figure. To strengthen the alignment, it would be helpful to present the results from lines 303-306 in bar plots and incorporate them into Figure 5 to better substantiate the figure title.
  
  While the mentioned lines compare maximum values to those within the whole dataset, we think those 2*12*12 values are better presented in condensed matrices than bar plots (while the maximum values are still easily grasped from the colorbars). We have added panel G2 to the figure to address a comment by reviewer 1 (1.7); we believe that this further supports the title of the Figure.
  
  (7) Line 326, cite "Kirchner, J. H., & Gjorgjieva, J. (2021). Emergence of local and global synaptic organization on cortical dendrites. Nature Communications, 12(1), 4005." and "Kirchner, J. H., & Gjorgjieva, J. (2022). Emergence of synaptic organization and computation in dendrites. Neuroforum, 28(1), 21-30."
  
  Although we were aware of the mentioned manuscripts, we did not include them originally because they are models of a different species. However, we have now cited these in line: 347.
  
  (8) The contrast results for ensembles 11 and 12 do not appear to support the claims made in lines 339-341. Clarification on this point would be helpful.
  
  The reviewer is right; we have updated lines 360-361 to clarify the difference between the two late assemblies.
  
  (9) For Figure 6C and 6D in Section 2.4, rather than presenting the results for individual ensembles (which could be moved to the supplementary materials), it would be easier if the authors could summarize the results by grouping them into three categories: early, middle, and late ensembles.
  
  We agree with the reviewer’s suggestion and tried it before, but as the results also depend slightly on functional assembly size (not only temporal order), averaging them loses information (see the different xlims of the panels). Given that the issue is complex, we decided to show all the data in the Figure, but we have now revised the text to provide a more high-level interpretation.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.07.552264v6
www.biorxiv.org

MGPfactXMBD: A Model-Based Factorization Method for scRNA Data Unveils Bifurcating Transcriptional Modules Underlying Cell Fate Determination

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  Comment#1: Ren et al developed a novel computational method to investigate cell evolutionary trajectory for scRNA-seq samples. This method, MGPfact, estimates pseudotime and potential branches in the evolutionary path by explicitly modeling the bifurcations in a Gaussian process. They benchmarked this method using synthetic as well as real-world samples and showed superior performance for some of the tasks in cell trajectory analysis. They further demonstrated the utilities of MGPfact using single-cell RNA-seq samples derived from microglia or T cells and showed that it can accurately identify the differentiation timepoint and uncover biologically relevant gene signatures. Overall I think this is a useful new tool that could deliver novel insights for the large body of scRNA-seq data generated in the public domain. The manuscript is written in a logical way and most parts of the method are well described.
  
  Thank you for reviewing our manuscript and for your positive feedback on MGPfact. We are pleased that you find it useful for identifying differentiation timepoints and uncovering gene signatures. We will continue to refine MGPfact and explore its applications across diverse datasets. Your insights are invaluable, and we appreciate your support.
  
  Comment#2: Some parts of the methods are not clear. It should be outlined in detail how pseudo time T is updated in Methods. It is currently unclear either in the description or Algorithm 1.
  
  Thank you for the comment. We've added a description of how pseudotime T is obtained between lines 138 and 147 in the article. In brief, the pseudotime of MGPfact is inferred through Gaussian process regression on the downsampled single-cell transcriptomic data. Specifically, T is treated as a continuous variable representing the progression of cells through the differentiation process. We describe the relationship between pseudotime and expression data using the formula:
  
  Where f(T) is a Gaussian Process (GP) with covariance matrix S, and ε represents the error term. The Gaussian process is defined as:
  
  Where the variance is set to 1e-6.
  
  During inference, we update the pseudotime by maximizing the posterior likelihood. Specifically, the posterior distribution of pseudotime T can be represented as:
  
  Here the posterior is proportional to the product of the likelihood function of the observed data Y* and the prior distribution of the Gaussian process. This posterior distribution integrates the observed data with model priors, enabling inference of pseudotime and trajectory simultaneously. Due to the high autocorrelation of T in the posterior distribution, we use Adaptive Metropolis within Gibbs (AMWG) sampling (Roberts and Rosenthal, 2009; Tierney, 1994). Other parameters are estimated using the more efficient slice sampling technique (Neal, 2003).
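
The displayed equations referred to in this response are not preserved in the text above. As a hedged sketch only, assuming the standard Gaussian process regression form, the three relations being described would read roughly as follows. The symbols \sigma^{2}, I and p(\cdot) are assumptions introduced here for illustration and are not taken from the manuscript; in particular, treating the 1e-6 variance as the error-term variance is a guess.

```latex
\begin{align}
Y^{*} &= f(T) + \varepsilon, \\
f(T) &\sim \mathcal{GP}(0, S), \qquad
\varepsilon \sim \mathcal{N}\!\left(0, \sigma^{2} I\right), \quad \sigma^{2} = 10^{-6}, \\
p\!\left(T \mid Y^{*}\right) &\propto p\!\left(Y^{*} \mid T\right)\, p(T).
\end{align}
```

In this reading, the first factor on the right of the last line plays the role of the likelihood of Y* and the second the role of the prior, matching the verbal description in the response.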
  
  Comment#3: There should be a brief description in the main text of how synthetic data were generated, under what hypothesis, and specifically how bifurcation is embedded in the simulation.
  
  Thank you for the reviewers' comments. We have added descriptions regarding the synthetic dataset in the methods section. The revised content is from line 487 to 493:
  
  “The synthetic datasets were generated using four simulators: dyngen (Saelens et al., 2019), dyntoy (Saelens et al., 2019), PROSSTT (Papadopoulos et al., 2019), and Splatter (Zappia et al., 2017), each modeling different trajectory topologies such as linear, branching, and cyclic. Splatter simulates branching events by setting expression states and transition probabilities, dyntoy generates random expression gradients to reflect dynamic changes, and dyngen focuses on complex branching structures within gene regulatory networks.”
  
  Comment#4: Please explain what the abbreviations mean at their first occurrence.
  
  We appreciate the reviewers' feedback. We have thoroughly reviewed the entire manuscript and made sure that all abbreviations have had their full forms provided upon their first occurrence.
  
  Comment#5: In the benchmark analysis (Figures 2/3), it would be helpful to include a few trajectory plots of the real-world data to visualize the results and to evaluate the accuracy.
  
  We appreciate the reviewer's feedback. To more clearly demonstrate the performance of MGPfact, we selected three representative cases from the dataset for visual comparison. These cases represent different types of trajectory structures: linear, bifurcation, and multifurcation. The revised content is between line 220 and 226.
  
  As shown in Supplementary Fig. 5, it is evident that MGPfact excels in capturing main developmental paths and identifying key bifurcation points. In the linear trajectory structure, MGPfact accurately predicted the linear structure without bifurcation events, showing high consistency with the ground truth (overall=0.871). In the bifurcation trajectory structure, MGPfact accurately captured the main bifurcation event (overall=0.636). In the multifurcation trajectory structure, although MGPfact predicted only one bifurcation point, its overall structure remains close to the ground truth, as evidenced by its high overall score (overall=0.566). Overall, MGPfact demonstrates adaptability and accuracy in reconstructing various types of trajectory structures.
  
  Comment#6: It is not clear how this method selects important genes/features at bifurcation. This should be elaborated on in the main text.
  
  Thank you for the comment. To enhance understanding, we've added detailed descriptions of gene selection in the main text and appendix, specifically from lines 150 to 161. In brief, MGPfact employs a Gaussian process mixture model to infer cell fate trajectories and identify independent branching events. We calculate load matrices using formulas 1 and 14 to assess each gene's contribution to the trajectories. Genes with an absolute weight greater than 0.05 are considered predominant in specific branching processes. Subsequently, SCENIC (Aibar et al., 2017; Bravo González-Blas et al., 2023) analysis was conducted to further infer the underlying regulons and annotate the biological processes of these genes.
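
The thresholding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_branch_genes`, the dictionary layout of the loadings and the toy gene names and weights are all hypothetical; only the cutoff of 0.05 on the absolute weight comes from the response.

```python
def select_branch_genes(loadings, threshold=0.05):
    """Return, for each branching process, the genes whose absolute
    loading is strictly greater than `threshold`.

    `loadings` maps a branch label to a {gene: weight} dictionary
    (a hypothetical layout chosen for this sketch).
    """
    selected = {}
    for branch, gene_weights in loadings.items():
        selected[branch] = sorted(
            gene for gene, weight in gene_weights.items()
            if abs(weight) > threshold
        )
    return selected


# Toy loadings with made-up values, for illustration only.
example_loadings = {
    "bifurcation_1": {"GeneA": 0.12, "GeneB": -0.07, "GeneC": 0.01},
    "bifurcation_2": {"GeneA": 0.03, "GeneB": 0.05, "GeneD": -0.30},
}
selected_genes = select_branch_genes(example_loadings)
```

Note that the comparison is strict, so a gene with a weight of exactly 0.05 is not selected; whether the original pipeline uses a strict or non-strict comparison is not stated.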
  
  Comment#7: It is not clear how survival analysis was performed in Figure 5. Specifically, were critical confounders, such as age, clinical stage, and tumor purity controlled?
  
  To evaluate the predictive and prognostic impacts of the selected genes, we utilized the Cox multivariate regression model, where the effects of relevant covariates, including age, clinical stage, and tumor purity, were adjusted. We then conducted the Kaplan-Meier survival analysis again to ensure the reliability of the results. The revisions mainly include the following sections:
  
  (1) We modified the description of adjusting for confounding factors in the survival analysis, from line 637 to 640:
  
  “To adjust for possible confounding effects, the relevant clinical features including age, sex and tumor stage were used as covariates. The Cox regression model was implemented using R-4.2 package “survival”. And we generated Kaplan-Meier survival curves based on different classifiers to illustrate differences in survival time and report the statistical significance based on Log-rank test.”
  
  (2) We updated the images in the main text regarding the survival analysis, including Fig. 5a-b, Fig. 6c, and Supplementary Fig. 8e.
  
  Comment#8: I recommend that the authors perform some sort of 'robustness' analysis for the consensus tree built from the bifurcation Gaussian process. For example, subsample 80% of the cells to see if the bifurcations are similar between each bootstrap.
  
  We appreciate the reviewers' feedback. We performed a robustness analysis of the consensus tree using 100 training datasets. This involved sampling the original data at different proportions, and then calculating the topological similarity between the consensus trajectory predictions of MGPfact and those without sampling, using the Hamming-Ipsen-Mikhailov (HIM) metric. A higher score indicates greater robustness. The relevant figure is in Supplementary Fig. 4, and the description is in the main text from line 177 to 182.
  
  The results indicate that the consensus trajectory predictions based on various sampling proportions of the original data maintain a high topological similarity with the unsampled results (HIM<sub>mean</sub>=0.686). This demonstrates MGPfact’s robustness and generalizability under different data conditions, hence the capability of capturing bifurcative processes in the cells’ trajectory.
  
  Reviewer #2:
  
  Comment#1: The authors present MGPfact<sup>XMBD</sup>, a novel model-based manifold-learning framework designed to address the challenges of interpreting complex cellular state spaces from single-cell RNA sequences. To overcome current limitations, MGPfact<sup>XMBD</sup> factorizes complex development trajectories into independent bifurcation processes of gene sets, enabling trajectory inference based on relevant features. As a result, it is expected that the method provides a deeper understanding of the biological processes underlying cellular trajectories and their potential determinants. MGPfact<sup>XMBD</sup> was tested across 239 datasets, and the method demonstrated similar to slightly superior performance in key quality-control metrics to state-of-the-art methods. When applied to case studies, MGPfact<sup>XMBD</sup> successfully identified critical pathways and cell types in microglia development, validating experimentally identified regulons and markers. Additionally, it uncovered evolutionary trajectories of tumor-associated CD8+ T cells, revealing new subtypes with gene expression signatures that predict responses to immune checkpoint inhibitors in independent cohorts. Overall, MGPfact<sup>XMBD</sup> represents a relevant tool in manifold learning for scRNA-seq data, enabling feature selection for specific biological processes and enhancing our understanding of the biological determinants of cell fate.
  
  Thank you for your thoughtful review of our manuscript. We are thrilled to hear that you find MGPfact<sup>XMBD</sup> beneficial for exploring cellular evolutionary paths in scRNA-seq data. Your insights are invaluable, and we look forward to incorporating them to further enrich our study. Thank you once again for your support and constructive feedback.
  
  Comment#2: How the methods compare with existing Deep Learning based approaches such as TIGON is a question mark. If a comparison would be possible, it should be conducted; if not, it should be clarified why.
  
  We appreciate the reviewer's comments. We have added a comparison with the scTour (Li, 2023) and TIGON (Sha, 2024) methods.
  
  It is important to note that the encapsulation and comparison of MGPfact are based on traditional differentiation trajectory construction. Saelens et al. established a systematic evaluation framework that categorizes differentiation trajectory structures into topological subtypes such as linear, bifurcation, multifurcation, graph, and tree, focusing on identifying branching structures in the cell differentiation process (Saelens et al., 2019). The scTour and TIGON methods mentioned by the reviewer are primarily used for estimating RNA velocity, focusing on continuous temporal evolution rather than explicit branching structures, and do not explicitly model branches. Therefore, we considered the predictions of these two methods as linear trajectories and compared them with MGPfact. While scTour explicitly estimates pseudotime, TIGON uses the concept of "growth," which is analogous to pseudotime, so we made the necessary adaptations.
  
  Author response image 1 shows that within this framework, compared to scTour (overall<sub>mean</sub>=0.448) and TIGON (overall<sub>mean</sub>=0.263), MGPfact still maintains a relatively high standard (overall<sub>mean</sub>=0.534). This indicates that MGPfact has a significant advantage in accurately capturing branching structures in cell differentiation, especially in applications where explicit modeling of branches is required.
  
  Author response image 1.
  
  Comparison of MGPfact with scTour and TIGON in trajectory inference performance across 239 test datasets. a. Overall scores; b. F1<sub>branches</sub>; c. HIM; d. cor<sub>dist</sub>; e. wcor<sub>features</sub>. All results are color-coded based on the trajectory types, with the black line representing the mean value. The “Overall” assessment is calculated as the geometric mean of all four metrics.
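
The "Overall" score in the caption above is a geometric mean of the four metrics. A minimal sketch of that arithmetic is given below; the function name `overall_score` and its argument names are chosen here for illustration, and the assumption that each metric lies in [0, 1] is ours.

```python
import math


def overall_score(him, f1_branches, cor_dist, wcor_features):
    """Geometric mean of the four benchmark metrics.

    Assumes every metric is non-negative (typically in [0, 1]);
    a single zero metric therefore drives the overall score to zero.
    """
    metrics = [him, f1_branches, cor_dist, wcor_features]
    if any(m < 0 for m in metrics):
        raise ValueError("metrics must be non-negative")
    return math.prod(metrics) ** (1.0 / len(metrics))


# Made-up metric values, for illustration only.
example_overall = overall_score(0.5, 0.5, 0.5, 0.5)
```

Compared with an arithmetic mean, the geometric mean penalises a method that does very poorly on any one metric, which is presumably why it is used for an aggregate ranking.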
  
  Comment#3: Missing Methods:
  
  - The paper lacks a discussion of Deep Learning approaches for bifurcation analysis. e.g. scTour, Tigon.
  
  - I am missing comments on methods such as CellRank, and alternative approaches to delineate a trajectory.
  
  We thank the reviewer for these comments.
  
  (1) As mentioned in response to Comment#2, the scTour and TIGON methods are primarily used for estimating RNA velocity, focusing on continuous temporal evolution rather than explicit branching structures, and they do not explicitly model branches. We consider the predictions of these two methods as linear trajectories and compare them with MGPfact. The relevant description and discussion have been addressed in the response.
  
  (2) We have added a description of RNA velocity estimation methods (scTour, TIGON, CellRank) in the introduction section. The revised content is from line 66 to 71:
  
  “Moreover, recent studies based on RNA velocity has provided insights into cell state transitions. These methods measure RNA synthesis and degradation rates based on the abundance of spliced and unspliced mRNA, such as CellRank (Lange et al., 2022). Nevertheless, current RNA velocity analyses are still unable to resolve cell-fates with complex branching trajectory. Deep learning methods such as scTour (Li, 2023) and TIGON (Sha, 2024) circumvent some of these limitations, offering continuous state assumptions or requiring prior cell sampling information.”
  
  Comment#4: Impact of MURP:
  
  The rationale for using MURP is well-founded, especially for trajectory definition. However, its impact on the final results needs evaluation.
  
  How does the algorithm compare with a random subselection of cells or the entire cell set?
  
  Thank you for the comments. We fully agree that MURP is crucial in trajectory prediction. As a downsampling method, MURP is specifically designed to address noise issues in single-cell data by dividing the data into several subsets, thereby maximizing noise reduction while preserving the main structure of biological variation (Ren et al., 2022). In MGPfact, MURP typically reduces the data to fewer than 100 downsampled points, preserving the core biological structure while lowering computational complexity. To assess MURP's impact, we conducted experiments by randomly selecting 20, 40, 60, 80, and 100 cells for trajectory inference. These results were mapped back to the original data using the KNN graph structure for final predictions, which were then compared with the MURP downsampling results. Supplementary results can be found in Supplementary Fig. 3, with additional descriptions in the main text from line 170 to 176.
  
  The results indicate that trajectory inference using randomly sampled cells has significantly lower prediction accuracy compared to that using MURP. This is particularly evident in branch assignment (F1<sub>branches</sub>) and correlation cor<sub>dist</sub>, where the average levels decrease by 20.5%-64.9%. In contrast, trajectory predictions using MURP for downsampling show an overall score improvement of 5.31%-185%, further highlighting MURP's role in enhancing trajectory inference within MGPfact.
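
The response says that results inferred on a small set of sampled cells were mapped back to all cells through a KNN graph, without giving details. The sketch below shows one simple way such a mapping could work, under our own assumptions: each cell receives the average pseudotime of its k nearest sampled points in Euclidean space. The function name `map_pseudotime_to_cells` and all data are hypothetical, and this is not claimed to be the procedure used in MGPfact.

```python
import math


def map_pseudotime_to_cells(cells, anchors, anchor_pseudotime, k=1):
    """Assign each cell the mean pseudotime of its k nearest anchors.

    `cells` and `anchors` are sequences of equal-length coordinate tuples
    (e.g. principal-component scores); `anchor_pseudotime[i]` is the
    pseudotime inferred for `anchors[i]`.
    """
    if not 1 <= k <= len(anchors):
        raise ValueError("k must be between 1 and the number of anchors")
    mapped = []
    for cell in cells:
        # Sort anchors by Euclidean distance to this cell.
        by_distance = sorted(
            (math.dist(cell, anchor), idx) for idx, anchor in enumerate(anchors)
        )
        nearest = [idx for _, idx in by_distance[:k]]
        mapped.append(sum(anchor_pseudotime[i] for i in nearest) / k)
    return mapped


# Toy data with made-up coordinates and pseudotimes.
toy_cells = [(0.0, 0.0), (1.0, 1.0), (0.9, 1.2)]
toy_anchors = [(0.0, 0.0), (1.0, 1.0)]
toy_pseudotime = [0.0, 1.0]
mapped_pseudotime = map_pseudotime_to_cells(toy_cells, toy_anchors, toy_pseudotime)
```

With k=1 each cell simply inherits the pseudotime of its closest sampled point; larger k smooths the assignment across neighbouring points.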
  
  Comment#5: What is the impact of the number of components selected?
  
  Thank you for the comments. In essence, MGPfact consists of two main steps: 1) trajectory inference; 2) calculation of factorized scores and identification of high-weight genes. After step 1, MGPfact estimates parameters such as pseudotime T and bifurcation points B. In step 2, we introduce a rotation matrix to obtain factor scores W<sub>l</sub> for each trajectory l by rotating Y*.
  
  For all trajectories,
  
  where e<sub>l</sub> is the error term for the l-th trajectory. The number of features in Y* must match the dimensions of the rotation matrix R to ensure the factorized score matrix W contains factor scores for trajectories, achieving effective feature representation and interpretation in the model.
  
  Additionally, to further illustrate the impact of the number of principal components (PCs) on model performance in step 1, we conducted additional experiments. We used 3 PCs as the default and adjusted the number to evaluate changes from this baseline. As shown in Author response image 2, setting the number of PCs to 1 significantly decreases the overall performance score (overall<sub>mean</sub>=0.363), as well as the wcor<sub>features</sub> and wcor<sub>dist</sub> metrics. In contrast, increasing the number of PCs does not significantly affect the metrics. It should be noted that the number of components used should be determined by the intrinsic biological characteristics of the cell fate determination process. Our experiment, based on a limited number of datasets, may not represent more complex scenarios in other cell types.
  
  Author response image 2.
  
  Robustness testing of the number of MURP PCA components on 100 training datasets. With the number of principal components (PCs) set to 3 by default, we tested the impact of different numbers of components (1-10) on the prediction results. In all box plots, the asterisk represents the mean value, while the whiskers extend to the farthest data points within 1.5 times the interquartile range. Significance is denoted as follows: not annotated indicates non-significant; * P < 0.05; ** P < 0.01; *** P < 0.001; two-sided paired Student’s t-tests.
  
  Comment#6: Please comment on the selection of the kernel functions (rbf and polynomial) and explain why other options were discarded.
  
  Thank you for the comments. We have added a description regarding the selection of radial basis functions and polynomial kernels in lines 126-130. As the reviewers mentioned, the choice of kernel functions is crucial in the MGPfact analysis pipeline for constructing the covariance matrix of the Gaussian process. We selected the radial basis function (RBF) kernel and the polynomial kernel to balance capturing data complexity and computational efficiency. The RBF kernel is chosen for its ability to effectively model smooth functions and capture local variations in the data, making it well-suited to the continuous and smooth characteristics of biological processes; its hyperparameters offer modeling flexibility. The polynomial kernel is used to capture more complex nonlinear relationships between input features, with its hyperparameters also allowing further customization of the model. In contrast, other complex kernels, such as Matérn or spectral kernels, were omitted due to their interpretability challenges and the risk of overfitting with limited data. However, as suggested by the reviewers, we will consider and test the impact of other kernel functions on the covariance matrix of the Gaussian process and their role in trajectory inference in our subsequent phases of algorithm design.
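
To make the kernel discussion concrete, the sketch below writes out textbook forms of the RBF and polynomial kernels over scalar pseudotime values and builds a covariance matrix from either. The parameter names (`length_scale`, `variance`, `degree`, `coef0`, `scale`) and the diagonal `jitter` term are assumptions made for this illustration; the exact parameterisation used in MGPfact is not given in the response.

```python
import math


def rbf_kernel(t1, t2, length_scale=1.0, variance=1.0):
    """Radial basis function kernel: smooth, with correlation decaying in |t1 - t2|."""
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))


def polynomial_kernel(t1, t2, degree=2, coef0=1.0, scale=1.0):
    """Polynomial kernel: (scale * t1 * t2 + coef0) ** degree."""
    return (scale * t1 * t2 + coef0) ** degree


def covariance_matrix(times, kernel, jitter=1e-6):
    """Build the covariance matrix for `times`, adding a small diagonal
    jitter for numerical stability (the value 1e-6 is an assumption)."""
    n = len(times)
    return [
        [kernel(times[i], times[j]) + (jitter if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy pseudotime values, for illustration only.
toy_times = [0.0, 0.5, 1.0]
K_rbf = covariance_matrix(toy_times, rbf_kernel)
K_poly = covariance_matrix(toy_times, polynomial_kernel)
```

The RBF matrix depends only on differences between pseudotime values, whereas the polynomial matrix depends on their products, which is one way to see why the two kernels capture different kinds of structure.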
  
  Comment#7: What is the impact of the Pseudotime method used initially? This section should be expanded with clear details on the techniques and parameters used in each analysis.
  
  We are sorry for the confusion. We have added a description of how pseudotime T is obtained in lines 138 to 147 of the main text, and the specific hyperparameters of the model and their prior settings are detailed in the supplementary information.
  
  In brief, the pseudotime and related topological parameters of the bifurcative trajectories in MGPfact are inferred by Gaussian process regression from downsampled single-cell transcriptomic data (MURP). Specifically, T is treated as a continuous variable representing the progression of cells through the differentiation process. We describe the relationship between pseudotime and expression data as:
  
  where f(T) is a Gaussian Process (GP) with covariance matrix S, and ε represents the error term. The Gaussian process is defined as:
  
  where is the variance set to 1e-6. During inference, we update the pseudotime by maximizing the posterior likelihood. Specifically, the posterior distribution of pseudotime is obtained by combining the observed data Y* with the prior distribution of the Gaussian process model.
  
  We use the Markov Chain Monte Carlo method for parameter estimation, particularly employing the adaptive Metropolis-within-Gibbs (AMWG) sampling to handle the high autocorrelation of pseudotime.
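
  The quantity that such a sampler evaluates repeatedly can be sketched as a Gaussian process log marginal likelihood. The sketch below assumes a model of the form Y* = f(T) + ε with f(T) ~ GP(0, S), an RBF covariance S, and Gaussian noise with variance 1e-6; the exact functional form, the function name `gp_log_likelihood` and the `length_scale` value are assumptions for illustration. The adaptive Metropolis-within-Gibbs sampler itself is not reproduced here.

```python
import numpy as np

def gp_log_likelihood(y, t, length_scale=0.3, noise_var=1e-6):
    """Log marginal likelihood of one expression profile y given pseudotime t.

    Assumes y = f(t) + eps with f ~ GP(0, S), S an RBF covariance, and
    eps ~ N(0, noise_var * I). These forms are assumptions for illustration.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    S = np.exp(-0.5 * (t - t.T) ** 2 / length_scale**2)
    K = S + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    half_log_det = np.log(np.diag(L)).sum()        # equals 0.5 * log|K|
    return float(-0.5 * y @ alpha - half_log_det - 0.5 * len(y) * np.log(2 * np.pi))

# Hypothetical example: six metacells with a smooth expression profile.
t = np.linspace(0.0, 1.0, 6)
y = np.sin(2 * np.pi * t)
print(gp_log_likelihood(y, t))
```

  A sampler would propose a new pseudotime for one metacell at a time, recompute this value (plus the log prior), and accept or reject the proposal; a smooth expression profile would typically score higher under a well-ordered pseudotime than under a shuffled one.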
  
  Comment#8: Enhancing Readability: For clarity, provide intuitive descriptions of each evaluation function used in simulated and real data. The novel methodology performs well for some metrics but less so for others. A clear understanding of these measurements is essential.
  
  To address the concern of readability, we have added descriptions of the five evaluation metrics in the methodology section (Benchmarking MGPfact to state-of-the-art methods) in lines 494 to 515. Additionally, we have included a summary and discussion of these metrics in the conclusion section in lines 214 to 240 to help readers better understand the significance and impact of these measurements.
  
  (1) In brief, the Hamming-Ipsen-Mikhailov (HIM) distance measures the similarity between topological structures, combining the normalized Hamming distance and the Ipsen-Mikhailov distance, which focus on edge length differences and degree distribution similarity, respectively. The F1<sub>branches</sub> is used to assess the accuracy of a model's branch assignment via Jaccard similarity between branch pairs. In trajectory inference, cor<sub>dist</sub> quantifies the similarity of inter-cell distances between predicted and true trajectories, evaluating the accuracy of cell ordering. The wcor<sub>features</sub> assesses the similarity of key features through weighted Pearson correlation, capturing biological variation. The Overall score is calculated as the geometric mean of these metrics, providing an assessment of overall performance.
  
  (2) MGPfact and the other seven methods included in the comparison each have their own focus. MGPfact specializes in factorizing complex cell trajectories using Gaussian process mixture models, making it particularly capable of identifying bifurcation events. Therefore, it excels in the accuracy of branch partitioning and similarity of trajectory topology. Among other methods, scShaper (Smolander et al., 2022) and TSCAN (Ji and Ji, 2016) are more suited for generating linear trajectories and excel in linear datasets, accurately predicting pseudotime. The Monocle series, as typical representatives of tree methods, effectively capture complex topologies and are suitable for analyzing cell data with diversified differentiation paths.
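
  Two of the metrics described in point (1) can be sketched in a few lines. The code assumes that every metric has already been expressed as a similarity in [0, 1] with higher values being better, and it replaces trajectory distances with absolute pseudotime differences, which is a simplification. The function names and the example scores are hypothetical.

```python
import numpy as np

def overall_score(metric_values):
    """Geometric mean of metrics that are assumed to lie in [0, 1]."""
    values = np.asarray(list(metric_values), dtype=float)
    if np.any((values < 0) | (values > 1)):
        raise ValueError("metrics are assumed to be scaled to [0, 1]")
    if np.any(values == 0):
        return 0.0  # a single zero metric drives the geometric mean to zero
    return float(np.exp(np.log(values).mean()))

def pairwise_distance_correlation(t_true, t_pred):
    """Simplified cor_dist: Pearson correlation of pairwise pseudotime distances."""
    t_true = np.asarray(t_true, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    i, j = np.triu_indices(len(t_true), k=1)
    d_true = np.abs(t_true[i] - t_true[j])
    d_pred = np.abs(t_pred[i] - t_pred[j])
    return float(np.corrcoef(d_true, d_pred)[0, 1])

# Hypothetical scores for one method on one dataset.
scores = {"him": 0.8, "f1_branches": 0.6, "cor_dist": 0.7, "wcor_features": 0.9}
print(overall_score(scores.values()))
```

  Because the geometric mean is dominated by the weakest metric, a method that does well on three metrics but poorly on one is penalized more than it would be under an arithmetic mean.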
  
  Comment#9: Microglia Analysis: In Figures 3A-C, the genes mentioned in the text for each bifurcation do not always match those shown in the panels. Please confirm this.
  
  Thank you for pointing this out. We have carefully reviewed the article and corrected the error where the genes shown in the figures did not correspond to the descriptions in the article. The specific corrections have been made between line 257 and 264:
  
  “The first bifurcation determines the differentiated cell fates of PAM and HM, which involves a set of notable marker genes of both cell types, such as Apoe, Selplg (HM), and Gpnmb (PAM). The second bifurcation determines the proliferative status, which is crucial for the development and function of PAM and HM (Guzmán, n.d.; Li et al., 2019). The genes affected by the second bifurcation are associated with cell cycle and proliferation, such as Mki67, Tubb5, Top2a. The third bifurcation influences the development and maturity of microglia, of which the highly weighted genes, such as Tmem119, P2ry12, and Sepp1 are all previously annotated markers for establishment of the fates of microglia (Anderson et al., 2022; Li et al., 2019) (Supplementary Table 4).”
  
  Comment#10: Regulons:
  
  - The conclusions rely heavily on regulons. The Methods section describes using SCENIC, GENIE3, RCisTarget, and AUCell, but their relation to bifurcation analysis is unclear.
  
  - Do you perform trajectory analysis on all MURP-derived cells or within each identified trajectory based on bifurcation? This point needs clarification to make the outcomes comprehensible. The legend of Figure 4 provides some ideas, but further clarity is required.
  
  Thank you for the comments.
  
  (1) To clarify, we used tools such as SCENIC to annotate the highly weighted genes (HWG) resulting from the bifurcation analysis for transcription factor regulation activity and possible impacts on biological processes. We have added descriptions to the analysis of our microglial data. The revised content is between lines 265 and 266:
  
  “Moreover, we retrieved highly active regulons from the HWG by MGPfact, of which the significance is quantified by the overall weights of the member genes.”
  
  (2) We apologize for any confusion caused by our description. It is important to clarify that we performed an overall trajectory analysis on all MURP results, rather than analyzing within each identified trajectory. Specifically, we first used MURP to downsample all preprocessed cells, where each MURP subset represents a group of cells. We then conducted trajectory inference on all MURP subsets and identified bifurcation points. This process generated multiple independent differentiation trajectories, encompassing all MURP subsets. To clearly convey this point, we have added descriptions in the legend of Figure 4. The revised content is between line 276 and 283:
  
  “Fig. 4. MGPfact reconstructed the developmental trajectory of microglia, recovering known determinants of microglia fate. a-c. The inferred independent bifurcation processes with respect to the unique cell types (color-coded) of microglia development, where phase 0 corresponds to the state before bifurcation; and phases 1 and 2 correspond to the states post-bifurcation. Each colored dot represents a metacell of unique cell type defined by MURP. The most highly weighted regulons in each trajectory were labeled by the corresponding transcription factors (left panels). The HWG of each bifurcation process include a set of highly weighted genes (HWG), of which the expression levels differ significantly among phases 1, 2, and 3 (right panels).”
  
  Comment#11: CD8+ T Cells: The comparison is made against Monocle2, the method used in the publication, but it would be beneficial to compare it with more recent methods. Otherwise, the added value of MGPfact is unclear.
  
  Per your request, we have expanded our comparative analysis to include not only Monocle2 but also more recent methods such as Monocle3 (Cao et al., 2019) and scFates Tree (Faure et al., 2023). We used adjusted R-squared values to evaluate each method's ability to explain trajectory variation. The results have been added to Table 2 and Supplementary Table 6. The revised content is between line 318 and 326:
  
  We assessed the goodness-of-fit (adjusted R-square) of the consensus trajectory derived by MGPfact and three methods (Monocle 2, Monocle 3 and scFates Tree) for the CD8+ T cell subtypes described in the original studies (Guo et al., 2018; Zhang et al., 2018). The data showed that MGPfact significantly improved the explanatory power for most CD8+ T cell subtypes over Monocle 2, which was used in the original studies (P < 0.05, see Table 2 and Supplementary Table 6), except for the CD8-GZMK cells in the CRC dataset. Additionally, MGPfact demonstrated better explanatory power in specific cell types when compared to Monocle 3 and scFates Tree. For instance, in the NSCLC dataset, MGPfact exhibited higher explanatory power for CD8-LEF1 cells (Table 2, R-squared = 0.935), while Monocle 3 and scFates Tree perform better in other cell types.
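
  For reference, the adjusted R-squared used as the goodness-of-fit measure can be sketched as follows. This is the standard textbook formula; how the predictors are derived from each trajectory is not specified here, so `n_predictors` and the example values are assumptions.

```python
import numpy as np

def adjusted_r_squared(y, y_hat, n_predictors):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))

# Hypothetical example: five observations explained by a single predictor.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.0, 4.1, 4.9]
print(adjusted_r_squared(y, y_hat, n_predictors=1))
```

  Unlike the plain R-squared, this value can decrease when extra predictors are added without improving the fit, which makes it a fairer basis for comparing trajectories of different complexity.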
  
  Comment#12: Consensus Trajectory: A panel explaining how the consensus trajectory is generated would be helpful. Include both visual and textual explanations tailored to the journal's audience.
  
  Thank you for the comments. Regarding how the consensus trajectory is constructed, we have illustrated and described this in Figure 1 and the supplementary methods. Taking the reviewers' suggestions into account, we have added more details about the generation process of the consensus trajectory in the methods section to enhance the completeness of the manuscript. The revised content is from line 599 to 606:
  
  “Following MGPfact decomposition, we obtained multiple independent bifurcative trajectories, each corresponds to a binary tree within the temporal domain. These trajectories were then merged to construct a coherent diffusion tree, representing the consensus trajectory of cells’ fate. The merging process involves initially sorting all trajectories by their bifurcation time. The first (earliest) bifurcative trajectory is chosen as the initial framework, and subsequent trajectories are integrated to the initial framework iteratively by adding the corresponding branches at the bifurcation timepoints. As a result, the trajectories are ultimately merged into a comprehensive binary tree, serving as the consensus trajectory.”
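
  The merging procedure described above can be sketched as follows. The sketch assumes that each bifurcation records which existing branch it divides (the hypothetical `splits` field); how MGPfact actually decides where a later bifurcation attaches is not stated here, so the `Branch` class, the field names and the example data are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Branch:
    label: str
    start: float
    end: Optional[float] = None               # set when the branch bifurcates
    children: List["Branch"] = field(default_factory=list)

def build_consensus_tree(bifurcations):
    """Merge bifurcations into one binary tree, earliest bifurcation first."""
    ordered = sorted(bifurcations, key=lambda b: b["time"])
    root = Branch(label="root", start=0.0)
    leaves = {"root": root}                   # branches that can still split
    for b in ordered:
        parent = leaves.pop(b["splits"])      # the branch divided by this event
        parent.end = b["time"]
        for phase in (1, 2):                  # two post-bifurcation phases
            child = Branch(label=f'{b["name"]}.{phase}', start=b["time"])
            parent.children.append(child)
            leaves[child.label] = child
    return root, sorted(leaves)

# Hypothetical input: three bifurcations given in arbitrary order.
bifurcations = [
    {"name": "B2", "time": 0.6, "splits": "B1.1"},
    {"name": "B1", "time": 0.2, "splits": "root"},
    {"name": "B3", "time": 0.8, "splits": "B1.2"},
]
root, leaf_labels = build_consensus_tree(bifurcations)
print(leaf_labels)
```

  Sorting by bifurcation time first guarantees that the branch a later event attaches to already exists, and every internal branch ends up with exactly two children, so the result is a binary tree.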
  
  Comment#13: Discussion:
  
  - Check for typos, e.g., line 382 "pseudtime.".
  
  - Avoid considering HVG as the entire feature space.
  
  - The first three paragraphs are too similar to the Introduction. Consider shortening them to succinctly state the scenario and the implications of your contribution.
  
  Thank you for pointing out the typos.
  
  (1) We conducted a comprehensive review of the document to ensure there are no typographical errors.
  
  (2) We restructured the first three paragraphs of the discussion section to clarify the limitations in the use of current manifold-learning methods and removed any absolute language regarding treating HVGs as the entire feature space. The revised content is from line 419 to 430:
  
  “Single-cell RNA sequencing (scRNA-seq) provides a direct, quantitative snapshot of a population of cells in certain biological conditions, thereby revealing the actual cell states and functions. Although existing clustering and embedding algorithms can effectively reveal discrete biological states of cells, these methods become less efficient when depicting continuous evolving of cells over the temporal domain. The introduction of manifold learning offers a new dimension for discovery of relevant biological knowledge in cell fate determination, allowing for a better representation of continuous changes in cells, especially in time-dependent processes such as development, differentiation, and clonal evolution. However, current manifold learning methods face major limitations, such as the need for prior information on pseudotime and cell clustering, and lack of explainability, which restricts their applicability. Additionally, many existing trajectory inference methods do not support gene selection, making it difficult to annotate the results to known biological entities, thereby hindering the interpretation of results and subsequent functional studies.”
  
  Comment#14: Minor Comments:
  
  (1) Review the paragraph regarding the "current manifold-learning methods are faced with two major challenges." The message needs clarification.
  
  (2) Increase the quality of the figures.
  
  (3) Update the numbering of equations from #(.x) to (x).
  
  We thank the reviewer for these detailed suggestions.
  
  (1) We have thoroughly revised the discussion section, addressing overly absolute statements. The revised content is from line 426 to 428:
  
  “However, current manifold learning methods face major limitations, such as the need for prior information on pseudotime and cell clustering, and lack of explainability, which restricts their applicability.”
  
  (2) We conducted a comprehensive review of the figures in the article to more clearly present our results.
  
  (3) We have meticulously reviewed the equations in the article to ensure there are no display issues with the indices.
  
  References
  
  Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine J-C, Geurts P, Aerts J, van den Oord J, Atak ZK, Wouters J, Aerts S. 2017. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 14:1083–1086. doi:10.1038/nmeth.4463
  
  Anderson SR, Roberts JM, Ghena N, Irvin EA, Schwakopf J, Cooperstein IB, Bosco A, Vetter ML. 2022. Neuronal apoptosis drives remodeling states of microglia and shifts in survival pathway dependence. eLife 11:e76564.
  
  Bravo González-Blas C, De Winter S, Hulselmans G, Hecker N, Matetovici I, Christiaens V, Poovathingal S, Wouters J, Aibar S, Aerts S. 2023. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat Methods. doi:10.1038/s41592-023-01938-4
  
  Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, Trapnell C, Shendure J. 2019. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566:496–502. doi:10.1038/s41586-019-0969-x
  
  Faure L, Soldatov R, Kharchenko PV, Adameyko I. 2023. scFates: a scalable python package for advanced pseudotime and bifurcation analysis from single-cell data. Bioinformatics 39:btac746. doi:10.1093/bioinformatics/btac746
  
  Guo X, Zhang Y, Zheng L, Zheng C, Song J, Zhang Q, Kang B, Liu Z, Jin L, Xing R, Gao R, Zhang L, Dong M, Hu X, Ren X, Kirchhoff D, Roider HG, Yan T, Zhang Z. 2018. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat Med 24:978–985. doi:10.1038/s41591-018-0045-3
  
  Guzmán AU. n.d. Single-cell RNA sequencing of spinal cord microglia in a mouse model of neuropathic pain.
  
  Ji Z, Ji H. 2016. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res 44:e117–e117. doi:10.1093/nar/gkw430
  
  Lange M, Bergen V, Klein M, Setty M, Reuter B, Bakhti M, Lickert H, Ansari M, Schniering J, Schiller HB, Pe’er D, Theis FJ. 2022. CellRank for directed single-cell fate mapping. Nat Methods 19:159–170. doi:10.1038/s41592-021-01346-6
  
  Li Q. 2023. scTour: a deep learning architecture for robust inference and accurate prediction of cellular dynamics. Genome Biology.
  
  Li Q, Cheng Z, Zhou L, Darmanis S, Neff NF, Okamoto J, Gulati G, Bennett ML, Sun LO, Clarke LE, Marschallinger J, Yu G, Quake SR, Wyss-Coray T, Barres BA. 2019. Developmental Heterogeneity of Microglia and Brain Myeloid Cells Revealed by Deep Single-Cell RNA Sequencing. Neuron 101:207-223.e10. doi:10.1016/j.neuron.2018.12.006
  
  Neal RM. 2003. Slice sampling. The Annals of Statistics 31:705–767.
  
  Papadopoulos N, Gonzalo PR, Söding J. 2019. PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes. Bioinformatics 35:3517–3519. doi:10.1093/bioinformatics/btz078
  
  Ren J, Zhang Q, Zhou Y, Hu Y, Lyu X, Fang H, Yang J, Yu R, Shi X, Li Q. 2022. A downsampling method enables robust clustering and integration of single-cell transcriptome data. Journal of Biomedical Informatics 130:104093. doi:10.1016/j.jbi.2022.104093
  
  Roberts GO, Rosenthal JS. 2009. Examples of adaptive MCMC. Journal of Computational and Graphical Statistics 18:349–367.
  
  Saelens W, Cannoodt R, Todorov H, Saeys Y. 2019. A comparison of single-cell trajectory inference methods. Nat Biotechnol 37:547–554. doi:10.1038/s41587-019-0071-9
  
  Sha Y. 2024. Reconstructing growth and dynamic trajectories from single-cell transcriptomics data 6.
  
  Smolander J, Junttila S, Venäläinen MS, Elo LL. 2022. scShaper: an ensemble method for fast and accurate linear trajectory inference from single-cell RNA-seq data. Bioinformatics 38:1328–1335. doi:10.1093/bioinformatics/btab831
  
  Tierney L. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 1701–1728.
  
  Zappia L, Phipson B, Oshlack A. 2017. Splatter: simulation of single-cell RNA sequencing data. Genome Biol 18:174. doi:10.1186/s13059-017-1305-0
  
  Zhang L, Yu X, Zheng L, Zhang Y, Li Y, Fang Q, Gao R, Kang B, Zhang Q, Huang JY, Konno H, Guo X, Ye Y, Gao S, Wang S, Hu X, Ren X, Shen Z, Ouyang W, Zhang Z. 2018. Lineage tracking reveals dynamic relationships of T cells in colorectal cancer. Nature 564:268–272. doi:10.1038/s41586-018-0694-x
  
URL

biorxiv.org/content/10.1101/2024.04.02.587768v2
www.biorxiv.org www.biorxiv.org

Inclusive, Exclusive and Hierarchical Atlas of NFATc1+/PDGFR-α+ Cells in Dental and Periodontal Mesenchyme

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  In this study, Yang et al. investigated the locations and hierarchies of NFATc1+ and PDGFRα+ cells in dental and periodontal mesenchyme. By combining intersectional and exclusive reporters, they attempted to distinguish among NFATc1+PDGFRα+, NFATc1+PDGFRα-, and NFATc1-PDGFRα+ cells. Using tissue clearing and serial section-based 3D reconstruction, they mapped the distribution atlas of these cell populations. Through DTA-induced ablation of PDGFRα+ cells, they demonstrated the crucial role of PDGFRα+ cells in the formation of the odontoblast cell layer and periodontal components.
  
  Thank you for your valuable comments and suggestions, which have greatly enhanced the quality of this research article. The manuscript has been significantly revised in accordance with the reviewers’ comments. All necessary experimental conditions and required data have been included, and all the questions and considerations have been well-addressed in the revised manuscript and supporting information.
  
  Main issues:
  
  (1) The authors did not quantify the contribution of PDGFRα+ cells or NFATc1+ cells to dental and periodontal lineages in PDGFRαCreER; Nfatc1DreER; LGRT mice. Zsgreen+ cells represented PDGFRα+ cells and their lineages. Tomato+ cells represented NFATc1+ cells and their lineages. Tomato+Zsgreen+ cells represented NFATc1+PDGFRα+ cells and their lineages. Conducting immunostaining experiments with lineage markers is essential to determine the physiological contributions of these cells to dental and periodontal homeostasis.
  
  Thank you for your question, and we apologize for the insufficient statement. Figure S9 provides a statistical analysis of the number of PDGFR-α+ cells, NFATc1+ cells, and PDGFR-α+&NFATc1+ cells in the dental pulp and periodontal ligament (PDL). The results allow for a clear comparison of the contributions of single-positive and double-positive cells to both tissues. Additionally, the tracing results showed whether these three cell populations have the capacity to produce progeny cells. We further supplemented the analysis with immunofluorescence results of double-positive cells to identify their cell types, selecting AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells. This part is further discussed in the manuscript as below:
  
  Page 14-15 in the revised manuscript, “To identify the population of PDGFR-α+ and NFATc1+ co-expressing cells in the pulp and periodontal ligament (PDL), we generated Pdgfr-αCreER; Nfatc1DreER; R26-LSL-RSR-tdT-DTR (LRTD) mice... Strong tdTomato signals were detected in both the PDL (Figure S22B) and pulp (Figure S22C). With respect to the MSC-specific marker AlphaV, we observed AlphaV+tdTomato+ cells in both regions. Additionally, CD45+ (hematopoietic marker) tdTomato+ cells were also present in these areas (Figure S22B, C). These findings suggest that the population of PDGFR-α+ and NFATc1+ co-expressing cells is heterogeneous.”
  
  (2) The authors attempted to use PDGFRαCreER; Nfatc1DreER;IR1 mice to illustrate the hierarchies of NFATc1+ and PDGFRα+ cells. According to the principle of the IR1 reporter, it requires sequential induction of PDGFRα-CreER and Nfatc1-DreER to investigate their genetic relationship. Upon induction by tamoxifen, NFATc1+PDGFRα- cells and NFATc1-PDGFRα+ cells were labeled by Tomato and Zsgreen, respectively. However, the reporter expression of NFATc1+PDGFRα+ cells was uncertain, most likely random. Therefore, the hierarchical relationship of NFATc1+ and PDGFRα+ cells cannot be reliably determined from PDGFRαCreER; Nfatc1DreER; IR1 mice.
  
  Thank you for your question. We have added experimental data from the control group (Pdgfr-αCreER; IR1) (Figure 8). By comparison with the Pdgfr-αCreER; Nfatc1DreER; LGRT tracing assays, we confirmed that the expression pattern and range of PDGFR-α+ cells in the pulp and PDL of Pdgfr-αCreER; IR1 mice are consistent with those observed in Pdgfr-αCreER; Nfatc1DreER; LGRT mice (Figure 6), and the same applies to NFATc1+ cells. All of our experimental results have been repeated multiple times. In addition, the IR1 system was initially developed by Professor Bin Zhou's lab, and its feasibility and stability were validated in a paper published in Nature Medicine in 2017 (https://doi.org/10.1038/nm.4437). Moreover, Professor Zhou Bo O's team applied the IR1 dual-recombinase system to bone lineage tracing in a 2021 Cell Stem Cell paper, which also confirmed its feasibility and stability (DOI: 10.1016/j.stem.2021.08.010).
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Yang et al. present an article investigating the spatiotemporal atlas of NFATc1+ and PDGFR-α+ cells within the dental and periodontal mesenchyme. The study explores their capacity for progeny cell generation and their relationships - both inclusive and hierarchical - under homeostatic conditions. Utilizing the Cre/loxP-Dre/Rox system to construct tool mice, combined with tissue transparency and continuous tissue slicing for 3D reconstruction, the researchers effectively mapped the distribution of NFATc1+ and PDGFR-α+ cells. Additionally, in conjunction with DTA mice, the study provides preliminary validation of the impact of PDGFR-α+ cells on dental pulp and periodontal tissues. Primarily, this study offers an in-situ distribution atlas for NFATc1+ and PDGFR-α+ cells but provides limited information regarding their origin, fate differentiation, and functionality.
  
  We would like to thank the reviewer for setting a high value on our study. Following the many constructive suggestions, the manuscript has been revised to improve the quality of this study. All the necessary discussions have also been added, and all the questions and concerns have been well-addressed in the revised manuscript. The point-by-point reply to the comments is listed below:
  
  Strengths:
  
  (1) Tissue transparency techniques and continuous tissue slicing for 3D reconstruction, combined with transgenic mice, provide high-quality images and rich, reliable data.
  
  (2) The Cre/loxP and Dre/Rox systems used by the researchers are powerful and innovative.
  
  (3) The IR1 lineage tracing model is significantly important for investigating cellular differentiation pathways.
  
  (4) This study provides effective spatial distribution information of NFATc1+/PDGFR-α+ cell populations in the dental and periodontal tissues of adult mice.
  
  Weaknesses:
  
  (1) In the functional experiment section, the investigation into the role of NFATc1+/PDGFR-α+ cell populations is somewhat lacking.
  
  Thank you so much for your comments and suggestions. We have supplemented the analysis with immunofluorescence results of double-positive cells to identify NFATc1+&PDGFR-α+ cell populations, selecting AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells. This part was shown as below:
  
  Page 14-15 in the revised manuscript, “To identify the population of PDGFR-α+ and NFATc1+ co-expressing cells in the pulp and periodontal ligament (PDL), we generated Pdgfr-αCreER; Nfatc1DreER; R26-LSL-RSR-tdT-DTR (LRTD) mice… Strong tdTomato signals were detected in both the PDL (Figure S22B) and pulp (Figure S22C). With respect to the MSC-specific marker AlphaV, we observed AlphaV+tdTomato+ cells in both regions. Additionally, CD45+ (hematopoietic marker) tdTomato+ cells were also present in these areas (Figure S22B, C). These findings suggested that the population of PDGFR-α+ and NFATc1+ co-expressing cells is heterogeneous.”
  
  We also supplemented the discussion regarding the role of PDGFR-α+ population on page 17. Its potential role in pulp and periodontal formation had been suggested as well.
  
  Page 17 in the revised manuscript, “After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F).”
  
  (2) The author mentions that 3D reconstruction of consecutive tissue slices can provide more detailed information on cell distribution, so what is the significance of using tissue-clearing techniques in this article?
  
  Thank you for your insightful comment, and we are sorry for the insufficient statement here. In our study, the utilization of tissue clearing techniques was to address some of the shortcomings associated with the 3D reconstruction of consecutive tissue slices, such as the compromised integrity of samples due to section layering, leading to discontinuities along the z-axis and potential loss of positive signals (Fig. S5, S13). Additionally, unavoidable tissue damage during the sectioning process may result in the loss of some information. As one of the most advanced imaging technologies currently available, tissue clearing/imaging allows for direct observation of the spatial location and relationships of fluorescently labeled cells within the intact tissue, which is more persuasive. Also, evolving beyond the analysis of structural and molecular biology of selected tissue sections, and expanding the focus to entire organs and organisms, is a trend in the development of the biomedical field (Nat Methods. 2024 Jul;21(7):1153-1165; Nat Commun. 2024 Feb 26;15(1):1764). Admittedly, no method is flawless; thus, our employment of two advanced imaging approaches aims to answer questions regarding the spatial positioning and relationships of PDGFR-α single-positive, NFATc1 single-positive cells, and PDGFR-α+ NFATc1+ cells from multiple perspectives. This is done to enhance the credibility and persuasiveness of our results.
  
  We greatly appreciate your suggestion, which has significantly complemented the content of our article. The corresponding statements have been added in the revised manuscript as below:
  
  Page 6 in the revised manuscript, “As one of the most advanced imaging technologies currently available, tissue clearing/imaging allows for direct observation of the spatial location and relationships of fluorescently labeled cells within the intact tissue. Therefore, according to the existing SUMIC tissue deep clearing (TC) methods, we modified and improved a rapid and efficient procedure, which enable rapid single-cell resolution and quantitative panoptic 3D light-sheet imaging.”
  
  (3) After reading the entire article, it is confusing whether the purpose of the article is to explore the distribution and function of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues, or to compare the differences between tissue clearing techniques and 3D reconstruction of continuous histological slices using NFATc1+/PDGFR-α+ cells?
  
  We sincerely appreciate your question and apologize for any ambiguous descriptions.
  
  The purpose of our study is to map the atlas of NFATc1+/PDGFR-α+ inclusive, exclusive and hierarchical distribution in dental and periodontal mesenchyme. Under this premise, the two advanced imaging techniques were merely employed as means to elucidate this issue. Indeed, in the previous manuscript, we overemphasized the comparison and description of the differences between tissue clearing techniques and 3D reconstruction of continuous slices, which led to unnecessary misunderstandings, for which we apologize. Consequently, in this version of the manuscript, we have reduced the descriptions comparing their advantages and disadvantages, focusing instead on exploring the importance of NFATc1+/PDGFR-α+ cells. We appreciate your suggestions once again.
  
  Page 6 in the revised manuscript, “These two 3D-reconstruction and imaging technologies complement each other to jointly address the spatial positioning and hierarchical relationships of PDGFR-α+, NFATc1+, and PDGFR-α+ NFATc1+ cells from multiple perspectives.”
  
  (4) The researchers did not provide a clear definition of the cell types of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues.
  
  Thanks for your suggestions. We discovered through cell ablation experiments that the removal of PDGFR-α+ cells resulted in the destruction of the odontoblast layer in the dental pulp, shrinkage of the pulp core, and disruption of collagen fibers in the periodontal ligament. Combined with the results from lineage tracing, we conclude that PDGFR-α+ cells primarily constitute the mesenchymal cells that form the supporting tissues in both the dental pulp and periodontal ligament (Part 4.1). Through immunofluorescence staining, with AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells, we observed that the double-positive cell population was a heterogeneous group, containing both MSCs and hematopoietic cells (Part 4.2).
  
  (5) In studies related to long bones, the author defines the NFATc1+/PDGFR-α+ cell population as SSCs, which as a stem cell group should play an important role in tooth development or injury repair. However, the distribution patterns and functions of the NFATc1+/PDGFR-α+ cell population in these two conditions have not been discussed in this study.
  
  Thanks for your suggestions. The NFATc1+/PDGFR-α+ cell population was identified as playing an important role in tissue regeneration, especially in oral and maxillofacial tissues. Our research primarily focuses on the identification of NFATc1+ and PDGFR-α+ cells within dental and periodontal mesenchyme, highlighting their contribution to tissue homeostasis and regeneration. Although the NFATc1+/PDGFR-α+ cells were characterized in the context of other tissue types, their detailed role in tooth development and injury repair remains an area for further exploration.
  
  This part was further discussed on page 17-18 in the revised manuscript, “Cell ablation and immunofluorescence staining experiments further characterized the types and functions of PDGFR-α+/PDGFR-α+&NFATc1+ populations. After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F). Previous results confirmed the presence of double-positive cells in both dental pulp and periodontal tissues and provided insights into their hierarchical relationships in the periodontal ligament (Figure. 8). To further investigate the double-positive cell population, we developed an inducible dual-editing enzyme reporter system to label these cells with tdTomato signals. Using AlphaV as a marker for mesenchymal stem cells (MSCs) and CD45 for hematopoietic cells, we found that double-positive cells included components of both MSCs and hematopoietic cells (Figure S22B, C), indicating a heterogeneous population. Further experiments are necessary to determine whether the predominant role in this co-positive MSC population is played by PDGFR-α+ or NFATc1+ and to clarify the specific functions of these cells in the future.”
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This groundbreaking study provided the most advanced transgenic lineage tracing and advanced imaging techniques in deciphering dental/periodontal mesenchyme cells. In this study, authors utilized CRISPR/Cas9-mediated transgenic lineage tracing techniques to concurrently demonstrate the inclusive, exclusive, and hierarchical distributions of NFATc1+ and PDGFR-α+ cells and their lineage commitment in dental and periodontal mesenchyme.
  
  Strengths:
  
  Combined with tissue clearing-based advanced imaging and three-dimensional reconstruction of slices, the distribution and hierarchical relationship of NFATc1+ and PDGFR-α+ cells and their progeny emerged plainly, which undoubtedly broadens our understanding of their in vivo fate trajectories in craniomaxillofacial tissue. Also, the experimental design is comprehensive and well-executed, and the results are convincing and compelling.
  
  Weaknesses:
  
  Minor modifications could be made to the paper, including more details on the advantages of the methodology used by the authors in this study, compared to other studies.
  
  Thanks for your constructive comments and advice on how to improve the quality of this research article. We have thoroughly and carefully corrected the manuscript based on your suggestion, and all the necessary data have been added to support our claims. Meanwhile, all the questions and concerns have been well-addressed in the revised manuscript and the revised supplementary information. Thus, we believe that the quality of this paper has been significantly enhanced. We thank you again for your great efforts.
  
  Recommendations For The Authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Line 134, the authors categorized the reporter systems into three types: intersectional reporters, exclusive reporters, and nested reporters. However, Figure 1A does not depict the nested reporters.
  
  Thanks for your helpful recommendation to improve the quality of this manuscript, and we are sorry for the mistake. In this revised manuscript, we have modified the content of Figure 1A, as displayed below:
  
  (2) Line 238, the authors mentioned that NFATc1 is expressed in the mandible and periodontal tissues based on their previous sequencing analyses. It would be better to cite the related reference or display the expression of NFATc1 in the Supplemental Figures.
  
  Thanks for your suggestions. We sincerely apologize for the typo that occurred during the writing process and have revised the original text on page 9 to:
  
  “The previous sequencing analyses have reported the expression of NFATc1 in mandible and periodontal tissues20. (DOI: 10.1177/00220345221074356)”
  
  (3) Line 264, the figure callout "Figure 5E" does not exist, and the figure legends of Figure 5 contain the same error.
  
  We greatly appreciate your rigor and diligence, and we have corrected this error.
  
  (4) Line 280, the figure callout "Figure S12" is incorrect.
  
  Thank you for your efforts, and we are sorry for our negligence. The corresponding descriptions have been amended as below:
  
  Page 10 in the revised manuscript, “Consistent with the quantification of TC-based imaging results (Figure S9), the number of PDGFR-α+ cells and NFATc1+ cells were significantly higher than that in pulse group.”
  
  (5) Line 301, the figure callout "Figure 4" is erroneous.
  
  Thank you for your efforts, and we are sorry for our negligence. The corresponding descriptions have been amended as below:
  
  Page 11 in the revised manuscript, “After 11 days tracing, the number of PDGFR-α+ & NFATc1+ cells and PDGFR-α+NFATc1+ cells increased significantly (Figure 7)…”
  
  (6) Line 306, the sentence "Our previous study identified the presence of NFATc1+ cells in the cranium by single-cell sequencing (unpublished data)" could be improved by referencing specific data or findings.
  
  Thanks for your suggestions, and we are sorry for our negligence. The corresponding citation has been amended as below:
  
  Page 11 in the revised manuscript, “As a part of craniomaxillofacial hard tissue, we also intended to explore whether the presence of NFATc1+ and PDGFR-α+ cells in cranial bone tissue/suture is different from dental and periodontal tissue (our previous study has identified the presence of NFATc1+ cells in the cranium by single-cell sequencing28”
  
  (7) Line 341, the statement "Moreover, no PDGFR-α+ cells were detected in the Nfatc1DreER; IR1 group," needs further explanation or context.
  
  Thanks for your suggestions. The corresponding descriptions have been amended as below:
  
  Page 13 in the revised manuscript, “Moreover, since the recombinase recognition sites are interleaved (loxP–rox–loxP–rox), recombination by one system will naturally remove a recognition site of the other system, rendering its reporter gene inactive for further recombination. The results showed no tdTomato+ cells or ZsGreen+ cells were detected in the Pdgfr-αCreER; IR1 or Nfatc1DreER; IR1 group respectively demonstrating the feasibility and accuracy of the IR1 system.”
  
  (8) Several statements in this text were duplicated. For instance, lines 365 to 376 are identical to lines 497 to 508. This redundancy should be addressed to improve the manuscript's clarity and conciseness.
  
  We greatly appreciate your suggestions, and we are sorry for the misunderstanding we may have caused. We have revised and integrated the entire Results 4 section (including lines 365 to 376 of the original manuscript) into the Discussion section to avoid unnecessary redundancy and misunderstandings. This adjustment also emphasizes that the goal of using two imaging techniques is to draw more credible conclusions from multiple perspectives, thereby mitigating the shortcomings of relying solely on existing advanced imaging methods. The revised content is as follows:
  
  Page 18 in the revised manuscript, “TC-based advanced imaging procedure can clearly visualize its 3D structure, reconstruct the whole across latitudes, and understand the spatial position and expression of each structure, which could avoid the bias of traditional single-layer slicing may cause, and provides a more intuitive and objective description of the existing situation. However, our results demonstrated TC still has some limitations…”
  
  Page 19 in the revised manuscript, “The 3D sections reconstruction results, however, effectively addressed the issue of weak tdTomato signal and provide a clearer visualization of the distribution of ZsGreen and tdTomato signals. For example, the tdTomato signal in the root pump, which was almost completely unobservable by TC-based imaging, can be clearly seen using confocal imaging and 3D reconstruction (Figure 3C-D, Figure 6C-D, and Figure S4, Figure S12). However, compared to TC, the quality of 3D reconstruction of sections still relies on the angle and quality of the sections, with the section angle having a significant impact on the reconstruction outcome. In addition, because the slice itself has a certain thickness (10 μM in this study), which leads to the appearance of discontinuous in the final reconstructed image, and the aesthetics and accuracy could be affected to a certain extent. Also, unavoidable tissue damage during the sectioning process may result in the loss of some information. Therefore, a variety of different information could be obtained through two different imaging technologies, which prompt us to use the advanced experimental procedure according to the actual purpose.”
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) It should be further highlighted in the article what cell type the NFATc1+/PDGFR-α+ cells should be defined as in teeth and periodontal tissues.
  
  Thank you so much for your suggestions. We have supplemented the analysis with immunofluorescence results of double-positive cells to identify NFATc1+&PDGFR-α+ cell populations, selecting AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells.
  
  This part is on pages 14-15 of the revised manuscript, “To identify the population of PDGFR-α+ and NFATc1+ co-expressing cells in the pulp and periodontal ligament (PDL), we generated Pdgfr-aCreER; Nfatc1DreER; R26-LSL-RSR-tdT-DTR (LRTD) mice… Strong tdTomato signals were detected in both the PDL (Figure S22B) and pulp (Figure S22C). With respect to the MSC-specific marker AlphaV, we observed AlphaV+tdTomato+ cells in both regions. Additionally, CD45+ (hematopoietic marker) tdTomato+ cells were also present in these areas (Figure S22B, C). These findings suggested that the population of PDGFR-a+ and NFATc1+ co-expressing cells is heterogeneous.”
  
  We also supplemented the discussion of the role of the PDGFR-α+ population on page 17, where its potential role in pulp and periodontal formation is suggested as well:
  
  Page 17 in the revised manuscript: “After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F).”
  
  (2) The authors are advised to supplement the description of the cellular origin and the differentiation trajectory of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues.
  
  Thank you for your suggestion. Our study currently focuses on mapping the distribution atlas of NFATc1+PDGFRα+, NFATc1+PDGFRα-, and NFATc1-PDGFRα+ cells in adult homeostatic mice. In the next step, we plan to explore the differentiation trajectory of NFATc1+/PDGFRα+ cells during development using single-cell sequencing and other methods.
  
  (3) It is recommended to add figure labels to Figure 1B to facilitate reader comprehension.
  
  Thank you for your valuable suggestion to improve the quality of this manuscript. We have modified Figure 1B in the revised manuscript as follows:
  
  (4) Why compare 3D images from tissue clearing with 3D reconstructions of confocal imaging after consecutive tissue slicing?
  
  Thanks for your important and helpful comments to improve the quality of this manuscript, and we are sorry for the insufficient statement.
  
  The original intention of comparing the two methods was to draw more credible conclusions from multiple perspectives, thereby minimizing the limitations inherent in relying on a single advanced imaging technique. Indeed, the description in the previous manuscript could lead to misunderstandings among readers. Therefore, in the revised manuscript, we have modified and integrated the content of the Results 4 section into the Discussion section to eliminate unnecessary verbosity and potential confusion.
  
  Page 18 in the revised manuscript, “TC-based advanced imaging procedure can clearly visualize its 3D structure, reconstruct the whole across latitudes, and understand the spatial position and expression of each structure, which could avoid the bias of traditional single-layer slicing may cause, and provides a more intuitive and objective description of the existing situation. However, our results demonstrated TC still has some limitations…”
  
  Page 19 in the revised manuscript, “The 3D sections reconstruction results, however, effectively addressed the issue of weak tdTomato signal and provide a clearer visualization of the distribution of Zsgreen and tdTomato signals. For example, the td-tomato signal in the root pump, which was almost completely unobservable by TC-based imaging, can be clearly seen using confocal imaging and 3D reconstruction (Figure 3C-D, Figure 6C-D, and Figure S4, Figure S12). However, compared to TC, the quality of 3D reconstruction of sections still relies on the angle and quality of the sections, with the section angle having a significant impact on the reconstruction outcome. In addition, because the slice itself has a certain thickness (10 μM in this study), which leads to the appearance of discontinuous in the final reconstructed image, and the aesthetics and accuracy could be affected to a certain extent. Also, unavoidable tissue damage during the sectioning process may result in the loss of some information. Therefore, a variety of different information could be obtained through two different imaging technologies, which prompt us to use the advanced experimental procedure according to the actual purpose.”
  
  (5) The experimental results section does not specify the age of the mice used, which lacks clarity for the reader and makes it difficult to determine at what developmental stage the observed distribution of NFATc1+/PDGFR-α+ cells occurs.
  
  Thank you for your suggestion. We apologize for overlooking this point. The age of the mice was displayed only in some of the figures. All the transgenic mice discussed in this article are adults of around 12-14 weeks of age. We have added the specific ages in weeks to the main text.
  
  (6) What is the rationale behind selecting day 1, day 3, and day 5 as the experimental time points in Figure 2B?
  
  Thanks for your questions. 48 hours after injection, TAM can be metabolized in the body and converted into 4-OHT, which then distributes thoroughly to various tissue systems through the bloodstream. Therefore, we chose to administer a booster dose 48 hours after the initial injection to ensure timely replenishment and achieve high labeling efficiency. This drug administration scheme has already been validated for feasibility in our preliminary studies.
  
  (7) In Figure 2E, why is there a large area of red signal visible in the tooth enamel?
  
  Thanks for your valuable comments and advice on how to improve the quality of this research article and our future work. As we discussed in the main text, the existing TC-based imaging techniques cannot capture tdTomato signals as conspicuously as ZsGreen signals, which may be due to: 1) the limited editing efficiency of the DNA recombinase-mediated lineage-tracing system; 2) the lower abundance of NFATc1+ cells in the region of interest (ROI), which yields weak tdTomato signals; 3) poor penetration of tdTomato fluorescence signals with the TC method as described. Therefore, to display the NFATc1+ cells in the ROI (periodontal ligament, pulp, and alveolar bone) as clearly as possible, we increased the excitation intensity of the 561 channel of the light-sheet fluorescence microscope, which led to a large area of unrelated red signal in non-target areas (tooth enamel). In future work, we will further improve the TC procedure to shorten the sample processing time and develop other transgenic mice to address this issue. Thanks again.
  
  (8) In the text at Line 249, the author notes that PDGFRα+ cells are widely distributed, and NFATc1+ cells are primarily located in the pulp horns. What is the relevance of their distribution to their function?
  
  Thank you very much for your suggestion. We found that PDGFRα+ cells are widely distributed in dental pulp tissue. Combined with the results of the subsequent cell ablation experiments, this suggests that PDGFRα+ cells contribute to the formation of the odontoblast layer and the pulp core. In our supplementary data, immunofluorescence staining showed that double-positive cells co-express AlphaV in the dental pulp, indicating that they include MSC components. We need to further investigate the relationship between their distribution and function in the future.
  
  (9) In Line 301 of the text, there is a mislabeling of Figure 4. Please verify this carefully throughout the document.
  
  Thank you for your efforts, and we are sorry for our negligence. We have made the necessary corrections and have meticulously reviewed the entire manuscript to ensure that there were no similar mistakes. The corresponding descriptions have been amended as below:
  
  Page 11 in the revised manuscript, “After 11 days tracing, the number of PDGFR-α+ & NFATc1+ cells and PDGFR-α+NFATc1+ cells increased significantly (Figure 7)…”
  
  (10) Between Lines 323 to 325, the author states: "the wider range of PDGFR-α+ cells than NFATc1+ cells were observed, which laid the foundation for our conjecture that NFATc1+ cells may contribute as subpopulation of PDGFR-α+ cells." This statement is inaccurate.
  
  Thank you for your suggestions. We apologize for the inaccuracies in our description and have made corrections in the original text.
  
  Page 12 in the revised manuscript, “the wider range of PDGFR-α+ cells than NFATc1+ cells were observed, we speculate that there may be a hierarchical relationship between the two.”
  
  (11) The author is advised to combine the use of single-cell sequencing data for cell trajectory analysis to corroborate the differentiation relationships between NFATc1+/PDGFR-α+ cells, discussing their specific origins and final differentiation fates.
  
  Thank you for your suggestion; it is very meaningful to us and will be the focus of our future research work.
  
  (12) In the Results 4 section, the comparison between tissue clearing imaging and 3D reconstruction of consecutive tissue slices could be discussed in the discussion section.
  
  We greatly appreciate your suggestions. We have revised and integrated the entire Results 4 section into the Discussion section to avoid unnecessary redundancy and misunderstandings. This adjustment also emphasizes that the goal of using two imaging techniques is to draw more credible conclusions from multiple perspectives, thereby mitigating the shortcomings of relying solely on existing advanced imaging methods. The revised content is as follows:
  
  Page 18 in the revised manuscript, “TC-based advanced imaging procedure can clearly visualize its 3D structure, reconstruct the whole across latitudes, and understand the spatial position and expression of each structure, which could avoid the bias of traditional single-layer slicing may cause, and provides a more intuitive and objective description of the existing situation. However, our results demonstrated TC still has some limitations…”
  
  Page 19 in the revised manuscript, “The 3D sections reconstruction results, however, effectively addressed the issue of weak tdTomato signal and provide a clearer visualization of the distribution of Zsgreen and tdTomato signals. For example, the td-tomato signal in the root pump, which was almost completely unobservable by TC-based imaging, can be clearly seen using confocal imaging and 3D reconstruction (Figure 3C-D, Figure 6C-D, and Figure S4, Figure S12). However, compared to TC, the quality of 3D reconstruction of sections still relies on the angle and quality of the sections, with the section angle having a significant impact on the reconstruction outcome. In addition, because the slice itself has a certain thickness (10 μM in this study), which leads to the appearance of discontinuous in the final reconstructed image, and the aesthetics and accuracy could be affected to a certain extent. Also, unavoidable tissue damage during the sectioning process may result in the loss of some information. Therefore, a variety of different information could be obtained through two different imaging technologies, which prompt us to use the advanced experimental procedure according to the actual purpose.”
  
  (13) The article only demonstrates the impact of removing PDGFR-α+ cells on the dental pulp and periodontal tissues of adult mice. What would be the impact of removing NFATc1+ cells on teeth and periodontal tissues?
  
  Thank you for your suggestions. Our lab has been investigating the role of NFATc1+ cells in PDL and dental pulp tissues, and that work is currently under submission at another journal. We therefore ask for your understanding that we are unable to present the data here. The ablation assays showed that NFATc1+ cells may be involved in the formation of the odontoblast layer in dental pulp and in promoting osteogenic differentiation in the periodontal ligament.
  
  (14) The effects of removing PDGFR-α+ cells on the teeth and periodontal tissues of adult mice are shown in the article. What would be the impact on teeth and periodontal tissues if PDGFR-α cells were removed during early development?
  
  Thank you for your question. Our current research has not yet addressed the impact of PDGFR-α+ cells on the formation of periodontal ligaments and dental pulp tissue during the developmental stage. In our literature search, we found articles indicating that PDGFR-α is expressed at all stages of tooth development, and that PDGFR-α signaling is crucial for regulating the growth of the tooth apex and the proper extension of the palatal shelves during palatal fusion. Disruption of PDGFR-α signaling interferes with apex growth and the critical extension of palatal shelves during craniofacial development. In the future, we would like to focus on the role of PDGFR-α+ cells during tooth development.
  
  (15) If the data on the skull are not presented in this paper, it is suggested not to overly describe it in the results section, or to include related skull data in supplementary figures.
  
  We appreciate your attention to detail and your suggestions for improving the clarity and presentation of our work. The corresponding results for the cranium and cranial suture region are shown in Video S7-9 in the revised manuscript.
  
  Reviewer #3 (Recommendations For The Authors):
  
  We sincerely appreciate your thorough review and positive feedback on our manuscript. In accordance with your recommendations, all the questions and concerns have been addressed in the revised manuscript. We believe these revisions further enhance the clarity and quality of our work. The point-by-point reply to the comments is listed below:
  
  (1) In line 181, the author claimed that "we modified and improved a rapid and efficient procedure...this ultrafast clearing technique could minimize the impact on transgenic mice." However, there is no mention in the main text of the amount of time required for other methods. How can the "rapid" element of your improved method be reflected? The author should briefly list a few other studies and discuss them.
  
  Thanks for your important and helpful comments, and we are sorry for the insufficient statement. In recent years, a variety of tissue clearing methods have emerged. Here is a summary of the methods and durations used for hard tissue clearing as published in several authoritative journals:
  
  Author response table 1.
  
  In comparison, our approach requires only approximately two days, thereby minimizing potential damage to the tissue itself. Additionally, because the study uses transgenic lineage-tracing mice, the shorter processing time also minimizes the impact on the fluorescence of the labeled cells.
  
  (2) In Figure S6, the author mentioned the use of another 3D reconstruction method-DICOM-3D. What is the advantage of this methodology? Is the conclusion drawn the same as the previous approaches? The author should propose corresponding discussions in this section.
  
  We sincerely appreciate your comments. The purpose of employing DICOM-3D reconstruction for the serial section images is to validate the constructed results obtained by Imaris. This method is based on sequential 2D DICOM images and utilizes 3D reconstruction and visualization technology to generate a stereoscopic 3D image with intuitive effects. Compared to Imaris reconstruction, this method offers a more straightforward and time-efficient approach. Regardless of the different reconstruction methods employed in this study, the ultimate goal remains consistent, which is to jointly address the spatial positioning and hierarchical relationships of PDGFR-α+, NFATc1+, and PDGFR-α+NFATc1+ cells from multiple perspectives, to enhance the credibility and persuasiveness of our results. We have also included the corresponding description in the revised manuscript as follows:
  
  Page 8-9 in the revised manuscript, “To enhance the comprehensive and accurate display of the reconstruction results and to mitigate the potential errors that may arise from relying on single reconstruction method, we employed an alternative 3D reconstruction method—DICOM-3D. This method is based on sequential 2D DICOM images and utilizes 3D reconstruction and visualization technology to generate a stereoscopic 3D image with intuitive effects, which was a comparatively straightforward and highly efficient approach. We transformed the serial IF images into DICOM format and subsequently reconstruct it, and the same conclusion can be drawn, namely, PDGFR-α+ cells almost constituted the whole structure of pulp and PDL, with NFATc1+ cells as subpopulation (Figure S6).”
  
  (3) Line 292: Why was the tdTomato signal in confocal-based reconstruction more conspicuous than the TC procedure? Some descriptions would be beneficial for readers' understanding.
  
  Thank you very much for your comments. We hypothesize that current light-sheet systems have inherent limitations in capturing tdTomato signals in intact tissue, which become more evident in tissues with inherently low fluorescence intensity (in this work, the limited editing efficiency of the DNA recombinase-mediated lineage-tracing system resulted in a weaker tdTomato signal than ZsGreen signal). In contrast, traditional confocal imaging techniques do not encounter such issues. The corresponding descriptions in the revised manuscript are as follows:
  
  Page 11 in the revised manuscript, “We hypothesize that the current light-sheet systems for intact tissue-imaging have inherent limitations in capturing tdTomato signals, which become more evident in tissues with inherently low fluorescence strengths (in this work, due to the limitations of editing efficiency in DNA recombinase mediated lineage-tracing system, which guaranteed weaker tdTomato signal compared to ZsGreen). In contrast, traditional confocal imaging techniques do not encounter such issues.”
  
  (4) Part 2.2, line 305: What is the purpose of analyzing the cranium and cranial sutures region through TC technology?
  
  Thank you for your comments. There are three main purposes of this part of the experiment. First, our research group has long been committed to studying the distribution and role of NFATc1+ SSCs in a variety of hard tissues, and our previous study identified the presence of NFATc1+ cells in the cranium by single-cell sequencing. Therefore, in this work, we also intended to investigate the spatiotemporal atlas of NFATc1+ and PDGFR-α+ cells in the cranium and cranial suture region based on transgenic lineage tracing techniques. Second, because the cranium is part of the craniomaxillofacial hard tissue, we intended to explore whether the presence of NFATc1+ and PDGFR-α+ cells in cranial bone tissue/suture differs from that in dental and periodontal tissue. Third, the results in Video S7-9 further demonstrate that the improved tissue clearing procedure in this work is applicable to a variety of hard tissues, which lays a foundation for our future research.
  
  Page 11 in the revised manuscript, “As a part of craniomaxillofacial hard tissue, we also intended to explore whether the presence of NFATc1+ and PDGFR-α+ cells in cranial bone tissue/suture is different from dental and periodontal tissue (our previous study has identified the presence of NFATc1+ cells in the cranium by single-cell sequencing28”
  
  (5) Some images before & after the tissue-clearing procedure need to be provided in the supplemental file.
  
  Thanks for your important and helpful comments to improve the quality of this manuscript. We have included the corresponding description and photographs in the main text and the supplemental file as follows:
  
  Page 7 in the revised manuscript, “As shown in Figure S1A-B, we recorded bright-field images of the maxilla before and after clearing, and our procedure achieved high transparency of the whole tissue. On this basis, whole-tissue imaging can be achieved, with the observation of different cell type distribution in spatial 3D structure.”
  
  (6) In part 5, line 394, the author investigated the consequences of the ablation of PDGFR-α+ cells in dental pulp and periodontal mesenchymal tissues, but some research objectives and mechanisms need to be discussed here, regarding: "why choosing to ablation PDGFR-α+ cells instead of NFATc1+ cells? Was the hierarchical relationship between PDGFR-α+ cells and NFATc1+ cells considered during the experimental design?", etc.
  
  Thank you very much for your suggestion; it has been very helpful. We chose PDGFR-α+ cells as the subject for the cell ablation experiments based on the results from the previous lineage tracing and hierarchical relationship studies. We have included the corresponding description in the main text as follows:
  
  Page 13 in the revised manuscript, “The results from the aforementioned lineage tracing experiments showed that PDGFR-α+ cells constitute a significant component of both dental pulp and periodontal tissues. Additionally, the hierarchical relationship experiments revealed that a portion of NFATc1+ cells in the periodontal ligament derives from PDGFR-α+ progenitor cells. Therefore, investigating the role of PDGFRα+ cells in dental pulp and periodontal tissues has become more urgent.”
  
  (7) Some claims in the main text lacked literature citations, such as in lines 207 and 234.
  
  Thank you very much for your comments. We are deeply sorry for the mistakes. We have added the relevant references at the appropriate locations in the main text as follows:
  
  (1) line 207 of previous manuscript (page 8, line 206 in the revised manuscript): We sincerely apologize for the typo that occurred during the writing process and have revised the original text to: “which was consistent with RNA-sequencing results in the previous study20.” (DOI: 10.1177/00220345221074356)
  
  (2) line 234 of previous manuscript (page 9, line 234 in the revised manuscript): “we employed an alternative 3D reconstruction method—DICOM-3D27.” (DOI: 10.1177/09544119211020148)
  
  (8) What were the specific reasons for the conspicuous tdTomato signal in the reconstructed images obtained by traditional serial section-based confocal imaging, which were not as evident in TC imaging?
  
  Thank you very much for your comments. Traditional sectioning and subsequent confocal imaging can clearly display fluorescence signals on a single plane (Figure 3B, Figure 6B, Figure S3, S8, S11, S16, S19). Therefore, after 3D reconstruction of multiple planes, the result still has a high resolution (Figure 3, 4, 7, 8). For TC imaging, however, current light-sheet systems have inherent limitations in capturing tdTomato signals in intact tissue, which become more evident in tissues with inherently low fluorescence intensity (in this work, the limited editing efficiency of the DNA recombinase-mediated lineage-tracing system resulted in a weaker tdTomato signal than ZsGreen signal). In contrast, traditional confocal imaging techniques do not encounter such issues.
  
  (9) In tissue clearing techniques, do the chemical reagents and procedures used affect the signal intensity of tdTomato and Zsgreen?
  
  We appreciate your helpful comment. In this work, we modified and improved a rapid and efficient tissue deep clearing (TC) procedure based on the existing SUMIC method (Nature Cardiovascular Research, 2024, 3, 474–491; Cell, 2023, 186, 382-397.e24). These studies have confirmed that the chemical reagents used in this method do not affect the inherent fluorescence signal of transgenic animals. With our improvements, we minimized the sample processing time as much as possible to avoid any potential adverse effects. The results in Figure 2, Figure 5, and Figure S1 indicate that after the TC procedure, the tissue exhibits significant ZsGreen signals and some tdTomato signals, which sufficiently support our conclusions.
  
  (10) How did you address the issue of sample integrity and discontinuities in the z-axis caused by the stratification of slices in your reconstructions?
  
  We greatly appreciate your comments. Currently, reconstruction techniques based on continuous sectioning cannot fully eliminate discontinuities in the z-axis. For this reason, we compensate for the deficiency by imaging the whole tissue with the TC procedure. The two 3D reconstruction and imaging technologies complement each other and jointly address the spatial positioning and hierarchical relationships of PDGFR-α+, NFATc1+, and PDGFR-α+NFATc1+ cells from multiple perspectives. Additionally, this deficiency can be minimized by improving technical skills, reducing section thickness, and minimizing tissue loss during sectioning, which we will pursue in future research.
  
  (11) In Figure 2B, the schematic representation of the operational principle "Cre-loxp/Dre-loxp" does not correspond to the genotype "CreER/DreER". Please correct it.
  
  Thanks for your important comments. We are sincerely sorry for the mistake. We have modified Figure 2B in the revised manuscript as below:
  
  (12) Line 450, the specific distribution and differences of PDGFR-α+, NFATc1+, and PDGFR-α+&NFATc1+ cells in pulp and periodontal tissues need to be further described and explained.
  
  Thank you for your question. We have described this part on page 16 in the revised manuscript, “In PDL tissue, pulse data demonstrated widespread and abundant expression of PDGFR-α single-positive cells as well as NFATc1 single-positive cells, with no significant alteration in expression pattern or quantity after lineage tracing. Consequently, we conclude that in periodontal ligament and dental pulp tissues, PDGFR-α single-positive and NFATc1 single-positive cells primarily label intrinsic periodontal mesenchyme in PDL. Conversely, PDGFR-α+&NFATc1+ cells exhibited a more confined localization in PDL. The tracing data clearly illustrated that PDGFR-α+&NFATc1+ cells successfully gave rise to numerous progenies, which become predominant constituents within the periodontal ligament. In pulp tissue, the distribution of PDGFR-α single-positive cells was similar as that in PDL, primarily labeled odontoblast cell layer and there was not a significant increase in ZsGreen signal after tracing assay.”
  
  (13) In Figure S9, the sparse presence of NFATc1+ cells in pulp and periodontal tissue raises questions about the plasticity and differentiation potential of these cells. The author should include relevant discussions in this section.
  
  Thanks for your suggestion. Considering the plasticity and differentiation potential of NFATc1+ cells, we conducted immunofluorescence staining and found that the PDGFR-α+&NFATc1+ cell lineage in dental pulp and periodontal tissues represents a heterogeneous population. This population includes non-terminally differentiated mesenchymal stem cells (MSCs) as well as hematopoietic cells, indicating significant heterogeneity. We have also added this part of the discussion on page 17 of the manuscript.
  
  Page 17 in the revised manuscript, “Cell ablation and immunofluorescence staining experiments further characterized the types and functions of PDGFR-α+/PDGFR-α+&NFATc1+ populations. After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F). Previous results confirmed the presence of double-positive cells in both dental pulp and periodontal tissues and provided insights into their hierarchical relationships in the periodontal ligament (Figure. 8). To further investigate the double-positive cell population, we developed an inducible dual-editing enzyme reporter system to label these cells with tdTomato signals. Using AlphaV as a marker for mesenchymal stem cells (MSCs) and CD45 for hematopoietic cells, we found that double-positive cells included components of both MSCs and hematopoietic cells (Figure S22B, C), indicating a heterogeneous population. Further experiments are necessary to determine whether the predominant role in this co-positive MSC population is played by PDGFR-α+ or NFATc1+ and to clarify the specific functions of these cells in the future.”
  
  (14) Part 3, line 351, the authors were unable to confirm the hierarchical relationship between PDGFR-α+ and NFATc1+ cells in the dental pulp region. Could this be due to limitations in experimental design or technical methods? Have you considered other factors that might explain these results?
  
  Thank you for your question. We believe that the possible reason was that PDGFR-α+ cells were a widely distributed constitutive component of dental pulp tissue, while NFATc1+ cells had a more limited expression range, resulting in a significant difference between the two. Therefore, we were unable to calculate the differences. In the future, we could further investigate the hierarchical relationship between the two by increasing the sample size or through in vitro experiments such as immunoprecipitation.
  
URL

biorxiv.org/content/10.1101/2024.07.10.602887v2
www.biorxiv.org

New submission 20/06/2023, 13:52:01

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We thank the reviewers for their time in evaluating the strengths and weaknesses of our manuscript.
  
  We are pleased to see that all reviewers recognized the high significance of our work, noting that the manuscript addresses the “longstanding question of which cell types are infected during congenital or perinatal rubella virus infection”. As noted by reviewer 1, “This study reveals a new cellular target that will have important implications for basic studies on rubella virus-host interactions and for the potential development of therapies or improved vaccines targeting this virus. As the rubella virus is a pathogen of high concern during human pregnancy, this study also has important implications in the field of neonatal infectious diseases”.
  
  Below, we provide responses (in blue) to specific critiques:
  
  Reviewer #1 (Public Review):
  
  A weakness is that the current data do not provide information on the full replicative potential of the rubella virus in microglia, or whether the virus persists in this system.
  
  See our response below. Briefly, we include new experimental evidence from primary tissue, microglia-transplanted organoids, and Vero cells to further characterize the dynamics of viral infection.
  
  Reviewer #1 (Recommendations for the authors):
  
  Most of the viral assays in the brain slices and organoids examine viral protein synthesis, which is a surrogate for genome replication. However, basic virological characterization is lacking and would improve the robustness of the model and its potential utility to understand better rubella virus-microglia interactions. Questions the authors should consider with new experiments include:
  
  Are new virions produced? Can viruses be detected in the media?
  
  Or, are the infections abortive, with viral protein synthesis occurring, but no virus production?
  
  We performed RV titering experiments in dissociated microglia co-cultured with other cell types, as well as in Vero cells as a control. While we detected a robust increase in viral titer in Vero cells, the titer fell below detection levels in microglia co-cultures. See Author response image 1. We now include these data in Supplementary Figure 2D.
  
  Author response image 1.
  
  Rubella virus titering experiment performed in Vero cells (positive control) or dissociated microglia co-cultures. In primary microglia co-cultures, viral titer falls below detection levels after several days of infection.
  
  While we could not detect an increase in viral particles from microglia mixed cultures, we confirmed the presence of GFP from the RV-GFP reporter construct, which we believe serves as proof that the virus can infect microglia and lead to production of functional viral protein (Author response image 2, Figure 1E-F of the current manuscript):
  
  Author response image 2.
  
  We also observed an increase in RV RNA over time in tissue slice infections, using qPCR (Author response image 3, not included in the manuscript).
  
  Author response image 3.
  
  Modest increase in RV RNA over time in brain slice infections. Rubella virus RNA measured by qPCR relative to GAPDH gene, in n=3 samples (2 technical replicates each condition). Brain slices were exposed to RV, then collected at end of inoculation (4 hours post infection), or at 3 or 5 days post infection, and processed for RNA extraction and RT-qPCR.
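  The caption above reports RV RNA "relative to GAPDH" without stating the formula used. A minimal sketch of one common approach, the 2^-ΔCt calculation followed by normalization to the 4-hour time point, is shown below. It assumes roughly 100% amplification efficiency for both primer sets; the function names and all Ct values are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Return 2^-(dCt), where dCt = mean Ct(target) - mean Ct(reference).

    Each argument is a list of technical-replicate Ct values.
    Assumes ~100% amplification efficiency (one doubling per cycle).
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


def fold_change(sample, baseline):
    """Expression of `sample` relative to `baseline`.

    Each argument is a (target Cts, reference Cts) pair.
    """
    return relative_expression(*sample) / relative_expression(*baseline)


# Hypothetical Ct values (RV RNA, GAPDH), two technical replicates each.
end_of_inoculation = ([30.0, 30.0], [20.0, 20.0])  # 4 hours post infection
three_days = ([29.0, 29.0], [20.0, 20.0])          # 3 days post infection

# One fewer cycle to threshold at equal GAPDH corresponds to a 2-fold increase.
print(fold_change(three_days, end_of_inoculation))  # 2.0
```

  With real data, the same calculation would be repeated per biological sample (n=3 in the caption) before summarizing across samples.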
  
  How long do the infections persist in the model? What is the fate of infected microglia over time? Time courses to monitor infection and cell health would be useful.
  
  We performed a longer infection with RV in organoids transplanted with microglia, and after two weeks of infection, we can detect multiple microglia cells positive for the RV capsid. These data are now included in Figure 4 of the current manuscript.
  
  Author response image 4.
  
  After 2 weeks post infection, microglia remain positive for RV capsid.
  
  Reviewer #2 (Public Review):
  
  Weaknesses
  
  The set of data is rather descriptive. It suggests that microglia are the predominant brain target of RV in vivo, without identifying the targeting mechanism that provides cell type specificity. Moreover, what are the diffusible cues released from the brain environment that increase microglia infection and RV replication?
  
  We agree with the reviewer that identifying molecular mechanisms that underlie this phenotype will be very interesting to explore in future research, and we acknowledge the limitation of the study in the Discussion.
  
  It is unclear why brain organoids not supplemented by microglia are susceptible to RV inoculation.
  
  We could not detect RV capsid in organoids without microglia after 72 hours of inoculation. We attribute any changes seen at the level of single cell transcriptomics in the absence of microglia transplantation to exposure to virus-associated particles, including but not limited to viral RNA species, viral proteins, or even other components of the viral stocks made in Vero cells. These factors may induce transcriptomic differences even in the absence of RV infection. In the text, we take care to refer to these conditions as “Rubella virus-exposed” rather than “Rubella virus-infected”. We now include the following panel from Author response image 5 in Figure 4B of the current manuscript.
  
  Author response image 5.
  
  Organoids without microglia do not show positive RV immunofluorescence.
  
  Reviewer #2 (Recommendations for the authors):
  
  Several points could be further addressed to improve the data set and shed more light on some aspects of this manuscript:
  
  • Figure 1. Additional microglia markers should be used to reinforce the evidence that microglia cells are the principal RV targets. Since Iba1 is a marker of activated microglia, does RV have a selective tropism to all microglia or only to activated ones in human fetal brain slices?
  
  The reviewer brings up an interesting point that, in our mind, can be separated into two independent questions:
  
  Are Iba1-positive cells bona fide microglia, or are there other cell populations of macrophage/monocyte origin that are labeled with Iba1? Therefore, additional markers should be used for immunolabeling;
  
  Is RV infection selective for microglia “activation” status, such that only immune-primed cells can be infected?
  
  For the first point, we have previously shown that in the developing human brain, virtually all Iba1-positive cells are also P2RY12-positive (unpublished; Author response image 6). Therefore, in primary human brain slices, there is a negligible number of non-microglia macrophages. However, in culture, microglia quickly lose their “homeostatic” identity, including P2RY12 expression, as early as six hours after ex vivo extraction (Gosselin et al., 2017; DOI: 10.1126/science.aal3222).
  
  Author response image 6.
  
  P2RY12 co-localizes with Iba1 in primary brain tissue from gestational week 17.5, including cells with more ameboid morphology (arrows)
  
  However, in organoids at 2 weeks post-RV exposure, we found microglia with both ameboid and more ramified morphology (Author response image 7). It would be challenging and beyond the scope of this manuscript to use morphology or Iba1 intensity levels to determine cause and effect as microglia activation state relates to RV infectivity (i.e. do activated microglia preferentially get infected with the virus, or do infected microglia become activated and upregulate Iba1 levels and change morphology).
  
  Author response image 7.
  
  Examples of microglia with round (top) and ramified (bottom) morphology that co-localize with RV capsid staining.
  
  Regarding RV tropism in the 2D culture of microglia, some Iba1-negative cells are infected by RV as they show capsid staining. What are these cells? Are neurons and/or glia also susceptible to RV in vitro infection? Are non-microglial cells getting RV infected in the absence of microglia?
  
  In the absence of microglia cells, a small proportion of non-microglia cells get infected with RV. There is no statistically significant difference in the number of cells that get infected with RV in the presence or absence of microglia across different cell types. We add these data as Supplement Figure 3.
  
  Author response image 8.
  
  Rubella infection in non-microglia cells. A. Representative images of different cell types depleted of microglia. Cell cultures were stained for RV capsid (green) and DAPI. B. Quantification of total cells that are positive for RV capsid across conditions. C. Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells co-cultured with or without microglia.
  
  • Figure 3. The low rate of Rubella virus infection in homogenous CD11b+ cell culture raises the question of whether the Rubella virus can infect microglia at a specific activation stage. It is also surprising that there is no infection of such cell population (also CD11b+) alone while cultured in 2D, as reported in figure 2. Why such a difference?
  
  It is well established that culture of microglial cells isolated from brain tissue alters their molecular properties, which likely alters the cell surface protein composition. In the revised discussion, we include activation as a possible mechanism that will require further investigation.
  
  • Fig 4A-B, it is unclear whether organoids that are not engrafted with microglia get infected upon RV (with active viral replication) inoculation. If non-microglia-supplemented organoids are indeed infected and allow RV replication, this suggests that organoids might not be the ideal system to model human fetal brain RV infection at GW18-23.
  
  We could not detect RV capsid in organoids without microglia after 72 hours of inoculation. We now include the following panel from Author response image 9 in Figure 4.
  
  Author response image 9.
  
  Organoids without microglia do not show positive RV immunofluorescence.
  
  • Figure 4E, why are cells derived from microglia-free organoids so much enriched in the UMAP plots as compared to the other organoid condition? Is RV impacting cell fitness, proliferation, or neurodifferentiation?
  
  This perceived difference is due to data presentation. Based on cell proportions, cells from organoids that were treated with microglia are more represented in the scRNAseq data, and this difference most likely comes from user-introduced imbalance in cell loading and possible cell losses during demultiplexing (Author response image 10, panel A). Cell number composition across different conditions and cell types, including RV and MG treatment, are shown in Supplement Figure 4 of the current manuscript (Author response image 10, panel B).
  
  Contribution of each condition can be visualized via UCSC single cell data browser: https://cells.ucsc.edu/?ds=rubella-organoids
  
  Author response image 10.
  
  Data composition depending on condition. A. Cell number contribution from organoids with and without microglia. B. Contribution of each condition to each cluster composition.
  
  • Figure 4F-H. If microglia are the predominant target for RV in the brain, why are microglia-free organoids susceptible to RV, and what are the other cellular targets whose infection leads to activation of interleukin pathway genes and dysregulation of brain developmental markers in selected subpopulations (RGCs, ENs, etc.)?
  
  Thank you for bringing up this point. We did not detect any appreciable RV genomic RNA in our published single cell data, nor did we identify RV capsid in the RV-exposed organoids without microglia. Our experiments on dissociated cell cultures show that a small population (~1-4%) of other cell types was positive for the RV capsid, including neuron-enriched and glial-enriched fractions (Author response image 11; Supplementary Figure 3C in current manuscript). We expect a similar proportion of non-microglia cells to be infected in the brain organoids. One possible explanation for the robust interferon response even in the absence of productive infection in other cell types is exposure to virions and virus-associated particles, including but not limited to viral RNA species, viral proteins, or even other components of the viral stocks made in Vero cells (which is a cell line that should not produce interferons, but may produce other unmeasured cytokines as a virally infected cell culture).
  
  Author response image 11.
  
  Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells cultured with or without microglia.
  
  • qRT-PCR validations of some of these key brain targets should be performed.
  
  We agree with the reviewer that further validation of the predicted molecular changes downstream of Rubella exposure would be valuable. We have opted to validate IFITM3 and NOVA1 expression differences using immunostaining, and the results are consistent with our predictions from scRNAseq, and the data is presented in revised Figure 5 and 6 of the current manuscript.
  
  Reviewer #3 (Public Review):
  
  Weaknesses of the paper: Overall, additional control experiments are needed to support the stated conclusions. Affinity chromatography is used to purify microglia and other cell types, but the overall cell enrichment is not quantified.
  
  We appreciate the reviewer’s concern. However, affinity-based enrichment rarely guarantees purity, and we do not believe that an accurate estimate of enrichment purity would alter the biological interpretation of the data.
  
  In cell mixing experiments, the authors do not rule out the possibility that the added non-microglia cells also become infected, releasing additional infectious viruses. The finding that a diffusible factor is required for RV infection would be unusual if not unprecedented; therefore, additional data are required to support this claim and rule out other interpretations.
  
  We provide quantification of non-microglia cells that are positive for RV capsid in the presence and absence of microglia. A small proportion (~1-4%) of non-microglia cells become infected with the virus and can potentially release more virus (see Author response image 12), but we do not know how this newly produced virus would differ from the virus that was applied to the cells directly. To follow up on our co-culture experiments, we wanted to exclude the possibility that microglia engulf RV-infected cells in co-cultures; therefore, we separated the two cell fractions by a liquid-permeable membrane (Figure 3 of the current manuscript). It is possible that factors secreted by other cell populations in the transwell assay experiments act on microglia to upregulate an as yet unidentified receptor on the microglia surface, or another infection-dependent molecule, rendering them infectable by the virus.
  
  We re-phrase the text by de-emphasizing “soluble factors” and focusing on excluding phagocytosis of infected cells as a possible mechanism of RV capsid immunoreactivity in microglia cells.
  
  Author response image 12.
  
  Rubella infection in non-microglia cells. A. Representative images of different cell types depleted of microglia. Cell cultures were stained for RV capsid (green) and DAPI. B. Quantification of total cells that are positive for RV capsid across conditions. C. Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells co-cultured with or without microglia.
  
  The methods section would be improved by including details about the iPSC line that was used.
  
  We include the following section in Materials and Methods:
  
  iPSC lines.
  
  All work related to human iPS cells has been approved by the UCSF Committee on Human Research and the UCSF GESCR (Gamete, Embryo, and Stem Cell Research) Committee. Human iPS cell line “WTC-10” derived from healthy 30-year-old Japanese male fibroblasts was from the Conklin Lab, UCSF (Bershteyn et al., 2017; Kreitzer et al., 2013). Human iPSC line “13325” was derived from 9-year-old female fibroblasts originally obtained from Coriell cell repository. Human iPSC line “1323-4” derived from healthy 48-year-old Caucasian female fibroblasts (gift from the Conklin Lab, UCSF) was used for immunofluorescence validation analysis as we found that this line generates more reproducible brain organoid differentiations.
  
  and by a more thorough description of virus-specific details, including the numbers of infectious particles added per volume of incubation media.
  
  We now include the following data in the Materials and Methods section:
  
  Rubella virus infection
  
  Cells cultured in 2D were inoculated by adding RV stock virus to culture media in 1:1 dilution (250 µl of media to the equal volume of viral stock, 1.75 × 10^5 total ffu/well) to achieve a multiplicity of infection (MOI) of 2. After four hours, media was exchanged with fresh cell culture media. Cortical brain slices were treated with 500 µl of RV viral stock (3.5 × 10^5 total ffu/slice) applied over the slice culture filter for four hours, and then the viral culture media was removed and replaced with fresh slice culture media. Organoids were treated in 6-well plates with 2 ml of 1:1 dilution of viral stock:organoid maintenance media (7 × 10^5 total ffu) for four hours, and then viral media was exchanged for fresh media. For all experimental conditions, cells were fixed and processed for downstream analysis at 72 hours post infection. Supernatant from non-infected Vero cells (mock) or heat-inactivated RV (65 °C, 30 mins) was used as control.
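  The doses quoted above are mutually consistent if a single stock titer is assumed. The sketch below works through that arithmetic. The stock titer (7 × 10^5 ffu/ml) and the cell number per well (about 8.75 × 10^4) are inferred here from the stated volumes, doses, and MOI; neither figure is stated in the response, and the function names are illustrative only.

```python
def implied_stock_titer(total_ffu, stock_volume_ml):
    """Stock titer (ffu/ml) implied by a total dose and the stock volume used."""
    return total_ffu / stock_volume_ml


def total_dose(stock_titer_ffu_per_ml, stock_volume_ml):
    """Total focus-forming units delivered by a given volume of stock."""
    return stock_titer_ffu_per_ml * stock_volume_ml


def cells_at_moi(total_ffu, moi):
    """Number of cells for which `total_ffu` corresponds to the given MOI."""
    return total_ffu / moi


# 2D cultures: 250 ul (0.250 ml) of stock delivers 1.75e5 ffu per well.
titer = implied_stock_titer(1.75e5, 0.250)  # inferred: 7.0e5 ffu/ml

# Brain slices: 500 ul of undiluted stock.
slice_dose = total_dose(titer, 0.500)       # 3.5e5 ffu, matching the text

# Organoids: 2 ml of a 1:1 dilution contains 1 ml of stock.
organoid_dose = total_dose(titer, 1.0)      # 7.0e5 ffu, matching the text

# MOI of 2 in 2D cultures implies roughly this many cells per well (inferred).
cells_per_well = cells_at_moi(1.75e5, 2)    # 8.75e4 cells

print(titer, slice_dose, organoid_dose, cells_per_well)
```

  An MOI is given only for the 2D cultures; for slices and organoids the response reports total ffu, presumably because the number of accessible cells is not known.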
  
  In addition to immunofluorescence, adding additional data to demonstrate and quantify virus infection (PCR and plaque assays, or immunofluorescence using an anti-double-stranded RNA antibody such as J2) from the infected brain slices and organoids would provide greater assurance that the virus is indeed replicating under the experimental conditions.
  
  We performed an RV titering experiment in dissociated microglia co-cultured with other cell types, as well as in Vero cells as a control. While we detected a robust increase in viral titer in Vero cells, the titer fell below detection levels in microglia co-cultures. We now include these data in Supplementary Figure 2D.
  
  Author response image 13.
  
  Rubella virus titering experiment performed in Vero cells (positive control) or dissociated microglia co-cultures. In primary microglia co-cultures, viral titer falls below detection levels after several days of infection.
  
  Unfortunately, we did not find J2 staining informative because we could detect signal in both wild type RV infection conditions and in heat-inactivated RV, presumably due to native dsRNA species present in cells. We did not detect any increase or difference in the pattern of staining between RV and heat-inactivated virus-exposed conditions (Author response image 14; not included in the manuscript).
  
  Author response image 14.
  
  J2 antibody labels dsRNA in both RV-exposed and control heat-inactivated virus conditions, presumably due to native dsRNA that is not unique to the viral replication.
  
  Organoid imaging with immunofluorescence would be very informative in demonstrating the presence of microglia and also in showing which cells are virus-infected in the context of organoid structures.
  
  We provide images from 72 hr and 2 week RV infections, giving a zoomed-out view of organoids with microglia and RV capsid staining. We also provide images at 72 hr post-infection in organoids without microglia (Author response image 15, Figure 4C in current manuscript).
  
  Author response image 15.
  
  Microglia in organoids co-localize with RV capsid staining.
  
  GenBank accession numbers are listed for the recombinant RV and GFP-RV reporter, but a search using those numbers did not locate the deposits--perhaps the deposits were very recent?
  
  Information for both viral constructs is now available on GenBank:
  
  M33 RV strain can be found here: https://www.ncbi.nlm.nih.gov/nuccore/OM816674
  
  RV-GFP can be found here: https://www.ncbi.nlm.nih.gov/nuccore/OM816675
  
  The authors incorrectly refer to the GFP virus as a new strain; it is not a viral strain and should be referred to as a reporter virus.
  
  Thank you. We changed the description to:
  
  “To confirm functional transcription and translation of the viral genome, a new reporter construct of RV designed to express GFP within the non-structural P150 gene was generated (RV-GFP, GenBank Accession OM816675)”
  
  Given that the authors show that Vero cell cultures are infected by the Rubella virus in the absence of other cells, additional evidence is needed to demonstrate that a diffusible factor from other cells enables microglia to be infected by the Rubella virus.
  
  We have revised the manuscript to indicate that our data is consistent with the possibility that a diffusible factor is involved. Our experiment utilizing transwell assay argues against phagocytosis and physical interactions as primary drivers, but future studies will be needed to determine if soluble factors are involved.
  
  The authors did not detect Rubella virus transcripts in the single-cell RNA sequencing experiment, nor was a microglia cluster found.
  
  Indeed, microglia recovery using scRNAseq is very inefficient. We note this limitation in the discussion.
  
  Innate immune responses can be activated in the presence of viral particles but without virus replication, as in inactivated viral vaccines; therefore changes in interferon responses do not necessarily prove virus replication.
  
  We agree with the reviewer on this point: it is difficult, if at all possible, to entirely eliminate the possibility that some of the transcriptomic changes, particularly the interferon responses, are induced by exposure to viral particles rather than by infection. We have revised the manuscript to describe the conditions more rigorously as “RV-exposed”.
  
  Figure 4: it would be helpful to define the abbreviations used in the figure legend (e.g. IPC, RG, EN). In the volcano plots, the gene names are blocked by the dots, and the figure becomes very pixelated when enlarged to read the text.
  
  We have added abbreviations and replaced the figure files with higher resolution images (Figure 6 in current manuscript).
  
  The value of including Supplemental Figure 2 (MOG) is not clear because it receives little mention in the text and also seems to be previously published data that could be cited.
  
  We have removed the figure and replaced it with a citation and a link to the Cell Browser.
  
  Supplemental Figure 4: In panel G, the legend shows "YH10" and "13325". These terms are not described in the Figure legend, nor did a search of the manuscript identify these terms. In its current form Supp. Fig. 4G is not interpretable. In addition, it would be clearer to use the term "RV-infected" instead of "treated" to describe the addition of the virus.
  
  We have expanded the Methods section to include the description of different organoid lines and added a revised legend for Supplementary Figure 4. We do not provide evidence of RV infecting organoids without microglia, therefore we have revised the claims that organoid cells become infected with the virus and replaced it with “RV-exposed” to better reflect the conditions studied.
  
  Reviewer #3 (Recommendations for the authors):
  
  1) Demonstrate and quantify virus replication to provide data to complement the imaging. In order of data quality, plaque assays would be most convincing in demonstrating infection and release of infectious virus, while a time course of PCR on RV transcripts would support a conclusion of replicating virus. Further, staining with an anti-double-stranded RNA antibody (J2) would represent evidence of virus replication.
  
  In response to the reviewer’s comment, we performed an RV titering experiment in dissociated microglia co-cultured with other cell types, as well as in Vero cells as a control. While we detected a robust increase in viral titer in Vero cells, the titer fell below detection levels in microglia co-cultures. We now include these data in Supplementary Figure 2D.
  
  Author response image 16.
  
  Rubella virus titering experiment performed in Vero cells (positive control) or dissociated microglia co-cultures. In primary microglia co-cultures, viral titer falls below detection levels after several days of infection.
  
  We detected a very modest increase in RV RNA in infected brain slices over time using RT-qPCR (see Author response image 17, not included in current manuscript).
  
  Author response image 17.
  
  Modest increase in RV RNA over time in brain slice infections. Rubella virus RNA measured by qPCR relative to GAPDH gene, in n=3 samples (2 technical replicates each condition). Brain slices were exposed to RV, then collected at end of inoculation (4 hours post infection), or at 3 or 5 days post infection, and processed for RNA extraction and RT-qPCR.
  
  Unfortunately, we did not find J2 staining informative because we could detect signal in both wild type RV infection conditions and in heat-inactivated RV, presumably due to native dsRNA species present in cells. We did not detect any increase or difference in the pattern of staining between RV and heat-inactivated virus-exposed conditions (Author response image 18; not included in the manuscript).
  
  Author response image 18.
  
  J2 antibody labels dsRNA in both RV-exposed and control heat-inactivated virus conditions, presumably due to native dsRNA that is not unique to the viral replication.
  
  We utilized FISH to detect negative-stranded (non-genomic) RV RNA as an alternative to J2 to indicate RNA replication. However, it proved to be not very sensitive, as a small quantity of negative-strand RV RNA could be detected in highly infected Vero cells, but negative-strand RV RNA was not detected in more modestly infected microglia (based on positive-strand RV RNA quantification), as in Author response image 19, not included in current manuscript.
  
  Author response image 19.
  
  FISH probes to positive strand (genomic) and negative strand (replication template) RV RNA in Vero cells and microglia co-cultures. A: representative images of Vero cells infected with RV (top row) or Zika virus as control (bottom row). At 72hpi, cells were fixed and processed for immunofluorescence with anti-RV capsid antibody (RVcap) or Zika virus antibody (Zika4G2), and then FISH was performed using probes to positive strand (+) or negative strand (-) RV RNA. Negative strand RV RNA was difficult to visualize at low-power magnification, and required quantification within cell borders defined by wheat germ agglutinin staining with results in panel B. B: In Vero cells, negative strand RV RNA is detected in strongly infected cells. Infection strength determined by intensity of RV capsid immunofluorescence staining and positive strand RV RNA (RVcap/(+) 2/3 indicates robust infection, RVcap/(+) 1 indicates weak infection). ZIKVinf = Zika virus infected control. C: In microglia co-cultures, positive strand RV RNA detected in cells with RV capsid immunopositivity (RVcap_pos). RVinf = RV infected. RVHI = heat-inactivated RV. D: In microglia co-cultures, negative strand RV RNA quantification not significantly different between mock, heat-inactivated RV (RVHI), or RV-infected conditions (RVinf), including cells with weak positive-strand RV RNA (RVinf, (+)<8) or cells with stronger positive-strand RV RNA (RVinf, (+)>=8). Two biological replicates (bHR60 and bHR61), n indicates number of cells counted.
  
  While we could not detect an increase in viral particles from microglia mixed cultures, we confirmed the presence of GFP from the RV-GFP reporter construct, and we believe this serves as proof that the virus can infect microglia and lead to production of functional viral protein (see Author response image 20, Figure 1E-F of the current manuscript).
  
  Author response image 20.
  
  Thus, overall we detect replication of viral RNA and protein (qPCR, RV-GFP), but not an appreciable increase in released newly-made virions. The discussion now reflects this more clearly in the current manuscript.
  
  2) The claim of requiring a diffusible factor to enable RV infection requires additional data. A suggestion would be to include further characterization of affinity-purified cells to define the levels of cell enrichment and to determine which other cell types are present. It is also important to test the RV infection of the fractionated cell types alone before adding to the microglia, in order to demonstrate whether RV is replicating in cell types other than microglia.
  
  We quantified RV capsid-positive cells in each of the affinity-purified cell populations: neuron-enriched (purified with PSA-NCAM beads), glia-enriched (PSA-NCAM depleted cell fraction), or non-microglia fraction (“Flow through”, depleted of CD11b+ cells). We show that infectivity is low in every condition (ranging from ~1 to 4% of the total cell population) at 72 hours post-infection. We include these data in Supplementary Figure 3.
  
  Author response image 21.
  
  Rubella infection in non-microglia cells. A. Representative images of different cell types depleted of microglia. Cell cultures were stained for RV capsid (green) and DAPI. B. Quantification of total cells that are positive for RV capsid across conditions. C. Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells co-cultured with or without microglia.
  
  Another approach to limit cell heterogeneity would be to use iPSC-derived cells, which are highly enriched for a single, specific cell type, to test the requirement for additional cell types to achieve RV infection of microglia.
  
  In our prior publication (Popova et al. 2021) we identified a number of molecular differences between primary and iPSC-derived microglia. iPSC-derived microglia-like cells could show differences in infection tropism from primary microglia, and those results may be difficult to interpret biologically. We agree with the reviewer that iPSC-derived cells would be an interesting model. However, there are now several distinct protocols for deriving microglia-like cells from pluripotent stem cells, and we feel that embarking on a protocol comparison project would fall outside the scope of the current manuscript.
  
  3) Consider a longer organoid infection. The authors did not identify viral RNA transcripts in their organoid scRNAseq data after a 72-hour infection. Although the 72-hour time point seems right for cells in 2D culture, it’s possible that the infection in the organoids is slower because the virus has to spread inwardly. It would be worth trying a time course out to 2 weeks, collecting organoids every few days and then imaging and doing PCR or plaque assays. Zoomed-out views that show immunofluorescence of the entire organoid would also be beneficial in assessing organoid quality and immunofluorescent staining to identify cell types.
  
  We performed a longer RV infection of two weeks and now present data on RV capsid in microglia at 72 hrs and 2 weeks post-infection (Author response image 22, Figure 4C of the current manuscript). We have also validated one of the scRNAseq-generated gene candidates in combination with different cell type markers and present data on whole organoids immunostained with NeuN for neurons and EOMES for intermediate progenitor cells that demonstrate the overall structure of the organoids (Author response image 23; Figure 6 of the current manuscript).
  
  Author response image 22.
  
  Microglia in organoids co-localize with RV capsid staining. Organoids with microglia were exposed to RV for 72 hrs or two weeks.
  
  Author response image 23.
  
  Organoids labeled with splice regulator NOVA1 (magenta), neuronal marker NeuN (green) and intermediate progenitor cell marker EOMES (cyan).
  
URL

biorxiv.org/content/10.1101/2022.10.24.513565v2
www.biorxiv.org www.biorxiv.org

New submission 18/12/2023, 07:20:45

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We are grateful to the reviewers for their constructive comments. The following are our point-by-point responses.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Point 1- Abstract: advanced morning peak « opposite » to pdf/pdfr mutants. To my knowledge, the alteration of PDF/PDFR suppresses the morning peak. I am not sure that an advance of the peak is « opposite » to its inhibition?
  
  Mutants with disruptions in CNMa or CNMaR display advanced morning activity, indicating an enhanced state. Mutants with disruptions in Pdf or Pdfr exhibit no morning anticipation, suggesting a promoting role of these genes in morning anticipation. Therefore, our revised version is: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-51)
  
  Point 2- Fig 1K-L: the authors should show the sleep phenotype of the homozygous nAChRbeta2 mutant (if not lethal) for a direct comparison with the FRT/FLP genotype and thus evaluate the efficiency of the system.
  
  We have incorporated sleep profiles of the nAChRbeta2 mutant and w1118 into Fig 1K-L. nAChRbeta2 mutants (red) exhibited a sleep amount comparable to that of pan-neural nAChRbeta2 knockout flies (dark red), as shown below.
  
  Author response image 1.
  
  Point 3- Dh31-EGFP-FRT expression patterns look different in figS1 A (or fig1 H) and J. why that?
  
  We re-examined the original data. Both samples (with R57C10-GAL4; Fig. S1A, right, and Fig. S1J, left) are Dh31EGFP.FRT samples, displayed below, and they show consistent primary expression subsets. Any observed disparities in region "e" could potentially be attributed to variations during dissection.
  
  Author response image 2.
  
  Point 4- The knockdown experiments with the elav-switch (RU486) system (fig S2) do not seem to be as efficient as the HS-FLP system (fig 1H-J). The conclusions on the efficiency should be toned down.
  
  We have revised accordingly: "Near Complete Disruption of Target Genes by GFPi and Flp-out Based cCCTomics" (Line 130): "Knocking out at the adult stage using either hsFLP driven Flp-out (Golic and Lindquist, 1989) (Fig. 1H-1J) or neural (elav-Switch) driven shRNAGFP (Nicholson et al., 2008; Osterwalder et al., 2001) (Fig. S2A-S2I), also resulted in the elimination of most, though not all, GFP signals." (Line 145-149)
  
  Point 5- Fig 2H-J: the LD behavioral phenotype of pdfr pan-neuronal CRISPR does not seem to correspond to what is described in the literature for the pdfr mutant (han), see Hyun et al 2005 (no morning anticipation and advanced evening peak). I understand that the activity index is lower than controls but fig2H shows a large anticipatory activity that seems really unusual, and no advanced evening peak is observed. I think that the authors should show the CRISPR flies and pdfr mutants together, to better compare the phenotypes.
  
  Thank you for pointing this out. In the pan-neuronal knockout of PDFR by unmodified Cas9 (Fig. 2H-2I of the previous version), morning anticipation still exists (Fig. 2H of the previous manuscript), and the significant decrease in the morning anticipation index (Fig. 2I of the previous manuscript) and the advanced evening activity are not as pronounced as those observed in han5304 (Fig. 3C in Hyun et al., 2005).
  
  First, we have separated the activity plots of Fig. 2H of the previous manuscript, as shown below. In Pdfr knockout flies, activity tends to decrease from ZT18 to ZT21 and to increase from ZT21 to ZT24. The lowest activity before dawn occurs at about ZT21, and the activity at ZT18 is comparable to that at ZT24. This is significantly different from the two control groups, whose activity tends to increase from ZT18 to ZT24 and peaks at ZT24.
  
  From ZT6 to ZT12, activity increased much faster in Pdfr knockout flies and reached a plateau at about ZT11, whereas in the two control groups activity increased more slowly, with no plateau but a peak at ZT12.
  
  Author response image 3.
  
  Second, we have compared the phenotype of the Pdfr mutants we previously generated (Pdfr-attpKO; Deng et al., 2019) with that of the Pdfr pan-neuronal knockout by Cas9.HC. This mutant lacks all seven transmembrane regions of Pdfr (a). The phenotypes are very similar between Pdfr-attpKO flies and Pdfr pan-neuronal knockout flies. In this experimental repeat, we observed a much more obvious advanced evening activity peak in both pan-neuronal knockout flies and Pdfr-attpKO flies.
  
  To further analyze the phenotypes of Pdfr pan-neuronal knockout flies by Cas9.HC, we referred to the literature. The activity pattern at ZT18 to ZT24 (activity tends to decrease from ZT18 to ZT21 and tends to increase from ZT21 to ZT24, with the lowest activity before dawn occurring at about ZT21, and activity at ZT18 comparable to activity at ZT24) is also reported in Pdfr knockout flies such as Fig. 3C and 3H in Hyun et al., 2005, Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, Fig. 5A in Guo et al., 2014, and Fig. 5B in Goda et al., 2019. Additionally, the less pronounced advanced evening activity peak compared to han5304 (Fig. 3C in Hyun et al., 2005) is also reported in Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, and Fig. 5B in Goda et al., 2019. We consider that this difference is more likely to be caused by environmental conditions or recording strategies (DAM system vs. video tracing).
  
  Therefore, we revised the text to: “Pan-neuronal knockout of Pdfr resulted in a tendency towards advanced evening activity and weaker morning anticipation compared to control flies (Fig. 2H-2I), which is similar to Pdfr-attpKO flies. These phenotypes were not as pronounced as those reported previously, when han5304 mutants exhibited a more obvious advanced evening peak and no morning anticipation (Hyun et al., 2005)”.
  
  Author response image 4.
  
  Point 6-The authors should provide more information about the DD behavior (power is low, but how about the period of rhythmic flies, which is shortened in pdf (renn et al) and pdfr (hyun et al) mutants).
  
  We have incorporated period data into Fig. 2I. Indeed, conditional knock out of Pdfr by Cas9.HC driven by R57C10-GAL4 shortens the period length, as shown below (previous data), also in Fig. 2I of the revised version.
  
  In the revised Fig. 2I, we tested 45 Pdfr-attpKO flies during DD condition (3 out of 48 flies died during video tracing in DD condition), and only one fly was rhythmic. In contrast, 9 out of 48 Pdfr pan-neuronal knockout flies were rhythmic.
  
  Author response image 5.
  
  Point 7- P15 and fig6. The authors indicate that type II CNMa neurons do not show advanced morning activity as type I do, but Figs 6 I and K seem to show some advance although less important than type I. I am not sure that this supports the claim that type I is the main subset for the control of morning activity. This should be toned down.
  
  We have re-organized Fig. 6 and revised the summary of these results as: “However, Type II neurons-specific CNMa knockout (CNMa ∩ GMR91F02) showed weaker advanced morning activity without advanced morning peak (Fig. 6N), while Type I neurons-specific CNMa knockout did (Fig. 6J), indicating a possibility that these two type I CNMa neurons constitute the main functional subset regulating the morning anticipation activity of fruit fly”. (Line 400-405)
  
  Point 8- Figs 6M and N: is power determined from DD data? if yes, how about the period and arrhythmicity? Please also provide the LD activity profiles for the mutants and rescued pdfr genotypes.
  
  Yes, the power was determined from the DD data. In the new version of the manuscript, we have included the activity plots for the LD phase in supplementary Fig S13, as well as shown below (A, B), and the period and arrhythmicity data for the DD phase in Fig. 6S and Table S7. We have also refined the related description as follows: “Moreover, knocking out Pdfr by GMR51H05, GMR79A11 and CNMa GAL4, which cover type I CNMa neurons, decreased morning anticipation of flies (Fig. 6T, Fig. S13B). However, the decrease in morning anticipation observed in the Pdfr knockout by CNMa-GAL4 was not as pronounced as with the other two drivers. Because the presumptive main subset of functional CNMa is also PDFR-positive, there is a possibility that CNMa secretion is regulated by PDF/PDFR signal”. (Line 413-419)
  
  Author response image 6.
  
  Point 9- Fig 7: does CNMaR affect DD behavior? This should be tested.
  
  We analyzed the activity of CNMaR-/- flies under constant darkness (DD) over a span of six days. The results revealed a higher power in CNMaR mutants compared to control flies (Power: 93.5±41.9 (CNMaR-/-, n=48) vs 47.3±31.6 (w1118, n=47); Period: 23.7±0.3 h (CNMaR-/-, n=46) vs 23.7±0.3 h (w1118, n=47); arrhythmic rate 2/48 (CNMaR-/-) vs 0/47 (w1118)). Because mutating CNMa had no obvious effect on DD behavior, any effect of CNMaR on DD behavior cannot be attributed to the CNMa signal; we therefore did not further repeat and analyze the DD behavior of the CNMaR mutant. We believe this raises another question beyond the scope of our current discussion.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Point 1-One major concern is the apparent discrepancies in clock network gene expression using the Flp-Out and split-LexA approaches compared to what is known about the expression of several transmitter and peptide-related genes. For example, it is well established that the 5th-sLNv expresses CHAT (along with a single LNd), yet there appears to be no choline acetyltransferase (ChAT) signal in the 5th-sLNv as assayed by the Split-LexA approach (Fig. 4). This approach also suggests that DH31 is expressed in the s-LNvs, which, as one of the most intensely studied clock neuron are known to express PDF and sNPF, but not DH31. The results also suggest that the sLNvs express ChAT, which they do not. Remarkably PDF is not included in the expression analysis, this peptide is well known to be expressed in only two subgroups of clock neurons, and would therefore be an excellent test case for the expression analysis in Fig. 4. PDF should therefore be added to analysis shown in Fig. 4. Another discrepancy is PdfR, which split LexA suggests is expressed in the Large LNvs but not the small LNvs, the opposite of what has been shown using both reporter expression and physiology. The authors do acknowledge that discrepancies exist between their data and previous work on expression within the clock network (lines 237 and 238). However, the extent of these discrepancies is not made clear and calls into question the accuracy of Flp-Out and Split LexA approaches.
  
  The concerns mentioned above are:
  
  (1) sLNvs express PDF and sNPF but not Dh31;
  
  (2) ChAT presents in 5th-sLNv and one LNd but not in other sLNvs;
  
  (3) PDFR presents in sLNvs but not l-LNvs.
  
  (4) PDF is not included in the analysis.
  
  To verify the accuracy of these intersection analyses, all related to PDF-positive neurons (except 5th-sLNv and LNds), we stained PDF and examined the co-localization between PDF-positive LNvs and the respective drivers ChAT-KI-LexA, Pdfr-KI-LexA, Dh31-KI-LexA, and Pdf-KI-LexA.
  
  First, Dh31-KI-LexA labeled four s-LNvs, as shown below (also in Fig. S9A). Therefore, the results of the intersection analysis of Dh31-KI-LexA with Clk856-GAL4 are correct. The difference from previous literature is attributed to Dh31-KI-LexA labeling different neurons than the previous driver or antibody.
  
  Second, no s-LNv was labeled by ChAT-KI-LexA, as shown below. We rechecked our intersection data and found that, of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains we analyzed, only two showed positive s-LNvs. To enhance the accuracy of the intersection analysis results, we marked all positive signal records for which positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).
  
  Third, one l-LNv and at least two s-LNvs were labeled by Pdfr-KI-LexA, as shown below (also in Fig. S9B). Fourth, Pdf-KI-LexA labels all PDF-positive neurons, but the intersection analysis by Pdf-KI-LexA and Clk856-GAL4 only showed scattered signals, as shown below (D, also in Fig. S9C). In these cases, some expected positive signals were not observed in our dissection. A possible reason is the inefficiency of LexAop-FRT-myr::GFP driven by LexA. Therefore, our intersection results must be missing some positive signals.
  
  Author response image 7.
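  
  The marking rule described above (positive records seen in fewer than one third of the analyzed brains) can be sketched in a few lines. This is only an illustration of one reading of that rule; the function name and the handling of records with no positive brain are assumptions, not the procedure actually used to build Table S4.
  
```python
from fractions import Fraction


def is_sparse_positive(positive_brains: int, total_brains: int) -> bool:
    """Return True when a positive record should be marked as sparsely observed.

    Hypothetical helper: a record is marked when the subset was positive in at
    least one brain but in fewer than 1/3 of all analyzed brains.
    """
    if total_brains <= 0:
        raise ValueError("total_brains must be positive")
    if not 0 <= positive_brains <= total_brains:
        raise ValueError("positive_brains must lie between 0 and total_brains")
    if positive_brains == 0:
        return False  # assumption: nothing to mark when no brain was positive
    # Exact rational comparison avoids floating-point edge cases at exactly 1/3.
    return Fraction(positive_brains, total_brains) < Fraction(1, 3)


# Case mentioned in the response: s-LNvs were positive in 2 of 10 brains.
print(is_sparse_positive(2, 10))  # True
```
  
  Under this reading, a subset positive in exactly one third of the brains (for example 3 of 9) would not be marked, because the rule is stated as "less than 1/3".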
  
  Finally, we revised the text to (Line 286-317):
  
  To assess the accuracy of expression profiles using CCT drivers, we compared our dissection results with previous reports. Initially, we confirmed the expression of CCHa1 in two DN1s (Fujiwara et al., 2018), sNPF in four s-LNvs and two LNds (Johard et al., 2009), and Trissin in two LNds (Ma et al., 2021), aligning with previous findings. Additionally, we identified the expression of nAChRα1, nAChRα2, nAChRβ2, GABA-B-R2, CCHa1-R, and Dh31-R in all or subsets of LNvs, consistent with suggestions from studies using ligands or agonists in LNvs (Duhart et al., 2020; Fujiwara et al., 2018; Lelito and Shafer, 2012; Shafer et al., 2008) (Table S4).
  
  Regarding previously reported Nplp1 in two DN1as (Shafer et al., 2006), we found approximately five DN1s positive for Nplp-KI-LexA, indicating a broader expression than previously reported. A similar pattern emerged in our analysis of Dh31-KI-LexA, where four DN1s, four s-LNvs, and two LNds were identified, contrasting with the two DN1s found in immunocytochemical analysis (Goda et al., 2016). Colocalization analysis of Dh31-KI-LexA and anti-PDF revealed labeling of all PDF-positive s-LNvs but not l-LNvs (Fig S9A), suggesting that the differences may arise from the broader labeling of 3' end knock-in LexA drivers or the amplitude effect of the binary expression system. The low protein levels might go undetected in immunocytochemical analysis. This aligns with transcriptome analysis findings showing Nplp1 positive in DN1as, a cluster of CNMa-positive DN1ps, and a cluster of DN3s (Ma et al., 2021), which is more consistent with our dissection.
  
  Despite the well-known expression of PDF in LNvs and PDFR in s-LNvs (Renn et al., 1999; Shafer et al., 2008), we did not observe stable positive signals for both in Flp-out intersection experiments, although both Pdf-KI-LexA and Pdfr-KI-LexA label LNvs as expected (Fig S9B-S9C). We also noted fewer positive neurons in certain clock neuron subsets compared to previous reports, such as NPF in three LNds and some LNvs (Erion et al., 2016; He et al., 2013; Hermann et al., 2012; Johard et al., 2009; Lee et al., 2006) and ChAT in four LNds and the 5th s-LNv (Johard et al., 2009; Duhart et al., 2020) (Table S4). We attribute this limitation to the inefficiency of LexAop-FRT-myr::GFP driven by LexA, acknowledging that our intersection results may miss some positive signals.
  
  Point 2-Related to this, the authors rather inaccurately suggest that the field's understanding of PdfR expression within the clock neuron network is "inconsistent" and "variable" (lines 368-377). This is not accurate. It is true that the first attempts to map PdfR expression with antisera and GAL4s were inaccurate. However, subsequent work by several groups has produced strong convergent evidence that, with the exception of the l-LNvs after several days post-eclosion, PdfR is expressed in the Cryptochrome-expressing subset of the clock neuron network. This section of the study should be revised.
  
  We thank the reviewer for pointing this out. As we have already addressed and revised the related part in the RESULTS section (Line 308-317), we have now removed this part from the DISCUSSION section of the revised version.
  
  Point 3-One minor issue, which if addressed would avoid unnecessary confusion among readers familiar with the circadian literature, is the way that activity profiles are plotted in the study. The authors have centered their averaged activity profiles on the 12h of darkness. This is the opposite of the practice of the field, and it leads to some initial confusion in the examination of the morning and evening peak data. The authors may wish to avoid this by centering their activity plots on the 12h light phase, which would put the morning peak on the left and the evening peak on the right. This is the way the field is accustomed to examining locomotor activity profiles.
  
  The averaged activity profiles are centered on the 12 h of darkness to highlight the phenotype of advanced morning activity. To prevent any confusion among readers, we have included a sentence in the figure legend explaining how our activity profiles differ from those in the previous literature: "Activity profiles were centered of the 12 h darkness in all figures with evening activity on the left and morning activity on the right, which is different from general circadian literatures. (Fig. 2H legend)" (Line 957-959)
  
  Point 4-The authors conclude that the loss of PDF and CNMa have opposite effects on the morning peak of locomotor activity (line 392). But they also acknowledge, briefly, that things are not that simple: loss of CNMa causes a phase advance, but loss of PDF causes a loss or reduction in the anticipatory peak. It is still significant to find a peptide transmitter within the clock neuron network that regulates morning activity, but the authors should revise their conclusion regarding the opposing actions of PDF and CNMa, which is not well supported by the data.
  
  We have revised the relevant parts.
  
  ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)
  
  DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)
  
  Point 5-The authors should acknowledge, cite, and incorporate the substantive discussion of CNMa peptide and the DN1p neuronal class in Reinhard et al. 2022 (Front Physiol. 13: 886432).
  
  We have revised the text accordingly and cited this paper: “Type I with two neurons whose branches projecting to the anterior region, as in CNMa∩GMR51H05, CNMa∩Pdfr, and CNMa∩GMR79A11 (Fig. 6E, 5G, 6H), and type II with four neurons branching on the posterior side with few projections to the anterior region, as in CNMa∩GMR91F02 (Fig. 6F). These two types of DN1ps’ subsets were also reported and profound discussed previously (Lamaze et al., 2018; Reinhard et al., 2022)”. (Line 393-397)
  
  Reviewer #3 (Recommendations For The Authors):
  
  Point 1-Throughout the manuscript figure legends (axis, genotypes, etc) are too small to be appreciated. Fig. 1. Panel A. The labels are very difficult to read.
  
  We have attempted to enlarge the font as much as possible in the revised version.
  
  Point 2-Fig. 1. H-J Why is efficiency not mentioned in all the examples?
  
  The results of Fig. 1H-1J are now discussed in the revised manuscript (Line 145-147). We did not calculate the exact efficiency because the GFP intensity is not stable enough: it may change with dissection, mounting, or laser intensity during our experimental process. Therefore, in all results related to GFP signal (Fig. 1B-1J, Fig. S1, Fig. S2, Fig. 2B-2D), we relied on qualitative judgment rather than quantitative judgment, unless the GFP signal was easily quantifiable (such as in cases with limited cells or no GFP signal in the experimental group).
  
  Point 3-Fig. 1. Panel L, left (light phase): the statistical comparisons are not clearly indicated (the same happens in Figs 3Q and 3R).
  
  We have now re-arranged Fig. 1L and Fig. 3Q-3R to make the statistical comparisons clear in the new version.
  
  Point 4-Line 792. Could induced be introduced?
  
  Yes, we have now corrected this typo.
  
  Point 5-Fig. S1. Check labels for consistency. GMR57C10 Gal4 driver is most likely R57C10.
  
  We have now revised the labels (Fig. S1).
  
  Point 6-Fig. S2. If the experiments were repeated and several brains were observed, the authors should include the efficiency and the number of flies as reported in Fig. S1.
  
  We have now added the number of flies in Fig. S2 as reported in Fig. S1. As mentioned in our response to Point 2, due to the instability of the GFP signal, we are unable to provide a quantitative efficiency in this context.
  
  Point 7-Fig S4. The fig legend describes panels I-J which are not shown in the current version of the manuscript.
  
  We have now deleted them.
  
  Point 8-Fig 2I. Surprising values for morning anticipation indexes even for controls (0.5 would indicate “no anticipation”; in controls, the expected values would be >>0.5, as most of the activity is concentrated right before the transition). Could the authors explain this unexpected result?
  
  We have revised the description of the calculation in the methods section (Line 612). After calculating the ratio of the last three hours of activity to the total six hours of activity, we subtracted 0.5 from the result. Therefore, the index should be ≤0.5. An index equal to 0 indicates no morning anticipation.
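  
  The calculation described above can be written out as a short sketch. It assumes that activity is supplied in equal-width time bins covering the six-hour window in chronological order; the function name and input format are illustrative and are not taken from the manuscript's analysis code.
  
```python
from typing import Sequence


def morning_anticipation_index(activity: Sequence[float]) -> float:
    """Ratio of activity in the last 3 h to activity in the full 6 h, minus 0.5.

    `activity` holds equal-width bins spanning the six-hour window in time
    order (an assumed input format), so the second half is the last 3 h.
    """
    n_bins = len(activity)
    if n_bins == 0 or n_bins % 2 != 0:
        raise ValueError("expected a non-empty, even number of equal-width bins")
    total = sum(activity)
    if total <= 0:
        raise ValueError("no activity recorded in the six-hour window")
    last_three_hours = sum(activity[n_bins // 2:])
    return last_three_hours / total - 0.5


# Evenly distributed activity gives an index of 0 (no anticipation).
print(morning_anticipation_index([1, 1, 1, 1, 1, 1]))  # 0.0
```
  
  With this definition the index ranges from -0.5 (all activity in the first three hours) to 0.5 (all activity in the last three hours), which matches the stated upper bound of 0.5.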
  
  Point 9-Fig 2K/L. The authors mention that not all genes are effectively knocked out with their strategy. Could this be accounted for by the specific KD strategy, its duration, or the promoter strength? It is surprising no explanation is provided in the text (page 9 line 179).
  
  In our pursuit of establishing a broadly effective method for gene editing, Fig. 2H-2L and Fig. 2D revealed that previous attempts have fallen short of achieving this objective. The observed inefficiency may be attributed to the strength of the promoter, resulting in inadequate expression. Alternatively, the insufficient duration of the operation may also contribute to the lack of success. However, in sleep and rhythm research applications, the age at which flies are tested is typically fixed, limiting the potential to enhance efficiency by extending the manipulation time. Moreover, increasing the expression level may pose challenges related to cytotoxicity, as reported in previous studies (Port et al., 2014). We refrain from offering specific explanations, as we lack a definitive plan and cannot provide additional robust evidence to support the above speculations. Consequently, in our ongoing efforts, we aim to enhance the efficiency of the tool system while operating within the current constraints.
  
  Point 10-Page 9, line 179. Can the authors include a brief description of the reason for the different modifications? Only one was referenced.
  
  We have revised the related part in the manuscript (Line 223-231):
  
  Cas9.M9: We fused a chromatin-modulating peptide (Ding et al., 2019), HMGN1 (High mobility group nucleosome binding domain 1), at the N-terminus of Cas9 and HMGB1 (High mobility group protein B1) at its C-terminus with a GGSGP linker, termed Cas9.M9.
  
  Cas9.M6: We also obtained a modified Cas9.M6 with HMGN1 at the N-terminus and an undefined peptide (UDP) at the C-terminus. (NOTE: UDP was obtained by accident)
  
  Cas9.M0: We replaced the STARD linker between Cas9 and NLS in Cas9.HC with the GGSGP linker (Zhao et al., 2016), termed Cas9.M0.
  
  Point 11-The authors tested the impact of KO nAChR2 across the different versions of conditional disruption (Fig 1K-L, Fig 2L, Fig 3R). It is surprising they observe a difference in daytime sleep upon knocking down with Cas9.HC (2L) but not with Cas9.M9 (3R) and the reverse is seen for night-time sleep. Could the authors provide an explanation? Efficiency is not the issue at stake, is it?
  
  In Fig. 2K, the day sleep of flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; UAS-Cas9/+) was significantly decreased compared to flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; +/+), but not when compared to flies (R57C10-GAL4/+; UAS-Cas9/+). Our criterion for asserting a difference is that the experimental group must show a significant distinction from both control groups. Therefore, we concluded that there was no significant difference between the experimental group and the control groups in Fig. 2K.
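  
  The decision criterion stated above (the experimental group must differ significantly from both control groups) can be expressed as a small check. The sketch below assumes the two pairwise p-values have already been computed by whatever test the analysis uses; the function name and the 0.05 threshold are assumptions for illustration only.
  
```python
def differs_from_both_controls(
    p_vs_gal4_control: float,
    p_vs_sgrna_control: float,
    alpha: float = 0.05,  # assumed significance threshold, not from the source
) -> bool:
    """Return True only if the experimental group differs from both controls."""
    for p_value in (p_vs_gal4_control, p_vs_sgrna_control):
        if not 0.0 <= p_value <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return p_vs_gal4_control < alpha and p_vs_sgrna_control < alpha


# A difference from only one control does not count as a phenotype.
print(differs_from_both_controls(0.20, 0.001))  # False
```
  
  Requiring both comparisons to pass guards against effects caused by the GAL4 driver or the transgene insertion alone.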
  
  Point 12-Fig. 4. Which of the two strategies described in A-B was employed to assemble the expression profile of CCT genes in clock neurons shown in C? This information should be part of the fig legend.
  
  We have now revised the legend as follows: “(A-B) Schematic of intersection strategies used in Clk856 labelled clock neurons dissection, Flp-out strategy (A) and split-LexA strategy (B). The exact strategy used for each gene is annotated in Table S5.”
  
  Point 13-Similarly, how many brains were analyzed to give rise to the table shown in C?
  
  We have now revised the legend of Table S4 to address this concern. As indicated in: “The largest N# for each gene in Table S4 is the brain number analyzed for each gene”.
  
  Point 14-Finally, the sentence “The figure is...” requires revision.
  
  We have now revised it: “The exact cell number for each subset is annotated in Table S4”.
  
  Point 15-Legend to Table S3. The authors have done an incredible job testing many gRNAs for each gene potentially relevant for communication. However, there is very little information to make the most out of it; for instance, the legend does not inform why many of the targeted genes do not appear to have been tested any further. It would be useful to the reader to discern whether despite being the 3 most efficient gRNAs, they were still not effective in targeting the gene of interest, or whether they showed off-targets, or it was simply a matter of testing the educated guesses. This information would be invaluable for the reader.
  
  First, we designed and generated transgenic UAS-sgRNA fly lines for all these sgRNAs. We randomly selected 14 receptor genes, known for their difficulty in editing based on our experience, to assess the efficiency of our strategy, as depicted in Fig. 3M-3P, Fig. S5, and Fig. S6. We believe these results are representative and indicative of the efficiency of sgRNAs designed using our process and applied with the modified Cas9.
  
  Secondly, we acknowledge your valid concern. While we selected sgRNAs with no predicted off-target effects through various prediction models (outlined in the Methods under C-cCCTomics sgRNA design), we did not conduct whole-genome sequencing. Consequently, we can only assert that the off-target possibility is relatively low. To address potential misleading effects arising from off-target concerns, it is essential to validate these results through mutants, RNAi, or alternative UAS-sgRNAs targeting the same gene.
  
  Point 16-Table S4. Some of the data presented derives from observations made in 1-2 brains for a specific cluster; isn't it too little to base a decision on whether a certain gene is (or not) expressed? It is surprising since the same CCT line was observed/analysed in more brains for other clusters. Can the authors explain the rationale?
  
  The N# number represents the GFP positive number, and we have revised the legend of Table S4. The largest N# number denotes the total number of brains analyzed for a specific CCT line. It's possible that, due to variations in our dissection or mounting process, some clusters were only observed in 1-2 brains out of the total brains analyzed. To enhance the accuracy of intersection analysis results, we marked all positive signal records when positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).
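
The marking rule described above (flag a cluster when it was GFP-positive in fewer than 1/3 of the brains analysed, with the largest N# taken as the total) can be sketched as follows. The function name and the example counts are hypothetical.

```python
def flag_rare_clusters(positive_counts, threshold=1 / 3):
    """Flag clusters seen in fewer than `threshold` of the analysed brains.

    positive_counts maps cluster name -> number of GFP-positive brains (N#).
    The total number of brains is taken as the largest N#, following the
    revised Table S4 legend.
    """
    total_brains = max(positive_counts.values())
    return {
        cluster: n / total_brains < threshold
        for cluster, n in positive_counts.items()
    }


# Hypothetical counts for one CCT line (9 brains analysed in total).
counts = {"s-LNv": 9, "LNd": 8, "DN1p": 2}
print(flag_rare_clusters(counts))
# {'s-LNv': False, 'LNd': False, 'DN1p': True}
```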
  
  Point 17-The paragraph describing this data in the results section needs revising (lines 233-243).
  
  We have now revised this. (Line 286-317)
  
  Point 18-While it is customary for authors to attempt to improve the description of the activity patterns by introducing new parameters (i.e. MAPI and EAPI, lines 253-258), it would be interesting to understand the difference between the proposed method and the one already in use, which compares the same parameter, i.e., the slope (defined as “the slope of the best-fitting linear regression line over a period of 6 h prior to the transition”; e.g., Lamaze et al. 2020 and many others). Is there a need to introduce yet another one?
  
  This approach is necessary. The slope defined by Lamaze et al. utilizes data from only 2 time points, which may not accurately capture the pattern within a period before light on or off. Linear regression is not well-suited for a single fly due to the high variability in activity at each time point, making it challenging to fit the model at the individual level. The parameters we have introduced (MAPI and EAPI) in this paper are concise and can be applied at the individual level, effectively reflecting the morning or evening anticipation characteristics of each fly.
  
  As an alternative, the activity plot of a certain fly line could be represented by an average of all flies' activity in one experiment. This would make linear regression easier to fit. However, several independent experiments are required for statistical robustness, necessitating the inclusion of hundreds of flies for each strain in a single analysis.
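
For reference, the regression-based slope quoted in Point 18 (the best-fitting line over the 6 h before the transition) can be sketched as an ordinary least-squares fit. This is not the MAPI/EAPI calculation, whose definition is not given in this response. The hourly binning and the activity values are hypothetical.

```python
def regression_slope(times_h, activity):
    """Ordinary least-squares slope of activity against time (units per hour)."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_a = sum(activity) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_h, activity))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx


# Hypothetical group-averaged activity in 1-h bins over the 6 h before lights-on.
hours_before_transition = [-6, -5, -4, -3, -2, -1]
group_mean_activity = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(regression_slope(hours_before_transition, group_mean_activity))  # 1.0
```

Applied to a single fly, the same fit would be driven by bin-to-bin variability, which is the concern raised in the response above.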
  
  Point 19-In general, the legends of supplementary figures are a bit too brief. S7 and S8: it is not clear which of the two intersectional strategies were used (it would benefit whoever is interested in replicating the experiments). Legend to Fig S8 should read “similar to Fig S7”.
  
  We have now revised the legend and included “The exact strategy used for each gene is annotated in Table S5” in the legend.
  
  Point 20-The legend in Table S6 should clearly state the genotypes examined. What does the marking in bold refer to?
  
  We have now revised the annotation of Table S6. Values marked in bold are results that fall outside one SD of the control group.
  
  Point 21-Line 314. The sentence needs revision.
  
  We have revised these sentences.
  
  Point 22-Line 391 (and also in the results section). The authors attempt to describe the CNMa phenotype as the opposite of pdf/pdfr mutant phenotypes. However, a lack of morning anticipation and advanced morning anticipation are not necessarily opposite phenotypes.
  
  We have revised the related descriptions.
  
  ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)
  
  DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)
  
  References
  
  Deng, B., Li, Q., Liu, X., Cao, Y., Li, B., Qian, Y., Xu, R., Mao, R., Zhou, E., Zhang, W., et al. (2019). Chemoconnectomics: mapping chemical transmission in Drosophila. Neuron 101, 876-893.e874.
  
  Ding, X., Seebeck, T., Feng, Y., Jiang, Y., Davis, G.D., and Chen, F. (2019). Improving CRISPR-Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J 2, 51-63.
  
  Duhart, J.M., Herrero, A., de la Cruz, G., Ispizua, J.I., Pírez, N., and Ceriani, M.F. (2020). Circadian Structural Plasticity Drives Remodeling of E Cell Output. Curr Biol 30, 5040-5048.e5045.
  
  Erion, R., King, A.N., Wu, G., Hogenesch, J.B., and Sehgal, A. (2016). Neural clocks and Neuropeptide F/Y regulate circadian gene expression in a peripheral metabolic tissue. eLife 5, e13552.
  
  Fujiwara, Y., Hermann-Luibl, C., Katsura, M., Sekiguchi, M., Ida, T., Helfrich-Förster, C., and Yoshii, T. (2018). The CCHamide1 neuropeptide expressed in the anterior dorsal neuron 1 conveys a circadian signal to the ventral lateral neurons in Drosophila melanogaster. Front Physiol 9, 1276.
  
  Goda, T., Tang, X., Umezaki, Y., Chu, M.L., Kunst, M., Nitabach, M.N.N., and Hamada, F.N. (2016). Drosophila DH31 neuropeptide and PDF receptor regulate night-onset temperature preference. J Neurosci 36, 11739-11754.
  
  Goda, T., Umezaki, Y., Alwattari, F., Seo, H.W., and Hamada, F.N. (2019). Neuropeptides PDF and DH31 hierarchically regulate free-running rhythmicity in Drosophila circadian locomotor activity. Sci Rep 9, 838.
  
  Guo, F., Cerullo, I., Chen, X., and Rosbash, M. (2014). PDF neuron firing phase-shifts key circadian activity neurons in Drosophila. eLife 3.
  
  He, C., Cong, X., Zhang, R., Wu, D., An, C., and Zhao, Z. (2013). Regulation of circadian locomotor rhythm by neuropeptide Y-like system in Drosophila melanogaster. Insect Mol Biol 22, 376-388.
  
  Hermann, C., Yoshii, T., Dusik, V., and Helfrich-Förster, C. (2012). Neuropeptide F immunoreactive clock neurons modify evening locomotor activity and free-running period in Drosophila melanogaster. J Comp Neurol 520, 970-987.
  
  Hyun, S., Lee, Y., Hong, S.T., Bang, S., Paik, D., Kang, J., Shin, J., Lee, J., Jeon, K., Hwang, S., et al. (2005). Drosophila GPCR Han is a receptor for the circadian clock neuropeptide PDF. Neuron 48, 267-278.
  
  Johard, H.A., Yoshii, T., Dircksen, H., Cusumano, P., Rouyer, F., Helfrich-Förster, C., and Nässel, D.R. (2009). Peptidergic clock neurons in Drosophila: ion transport peptide and short neuropeptide F in subsets of dorsal and ventral lateral neurons. J Comp Neurol 516, 59-73.
  
  Lamaze, A., Krätschmer, P., Chen, K.F., Lowe, S., and Jepson, J.E.C. (2018). A Wake-Promoting Circadian Output Circuit in Drosophila. Curr Biol 28, 3098-3105.e3093.
  
  Lear, B.C., Zhang, L., and Allada, R. (2009). The neuropeptide PDF acts directly on evening pacemaker neurons to regulate multiple features of circadian behavior. PLoS Biol 7, e1000154.
  
  Lee, G., Bahn, J.H., and Park, J.H. (2006). Sex- and clock-controlled expression of the neuropeptide F gene in Drosophila. 103, 12580-12585.
  
  Lelito, K.R., and Shafer, O.T. (2012). Reciprocal cholinergic and GABAergic modulation of the small ventrolateral pacemaker neurons of Drosophila's circadian clock neuron network. J Neurophysiol 107, 2096-2108.
  
  Ma, D., Przybylski, D., Abruzzi, K.C., Schlichting, M., Li, Q., Long, X., and Rosbash, M. (2021). A transcriptomic taxonomy of Drosophila circadian neurons around the clock. eLife 10.
  
  Port, F., Chen, H.M., Lee, T., and Bullock, S.L. (2014). Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc Natl Acad Sci USA 111, E2967-2976.
  
  Reinhard, N., Schubert, F.K., Bertolini, E., Hagedorn, N., Manoli, G., Sekiguchi, M., Yoshii, T., Rieger, D., and Helfrich-Förster, C. (2022). The Neuronal Circuit of the Dorsal Circadian Clock Neurons in Drosophila melanogaster. Front Physiol 13, 886432.
  
  Renn, S.C., Park, J.H., Rosbash, M., Hall, J.C., and Taghert, P.H. (1999). A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell 99, 791-802.
  
  Shafer, O.T., Helfrich-Förster, C., Renn, S.C., and Taghert, P.H. (2006). Reevaluation of Drosophila melanogaster's neuronal circadian pacemakers reveals new neuronal classes. J Comp Neurol 498, 180-193.
  
  Shafer, O.T., Kim, D.J., Dunbar-Yaffe, R., Nikolaev, V.O., Lohse, M.J., and Taghert, P.H. (2008). Widespread receptivity to neuropeptide PDF throughout the neuronal circadian clock network of Drosophila revealed by real-time cyclic AMP imaging. Neuron 58, 223-237.
  
  Zhang, L., Chung, B.Y., Lear, B.C., Kilman, V.L., Liu, Y., Mahesh, G., Meissner, R.A., Hardin, P.E., and Allada, R. (2010). DN1(p) circadian neurons coordinate acute light and PDF inputs to produce robust daily behavior in Drosophila. Curr Biol 20, 591-599.
  
  Zhao, P., Zhang, Z., Lv, X., Zhao, X., Suehiro, Y., Jiang, Y., Wang, X., Mitani, S., Gong, H., and Xue, D. (2016). One-step homozygosity in precise gene editing by an improved CRISPR/Cas9 system. Cell Res 26, 633-636.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.26.559642v2
www.biorxiv.org www.biorxiv.org

New submission 15/08/2023, 08:20:42

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  This paper describes the development and initial validation of an approach-avoidance task and its relationship to anxiety. The task is a two-armed bandit where one choice is 'safer' - has no probability of punishment, delivered as an aversive sound, but also lower probability of reward - and the other choice involves a reward-punishment conflict. The authors fit a computational model of reinforcement learning to this task and found that self-reported state anxiety during the task was related to a greater likelihood of choosing the safe stimulus when the other (conflict) stimulus had a higher likelihood of punishment. Computationally, this was represented by a smaller value for the ratio of reward to punishment sensitivity in people with higher task-induced anxiety. They replicated this finding, but not another finding that this behavior was related to a measure of psychopathology (experiential avoidance), in a second sample. They also tested test-retest reliability in a sub-sample tested twice, one week apart and found that some aspects of task behavior had acceptable levels of reliability. The introduction makes a strong appeal to back-translation and computational validity, but many aspects of the rationale for this task need to be strengthened or better explained. The task design is clever and most methods are solid - it is encouraging to see attempts to validate tasks as they are developed. There are a few methodological questions and interpretation issues, but they do not affect the overall findings. The lack of replicated effects with psychopathology may mean that this task is better suited to assess state anxiety, or to serve as a foundation for additional task development.
  
  We thank the reviewer for their kind comments and constructive feedback. We agree that the approach taken in this paper appears better suited to state anxiety, and further work is needed to assess/improve its clinical relevance.
  
  Reviewer #1 (Recommendations For The Authors):
  
  1) For the introduction, the authors communicate well the appeal of tasks with translational potential, and setting up this translation through computational validity is a strong approach. However, I had some concerns about how the task was motivated in the introduction:
  
  a) The authors state that current approach-avoidance tasks used in humans do not resemble those used in the non-human literature, but do not provide details on what exactly is missing from these tasks that makes translation difficult.
  
  Our intention for the section that the reviewer refers to was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we note that the phrasing was perhaps unfair to recent tasks that were explicitly designed to be translatable across species. Therefore, we have amended the text to the following:
  
  In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases, for example by using joysticks to approach/move towards positive stimuli and avoid/move away from negative stimuli, which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).
  
  b) Although back-translation to 'match' human paradigms to non-human paradigms is useful for research, this isn't the end goal of task development. What really matters is how well these tasks, whether in humans or not, capture psychopathology-relevant behavior. Many animal paradigms were developed and brought into extensive use because they showed sensitivity to pharmacological compounds (e.g., benzodiazepines). The introduction accepts the validity of these paradigms at face value, and doesn't address whether developing human tests of psychopathology based on sensitivity to existing medication classes is the best way to generate new insights about psychopathology.
  
  We agree that whilst paradigms with translational and computational validity have merits of their own for neuroscientific theory, clinical validity (i.e. how well the paradigm reflects a phenomenon relevant to psychopathology) is key in the context of clinical applications. While our findings of associations between task performance and self-reported (state) anxiety suggest that our approach is a step in the right direction, the lack of associations with clinical measures was disappointing. Although future work is needed to more directly test the sensitivity of the current approach to psychopathology, this may mean that it, and its non-human counterparts, do not measure behaviours relevant to pathological anxiety. Since our primary focus in this paper was on translational and computational validity, we have opted to discuss the reviewer’s suggestion in the ‘Discussion’ section, as follows:
  
  Further, it is worth noting that many animal paradigms were developed and widely adopted due to their sensitivity to anxiolytic medication (Cryan & Holmes, 2005). Given the lack of associations with clinical measures in our results, it is possible that current translational models of anxiety may not fully capture behaviours that are directly relevant to pathological anxiety. To develop translational paradigms of clinical utility, future research should place a stronger emphasis on assessing their clinical validity in humans.
  
  c) The authors may want to bring in the literature on the description-experience gap (e.g., PMID: 19836292) when discussing existing decision tasks and their computational dissimilarity to non-human operant conditioning tasks.
  
  We thank the reviewer for this useful addition to the introduction. We have now added the following to the 'Introduction’ section:
  
  Moreover, evidence from economic decision-making suggests that explicit offers of probabilistic outcomes can impact decision-making differently compared to when probabilistic contingencies need to be learned from experience (referred to as the ‘description-experience gap’; Hertwig & Erev, 2009); this finding raises potential concerns regarding the use of offer-based tasks in humans as approximations of non-human tasks that do not involve explicit offers.
  
  d) How does one evaluate how computationally similar human vs. non-human tasks are? What are the criteria for making this judgement? Specific to the current tasks, many animal learning tasks are not learning tasks in the same sense that human learning tasks are, in terms of the number of trials used and if the animals are choosing from a learned set of contingencies versus learning the contingencies during the testing.
  
  The computational similarity of human and non-human strategies in a given translational task can be tested empirically. This can be done by fitting models to the data and assessing whether similar models explain choices, even if parameter distributions might vary across species due to, for example, physiological differences. Indeed, non-human animals require much more training to perform even uni-dimensional reinforcement learning, but once they are trained, it should be possible to model their responses. In fact, it should even be possible to take training data into account in some cases. For example, the training phase of the Vogel/Geller-Seifter preclinical tests requires an animal to learn to emit a certain action (e.g. lever press) simply to obtain some reward. In the next phase, an aversive outcome is introduced as an additional outcome, but one could model both the training and test phase together – the winning model in our studies would be a suitable candidate to model behaviour here. As we also discuss predictive validity in the ‘Discussion’ section, we opted to add the following text there too:
  
  … computational validity would also need to be assessed directly in non-human animals by fitting models to their behavioural data. This should be possible even in the face of different procedures across species such as number of trials or outcomes used (shock or aversive sound). We are encouraged by our finding that the winning computational model in our study relies on a relatively simple classical reinforcement learning strategy. There exist many studies showing that non-human animals rely on similar strategies during reward and punishment learning (Mobbs et al., 2020; Schultz, 2013); albeit to our knowledge this has never been modelled in non-human animals where rewards and punishment can occur simultaneously.
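
The “relatively simple classical reinforcement learning strategy” is not specified in this response. As a rough illustration only, the sketch below assumes a delta-rule learner that tracks reward and punishment estimates separately, each with its own learning rate and sensitivity, and chooses between the conflict and safe options with a logistic rule. All names, the functional form, and the numbers are assumptions. This is not the authors' winning model.

```python
import math


def delta_update(estimate, outcome, learning_rate):
    """Delta-rule update: move the estimate towards the observed outcome."""
    return estimate + learning_rate * (outcome - estimate)


def p_choose_conflict(reward_est, punish_est, reward_sens, punish_sens):
    """Probability of choosing the conflict option over the safe option.

    reward_est / punish_est: dicts with keys "conflict" and "safe" holding the
    learned outcome estimates. Sensitivities scale how strongly each outcome
    type drives choice (assumed form).
    """
    def value(option):
        return reward_sens * reward_est[option] - punish_sens * punish_est[option]

    return 1.0 / (1.0 + math.exp(-(value("conflict") - value("safe"))))


# Hypothetical learned estimates: the safe option is never punished.
reward_est = {"conflict": 0.8, "safe": 0.5}
punish_est = {"conflict": 0.4, "safe": 0.0}

balanced = p_choose_conflict(reward_est, punish_est, reward_sens=3.0, punish_sens=3.0)
punish_heavy = p_choose_conflict(reward_est, punish_est, reward_sens=3.0, punish_sens=6.0)
print(balanced, punish_heavy)  # the second probability is lower
```

In this assumed form, lowering the ratio of reward to punishment sensitivity reduces the probability of choosing the conflict option, which is the direction of the anxiety effect reported in the paper.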
  
  2) What do the authors make of the non-linear relationship between probability of punishment and probability of choosing the conflict stimulus (Fig 2d), especially in the high task-induced anxiety participants? Did this effect show up in the replication sample as well?
  
  Figures 2c-e were created by binning the continuous predictors of outcome probabilities into discrete bins of equal interval. Since punishment probability varied according to Gaussian random walks, it was also distributed with more of its mass in the central region (~ 0.4), and so values at the extreme bins were estimated on fewer data and with greater variance. The non-linear relationships are thus likely an artefact of our task design and plotting procedure. The pattern was also evident in the replication sample, see Author response image 1:
  
  Author response image 1.
  
  However, since these effects were estimated as linear effects in the logistic regression models, and to avoid overfitting/interpretations of noise arising from our task design, we now plot logistic curves fitted to the raw data instead.
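
The binning argument above can be illustrated with a small simulation. The sketch assumes a clipped Gaussian random walk that starts at 0.4. The step size, number of trials, number of walks, and number of bins are all assumed values chosen for illustration and are not the task's actual settings.

```python
import random


def simulate_bin_counts(n_walks=500, n_trials=200, start=0.4, step_sd=0.02,
                        n_bins=10, seed=0):
    """Count how many trials fall into each equal-width probability bin."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_walks):
        p = start
        for _ in range(n_trials):
            # Gaussian step, clipped so that p remains a valid probability.
            p = min(1.0, max(0.0, p + rng.gauss(0.0, step_sd)))
            # The last bin also includes p == 1.0.
            counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


counts = simulate_bin_counts()
print(counts)  # bins near the starting value hold far more trials than the extreme bins
```

Because the extreme bins hold few trials, any proportion computed within them is estimated with more variance, which is consistent with the explanation given for Figures 2c-e.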
  
  3) How correlated were learning rate and sensitivity parameters? The EM algorithm used here can sometimes result in high correlations among these sets of parameters.
  
  As the reviewer suspects, the parameters were strongly correlated, especially across the punishment-specific parameters. The Pearson’s r estimates for the untransformed parameter values were as follows:
  
  Reward parameters: discovery sample r = -0.39; replication sample r = -0.78
  
  Punishment parameters: discovery sample r = -0.91; replication sample r = -0.85
  
  We have included the correlation matrices of the estimated parameters as Supplementary Figure 2 in the ‘Computational modelling’ section of the Supplement.
  
  We have now also re-fitted the winning model using variational Bayesian inference (VBI) via Stan, and found that the cross-parameter correlations were much lower than when the data were fitted using EM. We also ran a sensitivity analysis assessing whether using VBI changed the main findings of our studies. This showed that the correlation between task-induced anxiety and the reward-punishment sensitivity index was robust to fitting method, as was the mediating effect of reward-punishment sensitivity index on anxiety’s effect on choice. This indicates that overall our key findings are robust to different methods of parameter-fitting.
  
  We now direct readers to these analyses from the new ‘Sensitivity analyses’ section in the manuscript, as follows:
  
  As our procedure for estimating model parameters (the expectation-maximisation algorithm, see ‘Methods’) produced high inter-parameter correlations in our data (Supplementary Figure 2), we also re-estimated the parameters using Stan’s variational Bayesian inference algorithm (Stan Development Team, 2023) – this resulted in lower inter-parameter correlations, but our primary computational finding, that the effect of anxiety on choice is mediated by relative sensitivity to reward/punishment was consistent across algorithms (see Supplement section 9.8 for details).
  
  We have included the relevant analyses comparing EM and VBI in the Supplement, as follows:
  
  [9.8 Sensitivity analysis: estimating parameters via expectation maximisation and variational Bayesian inference algorithms]
  
  Given that the expectation maximisation (EM) algorithm produced high inter-parameter correlations, we ran a sensitivity analysis by assessing the robustness of our computational findings to an alternative method of parameter estimation – (mean-field) variational Bayesian inference (VBI) via Stan (Stan Development Team, 2023). Since, unlike EM, the results of VBI are very sensitive to initial values, we fitted the data 10 times with different initial values.
  
  Inter-parameter correlations
  
  The VBI produced lower inter-parameter correlations than the EM algorithm (Supplementary Figure 8).
  
  Sensitivity analysis
  
  Since multicollinearity in the VBI-estimated parameters was lower than for EM, indicating less trade-off in the estimation, we re-tested our computational findings from the manuscript as part of a sensitivity analysis. We first assessed whether we observed the same correlations between task-induced anxiety and punishment learning, and reward-punishment sensitivity index (Supplementary Figure 9a). Punishment learning rate was not significantly associated with task-induced anxiety in any of the 10 VBI iterations in the discovery sample, although it was in 9/10 in the replication sample. On the other hand, the reward-punishment sensitivity index was significantly associated with task-induced anxiety in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. This suggests that the correlation of anxiety and sensitivity index is robust to these two fitting approaches.
  
  We also re-estimated the mediation models, where in the EM-estimated parameters, we found that the reward-punishment sensitivity index mediated the relationship between task-induced anxiety and task choice proportions (Supplementary Figure 9b). Again, we found that the reward-punishment sensitivity index was a significant mediator in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. Punishment learning rate was also a significant mediator in 9/10 iterations in the replication sample, although it was not in the discovery sample for all iterations, and this was not observed for the EM-estimated parameters.
  
  Overall, we found that our key results, that anxiety is associated with greater sensitivity to punishment over reward, and this mediates the relationship between anxiety and approach-avoidance behaviour, were robust across both fitting methods.
  
  As an aside, we were unable to run the model fitting using Markov chain Monte Carlo sampling approaches due to the computational power and time required for a sample of this size (Pike & Robinson, 2022, JAMA Psychiatry).
  
  4) What is the split-half reliability of the task parameters?
  
  We thank the reviewer for this query. We have now included a brief section on the (good-to-excellent) split-half reliability of the task in the manuscript:
  
  We assessed the split-half reliability of the task by correlating the overall proportion of conflict option choices and model parameters from the winning model across the first and second half of trials. For overall choice proportion, reliability was simply calculated via Pearson’s correlations. For the model parameters, we calculated model-derived estimates of Pearson’s r values from the parameter covariance matrix when first- and second-half parameters were estimated within a single model, following a previous approach recently shown to accurately estimate parameter reliability (Waltmann et al., 2022). We interpreted indices of reliability based on conventional values of < 0.40 as poor, 0.4 - 0.6 as fair, 0.6 - 0.75 as good, and > 0.75 as excellent reliability (Fleiss, 1986). Overall choice proportion showed good reliability (discovery sample r = 0.63; replication sample r = 0.63; Supplementary Figure 5). The model parameters showed good-to-excellent reliability (model-derived r values ranging from 0.61 to 0.85 [0.76 to 0.92 after Spearman-Brown correction]; Supplementary Figure 5).
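
The Spearman-Brown correction and the reliability bands quoted above can be checked with a few lines. The correction for a half-length test is the standard formula r_SB = 2r / (1 + r). The function names are illustrative, and the handling of values that fall exactly on a band boundary is an assumption.

```python
def spearman_brown(r_half):
    """Spearman-Brown correction of a split-half correlation to full length."""
    return 2 * r_half / (1 + r_half)


def reliability_label(r):
    """Label a reliability index using the bands quoted from Fleiss (1986)."""
    if r < 0.40:
        return "poor"
    if r < 0.60:
        return "fair"
    if r <= 0.75:
        return "good"
    return "excellent"


# Range of model-derived split-half r values reported above.
for r in (0.61, 0.85):
    corrected = spearman_brown(r)
    print(r, round(corrected, 2), reliability_label(r))
# 0.61 0.76 good
# 0.85 0.92 excellent
```

The corrected values reproduce the reported range of 0.76 to 0.92.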
  
  5) The authors do a good job of avoiding causal language when setting up the cross-sectional mediation analysis, but depart from this in the discussion (line 335). Without longitudinal data, they cannot claim that "mediation analyses revealed a mechanism of how anxiety induces avoidance".
  
  Thank you for spotting this, we have now amended the text to:
  
  … mediation analyses suggested a potential mechanism of how anxiety may induce avoidance.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors develop a computational approach-avoidance-conflict (AAC) task, designed to overcome limitations of existing offer-based AAC tasks. The task incorporated likelihoods of receiving rewards/punishments that would be learned by the participants to ensure computational validity and estimated model parameters related to reward/punishment and task-induced anxiety. Two independent samples of online participants were tested. In both samples, participants who experienced greater task-induced anxiety avoided choices associated with greater probability of punishment. Computational modelling revealed that this effect was explained by greater individual sensitivities to punishment relative to rewards.
  
  Strengths:
  
  Large internet-based samples, with discovery sample (n = 369), pre-registered replication sample (n = 629) and test-retest sub group (n = 57). Extensive compliance measures (e.g. audio checks) seek to improve adherence.
  
  There is a great need for RL tasks that model threatening outcomes rather than simply loss of reward. The main model parameters show strong effects and the additional indices with task based anxiety are a useful extension. Associations were broadly replicated across samples. Fair to excellent reliability of model parameters is encouraging and badly needed for behavioral tasks of threat sensitivity.
  
  We thank the reviewer for their comments and constructive feedback.
  
  The task seems to have lower approach bias than some other AAC tasks in the literature. Although this was inferred by looking at Fig 2 (it doesn't seem to drop below 46%) and Fig 3d seems to show quite a strong approach bias when using a reward/punishment sensitivity index. It would be good to confirm some overall stats on % of trials approached/avoided overall.
  
  The range of choice proportions is indeed an interesting statistic that we have now included in the manuscript:
  
  Across individuals, there was considerable variability in overall choice proportions (discovery sample: mean = 0.52, SD = 0.14, min/max = [0.03, 0.96]; replication sample: mean = 0.52, SD = 0.14, min/max = [0.01, 0.99]).
  
  Weaknesses:
  
  The negative reliability of punishment learning rate is concerning as this is an important outcome.
  
  We agree that this is a concerning finding. As reviewer 3 notes, this may have been due to participants having control over the volume used to play the aversive sounds in the task (see below for our response to this point). Future work with better controlled experimental settings will be needed to determine the reliability of this parameter more accurately.
  
  This may also have been due to the asymmetric nature of the task, as only one option could produce the punishment. This means that there were fewer trials on which to estimate learning about the occurrence of a punishment. Future work using continuous outcomes, as the reviewer suggests below, whilst keeping the asymmetric relationship between the options, could help in this regard.
  
  We have included the following comment on this issue in the manuscript:
  
  Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed punishment sensitivity). Further, the asymmetric nature of the task may have impacted our ability to estimate the punishment learning rate, as there were fewer occurrences of the punishment compared to the reward.
  
  The Kendall's tau values underlying task induced anxiety and safety reference/ various indices are very weak (all < 0.1), as are the mediation effects (all beta < 0.01). This should be highlighted as a limitation, although the interaction with P(punishment|conflict) does explain some of this.
  
  We now include references to the effect sizes to emphasise this limitation. We also note, as the reviewer suggests, that this may be due to the crudeness of overall choice proportion as a measure of approach/avoidance, as it is contaminated with variables such as P(punishment|conflict).
  
  One potentially important limitation of our findings is the small effect size observed in the correlation between task-induced anxiety and avoidance (Kendall's tau values < 0.1, mediation betas < 0.01). This may be attributed to the simplicity of using overall choice proportion as a measure of approach/avoidance, as the effect of anxiety on choice was also influenced by punishment probability.
  
  The inclusion of only one level of reward (and punishment) limits the ecological validity of the sensitivity indices.
  
  We agree that using multi-level outcomes will be an important question for future work and now explicitly note this in the manuscript, as below:
  
  Using multi-level or continuous outcomes would also improve the ecological validity of the present approach and interpretation of the sensitivity parameters.
  
  Appraisal and impact:
  
  Overall this is a very strong paper, describing a novel task that could help move the field of RL forward to take account of threat processing more fully. The large sample size with discovery, replication and test-retest gives confidence in the findings. The task has good ecological validity and associations with task-based anxiety and clinical self-report demonstrate clinical relevance. The authors could give further context but test-retest of the punishment learning parameter is the only real concern. Overall this task provides an exciting new probe of reward/threat that could be used in mechanistic disease models.
  
  We thank the reviewer again for helping us to improve our analyses and manuscript.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Additional context:
  
  In the introduction "cognitive tasks that bear little semblance to those used in the non-human literature" seems a little unfair. One study that is already cited (Ironside et al, 2020) used a task that was adapted from non-human primates for use in humans. It has almost identical visual stimuli (different levels of simultaneous reward and aversive outcome/punishment) and response selection processes (joystick) between species and some overlapping brain regions were activated across species for conflict and aversiveness. The later point that non-human animals must be trained on the association between action and outcome is well taken from the point of view of computational validity but perhaps not sufficient to justify the previous statement.
  
  Our intention for this section was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we agree that this phrasing is unfair to recent studies such as those by Ironside and colleagues. Therefore, we have amended the text to the following:
  
  In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases to approach/move towards positive stimuli and avoid/move away from negative stimuli which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).
  
  It would be good to speculate on why task induced anxiety made participants slower to update their estimates of punishment probability.
  
  Although a meta-analysis of reinforcement learning studies using reward and punishment outcomes suggests a positive association between punishment learning rate and anxiety symptoms (and depressed mood), we paradoxically found the opposite effect. However, previous work has suggested that distinct forms of anxiety associate differently with punishment learning rate (Wise & Dolan, 2020, Nat. Commun.): somatic anxiety was negatively correlated with punishment learning rate, whereas cognitive anxiety showed the opposite effect. We have now added the following to the manuscript, and noted that future work is needed to understand the potentially complex relationship between anxiety and learning from punishments:
  
  Notably, although a recent computational meta-analysis of reinforcement learning studies showed that symptoms of anxiety and depression are associated with elevated punishment learning rates (Pike & Robinson, 2022), we did not observe this pattern in our data. Indeed, we even found the contrary effect in relation to task-induced anxiety, specifically that anxiety was associated with lower rates of learning from punishment. However, other work has suggested that the direction of this effect can depend on the form of anxiety, where cognitive anxiety may be associated with elevated learning rates, but somatic anxiety may show the opposite pattern (Wise & Dolan, 2020) and this may explain the discrepancy in findings. Additionally, parameter values are highly dependent on task design (Eckstein et al., 2022), and study designs to date may be more optimised in detecting differences in learning rate (Pike & Robinson, 2022) – future work is needed to better understand the potentially complex association between anxiety and punishment learning rate. Lastly, as punishment learning rate was severely unreliable in the test-retest analyses, and the associations between punishment learning rate and state anxiety were not robust to an alternative method of parameter estimation (variational Bayesian inference), the negative correlation observed in our study should be treated with caution.
  
  Were those with more task-based anxiety more inflexible in general?
  
  The lack of associations across reward learning rate and task-induced anxiety suggest that this was not a general inflexibility effect. To test the reviewer’s hypothesis more directly, we conducted a sensitivity analysis by examining the model with a general learning rate – this did not support a general inflexibility effect. Please see the new section in the Supplement below:
  
  [9.10 Sensitivity analysis: anxiety and inflexibility]
  
  As anxious participants were slower to update their estimates of punishment probability, we determined whether this was due to greater general inflexibility by examining the model including two sensitivity parameters, but one general learning rate (i.e. not split by outcome). The correlation between this general learning rate and task-induced anxiety was not significant in either sample (discovery: tau = -0.02, p = 0.504; replication: tau = -0.01, p = 0.625), suggesting that the effect is specific to punishment.
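  
  The distinction between outcome-specific learning rates and a single general learning rate can be made concrete with a short sketch. This is an illustration only, not the fitted model: the delta-rule update, the logistic choice rule, and every function and parameter name below are assumptions introduced here.

```python
import math


def update_estimate(p_hat, outcome, learning_rate):
    """Delta-rule update of a probability estimate towards a 0/1 outcome (assumed form)."""
    return p_hat + learning_rate * (outcome - p_hat)


def conflict_choice_probability(p_reward_conflict, p_punish_conflict, p_reward_safe,
                                reward_sens, punish_sens):
    """Probability of choosing the conflict option under an assumed logistic choice rule."""
    v_conflict = reward_sens * p_reward_conflict - punish_sens * p_punish_conflict
    v_safe = reward_sens * p_reward_safe
    return 1.0 / (1.0 + math.exp(-(v_conflict - v_safe)))


def track_conflict_option(outcomes, alpha_reward, alpha_punish, start=0.5):
    """Track reward and punishment estimates for the conflict option.

    outcomes: list of (reward, punishment) pairs, each coded 0 or 1.
    """
    p_reward, p_punish = start, start
    for reward, punishment in outcomes:
        p_reward = update_estimate(p_reward, reward, alpha_reward)
        p_punish = update_estimate(p_punish, punishment, alpha_punish)
    return p_reward, p_punish
```

  Passing the same value for `alpha_reward` and `alpha_punish` gives the general learning rate variant examined in this sensitivity analysis, while keeping `reward_sens` and `punish_sens` separate.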
  
  Was the 16% versus 20% of the two samples with clinically relevant anxiety symptoms significantly different? What about other demographics in the two samples?
  
  The proportions were not significantly different (χ2 = 2.33, p = 0.127). The discovery sample included more females and was older on average compared to the replication sample – information which we now report in the manuscript:
  
  The discovery sample consisted of a significantly greater proportion of female participants than the replication sample (59% vs 52%, χ2 = 4.64, p = 0.031). The average age was significantly different across samples (discovery sample mean = 37.7, SD = 10.3, replication sample mean = 34.3, SD = 10.4; t(785.5) = 5.06, p < 0.001). The differences in self-reported psychiatric symptoms across samples did not reach significance (p > 0.086).
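  
  The fractional degrees of freedom in the age comparison are characteristic of Welch's unequal-variance t-test. As a rough check, the statistic can be recomputed from the summary values. The means and SDs below are those reported; the sample sizes (423 - 51 = 372 and 725 - 98 = 627) are an assumption based on the exclusion counts and are not stated for this test, and the function name is hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Sample sizes are assumed (participants remaining after exclusions).
t_stat, df = welch_t_from_summary(37.7, 10.3, 372, 34.3, 10.4, 627)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

  Because the inputs are rounded summary statistics, the result should be close to, but need not exactly match, the reported values.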
  
  It would be interesting to know how many participants failed the audio attention checks.
  
  We have now included information about what proportion of participants failed each of the task exclusion criteria in the manuscript:
  
  Firstly, we excluded participants who missed a response to more than one auditory attention check (see above; 8% in both discovery and replication samples) – as these occurred infrequently and the stimuli used for the checks were played at relatively low volume, we allowed for incorrect responses so long as a response was made. Secondly, we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4%/6% in discovery and replication samples, respectively). Lastly, we excluded those who did not respond on 20 or more trials (1%/2% in discovery and replication samples, respectively). Overall, we excluded 51 out of 423 (12%) in the discovery sample, and 98 out of 725 (14%) in the replication sample.
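  
  The three criteria can be expressed as a simple per-participant check. The sketch below is a hedged illustration rather than the analysis code: the data layout (a list of response keys with `None` for a missing response) and all names are assumptions.

```python
def longest_run(keys):
    """Length of the longest run of identical consecutive key presses (None breaks a run)."""
    longest = current = 0
    previous = None
    for key in keys:
        if key is not None and key == previous:
            current += 1
        else:
            current = 1 if key is not None else 0
        previous = key
        longest = max(longest, current)
    return longest


def should_exclude(keys, missed_attention_checks,
                   max_missed_checks=1, run_limit=20, missing_limit=20):
    """Apply the three exclusion criteria described in the text.

    keys: per-trial response key (e.g. 'left' or 'right'), or None if no response.
    missed_attention_checks: number of attention checks with no response at all
        (incorrect responses are not counted as misses).
    """
    too_many_missed_checks = missed_attention_checks > max_missed_checks
    same_key_run = longest_run(keys) >= run_limit
    too_many_missing = sum(key is None for key in keys) >= missing_limit
    return too_many_missed_checks or same_key_run or too_many_missing
```

  The run criterion is defined on response keys, not on the option chosen, so it flags repetitive key pressing rather than a consistent preference for one option.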
  
  There doesn't appear to be a model with only learning from punishment (i.e. no reward learning) included in the model comparison. It would be interesting to see how it compared.
  
  We have fitted the suggested model and found that it is the least parsimonious of the models. Since participants were monetarily incentivised based on the rewards only, this was to be expected. We have now added this ‘punishment learning only’ model and its variant including a lapse term into the model comparison. The two lowest bars on the y-axis in Author response image 2 represent these models.
  
  Author response image 2.
  
  Were sex effects examined, as these have been commonly found in AAC tasks? How about other covariates such as age?
  
  We have now tested the effects of sex and age on behaviour and on parameter values. There were indeed some significant effects, albeit with some inconsistencies across the two samples, which for completeness we have included in the manuscript, as follows:
  
  While sex was significantly associated with choice in the discovery sample (β = 0.16 ± 0.07, p = 0.028) with males being more likely to choose the conflict option, this pattern was not evident in the replication sample (β = 0.08 ± 0.06, p = 0.173), and age was not associated with choice in either sample (p > 0.2).
  
  Comparing parameters across sexes via Welch’s t-tests revealed significant differences in reward sensitivity (t(289) = -2.87, p = 0.004, d = 0.34; lower in females) and consequently reward-punishment sensitivity index (t(336) = -2.03, p = 0.043, d = 0.22; lower in females i.e. more avoidance-driven). In the replication sample, we observed the same effect on reward-punishment sensitivity index (t(626) = -2.79, p = 0.005, d = 0.22; lower in females). However, the sex difference in reward sensitivity did not replicate (p = 0.441), although we did observe a significant sex difference in punishment sensitivity in the replication sample (t(626) = 2.26, p = 0.024, d = 0.18).
  
  Minor: Still a few placeholders (Supplementary Table X/ Table X) in the methods
  
  We thank the reviewer for spotting these errors. We have now corrected these references.
  
  Reviewer #3 (Public Review):
  
  This study investigated cognitive mechanisms underlying approach-avoidance behavior using a novel reinforcement learning task and computational modelling. Participants could select a risky "conflict" option (latent, fluctuating probabilities of monetary reward and/or unpleasant sound [punishment]) or a safe option (separate, generally lower probability of reward). Overall, participant choices were skewed towards more rewarded options, but were also repelled by increasing probability of punishment. Individual patterns of behavior were well-captured by a reinforcement learning model that included parameters for reward and punishment sensitivity, and learning rates for reward and punishment. This is a nice replication of existing findings suggesting reward and punishment have opposing effects on behavior through dissociated sensitivity to reward versus punishment.
  
  Interestingly, avoidance of the conflict option was predicted by self-reported task-induced anxiety. This effect of anxiety was mediated by the difference in modelled sensitivity to reward versus punishment (relative sensitivity). Importantly, when a subset of participants were retested over 1 week later, most behavioral tendencies and model parameters were recapitulated, suggesting the task may capture stable traits relevant to approach-avoidance decision-making.
  
  We thank the reviewer for their useful analysis of our study. Indeed, it was reassuring to see that performance indices were reliable across time.
  
  However, interpretation of these findings is severely undermined by the fact that the aversiveness of the auditory punisher was largely determined by participants, with the far-reaching impacts of this not being accounted for in any of the analyses. The manipulation check to confirm participants did not mute their sound is highly commendable, but the thresholding of punisher volume to "loud but comfortable" at the outset of the task leaves substantial scope for variability in the punisher delivered to participants. Indeed, participants' ratings of the unpleasantness of the punishment were moderate and highly variable (M = 31.7 out of 50, SD = 12.8 [distribution unreported]). Despite having this rating, it is not incorporated into analyses. It is possible that the key finding of relationships between task-induced anxiety, reward-punishment sensitivity and avoidance are driven by differences in the punisher experienced; a louder punisher is more unpleasant, driving greater task-induced anxiety, model-derived punishment sensitivity, and avoidance (and vice versa). This issue can also explain the counterintuitive findings from re-tested participants; lower/negatively correlated task-induced anxiety and punishment-related cognitive parameters may have been due to participants adjusting their sound settings to make the task less aversive (retest punisher rating not reported). It can therefore be argued that the task may not actually capture meaningful cognitive/motivational traits and their effects on decision-making, but instead spurious differences in punisher intensity.
  
  We thank the reviewer for raising this important potential limitation of our study. We agree that how participants self-adjusted their sound volume may have important consequences for our interpretations of the data. Unfortunately, despite the scalability of online data collection, this highlights one of its major weaknesses in the lack of controllability over experimental parameters. The previous paper from which we obtained our aversive sounds (Seow & Hauser, 2021, Behav Res, doi.org/10.3758/s13428-021-01643-0) contains useful analyses with regards to this discussion. When comparing the unpleasantness of the sounds played at 50% vs 100% volume, the authors indeed found that the lower volumes led to lower unpleasantness ratings. However, the magnitude of this effect did not appear to be substantial (Fig. 4 from the paper), and even at 50% volume, the scream sounds we used were rated in the top quartile for unpleasantness, on average. This implies that the sounds have sufficient inherent unpleasantness, even when played at half intensity. We find this reassuring, in the sense that any self-imposed volume effects may not be large. Of note, our instructions to participants to adjust the volume to a ‘loud but comfortable’ level was based on the same phrasing used in this study.
  
  To the reviewer’s point on how this might affect the reliability of the task, we have included the following in the ‘Discussion’ section:
  
  Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed other measures).
  
  Please see below for analyses accounting for punishment unpleasantness ratings.
  
  This undercuts the proposed significance of this task as a translational tool for understanding anxiety and avoidance. More information about ratings of punisher unpleasantness and its relationship to task behavior, anxiety and cognitive parameters would be valuable for interpreting findings. It would also be of interest whether the same results were observed if the aversiveness of the punisher was titrated prior to the task.
  
  As suggested, we have now included sensitivity analyses using the unpleasantness ratings that show that their effect on our primary inference is minimal. We report relevant results below in the ‘Recommendations For The Authors’ section. At the same time, we think it is important to acknowledge that unpleasantness is a combination of both the inherent unpleasantness of the sound and the volume it is presented at, where only the latter is controlled by the participant. Therefore, these analyses are not a perfect indicator of the effect of participant control. For convenience, we reproduce the key findings from this sensitivity analysis here:
  
  Approach-avoidance hierarchical logistic regression model
  
  We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.
  
  Mediation model
  
  When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).
  
  More generally, whether or not to titrate the punishments (and indeed the rewards) is an interesting experimental decision, which we think should be guided by the research question. In our case, we were interested in individual differences in reward/punishment learning and sensitivity and their relation to anxiety, so variation in how the aversive sounds affected approach-avoidance decisions was an important aspect of our design. In studies where the aim is to understand more general processes of how humans act under approach-avoidance conflict, it may be better to tightly control the salience of reinforcers.
  
  Ultimately, the best test of the causal role of anxiety on avoidance, and against the hypothesis that our results were driven by spurious volume control effects, would be to run within-subjects anxiety interventions, where these volume effects are naturally accounted for. This will be an important direction for future studies using similar measures. We have added a paragraph in the ‘Discussion’ section on this point:
  
  Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.
  
  Although the procedure and findings reported here remain valuable to the field, claims of novelty including its translational potential are perhaps overstated. This study complements and sits within a much broader literature that investigates roles for aversion and cognitive traits in approach-avoidance decisions. This includes numerous studies that apply reinforcement learning models to behavior in two-choice tasks with latent probabilities of reward and punishment (e.g., see doi: 10.1001/jamapsychiatry.2022.0051), as well as other translationally-relevant paradigms (e.g., doi: 10.3389/fpsyg.2014.00203, 10.7554/eLife.69594, etc).
  
  We agree with the reviewer that our approach builds on previous work in reinforcement learning, approach-avoidance conflict and translational measures of anxiety. Whilst there are by now many studies using two-choice learning tasks with latent reward and punishment probabilities, our main aim, which is what we refer to as ‘novel’, was to bring these fields together so as to model anxiety-related behaviour.
  
  We note that we do not make strong statements about whether these effects speak to traits per se, and as Reviewer 1 notes, the evidence from our study suggests that the present measure may be better suited to assessing state anxiety. While computational model parameters can be, and certainly often are, interpreted as constituting stable individual traits, a simpler interpretation of our findings may be that state anxiety is associated with a momentary preference for punishment avoidance over reward pursuit. This can still be informative for the study of anxiety, especially given the notion of a continuous relationship between adaptive/state anxiety and maladaptive/persistent anxiety.
  
  Having said that, we agree with the underlying premise of the reviewer’s point that how the measure relates to trait-level avoidance/inhibition measures will be an interesting question for future work. We appreciate the importance of using tasks such as ours and those highlighted by the reviewer as trait-level measures, especially in computational psychiatry. We have now included a discussion on the potential roles of cognitive/motivational traits, in line with the reviewer’s recommendation – briefly, we have included the suggested references by the reviewer, discussed the measure’s potential relevance to cognitive/motivational traits, and directed interested readers to the broader literature. Please see below for details.
  
  Reviewer #3 (Recommendations For The Authors):
  
  As stated in the public review, punisher unpleasantness and its relationship to key findings (including for retest) should be reported and discussed.
  
  We signpost readers to our new analyses, incorporating unpleasantness ratings into the statistical models, from the main manuscript as follows:
  
  Since participants self-determined the volume of the punishments in the task, and therefore (at least in part) their aversiveness, we conducted sensitivity analyses by accounting for self-reported unpleasantness ratings of the punishment (see the Supplement). Our finding that anxiety impacts approach-avoidance behaviour was robust to this sensitivity analysis (p < 0.001), however the mediating effect of the reward-punishment sensitivity index was not (p > 0.1; see Supplement section 9.9 for details).
  
  We reproduce the relevant section from the Supplement below. Overall, we found that the effect of anxiety on choices (via its interaction with punishment probability) remained significant after accounting for unpleasantness, however the mediating effect of reward-punishment sensitivity was no longer significant when unpleasantness ratings were included in the model. As noted above, unpleasantness ratings are not a perfect measure of self-imposed sound volume, and indeed punishment sensitivity is essentially a computationally-derived measure of unpleasantness, which makes it difficult to interpret the mediation model which contains both of these measures. However, since we found that anxiety affected choice over and above any effects of self-imposed sound volume (using unpleasantness ratings as a proxy measure), we argue that the task still holds value as a model of anxiety-related avoidance.
  
  [Supplement Section 9.9: Sensitivity analyses of punishment unpleasantness]
  
  Distribution of unpleasantness
  
  The punishments were rated as unpleasant by the participants, on average (discovery sample: mean rating = 31.1 [scored between 0 and 50], SD = 13.1; replication sample: mean rating = 32.1, SD = 12.7; Supplementary Figure 10).
  
  Approach-avoidance hierarchical logistic regression model
  
  We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness ratings survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.
  
  Mediation model
  
  When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).
  
  Test-retest reliability of unpleasantness
  
  The test-retest reliability of unpleasantness ratings was excellent (ICC(3,1) = 0.75), although participants gave significantly lower ratings in the second session (t(56) = 2.7, p = 0.008, d = 0.37; mean difference of 3.12, SD = 8.63).
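  
  A high ICC(3,1) alongside a significant drop in mean ratings is not contradictory, because ICC(3,1) is a consistency measure that removes the session main effect. The sketch below computes it from the standard two-way ANOVA mean squares; the function name and the example ratings are hypothetical and are not data from the study.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: list of per-participant lists, one value per session.
    """
    n = len(scores)     # participants
    k = len(scores[0])  # sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical ratings: every participant rates 3 points lower at retest.
shifted = [[30, 27], [40, 37], [20, 17], [35, 32]]
print(icc_3_1(shifted))
```

  In this constructed case the rank order and spacing of participants are perfectly preserved, so the consistency ICC is at its maximum even though every rating fell between sessions.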
  
  Reliability of other measures with/out unpleasantness
  
  To assess the effect of accounting for unpleasantness ratings on reliability estimates of task performance, we extracted variance components from linear mixed models, following a standard approach (Nakagawa et al., 2017) – note that this was not the method used to estimate reliability values in the main analyses, but we used this specific approach to compare the reliability values with and without the covariate of unpleasantness ratings. The results indicated that unpleasantness ratings did not have a material effect on reliability (Supplementary Figure 14).
  
  We discuss the findings of these sensitivity analyses in the ‘Discussion’ section, as follows:
  
  Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.
  
  Introduction and discussion should spend more time relating the task and current findings to existing procedures and findings examining individual differences in avoidance and cognitive/motivational correlates.
  
  We thank the reviewer for the opportunity to expand on the literature. Whilst there are numerous behavioural paradigms in both the human and non-human literature that involve learning about rewards and punishments, our starting point for the introduction was the state-of-the-art in translational approach-avoidance conflict models of anxiety. Therefore, for the sake of brevity and logical flow of our introduction, we have opted to bring in the discussion on other procedures primarily in the ‘Discussion’ section of the manuscript.
  
  We have now included the reviewer’s suggested citations from their ‘Public Review’ as follows:
  
  Since we developed our task with the primary focus on translational validity, its design diverges from other reinforcement learning tasks that involve reward and punishment outcomes (Pike & Robinson, 2022). One important difference is that we used distinct reinforcers as our reward and punishment outcomes, compared to many studies which use monetary outcomes for both (e.g. earning and losing £1 constitute the reward and punishment, respectively; Aylward et al., 2019; Jean-Richard-Dit-Bressel et al., 2021; Pizzagalli et al., 2005; Sharp et al., 2022). Other tasks have been used that induce a conflict between value and motor biases, relying on prepotent biases to approach/move towards rewards and withdraw from punishments, which makes it difficult to approach punishments and withdraw from rewards (Guitart-Masip et al., 2012; Mkrtchian et al., 2017). However, since translational operant conflict tasks typically induce a conflict between different types of outcome (e.g. food and shocks/sugar and quinine pellets; Oberrauch et al., 2019; van den Bos et al., 2014), we felt it was important to implement this feature. One study used monetary rewards and shock-based punishments, but also included four options for participants to choose from on each trial, with rewards and punishments associated with all four options (Seymour et al., 2012). This effectively requires participants to maintain eight probability estimates (i.e. reward and punishment at each of the four options) to solve the task, which may be too difficult for non-human animals to learn efficiently.
  
  We have also included a discussion on the measure’s potential relevance to cognitive/motivational traits as follows:
  
  Finally, whilst there is a broad literature on the roles of behavioural inhibition and avoidance tendency traits on decision-making and behaviour (Carver & White, 1994; Corr, 2004; Gray, 1982), we did not replicate the correlation of experiential avoidance and avoidance responses or the reward-punishment sensitivity index. Since there were also no significant correlations across task performance indices and clinical symptom measures, our findings suggest that the measure may be more sensitive to behaviours relating to state anxiety, rather than more stable traits. Nevertheless, how performance in the present task relates to other traits such as behavioural approach/inhibition tendencies (Carver & White, 1994), as has been found in previous studies on reward/punishment learning (Sharp et al., 2022; Wise & Dolan, 2020) and approach-avoidance conflict (Aupperle et al., 2011), will be an important question for future work.
  
  We also now direct readers to a recent, comprehensive review on applying computational methods to approach-avoidance behaviours in the ‘Introduction’ section:
  
  A fundamental premise of this approach is that the brain acts as an information-processing organ that performs computations responsible for observable behaviours, including approach and avoidance (for a recent review on the application of computational methods to approach-avoidance conflict, see Letkiewicz et al., 2023).
  
  I am curious why participants were excluded if they made the same response on 20+ consecutive trials. How does this represent a cut-off between valid versus invalid behavioral profiles?
  
  We apologise for the lack of clarity on this point in our original submission – this exclusion criterion was specifically if participants used the same response key (e.g. the left arrow button) on 20 or more consecutive trials, indicating inattention. Since the left-right positions of the stimuli were randomised across trials, this did not exclude participants who repeatedly chose the same option frequently. However, as we show in the Supplement, this, along with the other exclusion criteria, did not affect our main findings.
  
  We have now clarified this as follows:
  
  … we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4%/6% in discovery and replication samples, respectively) – note that as the options randomly switched sides on the screen across trials, this did not exclude participants who frequently and consecutively chose a certain option.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.04.535526v3
www.biorxiv.org

Muscarinic receptors mediate motivation via preparatory neural activity in humans

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  Public Reviews:
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.
  
  Strengths:
  
  The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.
  
  Weaknesses:
  
  The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. Generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.
  
  We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.
  
  We agree that the sample size being smaller than planned due to the pandemic restrictions is a weakness for this study, and hope that future studies into cholinergic effects on motivation in humans will use larger sample sizes. They should also ensure women are not excluded from sample populations, which will become even more important if the research progresses to clinical populations.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.
  
  Strengths:
  
  This manuscript addresses an interesting and timely question and does so using an impressive within subject pharmacological design and a task well designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.
  
  Weaknesses:
  
  A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to covid). Nonetheless, it is worth stating explicitly that this sample size is relatively small for the effect sizes typically observed in such studies highlighting the need for future confirmatory studies.
  
  We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.
  
  We agree that the small sample size is a weakness of the study, and hope that future work into cholinergic modulation of motivation can involve larger samples to replicate and extend this work.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  Thank you for addressing my comments and clarifying the analysis sections. Women can be included in such studies by performing a pregnancy test before each test session, but I understand how this could have added to the pandemic limitations. Best of luck with your future work!
  
  Thank you for your time in reviewing this paper, and your helpful comments.
  
  Reviewer #3 (Recommendations for the authors):
  
  The authors have done a great job at addressing my concerns and I think that the manuscript is now very solid. That said, I have one minor concern.
  
  Thank you for your time in reviewing this paper, and your helpful comments.
  
  For descriptions of mass univariate analyses and cluster correction, I am still a bit confused on exactly what terms were in the regression. In one place, the authors state:
  
  On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model 'variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)'.
  
  I take this to mean that the regression model includes a voltage regressor and a three-way interaction term, along with participant level intercept terms.
  
  However, elsewhere, the authors state:
  
  "We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant."
  
  I take this to mean that the regression model included regressors for incentive, distractorPresent, THP, along with their 2 and 3 way interactions. I think that this seems like the more reasonable model - but I just want to 1) verify that this is what the authors did and 2) encourage them to articulate this more clearly and consistently throughout.
  
  We apologise for the lack of clarity about the whole-brain regression analyses.
  
  We used Wilkinson notation for this formula, where ‘A*B’ denotes ‘A + B + A:B’, so all main effects and lower-order interaction terms were included in the regression, as your second interpretation says. The model written out in full would be:
  
  'variable ~1 + voltage + incentive + distractorPresent + THP + incentive:distractorPresent + incentive:THP + distractorPresent:THP + incentive:distractorPresent:THP + (1 | participant)'
  
  We will clarify this in the Version of Record.
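
To make the expansion concrete, here is a small sketch of how a crossing such as ‘A*B*C’ unfolds into main effects plus ‘:’ interaction terms. It is a plain-Python illustration of the notation only; the helper name `expand_crossing` is hypothetical and no statistics package is involved.

```python
from itertools import combinations


def expand_crossing(factors):
    """Expand a full crossing (A*B*C) into main effects and ':' interactions."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


factors = ["incentive", "distractorPresent", "THP"]
fixed = ["1", "voltage"] + expand_crossing(factors)
formula = "variable ~ " + " + ".join(fixed) + " + (1 | participant)"
print(formula)
```

For three factors the crossing yields seven terms: three main effects, three two-way interactions, and one three-way interaction.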
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors used a motivated saccade task with distractors to measure response vigor and reaction time (RT) in healthy human males under placebo or muscarinic antagonism. They also simultaneously recorded neural activity using EEG with event-related potential (ERP) focused analyses. This study provides evidence that the muscarinic antagonist Trihexyphenidyl (THP) modulates the motivational effects of reward on both saccade velocity and RT, and also increases the distractibility of participants. The study also examined the correlational relationships between reaction time and vigor and manipulations (THP, incentives) with components of the EEG-derived ERPs. While an interesting correlation structure emerged from the analyses relating the ERP biomarkers to behavior, it is unclear how these potentially epiphenomenal biomarkers relate to relevant underlying neurophysiology.
  
  Strengths:
  
  This study is a logical translational extension from preclinical findings of cholinergic modulation of motivation and vigor and the CNV biomarker to a normative human population, utilizing a placebo-controlled, double-blind approach.
  
  While framed in the context of Parkinson's disease where cholinergic medications can be used, the authors do a good job in the discussion describing the limitations in generalizing their findings obtained in a normative and non-age-matched cohort to an aged PD patient population.
  
  The exploratory analyses suggest alternative brain targets and/or ERP components that relate to the behavior and manipulations tested. These will need to be further validated in an adequately powered study. Once validated, the most relevant biomarkers could be assessed in a more clinically relevant population.
  
  Weaknesses:
  
  The relatively weak correlations between the main experimental outcomes provide unclear insight into the neural mechanisms by which the manipulations lead to behavioral manifestations outside the context of the ERP. It would have been interesting to evaluate how other quantifications of the EEG signal through time-frequency analyses relate to the behavioral outcomes and manipulations.
  
  The ERP correlations to relevant behavioral outcomes were not consistent across manipulations demonstrating they are not reliable biomarkers to behavior but do suggest that multiple underlying mechanisms can give rise to the same changes in the ERP-based biomarkers and lead to different behavioral outcomes.
  
  We thank the reviewer for their review and their comments.
  
  We agree that these ERPs may not be reliable biomarkers yet, given the many-to-one mapping we observed where incentives and THP antagonism both affected the CNV in different ways, and hope that future studies will help clarify the use and limitations of the CNV as a potential biomarker of invigoration.
  
  Our original hypothesis was specifically about the CNV as an index of preparatory behaviour, but we plan to look at potential changes to frequency characteristics in future work. We have included this in the discussion of future investigations (page 16, line 428):
  
  “Future investigations of other aspects of the EEG signals may illuminate us. Such studies could also investigate other potential signals that may be more sensitive to invigoration and/or muscarinic antagonism, including frequency-band power and phase-coherence, or measures of variability in brain signals such as entropy, which may give greater insight into processes affected by these factors.”
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.
  
  Strengths:
  
  The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.
  
  Weaknesses:
  
  The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. The generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.
  
  We thank the reviewer for their review, and their comments.
  
  We agree that our study was underpowered, not reaching our target of 27 participants due to pandemic restrictions halting our recruitment, and hope that future studies into muscarinic antagonism in motivation will have larger sample sizes, and include male and female participants across a range of ages, to assess generalisability.
  
  We only included men to prevent the chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category Class C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”
  
  While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.
  
  We have updated the Methods/Drugs section to explain this (page 17, line 494):
  
  “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”
  
  And we refer to this in the Methods/Participants section (page 18, line 501):
  
  “We recruited 27 male participants (see Drugs section above),…”
  
  We agree that future work is needed to replicate this in different samples, and that this work cannot tell us the mechanism by which the drug is dampening invigoration, but we think that showing these effects do occur and can be linked to anticipatory/preparatory activity rather than overall reward sensitivity is a useful finding.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.
  
  Strengths:
  
  This manuscript addresses an interesting and timely question and does so using an impressive within-subject pharmacological design and a task well-designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.
  
  Weaknesses:
  
  In full disclosure, I have previously reviewed this manuscript in another journal and the authors have done a considerable amount of work to address my previous concerns. However, I have a few remaining concerns that affect my interpretation of the current manuscript.
  
  Some of the EEG signals (figures 4A&C) have profiles that look like they could have ocular, rather than central nervous, origins. Given that this is an eye movement task, it would be useful if the authors could provide some evidence that these signals are truly related to brain activity and not driven by ocular muscles, either in response to explicit motor effects (i.e., blinks) or in preparation for an upcoming saccade.
  
  We thank the reviewer for re-reviewing the manuscript and for raising this issue.
  
  All the EEG analyses (both ERP and whole-brain) are analysing the preparation period between the ready-cue and target appearance when no eye-movements are required. We reject trials with blinks or saccades over 1 degree in size, as detected by the Eyelink software according to the sensitive velocity and acceleration criteria specified in the manuscript (Methods/Eye-tracking, page 19, line 550). This means that there should be no overt eye movements in the data. However, microsaccades and ocular drift are still possible within this period, which indeed could drive some effects. To measure this, we counted the number of microsaccades (<1 degree in size) in the preparation period between the incentive cue and the target onset, for each trial. Further, we measured the mean absolute speed of the eye during the preparation period (excluding the periods during microsaccades) for each trial.
  
  We have run a control analysis to check whether including ocular drift speed or number of microsaccades as a covariate in the whole-brain regression analysis changes the association between EEG and the behavioural metrics at frontal or other electrodes. Below we show these ‘variable ~ EEG’ beta-coefficients when controlling for each eye-movement covariate, in the same format as Figure 4. We did not run the permutation testing on this due to time/computational costs (it takes >1 week per variable), so p-values were not calculated, only the beta-coefficients. The beta-coefficients are almost unchanged, both in time-course and topography, when controlling for either covariate. The frontal associations to velocity and distractor pull remain, suggesting they are not due to these eye movements.
  
  We have added this figure as a supplemental figure.
  
  For additional clarity in this response, we also plot the differences between these covariate-controlled beta-coefficients, and the true beta-coefficients from figure 4 (please note the y-axis scales are -0.02:0.02, not -0.15:0.15 as in Figure 4 and Figure 4-figure supplement 2). This shows that the changes to the associations between EEG and velocity/distractor-pull were not frontally-distributed, demonstrating eye-movements were not driving these effects. Relatedly, the RT effect’s change was frontally-distributed, despite Figure 4 showing the true relationship was central in focus, again indicating that effect was also not related to these eye movements.
  
  Author response image 1.
  
  Difference in beta-coefficients when eye-movement covariates are included. This is the difference from the beta-coefficients shown in Figure 4, please note the smaller y-axis limits.
  
  The same pattern was seen if we controlled for the change in eye-position from the baseline period (measured by the eye-tracker) at each specific time-point, i.e., controlling for the distance the eye had moved from baseline at the time the EEG voltage is measured. The topographies and time-course plots were almost identical to the above ones:
  
  Author response image 2.
  
  Controlling for change in eye-position at each time-point does not change the regression results. Left column shows the beta-coefficients between the variable and EEG voltage, and the right column shows the difference from the main results in Figure 4 (note the smaller y-axis limits for the right-hand column).
  
  Therefore, we believe the brain-behaviour regressions are independent of eye-movements. We have included the first figure presented here as an additional supplemental figure, and added the following to the text (page 10, line 265):
  
  “An additional control analysis found that these results were not driven by microsaccades or ocular drift during the preparation period, as including these as trial-wise covariates did not substantially change the beta-coefficients (Figure 4 – Figure Supplement 2).”
  
  For other EEG signals, in particular, the ones reported in Figure 3, it would be nice to see what the spatial profiles actually look like - does the scalp topography match that expected for the signal of interest?
  
  Yes, the CNV is a central negative potential peaking around Cz, while the P3a is slightly anterior to this (peaking between Cz and FCz). We have added the topographies to the main figure (see point below).
  
  This is the topography of the mean CNV (1200:1500ms from the preparation cue onset), which is maximal over Cz, as expected.
  
  The P3a’s topography (200:280ms after preparation cue) is maximal slightly anterior to Cz, between Cz and FCz.
  
  A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to COVID). That said, they only report the sample size in one place in the methods rather than through degrees of freedom in their statistical tests conducted throughout the results. In part because of this, I am not totally clear on whether the sample size for each analysis is the same - or whether participants were removed for specific analyses (ie. due to poor EEG recordings, for example).
  
  We apologise for the lack of clarity here. All 20 participants were included in all analyses, although the number of trials included differed between behavioural and EEG analyses. We only excluded trials with EEG artefacts from the EEG analyses, not from the purely behavioural analyses such as Figures 1&2, although trials with blinks/saccades were removed from behavioural analyses too. Removing the EEG artefactual trials from the behavioural analyses did not change the findings, despite the lower power. The degrees of freedom in the figure supplement tables are the total number of trials (less 8 fixed-effect terms) included in the single-trial / trial-wise regression analyses we used.
  
  We have clarified this in the Methods/Analysis (page 20, line 602):
  
  “Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.”
  
  And we state the number of participants and trials in the start of the behavioural results (page 3, line 97):
  
  “We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT.”
  
  and EEG results section (page 7, line 193):
  
  “We used single-trial linear mixed-effects regression to see the effects of Incentive and THP on each ERP (20 participants, 16627 trials; Distractor was included too, along with all interactions, and a random intercept by participant).”
  
  Beyond this point, but still related to the sample size, in some cases I worry that results are driven by a single subject. In particular, the interaction effect observed in Figure 1e seems like it would be highly sensitive to the single subject who shows a reverse incentive effect in the drug condition.
  
  Repeating that analysis after removing the participant with the large increase in saccadic RT with incentives did not remove the incentive*THP interaction effect – although it did weaken slightly from (β = 0.0218, p = .0002) to (β = 0.0197, p = .0082). This is likely because, while that participant did have slower RTs for higher incentives on THP, they were also slower for higher incentives under placebo (and similarly for distractor present/absent), making them less of an outlier in terms of effects than in raw RT terms. Below, Author response image 3 shows the mean figure without that participant, and Author response image 4 shows that participant separately.
  
  Author response image 3.
  
  Author response image 4.
  
  There are not sufficient details on the cluster-based permutation testing to understand what the authors did or whether it is reasonable. What channels were included? What metric was computed per cluster? How was null distribution generated?
  
  We apologise for not giving sufficient details of this, and have updated the Methods/Analysis section to include these details, along with a brief description in the Results section.
  
  To clarify here, we adapted the DMGroppe Mass Univariate Testing toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour – i.e. does adding the voltage at this time/channel explain additional variance in the variable not captured in our main behavioural analyses. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution of cluster mass (across times/channels per iteration), and calculated the p-value as the proportion of this distribution further from zero than the absolute true t-statistics (two-tailed test).
  
  We have given greater detail for this in the Methods/Analysis section (page 20, line 614):
  
  “We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.”
  
  And we have added a brief explanation to the Results section also (page 9, line 246):
  
  “We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant. This analysis therefore asks whether trial-to-trial neural variability predicts behavioural variability. To assess significance, we used cluster-based permutation tests (DMGroppe Mass Univariate toolbox; Groppe, Urbach, & Kutas, 2011), shuffling the trials within each condition and person, and repeating it 2500 times, to build a null distribution of ‘cluster mass’ from the t-statistics (Bullmore et al., 1999; Maris & Oostenveld, 2007) which was used to calculate two-tailed p-values with a family-wise error rate (FWER) of .05 (see Methods/Analysis for details).”
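
The general logic of a cluster-mass permutation test with within-group shuffling can be sketched as follows. This is a deliberately simplified illustration, not the authors' pipeline or the Mass Univariate toolbox: it assumes a single channel (clusters are formed over time only), replaces the mixed-effects regression with a per-time-point Pearson correlation t-statistic, and uses an arbitrary cluster-forming threshold. All function names and the `groups` labelling (one label per condition-by-participant cell) are hypothetical.

```python
import numpy as np


def corr_t(voltage, behaviour):
    """t-statistic of the Pearson correlation at each time point.

    voltage: (n_trials, n_times) array; behaviour: (n_trials,) array.
    """
    n = len(behaviour)
    v = voltage - voltage.mean(axis=0)
    b = behaviour - behaviour.mean()
    r = (v * b[:, None]).sum(axis=0) / (
        np.sqrt((v ** 2).sum(axis=0)) * np.sqrt((b ** 2).sum()))
    return r * np.sqrt((n - 2) / (1 - r ** 2))


def cluster_masses(t, thresh):
    """Summed t-values of contiguous supra-threshold runs sharing a sign."""
    masses, current, sign = [], 0.0, 0
    for value in t:
        s = int(np.sign(value)) if abs(value) > thresh else 0
        if s != 0 and s == sign:
            current += value
        else:
            if sign != 0:
                masses.append(current)
            current, sign = (value, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses


def shuffle_within_groups(voltage, groups, rng):
    """Permute trials (rows) only among trials with the same group label."""
    shuffled = voltage.copy()
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        shuffled[idx] = voltage[rng.permutation(idx)]
    return shuffled


def cluster_permutation_test(voltage, behaviour, groups,
                             n_perm=2500, thresh=2.0, seed=0):
    """Return observed cluster masses and two-tailed permutation p-values."""
    rng = np.random.default_rng(seed)
    observed = cluster_masses(corr_t(voltage, behaviour), thresh)
    null = np.zeros(n_perm)
    for i in range(n_perm):
        perm = shuffle_within_groups(voltage, groups, rng)
        masses = cluster_masses(corr_t(perm, behaviour), thresh)
        # Keep the largest absolute cluster mass to control family-wise error.
        null[i] = max((abs(m) for m in masses), default=0.0)
    p_values = [float(np.mean(null >= abs(m))) for m in observed]
    return observed, p_values
```

Taking the maximum absolute cluster mass on each permutation is what gives the test its family-wise error control: an observed cluster is only significant if it exceeds what the largest chance cluster would typically reach anywhere in the data.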
  
  The authors report that "muscarinic antagonism strengthened the P3a" - but I was unable to see this in the data plots. Perhaps it is because the variability related to individual differences obscures the conditional differences in the plots. In this case, event-related difference signals could be helpful to clarify the results.
  
  We thank the reviewer for spotting this wording error; this should refer to the incentive effect weakening the P3a, as no other significant effects were found on the P3a, as stated correctly in the previous paragraph. We have corrected this in the manuscript (page 9, line 232):
  
  “This suggests that while incentives strengthened the incentive-cue response and the CNV and weakened the P3a, muscarinic antagonism strengthened the CNV,”
  
  The reviewer’s suggestion for difference plots is very valuable, and we have added these to Figure 3, as well as increasing the y-axis scale for figure 3c to show the incentives weakening the P3a more clearly, and adding the topographies suggested in an earlier comment. The difference waves for Incentive and THP effects show that both are decreasing voltage, albeit with slightly different onset times – Incentive starts earlier, thus weakening the positive P3a, while both strengthen the negative CNV. The Incentive effects within THP and Placebo separately illustrate the THP*Incentive interaction.
  
  We have amended the Results text and figure (page 7, line 200):
  
  “The subsequent CNV was strengthened (i.e. more negative; Figure 3d) by incentive (β = -.0928, p < .0001) and THP (β = -0.0502, p < .0001), with an interaction whereby THP decreased the incentive effect (β= 0.0172, p = .0213). Figure 3h shows the effects of Incentive and THP on the CNV separately, using difference waves, and Figure 3i shows the incentive effect grows more slowly in the THP condition than the Placebo condition.”
  
  For mediation analyses, it would be useful in the results section to have a much more detailed description of the regression results, rather than just reporting things in a binary did/did not mediate sort of way. Furthermore, the methods should also describe how mediation was tested statistically (ie. What is the null distribution that the difference in coefficients with/without moderator is tested against?).
  
  We have added a more detailed explanation of how we investigated mediation and mediated moderation, and now report the mediation effects for all tests run and the permutation-test p-values.
  
  We had been using the Baron & Kenny (1986) method, based on 4 tests outlined in the updated text below, which gives a single measure of change in absolute beta-coefficients when all the tests have been met, but without any indication of significance; any reduction found after meeting the other 3 tests indicates a partial mediation under this method. We now use permutation testing to generate a p-value for the likelihood of finding an equal or larger reduction in the absolute beta-coefficients if the CNV were not truly related to RT. This found that the CNV’s mediation of the Incentive effect on RT was highly significant, while the Mediated Moderation of CNV on THP*Incentive was weakly significant.
  
  During this re-analysis, we noticed that we had different trial-numbers in the different regression models, as EEG-artefactual trials were not excluded from the behavioural-only model (‘RT ~ 1 + Incentive’). However, this causes issues with the permutation testing as we are shuffling the ERPs and need the same trials included in all the mixed-effects models. Therefore, we have redone these mediation analyses, including only the trials with valid ERP measures (i.e. no artefactual trials) in all models. This has changed the beta-coefficients we report, but not the findings or conclusions of the mediation analyses. We have updated the figure to have these new statistics.
  
  We have updated the text to explain the methodology in the Results section (page 12, line 284):
  
  “We have found that neural preparatory activity can predict residual velocity and RT, and is also affected by incentives and THP. Finally, we ask whether the neural activity can explain the effects of incentives and THP, through mediation analyses. We used the Baron & Kenny (1986) method to assess mediation (see Methods/Analysis for full details). This tests whether the significant Incentive effect on behaviour could be partially reduced (i.e., explained) by including the CNV as a mediator in a mixed-effects single-trial regression. We measured mediation as the reduction in (absolute) beta-coefficient for the incentive effect on behaviour when the CNV was included as a mediator (i.e., RT ~ 1 + Incentive + CNV + Incentive*CNV + (1 | participant)). This is a directional hypothesis of a reduced effect, and to assess significance we ran a permutation-test, shuffling the CNV within participants, and measuring the change in absolute beta-coefficient for the Incentive effect on behaviour. This generates a distribution of mediation effects where there is no relationship between CNV and RT on a trial (i.e., a null distribution). We ran 2500 permutations, and calculated the proportion with an equal or more negative change in absolute beta-coefficient, equivalent to a one-tailed test. We ran this mediation analysis separately for the two behavioural variables of RT and residual velocity, but not for distractor pull as it was not affected by incentive, so failed the assumptions of mediation analyses (Baron & Kenny, 1986; Muller et al., 2005). We took the mean CNV amplitude from 1200:1500ms as our Mediator.
  
  Residual velocity passed all the assumption tests for Mediation analysis, but no significant mediation was found. That is, Incentive predicted velocity (β=0.1304, t(1,16476)=17.3280, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted velocity when included alongside Incentive (β=0.0015, t(1,16475)=1.9753, p=.0483). However, including CNV did not reduce the Incentive effect on velocity, and in fact strengthened it (β=0.1318, t(1,16475)=17.4380, p<.0001; change in absolute coefficient: Δβ=+0.0014). Since there was no mediation (reduction), we did not run permutation tests on this.
  
  However, RT did show a significant mediation of the Incentive effect by CNV: Incentive predicted RT (β=-0.0868, t(1,16476)=-14.9330, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted RT when included alongside Incentive (β=0.0127, t(1,16475)=21.3160, p<.0001). The CNV mediated the effect of Incentive on RT, reducing the absolute beta-coefficient (β=-0.0752, t(1,16475)=-13.0570, p<.0001; change in absolute coefficient: Δβ= -0.0116). We assessed the significance of this change via permutation testing, shuffling the CNV across trials (within participants) and calculating the change in absolute beta-coefficient for the Incentive effect on RT when the permuted CNV was included as a mediator. We repeated this 2500 times to build a null distribution of Δβ, and calculated the proportion with equal or stronger reductions for a one-tailed p-value, which was highly significant (p<.0001). This suggests that the Incentive effect on RT is partially mediated by the CNV’s amplitude during the preparation period, and this is not the case for residual velocity.
  
  We also investigated whether the CNV could explain the cholinergic reduction in motivation (THP*Incentive interaction) on RT – i.e., whether the CNV mediated the THP moderation. We measured Mediated Moderation as suggested by Muller et al. (2005; see Methods/Analysis for full explanation): Incentive*THP was associated with RT (β=0.0222, t(1,16474)=3.8272, p=.0001); and Incentive*THP was associated with CNV (β=0.1619, t(1,16474)=2.1671, p=.0302); and CNV*THP was associated with RT (β=0.0014, t(1,16472)=2.4061, p=.0161). Mediated Moderation was measured by the change in absolute Incentive*THP effect when THP*CNV was included in the mixed-effects model (β=0.0214, t(1,16472)=3.7298, p=.0002; change in beta-coefficient: Δβ= -0.0008), and permutation-testing (permuting the CNV as above) found a significant effect (p=.0132). This indicates cholinergic blockade changes how incentives affect preparatory negativity, and how this negativity reflects RT, which can explain some of the reduced invigoration of RT. However, this was not observed for saccade velocity.”
  
  And we have updated the Methods/Analysis section with a more detailed explanation too (page 21, line 627):
  
  “For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires 4 tests be met for the outcome (behavioural variable, e.g. RT), mediator (ERP, e.g., CNV) and the treatment (Incentive):
  
  (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))
  
  (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))
  
  (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))
  
  (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)
  
  The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or smaller than the true values (as Mediation is a one-tailed prediction).
  
  Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):
  
  (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))
  
  (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))
  
  (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))
  
  Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or smaller than the true change.”
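  The permutation test for mediation described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes ordinary least squares on participant-centred data as a stand-in for the random-intercept mixed model, it follows model #3 without any interaction term, and it runs on simulated trials. All function names (`center_within`, `shuffle_within`, `mediation_permutation_test`, `simulate_trials`) are hypothetical.

```python
import random


def center_within(values, groups):
    """Subtract each participant's mean (simple stand-in for a random intercept)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def slope_one(x, y):
    """OLS slope of y on x (inputs already centred). Model #1."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slope_two(x, m, y):
    """OLS coefficient of x when y is regressed on x and m (all centred). Model #3."""
    sxx = sum(a * a for a in x)
    smm = sum(a * a for a in m)
    sxm = sum(a * b for a, b in zip(x, m))
    sxy = sum(a * b for a, b in zip(x, y))
    smy = sum(a * b for a, b in zip(m, y))
    return (smm * sxy - sxm * smy) / (sxx * smm - sxm ** 2)


def delta_abs_beta(x, m, y):
    """Change in |beta| for the treatment when the mediator is added (negative = reduction)."""
    return abs(slope_two(x, m, y)) - abs(slope_one(x, y))


def shuffle_within(values, groups, rng):
    """Shuffle values across trials, separately within each participant."""
    index_by_group = {}
    for i, g in enumerate(groups):
        index_by_group.setdefault(g, []).append(i)
    shuffled = list(values)
    for indices in index_by_group.values():
        vals = [values[i] for i in indices]
        rng.shuffle(vals)
        for i, v in zip(indices, vals):
            shuffled[i] = v
    return shuffled


def mediation_permutation_test(x, m, y, groups, n_perm=2500, seed=0):
    """Return (observed change in |beta|, one-tailed permutation p-value)."""
    rng = random.Random(seed)
    x_c = center_within(x, groups)
    m_c = center_within(m, groups)
    y_c = center_within(y, groups)
    observed = delta_abs_beta(x_c, m_c, y_c)
    # Null distribution: mediator shuffled within participants, breaking its link to behaviour.
    at_least_as_negative = 0
    for _ in range(n_perm):
        m_perm = shuffle_within(m_c, groups, rng)
        if delta_abs_beta(x_c, m_perm, y_c) <= observed:
            at_least_as_negative += 1
    return observed, at_least_as_negative / n_perm


def simulate_trials(n_participants=20, n_trials=50, seed=1):
    """Simulated data in which the mediator carries most of the treatment effect."""
    rng = random.Random(seed)
    x, m, y, groups = [], [], [], []
    for participant in range(n_participants):
        offset = rng.gauss(0, 1)  # participant-specific intercept
        for _ in range(n_trials):
            xi = rng.choice([-1.0, 1.0])
            mi = -0.9 * xi + rng.gauss(0, 1)
            yi = offset - 0.05 * xi + 0.5 * mi + rng.gauss(0, 1)
            x.append(xi)
            m.append(mi)
            y.append(yi)
            groups.append(participant)
    return x, m, y, groups
```

  Calling `mediation_permutation_test(*simulate_trials())` returns the observed change in the absolute coefficient and the proportion of permutations with an equal or more negative change. The same loop would apply to mediated moderation by tracking the Incentive*THP coefficient between models #5 and #7 instead.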
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The analysis section could benefit from greater detail. For example, how exactly did they assess that the effects of the drug on peak velocity and RT were driven by non-distracting trials? Ideally, for every outcome, the analysis approach used should be detailed and justified.
  
  We apologise for the confusion from this. To clarify, we found a 2-way interaction (incentive*THP) on both residual velocity and saccadic RT, and this pattern was stronger in distractor-absent trials for residual velocity, and stronger in distractor-present trials for saccadic RT, as can be seen in Figure 1d&e. However, as there was no significant 3-way interaction (incentive*THP*distractor) for either metric, and the 2-way interaction effects were in the same direction in distractor present/absent trials for both metrics, we think these effects were relatively unaffected by distractor presence.
  
  We have updated the Results section to make this clearer: (page 3, line 94):
  
  We measured vigour as the residual peak velocity of saccades within each drug session (see Figure 1c & Methods/Eye-tracking), which is each trial’s deviation of velocity from the main sequence. This removes any overall effects of the drug on saccade velocity, while still allowing incentives and distractors to have different effects within each drug condition. We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT. As predicted, residual peak velocity was increased by incentives (Figure 1d; β = 0.1266, p < .0001), while distractors slightly slowed residual velocity (β = -0.0158, p = .0294; see Figure 1 – Figure supplement 1 for full behavioural statistics). THP decreased the effect of incentives on velocity (incentive * THP: β = -0.0216, p = .0030), indicating that muscarinic blockade diminished motivation by incentives. Figure 1d shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was absent; the 3-way (distractor*incentive*THP) interaction was not significant (p > .05), suggesting that the distractor-present trials had the same effect but weaker (Figure 1d).
  
  Saccadic RT (time to initiation of saccade) was slower when participants were given THP (β = 0.0244, p < .0001), faster with incentives (Figure 1e; β = -0.0767, p < .0001), and slowed by distractors (β = 0.0358, p < .0001). Again, THP reduced the effects of incentives (incentive*THP: β = 0.0218, p = .0002). Figure 1e shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was present; as the 3-way (distractor*incentive*THP) interaction was not significant and the direction of effects was the same in the two, it suggests the effect was similar in both conditions. Additionally, the THP*Incentive interactions were correlated between saccadic RT and residual velocity at the participant level (Figure 1 – Figure supplement 2).
  
  We have given more details of the analyses performed in the Methods section and the results, as requested by you and the other reviewers (page 20, line 602):
  
  Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.
  
  We used single-trial linear-mixed effects models to analyse our data, including participant as a random effect of intercept, with the formula ‘~1 + incentive*distractor*THP + (1 | participant)’. We z-scored all factors to give standardised beta coefficients.
  
  For the difference-wave cluster-based permutation tests (Figure 3 – Figure supplement 4), we used the DMGroppe Mass Univariate toolbox (Groppe et al., 2011), with 2500 permutations, to control the family-wise error rate at 0.05. This was used for looking at difference waves to test the effects of incentive, THP, and the incentive*THP interaction (using difference of difference-waves), across all EEG electrodes.
  
  We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.
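  As a rough illustration of the 'cluster mass' permutation logic described above, the sketch below works on a single electrode over time. It is not the toolbox implementation: it assumes a simple correlation-based t-statistic per time sample in place of the mixed-effects regression, shuffles whole trials without stratifying by condition or participant, and ignores spatial adjacency between electrodes. The function names and the threshold of 2.0 are hypothetical choices for the example.

```python
import math
import random


def t_from_correlation(x, y):
    """t-statistic for the Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def t_series(voltages, behaviour):
    """One t-value per time sample; voltages is a list of trials, each a list of samples."""
    n_samples = len(voltages[0])
    return [
        t_from_correlation([trial[k] for trial in voltages], behaviour)
        for k in range(n_samples)
    ]


def max_cluster_mass(t_values, threshold):
    """Largest |sum of t| over runs of consecutive same-sign samples beyond the threshold."""
    best, mass, sign = 0.0, 0.0, 0
    for t in t_values:
        s = 1 if t > threshold else (-1 if t < -threshold else 0)
        if s != 0 and s == sign:
            mass += t          # extend the current cluster
        else:
            mass = t if s != 0 else 0.0  # start a new cluster or reset
            sign = s
        best = max(best, abs(mass))
    return best


def cluster_permutation_test(voltages, behaviour, threshold=2.0, n_perm=2500, seed=0):
    """Return (observed max cluster mass, two-tailed permutation p-value)."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_series(voltages, behaviour), threshold)
    shuffled = list(voltages)
    exceed = 0
    for _ in range(n_perm):
        # Reassign whole trials so the temporal structure of each epoch is preserved.
        rng.shuffle(shuffled)
        if max_cluster_mass(t_series(shuffled, behaviour), threshold) >= observed:
            exceed += 1
    return observed, exceed / n_perm
```

  Taking the maximum cluster mass on each permutation is what controls the family-wise error rate across time samples; extending the sketch to all electrodes would additionally require a channel-neighbourhood definition when forming clusters.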
  
  For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires 4 tests be met for the outcome (behavioural variable, e.g. RT), mediator (ERP, e.g., CNV) and the treatment (Incentive):
  
  (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))
  
  (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))
  
  (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))
  
  (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)
  
  The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or more negative than the true value (as Mediation is a one-tailed prediction). For this mediation analysis, we only included trials with valid ERP measures, even for the models without the ERP included (e.g., model #1), to keep the trial-numbers and degrees of freedom the same.
  
  Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):
  
  (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))
  
  (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))
  
  (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))
  
  Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or more negative than the true change.
  
  (2) Please explain why only men were included in this study. We are all hoping that men-only research is a practice of the past.
  
  We only included men to prevent any chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category Class C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”
  
  While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.
  
  We have updated the Methods/Drugs section to explain this (page 17, line 494):
  
  “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”
  
  And we have referenced this in the Methods/Participants section (page 18, line 501):
  
  “Our sample size calculations suggested 27 participants would detect a 0.5 effect size with .05 sensitivity and .8 power. We recruited 27 male participants (see Drugs section above)”
  
  (3) Please explain acronyms (eg EEG) when first used.
  
  Thank you for pointing this out, we have explained EEG at first use in the abstract and the main text, along with FWER, M1r, and ERP which had also been missed at first use.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The authors say: "Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and increased the pull of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity." But I found this statement to be misleading since the primary effects of the drug seem to have been to decrease the frequency of distractor-repulsed saccades... so "decreased push" would probably be a better analogy than "increased pull".
  
  Thank you for noticing this, we agree, and have changed this to (page 5, line 165):
  
  “Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and decreased the repulsion of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity.”
  
  I don't see anything in EEG preprocessing about channel rejection and interpolation. Were these steps performed? There are very few results related to the full set of electrodes.
  
  We did not reject or interpolate any channels, as visual inspection found no obvious outliers in terms of noisiness, and no channels had standard deviations (across time/trials) higher than our standard cutoff (of 80). The artefact rejection was applied across all EEG channels, so any trials with absolute voltages over 200μV in any channel were removed from the analysis. On average 104/120 trials were included (having passed this check, along with eye-movement artefact checks) per condition per person, and we have added the range of these, along with totals across conditions to the Analysis section and a statement about channel rejection/interpolation (page 20, line 588):
  
  “Epochs were from -200:1500ms around the preparation cue onset, and were baselined to the 100ms before the preparation cue appeared. Visual inspection found no channels with outlying variance, so no channel rejection or interpolation was performed. We rejected trials from the EEG analyses where participants blinked or made saccades (according to EyeLink criteria above) during the epoch, or where EEG voltage in any channel was outside -200:200μV (muscle activity). On average 104/120 trials per condition per person were included (SD = 21, range = 21-120), and 831/960 trials in total per person (SD=160, range=313-954). A repeated-measures ANOVA found there were no significant differences in number of trials excluded for any condition (p > .2).”
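  The baseline correction and amplitude-based trial rejection described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' pipeline: it assumes rejection is applied after baselining (the text does not specify the order), it omits the blink and saccade checks from the eye tracker, and the data layout (each epoch as a list of channels, each channel a list of voltages in μV) and function names are hypothetical.

```python
def baseline_correct(epoch, times_ms, start_ms=-100, end_ms=0):
    """Subtract each channel's mean over the baseline window [start_ms, end_ms)."""
    baseline_idx = [i for i, t in enumerate(times_ms) if start_ms <= t < end_ms]
    corrected = []
    for channel in epoch:
        baseline = sum(channel[i] for i in baseline_idx) / len(baseline_idx)
        corrected.append([v - baseline for v in channel])
    return corrected


def is_clean(epoch, limit_uv=200.0):
    """True if no sample in any channel lies outside -limit_uv..+limit_uv."""
    return all(abs(v) <= limit_uv for channel in epoch for v in channel)


def select_trials(epochs, times_ms, limit_uv=200.0):
    """Baseline-correct every epoch and keep only those passing the amplitude check."""
    kept = []
    for epoch in epochs:
        corrected = baseline_correct(epoch, times_ms)
        if is_clean(corrected, limit_uv):
            kept.append(corrected)
    return kept
```

  Because the check runs over every channel, a single channel exceeding the limit removes the whole trial, which matches the statement that artefact rejection was applied across all EEG channels.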
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2021.07.28.454154v3
www.biorxiv.org www.biorxiv.org

Fast Evolution of SOS-Independent Multi-Drug Resistance in Bacteria

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Review #1:
  
  Summary:
  
  Jin et al. investigated how the bacterial DNA damage (SOS) response and its regulator protein RecA affect the development of drug resistance under short-term exposure to beta-lactam antibiotics. Canonically, the SOS response is triggered by DNA damage, which results in the induction of error-prone DNA repair mechanisms. These error-prone repair pathways can increase mutagenesis in the cell, leading to the evolution of drug resistance. Thus, inhibiting the SOS regulator RecA has been proposed as a means to delay the rise of resistance.
  
  In this paper, the authors deleted the RecA protein from E. coli and exposed this ∆recA strain to selective levels of the beta-lactam antibiotic, ampicillin. After an 8-hour treatment, they washed the antibiotic away and allowed the surviving cells to recover in regular media. They then measured the minimum inhibitory concentration (MIC) of ampicillin against these treated strains. They note that after just 8-hour treatment with ampicillin, the ∆recA had developed higher MICs towards ampicillin, while by contrast, wild-type cells exhibited unchanged MICs. This MIC increase was also observed in subsequent generations of bacteria, suggesting that the phenotype is driven by a genetic change.
  
  The authors then used whole genome sequencing (WGS) to identify mutations that accounted for the resistance phenotype. Within resistant populations, they discovered key mutations in the promoter region of the beta-lactamase gene, ampC; in the penicillin-binding protein PBP3 which is the target of ampicillin; and in the AcrB subunit of the AcrAB-TolC efflux machinery. Importantly, mutations in the efflux machinery can impact the resistance towards other antibiotics, not just beta-lactams. To test this, they repeated the MIC experiments with other classes of antibiotics, including kanamycin, chloramphenicol, and rifampicin. Interestingly, they observed that the ∆recA strains pre-treated with ampicillin showed higher MICs towards all other antibiotics tested. This suggests that the mutations conferring resistance to ampicillin are also increasing resistance to other antibiotics.
  
  The authors then performed an impressive series of genetic, microscopy, and transcriptomic experiments to show that this increase in resistance is not driven by the SOS response, but by independent DNA repair and stress response pathways. Specifically, they show that deletion of the recA reduces the bacterium's ability to process reactive oxygen species (ROS) and repair its DNA. These factors drive the accumulation of mutations that can confer resistance to different classes of antibiotics. The conclusions are reasonably well-supported by the data, but some aspects of the data and the model need to be clarified and extended.
  
  We sincerely appreciate your overall summary of the manuscript and your positive evaluation of our work.
  
  Strengths:
  
  A major strength of the paper is the detailed bacterial genetics and transcriptomics that the authors performed to elucidate the molecular pathways responsible for this increased resistance. They systemically deleted or inactivated genes involved in the SOS response in E. coli. They then subjected these mutants to the same MIC assays as described previously. Surprisingly, none of the other SOS gene deletions resulted in an increase in drug resistance, suggesting that the SOS response is not involved in this phenotype. This led the authors to focus on the localization of DNA PolI, which also participates in DNA damage repair. Using microscopy, they discovered that in the RecA deletion background, PolI co-localizes with the bacterial chromosome at much lower rates than wild-type. This led the authors to conclude that deletion of RecA hinders PolI and DNA repair. Although the authors do not provide a mechanism, this observation is nonetheless valuable for the field and can stimulate further investigations in the future.
  
  In order to understand how RecA deletion affects cellular physiology, the authors performed RNA-seq on ampicillin-treated strains. Crucially, they discovered that in the RecA deletion strain, genes associated with antioxidative activity (cysJ, cysI, cysH, sodA, sufD) and Base Excision Repair (mutH, mutY, mutM), which repairs oxidized forms of guanine, were all downregulated. The authors conclude that down-regulation of these genes might result in elevated levels of reactive oxygen species in the cells, which in turn, might drive the rise of resistance. Experimentally, they further demonstrated that treating the ∆recA strain with an antioxidant GSH prevents the rise of MICs. These observations will be useful for more detailed mechanistic follow-ups in the future.
  
  We are grateful to you for your positive assessment of the strengths of our manuscript and your recognition of its potential future applications.
  
  Weaknesses:
  
  Throughout the paper, the authors use language suggesting that ampicillin treatment of the ∆recA strain induces higher levels of mutagenesis inside the cells, leading to the rapid rise of resistance mutations. However, as the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, in what is known as cross-resistance. The current data is not clear on whether the elevated "mutagenesis" is driven by ampicillin selection or by a bona fide increase in mutation rate.
  
  We greatly appreciate you raising this issue, as it is an important premise that must be clearly stated throughout the entire manuscript. To verify that the observed increase in mutation rate is a bona fide increase and not due to experimental error, we used a non-selective antibiotic, rifampicin, to evaluate the mutation frequency after drug induction, as it is a gold-standard method documented in other studies [Heterogeneity in efflux pump expression predisposes antibiotic-resistant cells to mutation, Science, 362, 6415, 686-690, 2018.]. In the absence of ampicillin treatment, the natural mutation rates detected using rifampicin were consistent between the wild-type and the ΔrecA strain. However, after ampicillin treatment, the mutation rate detected using rifampicin was significantly elevated only in the ΔrecA strain (Fig. 1G). We also employed other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments to treat the cells (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure or a bona fide increase in mutation rate, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics.
  
  Furthermore, on a technical level, the authors employed WGS to identify resistance mutations in the ampicillin-treated wild-type and ∆recA strains. However, the WGS methodology described in the paper is inconsistent. Notably, wild-type WGS samples were picked from non-selective plates, while ΔrecA WGS isolates were picked from selective plates with 50 μg/mL ampicillin. Such an approach biases the frequency and identity of the mutations seen in the WGS and cannot be used to support the idea that ampicillin treatment induces higher levels of mutagenesis.
  
  We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild-type and ΔrecA strains after ampicillin treatment, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript.
  
  Furthermore, the choice of selective and non-selective conditions was made to ensure the successful isolation of mutants in both strains. Specifically, if selective conditions (50 μg/mL ampicillin) were applied to the wild-type strain, it would have been nearly impossible to recover colonies for WGS analysis, as wild-type cells are highly susceptible to ampicillin at this concentration (Top, Author response image 1). Conversely, under non-selective conditions, ΔrecA mutants carrying resistance mutations may not have been effectively isolated, which would have limited our ability to identify resistance mutations in these strains (Bottom, Author response image 1). Thus, the use of different selection pressures was essential for achieving the objective of mutation identification in this study.
  
  Author response image 1.
  
  After 8 hours of antibiotic treatment, the wild type or the ΔrecA cells were plated on agar plates either without ampicillin or with 50 μg/mL ampicillin and incubated for 24-48 hours. Top: Under selective conditions, no wild type colonies were recovered, indicating high susceptibility to the antibiotic, preventing further analysis. Bottom: In non-selective conditions, both ΔrecA resistant mutants and non-resistant cells grew, making it difficult to distinguish and isolate the mutants carrying resistance mutations.
  
  Finally, it is important to establish what the basal mutation rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has inherently higher mutagenesis than WT, with a larger subpopulation of resistant clones. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.
  
  Thanks for this suggestion. The basal mutation frequency of the wild-type and the ∆recA strain have been measured using rifampicin (Fig. 1G), and there is no significant difference between them.
  
  Reviewer #2:
  
  Summary:
  
  This study aims to demonstrate that E. coli can acquire rapid antibiotic resistance mutations in the absence of a DNA damage response. To investigate this, the authors employed a sophisticated experimental framework based on a modified Adaptive Laboratory Evolution (ALE) workflow. This workflow involves numerous steps culminating in the measurement of antibiotic resistance. The study presents evidence that a recA strain develops ampicillin resistance mutations more quickly than the wild-type, as shown by measuring the Minimum Inhibitory Concentration (MIC) and mutation frequency. Whole-genome sequencing of 15 recA-colonies resistant to ampicillin revealed predominantly inactivation of genes involved in the multi-drug efflux pump system, whereas, in the wild-type, mutations appear to enhance the activity of the chromosomal ampC cryptic promoter. By analyzing mutants involved in the SOS response, including a lexA3 mutant incapable of inducing the SOS response, the authors conclude that the rapid evolution of antibiotic resistance occurs in an SOS-independent manner when recA is absent.
  
  Furthermore, RNA sequencing (RNA-seq) of the four experimental conditions suggests that genes related to antioxidative responses drive the swift evolution of antibiotic resistance in the recA-strain.
  
  We greatly appreciate your overall summary of the manuscript and your positive evaluation of our work.
  
  Weaknesses:
  
  However, a potential limitation of this study is the experimental design used to determine the 'rapid' evolution of antibiotic resistance. It may introduce a significant bottleneck in selecting ampicillin-resistant mutants early on. A recA mutant could be more susceptible to ampicillin than the wild-type, and only resistant mutants might survive after 8 hours, potentially leading to their enrichment in subsequent steps. To address this concern, it would be critical to perform a survival analysis at various time points (0h, 2h, 4h, 6h, and 8h) during ampicillin treatment for both recA and wild-type strains, ensuring there is no difference in viability.
  
  We appreciate your suggestion. We measured the survival fraction at 0, 2, 4, 6, and 8 hours after ampicillin treatment. The results show no significant difference in antibiotic sensitivity between the wild-type and ΔrecA strains (Fig. S2). We therefore added a description in the main text: “Meanwhile, after 8 hours of treatment with 50 μg/mL ampicillin, the survival rates of both wild type and ΔrecA strain were consistent (Fig. S2)”.
  
  The observation that promoter mutations are absent in ΔrecA strains could be explained by previous research indicating that amplification of the AmpC genes is a mechanism for E. coli resistance to ampicillin, which does not occur in a recA-deficient background (PMID# 19474201).
  
  We are very grateful to you for providing this reference. We did examine the amplification of the ampC gene in both wild-type and recA-deficient strains, but we found no significant changes in its copy number after ampicillin treatment (Author response image 2). Therefore, the results and discussion regarding gene copy number were not included in this manuscript.
  
  Author response image 2.
  
  Copy number variations of genes in the chromosome before and after exposure to ampicillin at 50 µg/mL for 8 hours in the wild type and ΔrecA strain.
  
  The section describing Figure 3 is poorly articulated, and the conclusions drawn are apparent. The inability of a recA strain to induce the SOS response is well-documented (lines 210 and 278). The data suggest that merely blocking SOS induction is insufficient to cause 'rapid' evolution in their experimental conditions. To investigate whether SOS response can be induced independently of lexA cleavage by recA, alternative experiments, such as those using a sulA-GFP fusion, might be more informative.
  
  Thanks for your suggestion. We agree that detecting the expression level of SulA can provide valuable information to reveal the impact of the SOS system on rapid drug resistance. In addition to fluorescence visualization and quantification of SulA expression, regulating the transcription level of the sulA gene can achieve the same objective. Therefore, in our transcriptome sequencing analysis, we focused on evaluating the transcription level of sulA (Fig. 4E).
  
  In Figure 4E, the lack of increased SulA gene expression in the wild-type strain treated with ampicillin is unexpected, given that SulA is an SOS-regulated gene. The fact that polA (Pol I) is going down should be taken into account in the interpretation of Figures 2D and 2E.
  
  Thank you for your observation regarding the lack of increased SulA gene expression in the wild-type strain treated with ampicillin in Figure 4E. We agree that SulA is typically an SOS-regulated gene, and its expression is expected to increase in response to DNA damage induced by antibiotics like ampicillin. However, in our experimental conditions, the observed lack of increased SulA expression could be due to several factors. One possibility is that the concentration of ampicillin used, or the duration of treatment, was not sufficient to induce a strong SOS response in the wild type strain under the specific conditions tested. Additionally, differences in experimental setups such as timing, sampling, or cellular stress responses could account for the lack of a pronounced upregulation of SulA.
  
  You note that the decrease in polA (Pol I) expression should be taken into account in the interpretation of Figures 3D and 3E, and we agree with you.
  
  The connection between compromised DNA repair, the accumulation of Reactive Oxygen Species (ROS) based on RNA-seq data, and accelerated evolution is merely speculative at this point and not experimentally established.
  
  We greatly appreciate your comments. First, the correlation between DNA mutations and the accumulation of reactive oxygen species (ROS) has been experimentally confirmed. As shown in Fig. 4I, after the addition of the antioxidant GSH, DNA resistance mutations were not detected in the ΔrecA strain treated with ampicillin for 8 hours, compared to those without the addition of GSH, proving that the rapid accumulation of ROS induces the enhancement of DNA resistance mutations. Second, the enhancement of DNA resistance mutations in relation to bacterial resistance has been widely validated and is generally accepted. Finally, we appreciate your suggestion to strengthen the evidence supporting ROS enhancement. To address this, we have added an experiment to measure ROS levels. Through flow cytometry, we found that ROS levels significantly increased in both the wild-type and ΔrecA strains after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.
  
  Reviewer #3:
  
  Summary:
  
  In the present work, Zhang et al. investigate the involvement of the bacterial DNA damage repair SOS response in the evolution of beta-lactam drug resistance in Escherichia coli. Using a combination of microbiological, bacterial genetics, laboratory evolution, next-generation sequencing, and live-cell imaging approaches, the authors propose that short-term drug resistance evolution can take place in RecA-deficient cells in an SOS response-independent manner. They propose the evolvability of drug resistance is alternatively driven by the oxidative stress imposed by the accumulation of reactive oxygen species and inhibition of DNA repair. Overall, this is a nice study that addresses a growing and fundamental global health challenge (antimicrobial resistance). However, although the authors perform several multi-disciplinary experiments, there are several caveats to the authors' proposal that ultimately do not fully support their interpretation that the observed antimicrobial resistance evolution phenotype is due to compromised DNA repair.
  
  We greatly appreciate your overall summary of the manuscript and positive evaluation of our work.
  
  Strengths:
  
  The authors introduce new concepts to antimicrobial resistance evolution mechanisms. They show short-term exposure to beta-lactams can induce durably fixed antimicrobial resistance mutations. They propose this is due to compromised DNA repair and oxidative stress. This is primarily supported by their observations that resistance evolution phenotypes only exist for recA deletion mutants and not other genes in the SOS response.
  
  Thanks for your positive comments.
  
  Weaknesses:
  
  The authors do not show any direct evidence (1) that these phenotypes exist in strains harboring deletions in other DNA repair genes outside of the SOS response, (2) that DNA damage is increased, (3) that reactive oxygen species accumulate, (4) that accelerated resistance evolution can be reversed by anything other than recA complementation. The authors do not directly test alternative hypotheses. The conclusions drawn are therefore premature.
  
  We sincerely thank you for your insightful comments. First, in this study, our primary focus is on the role of recA deficiency in bacterial antibiotic resistance evolution. Therefore, we conducted an in-depth investigation on E. coli strains lacking RecA and found that its absence promotes resistance evolution through mechanisms involving increased ROS accumulation and downregulation of DNA repair pathways. While we acknowledge the importance of other DNA repair genes outside of the SOS response, exploring them is beyond the scope of this paper. However, in a separate unpublished study, we have identified the involvement of another DNA recombination protein, whose role in resistance evolution is not yet fully elucidated, in promoting resistance development. This finding is part of another independent investigation.
  
  Regarding DNA damage and repair, our paper emphasizes that resistance-related mutations in DNA are central to the development of antibiotic resistance. These mutations are a manifestation of DNA damage. To demonstrate this, we measured mutation frequency and performed whole-genome sequencing, both of which confirmed an increase in DNA mutations.
  
  We appreciate the reviewer's suggestion to provide additional evidence for ROS accumulation, and we have now supplemented our manuscript with relevant experiments. Through flow cytometry, we found that ROS levels significantly increased in both the wild type and ΔrecA strains after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.
  
  Finally, in response to your question about reversing accelerated resistance evolution, we would like to highlight that, in addition to recA complementation, we successfully suppressed rapid resistance evolution by supplementing with an antioxidant, GSH (Fig. 4I). This further supports our hypothesis that increased ROS levels play a key role in driving accelerated resistance evolution in the absence of RecA.
  
  Recommendations for the authors:
  
  Reviewer #1:
  
  The authors' model asserts that deletion of recA impairs DNA repair in E. coli, leading to an accumulation of ROS in the cell, and ultimately driving the rapid rise of resistance mutations. However, the experimental evidence does not adequately address whether the resistance mutations are true, de novo mutations that arose due to beta-lactam treatment, or mutations that confer cross-resistance enriched by ampicillin selection.
  
  a. Major: In Figure 1F & G, the authors show that the ∆recA strain, following ampicillin treatment, has higher resistance and mutation frequency towards rifampicin than WT. However, it is not clear whether the elevated resistance and mutagenesis are driven by mutations enriched by the ampicillin treatment (e.g. mutations in acrB, as seen in Figure 2) or by "new" mutations in the rpoB gene. As the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, including rifampicin, in what is known as cross-resistance. Therefore, the mutation frequency calculation, which relies on quantifying rifampicin-resistant clones, might be confounded by bacteria with mutations that confer cross-resistance. A better approach to calculate mutation frequency would be to employ an assay that does not require antibiotic selection, such as a lac-reversion assay. This would mitigate the confounding effects of cross-resistance of drug-resistant mutations.
  
  We appreciate your thoughtful comments regarding the potential for cross-resistance to confound the mutation frequency calculation based on rifampicin-resistant clones. Indeed, as noted, ampicillin selection can enrich for mutants with enhanced efflux activity, which may confer cross-resistance to a range of antibiotics, including rifampicin.
  
  However, we believe that the current approach of calculating mutation frequency using rifampicin-resistant mutants is still valid in our specific context. Rifampicin targets the RNA polymerase β subunit, and resistance typically arises from specific mutations in the rpoB gene. These mutations are well-characterized and distinct from those typically associated with efflux-related cross-resistance. Thus, the likelihood of cross-resistance affecting our mutation frequency calculation is minimized in this scenario.
  
  Additionally, while the lac-reversion assay could be an alternative, it focuses on specific metabolic pathway mutations (such as those affecting lacZ) and would not necessarily capture the same types of mutations relevant to rifampicin resistance or antibiotic-induced mutagenesis. Given our experimental objective of understanding how ampicillin induces mutations that confer antibiotic resistance, the current approach of using rifampicin selection provides a direct and relevant measurement of mutation frequency under antibiotic stress.
  
  b. Major: It is important to establish what the basal mutation frequencies/rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has an inherently higher mutagenesis than WT. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.
  
  Thanks for your suggestion. The basal mutation frequencies of the wild-type and ∆recA strains have been measured using rifampicin (Fig. 1G), and there is no significant difference between them.
  
  c. Major: In the text, the authors write, "To verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain, we randomly selected 15 colonies on non-selected LB agar plates from the wild type surviving isolates, and antibiotic screening plates containing 50 μg/mL ampicillin from the ΔrecA resistant isolates, respectively." Why were the WT clones picked from non-selective plates and the recA mutant from selective ones for WGS? It appears that such a procedure would bias the recA mutant clones to show more mutations (caused by selection on the ampicillin plate). The authors need to address this discrepancy.
  
  We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild-type and ΔrecA strains after ampicillin treatment, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript.
  
  Furthermore, the choice of selective and non-selective conditions was made to ensure the successful isolation of mutants in both strains. Specifically, if selective conditions (50 μg/mL ampicillin) were applied to the wild type strain, it would have been nearly impossible to recover colonies for WGS analysis, as wild-type cells are highly susceptible to ampicillin at this concentration (Top, Author response image 1). Conversely, under non-selective conditions, ΔrecA mutants carrying resistance mutations may not have been effectively isolated, which would have limited our ability to identify resistance mutations in these strains (Bottom, Author response image 1). Thus, the use of different selection pressures was essential for achieving the objective of mutation identification in this study.
  
  d. Major: In some instances, the authors do not use accurate language to describe their data. In Figure 2A, the authors randomly selected 15 ∆recA clones from a selective plate with 50 µg/mL of ampicillin. These clones were then subjected to WGS, which subsequently identified resistant mutations. Based on the described methods, these mutations are a result of selection: in other words, resistant mutations were preexisting in the bacterial population, and the addition of ampicillin selection killed off the sensitive cells, enabling the proliferation of the resistant clones. However, in the Figure 2 legend and associated text, the authors suggest that these mutations were "induced" by beta-lactam exposure, which is misleading. The data does not support that.
  
  We appreciate your detailed feedback on the language used to describe our data. We understand the concern regarding the use of the term "induced" in relation to beta-lactam exposure. To clarify, we employed not only beta-lactam antibiotics but also other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics.
  
  Furthermore, we used two different ∆recA strains, and the results were consistent between the strains (Fig. S3). Given that spontaneous mutations can occur with significant variability in populations, if resistance mutations pre-existed before antibiotic exposure, the selective outcomes should have varied between the two strains.
  
  Most importantly, we found that the addition of the antioxidant GSH prevented the evolution of antibiotic resistance following ampicillin treatment in the ΔrecA strain. If we assume that resistant bacteria preexist in the ∆recA strain, then the addition of GSH should not affect the evolution of resistance. Therefore, we believe that the resistance mutations we detected were not simply the result of selection from preexisting mutations but were indeed induced by beta-lactam exposure.
  
  e. Major: For Figure 4J, using WGS the authors show that the addition of GSH to WT and ∆recA cells inhibited the rise of resistance mutations; no resistance mutations were reported. However, in the "Whole genome sequencing" section under "Materials and Methods", they state that "Resistant clones were isolated by selection using LB agar plates with the supplementation of ampicillin at 50 μg/mL". These clones were then genome-extracted and sequenced. Given the methodology, it is surprising that the WGS did not reveal any resistance mutations in the GSH-treated cells. How were these cells able to grow on 50 μg/mL ampicillin plates for isolation in the first place? The authors need to address this.
  
  We sincerely apologize for the confusion caused by the incorrect expression in the "Materials and Methods" section. Indeed, when bacteria were treated with the combination of antibiotics and GSH, resistance was significantly suppressed, and no resistant clones could be isolated from selective plates (i.e., LB agar supplemented with 50 μg/mL ampicillin).
  
  To address this, we instead plated the bacteria treated with antibiotics and GSH onto non-selective plates (without ampicillin) and randomly selected 15 colonies for WGS. None of them showed resistance mutations. We will revise the text in the "Materials and Methods" section to accurately reflect this procedure and provide clarity.
  
  f. Minor: for Figure 1G, it is misleading to have both "mutation frequency" and "mutant rate" in the y-axis; the two are defined and calculated differently. Based on the Materials and Methods, "mutation frequency" would be the appropriate term. Also, for the ∆recA strain, it is a bit unusual to see mutation frequencies that are tightly clustered. Usually, mutation frequencies follow the Luria-Delbruck distribution. Can the authors explain why the ∆recA data looks so different compared to, say, the WT mutation frequencies?
  
  Thank you for your insightful feedback. We agree that having both "mutation frequency" and "mutant rate" on the y-axis is misleading, as these terms are defined and calculated differently. To avoid confusion, we will revise Figure 1G to use only "mutation frequency" as the correct term, in line with the methods described in the Materials and Methods section.
  
  Regarding the ∆recA strain's mutation frequencies, we acknowledge that the data appear more tightly clustered compared to the expected Luria-Delbruck distribution seen in the wild type strain. In fact, the y-axis of Figure 1G is logarithmic, which causes the data to appear more clustered.
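  
  To make the point about the logarithmic axis concrete, the following is a minimal sketch, not the calculation used in the manuscript. It assumes the usual definition of mutation frequency (rifampicin-resistant CFU divided by total viable CFU); the function name and the replicate counts are hypothetical and chosen only for illustration.

```python
import math


def mutation_frequency(resistant_cfu, total_cfu):
    """Resistant colony-forming units divided by total viable colony-forming units."""
    if total_cfu <= 0:
        raise ValueError("total_cfu must be positive")
    return resistant_cfu / total_cfu


# Hypothetical replicate counts: (resistant CFU, total viable CFU).
replicates = [(20, 1e8), (40, 1e8), (80, 1e8)]

freqs = [mutation_frequency(r, t) for r, t in replicates]
logs = [math.log10(f) for f in freqs]

# Spread of the same data on a linear scale and on a log10 scale.
fold_range = max(freqs) / min(freqs)
log_range = max(logs) - min(logs)
```

  In this made-up example the frequencies differ four-fold, yet they span only about 0.6 units on a log10 axis.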
  
  We further added the basal mutation frequencies of the wild type and ∆recA strains before exposure to ampicillin. The basal mutation frequencies of the wild-type and ∆recA strains have been measured using rifampicin (Fig. 1G), and there is no significant difference between them.
  
  g. Minor: It needs to be made clear in the Main Text what the selective antibiotic agar plate used was, rifampicin or ampicillin. I am assuming it was rifampicin, as ampicillin plates would yield resistance frequencies close to 100%, given the prior treatment of the culture with ampicillin.
  
  Thanks for your comments. Depending on the objective, we used different selective plates. For example, when testing the mutation frequency of antibiotic resistance, we used a selective plate containing rifampicin in order to utilize a non-inducing antibiotic, which is the standard method for calculating resistance mutation frequency. In the WGS experiment, to obtain mutations specific to ampicillin resistance, we selected a selective plate containing ampicillin.
  
  Reviewer #2:
  
  The Y-axis label (log10 mutant rate) in Figure 1G is misleading or incorrect.
  
  Thanks for your comments and we apologize for this misleading information. The Figure 1G has been revised accordingly.
  
  In line 393 of the discussion, the authors claim that excessive ROS accumulation drives the evolution of ampicillin resistance, which has not been conclusively demonstrated. Additional experiments are needed to support this statement.
  
  We greatly appreciate your comments. First, the correlation between DNA mutations and the accumulation of reactive oxygen species (ROS) has been experimentally confirmed. As shown in Fig. 4I, after the addition of the antioxidant GSH, DNA resistance mutations were not detected in the ΔrecA strain treated with ampicillin for 8 hours, compared to those without the addition of GSH, proving that the rapid accumulation of ROS induces the enhancement of DNA resistance mutations. Second, the enhancement of DNA resistance mutations in relation to bacterial resistance has been widely validated and is generally accepted. Finally, we appreciate your suggestion to strengthen the evidence supporting ROS enhancement. To address this, we have added an experiment to measure ROS levels. Through flow cytometry, we found that ROS levels significantly increased in both the wild-type and ΔrecA strains after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.
  
  The abstract is overly complex and difficult to read (e.g. "Contrary to previous findings, it is shown that this accelerated resistance development process is dependent on the hindrance of DNA repair, which is completely orthogonal to the SOS response").
  
  Thank you for the valuable feedback regarding the complexity of the abstract. We agree that certain sections could be simplified for clarity. In response, we have revised the abstract to make it more concise and easier to understand. For example, the sentence “Contrary to previous findings, it is shown that this accelerated resistance development process is dependent on the hindrance of DNA repair, which is completely orthogonal to the SOS response” has been rewritten as: "Unlike earlier studies, we found that the rapid development of resistance relies on the hindrance of DNA repair, a mechanism that operates independently of the SOS response."
  
  Reviewer #3:
  
  As indicated above, direct evidence is needed to show (1) that these phenotypes exist in strains harboring deletions in other DNA repair genes outside of the SOS response, (2) that DNA damage is increased, (3) that reactive oxygen species accumulate, (4) that accelerated resistance evolution can be reversed by anything other than recA complementation. There are also other resistance evolution mechanisms untested here, including transcription-coupled repair (TCR) mechanisms involving Mfd. These need to be shown in order to draw the conclusions proposed.
  
  We sincerely thank you for your insightful comments. First, in this study, our primary focus is on the role of recA deficiency in bacterial antibiotic resistance evolution. Therefore, we conducted an in-depth investigation on E. coli strains lacking RecA and found that its absence promotes resistance evolution through mechanisms involving increased ROS accumulation and downregulation of DNA repair pathways. While we acknowledge the importance of other DNA repair genes outside of the SOS response and other resistance evolution mechanisms including the TCR mechanism, exploring them is beyond the scope of this paper. However, in a separate unpublished study, we have identified the involvement of another DNA recombination protein, whose role in resistance evolution is not yet fully elucidated, in promoting resistance development. This finding is part of another independent investigation.
  
  Regarding DNA damage and repair, our paper emphasizes that resistance-related mutations in DNA are central to the development of antibiotic resistance. These mutations are a manifestation of DNA damage. To demonstrate this, we measured mutation frequency and performed whole-genome sequencing, both of which confirmed an increase in DNA mutations.
  
  We appreciate the reviewer's suggestion to provide additional evidence for ROS accumulation, and we have now supplemented our manuscript with relevant experiments. Through flow cytometry, we found that ROS levels significantly increased in both the wild type and ΔrecA strains after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.
  
  Finally, in response to your question about reversing accelerated resistance evolution, we would like to highlight that, in addition to recA complementation, we successfully suppressed rapid resistance evolution by supplementing with an antioxidant, GSH (Fig. 4I). This further supports our hypothesis that increased ROS levels play a key role in driving accelerated resistance evolution in the absence of RecA.
  
URL

biorxiv.org/content/10.1101/2022.03.29.486198v4
www.biorxiv.org www.biorxiv.org

Stable sequential dynamics in prefrontal cortex represents subjective estimation of time

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control.
  
  We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we have performed additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity.
  
  Strengths:
  
  This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals.
  
  We thank the reviewer for the positive comments.
  
  Weaknesses:
  
  It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons which probably is not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis.
  
  We thank the reviewer for the comment. Our imaging data generally yielded 50-150 cells in each session. The 18 neurons mentioned by the reviewer are from the duration cell category. We have now provided the number of imaged cells from each rat in the new Supplementary figure 1D. In addition, we have plotted the duration cells’ sequential activity of individual trials for each rat in new Supplementary figure 1B and 1C. These data demonstrate robust sequential activities from the duration cells.
  
  In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.
  
  We thank the reviewer for the suggestions. We have now performed analyses of the neural population trajectories as the reviewer suggested. We have calculated the neural population trajectories using the first two principal components of the neural activities during nose poke events. While both correct and incorrect trials show similar shapes of the trajectories, correct trials show more expanded paths, with longer lengths on average. These new results are now updated in Figure 4. Since type I or type II errors would likely generate trajectories that do not follow the general direction, which differs from our observations, these results are consistent with our conclusion that scaling errors contribute to the incorrect behavior timing in these rats.
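  
  The trajectory-length comparison described above can be illustrated with a minimal sketch. It assumes that each trial has already been projected onto the first two principal components; the function name and the coordinates are hypothetical and are not taken from the study.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive points of a trajectory."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical trajectories in the plane of the first two principal components.
# Both follow the same direction; the second one is a contracted copy of the first.
correct_trial = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
incorrect_trial = [(0.0, 0.0), (1.5, 2.0), (3.0, 4.0)]

correct_length = path_length(correct_trial)
incorrect_length = path_length(incorrect_trial)
```

  Two trajectories with the same shape but different extents, as in this toy example, differ in length while keeping the same overall direction.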
  
  In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test for type III errors.
  
  To clarify the original Figure 4G, the correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggests a possible use of this neural mechanism to time the action of the rats.
  
  In addition, we have performed the analysis suggested by the reviewer in our revision. We calculated two types of scaling factors. At the individual-cell level, we compared the peak position in individual trials with the expected position from the averaged template. At the neural-population level, we searched for a scaling multiplier with which to resample the calcium activity data so as to minimize the difference between the scaled activity and the expected template. Using these two factors, we found that correct trials show significantly larger scaling than incorrect trials, consistent with our original interpretation that behavior errors are primarily correlated with scaling errors in the neural activities (type III error). These new results are now incorporated in Figure 4, and we have updated the descriptions in the main text.
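  The population-level search for a scaling multiplier might look like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the trace is resampled by linear interpolation, the mismatch is a mean squared error over the samples that remain in range, the multiplier is chosen by grid search, and all names (`interpolate`, `mismatch`, `best_scale`) and the synthetic traces are hypothetical.

```python
import math

def interpolate(trace, pos):
    """Linearly interpolate `trace` at fractional index `pos` (0 <= pos <= len-1)."""
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(trace) - 1)
    frac = pos - lower
    return trace[lower] * (1.0 - frac) + trace[upper] * frac

def mismatch(trial, template, scale):
    """Mean squared error between the template and the trial read at i * scale."""
    errors = []
    for i, expected in enumerate(template):
        pos = i * scale
        if pos > len(trial) - 1:
            break  # the rescaled trial has run out of samples
        errors.append((interpolate(trial, pos) - expected) ** 2)
    return sum(errors) / len(errors)

def best_scale(trial, template, candidates):
    """Return the candidate multiplier with the smallest mismatch."""
    return min(candidates, key=lambda s: mismatch(trial, template, s))

# Synthetic example: the trial is the template stretched in time by 1.5.
template = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(40)]
trial = [math.exp(-((j / 1.5 - 10) / 3.0) ** 2) for j in range(60)]
candidates = [round(0.5 + 0.1 * k, 1) for k in range(16)]  # 0.5, 0.6, ..., 2.0
estimated = best_scale(trial, template, candidates)
```

  A grid search is used here only because it is easy to read; a continuous optimizer would serve the same purpose. The sign convention (whether a multiplier above 1 means a stretched or a compressed trial) is an assumption of this sketch.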
  
  Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions."
  
  We agree with the reviewer, and have now modified this sentence in the abstract.
  
  Reviewer #2 (Public Review):
  
  In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.
  
  This study aligns with a wide range of studies in system neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not, however, intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do).
  While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.
  
  Main Concerns
  
  (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require estimating time precisely because there appear to be no penalties when the nosepokes are too short or when they exceed the required duration. So it will be unclear if the variation in nosepoke reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).
  
  We would like to respond to the reviewer’s comments 1, 2 and 4 together, since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). Much of this discussion goes beyond the scope of this study, and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborate task design, we think it is still questionable whether the neural substrate of “timing” can be clearly dissociated from “motor”. In the end, it may well be that, as the reviewer cited from Bergson’s article, the experience of time cannot be measured.
  
  Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: (1) duration cells may represent fidgeting or orofacial movements, and (2) duration cells may represent the motivation or motion plan of the rats. To test the first alternative interpretation, we have now performed a more comprehensive analysis of the behavior data for all the limbs and visible body parts of the experimental rats during nose poke and analyzed its periodicity among different trials. We found that the activities of the coding cells (including duration, start and end cells) were not modulated by these motions, arguing against this possibility. These data are now included in the new Supp. Figure 2, and we have added corresponding text in the manuscript.
  
  Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represents motivation, then the errors should be linearly modulated by trial durations. The unimodal distribution we observed (Figure 4G and see graph below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. And the fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation.
  
  In order to further test the relationship to motivation, we have measured the time interval between exiting nose poke to the start of licking water reward as an independent measurement of motivation for each trial. We found that this reward-seeking time was positively correlated with the trial durations, suggesting that the durations were correlated with motivation to some degree. And when we scaled the activities of the duration cells by this reward-seeking time, we found that the patterns of the sequential activities were largely diminished, and showed a significantly lower peak entropy compared to the same activities scaled by trial durations. The remaining sequential pattern may be due to the correlation between trial durations and motivation (Supp. Figure 2), and the sequential pattern reflects timing more prominently. These analyses provide further evidence that the sequential activities were not coding motivations. These data are included in Figure 2F, 2K and supp. Figure 3 in revised manuscript.
  
  Author response image 1.
  
  Regarding whether the scaling sequential activity we report represents behavioral timing or true time estimation, we did not have evidence on this point. However, a previous study has shown that PFC silencing led to disruption of the mouse’s timing behavior without affecting the execution of the task (PMID: 24367075), arguing against the behavior timing interpretation. The main surprising finding of our present study is that these duration cells are different from the start and end cells in terms of their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues regarding whether they are connected with reward-related or motion-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.
  
  (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.
  
  (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.
  
  We thank the reviewer for the comment, yet we are not quite sure what specifically the reviewer is asking for. It appears that the reviewer requires that we conduct our analysis for each rat individually. In our revised manuscript, we have conducted and reported analyses for individual rats in the original Figure 1C, Figure 2C, G, K, and Figure 4F.
  
  (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke.
  
  (5) The authors should mention upfront in the main text (result section) the temporal resolution allowed by their Ca+ probe and discuss whether it is fast enough in regard of behavioral dynamics occurring in the task.
  
  We thank the reviewer for the suggestion. We originally mentioned the caveat of calcium imaging in the interpretation of our results, and we have now added more text on this point during our revision. In terms of behavioral dynamics (start and end of nose poke in this case), we think calcium imaging provides sufficient kinetics. However, the more refined dynamics related to the reproducibility of the sequential activity, or the precise representation of individual cells on the scaled duration, may benefit from improved time resolution.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Please refer explicitly to the three types of cells in the abstract.
  
  We have now modified the abstract as suggested during revision.
  
  (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis of the correlation between neural trajectory dynamics in MPC and timing behavior is reported. In that same paper the stability of neural sequences across task parameters is reported.
  
  We have now cited and discussed the study in the discussion section of the revised manuscript.
  
  (3) Please state the number of studied animals at the beginning of the results section.
  
  We have now provided this information as requested. The numbers of rats are also plotted in Figure 1D for each analysis.
  
  (4) Why do the middle and right panels of Figure 2E show duration cells?
  
  Figure 2E was intended to show examples of duration cells’ activity. We included different examples of cells that peak at different points in the scaled duration. We believe these multiple examples give readers a straightforward impression of these cells’ activity patterns.
  
  (5) Which behavioral sessions of Figure 1B were analyzed further?
  
  We have now labeled the analyzed sessions in Figure 1B with red color in the revised manuscript.
  
  (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells.
  
  We thank the reviewer for the suggestion and have now modified the figure accordingly in the revised manuscript.
  
  (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC.
  
  We thank the reviewer for the question. In our experience, mice with lens implanted in the mPFC did not show observable difference with mice without surgery in the acquisition of the task and the distribution of the nose-poke durations. In our dataset, rats with the lens implantation showed similar nose-poking behavior as those without lens implantation (Figure 1B). Thus, it seems that the effect of ablation, if any, was quite limited, in the scope of our task.
  
URL

biorxiv.org/content/10.1101/2024.02.26.582071v2
www.biorxiv.org www.biorxiv.org

Vitamin D induces SIRT1 activation through K610 deacetylation in colon cancer

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We thank the reviewers and appreciate their recommendations to improve this work.
  
  Reviewer 1:
  
  Reviewer 1 recognizes that ‘This is an important finding that is relevant to the actions of VDR on colorectal cancer. The data presented to support the presented conclusion is convincing’.
  
  Reviewer 1 identifies as a major weakness ‘that the site of SIRT1 regulatory lysine acetylation is defined by mutational analysis rather than by direct biochemical analysis’.
  
  However, as the reviewer mentions “previous reports of K610 acetylation using mass spec (https://www.phosphosite.org/proteinAction.action?id=5946&showAllSites=true), and the absence of SIRT1 mutant K610R in the immunoprecipitates using anti-acetylated lysine antibodies presented in Fig. 4E clearly overcome this weakness”.
  
  In addition, overall SIRT1 acetylation is reduced by vitamin D and by the specific SIRT1 activator SRT1720, as shown by decreased SIRT1 in the anti-acetyl-lysine immunoprecipitates (Fig. 4A and B). The second weakness identified by Reviewer 1 concerns “the use of only one shRNA to deplete VDR in CRC cells.”
  
  We have made efforts to demonstrate that the results are specific, though we do not have results with alternative shRNAs for a variety of reasons. To mitigate this issue, we have compared two colon cancer cell lines originating from the same patient which differ in the presence/absence of VDR. SW480 cells derive from the primary tumor and express VDR, whereas SW620 cells were derived from a lung metastasis and lack VDR. Similar to the comparison of HCT116 with shVDR HCT116 cells presented in this study, VD induced SIRT1 levels in SW480 cells, in contrast to a lack of induction in SW620 cells, as shown in Author response image 1. This result provides support for the specificity of the shVDR.
  
  Author response image 1.
  
  Vitamin D requires the presence of VDR to increase SIRT1 protein levels. SW480 and SW620 cell lines derive from the same patient, from primary tumor and lung metastasis respectively and differ in their VDR content. 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) was added at 100 nM for 24 h. Representative western-blot, where TBP was used as a loading control, of four biological replicates. Statistical analysis by ANOVA and values represent mean ± SEM; *p<0.05; *** p<0.001.
  
  The referee noticed the inclusion of an siRNA for SIRT1 in Table 1. We apologize for that, since this is an error, and no results are presented in this study with SIRT1 depletion. Table 1 has been modified accordingly.
  
  Concerning the third and fourth weaknesses that Reviewer 1 identifies, we agree that mapping the interacting domains in both VDR and SIRT1 and in vitro reconstitution would improve the present study. However, we believe that these would constitute long-term studies that are not strictly necessary at this stage. Consequently, we favor the publication of the present body of work. In vitro reconstitution of the present work and the putative relevance of the proposed mechanism of vitamin D action via SIRT1 in types of cancer other than colon (e.g., breast) are certainly very interesting and warrant further investigation.
  
  Reviewer 2:
  
  This reviewer acknowledges that “…this study provides very interesting and solid information on the link between vitamin D and colorectal cancer. It is likely that this study will provide insight into the importance of vitamin D in other types of cancer. It may also lead to new therapeutic strategies for specific cases. This article is convincing, although the authors can improve their study as outlined…”
  
  We acknowledge the proposed changes and recommendations, and have changed the text and Figures as suggested by the Reviewer as follows:
  
  Figure 1
  
  Figure 1E and F: the cell lines used were described in the figure legend, but we agree that including the name in the figure brings more clarity and these are now added.
  
  Figure 1G: the statistical analysis was for all panels of Figure 1 as described in the Figure legend (lines 731-32). We have amended the original omission of panels 1G and 1H. In panel G, * represents statistical analysis by ANOVA (comparing the four groups) whereas # was the analysis by Student's t-test (comparing the two indicated groups), where * or # p<0.05. We hope to have clarified this point now.
  
  Figure 2
  
  Figure 2C: We originally showed the SIRT1/VDR interaction by immunoprecipitation of VDR and detection of SIRT1 in immunoprecipitates. We also showed immunoprecipitation of exogenously expressed Myc-SIRT1 (WT or mutants) and detection of VDR in immunoprecipitates (Figure 4F). The reviewer requests that we perform the inverse IP for endogenous SIRT1, that is, immunoprecipitate SIRT1 and detect VDR in the immunoprecipitates, which we now supply for the reviewer in Author response image 2.
  
  Author response image 2
  
  Immunoprecipitation of endogenous SIRT1 to show interaction with VDR. 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) was added at 100 nM for 24 h. Representative western-blots, where TBP was used as a loading control.
  
  Figure 3
  
  Figure 3D: ‘The authors should indicate the color of the different stainings’. Immunostainings have been revealed with DAB (diaminobenzidine); thus, positiveness is highlighted by light or dark brown according to their low or high protein expression. Counterstaining has been performed with hematoxylin, which stains nuclei in dark blue and cytoplasm in light blue.
  
  Do the authors mean that the secondary antibody marks in brown/red? If so, these results are inconsistent with the text considering that hematoxylin was used for non-tumor tissue. This part needs to be clarified.
  
  We thank the Reviewer for asking us to clarify this issue. Neither the primary antibodies nor the anti-Ig horseradish peroxidase-conjugated secondary antibodies produced positive staining on their own. Therefore, the secondary antibody does not mark in any color. Hematoxylin has been used as counterstaining for both non-tumor and tumor tissues.
  
  What about the level of FOXO3A in these tissues/tumors?
  
  We did not probe the tumor sections for specific SIRT1 substrates such as FoxO3A, since their levels may not entirely depend on SIRT1-specific deacetylation.
  
  What is the level of 1,25(OH)2D3 in these patients?
  
  We agree with this referee that this information would be very useful, but unfortunately, we do not have data on vitamin D levels for these patients since they were not specifically recruited for this study and vitamin D levels are not routinely measured.
  
  Figure 3D, the following information is missing: "A detailed amplification is shown in the lower left of each micrograph."
  
  We decided not to include the amplification in micrographs because the aim of the manuscript is focused on protein levels, not localization and including the amplification was more confusing than enlightening. This has been amended now in the text.
  
  Figure 3E, it says p=0.325, in the legend p<0.01, and in the text there is a trend. Which is the correct version?
  
  We really apologize for this misunderstanding. As stated in the Figure, p=0.325 and therefore it does not reach statistical significance. We have amended the main text and figure legend to report that differences between SIRT1 expression levels of healthy and cancer human colon samples are not statistically significant.
  
  Figure 4
  
  Figure 4F. The quality of the presented blots is not optimal. It needs to be improved. In addition, the number of independent biological experiments is not indicated.
  
  We have substituted the representative western-blot and included statistical analysis of four independent biological replicates. Since 4F is now a bigger panel, it has required a slight reorganization of the whole Figure, but the rest of the panels remain as in the original. We now indicate in the figure legend that at least three independent biological replicates were analyzed. In addition, we supply below the four experiments for the reviewer in Author response image 3.
  
  Author response image 3
  
  Immunoprecipitation of exogenous myc-tagged SIRT1 to show interaction with VDR of wild type (WT) or mutants. 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) was added at 100 nM for 24 h. FT: Flow Through. TBP as a loading control.
  
  Regarding the last general comment concerning the number of independent experiments performed, this is indicated in the Figure legends (lines 732-36, 757-58, 823-24, 840-41). All the in vitro experiments were performed as at least three independent experiments and not by repeating a western blot. A representative western blot is shown, and the statistical analysis corresponds to the analysis of the three biological replicates. For experiments with patient samples, the number of patients is clearly indicated in the corresponding panel.
  
URL

biorxiv.org/content/10.1101/2023.02.22.529558v2
www.biorxiv.org www.biorxiv.org

Predicting individual traits from models of brain dynamics accurately and reliably using the Fisher kernel

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors attempt to validate Fisher Kernels on top of the HMM as a way to better describe human brain dynamics at resting state. The objective criterion was better prediction of individual traits by the proposed pipeline.
  
  Strengths:
  
  The authors analyzed an rs-fMRI dataset from the HCP, also providing results from other kernels.
  
  The authors also provided findings from simulation data.
  
  Weaknesses:
  
  (1) The authors should explain in detail how they applied cross-validation across the dataset for both optimization of parameters, and also for cross-validation of the models to predict individual traits.
  
  Indeed, there were details about the cross-validation for hyperparameter tuning and prediction missing. This problem was also raised by Reviewer #2. We have now rephrased this section in 4.4 and added details: ll. 804-813:
  
  “We used k-fold nested cross-validation (CV) to select and evaluate the models. We used 10 folds for both the outer loop (used to train and test the model) and the inner loop (used to select the optimal hyperparameters) such that 90% were used for training and 10% for testing. The optimal hyperparameters λ (and τ in the case of the Gaussian kernels) were selected using grid-search from the vectors λ=[0.0001,0.001,0.01,0.1,0.3,0.5,0.7,0.9,1] and . In both the outer and the inner loop, we accounted for family structure in the HCP dataset so that subjects from the same family were never split across folds (Winkler et al., 2015). Within the CV, we regressed out sex and head motion confounds, i.e., we estimated the regression coefficients for the confounds on the training set and applied them to the test set (Snoek et al., 2019).“ and ll. 818-820: “We generated the 100 random repetitions of the 10 outer CV folds once, and then used them for training and prediction of all methods, so that all methods were fit to the same partitions.”
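  The family-aware fold assignment described in the quoted passage can be sketched as below. This is a simplified illustration and not the authors' pipeline: it only shows how whole families can be kept within a single fold, using a greedy size-balancing rule that is an assumption of this sketch. The function name `family_kfold` and the example family labels are hypothetical. In a nested scheme, the same function would be called again on the training subjects of each outer fold to build the inner folds.

```python
import random
from collections import defaultdict

def family_kfold(family_ids, n_folds=10, seed=0):
    """Assign subject indices to folds so that no family is split across folds."""
    families = defaultdict(list)
    for subject, family in enumerate(family_ids):
        families[family].append(subject)

    order = sorted(families)            # deterministic starting order
    random.Random(seed).shuffle(order)  # reproducible random family order

    folds = [[] for _ in range(n_folds)]
    for family in order:
        # Greedy balancing: place the whole family in the currently smallest fold.
        smallest = min(folds, key=len)
        smallest.extend(families[family])
    return folds

# Hypothetical family labels, one per subject (index = subject).
family_ids = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
folds = family_kfold(family_ids, n_folds=3, seed=0)

# Each fold serves once as the test set; the rest form the training set.
splits = [
    (sorted(s for other in folds if other is not fold for s in other), sorted(fold))
    for fold in folds
]
```

  Confound regression would then be fitted on the training subjects of each split and applied unchanged to the test subjects, as the quoted passage describes; that step is omitted here for brevity.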
  
  (2) They discussed throughout the paper that their proposed (HMM+Fisher) kernel approach outperformed dynamic functional connectivity (dFC). However, they compared the proposed methodology with just static FC.
  
  We would like to clarify that the HMM is itself a method for estimating dynamic (or time-varying) FC, just like the sliding window approach, see also Vidaurre, 2024 (https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00363/124983) for an overview of terminology.
  
  See also our response to Q3.
  
  (3) If the authors wanted to claim that their methodology is better than dFC, then they have to demonstrate results based on dFC with the trivial sliding window approach.
  
  We would like to be clear that we do not claim in the manuscript that our method outperforms other dynamic functional connectivity (dFC) approaches, such as sliding window FC. We have now made changes to the manuscript to make this clearer.
  
  First, we have clarified our use of the term “brain dynamics” to signify “time-varying amplitude and functional connectivity patterns” in this context, as Reviewer #2 raised the point that the former term is ambiguous (ll.33-35: “One way of describing brain dynamics are state-space models, which allow capturing recurring patterns of activity and functional connectivity (FC) across the whole brain.”).
  
  Second, our focus is on our method being a way of using dFC for predictive modelling, since there currently is no widely accepted way of doing this. One reason why dFC is not usually considered in prediction studies is that it is mathematically not trivial how to use the parameters from estimators of dynamic FC for a prediction. This includes the sliding window approach. We do not aim at comparing across different dFC estimators in this paper. To make these points clearer, we have revised the introduction to now say:
  
  Ll. 39-50:
  
  “One reason why brain dynamics are not usually considered in this context pertains to their representation: They are represented using models of varying complexity that are estimated from modalities such as functional MRI or MEG. Although there exists a variety of methods for estimating time-varying or dynamic FC (Lurie et al., 2019), like the commonly used sliding-window approach, there is currently no widely accepted way of using them for prediction problems. This is because these models are usually parametrised by a high number of parameters with complex mathematical relationships between the parameters that reflect the model assumptions. How to leverage these parameters for prediction is currently an open question.
  
  We here propose the Fisher kernel for predicting individual traits from brain dynamics, using information from generative models that do not assume any knowledge of task timings. We focus on models of brain dynamics that capture within-session changes in functional connectivity and amplitude from fMRI scans, in this case acquired during wakeful rest, and how the parameters from these models can be used to predict behavioural variables or traits. In particular, we use the Hidden Markov Model (HMM), which is a probabilistic generative model of time-varying amplitude and functional connectivity (FC) dynamics (Vidaurre et al., 2017).”
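  To make the idea of a Fisher kernel concrete, the sketch below uses a deliberately simple stand-in model. It assumes a univariate Gaussian with shared (group-level) parameters instead of the HMM used in the manuscript, and it uses the simplified linear form of the kernel, in which the Fisher information matrix is replaced by the identity. The function names and data are hypothetical. Each subject is represented by the gradient of the log-likelihood of their data with respect to the shared parameters, and the kernel is the inner product of two such gradients.

```python
def fisher_score(samples, mu, sigma):
    """Gradient of the Gaussian log-likelihood of `samples` w.r.t. (mu, sigma)."""
    d_mu = sum((x - mu) / sigma**2 for x in samples)
    d_sigma = sum(((x - mu) ** 2) / sigma**3 - 1.0 / sigma for x in samples)
    return (d_mu, d_sigma)

def linear_fisher_kernel(score_a, score_b):
    """Inner product of two Fisher scores (Fisher information taken as identity)."""
    return sum(a * b for a, b in zip(score_a, score_b))

# Shared "group-level" parameters and two hypothetical subjects' data.
mu, sigma = 0.0, 1.0
score_a = fisher_score([1.0, 2.0], mu, sigma)   # data above the group mean
score_b = fisher_score([-1.0, 0.0], mu, sigma)  # data below the group mean
similarity = linear_fisher_kernel(score_a, score_b)
```

  Subjects whose data would pull the shared parameters in the same direction receive a positive kernel value, and subjects pulling in opposite directions, as in this toy example, receive a negative one. The resulting kernel matrix can then be passed to any kernel-based predictor.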
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The manuscript presents a valuable investigation into the use of Fisher Kernels for extracting representations from temporal models of brain activity, with the aim of improving regression and classification applications. The authors provide solid evidence through extensive benchmarks and simulations that demonstrate the potential of Fisher Kernels to enhance the accuracy and robustness of regression and classification performance in the context of functional magnetic resonance imaging (fMRI) data. This is an important achievement for the neuroimaging community interested in predictive modeling from brain dynamics and, in particular, state-space models.
  
  Strengths:
  
  (1) The study's main contribution is the innovative application of Fisher Kernels to temporal brain activity models, which represents a valuable advancement in the field of human cognitive neuroimaging.
  
  (2) The evidence presented is solid, supported by extensive benchmarks that showcase the method's effectiveness in various scenarios.
  
  (3) Model inspection and simulations provide important insights into the nature of the signal picked up by the method, highlighting the importance of state rather than transition probabilities.
  
  (4) The documentation and description of the methods are solid including sufficient mathematical details and availability of source code, ensuring that the study can be replicated and extended by other researchers.
  
  Weaknesses:
  
  (1) The generalizability of the findings is currently limited to the young and healthy population represented in the Human Connectome Project (HCP) dataset. The potential of the method for other populations and modalities remains to be investigated.
  
  As suggested by the reviewer, we have added a limitations paragraph and included a statement about the dataset: Ll. 477-481: “The fMRI dataset we used (HCP 1200 Young Adult) is a large sample taken from a healthy, young population, and it remains to be shown how our findings generalise to other datasets, e.g. other modalities such as EEG/MEG, clinical data, older populations, different data quality, or smaller sample sizes both in terms of the number of participants and the scanning duration”.
  
  We would like to emphasise that this is a methodological contribution, rather than a basic science investigation about cognition and brain-behaviour associations. Therefore, the method would be equally usable on different populations, even if the results vary.
  
  (2) The possibility of positivity bias in the HMM, due to the use of a population model before cross-validation, needs to be addressed to confirm the robustness of the results.
  
  As pointed out by both Reviewers #2 and #3, we did not separate subjects into training and test set before fitting the HMM. To address this issue, we have now repeated the predictions for HMMs fit only to the training subjects. We show that this has no effect on the results. Since this question has consequences for the Fisher kernel, we have also added simulations showing how the different kernels react to increasing heterogeneity between training and test set. These new results are added as results section 2.4 (ll. 376-423).
  
  (3) The statistical significance testing might be compromised by incorrect assumptions about the independence between cross-validation distributions, which warrants further examination or clearer documentation.
  
  We have now replaced the significance testing with repeated k-fold cross-validated corrected tests. Note that this required re-running the models to be able to test differences in accuracies on the level of individual folds, resulting in different plots throughout the manuscript and different statistical results. This does not, however, change the main conclusions of our manuscript.
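
  As an illustration only (not the code used for the manuscript), the sketch below shows one common form of a corrected test for repeated k-fold CV. It assumes the variance correction of Nadeau and Bengio as applied to repeated k-fold CV by Bouckaert and Frank; the response above does not spell out the formula, so this choice, the function name, and the simulated fold-level differences are all assumptions.

```python
import numpy as np

def corrected_repeated_kfold_t(diffs, n_train, n_test):
    """Corrected resampled t statistic for fold-level accuracy differences.

    diffs: 1-D array with one entry per (repetition, fold), each the accuracy
    of model A minus the accuracy of model B on that fold.
    Returns the t statistic and its degrees of freedom.
    """
    diffs = np.asarray(diffs, dtype=float)
    m = diffs.size                       # k folds x r repetitions
    var = diffs.var(ddof=1)              # sample variance of the differences
    # A naive t-test would use var / m. The correction adds n_test / n_train
    # because training sets overlap across folds, so the differences are
    # not independent.
    corrected_var = (1.0 / m + n_test / n_train) * var
    return diffs.mean() / np.sqrt(corrected_var), m - 1

# Hypothetical example: 100 repetitions of 10-fold CV on 1000 subjects.
rng = np.random.default_rng(0)
diffs = rng.normal(loc=0.01, scale=0.02, size=100 * 10)
t_corr, dof = corrected_repeated_kfold_t(diffs, n_train=900, n_test=100)
t_naive = diffs.mean() / np.sqrt(diffs.var(ddof=1) / diffs.size)
print(f"naive t = {t_naive:.2f}, corrected t = {t_corr:.2f}, dof = {dof}")
```

  The corrected statistic is always smaller in magnitude than the naive one, which is consistent with fewer comparisons reaching significance once the dependence between folds is taken into account.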
  
  (4) The inclusion of the R^2 score, sensitive to scale, would provide a more comprehensive understanding of the method's performance, as the Pearson correlation coefficient alone is not standard in machine learning and may not be sufficient (even if it is common practice in applied machine learning studies in human neuroimaging).
  
  We have now added the coefficient of determination to the results figures.
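
  To illustrate the reviewer's point that the two metrics can diverge, the following toy sketch (hypothetical data, not results from the manuscript) compares the Pearson correlation with the coefficient of determination for predictions that are perfectly correlated with the target but wrongly scaled and offset.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    # Correlation ignores the scale and offset of the predictions.
    return np.corrcoef(y_true, y_pred)[0, 1]

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_scaled = 0.1 * y_true + 10.0   # same ordering, wrong scale and offset

print("Pearson r:", pearson_r(y_true, y_scaled))
print("R^2:", r_squared(y_true, y_scaled))
```

  The correlation stays at its maximum while the coefficient of determination becomes strongly negative, which is why reporting both gives a fuller picture of prediction quality.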
  
  (5) The process for hyperparameter tuning is not clearly documented in the methods section, both for kernel methods and the elastic net.
  
  As mentioned above in the response to Reviewer #1, we have now added details about hyperparameter tuning for the kernel methods and the non-kernelised static FC regression models (see also Reviewer #1 comment 1): Ll.804-813: “We used k-fold nested cross-validation (CV) to select and evaluate the models. We used 10 folds for both the outer loop (used to train and test the model) and the inner loop (used to select the optimal hyperparameters) such that 90% were used for training and 10% for testing. The optimal hyperparameters (and in the case of the Gaussian kernels) were selected using grid-search from the vectors λ=[0.0001,0.001,0.01,0.1,0.3,0.5,0.7,0.9,1] and . In both the outer and the inner loop, we accounted for family structure in the HCP dataset so that subjects from the same family were never split across folds (Winkler et al., 2015). Within the CV, we regressed out sex and head motion confounds, i.e., we estimated the regression coefficients for the confounds on the training set and applied them to the test set (Snoek et al., 2019).” and ll. 818-820: “We generated the 100 random repetitions of the 10 outer CV folds once, and then used them for training and prediction of all methods, so that all methods were fit to the same partitions.”, as well as ll.913-917: “All time-averaged FC models are fitted using the same (nested) cross-validation strategy as described above (10-fold CV using the outer loop for model evaluation and the inner loop for model selection using grid-search for hyperparameter tuning, accounting for family structure in the dataset, and repeated 100 times with randomised folds).”
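
  The nested, family-aware cross-validation scheme described in the quoted passage can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain ridge regression on synthetic features instead of the kernel methods of the manuscript, five folds instead of ten to keep the toy example small, and hypothetical helper names (`group_kfold`, `ridge_fit_predict`, `nested_cv`). Only the λ grid is taken from the text.

```python
import numpy as np

def group_kfold(groups, n_folds, rng):
    """Yield (train_idx, test_idx) so that each group stays within one fold."""
    unique = rng.permutation(np.unique(groups))
    for fold_groups in np.array_split(unique, n_folds):
        test = np.isin(groups, fold_groups)
        yield np.flatnonzero(~test), np.flatnonzero(test)

def ridge_fit_predict(X_tr, y_tr, X_te, lam):
    # Closed-form ridge regression; centring uses training data only.
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu_x
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ w + mu_y

def nested_cv(X, y, groups, lambdas, n_folds=10, seed=0):
    rng = np.random.default_rng(seed)
    preds = np.empty_like(y, dtype=float)
    for tr, te in group_kfold(groups, n_folds, rng):
        # Inner loop: choose lambda using the training subjects only.
        inner_err = np.zeros(len(lambdas))
        for in_tr, in_va in group_kfold(groups[tr], n_folds, rng):
            for i, lam in enumerate(lambdas):
                p = ridge_fit_predict(X[tr][in_tr], y[tr][in_tr], X[tr][in_va], lam)
                inner_err[i] += np.sum((y[tr][in_va] - p) ** 2)
        best = lambdas[int(np.argmin(inner_err))]
        # Outer loop: refit on all training subjects, predict held-out families.
        preds[te] = ridge_fit_predict(X[tr], y[tr], X[te], best)
    return preds

# Synthetic data: 60 hypothetical families with two members each.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(60), 2)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.0, 0.3]) + 0.1 * rng.normal(size=120)
lambdas = [0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]
preds = nested_cv(X, y, groups, lambdas, n_folds=5)
print("out-of-sample correlation:", np.corrcoef(preds, y)[0, 1])
```

  Splitting on family identifiers rather than on individual subjects is what prevents related subjects from appearing on both sides of a train/test split, in the outer as well as the inner loop.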
  
  (6) For the time-averaged benchmarks, a comparison with kernel methods using metrics defined on the Riemannian SPD manifold, such as employing the Frobenius norm of the logarithm map within a Gaussian kernel, would strengthen the analysis, cf. Jayasumana (https://arxiv.org/abs/1412.4172) Table 1, log-euclidean metric.
  
  We have now added the log-Euclidean Gaussian kernel proposed by the reviewer to the model comparisons. The additional model does not change our conclusions.
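
  For reference, a Gaussian kernel based on the log-Euclidean metric, in the form given by Jayasumana et al., can be sketched as below. The covariance matrices are synthetic, the function names are hypothetical, and the bandwidth `sigma` is an arbitrary choice for illustration; this is not the implementation used in the manuscript.

```python
import numpy as np

def spd_logm(C):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * np.log(vals)) @ vecs.T

def log_euclidean_gaussian_kernel(covs, sigma):
    """k(X, Y) = exp(-||log X - log Y||_F^2 / (2 sigma^2)) for a list of SPD matrices."""
    logs = np.array([spd_logm(C).ravel() for C in covs])
    sq = np.sum(logs ** 2, axis=1)
    # Squared Frobenius distances between all pairs of log-mapped matrices.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * logs @ logs.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Four synthetic 3x3 covariance matrices standing in for subjects' FC matrices.
rng = np.random.default_rng(0)
covs = []
for _ in range(4):
    A = rng.normal(size=(20, 3))
    covs.append(A.T @ A / 20 + 0.1 * np.eye(3))
K = log_euclidean_gaussian_kernel(covs, sigma=1.0)
print(np.round(K, 3))
```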
  
  (7) A more nuanced and explicit discussion of the limitations, including the reliance on HCP data, lack of clinical focus, and the context of tasks for which performance is expected to be on the low end (e.g. cognitive scores), is crucial for framing the findings within the appropriate context.
  
  We have now revised the discussion section and added an explicit limitations paragraph: Ll. 475-484:
  
  “We here aimed to show the potential of the HMM-Fisher kernel approach to leverage information from patterns of brain dynamics to predict individual traits in an example fMRI dataset as well as simulated data. The fMRI dataset we used (HCP 1200 Young Adult) is a large sample taken from a healthy, young population, and it remains to be shown how the exhibited performance generalises to other datasets, e.g. other modalities such as EEG/MEG, clinical data, older populations, different data quality, or smaller sample sizes both in terms of the number of participants and the scanning duration. Additionally, we only tested our approach for the prediction of a specific set of demographic items and cognitive scores; it may be interesting to test the framework also on clinical variables, such as the presence of a disease or the response to pharmacological treatment.”
  
  (8) While further benchmarks could enhance the study, the authors should provide a critical appraisal of the current findings and outline directions for future research, considering the scope and budget constraints of the work.
  
  In addition to the new limitations paragraph (see previous comment), we have now rephrased our interpretation of the results and extended the outlook paragraph: Ll. 485-507:
  
  “There is growing interest in combining different data types or modalities, such as structural, static, and dynamic measures, to predict phenotypes (Engemann et al., 2020; Schouten et al., 2016). While directly combining the features from each modality can be problematic, modality-specific kernels, such as the Fisher kernel for time-varying amplitude and/or FC, can be easily combined using approaches such as stacking (Breiman, 1996) or Multi Kernel Learning (MKL) (Gönen & Alpaydın, 2011). MKL can improve prediction accuracy of multimodal studies (Vaghari et al., 2022), and stacking has recently been shown to be a useful framework for combining static and time-varying FC predictions (Griffin et al., 2024). A detailed comparison of different multimodal prediction strategies including kernels for time-varying amplitude/FC may be the focus of future work.
  
  In a clinical context, while there are nowadays highly accurate biomarkers and prognostics for many diseases, others, such as psychiatric diseases, remain poorly understood, diagnosed, and treated. Here, improving the description of individual variability in brain measures may have potential benefits for a variety of clinical goals, e.g., to diagnose or predict individual patients’ outcomes, find biomarkers, or to deepen our understanding of changes in the brain related to treatment responses like drugs or non-pharmacological therapies (Marquand et al., 2016; Stephan et al., 2017; Wen et al., 2022; Wolfers et al., 2015). However, the focus so far has mostly been on static or structural information, leaving the potentially crucial information from brain dynamics untapped. Our proposed approach provides one avenue of addressing this by leveraging individual patterns of time-varying amplitude and FC, and it can be flexibly modified or extended to include, e.g., information about temporally recurring frequency patterns (Vidaurre et al., 2016).”
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this work, the authors use a Hidden Markov Model (HMM) to describe dynamic connectivity and amplitude patterns in fMRI data, and propose to integrate these features with the Fisher Kernel to improve the prediction of individual traits. The approach is tested using a large sample of healthy young adults from the Human Connectome Project. The HMM-Fisher Kernel approach was shown to achieve higher prediction accuracy with lower variance on many individual traits compared to alternate kernels and measures of static connectivity. As an additional finding, the authors demonstrate that parameters of the HMM state matrix may be more informative in predicting behavioral/cognitive variables in this data compared to state-transition probabilities.
  
  Strengths:
  
  - Overall, this work helps to address the timely challenge of how to leverage high-dimensional dynamic features to describe brain activity in individuals.
  
  - The idea to use a Fisher Kernel seems novel and suitable in this context.
  
  - Detailed comparisons are carried out across the set of individual traits, as well as across models with alternate kernels and features.
  
  - The paper is well-written and clear, and the analysis is thorough.
  
  Potential weaknesses:
  
  - One conclusion of the paper is that the Fisher Kernel "predicts more accurately than other methods" (Section 2.1 heading). I was not certain this conclusion is fully justified by the data presented, as it appears that certain individual traits may be better predicted by other approaches (e.g., as shown in Figure 3) and I found it hard to tell if certain pairwise comparisons were performed -- was the linear Fisher Kernel significantly better than the linear Naive normalized kernel, for example?
  
  We have revised the abstract and the discussion to state the results more appropriately. For instance, we changed the relevant section in the abstract to (ll. 24-26):
  
  “We show here, in fMRI data, that the HMM-Fisher kernel approach is accurate and reliable. We compare the Fisher kernel to other prediction methods, both time-varying and time-averaged functional connectivity-based models.”,
  
  and in the discussion, removing the sentence
  
  “resulting in better generalisability and interpretability compared to other methods”,
  
  and adding (given the revised statistical results) ll. 435-436:
  
  “though most comparisons were not statistically significant given the narrow margin for improvements.”
  
  In conjunction with the new statistical approach (see Reviewer #2, comment 3), we have now streamlined the comparisons. We explained which comparisons were performed in the methods ll.880-890:
  
  “For the main results, we separately compare the linear Fisher kernel to the other linear kernels, and the Gaussian Fisher kernel to the other Gaussian kernels, as well as to each other. We also compare the linear Fisher kernel to all time-averaged methods. Finally, to test for the effect of tangent space projection for the time-averaged FC prediction, we also compare the Ridge regression model to the Ridge Regression in Riemannian space. To test for effects of removing sets of features, we use the approach described above to compare the kernels constructed from the full feature sets to their versions where features were removed or reduced. Finally, to test for effects of training the HMM either on all subjects or only on the subjects that were later used as training set, we compare each kernel to the corresponding kernel constructed from HMM parameters, where training and test set were kept separate.”
  
  Model performance evaluation is done on the level of all predictions (i.e., across target variables, CV folds, and CV iterations) rather than for each of the target variables separately. That means different best-performing methods depending on the target variables are to be expected.
  
  - While 10-fold cross-validation is used for behavioral prediction, it appears that data from the entire set of subjects is concatenated to produce the initial group-level HMM estimates (which are then customized to individuals). I wonder if this procedure could introduce some shared information between CV training and test sets. This may be a minor issue when comparing the HMM-based models to one another, but it may be more important when comparing with other models such as those based on time-averaged connectivity, which are calculated separately for train/test partitions (if I understood correctly).
  
  The lack of separation between training and test set before fitting the HMM was also pointed out by Reviewer #2. We are addressing this issue in the new Results section 2.4 (see also our response to Reviewer #2, comment 2).
  
  Recommendations for the authors:
  
  The individual public reviews all indicate the merits of the study, however, they also highlight relatively consistent questions or issues that ought to be addressed. Most significantly, the authors ought to provide greater clarity surrounding the use of the cross-validation procedures they employ, and the use of a common atlas derived outside the cross-validation loop. Also, the authors should ensure that the statistical testing procedures they employ accommodate the dependencies induced between folds by the cross-validation procedure and give care to ensuring that the conclusions they make are fully supported by the data and statistical tests they present.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Overall, the study is interesting but demands further improvements. Below, I summarize my comments:
  
  (1) The authors should explain in detail how they applied cross-validation across the dataset for both optimization of parameters, and also for cross-validation of the models to predict individual traits.
  
  How did you split the dataset for both parameters optimization, and for the CV of the prediction of behavioral traits?
  
  A review and a summary of various CVs that have been applied on the same dataset should be applied.
  
  We apologise for the oversight and have now added more details to the CV section of the methods, see our response to Reviewer #1 comment 1:
  
  In ll. 804-813:
  
  “We used k-fold nested cross-validation (CV) to select and evaluate the models. We used 10 folds for both the outer loop (used to train and test the model) and the inner loop (used to select the optimal hyperparameters) such that 90% were used for training and 10% for testing. The optimal hyperparameters (and in the case of the Gaussian kernels) were selected using grid-search from the vectors λ=[0.0001,0.001,0.01,0.1,0.3,0.5,0.7,0.9,1] and . In both the outer and the inner loop, we accounted for family structure in the HCP dataset so that subjects from the same family were never split across folds (Winkler et al., 2015). Within the CV, we regressed out sex and head motion confounds, i.e., we estimated the regression coefficients for the confounds on the training set and applied them to the test set (Snoek et al., 2019).” and ll. 818-820: “We generated the 100 random repetitions of the 10 outer CV folds once, and then used them for training and prediction of all methods, so that all methods were fit to the same partitions.”
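
  The cross-validated confound regression mentioned in the quoted passage (estimating coefficients on the training set and applying them to the test set) can be sketched as follows. This is a minimal illustration on synthetic data; the helper name `deconfound` is hypothetical, and applying the correction to the target variable is an assumption made here for simplicity.

```python
import numpy as np

def deconfound(y_train, C_train, y_test, C_test):
    """Fit confound coefficients on the training set and remove them from both sets."""
    # The intercept column removes the training mean as well.
    D_train = np.column_stack([np.ones(len(C_train)), C_train])
    D_test = np.column_stack([np.ones(len(C_test)), C_test])
    beta, *_ = np.linalg.lstsq(D_train, y_train, rcond=None)
    # The test set is corrected with the training coefficients only,
    # so no information flows from test to training data.
    return y_train - D_train @ beta, y_test - D_test @ beta

# Synthetic target driven by sex and head motion plus a small residual.
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, size=n).astype(float)
motion = rng.normal(size=n)
y = 2.0 * sex + 3.0 * motion + 0.1 * rng.normal(size=n)
C = np.column_stack([sex, motion])
train, test = np.arange(0, 180), np.arange(180, 200)
res_train, res_test = deconfound(y[train], C[train], y[test], C[test])
print("test std before/after:", y[test].std(), res_test.std())
```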
  
  (2) The authors should explain in more detail how they applied ICA-based parcellation at the group-level.
  
  A. Did you apply it across the whole group? If yes, then this is problematic since it rejects the CV approach. It should be applied within the folds.
  
  B. How did you define the representative time-source per ROI?
  
  A: How group ICA was applied was stated in the Methods section (4.1 HCP imaging and behavioural data), ll. 543-548:
  
  “The parcellation was estimated from the data using multi-session spatial ICA on the temporally concatenated data from all subjects.”
  
  We have now added a disclaimer about the divide between training and test set:
  
  “Note that this means that there is no strict divide between the subjects used for training and the subjects for testing the later predictive models, so that there is potential for leakage of information between training and test set. However, since this step does not concern the target variable, but only the preprocessing of the predictors, the effect can be expected to be minimal (Rosenblatt et al., 2024).”
  
  We understand that in order to make sure we avoid data leakage, it would be desirable to estimate and apply group ICA separately for the folds, but the computational load of this would be well beyond the constraints of this particular work, where we have instead used the parcellation provided by the HCP consortium.
  
  B: This was also stated in 4.1, ll. 554-559: “Timecourses were extracted using dual regression (Beckmann et al., 2009), where group-level components are regressed onto each subject’s fMRI data to obtain subject-specific versions of the parcels and their timecourses. We normalised the timecourses of each subject to ensure that the model of brain dynamics and, crucially, the kernels were not driven by (averaged) amplitude and variance differences between subjects.”
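
  The timecourse extraction described in point B can be sketched as the spatial-regression stage of dual regression followed by per-subject standardisation. The sketch uses synthetic data and a hypothetical function name; it omits the second (temporal) regression stage that yields subject-specific maps, and it assumes that "normalised" means z-scoring each timecourse, which the quoted text does not state explicitly.

```python
import numpy as np

def dual_regression_timecourses(data, group_maps):
    """data: (time, voxels); group_maps: (components, voxels).

    Returns standardised subject-specific timecourses of shape (time, components).
    """
    # Spatial regression: solve group_maps.T @ tc.T ~= data.T in the least-squares sense.
    tc, *_ = np.linalg.lstsq(group_maps.T, data.T, rcond=None)
    tc = tc.T
    # Standardise within subject so that amplitude and variance differences
    # between subjects do not drive later models.
    return (tc - tc.mean(axis=0)) / tc.std(axis=0)

# Synthetic subject: 3 components, 50 voxels, 100 time points.
rng = np.random.default_rng(0)
group_maps = rng.normal(size=(3, 50))
true_tc = rng.normal(size=(100, 3))
data = true_tc @ group_maps + 0.1 * rng.normal(size=(100, 50))
tc = dual_regression_timecourses(data, group_maps)
print(tc.shape)
```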
  
  (3) The authors discussed throughout the paper that their proposed (HMM+Fisher) kernel approach outperformed dynamic functional connectivity (dFC). However, they compared the proposed methodology with just static FC.
  
  A. The authors didn't explain how static and dFC have been applied.
  
  B. If the authors wanted to claim that their methodology is better than dFC, then they have to demonstrate results based on dFC with the trivial sliding window approach.
  
  C. Moreover, the static FC networks have been constructed by concatenating time samples that belong to the same state across the time course of resting-state activity.
  
  So, it's HMM-informed static FC analysis, which is problematic since it's derived from HMM applied over the brain dynamics.
  
  I don't agree that connectivity is derived exclusively from the clustering of human brain dynamics!
  
  D. A static approach of using the whole time course, and a dFC following the trivial sliding-window approach should be adopted and presented for comparison with (HMM+Fisher) kernel.
  
  We do not intend to claim in our manuscript that our method outperforms other methods for doing dynamic FC. Indeed, we would like to be clear that the HMM itself is a method for capturing dynamic FC. Please see our responses to public review comments 2 and 3 by Reviewer #1, copied below, which are intended to clear up this misunderstanding:
  
  We would like to clarify that the HMM is itself a method for estimating dynamic (or time-varying) FC, just like the sliding window approach, see also Vidaurre, 2024 (https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00363/124983) for an overview of terminology.
  
  We would like to be clear that we do not claim in the manuscript that our method outperforms other dynamic functional connectivity (dFC) approaches, such as sliding window FC. We have now made changes to the manuscript to make this clearer.
  
  First, we have clarified our use of the term “brain dynamics” to signify “time-varying amplitude and functional connectivity patterns” in this context, as Reviewer #2 raised the point that the former term is ambiguous.
  
  Second, our focus is on our method being a way of using dFC for predictive modelling, since there currently is no widely accepted way of doing this. One reason why dFC is not usually considered in prediction studies is that it is mathematically not trivial how to use the parameters from estimators of dynamic FC for a prediction. This includes the sliding window approach. We do not aim at comparing across different dFC estimators in this paper. To make these points clearer, we have revised the introduction to now say:
  
  Ll. 39-50:
  
  “One reason why brain dynamics are not usually considered in this context pertains to their representation: They are represented using models of varying complexity that are estimated from modalities such as functional MRI or MEG. Although there exists a variety of methods for estimating time-varying or dynamic FC (Lurie et al., 2019), like the commonly used sliding-window approach, there is currently no widely accepted way of using them for prediction problems. This is because these models are usually parametrised by a high number of parameters with complex mathematical relationships between the parameters that reflect the model assumptions. How to leverage these parameters for prediction is currently an open question.
  
  We here propose the Fisher kernel for predicting individual traits from brain dynamics, using information from generative models that do not assume any knowledge of task timings. We focus on models of brain dynamics that capture within-session changes in functional connectivity and amplitude from fMRI scans, in this case acquired during wakeful rest, and how the parameters from these models can be used to predict behavioural variables or traits. In particular, we use the Hidden Markov Model (HMM), which is a probabilistic generative model of time-varying amplitude and functional connectivity (FC) dynamics (Vidaurre et al., 2017).”
  
  To the additional points raised here:
  
  A: How static and dynamic FC have been estimated is explicitly stated in the relevant Methods sections 4.2 (The Hidden Markov Model), which explains the details of using the HMM to estimate dynamic functional connectivity; and 4.5 (Regression models based on time-averaged FC features), which explains how static FC was computed.
  
  B: We are not making this claim. We have now modified the Introduction to avoid further misunderstandings, as per ll. 33-36: “One way of describing brain dynamics are state-space models, which allow capturing recurring patterns of activity and functional connectivity (FC) across the whole brain.”
  
  C: This is not how static FC networks were constructed; we apologise for the confusion. We also do not perform any kind of clustering. The only “HMM-informed static FC analysis” is the static FC KL divergence model to allow for a more direct comparison with the time-varying FC KL divergence model, but we have included several other static FC models (log-Euclidean, Ridge regression, Ridge regression Riem., Elastic Net, Elastic Net Riem., and Selected Edges), which do not use HMMs. This is explained in Methods section 4.5.
  
  D: As explained above, we have included four (five in the revised manuscript) static approaches using the whole time course, and we do not claim that our method outperforms other dynamic FC models. We also disagree that using the sliding window approach for predictive modelling is trivial, as explained in the introduction of the manuscript and under public review comment 3.
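
  To make concrete why sliding-window estimates do not translate directly into features for prediction, the sketch below computes sliding-window FC for one synthetic session. The window width, step size, and array dimensions are arbitrary illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def sliding_window_fc(ts, width, step):
    """ts: (time, regions). Returns (n_windows, n_edges) upper-triangle correlations."""
    n_time, n_regions = ts.shape
    iu = np.triu_indices(n_regions, k=1)
    starts = range(0, n_time - width + 1, step)
    # One correlation matrix per window, reduced to its unique edges.
    return np.array([np.corrcoef(ts[s:s + width].T)[iu] for s in starts])

rng = np.random.default_rng(0)
ts = rng.normal(size=(1200, 50))          # one synthetic session, 50 regions
dfc = sliding_window_fc(ts, width=100, step=50)
print(dfc.shape, "->", dfc.size, "values for a single session")
```

  Even in this small example, one session yields tens of thousands of values. At rest, windows are also not time-locked across subjects, so window k in one subject need not correspond to window k in another; some additional modelling step is therefore needed before such estimates can serve as inputs to a predictive model.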
  
  (4) Did you correct for multiple comparisons across the various statistical tests?
  
  All statistical comparisons have been corrected for multiple comparisons. Please find the relevant text in Methods section 4.4.1.
  
  (5) Do we expect that behavioral traits are encapsulated in resting-state human brain dynamics, and on which brain areas mostly? Please, elaborate on this.
  
  While this is certainly an interesting question, our paper is a methodological contribution about how to predict from models of brain dynamics, rather than a basic science study about the relation between resting-state brain dynamics and behaviour. The biological aspects and interpretation of the specific brain-behaviour associations are a secondary point and out of scope for this paper. Our approach uses whole-brain dynamics, which does not require selecting brain areas of interest.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Beyond the general principles included in the public review, here are a few additional pointers to minor issues that I would wish to see addressed.
  
  Introduction:
  
  - The term "brain dynamics" encompasses a broad spectrum of phenomena, not limited to those captured by state-space models. It includes various measures such as time-averaged connectivity and mean EEG power within specific frequency bands. To ensure clarity and relevance for a diverse readership, it would be beneficial to adopt a more inclusive and balanced approach to the terminology used.
  
  The reviewer rightly points out the ambiguity of the term “brain dynamics”, which we use in the interest of readability. The HMM is one of several possible descriptions of brain dynamics. We have now included a statement early in the introduction to narrow this down:
  
  Ll. 32-35:
  
  “… the patterns in which brain activity unfolds over time, i.e., brain dynamics. One way of describing brain dynamics are state-space models, which allow capturing recurring patterns of activity and functional connectivity (FC) across the whole brain.”
  
  And ll. 503-507:
  
  “Our proposed approach provides one avenue of addressing this by leveraging individual patterns of time-varying amplitude and FC, as one of many possible descriptions of brain dynamics, and it can be flexibly modified or extended to include, e.g., information about temporally recurring frequency patterns (Vidaurre et al., 2016).”
  
  Figures:
  
  - The font sizes across the figures, particularly in subpanels 2B and 2C, are quite small and may challenge readability. It is advisable to standardize the font sizes throughout all figures to enhance legibility.
  
  We have slightly increased the overall font sizes, while we are generally following figure recommendations set out by Nature. The font sizes are the same throughout the figures.
  
  - When presenting performance comparisons, a horizontal layout is often more intuitive for readers, as it aligns with the natural left-to-right reading direction. This is not just a personal preference; it is supported by visualization best practices as outlined in resources like the NVS Cheat Sheet (https://github.com/GraphicsPrinciples/CheatSheet/blob/master/NVSCheatSheet.pdf) and Kieran Healy's book (https://socviz.co/lookatdata.html).
  
  We have changed all figures to use horizontal layout, hoping that this will ease visual comparison between the different models.
  
  - In the kernel density estimation (KDE) and violin plot representations, it appears that the data displays may be truncated. It is crucial to indicate where the data distribution ends. Overplotting individual data points could provide additional clarity.
  
  To avoid confusion about the data distribution in the violin plots, we have now overlaid scatter plots, as suggested by the reviewer. Overlaying the fold-level accuracies was not feasible (since this would result in ~1.5 million transparent points for a single figure), so we instead show the accuracies averaged over folds but separate for target variables and CV iterations. Only the newly added coefficient of determination plots had to be truncated, which we have noted in the figure legend.
  
  - Figure 3 could inadvertently suggest that time-varying features correspond to panel A and time-averaged features to panel B. To avoid confusion, consider reorganizing the labels at the bottom into two rows for clearer attribution.
  
  We have changed the layout of the time-varying and time-averaged labels in the new version of the plots to avoid this issue.
  
  Discussion:
  
  - The discussion on multimodal modeling might give the impression that it is more effective with multiple kernel learning (MKL) than with other methods. To present a more balanced view, it would be appropriate to rephrase this section. For instance, stacking, examples of which are cited in the same paragraph, has been successfully applied in practice. The text could be adjusted to reflect that Fisher Kernels via MKL adds to the array of viable options for multimodal modeling. As a side thought: additionally, a well-designed comparison between MKL and stacking methods, conducted by experts in each domain, could greatly benefit the field. In certain scenarios, it might even be demonstrated that the two approaches converge, such as when using linear kernels.
  
  We would like to thank the reviewer for the suggestion about the discussion concerning multimodal modelling. We agree that there are other relevant methods that may lead to interesting future work and have now included stacking and refined the section: ll. 487-494:
  
  “While directly combining the features from each modality can be problematic, modality-specific kernels, such as the Fisher kernel for time-varying amplitude and/or FC, can be easily combined using approaches such as stacking (Breiman, 1996) or Multi Kernel Learning (MKL) (Gönen & Alpaydın, 2011). MKL can improve prediction accuracy of multimodal studies (Vaghari et al., 2022), and stacking has recently been shown to be a useful framework for combining static and time-varying FC predictions (Griffin et al., 2024). A detailed comparison of different multimodal prediction strategies including kernels for time-varying amplitude/FC may be the focus of future work.”
  
  - The potential clinical applications of brain dynamics extend beyond diagnosis and individual outcome prediction. They play a significant role in the context of biomarkers, including pharmacodynamics, prognostic assessments, responder analysis, and other uses. The current discussion might be misinterpreted as being specific to hidden Markov model (HMM) approaches. For diagnostic purposes, where clinical assessment or established biomarkers are already available, the need for new models may be less pressing. It would be advantageous to reframe the discussion to emphasize the potential for gaining deeper insights into changes in brain activity that could indicate therapeutic effects or improvements not captured by structural brain measures. However, this forward-looking perspective is not the focus of the current work. A nuanced revision of this section is recommended to better reflect the breadth of applications.
  
  We appreciate the reviewer’s thoughtful suggestions regarding the discussion of potential clinical applications. We have included the suggestions and refined this section of the discussion: Ll. 495-507:
  
  “In a clinical context, while there are nowadays highly accurate biomarkers and prognostics for many diseases, others, such as psychiatric diseases, remain poorly understood, diagnosed, and treated. Here, improving the description of individual variability in brain measures may have potential benefits for a variety of clinical goals, e.g., to diagnose or predict individual patients’ outcomes, find biomarkers, or to deepen our understanding of changes in the brain related to treatment responses like drugs or non-pharmacological therapies (Marquand et al., 2016; Stephan et al., 2017; Wen et al., 2022; Wolfers et al., 2015). However, the focus so far has mostly been on static or structural information, leaving the potentially crucial information from brain dynamics untapped. Our proposed approach provides one avenue of addressing this by leveraging individual patterns of time-varying amplitude and FC, and it can be flexibly modified or extended to include, e.g., information about temporally recurring frequency patterns (Vidaurre et al., 2016).”
  
  Reviewer #3 (Recommendations For The Authors):
  
  - I wondered if the authors could provide, within the Introduction, an intuitive description for how the Fisher Kernel "preserves the structure of the underlying model of brain dynamics" / "preserves the mathematical structure of the underlying HMM"? Providing more background may help to motivate this study to a general audience.
  
  We agree that this would be helpful and have now added this to the introduction: Ll.61-67:
  
  “Mathematically, the HMM parameters lie on a Riemannian manifold (the structure). This defines, for instance, the relation between parameters, such as: how changing one parameter, like the probabilities of transitioning from one state to another, would affect the fitting of other parameters, like the states’ FC. It also defines the relative importance of each parameter; for example, how a change of 0.1 in the transition probabilities would not be the same as a change of 0.1 in one edge of the states’ FC matrices.”
  
  To communicate the intuition behind the concept, the idea was also illustrated in Figure 1, panel 4 by showing Euclidean distances as straight lines through a curved surface (4a, Naïve kernel), as opposed to the tangent space projection onto the curved manifold (4b, Fisher kernel).
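  As a rough illustration of the contrast drawn above, the following minimal sketch compares a "naive" kernel (inner products of subject-specific parameter estimates) with a linear Fisher kernel (inner products of log-likelihood gradients evaluated at the group-level parameters). It is an assumption-laden toy: a univariate Gaussian stands in for the HMM, the Fisher information matrix is omitted, and all names (`fisher_scores`, `linear_kernel`, the simulated subjects) are hypothetical and not taken from the manuscript.

```python
import numpy as np

def fisher_scores(x, mu, sigma2):
    """Gradient of the Gaussian log-likelihood of x w.r.t. (mu, sigma2),
    evaluated at the group-level parameters and averaged over samples."""
    d_mu = np.mean((x - mu) / sigma2)
    d_s2 = np.mean(-0.5 / sigma2 + 0.5 * (x - mu) ** 2 / sigma2 ** 2)
    return np.array([d_mu, d_s2])

def linear_kernel(features):
    """Subject-by-subject matrix of inner products between feature vectors."""
    return features @ features.T

# Simulated data for three hypothetical subjects (toy stand-in for time series).
rng = np.random.default_rng(0)
subjects = [rng.normal(loc=m, scale=s, size=500)
            for m, s in [(0.0, 1.0), (0.2, 1.0), (0.0, 1.5)]]

# Group-level model: maximum-likelihood Gaussian fitted to all subjects together.
group = np.concatenate(subjects)
mu_g, s2_g = group.mean(), group.var()

# Naive features: each subject's own parameter estimates, treated as Euclidean.
naive_features = np.array([[x.mean(), x.var()] for x in subjects])

# Fisher features: how the group-level parameters would have to change
# to better explain each subject's data.
fisher_features = np.array([fisher_scores(x, mu_g, s2_g) for x in subjects])

K_naive = linear_kernel(naive_features)
K_fisher = linear_kernel(fisher_features)
```

  In this sketch the Fisher features are scaled by the group-level variance, so a shift in the mean and a shift in the variance are not weighted equally, which is the kind of parameter-dependent weighting the quoted passage describes.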
  
  - Some clarifications regarding Figure 2a would be helpful. Was the linear Fisher Kernel significantly better than the linear Naive normalized kernel? I couldn't find whether this comparison was carried out. Apologies if I have missed it in the text. For some of the brackets indicating pairwise tests and their significance values, the start/endpoints of the bracket fall between two violins; in this case, were the results of the linear and Gaussian Fisher Kernels pooled together for this comparison?
  
  We have now streamlined the statistical comparisons and avoided plotting brackets falling between two violin plots. The comparisons that were carried out are stated in the methods section 4.4.1. Please see also our response above to Reviewer #3 public review, potential weaknesses, point 1, relevant point copied below:
  
  In conjunction with the new statistical approach (see Reviewer #2, comment 3), we have now streamlined the comparisons. We explained which comparisons were performed in the methods ll.880-890:
  
  “For the main results, we separately compare the linear Fisher kernel to the other linear kernels, and the Gaussian Fisher kernel to the other Gaussian kernels, as well as to each other. We also compare the linear Fisher kernel to all time-averaged methods. Finally, to test for the effect of tangent space projection for the time-averaged FC prediction, we also compare the Ridge regression model to the Ridge Regression in Riemannian space. To test for effects of removing sets of features, we use the approach described above to compare the kernels constructed from the full feature sets to their versions where features were removed or reduced. Finally, to test for effects of training the HMM either on all subjects or only on the subjects that were later used as training set, we compare each kernel to the corresponding kernel constructed from HMM parameters, where training and test set were kept separate”.
  
  - The authors may wish to include, in the Discussion, some remarks on the use of all subjects in fitting the group-level HMM and the implications for the cross-validation performance, and/or try some analysis to ensure that the effect is minor.
  
  As suggested by reviewers #2 and #3, we have now performed the suggested analysis and show that fitting the group-level HMM to all subjects, rather than only to the training subjects, has no effect on the results. Please see our response to Reviewer #2, public review, comment 2.
  
  - The decision to use k=6 states was made here, and I wondered if the authors may include some support for this choice (e.g., based on findings from prior studies)?
  
  We have now refined and extended our explanation and rationale behind the number of states: Ll. 586-594: “The number of states can be understood as the level of detail or granularity with which we describe the spatiotemporal patterns in the data, akin to a dimensionality reduction, where a small number of states will lead to a very general, coarse description and a large number of states will lead to a very detailed, fine-grained description. Here, we chose a small number of states, K=6, to ensure that the group-level HMM states are general enough to be found in all subjects, since a larger number of states increases the chances of certain states being present only in a subset of subjects. The exact number of states is less relevant in this context, since the same HMM estimation is used for all kernels.”
  
  - (minor) Abstract: "structural aspects" - do you mean structural connectivity?
  
  With “structural aspects”, we refer to the various measures of brain structure that are used in predictive modelling. We have now specified: Ll. 14-15: “structural aspects, such as structural connectivity or cortical thickness”.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.03.02.530638v3
www.biorxiv.org www.biorxiv.org

Oxidized low-density lipoprotein potentiates angiotensin II-induced Gq activation through the AT1-LOX1 receptor complex: Implications for renal dysfunction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This study demonstrates a key role of oxLDL in enhancing Ang II-induced Gq signaling by promoting the AT1/LOX1 receptor complex formation. Importantly, Gq-mediated calcium influx was only observed in LOX1 and AT1 both expressing cells, and AT1-LOX1 interaction aggravated renal damage and dysfunction under the condition of a high-fat diet with Ang II infusion, so this study indicated a new therapeutic potential of AT1-LOX1 receptor complex in CKD patients with dyslipidemia and hypertension.
  
  Strengths:
  
  This study is very exciting and the work is also very detailed, especially regarding the mechanism of LOX1-AT1 receptor interaction and its impact on oxidative stress, fibrosis, and inflammation.
  
  Weaknesses:
  
  The direct evidence for the interaction between AT1 and LOX1 receptors in cell membrane localization is relatively weak. Here I raise some questions that may further improve the study.
  
  Major points:
  
  (1) The authors hypothesized that in the interaction of AT1/LOX1 receptor complex in response to ox-LDL and AngII, there should be strong evidence of fluorescence detection of colocalization for these two membrane receptors, both in vivo and in vitro. Although the video evidence for AT1 internalization upon complex activation is shown in Figure S1, the more important evidence should be membrane interaction and enhanced signal of intracellular calcium influx.
  
  Thank you for your valuable feedback. We agree that demonstrating the colocalization and interaction of AT1 and LOX-1 receptors at the membrane is critical to supporting our hypothesis.
  
  In response, we have previously provided visual evidence of membrane co-localization of the AT1/LOX-1 receptor complex using an in situ PLA assay with anti-FLAG and anti-V5 antibodies in CHO cells expressing FLAG-tagged AT1 and V5-tagged LOX-1 (Yamamoto et al., FASEB J 2015). This was further supported by immunoprecipitation of membrane proteins in CHO cells co-expressing LOX-1 and AT1, which confirmed the presence of the receptor complex. In the current study, we offer additional evidence of enhanced intracellular calcium influx following simultaneous stimulation with oxLDL and Ang II, confirming the functional activation of the AT1/LOX-1 receptor complex (Fig. 1g-j and Fig. 3e-h). Together, these findings provide substantial support for the colocalization of AT1 and LOX-1 and their influence on downstream signaling in our in vitro experiments.
  
  However, we acknowledge the limitation of direct evidence for membrane co-localization of LOX-1 and AT1 in vivo. This constraint is attributed to the fact that both available anti-AT1 and anti-LOX-1 antibodies are derived from rabbits, making co-immunofluorescence or PLA challenging in our study. To address this, we employed co-immunofluorescent staining with megalin, a well-established marker for proximal renal tubules, as shown in Fig. S10. We found that both AT1 and LOX-1 co-localized with megalin, particularly at the brush borders, indicating their presence in the same renal compartments relevant to AT1/LOX-1 signaling.
  
  We have revised the manuscript to highlight the functional evidence from calcium influx assays, supported by prior PLA results, demonstrating the interaction between LOX-1 and AT1. Additionally, we included a figure showing the co-localization of AT1 and LOX-1 with megalin in proximal renal tubules to reinforce these findings. Lastly, we have emphasized in the discussion the limitation regarding the lack of direct in vivo evidence for membrane co-localization of LOX-1 and AT1.
  
  (2) Co-IP experiment should be provided to prove the AT1/LOX1 receptor interaction in response to ox-LDL and AngII in AT1 and LOX1 both expressing cells but not in AT1 only expressing cells.
  
  We thank the reviewer for the insightful suggestion to validate the AT1/LOX1 receptor interaction under various stimulation conditions. In our previous study (Yamamoto et al., FASEB J 2015), we demonstrated the interaction between AT1 and LOX1 receptors through Co-IP and in situ PLA assays in cells overexpressing both receptors, without stimulation. These experiments provided solid evidence of the receptor interaction under static conditions at the cell membrane.
  
  However, as noted in the previous work, we did not perform Co-IP experiments under AngII or oxLDL stimulation. The primary reason for this is that both AngII and oxLDL trigger internalization of the AT1 and/or LOX1 receptors, which may complicate the detection of receptor interaction at the membrane via Co-IP. This is supported by our real-time imaging, which showed a reduction in AT1 and/or LOX1 puncta following stimulation, indicating internalization of the receptors (Fig. 2a).
  
  While we acknowledge the reviewer’s interest in investigating the interaction under AngII stimulation, we believe that the current data—especially from the PLA and Co-IP assays under static conditions—strongly support the interaction of AT1 and LOX1 receptors at the membrane.
  
  (3) The authors mentioned that the Gq signaling-mediated calcium influx may change gene expression and cellular characteristics, including EMT and cell proliferation. They also provided evidence that oxidative stress, fibrosis, and inflammation were all enhanced after activating both receptors and inhibiting Gq was effective in reversing these changes. However, single stimulation with ox-LDL or AngII also has strong effects on ROS production, inflammation, and cell EMT, which has been extensively proved by previous studies. So, how to distinguish the biased effect of LOX1 or AT1r alone or the enhanced effect of receptor conformational changes mediated by their receptor interaction? Is there any better evidence to elucidate this point?
  
  Thank you for raising this important point regarding the distinction between the individual effects of LOX-1 or AT1R activation and the enhanced effects mediated by their interaction. In our study, the concentration of oxLDL used (2–10 μg/ml) was significantly lower than concentrations typically employed in other studies (which often exceed 20 μg/ml). As a result, oxLDL alone produced minimal effects, aside from a reduction in cell proliferation observed in the BrdU assay. This suggests that oxLDL, at the concentrations used in our experiments, does not elicit a strong cellular response on its own.
  
  The key to distinguishing the effect of the LOX-1/AT1 interaction from the individual effects of LOX-1 or AT1R lies in the amplification of Gq signaling, a pathway specifically activated by AngII. In our experiments, co-treatment with oxLDL and AngII led to a significant increase in IP1 levels and calcium influx, both critical indicators of Gq signaling activation. While AngII alone also raised IP1 levels, the combined treatment with oxLDL further amplified the Gq signaling response, as reflected in the enhanced calcium influx. Importantly, oxLDL alone did not alter IP1 levels, even at high concentrations (100 μg/ml) (Takahashi et al., iScience 2021).
  
  This enhancement of Gq signaling provides strong evidence of the synergistic interaction between LOX-1 and AT1, which surpasses the individual effects of either receptor alone. The LOX-1/AT1 interaction is thus crucial for the observed amplification of AngII-specific signaling pathways. The combination of increased IP1 levels and calcium influx serves as compelling evidence of this interaction, clearly differentiating the effects of individual receptor activation from the enhanced response driven by receptor conformational changes and interaction.
  
  Thank you again for your insightful comment, which has helped us to better articulate the significance of receptor interaction in this study.
  
  (4) How does the interaction between AT1 and LOX1 affect the RAS system and blood pressure? What about the serum levels of renin, angiotensin, and aldosterone in ND-fed or HFD-fed mice?
  
  Thank you for your insightful question regarding the effects of AT1 and LOX-1 interaction on the renin-angiotensin system (RAS) and blood pressure, as well as the plasma levels of renin, angiotensin, and aldosterone in normal diet (ND)-fed and high-fat diet (HFD)-fed mice.
  
  OxLDL binds to LOX-1, amplifying AT1 receptor activation and Gq signaling, which enhances the effects of Ang II. This interaction between AT1 and LOX-1 can lead to increased vasoconstriction, oxidative stress, and inflammation, which contribute to elevated blood pressure. This pathway may play a crucial role in modulating the RAS, particularly under conditions of elevated oxLDL, such as those induced by a HFD. Regarding the components of the RAS, we focused on plasma aldosterone levels, as this is a direct consequence of Ang II signaling. As shown in Fig. S7, when mice were treated with a pressor dose of Ang II infusion and subjected to a HFD to elevate oxLDL levels, we did not observe a significant increase in plasma aldosterone levels (102.8 ± 11.6 pg/mL vs. 141.8 ± 15.0 pg/mL, P = 0.081).
  
  In terms of blood pressure, Fig. 7b shows that no significant changes were observed under these treatment conditions, despite the AT1/LOX-1 interaction. These findings suggest that while oxLDL, via the AT1/LOX-1 interaction, can enhance Ang II signaling, its effect on blood pressure was not apparent in our study. This may be due to several factors, including heterogeneous cellular responses to the combined treatment across different cell types, as shown by the lack of reaction in vascular endothelial cells, vascular smooth muscle cells, and macrophages (Fig. S2). This may also be attributed to the high concentration of angiotensin II used in this study, which could have saturated aldosterone production under our experimental conditions. We have revised the manuscript to reflect these points.
  
  Thank you again for your thoughtful comment, which has allowed us to expand and refine the discussion on this important aspect of our study.
  
  Reviewer #2 (Public Review):
  
  (1) Individuals with chronic kidney disease often have dyslipidemia, with the latter both a risk factor for atherosclerotic heart disease and a contributor to progressive kidney disease. Prior studies suggest that oxidized LDL (oxLDL) may cause renal injury through the activation of the LOX1 receptor. The authors had previously reported that LOX1 and AT1 interact to form a complex at the cell surface. In this study, the authors hypothesize that oxLDL, in the setting of angiotensin II, is responsible for driving renal injury by inducing a more pronounced conformational change of the AT1 receptor which results in enhanced Gq signaling.
  
  They go about testing the hypothesis in a set of three studies. In the first set, they engineered CHO cell lines to express AT1R alone, LOX1 in combination with AT1R, or LOX1 with an inactive form of AT1R and indirectly evaluated Gq activity using IP1 and calcium activity as read-outs. They assessed activity after treatment with AngII, oxLDL, or both in combination and found that treatment with both agents resulted in the greatest level of activity, which could be effectively blocked by a Gq inhibitor but not a Gi inhibitor nor a downstream Rho kinase inhibitor targeting G12/13 signaling. These results support their hypothesis, though variability in the level of activation was dramatically inconsistent from experiment to experiment, differing by as much as 20-fold. In contrast, within the experiment, differences between the AngII and AngII/oxLDL treatments, while nominally significant and consistent with their hypothesis, generally were only 10-20%. Another example of unexplained variability can be found in Figures 1g-1j. AngII, at a concentration of 10⁻¹², has no effect on calcium flux in one set of studies (Figure 1g, h) yet has induced calcium activity to a level as great as AngII + oxLDL in another (Figure 1i). The inconsistency of results lessens confidence in the significance of these findings. In other studies with the LOX1-CHO line, they tested for conformational change by transducing AT1 biosensors previously shown to respond to AngII and found that one of them in fact showed enhanced BRET in the setting of oxLDL and AngII compared to AngII alone, which was blocked by an antibody to AT1R. The result is supportive of their conclusions. Limiting enthusiasm for these results is the fact that there isn't a good explanation as to why only 1 sensor showed a difference, and the study should have included a non-specific antibody to control for non-specific effects.
  
  We sincerely appreciate the reviewer’s thorough and insightful feedback, especially regarding the variability observed in our experimental results. As the reviewer pointed out, the differences in activation levels between the calcium influx assay and the IP1 assay, particularly between AngII and AngII/oxLDL co-treatment, were indeed significant. These differences can be attributed to the inherent sensitivity of these assays, which are used to indirectly evaluate Gq activity. Despite the variability, we believe that the reliability of our results is supported by the consistent directional trends across both assays, which align with our hypothesis.
  
  Regarding the inconsistencies in intracellular calcium dynamics observed in Fig. 1i, we have performed additional analysis of calcium kinetics during ligand stimulation, similar to the analysis in Fig. 1g. As shown in Author response image 1, the background signal in the experiment related to Fig. 1i was relatively higher than in Fig. 1g and 1h. This elevated background, which may have been influenced by variations between cells and experimental days, resulted in a higher percent change from baseline in samples treated with AngII alone. However, the combined effect of AngII with oxLDL was still apparent. This clarification further supports the consistency of our findings.
  
  Author response image 1.
  
  In reference to the BRET sensor experiments, we acknowledge the reviewer’s concern regarding the variability in sensor responses. As outlined in Devost et al. (J Biol Chem. 2017), the sensitivity of AT1 intramolecular FlAsH-BRET biosensors in detecting conformational changes induced by AngII is highly dependent on the insertion site of the FlAsH sequence. In our experiments, co-treatment with oxLDL and AngII enhanced AT1 conformational changes, but this effect was only detectable with the CHO-LOX-1-AT1-3p3 sensor (with FlAsH inserted in the third intracellular loop), and not with the CHO-LOX-1-AT1-C-tail P1 sensor (with FlAsH inserted at the C-terminal tail). This differential sensitivity likely explains why only one sensor showed a significant response, highlighting the critical role of FlAsH insertion site selection in these assays. We hope these clarifications address the reviewer’s concerns and improve confidence in the significance of our findings.
  
  (2) The authors then repeated similar studies using publicly available rat kidney epithelial and fibroblast cell lines that have an endogenous expression of AT1R and LOX1. In these studies, oxLDL in combination with AngII also enhanced Gq signaling, while knocking down either AT1R or LOX1, and treatment with inhibitors of Gq and AT1R blocked the effects. Like the prior set of studies, however, the effects are very modest and there was significant inter-experimental variability, reducing confidence in the significance of the findings. The authors then tested for evidence that the enhanced Gq signaling could result in renal injury by comparing qPCR results for target genes. While the results show some changes, their significance is difficult to assess. A more global assessment of gene expression patterns would have been more appropriate. In parallel with the transcriptional studies, they tested for evidence of epithelial-mesenchymal transition (EMT) using a single protein marker (alpha-smooth muscle actin) and found that its expression increased significantly in cells treated with oxLDL and AngII, which was blocked by inhibition of Gq and AT1R. While the data are sound, their significance is also unclear since EMT is a highly controversial cell culture phenomenon. Compelling in vivo studies have shown that most if not all fibroblasts in the kidney are derived from interstitial cells and not a product of EMT. In the last set of studies using these cell lines, the authors examined the effects of AngII and oxLDL on cell proliferation as assayed using BrdU. These results are puzzling: while the two agents together enhanced proliferation which was effectively blocked by an inhibitor to either AT1R or Gq, silencing of LOX1 had no effect.
  
  Thank you for your thorough review and comments. We acknowledge your concerns regarding the modest effects observed and the variability in experimental outcomes. We would like to address your points systematically.
  
  (1) Gq signaling and experimental variability:
  
  Regarding the question of Gq signaling in Fig. 3, as previously mentioned, the observed differences in the IP1 assay are likely due to the sensitivity of the assay and the technical issues associated with detecting calcium influx and IP1 levels. While the overall differences between treatments may appear modest, the most critical comparison, between AngII alone and AngII combined with oxLDL, consistently showed significant differences, which aligns with the calcium influx results shown in Fig. 1. Notably, we found that the EC50 for IP1 production decreased by 80% in response to co-treatment with oxLDL and AngII, compared to AngII treatment alone. These findings demonstrate the robustness of Gq signaling enhancement with co-treatment, even if the absolute differences in the IP1 assay appear small.
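  To make the size of the reported EC50 shift concrete, the following minimal sketch works through what an 80% decrease in EC50 implies under a standard Hill dose-response model. The absolute EC50 value, the Hill coefficient of 1, and the normalized response range are illustrative assumptions, not values from the study; only the 80% reduction is taken from the text above.

```python
import numpy as np

def hill(conc, ec50, top=1.0, bottom=0.0, n=1.0):
    """Hill dose-response curve: response at concentration `conc`
    for a ligand with half-maximal concentration `ec50`."""
    conc = np.asarray(conc, dtype=float)
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** n)

# Hypothetical EC50 for AngII alone (arbitrary illustrative value, in molar).
ec50_angii = 1e-9

# An 80% decrease in EC50 with oxLDL co-treatment, as stated in the response.
ec50_combo = (1.0 - 0.80) * ec50_angii

# Equivalent leftward shift of the dose-response curve.
fold_shift = ec50_angii / ec50_combo

# Responses at the co-treatment EC50: half-maximal with co-treatment,
# but only one sixth of maximal for AngII alone (under these assumptions).
resp_combo = hill(ec50_combo, ec50_combo)
resp_angii = hill(ec50_combo, ec50_angii)
```

  Under these assumptions, an 80% lower EC50 corresponds to a five-fold leftward shift in potency, which can be substantial even when the difference in response at a single saturating concentration looks small.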
  
  (2) Gene expression in Fig. 4:
  
  Regarding the gene expression analysis in Fig. 4, we used relatively low concentrations of oxLDL (5 μg/ml) compared to the higher concentrations typically employed in other studies (mostly exceeding 20 μg/ml). This may explain the lack of robust responses in some conditions. However, in combination with AngII, the co-treatment significantly upregulated several genes, particularly pro-inflammatory markers such as IL-6, TNFα, IL1β, and MCP-1 in NRK49F cells. These results suggest that the co-treatment induces a complex response, potentially activating multiple downstream signaling pathways beyond just Gq signaling, which may obscure more straightforward effects.
  
  While we agree that a more global assessment of gene expression would provide further insights, due to cost constraints, we focused on key representative genes that are highly relevant to inflammation and fibrosis in this study.
  
  (3) EMT in renal fibrosis:
  
  We appreciate the reviewer’s insightful comments regarding the role of EMT in renal fibrosis. Regarding full EMT, in which epithelial cells completely transition into mesenchymal cells, previous studies using the unilateral ureteral obstruction (UUO) model suggest that full EMT may not play a significant role (J Clin Invest. 2011 Feb;121(2):468-74). The role of full EMT remains controversial in the context of renal fibrosis, with most kidney fibroblasts thought to originate from interstitial cells rather than through full EMT.
  
  Recent studies, however, suggest that partial epithelial-mesenchymal transition (pEMT) could be involved in CKD, especially in association with inflammation, oxidative stress, and elevated TGF-β levels—conditions also present in our model involving Ang II infusion combined with an HFD. pEMT refers to a state in which epithelial cells acquire mesenchymal traits, such as increased α-SMA expression and secretion of pro-fibrotic cytokines, while remaining attached to the basement membrane without fully transitioning into fibroblasts (Front Physiol. 2020 Sep 15;11:569322). This phenomenon has been observed in kidney fibrosis models, including UUO, which shares inflammatory and oxidative stress conditions with our Ang II and HFD treatment model. The observed increase in α-SMA in our model may thus indicate a pEMT-like state, indirectly contributing to fibrosis through the secretion of growth factors and cytokines.
  
  We are mindful of the importance of not overstating EMT's role. Accordingly, we interpret increased α-SMA expression as a potential marker of the pEMT process rather than definitive evidence of its presence or direct role in fibroblast formation. Furthermore, we acknowledge limitations in providing direct in vivo evidence for pEMT and recognize that further mechanistic studies are needed to elucidate its specific role in renal fibrosis, despite inherent challenges.
  
  In response to the reviewer’s concern, we have revised the manuscript to clarify that our data support the possibility of pEMT contributing to fibrosis in this model, without overstating its impact. We also acknowledge the challenges in translating in vitro pEMT findings to in vivo models, where detecting the subtle effects of pEMT is inherently challenging.
  
  (4) BrdU assay and fibroblast proliferation (Fig. 6b):
  
  In Fig. 6b, the BrdU assay shows that fibroblast proliferation was significantly enhanced by the co-treatment with AngII and oxLDL, and this effect was abolished by LOX-1 knockdown, similar to the results observed with AT1 knockdown. These findings strongly suggest a combinatorial effect of AT1/LOX-1 interaction in promoting fibroblast proliferation, supporting the idea that the co-treatment operates through a coordinated mechanism involving both receptors. Notably, LOX-1 silencing did not affect the proliferation induced by AngII alone, as this response is independent of LOX-1.
  
  We will incorporate these points into the Discussion section of the manuscript, specifically regarding the differences in sensitivity between the Ca influx and IP1 assays, as well as the emerging role of partial EMT in renal fibrosis. This will provide a clearer context for the interpretation of our findings and further strengthen the discussion on the significance of these phenomena.
  
  Thank you again for your valuable feedback, which has helped us improve the clarity and depth of our manuscript.
  
  (3) The final set of studies looked to test the hypothesis in mice by treating WT and Lox1-KO mice with different doses of AngII and either a normal or high-fat diet (to induce oxLDL formation). The authors found that the combination of high-dose AngII and a high-fat diet (HFD) increased markers of renal injury (urinary 8-OHdG and urine albumin) in normal mice compared to mice treated with just AngII or HFD alone, which was blunted in Lox1-KO mice. These results are consistent with their hypothesis. However, there are other aspects of these studies that are either inconsistent or complicating factors that limit the strength of the conclusions. For example, Lox1-KO had no effect on renal injury marker expression in mice treated with low-dose AngII and HFD. It also should be noted that Lox1-KO mice had a lower BP response to AngII, which could have reduced renal injury independent of any effects mediated by the AT1R/LOX1 interaction. Another confounding factor was the significant effect the HFD had on body weight. While the groups did not differ based on AngII treatment status, the HFD consistently was associated with lower total body weight, which is unexplained. Next, the authors sought to find more direct evidence of renal injury using qPCR of candidate genes and renal histology. The transcriptional results are difficult to interpret; moreover, there were no significant histologic differences between groups. They conclude the study by showing the pattern of expression of LOX1 and AT1R in the kidney by immunofluorescence and conclude that the proteins overlap in renal tubules and are absent from the glomerulus. Unfortunately, they did not co-stain with any other markers to identify the specific cell types. However, these results are inconsistent with other studies that show AT1R is highly expressed in mesangial cells, renal interstitial cells, near the vascular pole, JG cells, and proximal tubules but generally absent from most other renal tubule segments.
  
  Thank you for your valuable comments and for raising these important points. We appreciate the opportunity to clarify several aspects of our study and address the limitations and inconsistencies you have pointed out.
  
  (1) Renal injury markers (urinary albumin and 8-OHdG) and the effect of LOX-1 loss-of-function:
  
  Our results showed that the combination of high-dose AngII and HFD led to a significant increase in renal injury markers, such as urinary albumin and 8-OHdG, in WT mice. In LOX-1 KO mice, this increase was significantly blunted, supporting a protective role of LOX-1 loss-of-function. However, as you noted, at low-dose AngII, there was no significant difference in urinary 8-OHdG between ND-fed and HFD-fed mice. Despite this, we observed a significant increase in urinary albumin in HFD-fed WT mice compared to ND-fed mice under low-dose AngII, and this difference was abolished in LOX-1 KO mice. Moreover, gene expression analysis showed that oxidative stress markers such as p67phox and p91phox (Fig. 8b), as well as p40phox, p47phox (Fig. S8), and inflammatory markers like IL1β (Fig. 8b), were significantly elevated in HFD-fed WT mice even with low-dose AngII, while these increases were absent in LOX-1 KO mice. These results suggest that the LOX-1/AT1 interaction contributes to renal injury under both low- and high-dose AngII conditions.
  
  We acknowledge that the treatment duration may have influenced our results, as urine and renal tissue samples were only examined at a single time point (1.5 months after treatment initiation). The impact of AT1/LOX-1 interaction may evolve over time, and different treatment durations might yield varying outcomes. This is a limitation of our study, which we have addressed in the revised manuscript.
  
  (2) Blood pressure and its effect on renal injury:
  
  As shown in Fig. 7b and Fig. S6f, LOX-1 KO mice exhibited a lower blood pressure response to high-dose AngII compared to WT mice, which could indeed have contributed to the reduced renal injury in the LOX-1 KO group, independent of the AT1/LOX-1 interaction. However, it is important to note that the differences in renal injury markers between AngII alone and AngII + HFD were largely abolished in LOX-1 KO mice, suggesting the in vivo relevance of the LOX-1/AT1 interaction observed in vitro. Additionally, as shown in Fig. 7d (urinary albumin), Fig. 8b (p67phox, p91phox), and Fig. S8b (p40phox, p47phox), even under subpressor doses of AngII, where no significant blood pressure differences were observed, HFD-fed WT mice exhibited exacerbated renal injury compared to ND-fed mice. These effects were ameliorated in LOX-1 KO mice, indicating that the protective effects in LOX-1 KO mice are at least partly independent of blood pressure changes and that the AT1/LOX-1 interaction plays a significant role in modulating renal injury under co-treatment with AngII and HFD.
  
  (3) HFD and body weight changes:
  
  We agree with your observation regarding the effect of HFD on body weight, which was consistently lower in HFD-fed groups regardless of AngII treatment status. This is atypical compared with previous studies, most of which report increased body weight with HFD feeding. The HFD used in this study was intended to elevate oxLDL levels, as previously reported (Atherosclerosis 200:303–309 (2008)). As shown in Fig. S6d and S6e, the lower body weight can be attributed to reduced food intake in HFD-fed mice. Although modest, this weight reduction may influence renal function. We have added this point to the limitations.
  
  (4) Histological findings and qPCR results:
  
  As discussed in the manuscript, despite significant changes in urinary markers and gene expression, we did not observe histological evidence of fibrosis or mesangial expansion, even under co-treatment with AngII and HFD. This may be due to the relatively short treatment period of 4 weeks, and a longer duration might be necessary to detect such changes. Additionally, we acknowledge that we did not detect increased Gq signaling in kidney tissue, which is another limitation of the study. Nevertheless, the gene expression data on oxidative stress, fibrosis, inflammation, and renal injury markers (e.g., p67phox, IL1β) are consistent with our hypothesis that the AT1/LOX-1 interaction exacerbates renal injury under AngII and HFD conditions.
  
  (5) Immunostaining for AT1 and LOX-1:
  
  Due to the use of rabbit-derived antibodies for both AT1 and LOX-1, it was technically not feasible to perform co-immunostaining for both receptors simultaneously. Instead, we performed co-immunofluorescent staining using megalin, a well-established marker of proximal renal tubules, to help localize these receptors. As shown in Fig. S10, both AT1 and LOX-1 were co-localized with megalin, particularly at the brush borders of proximal tubules. This pattern suggests the presence of these receptors in renal compartments relevant to AT1/LOX-1 signaling. While we did not perform additional co-staining with other markers to identify specific cell types, the strong localization with megalin provides robust evidence of their expression in proximal renal tubules, which is consistent with the literature on AT1R in this nephron segment. We acknowledge that previous studies have identified AT1R expression in mesangial cells, renal interstitial cells, the vascular pole, juxtaglomerular (JG) cells, and proximal tubules. In our immunofluorescence experiments, we did not detect significant AT1R expression in the glomerulus or mesangium. This finding aligns with other reports showing strong expression of AT1R in proximal tubules (Am J Physiol Renal Physiol. 2021 Apr 1;320(4)), although it does not exclude the possibility of AT1 expression in other compartments, given the sensitivity limitations of the immunofluorescence. Our focus on proximal tubules allowed us to observe clear AT1/LOX-1 co-localization in this region, particularly in the context of oxLDL and AngII signaling. Given that the AT1/LOX-1 interaction is crucial in kidney disease pathogenesis, this co-localization in proximal tubules highlights a key site of action for these receptors in the renal system.
  
  In summary, while our study focused on the co-localization of AT1 and LOX-1 in proximal tubules, we agree that further exploration of AT1R expression in other renal cell types would provide a more comprehensive understanding of its role across different kidney compartments. We have addressed this in the revised discussion.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Minor points:
  
  (1) In this study, AT1/LOX1 receptor complex was mainly observed in some renal cells, how about other types of cells that also highly express LOX1 and AT1r? Such as cardiomyocytes? Vascular endothelial cells?
  
  Thank you for your insightful comment. In our study, we demonstrated that enhanced Gq signaling through co-treatment with AngII and oxLDL was not observed in other cell types, including vascular endothelial cells, smooth muscle cells, and macrophages, as indicated by the lack of an IP1 increase in response to the co-treatment (Fig. S2). The factors contributing to this heterogeneous response remain unclear, and further investigation is needed to explore this observation more thoroughly.
  
  (2) Has the author detected such an effect on the AT2 receptor?
  
  We greatly appreciate the reviewer’s insightful inquiry regarding the potential interaction between the AT2 receptor and LOX-1. In our previous work (Yamamoto et al., FASEB J 2015), we conducted an immunoprecipitation (IP) assay to investigate the interaction between LOX-1 and AT2 on cell membranes. The results of this assay demonstrated that, unlike AT1, LOX-1 exhibits minimal binding to the AT2 receptor under the experimental conditions tested. Specifically, our IP studies showed that while LOX-1 readily coimmunoprecipitated with AT1, indicating a strong interaction, this was not the case with AT2, where the binding was negligible. These findings suggest that the interaction between LOX-1 and AT1 is receptor-specific and that LOX-1 does not significantly associate with AT2 to influence signaling pathways.
  
  (3) Which kind of ARBs are more effective for the inhibition of this AT1/LOX1 receptor conformational change?
  
  Thank you for your insightful question regarding the effectiveness of ARBs in inhibiting the AT1/LOX-1 receptor conformational change. Based on our current understanding, any ARB should similarly block the downstream signaling resulting from the interaction between AT1 and LOX-1. This is because all ARBs function by inhibiting the binding of Ang II to AT1, thereby preventing receptor activation and the conformational changes that facilitate its interaction with LOX-1. Additionally, our previous study (FASEB J. 2015) demonstrated that even in the absence of Ang II, the activation of AT1 via the binding of oxLDL to LOX-1 was similarly blocked by ARBs, including olmesartan, telmisartan, valsartan, and losartan.
  
  When oxLDL and Ang II are co-treated, the Gq signaling pathway is significantly amplified due to the interaction between LOX-1 and AT1. In this setting, all ARBs act by competitively inhibiting Ang II binding to AT1, effectively reducing Gq signaling.
  
  However, a subtle but important difference arises when considering the inverse agonist activity of certain ARBs. Olmesartan, telmisartan, and valsartan are thought to act not only as competitive inhibitors of Ang II but also as inverse agonists, meaning they reduce the baseline activity of the AT1 receptor by preventing the conformational changes in the absence of Ang II. This inverse agonist property is particularly relevant in pathological conditions where AT1 receptor activation can occur independently of Ang II binding, such as in the presence of oxLDL. In these cases, ARBs with inverse agonist activity may offer an additional therapeutic advantage by reducing receptor activation beyond what is achieved by simple antagonism.
  
  Thus, while the general efficacy of ARBs in blocking the AT1/LOX-1 interaction could be similar under conditions of oxLDL and Ang II co-treatment, ARBs with inverse agonist properties may provide additional benefit by further reducing AT1 activity.
  
  We have revised the manuscript to clarify these points and to highlight the role of inverse agonist activity in ARB efficacy under these conditions.
  
  Thank you again for your valuable comment, which has allowed us to refine our discussion on the relative efficacy of ARBs in inhibiting AT1/LOX-1 receptor interaction.
  
  Reviewer #2 (Recommendations For The Authors):
  
  My comments were pretty thorough in the public review. The only other comments I would add are the following:
  
  (1) Why are there so few overlapping LOX1 and ATR puncta in Supplementary Figure 1 if the receptors co-localize? The figure would suggest a very small proportion of the receptors actually are co-localized.
  
  Thank you for your insightful comment regarding the apparent scarcity of overlapping LOX-1 and AT1R puncta in Fig. S1. We agree that at first glance, the low number of colocalized puncta may raise questions about the extent of interaction between these receptors. However, based on our previous findings reported in FASEB J 2015, we believe this phenomenon can be explained by the dynamic nature of the LOX-1 and AT1 interaction.
  
  As we reported in FASEB J 2015, the interaction between LOX-1 and AT1 is sensitive to buffer conditions. Specifically, in non-reducing conditions, LOX-1 and AT1 form complexes, whereas in reducing buffer, this interaction is not observed. This suggests that the interaction between these receptors is not stabilized by strong covalent (disulfide) bonds but is instead transient, likely involving non-covalent interactions. Thus, LOX-1 and AT1 may form and dissociate repeatedly, contributing to a dynamic receptor complex rather than a permanent colocalization. This transient interaction could explain the relatively low number of overlapping puncta observed at a given time point in the live-imaging analysis.
  
  Moreover, as you pointed out, it is likely that only a small fraction of LOX-1 and AT1 are physically co-localized at any one moment. However, when these receptors do interact, co-treatment with oxLDL and Ang II has been shown to significantly enhance Gq signaling. This suggests that the functional consequence of the LOX-1/AT1 interaction, particularly in response to stimuli such as oxLDL and Ang II, is more critical than the frequency of receptor colocalization at any one time.
  
  We have revised the manuscript to include this explanation and to clarify the dynamic nature of the LOX-1/AT1 interaction. This revision also highlights the importance of considering not just the number of colocalized receptors but also the functional outcomes of their interaction, such as enhanced Gq signaling in response to co-treatment.
  
  Thank you again for your careful observation, which has allowed us to better communicate the complexity of the receptor dynamics in our study.
  
  (2) Tubulin is misspelled in Figure 5 ("tublin").
  
  Thank you for pointing out the typographical error in Fig. 5. We have corrected the spelling of "tubulin" in the revised figure. We appreciate your attention to detail, and we apologize for the oversight.
  
  (3) Why does the number of replicates differ for some experimental sets (i.e. Figure 1h vs other panels in Figure 1, Figure 2d vs other panels in Figure 2, Figure 7: LOX-1 KO treated with high-dose AngII and HFD)? There aren't obvious reasons why the number of replicates should differ so much within a set of studies.
  
  We are grateful to the reviewer for highlighting the discrepancies in the number of replicates across different figures in our manuscript. We would like to provide detailed explanations for each case.
  
  (1) Fig. 1h vs Other Panels in Fig. 1:
  
  The calcium influx assay (Fig. 1h) required a higher number of replicates due to the inherent biological variability associated with calcium signaling. To achieve statistical significance and account for variability in these measurements, we conducted additional replicates. Other panels, such as those measuring IP1 accumulation (Fig. 1a–f), displayed more consistent and reproducible results, allowing us to use fewer replicates while still maintaining statistical power.
  
  (2) Fig. 2d vs Fig. 2b and 2c:
  
  The difference in the number of replicates between Fig. 2d (N=8) and Fig. 2b and 2c (N=4) is due to the distinct nature of the measurements and the variability expected in each assay. In Fig. 2d, which measures the effects of a LOX-1 neutralizing antibody on BRET, additional replicates were needed to ensure the robustness of the statistical analysis due to the greater complexity and sensitivity of the assay. The inclusion of an antibody treatment introduces more variability, necessitating a higher number of replicates (N=8) to confidently assess the effects of the neutralizing antibody. In contrast, Fig. 2b and 2c involved BRET measurements of AT1 conformational changes without antibody intervention. These assays are more reproducible and have less experimental variability, allowing for a smaller sample size (N=4) while still achieving reliable and statistically significant results. The differences in sample size across these panels were carefully considered to ensure appropriate statistical power for each specific experimental condition.
  
  (3) Fig. 7: LOX-1 KO Mice Treated with High-dose AngII vs Saline:
  
  We acknowledge the reviewer’s concern regarding the higher number of LOX-1 KO mice treated with high-dose Ang II compared to the saline group. The number of saline-treated mice was indeed sufficient for reliable statistical analysis. However, the decision to increase the number of mice in the high-dose Ang II group was driven by the anticipated higher variability in the physiological responses under these conditions, such as blood pressure and renal injury. To ensure that we captured the full spectrum of responses and to maintain robust statistical power in the high-dose group, we opted to include more mice in this cohort.
  
  We hope this response provides clarity on the rationale behind the varying number of replicates across different experiments. We have rigorously applied appropriate statistical methods to account for these differences, ensuring that the conclusions drawn are robust and scientifically sound. We appreciate the reviewer’s understanding of the experimental constraints and variations that can arise in complex studies such as these.
  
URL

biorxiv.org/content/10.1101/2024.05.13.594020v2
www.biorxiv.org www.biorxiv.org

Rabphilin-3A negatively regulates neuropeptide release, through its SNAP25 interaction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Joint Public Review:
  
  The molecular mechanisms that mediate the regulated exocytosis of neuropeptides and neurotrophins from neurons via large dense-core vesicles (LDCVs) are still incompletely understood. Motivated by their earlier discovery that the Rab3-RIM1 pathway is essential for neuronal LDCV exocytosis, the authors now examined the role of the Rab3 effector Rabphilin-3A in neuronal LDCV secretion. Based on multiple live and confocal imaging approaches, the authors provide evidence for a synaptic enrichment of Rabphilin-3A and for independent trafficking of Rabphilin-3A and LDCVs. Using an elegant NPY-pHluorin imaging approach, they show that genetic deletion of Rabphilin-3A causes an increase in electrically triggered LDCV fusion events and increased neurite length. Finally, knock-out-replacement studies, involving Rabphilin-3A mutants deficient in either Rab3- or SNAP25-binding, indicate that the synaptic enrichment of Rabphilin-3A depends on its Rab3 binding ability, while its ability to bind to SNAP25 is required for its effects on LDCV secretion and neurite development. The authors conclude that Rabphilin-3A negatively regulates LDCV exocytosis and propose that this mechanism also affects neurite growth, e.g. by limiting neurotrophin secretion. These are important findings that advance our mechanistic understanding of neuronal large dense-core vesicle (LDCV) secretion.
  
  The major strengths of the present paper are:
  
  (i) The use of a powerful Rabphilin-3A KO mouse model.
  
  (ii) Stringent lentiviral expression and rescue approaches as a strong genetic foundation of the study.
  
  (iii) An elegant FRAP imaging approach.
  
  (iv) A cutting-edge NPY-pHluorin-based imaging approach to detect LDCV fusion events.
  
  We thank the reviewers for their positive evaluation of our manuscript.
  
  Weaknesses that somewhat limit the convincingness of the evidence provided and the corresponding conclusions include the following:
  
  (i) The limited resolution of the various imaging approaches introduces ambiguity to several parameters (e.g. LDCV counts, definition of synaptic localization, Rabphilin-3A-LDCV colocalization, subcellular and subsynaptic localization of expressed proteins, AZ proximity of Rabphilin-3A and LDCVs) and thereby limits the reliability of corresponding conclusions. Super-resolution approaches may be required here.
  
  We thank the reviewer for their constructive suggestion. We fully agree that super-resolution imaging would produce a more precise localization of RPH3A and co-localization with DCVs. We have now repeated our (co)-localization experiments with STED microscopy. We find that RPH3A co-localizes with the pre-synaptic marker Synapsin1 and, to a lesser extent, with the post-synaptic marker Homer and the DCV marker chromogranin B (new Figure 1). This indicates that RPH3A is highly enriched in synapses, mostly the pre-synapse, and that RPH3A partly co-localizes with DCVs.
  
  (ii) The description of the experimental approaches lacks detail in several places, thus complicating a stringent assessment.
  
  We apologize for the lack of detail in explaining the experimental approaches. We have included a more detailed description in the revised manuscript.
  
  (iii) Further analyses of the LDCV secretion data (e.g. latency, release time course) would be important in order to help pinpoint the secretory step affected by Rabphilin-3A.
  
  We agree. To address this comment, we have now included the duration of the fusion events (new Figure S2D-F). The start times of the fusion events are shown in the cumulative plots (now Figure 3F and I). The kinetics are normal in the RPH3A KO neurons.
  
  (iv) It remains unclear why a process that affects a general synaptic SNARE fusion protein - SNAP25 - would specifically affect LDCV but not synaptic vesicle fusion.
  
  We agree that we have not addressed this issue systematically enough in the original manuscript. We have now added a short discussion on this topic in the Discussion of the revised manuscript (p 15, line 380-386). In brief, we do not claim full selectivity for the DCV pathway. Some effects of RPH3A deficiency on the synaptic vesicle cycle have been observed. Furthermore, because DCVs typically do not mix in the synaptic vesicle cluster and fuse outside the active zone (and outside the synapse), DCVs might be more accessible to RPH3A regulation.
  
  (v) The mechanistic links between Rabphilin-3A function, LDCV density in neurites, neurite outgrowth, and the proposed underlying mechanisms involving trophic factor release remain unclear.
  
  We agree that we have not addressed all these links systematically enough in the original manuscript, although we feel that we have at least postulated the best possible working model to link RPH3A function to DCV exocytosis/neurotrophic factor release and neurite outgrowth (p 15-16, line 396-400). Of course, a single study cannot support all these links with sufficient experimental evidence. We have now added a short text on what we can conclude exactly based on our experiments and how we see the links between RPH3A function, DCV exocytosis/neurotrophic factor release, neurite outgrowth and DCV density in neurites (p 13-14, line 317-325).
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The manuscript by Hoogstraaten et al. investigates the effect of constitutive Rabphilin 3A (RPH3A) ko on the exocytosis of dense core vesicles (DCV) in cultured mouse hippocampal neurons. Using mCherry- or pHluorin-tagged NPY expression and EGFP- or mCherry-tagged RPH3A, the authors first analyse the colocalization of DCVs and RPH3A. Using FRAP, the authors next analyse the mobility of DCVs and RAB3A in neurites. The authors go on to determine the number of exocytotic events of DCVs in response to high-frequency electrical stimulation and find that RPH3A ko increases the number of exocytotic events by a factor of 2-3, but not the fraction of released DCVs in a given cell (8x 50Hz stim). In contrast, the release fraction is also increased in RPH3A KOs when doubling the stimulation number (16x 50Hz). They further observe that RPH3A ko increases dendrite and axon length and the overall number of ChgrB-positive DCVs. However, the overall number of DCVs and dendritic length in ko cells directly correlate, indicating that the number of vesicles per dendritic length remains unaffected in the RPH3A KOs. Lentiviral co-expression of tetanus toxin (TeNT) showed a non-significant trend to reduce axon and dendrite length in RPH3A KOs. Finally, the authors use co-expression of RAB3A and SNAP25 constructs to show that RAB3A but not SNAP25 interaction is required to allow the exocytosis-enhancing effect in RPH3A KOs.
  
  While the authors' methodology is sound and the microscopy experiments are performed well and analyzed appropriately, their results in large part do not sufficiently support their conclusions. Moreover, the experiments are not always described in sufficient detail (e.g. FRAP; DCV counts vs. neurite length) to fully understand their claims.
  
  Overall, I thus feel that the manuscript does not provide a sufficient advance in knowledge.
  
  Strengths:
  
  - The authors' methodology is sound, and the microscopy results are performed well and analyzed appropriately.
  
  - Figure 2: The exocytosis imaging is elegant and potentially very insightful. The effect in the RPH3A KOs is convincing.
  
  - Figure 4: the logic of this experiment is elegant. It shows that the increased number of DCV fusion events in RPH3A KOs is related to the interaction of RPH3A with RAB3A but not with SNAP25.
  
  We thank the reviewer for their positive evaluation of our manuscript.
  
  Weaknesses:
  
  - The results in larger parts do not sufficiently support the conclusions.
  
  - The experiments are not always described in sufficient detail (e.g. FRAP; DCV counts vs. neurite length) to fully understand their claims.
  
  - Not of sufficient advance in knowledge for this journal
  
  - The significance of differences in control experiments (WT vs. KO) varies between experiments shown in different figures.
  
  - Axons and dendrites were not analyzed separately in Figures 1 and 2.
  
  - The colocalization study in Figure 1 would require super-resolution microscopy.
  
  To address the reviewers’ comments, we have provided a more detailed explanation of our analysis (p 19-20, line 521-542). In addition, we have repeated our colocalization experiments using STED microscopy, see Joint Public Review item (i).
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Hoogstraaten et al. investigated the involvement of rabphilin-3A (RPH3A) in DCV fusion in neurons during calcium-triggered exocytosis at the synapse and during neurite elongation. They suggest that RPH3A acts as an inhibitory factor for LDCV fusion and that this is mediated partially via its interaction with SNAP25 and not Rab3A/Rab27. It is a very elegant study although several questions remain to be clarified.
  
  Strengths:
  
  The authors use state-of-the-art techniques like tracking NPY-pHluorin exocytosis and FRAP experiments to quantify these processes, providing novel insight into LDCV exocytosis and the involvement of RPH3A.
  
  We thank the reviewer for their positive evaluation of our manuscript.
  
  Weaknesses:
  
  At the current state of the manuscript, further supportive experiments are necessary to fully support the authors' conclusions.
  
  We thank the reviewer for their comments and suggestions. We have performed additional experiments to support our conclusions, see Joint Public Review items (i) – (iv).
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The molecular mechanism of regulated exocytosis has been extensively studied in the context of synaptic transmission. However, in addition to neurotransmitters, neurons also secrete neuropeptides and neurotrophins, which are stored in dense core vesicles (DCVs). These factors play a crucial role in cell survival, growth, and shaping the excitability of neurons. The mechanism of release for DCVs is similar, but not identical, to that used for SV exocytosis. This results in slow kinetics and low release probabilities for DCV compared to SV exocytosis. There is a limited understanding of the molecular mechanisms that underlie these differences. By investigating the role of rabphilin-3A (RPH3A), Hoogstraaten et al. uncovered for the first time a protein that inhibits DCV exocytosis in neurons.
  
  Strengths:
  
  In the current work, Hoogstraaten et al. investigate the function of rabphilin-3A (RPH3A) in DCV exocytosis. This RAB3 effector protein has been shown to possess a Ca2+ binding site and an independent SNAP25 binding site. Using colocalization analysis of confocal imaging the authors show that in hippocampal neurons RPH3A is enriched at pre- and post-synaptic sites and associates specifically with immobile DCVs. Using site-specific RPH3A mutants they found that the synaptic location was due to its RAB3 interaction site. They further could show that RPH3A inhibits DCV exocytosis due to its interaction with SNAP25. They came to that conclusion by comparing NPY-pHluorin release in WT and RPH3A KO cells and by performing rescue experiments with RPH3A mutants. Finally, the authors showed that by inhibiting stimulated DCV release, RPH3A controlled the axon and dendrite length possibly through the reduced release of neurotrophins. Thereby, they pinpoint how the proper regulation of DCV exocytosis affects neuron physiology.
  
  We thank the reviewer for their positive evaluation of our manuscript.
  
  Weaknesses:
  
  Data context
  
  One of the findings is that RPH3A accumulates at synapses and is mainly associated with immobile DCVs. However, Farina et al. (2015) showed that 66% of all DCVs are secreted at synapses and that these DCVs are immobile prior to secretion. To provide additional context to the data, it would be valuable to determine if RPH3A KO specifically enhances secretion at synapses. Additionally, the authors propose that RPH3A decreases DCV exocytosis by sequestering SNAP25 availability. At first glance, this hypothesis appears suitable. However, due to RPH3A synaptic localization, it should also limit SV exocytosis, which it does not. In this context, the only explanation for RPH3A's specific inhibition of DCV exocytosis is that RPH3A is located at a synapse site remote from the active zone, thus protecting the pool of SNAP25 involved in SV exocytosis from binding to RPH3A. This hypothesis could be tested using super-resolution microscopy.
  
  We thank the reviewer for their suggestion. We have now performed super-resolution microscopy, see Joint Public Review item (i). However, these new data do not necessarily explain the stronger effect of RPH3A deficiency on DCV exocytosis, relative to SV exocytosis. We have added a short discussion on this topic to the revised manuscript, see Joint Public Review item (iv).
  
  Technical weakness
  
  One technical weakness of this work consists in the proper counting of labeled DCVs. This is significant since most findings in this manuscript rely on this analysis. Since the data was acquired with epi-fluorescence or confocal microscopy, it doesn't provide the resolution to visualize individual DCVs when they are clumped. The authors use a proxy to count the number of DCVs by measuring the total fluorescence of individual large spots and dividing it by the fluorescence intensity of discrete spots assuming that these correspond to individual DCVs. This is an appropriate method but it heavily depends on the assumption that all DCVs are loaded with the same amount of NPY-pHluorin or chromogranin B (ChgB). Due to the importance of this analysis for this manuscript, I suggest that the authors show that the number of DCVs per µm2 is indeed affected by RPH3A KO using super-resolution techniques such as dSTORM, STED, SIM, or SRRF.
  
  The reviewer is correct that this is a crucial issue that we have not addressed optimally until now. We devoted a large part of a previous manuscript to this issue, but did not refer to that work clearly enough. We have now clarified this (p 7, line 187-190). In brief, we have previously quantified the ratio between the fluorescence intensity of ChgB and NPY-pHluorin in confocal microscopy and the number of dSTORM puncta in sparse areas of WT mouse hippocampal neurons (Persoon et al., 2018). This quantification yielded a unitary fluorescence intensity per vesicle that was very stable across different neurons. Although confocal microscopy might somewhat underestimate the total number of DCVs, the study of Persoon et al. (2018) demonstrated that these parameters correlate well and that the estimations are accurate. Considering that the ΔF/F0 is similar in RPH3A WT and KO neurons (now Figure S2I), meaning that the NPY-pHluorin intensity of a single fusion event is comparable, we can presume that this correlation also applies to RPH3A KO neurons.
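
  The intensity-based counting discussed in this exchange can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name, the rounding rule, the assumption that every detected punctum holds at least one vesicle, and all intensity values are hypothetical and chosen only to show the arithmetic (punctum intensity divided by a unitary per-vesicle intensity).

```python
def estimate_dcv_count(punctum_intensities, unitary_intensity):
    """Estimate vesicle numbers from punctum fluorescence.

    Each punctum's integrated intensity is divided by the unitary
    (single-vesicle) intensity and rounded to the nearest integer.
    Assumption (illustrative only): a detected punctum contains at
    least one vesicle, so estimates are floored at 1.
    """
    if unitary_intensity <= 0:
        raise ValueError("unitary_intensity must be positive")
    per_punctum = [
        max(1, round(intensity / unitary_intensity))
        for intensity in punctum_intensities
    ]
    return per_punctum, sum(per_punctum)


# Hypothetical intensities in arbitrary units; not data from the study.
per_punctum, total = estimate_dcv_count([100.0, 210.0, 480.0], 100.0)
print(per_punctum, total)
```

  The estimate is only as good as the assumption that vesicles carry similar amounts of label, which is the point raised by the reviewer.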
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Major points:
  
  (1) The authors perform an extensive analysis regarding the colocalization of RPH3A and DCVs (Figure 1 upper part). This analysis is hampered by the fact that the recorded data have limited resolution relative to vesicle size (> 1 µm), which does not allow strong claims here. In my view, super-resolution microscopy would be required for the co-localization studies shown in Figure 1.
  
  We fully agree and have now performed super-resolution microscopy, see Joint Public Review item (i).
  
  (2) The FRAP experiments (Figure 1 lower part) cannot be sufficiently understood from what is presented. The methods say that both laser channels were activated during bleaching but NPY-pHluorin is not bleached in Fig.1E. Explanation of the bleaching is not very circumspect. In 1D, it is rather EGFP-RPH3A that is entering the bleached area than the NPY vesicles. These experiments require a more careful explanation of methodology, observed results, and their interpretation. Overall, the observed effects in the original kymograph traces require a better explanation.
  
  We acknowledge that NPY-pHluorin in Figure 1E (now Figure 2C) is not completely bleached. NPY-pHluorin appeared to be more difficult to bleach than NPY-mCherry. However, it is important to clarify that we merely bleached the neurites to remove the stationary puncta and facilitate our analysis of DCV/RPH3A dynamics. This bleaching step does not affect the interpretation of our results. We apologize that this was not clearly stated in the text and have made the necessary adjustments in the legend, Results, and Methods sections (p 6-7, line 162-163; p 5, line 140-142 and p 19, line 508-513). Additionally, we apologize for the accidental switch of the kymographs for NPY-mCherry and EGFP-RPH3A in Figure 1D (now Figure 2B, C). We greatly appreciate the reviewer identifying this error.
  
  (3) Figure 1: The authors need to mention whether axons, dendrites, or both were analyzed throughout the different panels and how they were identified. Is it possible that axons were wrapping around dendrites in their cultures (compare e.g. Shimojo et al., 2015)? Given the limited spatial resolution and because of this wrapping, interpretation of results could be affected.
  
  We completely agree with the reviewer’s assessment and conclusion. We are unable to distinguish axons from dendrites using this experimental design. We have made sure to specify in the text that our observation that RPH3A does not co-travel with DCVs is true for both dendrites and axons, (p 5, line 150).
  
  (4) Figure 2: The exocytosis imaging is elegant and potentially very insightful. The effect in the RPH3A KOs is convincing. However, the authors determine the efficacy of exocytosis from NPY-pHluorin unquenching of DCVs only. This is only one of several possible parameters to read out the efficiency of exocytosis. Kinetics like e.g. delay between stimulation and start of exocytosis events or release time course of NPY after DCV fusion were not determined. Such analysis could give a better insight into what process before or after the fusion of DCVs is affected by RPH3A ko.
  
  We fully agree with the reviewer. We have now included the duration of the fusion events (new Figure S2D-F). The start times of the fusion events are shown in the cumulative plots (now Figure 3F and I). The kinetics are normal in the RPH3A KO neurons.
  
  Moreover, it needs to be mentioned whether 2C and D are from WT or ko cultures. It would be best to show representative examples from both genotypes.
  
  We have now adjusted this in the new figure (now Figure 3C, D).
  
  The number of fusion events is much increased but the release fraction is not significantly changed. While this is consistent with results in Figure 4C it is at variance with 4F. This raises questions about the reliability of the effects in RPH3A KOs.
  
  The release fraction is the number of fusion events normalized to the total DCV pool. In Figure 4D, we observed a slightly bigger pool size, which explains the lack of significance when analyzing the released fraction. In Figure 4G, however, DCV pool sizes are similar between KO and WT, leading to a statistically significant effect on release fraction in KO neurons. Furthermore, Figures 4B and E distinctly show a substantial increase in fusion events in RPH3A KO neurons. The observed variability in pool size could potentially be attributed to variation in culture or inherent biological variability.
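
  The normalization described above can be illustrated with a short sketch. The function name and all counts below are hypothetical and chosen only to show the arithmetic; they are not values from the study.

```python
def released_fraction(fusion_events: int, total_pool: int) -> float:
    """Fusion events normalised to the total DCV pool of the same cell."""
    if total_pool <= 0:
        raise ValueError("total_pool must be positive")
    return fusion_events / total_pool


# Hypothetical per-cell counts, for illustration only (not measured data).
wt = released_fraction(30, 1000)               # WT cell
ko_similar_pool = released_fraction(60, 1000)  # KO cell, pool similar to WT
ko_larger_pool = released_fraction(60, 1800)   # KO cell, larger pool

print(wt, ko_similar_pool, ko_larger_pool)
```

  With the same doubling of fusion events, the released fraction doubles when the pool is unchanged but barely moves when the pool is larger, which is the pattern the response describes.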
  
  Given the increased number of ChgrB-positive DCVs in RPH3A KOs (shown in Figure 2) and that only the cumulative number of exocytosis events were analysed, how can the authors exclude that the RPH3A ko only affects vesicle number but not release, if the % change in released vesicles is not different to WT? Kinetics of release don't seem to be affected. Importantly, what was the density of NPY-pHluorin vesicles in WT vs. ko?
  
  In Figure 2 (now Figure 5) we show that RPH3A KO neurons are larger and contain more endogenous ChgB+ puncta than WT neurons. This increased number of ChgB+ puncta scales with neuron size, as puncta density is not increased. A previous study (Persoon et al., 2018) has demonstrated a strong correlation between DCV number and neuron size. Our data show that RPH3A deficiency increased DCV exocytosis, but the released fraction of vesicles depends on the total number of DCVs, which we determined during live recording by dequenching NPY-pHluorin using NH4+. Considering that this is an overexpression of a heterologous DCV-fusion reporter, and not endogenous staining of DCVs, as in the case of ChgB+ puncta, some variability is not unexpected.
  
  Also in these experiments, the question arises of whether the authors analyse axons, dendrites, or both throughout the different panels and how they were identified.
  
  In our experimental design we record all fusion events per cell, including both axons and dendrites but excluding the cell soma. We have clarified this in the method section, (p 19, line 508 and p 19, line 521-522).
  
  (5) Figure 3: in D the authors show that ChgrB-pos. DCV density is slightly increased in KOs. How does this relate to the density of NPY-pHluorin DCVS in Figure 2?
  
  We do not observe a difference in NPY-pHluorin density (see Author response image 1). However, it is important to note that we relied on tracing neurites in live recording images to determine the neuronal size. In contrast, the ChgB density was limited to dendrites, with dendritic length determined using (post-hoc) MAP2 staining. In addition, ChgB+ puncta represent an endogenous DCV staining, whereas NPY-pHluorin quantification is based on overexpression of a heterologous DCV-fusion reporter. These two factors likely contribute some variability.
  
  Author response image 1.
  
  The authors show a non-significant trend of TeNT coexpression to reduce axon and dendrite lengths in RPH3A KOs. While this trend is visible, I think one cannot draw conclusions from that when not reaching significance. The argument of the authors that the increased axon and dendrite lengths are created by growth factor peptide release from DCV during culture time is interesting. However, the fact that TeNT expression shows a trend toward reducing this effect on axons/dendrites is not sufficient to prove the release of such growth factors.
  
  We agree. We have toned down this speculation in the revised manuscript, (p 15-16, line 395-400).
  
  Lastly, the authors don't provide insight into the mechanisms, of how RPH3A ko increases the number of DCVs per µm dendritic length in the neurons. In my view, there are too many loose ends in this story of how RPH3A ko first increases spontaneous release of DCVs and then enhances neurite growth and DCV density. Did the authors e.g. measure the spontaneous release of DCVs in their cultures?
  
  We measured spontaneous release of DCVs during the 30s baseline recording prior to stimulation. We observed no difference in spontaneous release between WT and KO neurons (now Figure S2H). However, baseline recording lasted only 30 seconds. It is possible that this was too short to detect subtle effects.
  
  Other points:
  
  (1) Figure 4: the logic of this experiment is elegant. It shows that the increased number of DCV fusion events in RPH3A KOs is related to the interaction of RPH3A with RAB3A but not with SNAP25. As mentioned above, it is irritating that the reduction of fusion events in KOs and on the release fraction is sometimes reaching significance, but sometimes it does not. Likewise, the absence of significant effects on DCV numbers is not consistent with the results shown in Figures 3C and D.
  
  DCV numbers in Figure 3 (now Figure 5) are determined by staining for endogenous ChgB, whereas in Figure 4D and G DCV numbers are determined by overexpressing NPY-pHluorin and counting the dequenched puncta following an NH4+ puff.
  
  (2) Figure 1B: truncation of the y-axis needs to be clearly indicated.
  
  We have replaced this figure with new Figure 1 and have indicated truncations of the y-axis when needed (new Figure 1E).
  
  (3) Page 10: "Given that neuropeptides are key modulators of adult neurogenesis (Mu et al., 2010), and that RPH3A depletion leads to increased DCV exocytosis, it is coherent that we observed longer neurites in RPH3A KO neurons." I cannot follow the argument of the authors here: what has neurogenesis to do with neurite length?
  
  We apologize for the confusion. We have clarified this in the revised text, (p 16, line 398-400).
  
  Minor point:
  
  There are some typos in the manuscript. e.g., page 8: "... may partially dependent on regulated secretion...); page 6: "...to dequence all...".
  
  Thank you for noticing, we have corrected the typos.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Supplementary Figure S1A, in my opinion, should be in Figure 1A as it illustrates all the constructs used in this study and helps the reader to follow it up.
  
  We thank the reviewer for their suggestion. However, we feel that with the adjustments we have made in Figure 1, the illustrations of the constructs fit better in Figure S1, since new Figure 1 shows the localization of endogenous RPH3A and not that of the constructs.
  
  (2) One of the conclusions of the manuscript is the synaptic localization of the different RPH3A mutants. The threshold for defining synaptic localization is clear neither from the images nor from the analysis: for example, the Manders' coefficient for VGLUT1-Syn1, which is used as a positive control, ranges from 0.65-0.95 and that of RPH3A and Syn1 ranges from 0.5-0.95. These values should be compared to all mutants and the conclusions should be based on such comparison.
  
  We agree. We have now repeated our initial co-localization experiment with all the RPH3A mutants (now Figure S1D-F).
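
  For readers unfamiliar with the metric, a Manders' coefficient is the fraction of one channel's total intensity that falls in pixels where the other channel is above a threshold. The sketch below is a generic textbook form on flattened pixel lists; the function name, the threshold handling and the toy intensities are assumptions for illustration and do not reproduce the analysis pipeline used in the study.

```python
from typing import Sequence


def manders_m1(ch1: Sequence[float], ch2: Sequence[float],
               ch2_threshold: float = 0.0) -> float:
    """Fraction of channel-1 intensity located where channel 2 exceeds a threshold."""
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total = sum(ch1)
    if total == 0:
        raise ValueError("channel 1 has no signal")
    # Sum channel-1 intensity only over pixels where channel 2 is 'on'.
    overlapping = sum(a for a, b in zip(ch1, ch2) if b > ch2_threshold)
    return overlapping / total


# Toy pixel intensities (hypothetical): e.g. RPH3A as channel 1, Syn1 as channel 2.
rph3a = [1.0, 2.0, 3.0]
syn1 = [0.0, 5.0, 5.0]
print(manders_m1(rph3a, syn1))
```

  Because the value depends on the threshold chosen for the second channel, comparing mutants is only meaningful when the same threshold rule is applied to every condition.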
  
  (3) Strengthening this figure with STED/SIM/dSTORM microscopy can verify and add a new understanding of the subtle changes of RPH3A localization.
  
  We fully agree and have now added super-resolution microscopy data, see Joint Public Review item (i).
  
  (4) As RAB3A/RAB27A (ΔRAB3A/RAB27A) loses the punctate distribution, please clarify how it can function at the synapse and not act as a KO. Is it sorted to the synapse, and if so, how?
  
  We used lentiviral delivery to introduce our constructs, resulting in the overexpression of ΔRAB3A/RAB27A mutant RPH3A. This overexpression likely compensates for the loss of the punctate distribution of RPH3A, thereby maintaining its limiting effect on DCV exocytosis. It is plausible that under physiological conditions, the mislocalization of RPH3A would lead to increased exocytosis, similar to what we observed in the KO.
  
  (5) Is RPH3A expressed in both excitatory and inhibitory neurons?
  
  We agree this is an important question. Single cell RNA-seq already suggests the protein is expressed in both, but we nevertheless decided to test expression of RPH3A protein in excitatory and inhibitory neurons, using immunocytochemistry with VGAT and VGLUT as markers in hippocampal and striatal WT neurons. We found that RPH3A is expressed in both VGLUT+ hippocampal neurons and VGAT+ striatal neurons (new Figure S1A, B).
  
  (6) The differential use of ChgB and NPY as markers for DCVs should be clarified and compared as these are used at different stages of the manuscript.
  
  We have previously addressed the comparison between ChgB and NPY-pHluorin (Persoon et al., 2018). We made sure to indicate this more clearly throughout the manuscript to clarify the use of the two markers.
  
  (7) FRAP experiments- A graph describing NPY recovery should be added as a reference to 2H and discussed.
  
  We agree. We have made the necessary adjustments (new Figure 2G).
  
  (8) Figure 2E shows some degree of "facilitation" between the 2 8x50 pulses RPH3A KO neurons. Can the author comment on that? What was the reason for using this dual stimulation protocol?
  
  There is indeed some facilitation between the two 8 x 50 pulses in KO neurons and to a lesser extent also in the WT neurons, which we have observed before in WT neurons (Baginska et al., 2023). Baginska et al. (2023) showed recently that different stimulation protocols can influence certain fusion dynamics, like the ratio of persistent and transient events and event duration. We used two different stimulation protocols to thoroughly investigate the effect of RPH3A on exocytosis, and assess the robustness of our findings regarding the number of fusion events. Fusion kinetics were similar in WT and KO neurons for both stimulation protocols (new Figure 2D-F).
  
  (9) Figure 3 quantifies dendrites length and then moves to quantify both axon and dendrites for the Tetanus toxin experiment. What are the effects of KO on axon length? In the main figures, it is not mentioned but in S3 it seems not to be affected. How does it reconcile with the main conclusion on neurite length?
  
  Figure 3H (now Figure 6C) shows the effect of the KO on axon length: the axon length is increased in RPH3A KO neurons compared to WT, similar to dendrite length. Re-expressing RPH3A in KO neurons rescues axonal length to WT levels. In Figure S3, we observe a similar trend as in main Figure 3 (new Figure 6), yet this effect did not reach significance. Based on this, we concluded that neurite length is increased upon RPH3A depletion.
  
  (10) For lay readers, please explain the total pool and how you measured it. However, see the next comment.
  
  We agree. We have now defined this better in the revised manuscript, (p 19, line 524-527 and p 20, line 535-539).
  
  (11) It is a bit hard to understand if the total number of DCV was increased in the KO and if the pool size was increased and in which figure it is quantified. Some sentences like: "A trend towards a larger intracellular DCV pool in KO compared to WT neurons was observed" do not fit with "No difference in DCV pool size was observed between WT and KO neurons (Figure S2D)" or with "During stronger stimulation (16 bursts of 50 APs at 50 Hz), the total fusion and released fraction of DCVs were increased in KO neurons compared to WT". They are not directly supported, or not related to specific figures. Please indicate if the total DCVs pool, as measured by NH4, was increased and based on that, the fraction of the releasable DCVs following the long stimulation. From Figure 2H, the conclusion is an increase in fusion events. In general, NH4 is not quantified clearly- is it quantified in Figure S2C? And if it is a trend, how can it become significant in Figure 3?
  
  We agree there has been some inconsistency in the way we describe the data on the total number of DCVs. We have addressed this in the revised text to ensure better clarity. The total DCV pool measured by NPY-pHluorin was not significantly increased in KO neurons, although we see a trend towards a bigger DCV pool in the 2x8 50 Hz stimulation paradigm (now Figure S2C); therefore, the released fraction of vesicles is not increased in Figure 1G (now Figure 3G). The number of DCVs in Figure 3 (now Figure 5) is based on endogenous ChgB staining and not on overexpression, unlike the DCV pool measured by NPY-pHluorin. In Figure 3 (now Figure 5) we show that RPH3A KO neurons have slightly more ChgB+ puncta compared to WT.
  
  (12) In Figure 3, the quantification is not clear, discrete puncta are not visible but rather a smear of chromogranin staining. How was it quantified? An independent method to count DCV number, size, and distribution like EM is necessary to support and add further understanding.
  
  We acknowledge that discrete ChgB puncta are not completely visible in Figure 3 (now Figure 5). Besides the inherent limitation in resolution with confocal imaging, we believe that this is due to ChgB accumulation in the KO neurons, as shown in now Figure 5D. Nonetheless, to address this concern of the reviewer, we have selected other images that represent our dataset (now Figure 5A). Furthermore, the number of ChgB+ DCVs was calculated using SynD software (Schmitz et al., 2011; van de Bospoort et al., 2012) (see previous reply). EM would offer valuable independent confirmation on the total DCV number, size and distribution. However, with the current method we already know that vesicle numbers are at least similar. Does that justify the (major) investment in a quantitative EM study? Moreover, this issue does not affect the central message of the current study.
  
  (13) Can the author discuss if the source of DCVs that are released at the synapse is similar or different from the source of DCVs fused while neurites elongate?
  
  With our current experimental design, we are unable to draw conclusions regarding this aspect. We are not sure how experiments to identify this source (probably the Golgi?) would be crucial to sustain the central message of our study.
  
  (14) An interesting and related question: what are the expression levels of RPH3A during development and neuronal growth during the nervous system development?
  
  While we have not specifically examined the expression levels of RPH3A over development, public databases show that RPH3A expression increases over time in mice, consistent with other synaptic proteins (Blake et al., 2021; Baldarelli et al., 2021; Krupke et al., 2017). We have now added this to the revised manuscript (p 2, line 55-56).
  
  (15) The conclusion from Figure 4 about the contribution of SNAP25 interaction to RPH3A inhibitory effect is not convincing. The data are scattered and in many neurons, high levels of fusion events were detected. Further or independent experiments are needed to support this conclusion. For example, is the interaction with SNAP25 important for its inhibitory activity in other DCV-releasing systems like adrenal medulla chromaffin cells?
  
  We agree that further studies in other DCV-releasing systems like chromaffin cells would provide valuable insight into the role of SNAP25 interaction in RPH3A’s inhibitory effect on exocytosis. However, we believe that starting new series of experiments in another model system is outside of the scope of our current study.
  
  (16) Furthermore, the number of DCVs in the KO is similar in this experiment, raising some more questions about the quantification of the number of vesicles, that differ, in different sections of the manuscript (points # 10,11).
  
  The total DCV pool in the fusion experiments is measured by overexpressing NPY-pHluorin; this cannot be directly compared to the number of endogenous ChgB+ DCVs in Figure 3 (now Figure 5), see also item (11).
  
  (17) The statement - "RPH3A is the only negative regulator of DCV" is not completely accurate as other DCV inhibitors like tomosyn were described before.
  
  We agree. By this statement, we intend to convey that RPH3A is the only negative regulator of DCVs without substantial impact on synaptic vesicle exocytosis, unlike Tomosyns. We have clarified this in the revised text, (p 15, line 366-367).
  
  (18) The support for the effect of KO on the "clustering of DCVs" is not convincing.
  
  The intensity of endogenous ChgB puncta was decreased in RPH3A KO neurons (now Figure 5E). However, the peak intensity induced by single NPY-pHluorin labeled DCV fusion events (quanta) was unchanged (now Figure S2I). This indicates that the decrease in ChgB puncta intensity must be due to a reduced number of DCVs (quanta) in this specific location. We have interpreted that as ‘clustering’, or maybe ‘accumulation’. However, we only put forward this possibility. We are now more careful in our speculations within the text, (p 11 line 271-277).
  
  (19) Final sentence: "where RPH3A binds available SNAP25, consequently restricting the assembly of SNARE complexes" should be either demonstrated or rephrased as no effect of trans or general SNARE complex formation is shown.
  
  We agree. We have made the necessary adjustments in the text, (p 15, line 387-389).
  
  (20) A scheme summarizing RPH3A's interaction with synaptic proteins and its effects on DCVs release, maybe even versus its effects on SVs release, should be considered as a figure or graphic abstract.
  
  We have included a working model in Figure 7.
  
  (21) Figure 4 logically should come after Figure 2 to summarize the fusion-related chapter before moving to neurite elongation.
  
  We have placed Figure 4 after Figure 2 (now Figure 3).
  
  Reviewer #3 (Recommendations For The Authors):
  
  One important finding of this study is that RPH3A downregulates neuron size, possibly by inhibiting DCV release. Additionally, the authors demonstrated that the number of DCVs is directly proportional to the number of DCVs per µm2, and that RPH3A KO reduces DCV clustering. This conclusion was drawn by comparing ChgB with NPY-pHluorin loading of the DCVs. However, this comparison is not valid as ChgB is expressed at an endogenous level and NPY-pHluorin is over-expressed. In the KO situation where DCV exocytosis is enhanced, the available endogenous ChgB may be depleted faster than the overexpressed NPY-pHluorin. Hoogstraaten et al. should either perform a study in which ChgB is overexpressed to test whether the difference in DCV remains or at least provides an alternative interpretation of their data.
  
  We thank the reviewer for this comment. The reviewer challenges one or two conclusions in our original manuscript (it is not entirely clear what exactly “This conclusion” refers to): (a) “the number of DCVs is directly proportional to the number of DCVs per µm2”, and (b) “that RPH3A KO reduces DCV clustering”. For (a), the reviewer probably means that the number of DCVs per neuron is directly proportional to the size of the neuron, and states that this (these) conclusion(s) are “not valid as ChgB is expressed at an endogenous level and NPY-pHluorin is over-expressed” because “endogenous ChgB may be depleted faster than the overexpressed NPY-pHluorin”. We have three arguments to conclude that faster depletion of ChgB cannot affect these two conclusions: (1) DCVs bud off from the Golgi with newly synthesized (fresh) ChgB. Whether or not a larger fraction of DCVs is released does not influence this initial ChgB loading into DCVs (together with over-expressed NPY-pHluorin); (2) in hippocampal neurons merely 1-6% of the total DCV pool undergoes exocytosis (the current study and also extensively demonstrated in Persoon et al., 2018). RPH3A KO neurons release a few percent more of the total DCV pool. Hence, “depletion of ChgB” is only marginally different between experimental groups; and (3) the proposed experiment overexpressing ChgB will not help scrutinize our current conclusions as ChgB overexpression is known to affect DCV biogenesis and the total DCV pool, most likely much more than a few percent more release by RPH3A deficiency.
  
  Hoogstraaten et al. conducted a thorough analysis of the impact of RPH3A KO and its rescue using various mutants on dendrite and axon length (see Supplementary Figure 3). However, they did not test the effect of the ΔSNAP25 mutant. The authors demonstrated that this mutant is the least efficient in rescuing DCV exocytosis (Figure 4E). Hence the neurons expressing this mutant should have a similar size to the KO neurons. This finding would strongly support the argument that DCV exocytosis regulates neuron size. Otherwise, it would suggest that RPH3A may have a function in regulating exocytosis at the growth cones that is independent of SNAP25. Since the authors most probably have the data that allows them to measure the neuron size (acquired for Supplementary Figure 2), I suggest that they perform the required analysis.
  
  We agree this is important and performed new experiments to determine the dendrite length of RPH3A WT, KO and KO neurons expressing the ΔSNAP25 mutant. We observed that the dendrite length of RPH3A KO neurons expressing the ΔSNAP25 mutant is indeed similar to that of KO neurons (new Figure S3C). Although the difference is not significant, we observe a clear trend towards bigger neurons compared to WT. This strengthens our conclusion that increased DCV exocytosis contributes to the observed increased neuronal size.
  
  The authors displayed the result of DCV exocytosis in two ways. One is by showing the number of exocytosis events the other is to display the proportion of DCVs that were secreted. They do the latter by dividing the secreted DCV by the total number of DCVs. These are visualized at the end of the experiment through NH4+ application. While this method works well for synaptic secretion as the marker of SV is localized to the SV membrane and remains at the synapse upon SV exocytosis, it cannot be applied in the same manner when it is the DCV content that is labeled as it is released upon secretion. Hence, the total pool of vesicles should be the number of DCV counted upon NH4+ application in addition to those that are secreted. This way of analyzing the total pool of DCV might also explain the difference in this pool size between KO neurons stimulated two times with 8 stimuli instead of one time with 16 stimuli (Sup Fig 2 C and D). This is an important point as it affects the conclusions drawn from Figure 2.
  
  We thank the reviewer for this comment. We agree, and we have made the necessary adjustments throughout the manuscript.
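
  The reviewer's point about the denominator can be made concrete with a small sketch. Both function names and the counts are hypothetical; the second function implements the correction as the reviewer states it (puncta counted upon NH4+ application plus the vesicles already secreted) and is not claimed to be the exact computation used in the revised manuscript.

```python
def fraction_remaining_only(fusion_events: int, nh4_puncta: int) -> float:
    """Released fraction when only puncta seen upon NH4+ form the pool."""
    return fusion_events / nh4_puncta


def fraction_with_released(fusion_events: int, nh4_puncta: int) -> float:
    """Released fraction when the pool also includes vesicles already secreted.

    For a cargo reporter the fused vesicles have lost their content, so they
    are absent from the NH4+ count and are added back to the denominator.
    """
    return fusion_events / (nh4_puncta + fusion_events)


# Hypothetical counts for one cell, for illustration only.
events, remaining = 50, 950
print(fraction_remaining_only(events, remaining))  # slightly overestimates
print(fraction_with_released(events, remaining))
```

  The two definitions diverge more as the number of fusion events grows, so the choice matters most for the genotype and stimulation protocol with the strongest release.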
  
  The kymogram of DCV exocytic events displayed in Figure 2D shows a majority of persistent (>20s long) events. This is strange as NPY-pHluorin corresponds to the released cargo. Previous work using the same labeling and stimulation technique showed that content release occurs in less than 10s (Baginska et al. 2023). The authors should comment on that difference.
  
  In Baginska et al. (2023), the authors distinguished between persistent and transient events. The transient events are shorter than 10s for the 2x8 and 16x stimulation paradigms, whereas persistent events can last for more than 10s. In our study we did not make this distinction. However, in response to this reviewer, we have now quantified the fusion duration per cell. These new data show that the mean duration is similar between genotypes for both stimulation paradigms. We have added these new data (new Figure S2D-F).
  
  In Figures 1D and E, some puncta in the kymogram appeared to persist after bleaching. This raises questions about the effectiveness of the bleaching procedure for the FRAP experiment.
  
  The reviewer is correct that NPY-pHluorin in Figure 1E (now Figure 2C) is not fully bleached. NPY-pHluorin was more resistant to bleaching than NPY-mCherry. However, we merely bleached the neurites to facilitate our analysis by reducing fluorescence of the stationary puncta without causing phototoxicity. Some remaining fluorescence after bleaching does not affect our conclusions in any way.
  
  In the discussion, the paragraph titled "RPH3A does not travel with DCVs in hippocampal neurons" is quite confusing and would benefit from a streamlined explanation.
  
  We thank the reviewer for this comment. We made the necessary adjustments to make this paragraph clearer (p 14, line 339-351).
  
  First paragraph of page 8 "TeNT expression in KO neurons restored neurite length to WT levels. When compared to KO neurons without TeNT, neurite length was not significantly decreased but displayed a trend towards WT levels (Figure 3G, H)." These two sentences are confusing as they seem contradictory.
  
  We agree that this conclusion was too strong. However, we do not see a contradiction. The significant effect between KO and control neurons on both axon and dendrite length is lost upon TeNT expression (which forms the basis for our conclusions cited by the reviewer, now Figure 6B, C). While the difference between KO neurons +/- TeNT did not reach statistical significance, the (strong) trend is clearly in the same direction. We have refined our original conclusion in the revised manuscript (p 12, line 304-306).
  
  The data availability statement is missing.
  
  We have added the data availability statement, (p 21, line 571-572).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.01.05.574432v2
www.biorxiv.org www.biorxiv.org

Artificial selection for microbial collective composition can succeed or fail depending on the initial and target values

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Common comments
  
  (1) Significance of zero mutation rate
  
  Reviewers asked why we included a mutation rate even though setting the mutation rate to zero doesn’t change the results. We think that including a non-zero mutation rate makes our results more generalisable, and thus is a strength rather than a weakness. To better motivate this choice, we have added a sentence to the beginning of Results:
  
  (2) Writing the mu=0 case first
  
  Reviewers suggested that we should first focus on the mu=0 case, and then generalize the result. The suggestions are certainly good. However, given the large amount of work involved in a re-organization, we have decided to adhere to our current narrative. However, we now only include equations where mu=0 in the main text, and have moved the case of nonzero mutation rate to Supplementary Information.
  
  (3) Making equations more accessible
  
  We have taken three steps to make equations more readable.
  
  ● Equations in the main text correspond to the case of zero-mutation rate.
  
  ● The original section on equation derivation is now in a box in the main text so that readers have the choice of skipping it but interested readers can still get a gist of where equations came from.
  
  ● We have provided a much more detailed interpretation of the equation (see page 10).
  
  (4) Validity of the Gaussian approximation
  
  Reviewers raised concerns about the validity of the Gaussian approximation for the F frequency 𝑓(𝜏). The fact that our calculations closely match simulations suggests that this approximation is reasonable. Still, we added a discussion about the validity of this approximation in Box 1.
  
  We also added a figure to the SI covering various cases of initial S and F sizes. This figure shows that when either initial S or initial F is small, the distribution of 𝑓(𝜏) is not normal. However, if initial S and F are both on the order of hundreds, then the distribution of 𝑓(𝜏) is approximately Gaussian.
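
  The dependence on initial sizes can be explored with a minimal simulation sketch. It assumes a pure-birth (Yule) process for each cell type with no mutation, uses made-up growth rates and maturation time, and uses sample skewness as a crude indicator of non-normality; none of these choices are taken from the paper, whose own model and parameters may differ.

```python
import math
import random


def yule_size(n0, rate, tau, rng):
    """Population size at time tau for a pure-birth process started from n0 cells.

    Each founding lineage has a geometric size with success probability
    exp(-rate * tau); lineages are independent and are summed.
    """
    log_q = math.log(1.0 - math.exp(-rate * tau))
    total = 0
    for _ in range(n0):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        total += 1 + int(math.log(u) / log_q)
    return total


def sample_f(s0, f0, n_samples, rng, r_s=1.0, r_f=1.1, tau=3.0):
    """Sample the F frequency after maturation (rates and tau are hypothetical)."""
    samples = []
    for _ in range(n_samples):
        s = yule_size(s0, r_s, tau, rng)
        f = yule_size(f0, r_f, tau, rng)
        samples.append(f / (s + f))
    return samples


def skewness(xs):
    """Sample skewness; close to zero for a symmetric, Gaussian-like sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
skew_small_f = skewness(sample_f(200, 2, 2000, rng))    # initial F is small
skew_both_large = skewness(sample_f(200, 200, 2000, rng))  # both in the hundreds
print(skew_small_f, skew_both_large)
```

  Skewness near zero does not prove normality, so a fuller check would compare histograms or quantiles against the fitted Gaussian.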
  
  Public Reviews:
  
  Summary:
  
  The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.
  
  Strengths:
  
  To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle are important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.
  
  This work may or may not be hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamic systems.
  
  We have added this clarification in the main text:
  
  “Note that here, selection outcome is path-dependent in the sense of being sensitive to initial conditions. This phenomenon is distinct from hysteresis where path-dependence results from whether a tuning parameter is increased or decreased.”
  
  Weaknesses:
  
  (1) Connecting structure and function
  
  In typical artificial selection literature, most of them select the community based on collective function. Here in this paper, the authors are selecting a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main Figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in literature.
  
  We have added this to introduction: “This is a common quest: whenever a collective function depends on both populations, collective function is maximised, by definition, at an intermediate frequency (e.g. too little of either population will hamper function [23]).”
  
  (2) Explain intra-collective and inter-collective selection better for readers.
  
  The abstract, the introduction, and the results section use the terms intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.
  
  This is a great point. We have added in Abstract:
  
  “Such collective selection is dictated by two opposing forces: during collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Due to this model structure, maintaining a target frequency requires the continued action of inter-collective selection.”
  
  and in Introduction
  
  “A selection cycle consists of three stages (Fig. 1). During collective maturation, intra-collective selection favors fast-growing individuals within a collective. At the end of maturation, inter-collective selection acts on collectives and favors those achieving the target composition. Finally during collective reproduction, offspring collectives sample stochastically from the parents, a process dominated by genetic drift.”
  
  (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.
  
  I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main Figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck is imposed on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.
  
  We have added to the SI an analytical approximation for N̆<sub>0</sub>, the Newborn size below which all target frequencies can be achieved.
  
  We have also added lines indicating N̆<sub>0</sub> in Fig. 4a.
  
  (4) Consideration of environmental stochasticity.
  
  The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.
  
  You are correct that our work considers only demographic stochasticity.
  
  Indeed, considering other types of stochasticity will be an exciting future research direction. We added in the main text:
  
  “Overall our model considers mutational stochasticity, as well as demographic stochasticity in terms of stochastic birth and stochastic sampling of a parent collective by offspring collectives. Other types of stochasticity, such as environmental stochasticity and measurement noise, are not considered and require future research.”
  
  (5) Assumption about mutation rates
  
  If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.
  
  This is a great point. We have added this to the beginning of Results to better motivate our study:
  
  “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations. This scenario is encountered in biotechnology: an engineered pathway will slow down growth, and breaking the pathway (and thus faster growth) is much easier than the other way around. When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”
  
  See Point 1 of Common comments.
  
  (6) Minor points
  
  In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.
  
  We added a description in caption 3b.
  
  In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?
  
  We increased the resolution of Fig 5b so that the gold region near FF is more visible.
  
  We have added Fig 5c and the following explanation to the main text:
  
  “From numerical simulations, we identified two accessible regions: a small region near FF and a band region spanning from S to F (gold in Fig. 5b i). Intuitively, the rate at which FF grows faster than S+F is greater than the rate at which F grows faster than S (see section VIII in Supplementary Information). Thus, the problem can initially be reduced to a two-population problem (i.e. FF versus F+S; Fig. 5c left), and then expanded to a three-population problem (Fig. 5c right).”
  
  Recommendations For The Authors
  
  Since the conclusion of the model greatly depends on the noise (variation) of F and S in the Gaussian distribution, it would be nice to have a plot where the y-axis is the variation in terms of frequency and the x-axis is the s_0 or f_0 (frequency). In the plot, I would love to see how the variation in the frequency depends on the initial frequency of S and F. Maybe this is just trivial.
  
  In the SI, we added Fig6a, as per your request. Previous Fig6 became Fig6b.
  
  Reviewer #2 (Public review):
  
  The authors provide an analytical framework to model the artificial selection of the composition of communities composed of strains growing at different rates. Their approach takes into account the competition between the targeted selection at the level of the meta-community and the selection that automatically favors fast-growing cells within each replicate community. Their main finding is a tipping point or path-dependence effect, whereby compositions dominated by slow-growing types can only be reached by community-level selection if the community does not start and never crosses into a range of compositions dominated by fast growers during the dynamics.
  
  These results seem to us both technically correct and interesting. We commend the authors on their efforts to make their work reproducible even when it comes to calculations via extensive appendices, though perhaps a table of contents and a short description of these appendices at the start of SI would help navigate them.
  
  Thank you for the suggestion. We have added a paragraph at the beginning of SI.
  
  The main limitation in the current form of the article is that it could clarify how its assumptions and findings differ from and improve upon the rest of the literature:
  
  - Many studies discuss the interplay between community-level evolution and species- or strain-level evolution. But "evolution" can be a mix of various forces, including selection, drift/randomness, and mutation/innovation.
  
  - This work's specificity is that it focuses strictly on constant community-level selection versus constant strain-level selection, all other forces being negligible (neither stochasticity nor innovation/mutation matter at either level, as we try to clarify now).
  
  Note that intra-collective selection is not strictly “constant” in the sense that selection favoring F is the strongest at intermediate F frequency (Fig 3). However, we think that you mean that intra- and inter-collective selection are present in every cycle, and this is correct for our case, and for community selection in general.
  
  - Regarding constant community-level selection, it is only briefly noted that "once a target frequency is achieved, inter-collective selection is always required to maintain that frequency due to the fitness difference between the two types" [pg. 3 §2]. In other words, action from the selector is required indefinitely to maintain the community in the desired state. This assumption is found in a fraction of the literature, but is still worth clarifying from the start as it can inform the practical applicability of the results.
  
  This is a good point. We have added to abstract:
  
  “Such collective selection is dictated by two opposing forces: during collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Due to this model structure, maintaining a target frequency requires the continued action of inter-collective selection.”
  
  - More importantly, strain-level evolution also boils down here to pure selection with a constant target, which is less usual in the relevant literature. Here, (1) drift from limited population sizes is very small, with no meaningful counterbalancing of selection, (2) pure exponential regime with constant fitness, no interactions, no density- or frequency-dependence, (3) there is no innovation in the sense that available types are unchanging through time (no evolution of traits such as growth rate or interactions) and (4) all the results presented seem unchanged when mutation rate mu = 0 (as noted in Appendix III), meaning that the conclusions are not "about" mutation in any meaningful way.
  
  With regard to point (1), Figure 4a (reproduced below) shows how Newborn size affects the region of achievable targets. Indeed at large Newborn size (e.g. 5000 and above), no target frequency is achievable (since drift is too small to generate sufficient inter-community variation and consequently all communities are dominated by fast-growing F). However at Newborn size of for example 1000, there are two regions of accessible target frequencies. At smaller Newborn size, all target frequencies become achievable due to drift becoming sufficiently strong.
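
To make the role of Newborn size concrete, here is a minimal Python sketch of the drift introduced at reproduction. It assumes, as stated elsewhere in this response, that the composition of a Newborn is drawn binomially from its parent. The function name and the example values are illustrative and are not taken from the manuscript.

```python
import math


def newborn_frequency_sd(f_parent: float, n0: int) -> float:
    """Standard deviation of the F frequency among Newborns of size n0
    sampled binomially from a parent whose F frequency is f_parent."""
    if n0 <= 0:
        raise ValueError("n0 must be a positive integer")
    if not 0.0 <= f_parent <= 1.0:
        raise ValueError("f_parent must lie in [0, 1]")
    # Binomial variance of a proportion: f (1 - f) / N0.
    return math.sqrt(f_parent * (1.0 - f_parent) / n0)


if __name__ == "__main__":
    # Illustrative Newborn sizes; the parent frequency of 0.5 is arbitrary.
    for n0 in (100, 1000, 5000):
        print(n0, newborn_frequency_sd(0.5, n0))
```

The spread scales as 1/sqrt(N0), which is consistent with the statement above that large Newborns generate too little inter-collective variation for selection to act on.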
  
  With regard to points (2) and (3), we have added to Introduction
  
  “To enable the derivation of an analytical expression, we have made the following simplifications.
  
  First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult.”
  
  With regard to point (4), we view this as a strength rather than weakness. We have added the following to the beginning of Results and Discussions:
  
  “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations.”
  
  “When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”
  
  See Point 1 of Common comments.
  
  - Furthermore, the choice of mutation mechanism is peculiar, as it happens only from slow to fast grower: more commonly, one assumes random non-directional mutations, rather than purely directional ones from less fit to fitter (which is more of a "Lamarckian" idea). Given that mutation does not seem to matter here, this choice might create unnecessary opposition from some readers or could be considered as just one possibility among others.
  
  We have added the following justification:
  
  “This scenario is encountered in biotechnology: an engineered pathway will slow down growth, and breaking the pathway (and thus faster growth) is much easier than the other way around.”
  
  It would be helpful to have all these points stated clearly so that it becomes easy to see where this article stands in an abundant literature and contributes to our understanding of multi-level evolution, and why it may have different conclusions or focus than others tackling very similar questions.
  
  Finally, a microbial context is given to the study, but the assumptions and results are in no way truly tied to that context, so it should be clear that this is just for flavor.
  
  We have deleted “microbial” from the title, and revised our abstract:
  
  Recommendations For The Authors
  
  (1) More details concerning our main remark above:
  
  - The paragraph discussing refs [24, 33] is not very clear in how they most importantly differ from this study. Our impression is that the resource aspect is not very important for instance, and the main difference is that these other works assume that strains can change in their traits.
  
  We are fairly sure that resource depletion is important in the Rainey group’s study, as the attractor only evolved after both strains grew fast enough to deplete resources by the end of maturation. Indeed, evolution occurred in the interaction coefficients, which dictate the competition between strains for resources.
  
  Regardless, you raised an excellent point. As discussed earlier, we have added the following:
  
  “To enable the derivation of an analytical expression, we have made the following simplifications.
  
  First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult.”
  
  - We would advise the main text to focus on mu = 0, and only say in discussion that results can be generalized.
  
  Your suggestion is certainly good. However, given the large amount of work involved in a reorganisation, we have decided to adhere to our current narrative. As discussed earlier, we have nonetheless added this at the beginning of Results to help orient readers:
  
  “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations.”
  
  “When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”
  
  (2) We think the material on pg. 5 "Intra-collective evolution is the fastest at intermediate F frequencies, creating the "waterfall" phenomenon", although interesting, could be presented in a different way. The mathematical details on how to find the probability distribution of the maximum of independent random variables (including Equation 1) will probably be skipped by most of the readers (for experienced theoreticians, it is standard content; for experimentalists, it is not the most relevant), as such I would recommend displacing them to SM and report only the important results.
  
  This is an excellent suggestion. We have put a sketch of our calculations in a box in the main text to help orient interested readers. As before, details are in SI.
  
  Similarly, Equations 2, 3, and 4 are hard to read given the large amount of parameters and the low amount of simplification. Although exploring the effect of the different parameters through Figures 3 and 4 is useful, I think the role of the equations should be reconsidered:
  
  i. Is it possible to rewrite them in terms of effective variables in a more concise way?
  
  See Point 3 of Common comments.
  
  ii. Is it possible to present extreme/particular cases in which they are easier to interpret?
  
  We have focused on the case where the mutation rate is zero. This makes the mathematical expressions much simpler (see above).
  
  (3) Is it possible to explain more in detail why the distribution of f_k+1 conditional to f_k^* is well approximated by a Gaussian? Also, have you explored to what extent the results would change if this were not true (in light of the few universal classes for the maximum of independent variables)?
  
  Despite the appeal to the CLT and the histograms in the Appendix suggesting that the distribution looks a bit like a Gaussian at a certain scale, fluctuations on that scale are not necessarily what is relevant for the results - a rapid (and maybe wrong) attempt at a characteristic function calculation suggests that in your case, one does not obtain convergence to Gaussians unless we renormalize by S(t=0) and F(t=0), so it seems there is a justification missing in the text as is for the validity of this approximation (or that it is simply assumed).
  
  See point 4 of Common comments.
  
  Reviewer #3 (Public Reviews):
  
  The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such differences in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but the collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) larger.
  
  A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.
  
  I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection, but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.
  
  A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.
  
  Thanks so much for these positive comments.
  
  I however found the description of the results too succinct and I think that more could be done to unpack the mathematical results in a way that is understandable to a broader audience. Moreover, the phenomenon the authors characterize is of purely ecological nature. Here, mutations of the growth rate are, in my understanding, neither necessary (non-trivial equilibria can be maintained also when \mu =0) nor sufficient (community-level selection is necessary to keep the system far from the absorbing state) for the phenomenon described. Calling this dynamics community evolution reflects a widespread ambiguity, and is not ascribable just to this work. I find that here the authors have the opportunity to make their message clearer by focusing on the case where the 'mutation' rate \mu vanishes (Equations 39 & 40 of the SI) - which is more easily interpretable, at least in some limits - while they may leave the more general equations 3 & 4 in the SI.
  
  See points 1-4 of Common comments.
  
  Combined with an analysis of the deterministic equations, that capture the possibility of maintaining high frequencies of fast growers, the authors could elucidate the dynamics that are induced by the presence of a second level of selection, and speculate on what would be the result of real open-ended evolution (not encompassed by the simple 'switch mutations' generally considered in evolutionary game theory), for instance discussing the invasibility (or not) of mutant types with slightly different growth rates.
  
  Indeed, evolution is not restricted to two types. However, our main goal here is to derive an analytical expression, and it was difficult for even two types. For three-type collectives, we had to resort to simulations. Investigating the case where fitness effects of mutations are continuously distributed is beyond the scope of this study.
  
  The single most important model hypothesis that I would have liked to be discussed further is that the two types do not interact. Species interactions are not only essential to achieve inheritance of composition in the course of evolution but are generally expected to play a key role even on ecological time scales. I hope the authors plan to look at this in future work.
  
  In our system, the S and F do interact in a competitive fashion: even though S and F are not competing for nutrients (which are always in excess), they are competing for space. This is because a fixed number of cells are transferred to the next cycle. Thus, the presence of F will for example reduce the chance of S being propagated. We have added this clarification to our main text:
  
  “Note that even though S and F do not compete for nutrients, they compete for space: because the total number of cells transferred to the next cycle is fixed, an overabundance of one population will reduce the likelihood of the other being propagated.”
  
  Recommendations For The Authors
  
  I felt the authors could put some additional effort into making their theoretical results meaningful for a population of readers who, though not as highly mathematically educated as they are, can nonetheless appreciate the implications of simple relations or scaling. Below, you find some suggestions:
  
  (1) In order to make it clear that there is a 'natural' high-frequency equilibrium that can be reached even in the absence of selection, the authors could examine first the dynamics of the deterministic system in the absence of mutations, and use its equilibria to elucidate the combined role of the 'fitness' difference \omega and of the generation duration \tau in setting its value. The fact that these parameters always occur in combination (when there are no mutations) is a general and notable feature of the stochastic model as well. Moreover, this model would justify why you only focus on decreasing the frequency in the new generation.
  
  Note that the ‘natural’ high-frequency equilibrium in the absence of collective selection is when fast grower F becomes fixed in the population. Following your suggestion, we have introduced two parameters R<sub>τ</sub> and W<sub>τ</sub> to reflect the coupling between ‘fitness’ and ‘generation duration’:
  
  (2) Since the phenomenon described in the paper is essentially ecological in nature (as the author states, it does not change significantly if the 'mutation rate' \mu is set to zero), I would put in the main text Equations 39 & 40 of the SI in order to improve intelligibility.
  
  See Point 2 at the beginning of this letter.
  
  These equations can be discussed in some detail, especially in the limit of small f^*_k, where I think it is worth discussing the different dependence of the mean and the variance of the frequency distribution on the system's parameters.
  
  This is a great suggestion. We have added the following:
  
  “In the limit of small f^*_k, Equation (3) becomes f while Equation (4) becomes . Thus, both Newborn size (N<sub>0</sub>) and fold-change in F/S during maturation (W<sub>τ</sub>) are important determinants of selection progress.”
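
The coupling between fitness difference and maturation time can be illustrated with a short deterministic sketch in Python. It is not the manuscript's Equation (3) or (4). It assumes exponential growth without mutation, with \omega read as the growth-rate advantage of F over S, so that the fold-change in F/S over a maturation time \tau is exp(\omega \tau). The function names are hypothetical.

```python
import math


def fold_change(omega: float, tau: float) -> float:
    """Fold-change in the F/S ratio after maturation time tau, assuming
    F grows exponentially at a rate omega above that of S (an assumption)."""
    return math.exp(omega * tau)


def matured_frequency(f0: float, w_tau: float) -> float:
    """Deterministic F frequency after maturation, given the Newborn
    F frequency f0 and the F/S fold-change w_tau (no mutation)."""
    if not 0.0 <= f0 <= 1.0:
        raise ValueError("f0 must lie in [0, 1]")
    # F/S is multiplied by w_tau, so f -> f w / (1 - f + f w).
    return f0 * w_tau / (1.0 - f0 + f0 * w_tau)


if __name__ == "__main__":
    w = fold_change(omega=0.03, tau=4.8)  # illustrative values only
    for f0 in (0.001, 0.1, 0.5, 0.9):
        print(f0, matured_frequency(f0, w))
```

For small f0 the update reduces to approximately w_tau times f0, so in this deterministic picture \omega and \tau enter only through their product.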
  
  (3) I would have appreciated an explanation in words of what are the main conceptual steps involved in attaining Equation 2, the underlying hypotheses (notably on community size and distributions), and the expected limits of validity of the approximation.
  
  See points 3 and 4 at the beginning of this letter.
  
  (4) I think that some care needs to be put into explaining where extreme value statistics is used, and why is the median of the conditional distribution the most appropriate statistics to look at for characterizing the evolutionary trajectory (which seems to me mostly reliant on extreme values).
  
  Great point! We added an explanation of why we use the median value in Box 1, and also added Figure 7 in the SI to explain it.
  
  Showing in a figure the different distributions you are considering (for instance, plotting the conditional distribution for one generation in the trajectories displayed in Figure 2) would be useful to understand what information \bar f provides on a sequence of collective generations, where in principle there may be memory effects.
  
  Thanks for this suggestion. We have added to Fig. 2d an illustration of the shape and position of the F frequency distributions at each step of the first two selection cycles.
  
  (5) Similarly, I do not understand why selecting the 5% best communities should push the system's evolution towards the high-frequency solution, instead of just slowing down the improvement (unless you are considering the average composition of the top best communities - which should be justified). I think that such sensitivity to the selection intensity should be appropriately referenced and discussed in the main text, as it is a parameter that experimenters are naturally led to manipulate.
  
  In the main text, we have added this explanation:
  
  “In contrast with findings from an earlier study [23], choosing top 1 is more effective than the less stringent “choosing top 5%”. In the earlier study, variation in the collective trait is partly due to nonheritable factors such as random fluctuations in Newborn biomass. In that context, a less stringent selection criterion proved more effective, as it helped retain collectives with favorable genotypes that might have exhibited suboptimal collective traits due to unfavorable nonheritable factors. However, since this study excludes nonheritable variations in collective traits, selecting the top 1 collective is more effective than selecting the top 5% (see Fig. 11 in Supplementary Information).”
  
  (6) Equation 1 could be explained in simpler terms as the product between the probability that one collective reaches the transmitted value times the probability that all others do worse than that. The current formulation is unclear, perhaps just a matter of English formulation.
  
  We have revised our description to state:
  
  “Equation (1) can be described as the product between two terms related to probability: (i) describes the probability density that any one of the g Adult collectives achieves f given , and (ii) describes the probability that all other g – 1 collectives achieve frequencies above f and are thus not selected.”
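
The verbal description of Equation (1) matches the standard density of the minimum of g independent draws. The Python sketch below writes that out, assuming each Adult's F frequency follows the same Gaussian (the approximation discussed in this response) and that the collective with the lowest F frequency is the one selected. The function names and the parameter values in the demo are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def selected_density(f: float, mean: float, sd: float, g: int) -> float:
    """Density of the lowest F frequency among g independent Adults:
    (i) any one of the g Adults lands at f, times
    (ii) the other g - 1 Adults all land above f."""
    term_i = g * normal_pdf(f, mean, sd)
    term_ii = (1.0 - normal_cdf(f, mean, sd)) ** (g - 1)
    return term_i * term_ii


def median_of_selected(mean: float, sd: float, g: int) -> float:
    """Median of the selected (lowest) frequency, found by bisection.
    The CDF of the minimum is 1 - (1 - Phi)^g, so the median solves
    Phi(f) = 1 - 0.5 ** (1 / g)."""
    target = 1.0 - 0.5 ** (1.0 / g)
    lo, hi = mean - 10.0 * sd, mean + 10.0 * sd
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid, mean, sd) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative values: Adults centred at 0.30 with s.d. 0.02, g = 10.
    print(median_of_selected(0.30, 0.02, 10))
```

With g = 1 the median coincides with the mean of the Gaussian. Increasing g moves the median of the selected frequency further below that mean.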
  
  (7) I think that the discussion of the dependence of the boundaries of the 'waterfall' region with the difference in growth rate \omega is important and missing, especially if one wants to consider open-ended evolution of the growth rate - which can occur at steps of different magnitude.
  
  We added a new chapter and figure in supplementary information on the threshold values when \omega varies. As expected, smaller \omega enlarges the success area.
  
  We have also added a new figure panel to show how maturation time affects selection efficacy.
  
  (8) Notations are a bit confusing and could be improved. First of all, in most equations in the main text and SI, what is initially introduced as \omega appears as s. This is confusing because the letter s is also used for the frequency of the slow type.
  
  The letter S is used to denote an attribute of cells (S cells), the type of cells (Equations 1-3 of the SI) and the number of these cells in the population, sometimes with different meanings in the same sentence. This is confusing, and I suggest referring to slow cells or fast cells instead (or at least to S-cells and F-cells), and keeping S and F as variables for the number of cells of the two types.
  
  All typos related to the notation have been fixed. We use S and F as types, and S and F (italic) as population numbers.
  
  (9) On page 3, when introducing the sampling of newborns as ruled by a binomial distribution, the information that you are just transmitting one collective is needed, while it is conveyed later.
  
  We have added this emphasis:
  
  “At the end of a cycle, a single Adult with the highest function (with F frequency f closest to the target frequency ) is chosen to reproduce g Newborn collectives each with N<sub>0</sub> cells (‘Selection’ and ’Reproduction’ in Fig. 1).”
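
One selection cycle as described here can be sketched in Python. This is a toy illustration under stated assumptions, not the authors' simulation code: mutation is set to zero, each cell's lineage grows as an independent pure-birth process (so the number of descendants after time tau is geometric with parameter exp(-rate * tau)), and all parameter values are made up for the demo. The names `yule_offspring` and `run_cycle` are hypothetical.

```python
import math
import random


def yule_offspring(n_cells: int, rate: float, tau: float, rng: random.Random) -> int:
    """Total descendants of n_cells after time tau in a pure-birth process.
    Each lineage size is geometric on {1, 2, ...} with p = exp(-rate * tau)."""
    if rate * tau <= 0.0:
        raise ValueError("rate * tau must be positive")
    log_q = math.log(1.0 - math.exp(-rate * tau))
    total = 0
    for _ in range(n_cells):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        total += 1 + int(math.log(u) / log_q)  # inverse-transform sampling
    return total


def run_cycle(f_parent, target, n0, g, r, omega, tau, rng):
    """Reproduce g Newborns from one parent, mature them, and return the
    Adult F frequency closest to the target."""
    adults = []
    for _ in range(g):
        # Reproduction: each of the n0 Newborn cells is F with prob f_parent.
        f_cells = sum(1 for _ in range(n0) if rng.random() < f_parent)
        s_cells = n0 - f_cells
        # Maturation: F grows at rate r + omega, S at rate r (assumed).
        f_adult = yule_offspring(f_cells, r + omega, tau, rng)
        s_adult = yule_offspring(s_cells, r, tau, rng)
        adults.append(f_adult / (f_adult + s_adult))
    # Selection: keep the single Adult closest to the target frequency.
    return min(adults, key=lambda f: abs(f - target))


if __name__ == "__main__":
    rng = random.Random(0)
    f = 0.1
    for cycle in range(20):
        f = run_cycle(f, target=0.1, n0=1000, g=10, r=0.5, omega=0.03, tau=4.8, rng=rng)
        print(cycle, f)
```

Because only a fixed number of cells (n0) is passed to each Newborn, any excess of F at the end of maturation directly reduces the number of S cells propagated.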
  
  (10) I found that the abstract talks too early about the 'waterfall' phenomenon. As this is a concept introduced here, I suggest the authors first explain what it is, then use the term. It is a useful metaphor, but it should not obscure the more formal achievements of the paper.
  
  We feel that the “waterfall” analogy offers a gentle helping hand to orient those who have not thought much about the phenomenon. We view the abstract as an opportunity to attract readership, and thus the more accessible the better.
  
  (11) In the SI there are numerous typos and English language issues. I suggest the authors read carefully through it, and add line numbers to the next version so that more detailed feedback is possible.
  
  Thank you for going through SI. We have gone through the SI, and fixed problems.
  
URL

biorxiv.org/content/10.1101/2023.03.07.531234v4
www.biorxiv.org

New submission 16/11/2023, 09:29:54

1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We sincerely thank and express our appreciation to each of the reviewers for their thorough critique of our manuscript.
  
  Reviewer #1 (Recommendations For The Authors):
  
  The analysis of the whole study comes from only 4 cells from L2/3 of ferret visual cortex; however, it is well known that there is a high level of functional heterogeneity within the cortical neurons. Do those four neurons have similar or different response properties? If the four neurons are functionally different, the weak or absent correlation may result from a heterogeneous distribution pattern of mitochondria associated with heterogeneous functionality.
  
  This is an important consideration and often a limitation of CLEM studies. While cortical neurons do exhibit a high degree of functional heterogeneity (similar to spine activity), the 4 cells examined all had selective (OSI > 0.4) somatic responses to oriented gratings, although they differed in their exact orientation preference. Due to experimental limitations of recording from a large population of dendritic spines, we did not characterize other response properties for which their sensitivity might differ. We did not consider orientation preference a metric of study, but instead characterized the difference in preference from the somatic output, allowing comparisons across spines. In addition, our measurements were limited to proximal, basal dendrites rather than any location in the dendritic tree. Nonetheless, we attempted to address this concern by examining spines with functionally heterogeneous visual responses within single cells, as reported in our manuscript: mitochondrial volume within a 1 µm radius was correlated with the difference in orientation preference relative to the soma across all 4 cells (mean r = 0.49 ± 0.22 s.d.), suggesting that cell-to-cell variability has a minimal impact on our main conclusions.
  
  Even with our limited measurements, there is a large amount of functional heterogeneity in dendritic spine responses (Extended Data Figure 2, Scholl et al. 2021), far greater than the small differences in somatic responses of these 4 cells (Figure 3, Scholl et al. 2021). Moreover, the individual dendrites from these 4 cells exhibited substantial heterogeneity in the distribution of mitochondria. We cannot rule out whether heterogeneity at various scales may obscure certain relationships or result in the weak correlations we observed. We also note that future advancements in volume electron microscopy should allow for greater sample sizes that can better address the role of functional (and structural) heterogeneity. We have added text in the Discussion section about the potential structure-function relationships that might be obscured or revealed by neuron heterogeneity.
  
  The authors argued that "mitochondria are not necessarily positioned to support the energy needs of strong spines." However, the overall energy needs of a spine depend not only on the synaptic strength but also on the frequency of synaptic activity. Is there a correlation between the mitochondria volume around a spine and the overall activity of the spine? This data needs to be analyzed to confirm the distribution of mitochondria is independent of local energy needs.
  
  The reviewer is correct, but our experimental paradigm was not optimally designed to measure the ‘frequency’ of synaptic activity in vivo. This could have been accomplished with flashed gratings or, perhaps, presenting drifting gratings at different temporal frequencies. For spines tuned to higher temporal frequencies in V1, we may expect greater energy needs, although as the reviewer suggests, energy needs will depend on synapse (and bouton) size. Because we are not able to directly measure activity frequency as carefully or beautifully as can be done ex vivo or in nerve fibers, we do not feel confident in attempting such analysis in this study. Instead, based on previous studies, we assumed that larger synapses might be able to transmit higher frequencies, and thus have higher energy demands. We believe future in vivo studies will more directly measure synaptic frequency for comparison with mitochondria.
  
  We have added text in the Discussion about this caveat and potential future experiments.
  
  In the results section, the authors briefly mentioned that "We also considered other spine response properties related to tuning preference; specifically, orientation selectivity and response amplitude at the preferred orientation. For either metric, we observed no relationship to mitochondria within 1 μm radius (selectivity: 1 μm: r = -0.081, p = 0.269, n = 60; max response amplitude: 1 μm: r = -0.179, p = 0.078, n = 64) but did see a weak, significant relationship to both at a 5 μm radius (selectivity: r = 0.175, p = 0.027, n = 121; max response amplitude: r =-0.166, p = 0.030, n = 129)." Here only statistic results were given while the data were not presented in the figure illustration. Based on the methods and Figure 3B, it seems that the preferred orientations were calculated based on the vector summation. How did the authors calculate the "response amplitude at the preferred orientation"? This needs to be clarified. In addition, given the huge variety of orientation selectivity, using the response amplitude at the preferred orientation may not be the best parameter to correlate with the mitochondria volume which is indicative of energy needs. It might be necessary to include the baseline activity without visual stimulation and the average response for visual stimuli of different orientations in the analysis.
  
  We apologize for this oversight, as the details are present in our previous study (Scholl et al., 2021). Response amplitude and preferred orientation were calculated from a Gaussian curve fitting procedure with specific parameters describing those exact values (see Scholl et al. 2021 or Scholl et al. 2013). Only spines with selective responses (vector strength index > 0.1) and passing our SNR criterion were used for these analyses. We have now added this information to the Methods section and referred to it in the Results. With respect to the reviewer’s other concern, we also examined the average response amplitude (across all visual stimuli). There we found no relationship with the volume of mitochondria within 1 or 5 microns of a spine. However, because spines differ greatly in their selectivity (range = 0 – 0.8), the average response may not be an appropriate metric to compare across spines.
  
  A continuation from the former point, do the spines with similar preferred orientation to the somatic Ca signal have similar Ca signal strength, orientation selectivity index and other characteristics to the spines with different preferred orientation? As shown in the examples (Figure 3B), the spine on the right with different orientation preference compared with its soma has considerably larger response in non-preferred orientation compared to the spine on the left. Thus, the overall activity of the spine on the right may be higher than the spine which has similar preferred orientation to the soma. The authors showed that a positive correlation between difference in orientation preference and mitochondria volume (Figure 3C). Could this be simply due to higher spine activity for non-preferred orientation or spontaneous activity? Thus, the mitochondria might be positioned to support spines with higher overall activity rather than diverse response property.
  
  The reviewer brings up an interesting consideration. We examined the response properties of spines co-tuned (∆θpref < 22.5 deg) and differentially-tuned (∆θpref > 67.5 deg) to the soma. The orientation selectivity was not different between the two groups (p = 0.12, Wilcoxon ranksum test), although there was a small trend towards co-tuned inputs being more selective. We found that calcium response amplitudes for the preferred stimulus were also not different (p = 0.58, Wilcoxon ranksum test). These analyses are now included as a sentence in the Results.
  
  We agree with the reviewer that higher spontaneous activity in non-preferred spines may help explain the mitochondrial relationship we observe. However, our current dataset does not have sufficiently long recordings to measure spontaneous synaptic activity. Further, when considering a non-anesthetized preparation, spontaneous activity is highly dependent on brain state and an animal’s self-driven brain activity, which all must be experimentally controlled or measured to accurately address this.
  
  In addition, the information about the orientation selectivity of the soma is also missing. Do the four cells shown here all have similar level of orientation selectivity? Or some have relative weak orientation selectivity in the soma?
  
  Yes, all 4 cells have a similar OSI (range = 0.4 – 0.57, mean = 0.46 +/- 0.08 s.d.). This has been added to the Results section.
  
  This study focused on only a fraction of spines that are (1) responsive (2) osi > 0.1. However, in theory energy consumption is also related to non-responsive spines and spines with weak orientation tuning. What is the percentage of tuned and untuned spines? What's the correlation of mitochondria volume and spine activity level for untuned spines? I also recommend including the non-responsive spines into the analysis. For example, for each mitochondrion calculate the averaged overall activity of spines within certain distance from the mitochondrion, including the non-responsive spines. I would predict there may be more active spines and higher overall spine activity of dentritic segments near a mitochondrion than segments far from a mitochondrion.
  
  A majority of spines were tuned for orientation (~91%), although we specifically chose to only analyze data from spines with verifiable, independent calcium events. All analyses except those involving measurements of orientation preference use all dendritic spines (i.e. tuned and untuned). We have clarified this in the Results.
  
  These other ‘silent’ (i.e. without resolvable visual activity) spines may significantly contribute to energy demands of a dendrite too, as our methods (GCaMP6s expression) likely only capture synaptic events driving Ca2+ influx through NMDA receptors or VGCCs. We expect that glutamate imaging (e.g. iGluSnFR) may open the door to additional analyses to fully characterize the functional relationship between spines and mitochondria.
  
  The correlation coefficient for mitochondria volume and difference in orientation preference is relatively low (r=0.3150). With such weak correlation, the explanatory power of this data is limited.
  
  We agree that while the correlation is significant, it is not particularly strong. To better represent the noise surrounding this measurement, we performed a bootstrap correlation analysis, sampling with replacement (1 micron: mean r = 0.31 +/- 0.11 s.e., 5 micron: mean r = 0.02 +/- 0.10 s.e.). We now include this in the Results.
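
  As an illustration only (not the authors' code), a bootstrap of a correlation coefficient of the kind described above can be sketched as follows. The sketch assumes a Pearson correlation and 1,000 resamples; the response does not state which coefficient or how many resamples were used, and the function names `pearson_r` and `bootstrap_correlation` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(x, y, n_boot=1000, seed=0):
    """Resample (x, y) pairs with replacement.

    Returns the mean bootstrap r and its standard error
    (the standard deviation of the bootstrap distribution).
    """
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        # Draw n paired observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):
            rs.append(r)
    mean_r = sum(rs) / len(rs)
    se = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (len(rs) - 1))
    return mean_r, se
```

  Resampling whole (spine, mitochondrial volume) pairs, rather than each variable separately, preserves the pairing that the correlation measures.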
  
  Why do the numbers of spines in different figures vary? For example, n=60 for 1micron in Figure 3, 54 in Figure 3c, 31 in Figure 4b, 51 in Figure 4e and so on.
  
  We apologize for the lack of clarity. Each analysis presented different requirements of the data. For example, orientation preference was computed only for selective (OSI > 0.1) spines (Fig. 3c), but this requirement did not apply to comparisons with selectivity or response amplitude (Fig. 3d). Similarly, as stated in the Results and Methods, measurements of local heterogeneity require a minimum number of neighboring spines (n > 2), limiting the number of usable spines for analysis (Fig. 4). We have clarified this in the text.
  
  In Figure 6a, the sample sizes of mito+ spines and mito- spines are extremely unbalanced, which affects the stat power of the analysis. I recommend performing a randomization test.
  
  We thank the reviewer for this suggestion. We ran permutation tests to compare the similarity in mean values between equally sampled values from each distribution. These tests supported our original analysis and conclusions. We have added these tests to the Results.
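
  For readers unfamiliar with the approach, one common form of permutation test for a difference in group means is sketched below. This is an assumption-laden illustration: it shuffles group labels over the pooled data, which is valid for unbalanced groups, whereas the exact procedure used in the study (equal-size sampling from each distribution) is not specified in this response. The function name `permutation_test_mean_diff` is hypothetical.

```python
import random


def permutation_test_mean_diff(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels while keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

  Because group sizes are held fixed during shuffling, the null distribution already reflects the imbalance between a small and a large group.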
  
  Ca signals are approximations of electrical signals. How well are spinal calcium signals correlated to synaptic strength and local depolarization? This should be put into discussion.
  
  There is unlikely a simple, direct relationship between spine calcium signal and synaptic strength or membrane depolarization, and this has never been addressed in vivo. Koester and Johnston (2005) performed paired recordings in slice and showed that single presynaptic action potentials producing successful transmission generate widely different calcium amplitudes (Fig. 3). Another study from Sobczyk, Scheuss, and Svoboda (2005) used two-photon glutamate uncaging on single spines and showed that micro-EPSC’s evoked are uncorrelated with the spine calcium signal amplitude. We have added a note about this in the discussion.
  
  In Figure 4i, the negative correlation may depend on the 4 data points on the right side. How influential are those data points?
  
  Spearman’s correlation coefficient analysis is robust to outliers and it is highly unlikely these datapoints are critical with our sample size (n > 100 spines).
  
  Raw data of Ca responses were missing.
  
  Some data has been published with the parent publication (Scholl et al., 2021). As spine imaging data is difficult to obtain and highly unique, we prefer to provide raw data directly upon reasonable request of the corresponding author.
  
  What is the temporal frequency of the drifting grating? Was it fixed or the speed of the grating was fixed?
  
  This was fixed to 4 Hz and this is now included in the Methods.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Most of the measurements were based on the distance from the base of the spine neck, and "only on spines with measurable mitochondrial volume at each radius" were analyzed. To better understand the causality, it may also be interesting to have an analysis based on the distance from mitochondria. Would the result be different if the measurements are not 1µm / 5µm from spine but 1µm / 5µm from mitochondria? (e.g. total spine volume in 1µm / 5µm from mitochondria).
  
  In fact, our first iteration of this study focused on exactly this metric: measuring the distance to the nearest mitochondrion. However, after lengthy discussions between the authors, we ultimately decided this metric was inferior to a volumetric one. Our decision was based on several factors: (1) distance to a mitochondrion is ill defined (e.g. distance to the mitochondrion’s center or nearest membrane edge?), (2) the total amount of mitochondrial volume within a dendritic shaft should allow the greatest amount of energetic support (e.g. more cristae for ATP production, greater capacity for calcium buffering), and (3) we would not account for the geometry of individual mitochondria or their placement near a spine (e.g. when 2 different mitochondria are next to the same spine). We have added further clarification of our reasoning to the Results.
  
  Nonetheless, we present the reviewer some of our original analyses correlating distance to mitochondria (from the base of the spine and including the spine neck length):
  
  Author response image 1.
  
  Here, we examined the relationship to spine head volume, spine-soma orientation preference difference, and the local orientation preference heterogeneity. No relationship showed any significant correlation. Again, this may not be surprising given the drawbacks of measuring ‘distance to mitochondria’.
  
  Is there a selection criterion for the spine for the analysis? Are filopodia spines excluded in the analysis?
  
  Spines were analyzed regardless of structural classification; however, they were only analyzed if they had a synaptic density with synaptic vesicle accumulation. In our dataset (including those visualized in vivo and reconstructed from the EM volume) we observed no filopodia.
  
  The result states that "56.8% of spines had no mitochondria volume within 1 μm and 12.1% of spines had none within 5 μm.". In other words, around 43% of spines had mitochondria within 1 μm. It would be interesting to show whether there is a correlation between mitochondria size and spine density.
  
  We agree that this is an interesting measurement. It has been reported that mitochondrial unit length along the dendrite co-varies with linear synapse density in the neocortical distal dendrites of mice (Turner et al., 2022). This was specifically true in distal portions of dendrites more than 60 µm from the soma, because mitochondria volume increases as a function of distance roughly up to this point, then remains relatively constant beyond this distance.
  
  To investigate this possibility, we calculated the local spine density around an individual spine and compared it to the mitochondria volume within 1 or 5 µm. We found no evidence of a correlation between local spine density and the volume of mitochondria (1 µm: Spearman r = -0.07447, p = 0.2859; 5 µm: r = -0.04447, p = 0.3141). However, the majority of our measurements are more proximal than 60 µm (our median distance of all spines = 49.4 µm, max = 114 µm) and this may be one reason why we observe no correlation.
  
  In Figure 3B, the drifting grating directions are examined from 0 to 315 degrees in the experiment. However, in Figure 3C and 3D, the spine-soma difference of orientation preference was limited to 0 to 90 degree in the graph. Is the graph trimmed, or is there a cause that limits the spine-soma difference of orientation preference to 90?
  
  Ferret visual cortical neurons are highly sensitive to grating direction and the responses are fit by a double Gaussian curve which estimates the ‘orientation preference’ (0-180 deg). We then calculated the absolute difference in orientation preference and wrapped that value in circular space so the maximum difference possible is 90 deg (e.g. 135 deg -> 45 deg).
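
  The wrapping step described above can be written compactly. The following is a minimal sketch, assuming both preferences are given in degrees; the function name `orientation_difference` is hypothetical.

```python
def orientation_difference(theta_a, theta_b):
    """Absolute difference between two orientation preferences (deg), wrapped to [0, 90]."""
    # Orientation is periodic over 180 deg, so reduce the raw difference first.
    diff = abs(theta_a - theta_b) % 180.0
    # A difference of 135 deg is equivalent to 45 deg in orientation space.
    return min(diff, 180.0 - diff)
```

  With this definition a raw difference of 135 deg maps to 45 deg, matching the example in the response, and the largest possible value is 90 deg.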
  
  In Figure 4D-F, how is the temporal correlation of calcium activity determined? Is it based on stimulated activity or basal activity? A brief explanation may be helpful to the readers. Also, scale bars could be added to Fig 4D.
  
  Temporal correlation is computed as the signal correlation between 2 spines over the entire imaging session at that field of view. Specifically, we measured the Pearson correlation between each spine’s ∆F/F trace. To measure the local spatiotemporal correlation, we computed correlations between all neighboring spines within 5 microns and took the average of those values. We have clarified this in the Results section.
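
  A minimal sketch of this local correlation measure is given below. It assumes each spine has a ∆F/F trace of equal length and a single position along the dendrite in microns; the response does not say whether the 5 micron neighborhood was measured along the dendrite or as a straight-line distance, so the one-dimensional distance here is an assumption, as are the function names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def local_mean_correlation(traces, positions_um, radius_um=5.0):
    """For each spine, average its trace correlation with all neighbors within radius_um.

    Returns None for spines that have no neighbor inside the radius.
    """
    result = []
    for i, (trace_i, pos_i) in enumerate(zip(traces, positions_um)):
        rs = [
            pearson_r(trace_i, trace_j)
            for j, (trace_j, pos_j) in enumerate(zip(traces, positions_um))
            if j != i and abs(pos_i - pos_j) <= radius_um
        ]
        result.append(sum(rs) / len(rs) if rs else None)
    return result
```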
  
  Figure 3C and Figure 4D displayed a significant correlation in 1µm range and such correlation drastically diminished once the criterion changed to 5µm range. It would be interesting to include the criterion of intermediate ranges. It would be interesting to see if there is a trend or tendency or if there is a "cut-off" limit.
  
  We agree with the reviewer that the drastic change in the correlations between 1 and 5 µm range was surprising to see. While these volumetric measurements are time consuming, we returned to our data and measured an intermediate point of 3 µm. Investigating relationships reported in our study, we found no significant trends for spine-soma similarity (Spearman’s r = -0.011, p = 0.54) or local heterogeneity (Spearman’s r = 0.11, p = 0.23). This suggests that a potential ‘critical distance’ might be less than 3 µm; however, far more additional measurements and analyses would be needed to attempt to identify exactly what this distance is.
  
  In Figure 5, it is shown that spines having mitochondrion in the head or neck are larger. However, only 10 spines are found with mitochondria inside. In the current dataset, are mitochondria abundantly found in large spines? Further analysis or justification would be informative to address this.
  
  In our dataset, mitochondria were found in ~5% of all spines. Spines with mitochondria have a median volume of approximately 0.6 µm3, roughly twice as large as those without mitochondria, as the reviewer suggests. In the entire population of spines without mitochondria, a volume of 0.6 µm3 represents roughly the 82nd percentile. In other words, of the total population of 157 spines without mitochondria, only 29 had equal or greater volume than the median spine with a mitochondrion. We believe this trend is clearly shown in Figure 5A and is supported by our analysis, including new permutation tests suggested by Reviewer 1.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The authors state that their unsupervised method "quickly and accurately identified mitochondria," but the methods section only says that segmentations were proofread. Was every segmentation examined and judged to be accurate, or was only a subset of the 324 mitochondria checked?
  
  After deep learning-based extraction, each mitochondrion segmentation was manually proofread. For each dendrite segment, this was ~10-20 mitochondria, so it did not take long to manually inspect and edit each mitochondrion segmentation.
  
  The EM image of the mitochondrion in the spine head in Figure 2C is low resolution and does not apply to the bulk of the data. Images more representative of the analyzed data should be added to supplement the cartoons.
  
  Our primary rationale for including this specific image was to show that the mitochondria located within spines are small and round, and to include a view of the synapse as well as the mitochondrion. We now include enlarged and additional EM images in Figure 1C.
  
  The majority of spines did not have any mitochondria within a 1 micron radius and were excluded from the correlation analyses, so most of the conclusions are based on a minority of spines. It would be interesting to see comparisons between spines with and without nearby mitochondria. Correlations between the absolute distance to any mitochondrion, synapse size, and mismatch to soma orientation would be especially interesting.
  
  The reviewer brings up a good point. It is true that many spines were excluded from our analysis based on the fact that they did not have nearby mitochondria within 1 or 5 µm (56.8% of spines had no mitochondria volume within 1 μm and 12.1% of spines had none within 5 μm). We compared the distributions of synapse size, mismatch to soma, and orientation selectivity of two groups of spines – those with at least some mitochondria within 1 µm (n = 65) versus spines without any mitochondria within 5 µm (n = 19).
  
  We found no difference between the two groups in the distributions of spine volume (1 µm: median = 0.29 µm3, IQR = 0.41 µm3; no mitochondria within 5 µm: median = 0.40 µm3, IQR = 0.37 µm3; p = 0.67) or PSD area (1 µm: median = 0.26 µm2, IQR = 0.33 µm2; no mitochondria within 5 µm: median = 0.31 µm2, IQR = 0.36 µm2; p = 0.49). For functional measures, we also saw no difference in orientation selectivity (1 µm: median = 0.29, IQR = 0.28; no mitochondria within 5 µm: median = 0.28, IQR = 0.15; p = 0.74) or mismatch to soma orientation (1 µm: median = 0.54 deg, IQR = 0.86 deg; no mitochondria within 5 µm: median = 0.46 deg, IQR = 0.47 deg; p = 0.75). We now include these analyses in the Results.
  
  We also looked at the absolute distances to mitochondria and did not find any significant relationships to spine head volume, spine-soma orientation preference difference, or the local orientation preference heterogeneity (see our response to reviewer #2 for more information).
  
  In Figure 1A the mitochondria appear to be taking up a substantial fraction of the dendritic shaft diameter, even for distal dendrites. It would be useful to know the absolute diameter of the dendrites and mitochondria, given that this is not rodent data and there is no reference for either in the ferret.
  
  We agree with the reviewer’s point, although we would like to remind the reviewer that these are basal dendrites of layer 2/3 cells. Basal dendrites tend to be thinner than apical branches. Interestingly, in some cases, the dendrite even swells to accommodate a mitochondrion. We did not incorporate this measurement in our study because it is not trivial; dendrite diameter is variable and dendrites are not perfect cylinders. Although we did not make precise measurements across our dendrites, the diameter is comparable to what has been seen in mouse cortex (Turner et al., 2022), roughly 500-1000 nm, but as small as 100 nm at some pinch points. In terms of mitochondria, many were roughly spherical or oblong, therefore the maximum diameters we report are roughly similar to, if not a bit larger than, those of the cross-sectional diameter.
  
  As a rule, PSD area is correlated with spine volume, which makes the observation that spines with mitochondria have larger volume but not PSD area surprising. With n=10 it is difficult to draw conclusions, but it would be interesting to know the PSD area-to-volume ratio of other spines of the same volume and synapse size.
  
  We were also somewhat surprised to see this, but exactly as the reviewer mentioned, we believe it to be a limitation of the sample size. The difference in volume was large enough to be detected despite a small sample size. We saw a trend towards larger synapses when spines have mitochondria (the median was approximately 60% larger), and we would expect with a larger comparison that PSD area would be significantly greater in spines with mitochondria.
  
  We calculated the PSD area-to-spine head volume ratio for spines with or without mitochondria. Spines with mitochondria had a significantly lower ratio compared to those without (Mann-Whitney test, p = 0.0056, mito - = 0.78, n = 157; mito + = 0.53, n = 10). As the reviewer mentions, it is somewhat difficult to draw a conclusion from this, but it appears that the PSD does not scale with the increased spine head size.
  
  Author response image 2.
  
  The only way to definitively address this is to increase the sample size, which is becoming easier to achieve with the progression of volume EM imaging and analysis techniques in recent times. We look forward to addressing this in the future.
  
  Nothing is made of the significant fact that these data come from the visual system of a carnivore, not a mouse. Consideration of differences in visual physiology between rodents and carnivores would be worthwhile to put the function of these dendrites in context.
  
  We thank the reviewer for this consideration and have added text to the Discussion.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.14.549063v2
www.medrxiv.org

Antigenic drift and subtype interference shape A(H3N2) epidemic dynamics in the United States

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor.
  
  Major Strengths:
  
  The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights.
  
  Major Weaknesses:
  
  While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.
  
  Reviewer #2 (Public Review):
  
  Summary: This paper aims to achieve a better understanding of how the antigenic or genetic compositions of the dominant influenza A viruses in circulation at a given time are related to key features of seasonal influenza epidemics in the US. To this end, the authors analyze an extensive dataset with a range of statistical, data science and machine learning methods. They find that the key drivers of influenza A epidemiological dynamics are interference between influenza A subtypes and genetic divergence, relative to the previous one or two seasons, in a broader range of antigenically related sites than previously thought.
  
  Strengths: A thorough investigation of a large and complex dataset.
  
  Weaknesses: The dataset covers a 21-year period, which is substantial by epidemiological standards but quite small from a statistical or machine learning perspective. In particular, it was not possible to follow the usual process and test predictive performance of the random forest model with an independent dataset.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. It's a strong paper representing a thorough and fascinating exploration of potential drivers, and it makes a trove of relevant data readily available to the community.
  
  Strengths:
  
  This paper makes links between epidemiological and evolutionary data for influenza. Placing each in the context of the other is crucial for understanding influenza dynamics and evolution and this paper does a thorough job of this, with many analyses and nuances. The results on the extent to which evolutionary factors relate to epidemic burden, and on interference among influenza types, are particularly interesting. The GitHub repository associated with the paper is clear, comprehensive, and well-documented.
  
  Weaknesses:
  
  The format of the results section can be hard to follow, and we suggest improving readability by restructuring and simplifying in some areas. There are a range of choices made about data preparation and scaling; the authors could explore sensitivity of the results to some of these.
  
  Response to public reviews
  
  We appreciate the positive comments from the reviewers and have implemented or responded to all of the reviewers’ recommendations.
  
  In response to Reviewer 1, we expand on the potential drivers and biological implications of the findings pointed out in their specific recommendations. For example, we now explicitly mention that antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study. We note that, after the 2009 A(H1N1) pandemic, the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons is lower compared to A(H3N2) dominant seasons prior to 2009. We propose that the weakening of A(H3N2) predominance may be linked to the diversification of A(H3N2) viruses during the 2010s, wherein multiple antigenically distinct clades with similar fitness circulated in each season, as opposed to a single variant with high fitness.
  
  In response to Reviewer 2, we agree that it would be ideal and best practice to measure model performance with an independent test set, but our dataset includes only ~20 seasons. Predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. In the revised manuscript, we provide more justification and clarification of our methodology. Instead of testing model performance on an independent test set, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (Kuhn & Johnson, 2019).
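
  The leave-one-season-out scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: records are assumed to be dictionaries with a `season` label and an outcome `y`, and a mean predictor stands in for the random forest models used in the paper. All function names are hypothetical.

```python
import math


def leave_one_season_out(records):
    """Yield (held_out_season, analysis_rows, assessment_rows) once per season."""
    seasons = sorted({r["season"] for r in records})
    for held_out in seasons:
        analysis = [r for r in records if r["season"] != held_out]
        assessment = [r for r in records if r["season"] == held_out]
        yield held_out, analysis, assessment


def cross_validated_rmse(records, fit, predict):
    """Fit on each analysis set, predict the held-out season, pool squared errors."""
    squared_errors = []
    for _, analysis, assessment in leave_one_season_out(records):
        model = fit(analysis)
        for row in assessment:
            squared_errors.append((predict(model, row) - row["y"]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_mean(rows):
    """Placeholder model: the mean outcome of the analysis set."""
    return sum(r["y"] for r in rows) / len(rows)


def predict_mean(model, row):
    """Placeholder prediction: the same mean for every held-out row."""
    return model
```

  Holding out an entire season at a time, rather than individual rows, keeps all observations from the predicted season (for example, its regional records) out of the data used to fit the model.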
  
  In response to Reviewer 3, we follow the reviewer’s advice to put the Methods section before the Results section. Concerning Reviewer 3’s question about the sensitivity of our results to data preparation and rescaling, we provide more justification and clarification of our methodology in the revised manuscript. In our study, we adjust influenza type/subtype incidences for differences in reporting between the pre- and post-2009 pandemic periods and across HHS regions. We adjust for differences in reporting between the pre- and post-2009 periods because the US CDC and WHO increased laboratory testing capacity in response to the 2009 A(H1N1) pandemic, which led to substantial, long-lasting improvements to influenza surveillance that are still in place today. Figure 1 - figure supplement 2 shows systematic increases in influenza test volume in all HHS regions after the 2009 pandemic. Given the substantial increase in test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results when adjusting for both pre- and post-2009 pandemic reporting and regional reporting versus only adjusting for the pre- and post-2009 pandemic reporting.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Specific comments:
  
  (1) Line 155-156. Request for a reference for: "Given that protective immunity wanes after 1-4 years"
  
  We now include two references (He et al. 2015 and Wraith et al. 2022), which were cited at the beginning of the introduction when referring to the duration of protective immunity for antigenically homologous viruses. (Lines 640-642 in revised manuscript)
  
  (2) Line 162-163: Request a further explanation of the negative correlation between seasonal diversity of HA and NA LBI values and NA epitope distance. Clarify biological implications to aid reader understanding.
  
  In the revised manuscript we expand on the biological implications of A(H3N2) virus populations characterized by high antigenic novelty and low LBI diversity.
  
  Lines 649-653:
  
  “The seasonal diversity of HA and NA LBI values was negatively correlated with NA epitope distance (Figure 2 – figure supplements 5 – 6), with high antigenic novelty coinciding with low genealogical diversity. This association suggests that selective sweeps tend to follow the emergence of drifted variants with high fitness, resulting in seasons dominated by a single A(H3N2) variant rather than multiple co-circulating clades.”
  
  (3) Figure S3 legend t-2 may be marked as t-1.
  
  Thank you for catching this. We have fixed this typo. Note: Figure S3 is now Figure 2 – figure supplement 5.
  
  (4) Lines 201-214. The key takeaways from the analysis of subtype dominance are ultimately not clear. It also misses the underlying dynamics that H3N2 predominance following an evolutionary change has waned since 2009.
  
  In the revised manuscript we elaborate on key takeaways concerning the relationship between antigenic drift and A(H3N2) dominance. We also add a caveat noting that A(H3N2) predominance is weaker during the post-2009 period, which may be linked to the diversification of A(H3N2) lineages after 2012. We do not know of a reference that links the diversification of A(H3N2) viruses in the 2010s to a particular evolutionary change. Therefore, we do not attribute the diversification of A(H3N2) viruses to a specific evolutionary change in A(H3N2) variants circulating at the time (A/Perth/16/2009-like strains (PE09)). Instead, we allude to the potential role of A(H3N2) diversification in creating multiple co-circulating lineages that may have less of a fitness advantage.
  
  Lines 681-703:
  
  “We explored whether evolutionary changes in A(H3N2) may predispose this subtype to dominate influenza virus circulation in a given season. A(H3N2) subtype dominance – the proportion of influenza positive samples typed as A(H3N2) – increased with H3 epitope distance (t – 2) (R2 = 0.32, P = 0.05) and N2 epitope distance (t – 1) (R2 = 0.34, P = 0.03) (regression results: Figure 4; Spearman correlations: Figure 3 – figure supplement 1). Figure 4 illustrates this relationship at the regional level across two seasons in which A(H3N2) was nationally dominant, but where antigenic change differed. In 2003-2004, we observed widespread dominance of A(H3N2) viruses after the emergence of the novel antigenic cluster, FU02 (A/Fujian/411/2002-like strains). In contrast, there was substantial regional heterogeneity in subtype circulation during 2007-2008, a season in which A(H3N2) viruses were antigenically similar to those circulating in the previous season. Patterns in type/subtype circulation across all influenza seasons in our study period are shown in Figure 4 – figure supplement 1. As observed for the 2003-2004 season, widespread A(H3N2) dominance tended to coincide with major antigenic transitions (e.g., A/Sydney/5/1997 (SY97) seasons, 1997-1998 to 1999-2000; A/California/7/2004 (CA04) season, 2004-2005), though this was not universally the case (e.g., A/Perth/16/2009 (PE09) season, 2010-2011).
  
  After the 2009 A(H1N1) pandemic, A(H3N2) dominant seasons still occurred more frequently than A(H1N1) dominant seasons, but the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons was lower compared to A(H3N2) dominant seasons prior to 2009. Antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study (https://nextstrain.org/seasonal-flu/h3n2/ha/12y@2024-05-13) (Dhanasekaran et al., 2022; Huddleston et al., 2020; Yan et al., 2019). The decline in A(H3N2) predominance during the post-2009 period may be linked to the genetic and antigenic diversification of A(H3N2) viruses, wherein multiple lineages with similar fitness co-circulated in each season.”
  
  (5) Line 253-255: It would be beneficial to provide a more detailed interpretation of the statement that "pre-2009 seasonal A(H1N1) viruses may limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses." Elaborate on the cause-and-effect relationship within this statement.
  
  In the revised manuscript we suggest that seasonal A(H1N1) viruses may interfere with the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, because seasonal A(H1N1) viruses and A(H3N2) are more closely related, and thus may elicit stronger cross-reactive T cell responses.
  
  Lines 738-745:
  
  “The internal gene segments NS, M, NP, PA, and PB2 of A(H3N2) viruses and pre-2009 seasonal A(H1N1) viruses share a common ancestor (Webster et al., 1992) whereas A(H1N1)pdm09 viruses have a combination of gene segments derived from swine and avian reservoirs that were not reported prior to the 2009 pandemic (Garten et al., 2009; Smith et al., 2009). Non-glycoprotein genes are highly conserved between influenza A viruses and elicit cross-reactive antibody and T cell responses (Grebe et al., 2008; Sridhar, 2016). Because pre-2009 seasonal A(H1N1) viruses and A(H3N2) are more closely related, we hypothesized that seasonal A(H1N1) viruses could potentially limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, due to greater T cell-mediated cross-protective immunity.”
  
  (6) In the results section, many statements report statistical results of correlation analyses. Consider providing further interpretations of these results, such as the implications of nonsignificant correlations and how they support or contradict the hypothesis or previous studies. For example, the statement on line 248 regarding the lack of significant correlation between influenza B epidemic size and A(H3N2) epidemic metrics would benefit from additional discussion on what this non-significant correlation signifies and how it relates to the hypothesis or previous research.
  
  In the Discussion section, we suggest that the lack of an association between influenza B circulation and A(H3N2) epidemic metrics is due to few T and B cell epitopes shared between influenza A and B viruses (Terajima et al., 2013).
  
  Lines 1005-1007 in revised manuscript (Lines 513-515 in original manuscript):
  
  “Overall, we did not find any indication that influenza B incidence affects A(H3N2) epidemic burden or timing, which is not unexpected, given that few T and B cell epitopes are shared between the two virus types (Terajima et al., 2013).”
  
  Minor comments:
  
  (1) Line 116-122: Include a summary statistical description of all collected data sets, detailing the number of HA and NA sequence data and their sources. Briefly describe subsampled data sets, specifying preferences (e.g., the number of HA or NA sequence data collected from each region).
  
  In our revised manuscript we now include supplementary tables that summarize the number of A/H3 and A/N2 sequences in each subsampled dataset, aggregated by world region, for all seasons combined (Figure 2 - table supplements 1 - 2). We also include supplementary figures showing the number of sequences collected in each month and each season in North America versus the other nine world regions combined (Figure 2 - figure supplements 1 - 2). Subsampled datasets are plotted individually in the figures below but individual time series are difficult to discern due to minor differences in sequence counts across the datasets.
  
  (2) Figure 7A: Due to space limitations, consider rounding numbers on the x-axis to whole numbers for clarity.
  
  Thank you for this suggestion. In the revised manuscript we round numbers in the axes of Figure 7A (Figure 9A in the revised manuscript) so that the axes are less crowded.
  
  (3) Figure 4C & Figure 4D: Note that Region 10 (purple) data were unavailable for seasons before 2009 (lines 1483-1484). Label each region on the map with its respective region number (1 to 10) and indicate this in the legend for easy identification.
  
  In our original submission, the legend for Figure 4 included “Data for Region 10 (purple) were not available for seasons prior to 2009” at the end of the caption. We have moved this sentence, as well as other descriptions that apply to both C and D, so that they follow the sentence “C-D. Regional patterns of influenza type and subtype incidence during two seasons when A(H3N2) was nationally dominant.”
  
  In our revised manuscript, Figure 4, and Figure 4 - figure supplement 1 (Figure S10 in original submission) include labels for each HHS region.
  
  We did not receive specific recommendations from Reviewer #2. However, our responses to Reviewer #3 address the study’s weaknesses mentioned by Reviewer #2.
  
  Reviewer #3 (Recommendations For The Authors):
  
  This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics.
  
  This is a workhorse of a paper, in the volumes of data that are analyzed and the extensive analysis that is done. The data that are provided are a treasure trove resource for influenza modelers and for anyone interested in seeing influenza surveillance data in the context of evolution, and evolutionary information in the context of epidemiology.
  
  L53 - end of sentence "and antigenic drift": not sure this fits, explain? I thought this sentence was in contrast to antigenic drift.
  
  Thank you for catching this. We did not intend to include “and antigenic drift” at the end of this sentence and have removed it (Line 59).
  
  Para around L115: would using primarily US data be a limitation, because it's global immunity that shapes success of strains? Or, how much does each country's immunity and vaccination and so on actually shape what strains succeed there, compared to global/international factors?
  
  The HA and NA phylogenetic trees in our study are enriched with US sequences because our study focuses on epidemiological dynamics in the US, and we wanted to prioritize A(H3N2) viruses that the US human population encountered in each season. We agree with the reviewer that the world population may be the right scale to understand how immunity, acquired by vaccination or natural infection, may shape the emergence and success of new lineages that will go on to circulate globally. However, our study assesses the overall impact of antigenic drift on regional A(H3N2) epidemic dynamics in the US. In other words, our driving question is whether we can predict the population-level impact of an A(H3N2) variant in the US, conditional on this particular lineage having established in the US and circulating at relatively high levels. We do not assess the global or population-level factors that may influence which A(H3N2) virus lineages are successful in a given location or season.
  
  We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.
  
  Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”
  
  In the Results section, I found the format hard to follow, because of the extensive methodological details, numbers with CIs and long sentences. Sentences sometimes included the question, definitions of variables, and lists. For example at line 215 we have: "Next, we tested for associations between A(H3N2) evolution and epidemic timing, including onset week, defined as the winter changepoint in incidence [16], and peak week, defined as the first week of maximum incidence; spatiotemporal synchrony, measured as the variation (standard deviation, s.d.) in regional onset and peak timing; and epidemic speed, including seasonal duration and the number of weeks from onset to peak (Table 2, Figure S11)". I would suggest putting the methods section first, using shorter sentences, separating lists from the question being asked, and stating what was found without also putting in all the extra detail. Putting the methods section before the results might reduce the sense that you have to explain what you did and how in the results section too.
  
  Thank you for suggesting how to improve the readability of the Results section. In the revised manuscript, we follow the reviewer’s advice to put the Methods section before the Results section. Although eLife formatting requirements specify the order: Introduction, Results, Discussion, and Methods, the journal allows for the Methods section to follow the Introduction when it makes sense to do so. We agree with the reviewer that putting the Methods section before the Results section makes our results easier to follow because we no longer need to introduce methodological details at the beginning of each set of results.
  
  L285 in the RF you remove variables without significant correlations with the target variables, but isn't one of the aims of RF to uncover relationships where a correlation might not be evident, and in part to reveal combinations of features that give the targeted outcome? Also with the RF, I am a bit concerned that you could not use the leave-one-out approach because it was "unstable" - presumably that means that you obtain quite different results if you leave out a season. How robust are these results, and what are the most sensitive aspects? Are the same variables typically high in importance if you leave out a season, for example? What does the scatterplot of observed vs predicted epidemic size (as in Fig 7) look like if each prediction is for the one that was left out (i.e. from a model trained on all the rest)? In my experience, where the RF is "unstable", that can look pretty terrible even if the model trained on all the data looks great (as does Figure 7). In any case I think it's worth discussing sensitivity.
  
  (1) In response to the reviewer’s first question, we explain our rationale for not including all candidate predictors in random forest and penalized regression models.
  
  Models trained with different combinations of predictors can have similar performance, and these combinations of predictors can include variables that do not necessarily have strong univariate associations with the target variable. The performance of random forest and LASSO regression models is not sensitive to redundant or irrelevant predictors (see Figure 10.2 in Kuhn & Johnson, 2019). However, if our goal is variable selection rather than strictly model performance, it is considered best practice to remove collinear, redundant, and/or irrelevant variables prior to training models (see section 11.3 in Kuhn & Johnson, 2019). In both random forest and LASSO regression models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection. In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores. Thus, failing to minimize multicollinearity prior to model training could result in some variables having low rankings and the appearance of being unimportant, because their importance scores are overshadowed by those of the highly correlated variables. Our rationale for preprocessing predictor data follows the philosophy of Kuhn & Johnson, 2019, who recommend including the minimum possible set of variables that does not compromise model performance. Even if a particular model is insensitive to extra predictors, Kuhn and Johnson explain that “removing predictors can reduce the cost of acquiring data or improve the throughput of the software used to make predictions.”
  
  In the revised manuscript, we include more details about our steps for preprocessing predictor data. We also follow the reviewer’s suggestion to include all evolutionary predictors in variable selection analyses, regardless of whether they have strong univariate correlations with target outcomes, because the performance of random forest and LASSO regression models is not affected by redundant predictors.
  
  Including additional predictors in our variable selection analyses does not change our conclusions. As reported in our original manuscript, predictors with strong univariate correlations with various epidemic metrics were the highest ranked features in both random forest and LASSO regression models.
  
  Lines 523-563:
  
  “Preprocessing of predictor data: The starting set of candidate predictors included all viral fitness metrics: genetic and antigenic distances between current and previously circulating strains and the standard deviation and Shannon diversity of H3 and N2 LBI values in the current season. To account for potential type or subtype interference, we included A(H1N1) or A(H1N1)pdm09 epidemic size and B epidemic size in the current and prior season and the dominant IAV subtype in the prior season (Lee et al., 2018). We included A(H3N2) epidemic size in the prior season as a proxy for prior natural immunity to A(H3N2). To account for vaccine-induced immunity, we considered four categories of predictors and included estimates for the current and prior seasons: national vaccination coverage among adults (18-49 years coverage × ≥ 65 years coverage), adjusted A(H3N2) vaccine effectiveness (VE), a combined metric of vaccination coverage and A(H3N2) VE (18-49 years coverage × ≥ 65 years coverage × VE), and H3 and N2 epitope distances between naturally circulating A(H3N2) viruses and the U.S. A(H3N2) vaccine strain in each season. We could not include a predictor for vaccination coverage in children or consider clade-specific VE estimates, because these data were not available for most seasons in our study.
  
  Random forest and LASSO regression models are not sensitive to redundant (highly collinear) features (Kuhn & Johnson, 2019), but we chose to downsize the original set of candidate predictors to minimize the impact of multicollinearity on variable importance scores. For both types of models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection (Kuhn & Johnson, 2019). In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores (Kuhn & Johnson, 2019). We first confirmed that none of the candidate predictors had zero variance or near-zero variance. Because seasonal lags of each viral fitness metric are highly collinear, we included only one lag of each evolutionary predictor, with a preference for the lag that had the strongest univariate correlations with various epidemic metrics. We checked for multicollinearity among the remaining predictors by examining Spearman’s rank correlation coefficients between all pairs of predictors. If a particular pair of predictors was highly correlated (Spearman’s 𝜌 > 0.8), we retained only one predictor from that pair, with a preference for the predictor that had the strongest univariate correlations with various epidemic metrics. Lastly, we performed QR decomposition of the matrix of remaining predictors to determine if the matrix is full rank and identify sets of columns involved in linear dependencies. This step did not eliminate any additional predictors, given that we had already removed pairs of highly collinear variables based on Spearman correlation coefficients.
  
  After these preprocessing steps, our final set of model predictors included 21 variables, including 8 viral evolutionary indicators: H3 epitope distance (t – 2), HI log2 titer distance (t – 2), H3 RBS distance (t – 2), H3 non-epitope distance (t – 2), N2 epitope distance (t – 1), N2 non-epitope distance (t – 1), and H3 and N2 LBI diversity (s.d.) in the current season; 6 proxies for type/subtype interference and prior immunity: A(H1N1) and B epidemic sizes in the current and prior season, A(H3N2) epidemic size in the prior season, and the dominant IAV subtype in the prior season; and 7 proxies for vaccine-induced immunity: A(H3N2) VE in the current and prior season, H3 and N2 epitope distances between circulating strains and the vaccine strain in each season, the combined metric of adult vaccination coverage × VE in the current and prior season, and adult vaccination coverage in the prior season.”
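
The pairwise collinearity screen described in the quoted passage can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' code: the `drop_collinear` helper, the `priority` scores (standing in for the strength of each predictor's univariate correlation with the epidemic metrics), and the use of the absolute value of Spearman's ρ are all assumptions made for the example. Only the 0.8 threshold comes from the text, and the QR decomposition step is not shown.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; assumes neither input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_collinear(predictors, priority, threshold=0.8):
    """Keep one predictor from each highly correlated pair.

    predictors: {name: list of seasonal values} (hypothetical layout)
    priority:   {name: score}; the lower-scoring member of a pair is dropped
    """
    kept = set(predictors)
    for a, b in combinations(sorted(predictors), 2):
        if a in kept and b in kept:
            if abs(spearman(predictors[a], predictors[b])) > threshold:
                kept.discard(a if priority[a] < priority[b] else b)
    return sorted(kept)
```

With three toy predictors, two of which are monotonically related, the screen retains the higher-priority member of the correlated pair and leaves the unrelated predictor untouched.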
  
  (2) Next, we clarify our model training methodology to address the reviewer’s second point about using a leave-one-out cross-validation approach.
  
  We believe the reviewer is mistaken; we use a leave-one-season-out validation approach which lends some robustness to the predictions. In our original submission, we stated “We created each forest by generating 3,000 regression trees from 10 repeats of a leave-one-season-out (jackknife) cross-validated sample of the data. Due to the small size of our dataset, evaluating the predictive accuracy of random forest models on a quasi-independent test set produced unstable estimates.” (Lines 813-816 in the original manuscript)
  
  To clarify, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (see Section 3.4 in Kuhn & Johnson, 2019). To reduce noise, we generated 10 bootstrap resamples of each fold and averaged the RMSE and R2 values of model predictions from resamples.
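
The analysis/assessment split described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: a one-predictor least-squares line stands in for the random forest, the `(season, x, y)` row layout is hypothetical, and the 10 bootstrap resamples of each fold are omitted for brevity.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a random forest)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def leave_one_season_out(rows):
    """rows: list of (season, x, y) tuples, e.g. one row per region-season.

    Returns {season: [(observed, predicted), ...]}, where each season's
    predictions come from a model fitted to all other seasons.
    """
    seasons = sorted({season for season, _, _ in rows})
    results = defaultdict(list)
    for held_out in seasons:
        # "Analysis" set: every season except the held-out one.
        analysis = [(x, y) for s, x, y in rows if s != held_out]
        # "Assessment" set: the single held-out season.
        assessment = [(x, y) for s, x, y in rows if s == held_out]
        a, b = fit_line([x for x, _ in analysis], [y for _, y in analysis])
        for x, y in assessment:
            results[held_out].append((y, a + b * x))
    return dict(results)


def rmse(pairs):
    """Root mean squared error over (observed, predicted) pairs."""
    return (sum((obs - pred) ** 2 for obs, pred in pairs) / len(pairs)) ** 0.5
```

Because every season is held out exactly once, each observation receives one out-of-sample prediction, which is what allows performance to be summarized without reserving a separate test set.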
  
  Although it would be ideal and best practice to measure model performance with an independent test set, our dataset includes only ~20 seasons. We found that predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. Further, we suspect that large antigenic jumps in a small subset of seasons further contribute to variation in prediction accuracy across randomly selected test sets. Our rationale for using cross-validation instead of an independent test set is best described in Section 4.3 of Kuhn and Johnson’s book “Applied Predictive Modeling” (Kuhn & Johnson, 2013):
  
  “When the number of samples is not large, a strong case can be made that a test set should be avoided because every sample may be needed for model building. Additionally, the size of the test set may not have sufficient power or precision to make reasonable judgements. Several researchers (Molinaro 2005; Martin and Hirschberg 1996; Hawkins et al. 2003) show that validation using a single test set can be a poor choice. Hawkins et al. (2003) concisely summarize this point: “holdout samples of tolerable size [...] do not match the cross-validation itself for reliability in assessing model fit and are hard to motivate.” Resampling methods, such as cross-validation, can be used to produce appropriate estimates of model performance using the training set. These are discussed in length in Sect. 4.4. Although resampling techniques can be misapplied, such as the example shown in Ambroise and McLachlan (2002), they often produce performance estimates superior to a single test set because they evaluate many alternate versions of the data.”
  
  In our revised manuscript, we provide additional clarification of our methods (Lines 574-590):
  
  “We created each forest by generating 3,000 regression trees. To determine the best performing model for each epidemic metric, we used leave-one-season-out (jackknife) cross-validation to train models and measure model performance, wherein each “assessment” set is one season of data predicted by the model, and the corresponding “analysis” set contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of each model (Kuhn & Johnson, 2019). Due to the small size of our dataset (~20 seasons), evaluating the predictive accuracy of random forest models on a quasi-independent test set of 2-3 seasons produced unstable estimates. Instead of testing model performance on an independent test set, we generated 10 bootstrap resamples (“repeats”) of each analysis set (“fold”) and averaged the predictions of models trained on resamples (Kuhn & Johnson, 2013, 2019). For each epidemic metric, we report the mean root mean squared error (RMSE) and R2 of predictions from the best tuned model. We used permutation importance (N = 50 permutations) to estimate the relative importance of each predictor in determining target outcomes. Permutation importance is the decrease in prediction accuracy when a single feature (predictor) is randomly permuted, with larger values indicating more important variables. Because many features were collinear, we used conditional permutation importance to compute feature importance scores, rather than the standard marginal procedure (Altmann et al., 2010; Debeer & Strobl, 2020; Strobl et al., 2008; Strobl et al., 2007).”
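
For readers unfamiliar with permutation importance, the idea can be sketched as follows. Note that this shows the standard marginal procedure, not the conditional permutation importance the manuscript reports using for collinear features. The `predict` callable, the list-of-dicts data layout, and the choice of RMSE as the accuracy measure are assumptions made for the example; only the 50 permutations come from the text.

```python
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def permutation_importance(predict, X, y, n_permutations=50, seed=0):
    """Marginal permutation importance.

    predict: callable mapping one row (dict of feature -> value) to a prediction
    X:       list of dict rows; y: list of observed outcomes
    Returns {feature: mean increase in RMSE when that feature is shuffled}.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importance = {}
    for feature in X[0]:
        increases = []
        for _ in range(n_permutations):
            # Shuffle one column while leaving all other columns intact.
            shuffled = [row[feature] for row in X]
            rng.shuffle(shuffled)
            permuted = [dict(row, **{feature: v}) for row, v in zip(X, shuffled)]
            increases.append(rmse(y, [predict(r) for r in permuted]) - baseline)
        importance[feature] = sum(increases) / n_permutations
    return importance
```

A feature the model ignores receives an importance of zero, while shuffling a feature the model relies on degrades its predictions and yields a positive score.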
  
  (3) In response to the reviewer’s question about the sensitivity of results when one season is left out, we clarify that the variable importance scores in Figure 8 and model predictions in Figure 9 were generated by models tuned using leave-one-season-out cross-validation.
  
  As explained above, in our leave-one-season-out cross-validation approach, each “assessment” set contains one season of data predicted by the model, and the corresponding “analysis” set (“fold”) contains the remaining seasons. We generated predictions of epidemic metrics and variable importance rankings by averaging the model output of 10 bootstrap resamples of each cross-validation fold.
  
  In Lines 791-806, we describe which epidemic metrics have the highest prediction accuracy and report that random forest models tend to underpredict most epidemic metrics in seasons with high antigenic novelty:
  
  “We measured correlations between observed values and model-predicted values at the HHS region level. Among the various epidemic metrics, random forest models produced the most accurate predictions of A(H3N2) subtype dominance (Spearman’s 𝜌 = 0.95, regional range = 0.85 – 0.97), peak incidence (𝜌 = 0.91, regional range = 0.72 – 0.95), and epidemic size (𝜌 = 0.9, regional range = 0.74 – 0.95), while predictions of effective Rt and epidemic intensity were less accurate (𝜌 = 0.81, regional range = 0.65 – 0.91; 𝜌 = 0.78, regional range = 0.63 – 0.92, respectively) (Figure 9). Random forest models tended to underpredict most epidemic targets in seasons with substantial H3 antigenic transitions, in particular the SY97 cluster seasons (1998-1999, 1999-2000) and the FU02 cluster season (2003-2004) (Figure 9).
  
  For epidemic size and peak incidence, seasonal predictive error – the root-mean-square error (RMSE) across all regional predictions in a season – increased with H3 epitope distance (epidemic size, Spearman’s 𝜌 = 0.51, P = 0.02; peak incidence, 𝜌 = 0.63, P = 0.004) and N2 epitope distance (epidemic size, 𝜌 = 0.48, P = 0.04; peak incidence, 𝜌 = 0.48, P = 0.03) (Figure 9 – figure supplements 1 – 2). For models of epidemic intensity, seasonal RMSE increased with N2 epitope distance (𝜌 = 0.64, P = 0.004) but not H3 epitope distance (𝜌 = 0.06, P = 0.8) (Figure 9 – figure supplements 1 – 2). Seasonal RMSE of effective Rt and subtype dominance predictions did not correlate with H3 or N2 epitope distance (Figure 9 – figure supplements 1 – 2).”
  
  I think the competition (interference) results are really interesting, perhaps among the most interesting aspects of this work.
  
  Thank you! We agree that our finding that subtype interference has a greater impact than viral evolution on A(H3N2) epidemics is one of the more interesting results in the study.
  
  Have you seen the paper by Barrat-Charlaix et al? They found that LBI was not good at predicting frequency dynamics (see https://pubmed.ncbi.nlm.nih.gov/33749787/); instead, LBI was high for sequences like the consensus sequence, which was near to future strains. LBI also was not positively correlated with epidemic impact in Figure S7.
  
  The local branching index (LBI) measures the rate of recent phylogenetic branching and approximates relative fitness among viral clades, with high LBI values representing greater fitness (Neher et al. 2014).
  
  Two of this study’s co-authors (John Huddleston and Trevor Bedford) are also co-authors of Barrat-Charlaix et al. 2021. Barrat-Charlaix et al. 2021 assessed the performance of LBI in predicting the frequency dynamics and fixation of individual amino acid substitutions in A(H3N2) viruses. Our study is not focused on predicting the future success of A(H3N2) clades or the frequency dynamics or probability of fixation of individual substitutions. Instead, we use the standard deviation and Shannon diversity of LBI values in each season as a proxy for genealogical (clade-level) diversity. We find that, at a seasonal level, low diversity of H3 or N2 LBI values in the current season correlates with greater epidemic intensity, higher transmission rates, and shorter seasonal duration.
  
  In the Discussion we provide an explanation for these correlation results (Lines 848-857):
  
  “The local branching index (LBI) is traditionally used to predict the success of individual clades, with high LBI values indicating high viral fitness (Huddleston et al., 2020; Neher et al., 2014). In our epidemiological analysis, low diversity of H3 or N2 LBI in the current season correlated with greater epidemic intensity, higher transmission rates, and shorter seasonal duration. These associations suggest that low LBI diversity is indicative of a rapid selective sweep by one successful clade, while high LBI diversity is indicative of multiple co-circulating clades with variable seeding and establishment times over the course of an epidemic. A caveat is that LBI estimation is more sensitive to sequence sub-sampling schemes than strain-level measures. If an epidemic is short and intense (e.g., 1-2 months), a phylogenetic tree with our sub-sampling scheme (50 sequences per month) may not incorporate enough sequences to capture the true diversity of LBI values in that season.”
  
  Figure 1 - LBI goes up over time. Is that partly to do with sampling? Overall how do higher sampling volumes in later years impact this analysis? (though you choose a fixed number of sequences so I guess you downsample to cope with that). I note that LBI is likely to be sensitive to sequencing density.
  
  Thank you for pointing this out. We realized that increasing LBI Shannon diversity over the course of the study period was indeed an artefact of increasing sequence volume over time. Our sequence subsampling scheme involves selecting a random sample of up to 50 viruses per month, with up to 25 viruses selected from North America (if available) and the remaining sequences evenly divided across nine other global regions. In early seasons of the study (late 1990s/early 2000s), sampling was often too sparse to meet the 25 viruses/month threshold for North America or for the other global regions combined (H3: Figure 2 - figure supplement 1; N2: Figure 2 - figure supplement 2). Ecological diversity metrics are sensitive to sample size, which explains why LBI Shannon diversity appeared to steadily increase over time in our original submission. In our revised manuscript, we correct for uneven sample sizes across seasons before estimating Shannon diversity and clarify our methodology.
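
The sample-size sensitivity noted above can be illustrated with a short sketch. It computes Shannon diversity as the exponential of Shannon entropy over integer LBI bins, following the binning rule given in the manuscript (a value of 0.5 falls in the (0, 1] bin), and then averages that quantity over random subsamples of a chosen size. The function names, the number of draws, and the subsampling without replacement are assumptions for illustration. The response does not specify how uneven sample sizes were corrected, so no correction is implemented here.

```python
import math
import random
from collections import Counter


def shannon_diversity(lbi_values):
    """Exponential of Shannon entropy over integer LBI bins; 0.5 falls in (0, 1]."""
    counts = Counter(math.ceil(v) for v in lbi_values)
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


def mean_subsample_diversity(lbi_values, size, n_draws=200, seed=1):
    """Average Shannon diversity over random subsamples of a given size."""
    rng = random.Random(seed)
    draws = [shannon_diversity(rng.sample(lbi_values, size)) for _ in range(n_draws)]
    return sum(draws) / n_draws
```

For a season whose LBI values are spread evenly over four bins, the full sample has a Shannon diversity of 4, whereas small subsamples of the same values yield lower estimates on average. This is the direction of bias expected when early seasons contribute fewer sequences.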
  
  Lines 443-482:
  
  “Clade growth: The local branching index (LBI) measures the relative fitness of co-circulating clades, with high LBI values indicating recent rapid phylogenetic branching (Huddleston et al., 2020; Neher et al., 2014). To calculate LBI for each H3 and N2 sequence, we applied the LBI heuristic algorithm as originally described by Neher et al., 2014 to H3 and N2 phylogenetic trees, respectively. We set the neighborhood parameter 𝜏 to 0.4 and only considered viruses sampled between the current season 𝑡 and the previous season 𝑡 – 1 as contributing to recent clade growth in the current season 𝑡.
  
  Variation in the phylogenetic branching rates of co-circulating A(H3N2) clades may affect the magnitude, intensity, onset, or duration of seasonal epidemics. For example, we expected that seasons dominated by a single variant with high fitness might have different epidemiological dynamics than seasons with multiple co-circulating clades with varying seeding and establishment times. We measured the diversity of clade growth rates of viruses circulating in each season by measuring the standard deviation (s.d.) and Shannon diversity of LBI values in each season. Given that LBI measures relative fitness among cocirculating clades, we did not compare overall clade growth rates (e.g., mean LBI) across seasons.
  
  Each season’s distribution of LBI values is right-skewed and does not follow a normal distribution. We therefore bootstrapped the LBI values of each season in each replicate dataset 1000 times (1000 samples with replacement) and estimated the seasonal standard deviation of LBI from resamples, rather than directly from observed LBI values. We also tested the seasonal standard deviation of LBI from log transformed LBI values, which produced qualitatively equivalent results to bootstrapped LBI values in downstream analyses.
  
  As an alternative measure of seasonal LBI diversity, we binned raw H3 and N2 LBI values into categories based on their integer values (e.g., an LBI value of 0.5 is assigned to the (0,1] bin) and estimated the exponential of the Shannon entropy (Shannon diversity) of LBI categories (Hill, 1973; Shannon, 1948). The Shannon diversity of LBI considers both the richness and relative abundance of viral clades with different growth rates in each season and is calculated as follows:

  $$ {}^{1}D = \exp\left(-\sum_{i=1}^{R} p_i \ln p_i\right) $$

  where ${}^{q}D$ is the effective number of categories or Hill number of order $q$ (here, clades with different growth rates), with $q$ defining the sensitivity of the true diversity to rare versus abundant categories (Hill, 1973). exp is the exponential function, $p_i$ is the proportion of LBI values belonging to the $i$th category, and $R$ is richness (the total number of categories). Shannon diversity ${}^{1}D$ ($q = 1$) estimates the effective number of categories in an assemblage using the geometric mean of their proportional abundances $p_i$ (Hill, 1973).
  
  Because ecological diversity metrics are sensitive to sampling effort, we rarefied H3 and N2 sequence datasets prior to estimating Shannon diversity so that seasons had the same sample size. For each season in each replicate dataset, we constructed rarefaction and extrapolation curves of LBI Shannon diversity and extracted the Shannon diversity estimate at a sample size twice the reference sample size (the smallest number of sequences obtained in any season during the study) (iNEXT R package) (Chao et al., 2014). Chao et al. found that their diversity estimators work well for rarefaction and short-range extrapolation when the extrapolated sample size is up to twice the reference sample size. For H3, we estimated seasonal diversity using replicate datasets subsampled to 360 sequences/season; for N2, datasets were subsampled to 230 sequences/season.”
  
  Estimating the Shannon diversity of LBI from datasets with even sampling across seasons removes the previous secular trend of increasing LBI diversity over time (Figure 2 in revised manuscript).
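
  As an illustrative sketch of the Shannon diversity calculation described in the quoted methods (not the authors' code), the binning and the exponential of the Shannon entropy can be written in a few lines of Python. The function name `shannon_diversity`, the toy LBI values, and the choice to label each bin by its upper edge are assumptions made here for illustration; the rarefaction step performed with iNEXT is not reproduced.

```python
import math
from collections import Counter


def shannon_diversity(lbi_values):
    """Hill number of order q = 1: exp of the Shannon entropy of binned LBI values."""
    # Assumption: bin (k-1, k] is labelled by its upper edge k,
    # so an LBI of 0.5 falls in bin 1, i.e. the (0, 1] bin.
    bins = Counter(math.ceil(v) for v in lbi_values)
    n = sum(bins.values())
    entropy = -sum((c / n) * math.log(c / n) for c in bins.values())
    return math.exp(entropy)


# Hypothetical seasons: one growth-rate category vs. four equally common categories.
single_category = [0.2, 0.4, 0.5, 0.7, 0.9, 0.3]
four_categories = [0.5, 1.5, 2.5, 3.5]
print(shannon_diversity(single_category))
print(shannon_diversity(four_categories))
```

  When every value falls in the same bin the effective number of categories is one, and when values are spread evenly over four bins it is four, which is the "effective number of categories" reading of the Hill number.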
  
  Figure 3 - I wondered what about the co-dominant times?
  
  In Figure 3, orange points correspond to seasons in which A(H3N2) and A(H1N1) were codominant. We are not sure of the reviewer’s specific question concerning codominant seasons, but if it concerns whether antigenic drift is linked to epidemic magnitude among codominant seasons alone, we cannot perform separate regression analyses for these seasons because there are only two codominant seasons during the 22-season study period.
  
  Figure 4 - Related to drift and epidemic size, dominance, etc. -- when is drift measured, and (if it's measured in season t), would larger populations create more drift, simply by having access to more opportunity (via a larger viral population size)? This is a bit 'devil's advocate' but what if some epidemiological/behavioural process causes a larger and/or later peak, and those gave rise to higher drift?
  
  Seasonal drift is measured as the genetic or antigenic distance between viruses circulating during season t and viruses circulating in the prior season (𝑡 – 1) or two seasons ago (𝑡 – 2).
  
  Concerning the question about whether larger human populations lead to greater rates of antigenic drift, phylogeographic studies have repeatedly found that East-South-Southeast Asia are the source populations for A(H3N2) viruses (Bedford et al., 2015; Lemey et al., 2014), in part because these regions have tropical or subtropical climates and larger human populations, which enable year-round circulation and higher background infection rates. Larger viral populations (via larger host population sizes) and uninterrupted transmission may increase the efficiency of selection and the probability of strain survival and global spread (Wen et al., 2016). After A(H3N2) variants emerge in East-South-Southeast Asia and spread to other parts of the world, A(H3N2) viruses circulate via overlapping epidemics rather than local persistence (Bedford et al., 2015; Rambaut et al., 2008). Each season, A(H3N2) outbreaks in the US (and other temperate regions) are seeded by case importations from outside the US, genetic diversity peaks during the winter, and a strong genetic bottleneck typically occurs at the end of the season (Rambaut et al., 2008).
  
  Due to their faster rates of antigenic evolution, A(H3N2) viruses undergo more rapid clade turnover and dissemination than A(H1N1) and B viruses, despite similar global migration networks across A(H3N2), A(H1N1), and B viruses (Bedford et al., 2015). Bedford et al. speculate that there is typically little geographic differentiation in A(H3N2) viruses circulating in each season because A(H3N2) viruses tend to infect adults, and adults are more mobile than children. Compared to A(H3N2) viruses, A(H1N1) and B viruses tend to have greater genealogical diversity, geographic differentiation, and longer local persistence times (Bedford et al., 2015; Rambaut et al., 2008). Thus, some A(H1N1) and B epidemics are reseeded by viruses that have persisted locally since prior epidemics (Bedford et al., 2015).
  
  Theoretical models have shown that epidemiological processes can influence rates of antigenic evolution (Recker et al., 2007; Wen et al., 2016; Zinder et al., 2013), though the impact of flu epidemiology on viral evolution is likely constrained by the virus’s intrinsic mutation rate.
  
  In conclusion, larger host population sizes and flu epidemiology can indeed influence rates of antigenic evolution. However, given that our study is US-centric and focuses on A(H3N2) viruses, these factors are likely not at play in our study, due to intrinsic biological characteristics of A(H3N2) viruses and the geographic location of our study.
  
  We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.
  
  Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”
  
  Methods --
  
  L 620 about rescaling and pre- vs post-pandemic times : tell us more - how has reporting changed? could any of this not be because of reporting but because of NPIs or otherwise? Overall there is a lot of rescaling going on. How sensitive are the results to it?
  
  It would be unreasonable to ask for a sensitivity analysis for all the results for all the choices around data preparation, but some idea where there is a reason to think there might be a dependence on one of these choices would be great.
  
  In response to the 2009 A(H1N1) pandemic, the US CDC and WHO increased laboratory testing capacity and strengthened epidemiological networks, leading to substantial, long-lasting improvements to influenza surveillance that are still in place today (https://www.cdc.gov/flu/weekly/overview.htm). At the beginning of the COVID-19 pandemic, influenza surveillance networks were quickly adapted to detect and understand the spread of SARS-CoV-2. The 2009 pandemic occurred over a time span of less than one year, and strict non-pharmaceutical interventions (NPIs), such as lockdowns and mask mandates, were not implemented. Thus, we attribute increases in test volume during the post-2009 period to improved virologic surveillance and laboratory testing capacity rather than changes in care-seeking behavior. In the revised manuscript, we include a figure (Figure 1 - figure supplement 2) that shows systematic increases in test volume in all HHS regions after the 2009 pandemic.
  
  Given the substantial increase in influenza test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results for Spearman correlations and regression models, when adjusting for the pre- and post-2009 pandemic time periods and regional reporting versus only adjusting for the pre-/post-2009 pandemic time periods. Below, we share adjusted versions of Figure 3 (regression results) and Figure 3 - figure supplement 1 (Spearman correlations). Each figure only adjusts for differences in pre- and post-2009 pandemic reporting.
  
  Author response image 1.
  
  Adjustment for pre- and post-2009 pandemic only
  
  Author response image 2.
  
  Adjustment for pre- and post-2009 pandemic only
  
  L635 - Why discretize the continuous LBI distribution and then use Shannon entropy when you could just use the variance and/or higher moments? (or quantiles)? Similarly, why not use the duration of the peak, rather than Shannon entropy? (though there, because presumably data are already binned weekly, and using duration would involve defining start and stop times, it's more natural than with LBI)
  
  We realize that we failed to mention in the methods that we calculated the standard deviation of LBI in each season, in addition to the exponential of the Shannon entropy (Shannon diversity) of LBI. Both the Shannon diversity of LBI values and the standard deviation of LBI values were negatively correlated with effective Rt and epidemic intensity and positively correlated with seasonal duration. The two measures were similarly correlated with effective Rt and epidemic intensity (Figure 3 - figure supplements 2 - 3), while the Shannon diversity of LBI had slightly stronger correlations with seasonal duration than s.d. LBI (Figure 5). Thus, both measures of LBI diversity appear to capture potentially biologically important heterogeneities in clade growth rates.
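
  For readers who want to see the bootstrapped standard deviation concretely, a minimal sketch follows (not the authors' code). It resamples one season's LBI values with replacement and averages the standard deviation across resamples. The function name `bootstrap_sd`, the fixed seed, the toy values, and the use of the mean of the resampled standard deviations as the summary are all assumptions made here, since the response does not state how the resamples were summarized.

```python
import random
import statistics


def bootstrap_sd(values, n_boot=1000, seed=0):
    """Average standard deviation across bootstrap resamples of one season's LBI values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    sds = []
    for _ in range(n_boot):
        # Draw n values with replacement from the observed values.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        sds.append(statistics.stdev(resample))
    return statistics.mean(sds)


# Hypothetical right-skewed LBI values for a single season.
season_lbi = [0.1, 0.2, 0.2, 0.3, 0.5, 1.4, 3.0]
print(bootstrap_sd(season_lbi))
```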
  
  Separately, we use the inverse Shannon entropy of the incidence distribution to measure the spread of an A(H3N2) epidemic during the season, following the methods of Dalziel et al. 2018. The peak of an epidemic is a single time point at which the maximum incidence occurs. We have not encountered “the duration of the peak” before in epidemiology terminology, and, to our knowledge, there is not a robust way to measure the “duration of a peak,” unless one were to measure the time span between multiple points of maximum incidence or designate an arbitrary threshold for peak incidence that is not strictly the maximum incidence. Given that Shannon entropy is based on the normalized incidence distribution over the course of the entire influenza season (week 40 to week 20), it does not require designating an arbitrary threshold to describe epidemic intensity.
  
  L642 - again why normalize epidemic intensities, and how sensitive are the results to this? I would imagine given that the RF results were unstable under leave-one-out analysis that some of those results could be quite sensitive to choices of normalization and scaling.
  
  Epidemic intensity, defined as the inverse Shannon entropy of the incidence distribution, measures the spread of influenza cases across the weeks in a season. Following Dalziel et al. 2018, we estimated epidemic intensity from normalized incidence distributions rather than raw incidences so that epidemic intensity is invariant under differences in reporting rates and/or attack rates across regions and seasons. If we were to use raw incidences instead, HHS regions or seasons could have the appearance of greater or lower epidemic intensity (i.e., incidence concentrated within a few weeks or spread out over several weeks), due to differences in attack rates or test volume, rather than fundamental differences in the shapes of their epidemic curves. In other words, epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season.
  
  In the methods section, we provide further clarification for why epidemic intensities are based on normalized incidence distributions rather than raw incidences.
  
  Lines 206-209: “Epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season. Following the methodology of Dalziel et al. 2018, epidemic intensity values were normalized to fall between 0 and 1 so that epidemic intensity is invariant to differences in reporting rates and/or attack rates across regions and seasons.”
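
  The invariance argument above can be checked numerically with a short sketch (not the authors' code). It computes the inverse Shannon entropy of a normalized weekly incidence distribution and shows that multiplying all counts by a constant, as a change in reporting rate would, leaves the value unchanged. The function names, the toy incidence curves, and the min-max rescaling used to map values onto the interval from 0 to 1 are assumptions made here for illustration.

```python
import math


def inverse_entropy(incidence):
    """Inverse Shannon entropy of the normalized weekly incidence distribution."""
    total = sum(incidence)
    # Weeks with zero incidence contribute nothing to the entropy.
    p = [x / total for x in incidence if x > 0]
    entropy = -sum(pi * math.log(pi) for pi in p)
    # Note: entropy is zero if all cases fall in one week; that case is not handled here.
    return 1.0 / entropy


def rescale_01(values):
    """Min-max rescaling to [0, 1] (assumed normalization; needs two distinct values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical epidemic curves: one concentrated in a few weeks, one spread out.
sharp = [0, 1, 2, 40, 60, 5, 1, 0]
flat = [10, 12, 14, 15, 15, 14, 12, 10]

print(inverse_entropy(sharp), inverse_entropy(flat))
# Ten-fold higher reporting of the same curve gives the same intensity.
print(math.isclose(inverse_entropy(sharp), inverse_entropy([10 * x for x in sharp])))
```

  The concentrated curve has lower entropy and therefore higher intensity than the spread-out curve, regardless of how many cases were reported in total.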
  
  L643 - more information about what goes into Epidemia (variables, priors) such that it's replicable/understandable without the code would be good.
  
  We now include additional information concerning the epidemic models used to estimate Rt, including all model equations, variables, and priors (Lines 210-276 in Methods).
  
  L667 did you do breakpoint detection? Why linear models? Was log(incidence) used?
  
  In our original submission, we estimated epidemic onsets using piecewise regression models (Lines 666-674 in original manuscript), which model non-linear relationships with breakpoints by iteratively fitting linear models (Muggeo, 2003). Piecewise regression falls under the umbrella of parametric methods for breakpoint detection.
  
  We did not include results from linear models fit to log(incidence) or GLMs with Gaussian error distributions and log links, for two reasons. First, models fit to log-transformed data require non-zero values as inputs. Although breakpoint detection does not necessarily require weeks of zero incidence leading up to the start of an outbreak, limiting the time period for breakpoint detection to weeks with non-zero incidence (so that we could use log-transformed incidence) substantially pushed back our previous, more biologically plausible estimates of epidemic onset weeks. Second, as an alternative to limiting the dataset to weeks with non-zero incidence, we tried adding a small positive number to weekly incidences so that we could fit models to log-transformed incidence for the whole time period spanning epidemic week 40 (the start of the influenza season) to the first week of maximum incidence. Fitting models to log-transformed incidences produced unrealistic breakpoint locations, potentially because log transformations 1) linearize data, and 2) stabilize variance by reducing the impact of extreme values. Due to the short time span used for breakpoint detection, log transforming incidence diminishes abrupt changes in incidence at the beginning of outbreaks, making it difficult for models to estimate biologically plausible breakpoint locations. Log transformations of incidence may be more useful when analyzing time series spanning multiple seasons, rather than short time spans with sharp changes in incidence (i.e., the exponential growth phase of a single flu outbreak).
  
  As an alternative to piecewise regression, our revised manuscript also estimates epidemic onsets using a Bayesian ensemble algorithm that accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (BEAST - a Bayesian estimator of Abrupt change, Seasonal change, and Trend; Zhao et al., 2019). Although a few regional onset times differed across the two methods, our conclusions did not change concerning correlations between viral fitness and epidemic onset timing.
  
  We have rewritten the methods section for estimating epidemic onsets to clarify our methodology and to include the BEAST method (Lines 292-308):
  
  “We estimated the regional onsets of A(H3N2) virus epidemics by detecting breakpoints in A(H3N2) incidence curves at the beginning of each season. The timing of the breakpoint in incidence represents epidemic establishment (i.e., sustained transmission) rather than the timing of influenza introduction or arrival (Charu et al., 2017). We used two methods to estimate epidemic onsets: 1) piecewise regression, which models non-linear relationships with break points by iteratively fitting linear models to each segment (segmented R package) (Muggeo, 2008; Muggeo, 2003), and 2) a Bayesian ensemble algorithm (BEAST – a Bayesian estimator of Abrupt change, Seasonal change, and Trend) that explicitly accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (Rbeast R package) (Zhao et al., 2019). For each region in each season, we limited the time period of breakpoint detection to epidemic week 40 to the first week of maximum incidence and did not estimate epidemic onsets for regions with insufficient signal, which we defined as fewer than three weeks of consecutive incidence and/or greater than 30% of weeks with missing data. We successfully estimated A(H3N2) onset timing for most seasons, except for three A(H1N1) dominant seasons: 2000-2001 (0 regions), 2002-2003 (3 regions), and 2009-2010 (0 regions). Estimates of epidemic onset weeks were similar when using piecewise regression versus the BEAST method, and downstream analyses of correlations between viral fitness indicators and onset timing produced equivalent results. We therefore report results from onsets estimated via piecewise regression.”
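
  To make the idea of breakpoint detection concrete, the sketch below fits two separate least-squares lines on either side of every candidate breakpoint and keeps the split with the smallest total squared error. This is a deliberately simple brute-force stand-in, not the iterative algorithm of Muggeo (2003) implemented in the segmented R package, and it does not force the two segments to join. The function names, the `min_seg` parameter, and the toy incidence series are assumptions made here for illustration.

```python
def fit_line_sse(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def find_breakpoint(weeks, incidence, min_seg=3):
    """Return the first week of the second segment for the best two-line fit."""
    best_week, best_sse = None, float("inf")
    # Each segment must contain at least min_seg points.
    for k in range(min_seg, len(weeks) - min_seg + 1):
        sse = (fit_line_sse(weeks[:k], incidence[:k])
               + fit_line_sse(weeks[k:], incidence[k:]))
        if sse < best_sse:
            best_week, best_sse = weeks[k], sse
    return best_week


# Hypothetical series: flat baseline for six weeks, then steady growth.
weeks = list(range(1, 13))
incidence = [1, 1, 1, 1, 1, 1, 3, 6, 9, 12, 15, 18]
print(find_breakpoint(weeks, incidence))
```

  Replacing `incidence` with its logarithm in a sketch like this is one way to explore how a log transformation flattens the abrupt change at the start of an outbreak, which is the concern raised in the response above.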
  
  L773 national indicators -- presumably this is because you don't have regional-level information, but it might be worth saying that earlier so it doesn't read like there are other indicators now, called national indicators, that we should have heard of
  
  In the revised manuscript, we move a paragraph that was at the beginning of the Results to the beginning of the Methods.
  
  Lines 123-132:
  
  “Our study focuses on the impact of A(H3N2) virus evolution on seasonal epidemics from seasons 1997-1998 to 2018-2019 in the U.S.; whenever possible, we make use of regionally disaggregated indicators and analyses. We start by identifying multiple indicators of influenza evolution each season based on changes in HA and NA. Next, we compile influenza virus subtype-specific incidence time series for U.S. Department of Health and Human Services (HHS) regions and estimate multiple indicators characterizing influenza A(H3N2) epidemic dynamics each season, including epidemic burden, severity, type/subtype dominance, timing, and the age distribution of cases. We then assess univariate relationships between national indicators of evolution and regional epidemic characteristics. Lastly, we use multivariable regression models and random forest models to measure the relative importance of viral evolution, heterosubtypic interference, and prior immunity in predicting regional A(H3N2) epidemic dynamics.”
  
  In Lines 484-487 in the Methods, we now mention that measures of seasonal antigenic and genetic distance are at the national level.
  
  “For each replicate dataset, we estimated national-level genetic and antigenic distances between influenza viruses circulating in consecutive seasons by calculating the mean distance between viruses circulating in the current season 𝑡 and viruses circulating during the prior season (𝑡 – 1 year; one season lag) or two seasons ago (𝑡 – 2 years; two season lag).”
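
  The "mean distance between viruses circulating in the current season and the prior season" can be sketched as an average over all cross-season pairs. The example below is not the authors' code: it uses the Hamming distance between aligned sequences as a stand-in for the several genetic and antigenic distance measures used in the manuscript, and the function names and toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_between_season_distance(current, previous):
    """Mean distance over every (current-season, previous-season) pair of viruses."""
    pairs = list(product(current, previous))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned sequences from season t and season t - 1.
season_t = ["ACGT", "ACGA"]
season_t_minus_1 = ["ACGT", "TCGT"]
print(mean_between_season_distance(season_t, season_t_minus_1))
```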
  
  L782 Why Beta regression and what is "the resampled dataset" ?
  
  Beta regression is appropriate for models of subtype dominance, epidemic intensity, and age-specific proportions of ILI cases because these data are continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). “The resampled dataset” refers to the “1000 bootstrap replicates of the original dataset (1000 samples with replacement)” mentioned in Lines 777-778 of the original manuscript.
  
  In the revised manuscript, we include more background information about Beta regression models, and explicitly mention that regression models were fit to 1000 bootstrap replicates of the original dataset.
  
  Lines 503-507:
  
  “For subtype dominance, epidemic intensity, and age-specific proportions of ILI cases, we fit Beta regression models with logit links. Beta regression models are appropriate when the variable of interest is continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). For each epidemic metric, we fit the best-performing regression model to 1000 bootstrap replicates of the original dataset.”
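
  For readers unfamiliar with the model, the mean-precision form of the Beta density introduced by Ferrari & Cribari-Neto (2004), combined with a logit link, can be stated as follows. This is a generic statement of the model class; the symbols ($y$, $\mu$, $\phi$, $x_j$, $\beta$) are chosen here for illustration and are not taken from the manuscript.

  $$ f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\, y^{\mu\phi - 1} (1 - y)^{(1-\mu)\phi - 1}, \qquad 0 < y < 1, $$

  $$ \mathrm{E}(y) = \mu, \qquad \mathrm{Var}(y) = \frac{\mu(1-\mu)}{1+\phi}, \qquad \operatorname{logit}(\mu_j) = \log\frac{\mu_j}{1-\mu_j} = x_j^{\top}\beta . $$

  Because the density is defined only on the open interval (0, 1) and the variance shrinks as $\mu$ approaches either boundary, the model suits proportions such as subtype dominance better than a Gaussian linear model would.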
  
  The github is clear, comprehensive and well-documented, at least at a brief glance.
  
  Thank you! At the time of resubmission, our GitHub repository is updated to incorporate feedback from the reviewers.
  
  References
  
  Altmann, A., Tolosi, L., Sander, O., & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), 1340-1347. https://doi.org/10.1093/bioinformatics/btq134

  Barrat-Charlaix, P., Huddleston, J., Bedford, T., & Neher, R. A. (2021). Limited Predictability of Amino Acid Substitutions in Seasonal Influenza Viruses. Mol Biol Evol, 38(7), 2767-2777. https://doi.org/10.1093/molbev/msab065

  Bedford, T., Riley, S., Barr, I. G., Broor, S., Chadha, M., Cox, N. J., Daniels, R. S., Gunasekaran, C. P., Hurt, A. C., Kelso, A., Klimov, A., Lewis, N. S., Li, X., McCauley, J. W., Odagiri, T., Potdar, V., Rambaut, A., Shu, Y., Skepner, E., . . . Russell, C. A. (2015). Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature, 523(7559), 217-220. https://doi.org/10.1038/nature14460

  Chao, A., Gotelli, N. J., Hsieh, T. C., Sander, E. L., Ma, K. H., Colwell, R. K., & Ellison, A. M. (2014). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1), 45-67. https://doi.org/10.1890/13-0133.1

  Charu, V., Zeger, S., Gog, J., Bjornstad, O. N., Kissler, S., Simonsen, L., Grenfell, B. T., & Viboud, C. (2017). Human mobility and the spatial transmission of influenza in the United States. PLoS Comput Biol, 13(2), e1005382. https://doi.org/10.1371/journal.pcbi.1005382

  Dalziel, B. D., Kissler, S., Gog, J. R., Viboud, C., Bjornstad, O. N., Metcalf, C. J. E., & Grenfell, B. T. (2018). Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science, 362(6410), 75-79. https://doi.org/10.1126/science.aat6030

  Debeer, D., & Strobl, C. (2020). Conditional permutation importance revisited. BMC Bioinformatics, 21(1), 307. https://doi.org/10.1186/s12859-020-03622-2

  Dhanasekaran, V., Sullivan, S., Edwards, K. M., Xie, R., Khvorov, A., Valkenburg, S. A., Cowling, B. J., & Barr, I. G. (2022). Human seasonal influenza under COVID-19 and the potential consequences of influenza lineage elimination. Nat Commun, 13(1), 1721. https://doi.org/10.1038/s41467-022-29402-5

  Ferrari, S., & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501

  Garten, R. J., Davis, C. T., Russell, C. A., Shu, B., Lindstrom, S., Balish, A., Sessions, W. M., Xu, X., Skepner, E., Deyde, V., Okomo-Adhiambo, M., Gubareva, L., Barnes, J., Smith, C. B., Emery, S. L., Hillman, M. J., Rivailler, P., Smagala, J., de Graaf, M., . . . Cox, N. J. (2009). Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science, 325(5937), 197-201. https://doi.org/10.1126/science.1176225

  Grebe, K. M., Yewdell, J. W., & Bennink, J. R. (2008). Heterosubtypic immunity to influenza A virus: where do we stand? Microbes Infect, 10(9), 1024-1029. https://doi.org/10.1016/j.micinf.2008.07.002

  Hill, M. O. (1973). Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology, 54(2), 427-432. https://doi.org/10.2307/1934352

  Huddleston, J., Barnes, J. R., Rowe, T., Xu, X., Kondor, R., Wentworth, D. E., Whittaker, L., Ermetal, B., Daniels, R. S., McCauley, J. W., Fujisaki, S., Nakamura, K., Kishida, N., Watanabe, S., Hasegawa, H., Barr, I., Subbarao, K., Barrat-Charlaix, P., Neher, R. A., & Bedford, T. (2020). Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. Elife, 9, e60067. https://doi.org/10.7554/eLife.60067

  Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.

  Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Chapman and Hall/CRC.

  Lee, E. C., Arab, A., Goldlust, S. M., Viboud, C., Grenfell, B. T., & Bansal, S. (2018). Deploying digital health data to optimize influenza surveillance at national and local scales. PLoS Comput Biol, 14(3), e1006020. https://doi.org/10.1371/journal.pcbi.1006020

  Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., & Suchard, M. A. (2014). Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog, 10(2), e1003932. https://doi.org/10.1371/journal.ppat.1003932

  Muggeo, V. (2008). Segmented: An R Package to Fit Regression Models With Broken-Line Relationships. R News, 8, 20-25.

  Muggeo, V. M. (2003). Estimating regression models with unknown break-points. Stat Med, 22(19), 3055-3071. https://doi.org/10.1002/sim.1545

  Neher, R. A., Russell, C. A., & Shraiman, B. I. (2014). Predicting evolution from the shape of genealogical trees. Elife, 3, e03568. https://doi.org/10.7554/eLife.03568

  Rambaut, A., Pybus, O. G., Nelson, M. I., Viboud, C., Taubenberger, J. K., & Holmes, E. C. (2008). The genomic and epidemiological dynamics of human influenza A virus. Nature, 453(7195), 615-619. https://doi.org/10.1038/nature06945

  Recker, M., Pybus, O. G., Nee, S., & Gupta, S. (2007). The generation of influenza outbreaks by a network of host immune responses against a limited set of antigenic types. Proceedings of the National Academy of Sciences, 104(18), 7711-7716. https://doi.org/10.1073/pnas.0702154104

  Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423.

  Smith, G. J., Vijaykrishna, D., Bahl, J., Lycett, S. J., Worobey, M., Pybus, O. G., Ma, S. K., Cheung, C. L., Raghwani, J., Bhatt, S., Peiris, J. S., Guan, Y., & Rambaut, A. (2009). Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, 459(7250), 1122-1125. https://doi.org/10.1038/nature08182

  Sridhar, S. (2016). Heterosubtypic T-Cell Immunity to Influenza in Humans: Challenges for Universal T-Cell Influenza Vaccines. Front Immunol, 7, 195. https://doi.org/10.3389/fimmu.2016.00195

  Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9, 307. https://doi.org/10.1186/1471-2105-9-307

  Strobl, C., Boulesteix, A. L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 8, 25. https://doi.org/10.1186/1471-2105-8-25

  Terajima, M., Babon, J. A., Co, M. D., & Ennis, F. A. (2013). Cross-reactive human B cell and T cell epitopes between influenza A and B viruses. Virol J, 10, 244. https://doi.org/10.1186/1743-422x-10-244

  Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M., & Kawaoka, Y. (1992). Evolution and ecology of influenza A viruses. Microbiological Reviews, 56(1), 152-179. https://doi.org/10.1128/mr.56.1.152-179.1992

  Wen, F., Bedford, T., & Cobey, S. (2016). Explaining the geographical origins of seasonal influenza A (H3N2). Proc Biol Sci, 283(1838). https://doi.org/10.1098/rspb.2016.1312

  Yan, L., Neher, R. A., & Shraiman, B. I. (2019). Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. Elife, 8. https://doi.org/10.7554/eLife.44205

  Zhao, K., Wulder, M. A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X., & Brown, M. (2019). Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, 111181. https://doi.org/10.1016/j.rse.2019.04.034

  Zinder, D., Bedford, T., Gupta, S., & Pascual, M. (2013). The Roles of Competition and Mutation in Shaping Antigenic and Genetic Diversity in Influenza. PLOS Pathogens, 9(1). https://doi.org/10.1371/journal.ppat.1003104
  
URL

medrxiv.org/content/10.1101/2023.10.02.23296453v2
www.biorxiv.org www.biorxiv.org

Aberration correction in long GRIN lens-based microendoscopes for extended field-of-view two-photon imaging in deep brain regions

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This valuable study builds on previous work by the authors by presenting a potentially key method for correcting optical aberrations in GRIN lens-based microendoscopes used for imaging deep brain regions. By combining simulations and experiments, the authors show that the obtained field of view is significantly increased with corrected versus uncorrected microendoscopes. The evidence supporting the claims of the authors is solid, although some aspects of the manuscript should be clarified and missing information provided. Because the approach described in this paper does not require any microscope or software modifications, it can be readily adopted by neuroscientists who wish to image neuronal activity deep in the brain.
  
  We thank the Referees for their interest in the paper and for the constructive feedback. We have taken the time necessary to address all of their comments, acquiring new data and performing additional analyses. With the inclusion of these new results, we modified four main figures (Figures 1, 6, 7, and 8), added three new Supplementary Figures (Supplementary Figures 1, 2, and 3), and significantly edited the text. Based on the additional work suggested by the Referees, we believe that we have improved our manuscript, provided missing information, and clarified the aspects of the manuscript to which the Referees drew our attention.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Referee’s comment: Sattin, Nardin, and colleagues designed and evaluated corrective microlenses that increase the useable field of view of two long (>6mm) thin (500 um diameter) GRIN lenses used in deep-tissue two-photon imaging. This paper closely follows the thread of earlier work from the same group (e.g. Antonini et al, 2020; eLife), filling out the quiver of available extended-field-of-view 2P endoscopes with these longer lenses. The lenses are made by a molding process that appears practical and easy to adopt with conventional two-photon microscopes.
  
  Simulations are used to motivate the benefits of extended field of view, demonstrating that more cells can be recorded, with less mixing of signals in extracted traces, when recorded with higher optical resolution. In vivo tests were performed in the piriform cortex, which is difficult to access, especially in chronic preparations.
  
  The design, characterization, and simulations are clear and thorough, but not exhaustive (see below), and do not break new ground in optical design or biological application. However, the approach shows much promise, including for applications not mentioned in the present text such as miniaturized GRIN-based microscopes. Readers will largely be interested in this work for practical reasons: to apply the authors' corrected endoscopes.
  
  Strengths:
  
  The text is clearly written, the ex vivo analysis is thorough and well-supported, and the figures are clear. The authors achieved their aims, as evidenced by the images presented, and were able to make measurements from large numbers of cells simultaneously in vivo in a difficult preparation.
  
  Weaknesses:
  
  Referee’s comment: (1) The novelty of the present work over previous efforts from the same group is not well explained. What needed to be done differently to correct these longer GRIN lenses?
  
  We thank the Referee for the positive evaluation of our work. The optical properties of GRIN lenses depend on the geometrical and optical features of the specific GRIN lens type considered, i.e. its diameter, length, numerical aperture, pitch, and radial modulation of the refractive index. Our approach is based on the addition of a corrective optical element at the back end of the GRIN lens to compensate for aberrations that light encounters as it travels through the GRIN lens. The corrective optical element must, therefore, be tailored to the specific GRIN lens type whose aberrations we aim to correct. The novelty of the present article lies in the successful execution of the ray-trace simulations and two-photon lithography fabrication of corrective optical elements necessary to achieve aberration correction in the two novel and long GRIN lens types, i.e. NEM-050-25-15-860-S-1.5p and NEM-050-23-15-860-S-2.0p (GRIN length, 6.4 mm and 8.8 mm, respectively). Our previous work (Antonini et al. eLife 2020) demonstrated aberration correction with GRIN lenses shorter than 4.1 mm. The design and fabrication of a single corrective optical element suitable to enlarge the field-of-view (FOV) in these longer GRIN lenses is not obvious, especially because longer GRIN lenses are affected by stronger aberrations. To better clarify this point, we revised the Introduction at page 5 (lines 3-10 from bottom) as follows:
  
  “Recently, a novel method based on 3D microprinting of polymer optics was developed to correct for GRIN aberrations by placing specifically designed aspherical corrective lenses at the back end of the GRIN lens 7. This approach is attractive because it is built-in on the GRIN lens and corrected microendoscopes are ready-to-use, requiring no change in the optical set-up. However, previous work demonstrated the feasibility of this method only for GRIN lenses of length < 4.1 mm 7, which are too short to reach the most ventral regions of the mouse brain. The applicability of this technology to longer GRIN lenses, which are affected by stronger optical aberrations 19, remained to be proven.”
  
  (2) Some strong motivations for the method are not presented. For example, the introduction (page 3) focuses on identifying neurons with different coding properties, but this can be done with electrophysiology (albeit with different strengths and weaknesses). Compared to electrophysiology, optical methods more clearly excel at genetic targeting, subcellular measurements, and molecular specificity; these could be mentioned.
  
  Thank you for the comment. We added a paragraph in the Introduction (page 3, lines 2-8), as suggested by the Reviewer:
  
  “High resolution 2P fluorescence imaging of the awake brain is a fundamental tool to investigate the relationship between the structure and the function of brain circuits 1. Compared to electrophysiological techniques, functional imaging in combination with genetically encoded indicators allows monitoring the activity of genetically targeted cell types, access to subcellular compartments, and tracking the dynamics of many biochemical signals in the brain (2). However, a critical limitation of multiphoton microscopy lies in its limited (< 1 mm) penetration depth in scattering biological media 3”.
  
  Another example, in comparing microfabricated lenses to other approaches, an unmentioned advantage is miniaturization and potential application to mini-2P microscopes, which use GRIN lenses.
  
  We added the concept suggested by the Reviewer in the Discussion (page 21, lines 4-7 from bottom). The text now reads:
  
  “Another advantage of long corrected microendoscopes described here over adaptive optics approaches is the possibility to couple corrected microendoscopes with portable 2P microscopes 42-44, allowing high resolution functional imaging of deep brain circuits on an enlarged FOV during naturalistic behavior in freely moving mice”.
  
  (3) Some potentially useful information is lacking, leaving critical questions for potential adopters:
  
  How sensitive is the assembly to decenter between the corrective optic and the GRIN lens?
  
  Following the Referee’s comment, we conducted new optical simulations to evaluate the decrease in optical performance of the corrected endoscopes as a function of the radial shift of the corrective lens from the optical axis of the GRIN rod (decentering, new Supplementary Figure 3), using light rays passing either off- or on-axis. For off-axis rays, we found that the Strehl ratio remained above 0.8 (Maréchal criterion) for positive translations in the range 6-11.5 microns and 16-50 microns for the 6.4 mm- and the 8.8 mm-long corrected microendoscope, respectively, while the Strehl ratio decreased below 0.8 for negative translations of amplitude ~ 5 microns. Please note that for the most marginal rays, a negative translation produces a mismatch between the corrective microlens and the GRIN lens such that the light rays no longer pass through the corrective lens. In contrast, rays passing near the optical axis were still focused by the corrected probe with Strehl ratio above 0.8 in a range of radial shifts of -40 – 40 microns for both microendoscope types. Altogether, these new simulations suggest that a decentering of < 5 microns between the corrective microlens and the GRIN lens does not substantially affect the optical properties of the corrected endoscopes. These new results are now displayed in Supplementary Figure 3 and described on page 7 (lines 3-5 from bottom).
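  As background for the 0.8 threshold quoted above, the commonly used exponential form of the Maréchal approximation relates the Strehl ratio to the RMS wavefront error. The sketch below is illustrative only: the function names are hypothetical, and it is not the ray-trace computation behind Supplementary Figure 3. The 920 nm wavelength is the simulation wavelength quoted elsewhere in this response.

```python
import math


def strehl_from_rms(rms_waves: float) -> float:
    """Exponential Marechal approximation: S ~ exp(-(2*pi*sigma)^2), sigma in waves."""
    return math.exp(-((2.0 * math.pi * rms_waves) ** 2))


def max_rms_for_strehl(strehl: float) -> float:
    """Invert the approximation: largest RMS error (in waves) giving this Strehl ratio."""
    return math.sqrt(-math.log(strehl)) / (2.0 * math.pi)


WAVELENGTH_NM = 920.0  # simulation wavelength quoted in the response

sigma = max_rms_for_strehl(0.8)
print(f"RMS error for S = 0.8: {sigma:.4f} waves = {sigma * WAVELENGTH_NM:.1f} nm")
print(f"S at lambda/14 RMS error: {strehl_from_rms(1.0 / 14.0):.3f}")
```

  Under this approximation, a Strehl ratio of 0.8 corresponds to an RMS wavefront error of roughly one fourteenth of a wavelength, which is why 0.8 is used as the diffraction-limited threshold.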
  
  What is the yield of fabrication and of assembly?
  
  The fabrication yield using molding was ~ 90% (N > 30 molded lenses). The main limitation of this procedure was the formation of air bubbles between the mold negative and the glass coverslip. Molded lenses were visually inspected with a stereomicroscope and, in case of air bubble formation, they were discarded.
  
  The assembly yield, i.e. correct positioning of the GRIN lens with respect to the coverslip, was 100 % (N = 27 endoscopes).
  
  We added this information in the Methods at page 29 (lines 1-12), as follows:
  
  “After UV curing, the microlens was visually inspected at the stereomicroscope. In case of formation of air bubbles, the microlens was discarded (yield of the molding procedure: ~ 90 %, N > 30 molded lenses). The coverslip with the attached corrective lens was sealed to a customized metal or plastic support ring of appropriate diameter (Fig. 2C). The support ring, the coverslip and the aspherical lens formed the upper part of the corrected microendoscope, to be subsequently coupled to the proper GRIN rod (Table 2) using a custom-built opto-mechanical stage and NOA63 (Fig. 2C) 7. The GRIN rod was positioned perpendicularly to the glass coverslip, on the other side of the coverslip compared to the corrective lens, and aligned to the aspherical lens perimeter (Fig. 2C) under the guidance of a wide field microscope equipped with a camera. The yield of the assembly procedure for the probes used in this work was 100 % (N = 27 endoscopes). For further details on the assembly of corrected microendoscope see(7)”.
  
  Supplementary Figure 1: Is this really a good agreement between the design and measured profile? Does the figure error (~10 um in some cases on average) noticeably degrade the image?
  
  As the Reviewer correctly noticed, the discrepancy between the simulated profile and the experimentally measured profile can be up to 5-10 microns at specific radial positions. This discrepancy could be due to issues with: (i) the fabrication of the microlens; (ii) the experimental measurement of the lens profile with the stylus profilometer. To discriminate among these two possibilities, we asked what would be the expected optical properties of the corrected endoscope should the corrective lens have the experimentally measured (not the simulated) profile. To this aim, we performed new optical simulations of the point spread function (PSF) of the corrected probe using, as corrective microlens profile, the average, experimentally measured, profile of a fabricated corrective lens. For both microendoscope types, we first fitted the mean experimentally measured profile of the fabricated lens with the aspherical function reported in equation (1) of the main text:
  
  [Equation (1): aspherical lens profile function, see main text]

  The parameters of equation (1) are: the radial distance from the optical axis; the curvature, equal to 1/R, where R is the radius of curvature; the conic constant; the asphericity coefficients; and the height of the microlens profile on-axis.
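  For orientation, a profile described by these quantities is commonly written as a conic term plus polynomial terms and an on-axis offset. The sketch below evaluates that standard form. The polynomial orders of equation (1) are not given here, so the even-power series, the symbol names, the sign convention, and the numerical values are all assumptions chosen for illustration; they are not the fitted values of Author response table 1.

```python
import math
from typing import Sequence


def asphere_profile(r: float, c: float, k: float,
                    alphas: Sequence[float], z0: float = 0.0) -> float:
    """Height of a conic-plus-polynomial aspheric profile at radius r (mm).

    c      : curvature, equal to 1/R (R = radius of curvature)
    k      : conic constant
    alphas : asphericity coefficients; alphas[i] multiplies r**(2*(i+1))
             (even-power convention, an assumption)
    z0     : height of the profile on-axis
    """
    root = 1.0 - (1.0 + k) * c * c * r * r
    if root < 0.0:
        raise ValueError("r lies outside the domain of the conic term")
    conic = c * r * r / (1.0 + math.sqrt(root))
    poly = sum(a * r ** (2 * (i + 1)) for i, a in enumerate(alphas))
    return z0 + conic + poly


# Hypothetical parameters, for illustration only.
params = dict(c=-2.0, k=-1.0, alphas=[0.5, -3.0], z0=0.02)
for r_mm in (0.0, 0.05, 0.10, 0.20):
    print(f"r = {r_mm:.2f} mm -> z = {asphere_profile(r_mm, **params):.5f} mm")
```

  A fit such as the one described would adjust these parameters until the evaluated profile matches the mean stylus-profilometer trace.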
  
  The fitting values of the parameters of equation (1) for the two lenses are reported for the Referee’s inspection here below (variables describing distances are expressed in mm):
  
  Author response table 1.
  
  Fitting values for the parameters of Equation (1) describing the profile of corrective microlens replicas measured with the stylus profilometer. Distances are expressed in mm.
  
  We then assumed that the profiles of the corrective microlenses were equal to the mean experimentally measured profiles and used the aspherical fitting functions in the optical simulations to compute the performance of corrected microendoscopes. For both microendoscope types, we found that the Strehl ratio was lower than 0.35, well below the theoretical diffraction-limited threshold of 0.8 (Maréchal criterion) at moderate distances from the optical axis (68 μm-94 μm and 67 μm-92 μm on the focal plane in the object space, after the front end of the GRIN lens, for the 6.4 mm- and the 8.8 mm-long corrected microendoscope, respectively, Author response image 1A, C), and the PSF was strongly distorted (Author response image 1B, D).
  
  Author response image 1.
  
  Simulated optical performance of corrected probes with profiles of corrective microlenses equal to the mean experimentally measured profiles of fabricated corrective lenses. A) The Strehl ratio for the 6.4 mm-long corrected microendoscope with measured microlens profile (black dots) is computed on-axis (distance from the center of the FOV d = 0 µm) and at two radial distances off-axis (d = 68 μm and 94 μm on the focal plane in the object space) and compared to the Strehl ratio of the uncorrected (red line) and corrected (blue line) microendoscopes. B) Lateral (x,y) and axial (x,z) fluorescence intensity (F) profiles of simulated PSFs on-axis (left) and off-axis (right, at the indicated distance d computed on the focal plane in the object space) for the 6.4 mm-long corrected microendoscope with measured microlens profile. C) Same as in (A) for the 8.8 mm-long corrected microendoscope (off-axis d = 67 μm and 92 μm on the focal plane in the object space). D) Same as in (B) for the 8.8 mm-long corrected microendoscope.
  
  These simulated findings are in contrast with the experimentally measured optical properties of our corrected endoscopes (Figure 3). In other words, these novel simulated results show that experimentally measured profiles of the corrected lenses are incompatible with the experimental measurements of the optical properties of the corrected endoscopes. Therefore, our experimental recording of the lens profile shown in Supplementary Figure 1 of the first submission (now Supplementary Figure 4) should be used only as a coarse measure of the lens shape and cannot be used to precisely compare simulated lens profiles with measured lens profiles.
  
  How do individual radial profiles compare to the presented means?
  
  We provide below a modified version of Supplementary Figure 4 (Supplementary Figure 1 in the first submission), where individual profiles measured with the stylus profilometer and the mean profile are displayed for both microendoscope types (Author response image 2). In the manuscript (Supplementary Figure 4), we suggest continuing to show mean profiles ± standard errors of the mean, as we did in the original submission.
  
  Author response image 2.
  
  Characterization of polymeric corrective lens replicas. A) Stylus profilometer measurements were performed along the radius of the corrective polymer microlens replica for the 6.4 mm-long corrected microendoscope. Individual measured profiles (grey solid lines) obtained from n = 3 profile measurements on m = 3 different corrective lens replicas, plus the mean profile (black solid line) are displayed. B) Same as (A) for the 8.8 mm-long microendoscope.
  
  What is the practical effect of the strong field curvature? Are the edges of the field, which come very close to the lens surface, a practical limitation?
  
  A first practical effect of the field curvature is that structures at different z coordinates are sampled. The observed field curvature of corrected endoscopes may therefore impact imaging in brain regions characterized by strong axially organized anatomy (e.g., the pyramidal layer of the hippocampus), but would not significantly affect imaging in regions with homogeneous cell density within the axial extension of the field curvature (< 170 µm, see more details below). A second consequence of the field curvature, as the Referee correctly points out, is that cells at the border of the FOV are closer to the front end of the GRIN lens. In measurements of subresolved fluorescent layers (Figure 3A-D), we observed that the field curvature extends in the axial direction to ~ 110 μm and ~ 170 μm for the 6.4 mm- and the 8.8 mm-long microendoscopes, respectively. Considering that the nominal working distances on the object side of the 6.4 mm- and the 8.8 mm-long microendoscopes were, respectively, 210 μm and 178 μm (Table 3), structures positioned at the very edge of the FOV were ~ 100 μm and ~ 8 μm away from the GRIN front end for the 6.4 mm-long and for the 8.8 mm-long probe, respectively. Previous studies have shown that brain tissue within 50-100 μm from the GRIN front end may show signs of tissue reaction to the implant (Curreli et al. PLOS Biology 2022, Attardo et al. Nature 2015). Therefore, structures at the very edge of the FOV of the 8.8 mm-long endoscopes, but not those at the edge of the 6.4 mm-long endoscopes, may be within the volume showing tissue reaction. We added a paragraph in the text to discuss these points (page 18 lines 10-14).
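  The edge distances quoted above follow from subtracting the axial extent of the field curvature from the nominal working distance. A minimal sketch of that arithmetic is given below; the dictionary layout and function name are hypothetical, while the numbers are those stated in this response.

```python
# Values quoted in the response (all in micrometres).
PROBES = {
    "6.4 mm": {"working_distance": 210.0, "field_curvature": 110.0},
    "8.8 mm": {"working_distance": 178.0, "field_curvature": 170.0},
}
# Reported range of distances from the GRIN front end showing tissue reaction.
REACTION_ZONE_UM = (50.0, 100.0)


def edge_clearance(working_distance: float, field_curvature: float) -> float:
    """Distance between the GRIN front end and structures at the FOV edge."""
    return working_distance - field_curvature


for name, probe in PROBES.items():
    clearance = edge_clearance(**probe)
    closer_than_inner_bound = clearance < REACTION_ZONE_UM[0]
    print(f"{name}-long probe: edge of FOV is {clearance:.0f} um from the front end; "
          f"closer than {REACTION_ZONE_UM[0]:.0f} um: {closer_than_inner_bound}")
```

  With these values, only the edge of the 8.8 mm-long probe's FOV falls closer to the lens than the inner bound of the reported reaction zone.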
  
  The lenses appear to be corrected for monochromatic light; high-performance microscopes are generally achromatic. Is the bandwidth of two-photon excitation sufficient to warrant optimization over multiple wavelengths?
  
  Thanks for this comment. All optical simulations described in the first submission were performed at a fixed wavelength (λ = 920 nm). Following the Referee’s request, we explored the effect of changing wavelength on the Strehl ratio using new optical simulations. We found that the Strehl ratio remains > 0.8 at least within ± 10 nm from λ = 920 nm (new Supplementary Figure 1A-D, left panels), which covers the limited bandwidth of our femtosecond laser. Moreover, these simulations demonstrate that, on a much wider wavelength range (800 - 1040 nm), high Strehl ratio is obtained, but at different z planes (new Supplementary Figure 1A-D, right panels). This means that the corrective lens is working as expected also for wavelengths which are different from 920 nm, with different wavelengths having the most enlarged FOV located at different working distances. These new results are now described on page 7 (lines 8-10).
  
  GRIN lenses are often used to access a 3D volume by scanning in z (including in this study). How does the corrective lens affect imaging performance over the 3D field of view?
  
  The optical simulations we did to design the corrective lenses were performed maximizing aberration correction only in the focal plane of the endoscope. Following the Referee’s comment, we explored the effect of aberration correction outside the focal plane using new optical simulations. In corrected endoscopes, we found that for off-axis rays (radial distance from the optical axis > 40 μm) the Strehl ratio was > 0.8 (Maréchal criterion) in a larger volume compared to uncorrected endoscopes (new Supplementary Figure 2), demonstrating that the aberration correction method developed in this study does extend beyond the focal plane for short distances. For example, at a radial distance of ~ 90 μm from the optical axis, the axial range in which the Strehl ratio was > 0.8 in corrected endoscopes was 28 μm and 19 μm for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively. These new results are now described on page 7 (lines 10-19).
  
  (4) The in vivo images (Figure 7D) have a less impressive resolution and field than the ex vivo images (Figure 4B), and the reason for this is not clear. Given the difference in performance, how does this compare to an uncorrected endoscope in the same preparation? Is the reduced performance related to uncorrected motion, field curvature, working distance, etc?
  
  In comparing images in Figure 4B with images shown in Figure 7D, the following points should be considered:
  
  (1) Figure 4B is a maximum fluorescence intensity projection of multiple axial planes of a z-stack acquired through a thin brain slice (slice thickness: 50 µm) using 8 frame averages for each plane. In contrast, images in Figure 7D are median projections of a t-series acquired on a single plane in the awake mouse at 30 Hz resonant scanning imaging (8 min, 14,400 frames).
  
  (2) Images of the fixed brain slice in Figure 4B were acquired at 1024 pixels x 1024 pixels resolution, nominal pixel size 0.45 µm/pixel, and with objective NA = 0.50, whereas in vivo images in Figure 7D were acquired at 512 pixels x 512 pixels resolution, nominal pixel size 0.72 - 0.84 µm/pixel, and with objective NA = 0.45.
  
  (3) In the in vivo preparation (Figure 7D), excitation and emission light travel through > 180 µm of scattering and absorbing brain tissue, reducing spatial resolution and the SNR of the collected fluorescence signal.
  
  (4) By shifting the sample in the x, y plane, in Figure 4B we could choose a FOV containing homogeneously stained cells. x, y shifting and selecting across multiple FOVs was not possible in vivo, as the GRIN lens was cemented on the animal skull.
  
  (5) Images in Figure 7D were motion corrected, but we cannot exclude that part of the decrease in resolution observed in Figure 7D when compared to images in Figure 4B is due to incomplete correction of motion artifacts.
  
  For all the reasons listed above, we believe that lower resolution and contrast are to be expected in images recorded in vivo (Figure 7D) compared to images acquired in fixed tissue (Figure 4B).
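  The acquisition parameters listed in points (1) and (2) can be compared with a few lines of arithmetic. The sketch below is a hedged illustration: the helper names are hypothetical, and the scan width is the nominal one (pixels times nominal pixel size), which ignores the position-dependent magnification of the endoscope.

```python
def n_frames(duration_min: float, frame_rate_hz: float) -> int:
    """Number of frames in a t-series of the given duration and frame rate."""
    return round(duration_min * 60.0 * frame_rate_hz)


def nominal_scan_width_um(n_pixels: int, pixel_size_um: float) -> float:
    """Nominal width of the scanned area along one axis."""
    return n_pixels * pixel_size_um


# In vivo t-series (Figure 7D): 8 min at 30 Hz.
in_vivo_frames = n_frames(8, 30)

# Fixed slice (Figure 4B): 1024 pixels at 0.45 um/pixel.
ex_vivo_width = nominal_scan_width_um(1024, 0.45)

# In vivo (Figure 7D): 512 pixels at 0.72-0.84 um/pixel.
in_vivo_widths = [nominal_scan_width_um(512, size) for size in (0.72, 0.84)]

print(f"in vivo frames: {in_vivo_frames}")
print(f"fixed-slice nominal scan width: {ex_vivo_width:.1f} um")
print("in vivo nominal scan widths: "
      + ", ".join(f"{w:.1f} um" for w in in_vivo_widths))
```

  The frame count reproduces the 14,400 frames quoted in point (1), and the coarser in vivo pixel size is one of the stated reasons for the lower apparent resolution.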
  
  Regarding the question of how images from uncorrected and corrected endoscopes compare in vivo, we think that this comparison is better performed in fixed tissue (Figure 4) or in simulated calcium data (Figure 5-6), rather than in in vivo recordings (Figure 7). In fact, in the brain of living mice, motion artifacts, changes in fluorophore expression level, and variation in the optical properties of the brain (e.g., the presence of a blood vessel over the FOV) may make the comparison of images acquired with uncorrected and corrected microendoscopes difficult, requiring a large number of animals to cancel out the contributions of these factors. Comparing optical properties in fixed tissue is, in contrast, devoid of these confounding factors. Moreover, the major advantage of quantifying how the optical properties of uncorrected and corrected endoscopes impact the ability to extract information about neuronal activity in simulated calcium data is that, under simulated conditions, we can count on a known ground truth as reference (e.g., how many neurons are in the FOV, where they are, and what their electrical activity is). This is clearly not possible in the in vivo recordings.
  
  Regarding Figure 7, there is no analysis of the biological significance of the calcium signals or even a description of where olfactory stimuli were presented.
  
  We appreciate the Reviewer pointing out the lack of detailed analysis regarding the biological significance of the calcium signals and the presentation of olfactory stimuli in Figure 7. Our initial focus was on demonstrating the effectiveness of the optimized GRIN lenses for imaging deep brain areas like the piriform cortex, with an emphasis on the improved signal-to-noise ratio (SNR) these lenses provide. However, we agree that including more context about the experimental conditions would enhance the manuscript. To address this point, we added a new panel (Figure 7F) showing calcium transients aligned with the onset of olfactory stimulus presentations, which are now indicated by shaded light blue areas. Additionally, we have specified the timing of each stimulus presented in Figure 7E. This revision allows readers to better understand the relationship between the calcium signals and the olfactory stimuli.
  
  The timescale of jGCaMP8f signals in Figure 7E is uncharacteristically slow for this indicator (compared to Zhang et al 2023 (Nature)), though perhaps this is related to the physiology of these cells or the stimuli.
  
  Regarding the timescale of the calcium signals observed in Figure 7E, we apologize for the confusion caused by a mislabeling we inserted in the original manuscript. The experiments presented in Figure 7 were conducted using jGCaMP7f, not jGCaMP8f as previously stated (both indicators were used in this study but in separate experiments). We have corrected this error in the Results section (caption of Figure 7D, E). It is important to note that jGCaMP7f has a longer half-decay time compared to jGCaMP8f, which could in part account for the slower decay kinetics observed in our data. Furthermore, the prolonged calcium signals can be attributed to the physiological properties of neurons in the piriform cortex. Upon olfactory stimulation, these neurons often fire multiple action potentials, resulting in extended calcium transients that can last several seconds. This sustained activity has been documented in previous studies, such as Roland et al. (eLife 2017, Figure 1C therein) in anesthetized animals and Wang et al. (Neuron 2020, Figure 1E therein) in awake animals, which report similar durations for calcium signals.
  
  (5) The claim of unprecedented spatial resolution across the FOV (page 18) is hard to evaluate and is not supported by references to quantitative comparisons. The promises of the method for future studies (pages 18-19) could also be better supported by analysis or experiment, but these are minor and to me, do not detract from the appeal of the work.
  
  GRIN lens-based imaging of piriform cortex in the awake mouse had already been done in Wang et al., Neuron 2020. The GRIN lens used in that work was NEM-050-50-00920-S-1.5p (GRINTECH, length: 6.4 mm; diameter: 0.5 mm), similar to the one that we used to design the 6.4 mm-long corrected microendoscope. Here we used a microendoscope specifically designed to correct off-axis aberrations and enlarge the FOV, in order to maximize the number of neurons recorded with the highest possible spatial resolution, while keeping the tissue invasiveness to the minimum. Following the Referee’s comments, we revised the sentence at page 19 (lines 6-8 from bottom) as follows:
  
  “We used long corrected microendoscopes to measure population dynamics in the olfactory cortex of awake head-restrained mice with unprecedented combination of high spatial resolution across the FOV and minimal invasiveness(17)”.
  
  (6) The text is lengthy and the material is repeated, especially between the introduction and conclusion. Consolidating introductory material to the introduction would avoid diluting interesting points in the discussion.
  
  We thank the Reviewer for this comment. As suggested, we edited the Introduction and shortened the Discussion.
  
  Reviewer #2 (Public review):
  
  In this manuscript, the authors present an approach to correct GRIN lens aberrations, which primarily cause a decrease in signal-to-noise ratio (SNR), particularly in the lateral regions of the field-of-view (FOV), thereby limiting the usable FOV. The authors propose to mitigate these aberrations by designing and fabricating aspherical corrective lenses using ray trace simulations and two-photon lithography, respectively; the corrective lenses are then mounted on the back aperture of the GRIN lens.
  
  This approach was previously demonstrated by the same lab for GRIN lenses shorter than 4.1 mm (Antonini et al., eLife, 2020). In the current work, the authors extend their method to a new class of GRIN lenses with lengths exceeding 6 mm, enabling access to deeper brain regions as most ventral regions of the mouse brain. Specifically, they designed and characterized corrective lenses for GRIN lenses measuring 6.4 mm and 8.8 mm in length. Finally, they applied these corrected long micro-endoscopes to perform high-precision calcium signal recordings in the olfactory cortex.
  
  Compared with alternative approaches using adaptive optics, the main strength of this method is that it does not require hardware or software modifications, nor does it limit the system's temporal resolution. The manuscript is well-written, the data are clearly presented, and the experiments convincingly demonstrate the advantages of the corrective lenses.
  
  The implementation of these long corrected micro-endoscopes, demonstrated here for deep imaging in the mouse olfactory bulb, will also enable deep imaging in larger mammals such as rats or marmosets.
  
  We thank the Referee for the positive comments on our study. We address the points indicated by the Referee in the “Recommendation to the authors” section below.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This work presents the development, characterization, and use of new thin microendoscopes (500µm diameter) whose accessible field of view has been extended by the addition of a corrective optical element glued to the entrance face. Two micro endoscopes of different lengths (6.4mm and 8.8mm) have been developed, allowing imaging of neuronal activity in brain regions >4mm deep. An alternative solution to increase the field of view could be to add an adaptive optics loop to the microscope to correct the aberrations of the GRIN lens. The solution presented in this paper does not require any modification of the optical microscope and can therefore be easily accessible to any neuroscience laboratory performing optical imaging of neuronal activity.
  
  Strengths:
  
  (1) The paper is generally clear and well-written. The scientific approach is well structured and numerous experiments and simulations are presented to evaluate the performance of corrected microendoscopes. In particular, we can highlight several consistent and convincing pieces of evidence for the improved performance of corrected micro endoscopes:
  
  a) PSFs measured with corrected micro endoscopes 75µm from the centre of the FOV show a significant reduction in optical aberrations compared to PSFs measured with uncorrected micro endoscopes.
  
  b) Morphological imaging of fixed brain slices shows that optical resolution is maintained over a larger field of view with corrected micro endoscopes compared to uncorrected ones, allowing neuronal processes to be revealed even close to the edge of the FOV.
  
  c) Using synthetic calcium data, the authors showed that the signals obtained with the corrected microendoscopes have a significantly stronger correlation with the ground truth signals than those obtained with uncorrected microendoscopes.
  
  (2) There is a strong need for high-quality micro endoscopes to image deep brain regions in vivo. The solution proposed by the authors is simple, efficient, and potentially easy to disseminate within the neuroscience community.
  
  Weaknesses:
  
  (1) Many points need to be clarified/discussed. Here are a few examples:
  
  a) It is written in the methods: “The uncorrected microendoscopes were assembled either using different optical elements compared to the corrected ones or were obtained from the corrected probes after the mechanical removal of the corrective lens.”
  
  This is not very clear: the uncorrected microendoscopes are not simply the unmodified GRIN lenses?
  
  We apologize for not being clear enough on this point. Uncorrected microendoscopes are not simply unmodified GRIN lenses; rather, they are GRIN lenses attached to a round glass coverslip (thickness: 100 μm). The glass coverslip was included in ray-trace optical simulations of the uncorrected system, and this is the reason why commercial GRIN lenses and corresponding uncorrected microendoscopes have different working distances, as reported in Tables 2-3. To make the text clearer, we added the following sentence at page 27 (last 4 lines):
  
  “To evaluate the impact of corrective microlenses on the optical performance of GRIN-based microendoscopes, we also simulated uncorrected microendoscopes composed of the same optical elements of corrected probes (glass coverslip and GRIN rod), but in the absence of the corrective microlens”.
  
  b) In the results of the simulation of neuronal activity (Figure 5A, for example), the neurons in the center of the FOV have a very large diameter (of about 30µm). This should be discussed.
  
  Thanks for this comment. In synthetic calcium imaging t-series, cell radii were randomly sampled from a Gaussian distribution with mean = 10 µm and standard deviation (SD) = 3 µm. Both values were estimated from the literature (ref. no. 28: Suzuki & Bekkers, Journal of Neuroscience, 2011) as described in the Methods (page 35). In the image shown in Figure 5A, neurons near the center of the FOV have a radius of ~ 20 µm, corresponding to the right tail of the distribution (mean + 3SD = 19 µm). It is also important to note that, for corrected microendoscopes, neurons in the central portion of the FOV appear larger than cells located near the edges of the FOV, because the magnification depends on the distance from the optical axis (see Figure 3E, F) and near the center the magnification is > 1 for both microendoscope types.
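  To give a sense of how rare such large cells are under the stated distribution, the sketch below samples radii from a Gaussian with mean 10 µm and SD 3 µm and computes the tail probability beyond mean + 3SD. It is a minimal illustration, not the simulation code of the study: the sample size and the random seed are arbitrary assumptions, and any handling of non-physical (negative) radii is left out.

```python
import random
from statistics import NormalDist, fmean

MEAN_UM = 10.0  # mean cell radius stated in the response
SD_UM = 3.0     # standard deviation stated in the response
N_CELLS = 10_000  # arbitrary sample size (assumption)

radius_dist = NormalDist(MEAN_UM, SD_UM)
threshold_um = MEAN_UM + 3.0 * SD_UM  # 19 um

# Probability that a sampled radius exceeds mean + 3 SD.
tail_probability = 1.0 - radius_dist.cdf(threshold_um)

rng = random.Random(0)  # fixed seed for reproducibility (assumption)
radii = [rng.gauss(MEAN_UM, SD_UM) for _ in range(N_CELLS)]
n_large = sum(r >= threshold_um for r in radii)

print(f"P(radius > {threshold_um:.0f} um) = {tail_probability:.5f}")
print(f"sample mean = {fmean(radii):.2f} um; "
      f"{n_large} of {N_CELLS} sampled cells exceed {threshold_um:.0f} um")
```

  Cells this large are therefore expected only occasionally, consistent with the explanation that the neurons in Figure 5A come from the right tail of the distribution and are further enlarged by magnification near the optical axis.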
  
  Also, why is the optical resolution so low on these images?
  
  Images shown in Figure 5 are median fluorescence intensity projections of 5-minute-long simulated t-series. Simulated calcium data were generated with pixel size 0.8 μm/pixel and frame rate 30 Hz, similarly to in vivo recordings. In the simulations, pixels not belonging to any cell soma were assigned a value of background fluorescence randomly sampled from a normal distribution with mean and standard deviation estimated from experimental data, as described in the Methods section (page 37). To simulate activity, the mean spiking rate of neurons was set to 0.3 Hz, thus in a large fraction of frames neurons do not show calcium transients. Therefore, the median fluorescence intensity value of somata will be close to their baseline fluorescence value (F0). Since in simulations F0 values (~ 45-80 a.u.) were not much higher than the background fluorescence level (~ 45 a.u.), this may generate the appearance of a low-contrast image in Figure 5A. Finally, we suspect that PDF rendering also contributed to degrading the quality of those images. We will now submit high resolution images alongside the PDF file.
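The argument that a median projection of a sparsely active cell sits near its baseline can be illustrated with a toy trace. The sketch below uses the frame rate (30 Hz), duration (5 minutes) and mean spiking rate (0.3 Hz) quoted above; the transient amplitude, the exponential decay constant and the baseline value of 60 a.u. are assumptions chosen only for illustration and do not reproduce the authors' simulation pipeline.

```python
import numpy as np


def simulate_trace(f0=60.0, rate_hz=0.3, frame_rate_hz=30.0, duration_s=300.0,
                   amplitude=60.0, decay_s=0.5, seed=0):
    """Toy somatic fluorescence trace: baseline f0 plus exponentially decaying transients.

    amplitude and decay_s are illustrative assumptions, not values from the manuscript.
    """
    rng = np.random.default_rng(seed)
    n_frames = int(duration_s * frame_rate_hz)
    # Poisson spike counts per frame with the stated mean rate.
    spikes = rng.poisson(rate_hz / frame_rate_hz, size=n_frames)
    # Exponential kernel truncated after five decay constants.
    t = np.arange(0.0, 5 * decay_s, 1.0 / frame_rate_hz)
    kernel = np.exp(-t / decay_s)
    transient = np.convolve(spikes, kernel)[:n_frames] * amplitude
    return f0 + transient


F0 = 60.0
trace = simulate_trace(f0=F0)

print(f"baseline F0:  {F0:.1f} a.u.")
print(f"median of F:  {np.median(trace):.1f} a.u.")
print(f"maximum of F: {trace.max():.1f} a.u.")
```

With these assumed kinetics the median stays within a few a.u. of F0 even though individual transients are large, which is consistent with the low contrast described for the median projections.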
  
  c) It seems that we can't see the same neurons on the left and right panels of Figure 5D. This should be discussed.
  
  The Referee is correct. When we intersected the simulated 3D volume of ground truth neurons with the focal surface of microendoscopes, the center of the FOV for the 8.8 mm-long corrected microendoscope was located at a larger depth than the FOV of the 8.8 mm-long uncorrected microendoscope. This effect was due to the larger field curvature of corrected 8.8 mm-long endoscopes compared to 8.8 mm-long uncorrected endoscopes. This is the reason why different neurons were displayed for uncorrected and corrected endoscopes in Figure 5D. We added this explanation in the text at page 37 (lines 1-4). The text reads:
  
  “Due to the stronger field curvature of the 8.8 mm-long corrected microendoscope (Figure 1C) compared to 8.8 mm-long uncorrected microendoscopes, the center of the corrected imaging focal surface resulted at a larger depth in the simulated volume compared to the center of the uncorrected focal surface(s). Therefore, different simulated neurons were sampled in the two cases”.
  
  d) It is not very clear to me why in Figure 6A, F the fraction of adjacent cell pairs that are more correlated than expected increases as a function of the threshold on peak SNR. The authors showed in Supplementary Figure 3B that the mean purity index increases as a function of the threshold on peak SNR for all micro endoscopes. Therefore, I would have expected the correlation between adjacent cells to decrease as a function of the threshold on peak SNR. Similarly, the mean purity index for the corrected short microendoscope is close to 1 for high thresholds on peak SNR: therefore, I would have expected the fraction of adjacent cell pairs that are more correlated than expected to be close to 0 under these conditions. It would be interesting to clarify these points.
  
  Thanks for raising this point. We defined the fraction of adjacent cell pairs more correlated than expected as the number of adjacent cell pairs more correlated than expected divided by the number of adjacent cell pairs. The reason why this fraction rises as a function of the SNR threshold is shown in Supplementary Figure 2 in the first submission (now Supplementary Figure 5). There, we separately plotted the number of adjacent cell pairs more correlated than expected (numerator) and the number of adjacent cell pairs (denominator) as a function of the SNR threshold. For both microendoscope types, we observed that the denominator decreased more rapidly with the peak SNR threshold than the numerator. Therefore, the fraction of adjacent cell pairs more correlated than expected increases with the peak SNR threshold.
  
  To understand why the denominator decreases with SNR threshold, it should be considered that, due to the deterioration of spatial resolution and attenuation of fluorescent signal collection as a function of the radial distance from the optical axis (see for example fluorescent film profiles in Figure 3A, C), increasing the threshold on the peak SNR of extracted calcium traces implies limiting cell detection to those cells located within smaller distance from the center of the FOV. This information is shown in Figure 5C, F.
  
  In the manuscript text, this point is discussed at page 12 (lines 1-3 from bottom) and page 13 (lines 1-4):
  
  “The fraction of pairs of adjacent cells (out of the total number of adjacent pairs) whose activity correlated significantly more than expected increased as a function of the SNR threshold for corrected and uncorrected microendoscopes of both lengths (Fig. 6A, F). This effect was due to a larger decrease of the total number of pairs of adjacent cells as a function of the SNR threshold compared to the decrease in the number of pairs of adjacent cells whose activity was more correlated than expected (Supplementary Figure 5)”.
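The numerator/denominator argument can be made concrete with a toy calculation. The counts below are hypothetical and chosen only to show the mechanism; they are not taken from Supplementary Figure 5.

```python
def correlated_fraction(n_correlated_pairs, n_adjacent_pairs):
    """Fraction of adjacent pairs that are more correlated than expected."""
    return n_correlated_pairs / n_adjacent_pairs


# Hypothetical counts at increasing peak-SNR thresholds (illustrative only).
snr_thresholds = [5, 10, 15, 20]
n_adjacent = [400, 200, 80, 30]    # denominator: drops quickly with the threshold
n_correlated = [40, 30, 20, 12]    # numerator: drops more slowly

fractions = [correlated_fraction(c, a) for c, a in zip(n_correlated, n_adjacent)]

for thr, c, a, f in zip(snr_thresholds, n_correlated, n_adjacent, fractions):
    print(f"SNR > {thr:2d}: {c:3d} / {a:3d} = {f:.2f}")
```

Both counts fall as the threshold increases, yet the ratio rises because the denominator shrinks faster than the numerator.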
  
  e) Figures 6C, H: I think it would be fairer to compare the uncorrected and corrected endomicroscopes using the same effective FOV.
  
  To address the Reviewer’s concern, we repeated the linear regression of purity index as a function of the radial distance using the same range of radial distances for the uncorrected and corrected case of both microendoscope types. Below, we provide an updated version of Figure 6C, H for the referee’s perusal. Please note that the maximum value displayed on the x-axis of both graphs now corresponds to the smaller of the two maximum radial distance values obtained in the uncorrected and corrected case (maximum radial distance displayed: 151.6 µm and 142.1 μm for the 6.4 mm- and the 8.8 mm-long GRIN rod, respectively). Using the same effective FOV, we found that the purity index drops significantly more rapidly with the radial distance for uncorrected microendoscopes compared to the corrected ones, similarly to what was observed in the original version of Figure 6. The values of the linear regression parameters and statistical significance of the difference between the slopes in the uncorrected and corrected cases are stated in the Author response image 3 caption below for both microendoscope types. In the manuscript, we suggest continuing to show data corresponding to all detected cells, as we did in the original submission.
  
  Author response image 3.
  
  Linear regression of purity index as a function of the radial distance. A) Purity index of extracted traces with peak SNR > 10 was estimated using a GLM of ground truth source contributions and plotted as a function of the radial distance of cell identities from the center of the FOV for n = 13 simulated experiments with the 6.4 mm-long uncorrected (red) and corrected (blue) microendoscope. Black lines represent the linear regression of data ± 95% confidence intervals (shaded colored areas). Maximum value of radial distance displayed: 151.6 μm. Slopes ± standard error (s.e.): uncorrected, (-0.0015 ± 0.0002) µm⁻¹; corrected, (-0.0006 ± 0.0001) µm⁻¹. Uncorrected, n = 991; corrected, n = 1156. Statistical comparison of slopes, p < 10⁻¹⁰, permutation test. B) Same as (A) for n = 15 simulated experiments with the 8.8 mm-long uncorrected and corrected microendoscope. Maximum value of radial distance displayed: 142.1 μm. Slopes ± s.e.: uncorrected, (-0.0014 ± 0.0003) µm⁻¹; corrected, (-0.0010 ± 0.0002) µm⁻¹. Uncorrected, n = 718; corrected, n = 1328. Statistical comparison of slopes, p = 0.0082, permutation test.
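The caption reports a permutation test on the regression slopes without describing the permutation scheme. One common scheme, shown here only as a sketch, shuffles the group labels (uncorrected vs. corrected) across the pooled points and recomputes the difference in least-squares slopes. The function names and the synthetic data (slopes of -0.0015 and -0.0006 µm⁻¹ with an assumed noise level) are illustrative and are not the authors' code or data.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope of y against x."""
    return np.polyfit(x, y, 1)[0]


def permutation_test_slopes(x_a, y_a, x_b, y_b, n_perm=2000, seed=0):
    """Two-sided label-shuffling permutation test on the difference in slopes.

    Assumes the two groups are exchangeable under the null hypothesis.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([x_a, x_b])
    y = np.concatenate([y_a, y_b])
    n_a = len(x_a)
    observed = abs(slope(x_a, y_a) - slope(x_b, y_b))
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(x))
        a, b = idx[:n_a], idx[n_a:]
        diff = abs(slope(x[a], y[a]) - slope(x[b], y[b]))
        count += int(diff >= observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Synthetic purity-vs-distance data (assumed noise level, illustrative only).
rng = np.random.default_rng(1)
x_unc = rng.uniform(0, 150, 300)
x_cor = rng.uniform(0, 150, 300)
y_unc = 1.0 - 0.0015 * x_unc + rng.normal(0, 0.05, 300)
y_cor = 1.0 - 0.0006 * x_cor + rng.normal(0, 0.05, 300)

observed_diff, p_value = permutation_test_slopes(x_unc, y_unc, x_cor, y_cor, n_perm=500)
print(f"|slope difference| = {observed_diff:.5f} per µm, p = {p_value:.4f}")
```

Note that the smallest p-value this estimator can return is 1 / (n_perm + 1), so reporting p < 10⁻¹⁰ would require either a very large number of permutations or a different estimation approach.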
  
  f) Figure 7E: Many calcium transients have a strange shape, with a very fast decay following a plateau or a slower decay. Is this the result of motion artefacts or analysis artefacts?
  
  Thank you for raising this point about the unusual shapes of the calcium transients in Figure 7E. The observed rapid decay following a plateau or a slower decay is indeed a result of how the data were presented in the original submission. Our experimental protocol consisted of 22 s-long trials with an inter-trial interval of 10 s (see Methods section, page 44). In the original figure, data from multiple trials were concatenated, which led to artefactual time courses and apparent discontinuities in the calcium signals. To resolve this issue, we revised Figure 7E to accurately represent individual concatenated trials. We also added a new panel (please see new Figure 7F) showing examples of single cell calcium responses in individual trials without concatenation, with annotations indicating the timing and identity of presented olfactory stimuli.
  
  Also, the duration of many calcium transients seems to be long (several seconds) for GCaMP8f. These points should be discussed.
  
  Regarding the timescale of the calcium signals observed in Figure 7E, we apologize for the confusion caused by a labeling error in the manuscript. The experiments presented in Figure 7 were conducted using jGCaMP7f, not jGCaMP8f as previously stated (both indicators were used in this study, but in separate experiments). We have corrected this error in the Results section (caption of Figure 7D, E). It is important to note that jGCaMP7f has a longer half-decay time compared to jGCaMP8f, which could in part account for the slower decay kinetics observed in our data. Furthermore, the prolonged calcium signals can be attributed to the physiological properties of neurons in the piriform cortex. Upon olfactory stimulation, these neurons often fire multiple action potentials, resulting in extended calcium transients that can last several seconds. This sustained activity has been documented in previous studies, such as Roland et al. (eLife 2017, Figure 1C therein) in anesthetized animals and Wang et al. (Neuron 2020, Figure 1E therein) in awake animals, which report similar durations for calcium signals. We cite these references in the text. We believe that these revisions and clarifications address the Reviewer's concern and enhance the overall clarity of our manuscript.
  
  g) The authors do not mention the influence of the neuropil on their data. Did they subtract the neuropil's contribution to the signals from the somata? It is known from the literature that the presence of the neuropil creates artificial correlations between neurons, which decrease with the distance between the neurons (Grødem, S., Nymoen, I., Vatne, G.H. et al. An updated suite of viral vectors for in vivo calcium imaging using intracerebral and retro-orbital injections in male mice. Nat Commun 14, 608 (2023). https://doi.org/10.1038/s41467-023-36324-3; Keemink SW, Lowe SC, Pakan JMP, Dylda E, van Rossum MCW, Rochefort NL. FISSA: A neuropil decontamination toolbox for calcium imaging signals. Sci Rep. 2018 Feb 22;8(1):3493. doi: 10.1038/s41598-018-21640-2. PMID: 29472547; PMCID: PMC5823956).
  
  This point should be addressed.
  
  We apologize for not being clear enough in the previous version of the manuscript. The neuropil was subtracted from calcium traces both in simulated and experimental data. Please note that instead of using the term “neuropil”, we used the word “background”. We decided to use the more general term “background” because it also applies to the case of synthetic calcium t-series, where neurons were modeled as spheres devoid of processes. The background subtraction is described in the Methods on page 39:
  
  “F(t) was computed frame-by-frame as the difference between the average signal of pixels in each ROI and the background signal. The background was calculated as the average signal of pixels that: i) did not belong to any bounding box; ii) had intensity values higher than the mean noise value measured in pixels located at the corners of the rectangular image, which do not belong to the circular FOV of the microendoscope; iii) had intensity values lower than the maximum value of pixels within the boxes”.
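The three background criteria quoted above can be sketched with boolean masks. This is a minimal illustration that assumes the thresholds are evaluated independently on every frame and that the masks (ROI, bounding boxes, image corners) are supplied by the user; the function and variable names are hypothetical and this is not the authors' analysis code.

```python
import numpy as np


def background_mask(frame, box_mask, corner_mask):
    """Select background pixels using the three criteria quoted from the Methods."""
    noise_mean = frame[corner_mask].mean()  # mean noise from pixels at the image corners
    max_in_boxes = frame[box_mask].max()    # maximum value of pixels within the boxes
    return (~box_mask) & (frame > noise_mean) & (frame < max_in_boxes)


def roi_trace(movie, roi_mask, box_mask, corner_mask):
    """F(t): mean ROI signal minus mean background signal, computed frame by frame."""
    trace = np.empty(movie.shape[0])
    for t, frame in enumerate(movie):
        bg = frame[background_mask(frame, box_mask, corner_mask)].mean()
        trace[t] = frame[roi_mask].mean() - bg
    return trace


# Toy 2-frame, 6x6 movie: corners hold noise, the FOV background is 5 a.u.
movie = np.full((2, 6, 6), 5.0)
corner_mask = np.zeros((6, 6), dtype=bool)
corner_mask[[0, 0, 5, 5], [0, 5, 0, 5]] = True
movie[:, corner_mask] = 1.0

box_mask = np.zeros((6, 6), dtype=bool)
box_mask[2:4, 2:4] = True                   # bounding box around one cell
roi_mask = np.zeros((6, 6), dtype=bool)
roi_mask[2, 2:4] = True                     # ROI pixels inside the box
movie[0, roi_mask] = 20.0
movie[1, roi_mask] = 30.0

F = roi_trace(movie, roi_mask, box_mask, corner_mask)
print(F)
```

In this toy movie the background evaluates to 5 a.u. in both frames, so the ROI trace is the raw ROI mean shifted down by that amount.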
  
  h) Also, what are the expected correlations between neurons in the pyriform cortex? Are there measurements in the literature with which the authors could compare their data?
  
  We appreciate the reviewer's interest in the correlations between neurons in the piriform cortex. The overall low correlations between piriform neurons we observed (Figure 8) are consistent with a published study describing ‘near-zero noise correlations during odor inhalation’ in the anterior piriform cortex of rats, based on extracellular recordings (Miura et al., Neuron 2013). However, to the best of our knowledge, measurements directly comparable to ours have not been described in the literature. Recent analyses of the correlations between piriform neurons were restricted to odor exposure windows, with the goal to quantify odor-specific activation patterns (e.g. Roland et al., eLife 2017; Bolding et al., eLife 2017, Pashkovski et al., Nature 2020; Wang et al., Neuron 2020). Here, we used correlation analyses to characterize the technical advancement of the optimized GRIN lens-based endoscopes. We showed that correlations of pairs of adjacent neurons were independent from radial distance (Figure 8B), highlighting homogeneous spatial resolution in the field of view.
  
  (2) The way the data is presented doesn't always make it easy to compare the performance of corrected and uncorrected lenses. Here are two examples:
  
  a) In Figures 4 to 6, it would be easier to compare the FOVs of corrected and uncorrected lenses if the scale bars (at the centre of the FOV) were identical. In this way, the neurons at the centre of the FOV would appear the same size in the two images, and the distances between the neurons at the centre of the FOV would appear similar. Here, the scale bar is significantly larger for the corrected lenses, which may give the illusion of a larger effective FOV.
  
  We appreciate the Referee’s comment. Below, we explain why we believe that the way we currently present imaging data in the manuscript is preferable:
  
  (1) current figures show images of the acquired FOV as they are recorded from the microscope (raw data), without rescaling. In this way, we exactly show what potential users will obtain when using a corrected microendoscope.
  
  (2) In the current version of the figures, the fact that the pixel size is not homogeneous across the FOV, nor equal between uncorrected and corrected microendoscopes, is initially shown in Figure 3E, F and then explicitly stated throughout the manuscript when images acquired with a corrected microendoscope are shown.
  
  (3) Rescaling images acquired with the corrected endoscopes gives the impression that the acquisition parameters were different between acquisitions with the corrected and uncorrected microendoscopes, which was not the case.
  
  Importantly, the larger FOV of the corrected microendoscope, which is one of the important technological achievements presented in this study, can be appreciated in the images regardless of the presentation format.
  
  b) In Figures 3A-D it would be more informative to plot the distances in microns rather than pixels. This would also allow a better comparison of the micro endoscopes (as the pixel sizes seem to be different for the corrected and uncorrected micro endoscopes).
  
  The Referee is correct that the pixel size is different between the corrected and uncorrected probes. This is because of the different magnification factor introduced by the corrective microlens, as described in Figure 3E, F. The rationale for showing images in Figure 3A-D in pixels rather than microns is the following:
  
  (1) Optical simulations in Figure 1 suggest that a corrective optical element is effective in compensating for some of the optical aberrations in GRIN microendoscopes.
  
  (2) After fabricating the corrective optical element (Figure 2), in Figure 3A-D we conduct a preliminary analysis of the effect of the corrective optical element on the optical properties of the GRIN lens. We observed that the microfabricated optical element corrected for some aberrations (e.g., astigmatism), but also that the microfabricated optical element was characterized by significant field curvature. This can be appreciated showing distances in pixels.
  
  (3) The observed field curvature and the aspherical profile of the corrected lens prompted us to characterize the magnification factor of the corrected endoscopes as a function of the radial distance. We found that the magnification factor changed as a function of the radial distance (Figure 3E-F) and that pixel size was different between uncorrected and corrected endoscopes. We also observed that, in corrected endoscopes, pixel size was a function of the radial distance (Figure 3E-F).
  
  (4) Once all of the above was established and quantified, we assigned precise pixel size to images of uncorrected and corrected endoscopes and we show all following images of the study (Figure 3G on) using a micron (rather than pixel) scale.
  
  (3) There seems to be a discrepancy between the performance of the long lenses (8.8 mm) in the different experiments, which should be discussed in the article. For example, the results in Figure 4 show a considerable enlargement of the FOV, whereas the results in Figure 6 show a very moderate enlargement of the distance at which the Pearson's correlation with the first ground truth emitter starts to drop.
  
  Thanks for raising this point and helping us clarify the data presentation. Images in Figure 4B are average z-projections of z-stacks acquired through a mouse fixed brain slice and they were taken with the purpose of showing all the neurons that could be visualized from the same sample using an uncorrected and a corrected microendoscope. In Figure 4B, all illuminated neurons are visible regardless of whether they were imaged with high axial resolution (e.g., < 10 µm as defined in Figure 3J) or poor axial resolution. In contrast, in Figure 6J we evaluated the correlation between the calcium trace extracted from a given ROI and the real activity trace of the first simulated ground truth emitter for that specific ROI. The moderate increase in the correlation for the corrected microendoscope compared to the uncorrected microendoscope (Figure 6J) is consistent with the moderate improvement in the axial resolution of the corrected probe compared to the uncorrected probe at intermediate radial distances (60-100 µm from the optical axis, see Figure 3J). We added a paragraph in the Results section (page 14, lines 8-18) to summarize the points described above.
  
  a) There is also a significant discrepancy between measured and simulated optical performance, which is not discussed. Optical simulations (Figure 1) show that the useful FOV (defined as the radius for which the size of the PSF along the optical axis remains below 10µm) should be at least 90µm for the corrected microendoscopes of both lengths. However, for the long microendoscopes, Figure 3J shows that the axial resolution at 90µm is 17µm. It would be interesting to discuss the origin of this discrepancy: does it depend on the microendoscope used?
  
  As the Reviewer correctly pointed out, the size of simulated PSFs at a given radial distance (e.g., 90 µm) tends to be generally smaller than that of the experimentally measured PSFs. This might be due to multiple reasons:
  
  (1) simulated PSFs are excitation PSFs, i.e. they describe the intensity spatial distribution of focused excitation light. On the contrary, measured PSFs result from the excitation and emission process, thus they are also affected by aberrations of light emitted by fluorescent beads and collected by the microscope.
  
  (2) in the optical simulations, the Zemax file of the GRIN lenses contained first-order aberrations. High-order aberrations were therefore not included in simulated PSFs.
  
  (3) the intrinsic variability of experimental measurements (e.g., variability of the fabrication process, alignment of the microendoscope to the optical axis of the microscope, the distance between the GRIN back end and the objective…) is not considered in the simulations.
  
  We added a paragraph in the Discussion section (page 17, lines 9-18) summarizing the abovementioned points.
  
  Are there inaccuracies in the construction of the aspheric corrective lens or in the assembly with the GRIN lens? If there is variability between different lenses, how are the lenses selected for imaging experiments?
  
  The fabrication yield, i.e. the yield of generating the corrective lenses, using molding was ~ 90% (N > 30 molded lenses). The main limitation of this procedure was the formation of air bubbles between the mold negative and the glass coverslip. Molded lenses were visually inspected with the stereoscope and, in case of air bubble formation, they were discarded.
  
  The assembly yield, i.e. the yield of correct positioning of the GRIN lens with respect to the coverslip, was 100 % (N = 27 endoscopes).
  
  We added this information in the Methods at page 29 (lines 1-12), as follows:
  
  “After UV curing, the microlens was visually inspected at the stereomicroscope. In case of formation of air bubbles, the microlens was discarded (yield of the molding procedure: ~ 90 %, N > 30 molded lenses). The coverslip with the attached corrective lens was sealed to a customized metal or plastic support ring of appropriate diameter (Fig. 2C). The support ring, the coverslip and the aspherical lens formed the upper part of the corrected microendoscope, to be subsequently coupled to the proper GRIN rod (Table 2) using a custom-built opto-mechanical stage and NOA63 (Fig. 2C) 7. The GRIN rod was positioned perpendicularly to the glass coverslip, on the other side of the coverslip compared to the corrective lens, and aligned to the aspherical lens perimeter (Fig. 2C) under the guidance of a wide field microscope equipped with a camera. The yield of the assembly procedure for the probes used in this work was 100 % (N = 27 endoscopes). For further details on the assembly of corrected microendoscope see(7)”.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Page 4, what is meant by "ad-hoc" in describing software control?
  
  With “ad-hoc” we meant “specifically designed”. We revised the text to make this clear.
  
  (2) It was hard to tell how the PSF was modeled for the simulations (especially on page 34, describing the two spherical shells of the astigmatic PSF and ellipsoids modeled along them). Images or especially videos that show the modeling would make this easier to follow.
  
  Simulated calcium t-series were generated following previous work by our group (Antonini et al., eLife 2020), as stated in the Methods on page 37 (line 5). In Figure 4A of Antonini et al. eLife 2020, we provided a schematic to visually describe the procedure of simulated data generation. In the present paper, we decided not to include a similar drawing and cite the eLife 2020 article to avoid redundancy.
  
  (3) Some math symbols are missing from the methods in my version of the text (page 36/37).
  
  We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it at the time of submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.
  
  (4) The Z extent of stacks (i.e. number of steps) used to generate images in Figure 4 is missing.
  
  We thank the Reviewer for the comment. We have now revised the caption of Figure 4 and the Methods section as follows:
  
  “Figure 4. Aberration correction in long GRIN lens-based microendoscopes enables high-resolution imaging of biological structures over enlarged FOVs. A) jGCaMP7f-stained neurons in a fixed mouse brain slice were imaged using 2PLSM (λexc = 920 nm) through an uncorrected (left) and a corrected (right) microendoscope based on the 6.4 mm-long GRIN rod. Images are maximum fluorescence intensity (F) projections of a z-stack acquired with a 5 μm step size. Number of steps: 32 and 29 for uncorrected and corrected microendoscope, respectively. Scale bars: 50 μm. Left: the scale applies to the entire FOV. Right, the scale bar refers only to the center of the FOV; off-axis scale bar at any radial distance (x and y axes) is locally determined multiplying the length of the drawn scale bar on-axis by the corresponding normalized magnification factor shown in the horizontal color-coded bar placed below the image (see also Fig. 3, Supplementary Table 3, and Materials and Methods for more details). B) Same results for the microendoscope based on the 8.8 mm-long GRIN rod. Number of steps: 23 and 31 for uncorrected and corrected microendoscope, respectively”.
  
  We also modified the text in the Methods (page 35, lines 1-2):
  
  “(1024 pixels x 1024 pixels resolution; nominal pixel size: 0.45 µm/pixel; axial step: 5 µm; number of axial steps: 23-32; frame averaging = 8)”.
  
  (5) Overall, the text is wordy and a bit repetitive and could be cut down significantly in length without loss of clarity. This is true throughout, but especially when comparing the introduction and discussion.
  
  We edited the text (Discussion and Introduction), as suggested by the Reviewer.
  
  (6) Although I don't think it's necessary, I would advise including comparison data with an uncorrected endoscope in the same in vivo preparation.
  
  We thank the Referee for the suggestion. Below, we list the reasons why we decided not to perform the comparison between the uncorrected and corrected endoscopes in the in vivo preparation:
  
  (1) We believe that the comparison between uncorrected and corrected endoscopes is better performed in fixed tissue (Figure 4) or in simulated calcium data (Figure 5-6), rather than in vivo recordings (Figure 7). In fact, in the brain of living mice motion artifacts, changes in fluorophore expression level, variation in the optical properties of the brain (e.g., the presence of a blood vessel over the FOV) may make the comparison of images acquired with uncorrected and corrected microendoscopes difficult, requiring a large number of animals to cancel out the contributions of all these factors. Comparing optical properties in fixed tissue is, in contrast, devoid of these confounding factors.
  
  (2) A major advantage of quantifying how the optical properties of uncorrected and corrected endoscopes impact the ability to extract information about neuronal activity in simulated calcium data is that, under simulated conditions, we can count on a known ground truth as reference (e.g., how many neurons are in the FOV, where they are, and what their electrical activity is). This is clearly not possible under in vivo conditions.
  
  (3) The proposed experiment requires performing imaging in the awake mouse with a corrected microendoscope, then anesthetizing the animal to carefully remove the corrective microlens using forceps, and finally repeating the optical recordings in the awake mouse with the uncorrected microendoscope. Although this is feasible (we performed the proposed experiment in Antonini et al. eLife 2020 using a 4.1 mm-long microendoscope), the yield of success of these experiments is low. The low yield is due to the fact that the mechanical force applied on top of the microendoscope to remove the corrective microlens may induce movement of the GRIN lens inside the brain, both in vertical and horizontal directions. This can randomly result in a change of the focal plane, death or damage of the cells, tissue inflammation, and bleeding. From our own experience, the number of animals used for this experiment is expected to be high.
  
  Reviewer #2 (Recommendations for the authors):
  
  Below, I provide a few minor corrections and suggestions for the authors to consider before final submission.
  
  (1) Page 5: when referring to Table 1 maybe add "Table 1 and Methods".
  
  Following the Reviewer’s comment, we revised the text at page 6 (lines 4-5 from bottom) as follows:
  
  “(see Supplementary Table 1 and Materials and Methods for details on simulation parameters)”.
  
  (2) Page 8: "We set a threshold of 10 µm on the axial resolution to define the radius of the effective FOV (corresponding to the black triangles in Fig. 3I, J) in uncorrected and corrected microendoscopes. We observed an enlargement of the effective FOV area of 4.7 times and 2.3 times for the 6.4 mm-long micro endoscope and the 8.8 mm-long micro endoscope, respectively (Table 1). These findings were in agreement with the results of the ray-trace simulations (Figure 1) and the measurement of the subresolved fluorescence layers (Figure 3A-D)." I could not find the information given in this paragraph, specifically:
  
  a) Upon examining the black triangles in Figure 3I and J, the enlargement of the effective FOV does not appear to be 4.7 and 2.3 times.
  
  In Figure 3I, J, black triangles mark the intersections between the curves fitting the data and the threshold of 10 µm on the axial resolution. The values on the x-axis corresponding to the intersections (Table 1, “Effective FOV radius”) represent the estimated radius of the effective FOV of the probes, i.e. the radius within which the microendoscope has spatial resolution below the threshold of 10 μm. The ratios of the effective FOV radii are 2.17 and 1.53 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively, which correspond to 4.7 and 2.3 times larger FOV (Table 1). To make this point clearer, we modified the indicated sentence as follows (page 10, lines 3-11 from bottom):
  
  “We set a threshold of 10 µm on the axial resolution to define the radius of the effective FOV (corresponding to the black triangles in Fig. 3I, J) in uncorrected and corrected microendoscopes. We observed a relative increase of the effective FOV radius of 2.17 and 1.53 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively (Table 1). This corresponded to an enlargement of the effective FOV area of 4.7 times and 2.3 times for the 6.4 mm-long microendoscope and the 8.8 mm-long microendoscope, respectively (Table 1). These findings were in agreement with the results of the ray-trace simulations (Figure 1) and the measurement of the subresolved fluorescence layers (Figure 3A-D).”
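The link between the radius ratios and the area ratios follows from treating the effective FOV as a disc, so that the area scales with the square of the radius. The short check below uses only the ratios quoted in the response; the circular-FOV assumption and the variable names are ours.

```python
# Ratios of effective FOV radii (corrected / uncorrected) quoted in the response.
radius_ratio = {"6.4 mm-long": 2.17, "8.8 mm-long": 1.53}

# For a circular FOV, area = pi * r**2, so the area ratio is the radius ratio squared.
area_ratio = {probe: r ** 2 for probe, r in radius_ratio.items()}

for probe, ratio in area_ratio.items():
    print(f"{probe}: radius x{radius_ratio[probe]:.2f} -> area x{ratio:.1f}")
```

Squaring 2.17 and 1.53 gives approximately 4.7 and 2.3, matching the area enlargements reported in Table 1; small differences could arise because the quoted radius ratios are themselves rounded.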
  
  b) I do not understand how the enlargements in Figure 3I and J align with the ray trace simulations in Figure 1, indicating an enlargement of 5.4 and 5.6.
  
  In Figure 1C, E of the first submission we showed the Strehl ratio of focal spots focused after the microendoscope, in the object plane, as a function of the radial distance from the optical axis of focal spots focused in the focal plane at the back end of the GRIN rod (“Objective focal plane” in Figure 1A, B), before the light has traveled along the GRIN lens. After reading the Referee’s comment, we realized this choice does not facilitate the comparison between Figure 1 and Figure 3I, J. We therefore decided to modify Figure 1C, E by showing the Strehl ratio of focal spots focused after the microendoscope as a function of their radial distance from the optical axis in the object plane (where the Strehl ratio is computed), after the light has traveled through the GRIN lens (radial distances are still computed on a plane, not along the curved focal surface represented by the “imaging plane” in Figure 1A, B). Computing radial distances in the object space, we found that the relative increase in the radius of the FOV due to the correction of aberrations was 3.50 and 3.35 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively. We also revised the manuscript text accordingly (page 7, lines 6-8):
  
  “The simulated increase in the radius of the diffraction-limited FOV was 3.50 times and 3.35 times for the 6.4 mm-long and 8.8 mm-long probe, respectively (Fig. 1C, E)”. We believe this change should facilitate the comparison of the data presented in Figure 1 and Figure 3.
  
  Moreover, in comparing results in Figure 1 and Figure 3, it is important to keep in mind that:
  
  (1) the definitions of the effective FOV radius were different in simulations (Figure 1) and real measurements (Figure 3). In simulations, we considered a theoretical criterion (Maréchal criterion) and set the lower threshold for a diffraction-limited FOV to a Strehl ratio value of 0.8. In real measures, the effective FOV radius obtained from fluorescent bead measurements was defined based on the empirical criterion of setting the upper threshold for the axial resolution to 10 µm.
  
  (2) the Zemax file of the GRIN lenses contained low-order aberrations and not high-order aberrations.
  
  (3) the small variability in some of the experimental parameters (e.g., the distance between the GRIN back end and the focusing objective) was not reflected in the simulations.
  
  Given the reasons listed above, it is expected that the predictions of the simulations do not perfectly match the experimental measurements and tend to indicate larger improvements from aberration correction than those measured experimentally.
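
To make the simulation-side criterion in point (1) concrete, the sketch below relates a Strehl ratio of 0.8 to an RMS wavefront error using the widely used extended Maréchal approximation, S ≈ exp(-(2πσ)²), with σ expressed in waves. This is a textbook approximation introduced here for illustration only; the Strehl ratios in the response were computed with Zemax, and the function names are hypothetical.

```python
import math


def strehl_from_rms(rms_waves: float) -> float:
    """Extended Marechal approximation: Strehl ratio from RMS wavefront error (in waves)."""
    return math.exp(-(2.0 * math.pi * rms_waves) ** 2)


def rms_for_strehl(strehl: float) -> float:
    """Invert the approximation: RMS wavefront error (in waves) giving a target Strehl ratio."""
    return math.sqrt(-math.log(strehl)) / (2.0 * math.pi)


wavelength_nm = 920.0                      # excitation wavelength quoted in the response
rms_limit_waves = rms_for_strehl(0.8)      # threshold used for the diffraction-limited FOV
rms_limit_nm = rms_limit_waves * wavelength_nm

print(f"RMS wavefront error at Strehl 0.8: {rms_limit_waves:.3f} waves "
      f"(about {rms_limit_nm:.0f} nm at {wavelength_nm:.0f} nm)")
```

The bead-based criterion (axial resolution below 10 µm) has no equivalent closed form, which is one reason the two FOV definitions need not give the same enlargement.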
  
  c) Finally, how can the enlargement in Figure 3I be compared to the measurements of the sub-resolved fluorescence layers in Figures 3A-D? Could the authors please clarify these points?
  
  When comparing measurements of subresolved fluorescent films and beads it is important to keep in mind that the two measures have different purposes and spatial resolution. We used subresolved fluorescent films to visualize the shape and extent of the focal surface of microendoscopes in a continuous way along the radial dimension (in contrast to bead measurements that are quantized in space). This approach comes at the cost of spatial resolution, as we are using fluorescent layers, which are subresolved in the axial but not in the radial dimension. Therefore, fluorescent film profiles are not used in our study to extract relevant quantitative information about effective FOV enlargement or spatial resolution of corrected microendoscopes. In contrast, to quantitatively characterize axial and lateral resolutions we used measurements of 100 nm-diameter fluorescent beads (therefore subresolved in the x, y, and z dimensions) located at different radial distances from the center of the FOV, using a much smaller nominal pixel size compared to the fluorescent films (beads, lateral resolution: 0.049 µm/pixel, axial resolution: 0.5 µm/pixel; films, lateral resolution: 1.73 µm/pixel, axial resolution: 2 µm/pixel).
  
  (3) On page 15, the statement "significantly enlarge the FOV" should be more specific by providing the actual values for the increase. It would also be good to mention that this is not a xy lateral increase; rather, as one moves further from the center, more of the imaged cells belong to axially different planes.
  
  The values of the experimentally determined FOV enlargements (4.7 times and 2.3 times for 6.4 mm- and 8.8 mm-long microendoscope, respectively) are provided in Table 1 and are now referenced on page 10. Following the Referee’s request, we added the following sentence in the discussion (page 18, lines 10-14) to underline that the extended FOV samples on different axial positions because of the field curvature effect:
  
  “It must be considered, however, that the extended FOV achieved by our aberration correction method was characterized by a curved focal plane. Therefore, cells located in different radial positions within the image were located at different axial positions and cells at the border of the FOV were closer to the front end of the microendoscope”.
  
  (4) On page 36, most of the formulas appear to be corrupted. This may have occurred during the conversion to the merged PDF. Please verify this and check for similar problems in other equations throughout the text as well.
  
  We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it upon submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.
  
  (5) In the discussion, the authors could potentially add comments on how the verified performance of the corrective lenses depends on the wavelength and mention the range within which the wavelength can be changed without the need to redesign a new corrective lens.
  
  Following this comment and those of the other Reviewers, we explored the effect of changing wavelength on the Strehl ratio using new Zemax simulations. We found that the Strehl ratio remains > 0.8 within at least ± 10 nm of λ = 920 nm (new Supplementary Figure 1A-D, left panels), which covers the limited bandwidth of our femtosecond laser. Moreover, these simulations demonstrate that, over a much wider wavelength range (800 - 1040 nm), a high Strehl ratio is obtained, but at different z planes (new Supplementary Figure 1A-D, right panels). These new results are now described on page 7 (lines 8-10).
  
  (6) Also, they could discuss if and how the corrective lens could be integrated into fiberscopes for freely moving experiments.
  
  Following the Referee’s suggestion, we added a short text in the Discussion (page 21, lines 4-7 from bottom). It reads:
  
  “Another advantage of long corrected microendoscopes described here over adaptive optics approaches is the possibility to couple corrected microendoscopes with portable 2P microscopes(42-44), allowing high resolution functional imaging of deep brain circuits on an enlarged FOV during naturalistic behavior in freely moving mice”.
  
  (7) Finally, since the main advantage of this approach is its simplicity, the authors should also comment on or outline the steps to follow for potential users who are interested in using the corrective lenses in their systems.
  
  Thanks for this comment. The Materials and Methods section of this study and that of Antonini et al. eLife 2020 describe in detail the experimental steps that potential users need to follow to reproduce the corrective lenses and apply them to their own experimental configuration.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Suggestions for improved or additional experiments, data, or analyses, and Recommendations for improving the writing and presentation:
  
  See Public Review.
  
  Please see our point-by-point response above.
  
  (2) Minor corrections on text and figures: a) Figure 6A: is the fraction of cells expressed in %?
  
  Author response: yes, that is correct. Thank you for spotting it. We added the “%” symbol to the y label.
  
  b) Figure 8A, left: The second line is blue and not red dashed. In addition, it could be interesting to also show a line corresponding to the 0 value.
  
  Thank you for the suggestions. We modified Figure 8 according to the Referee’s comments.
  
  c) Some parts of equation (1) and some variables in the Material and Methods section are missing
  
  We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it upon submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.
  
  d) In the methods, the authors mention a calibration ruler with ticks spaced every 10 µm along two orthogonal directions and refer to the following product: 4-dot calibration slide, Cat. No. 1101002300142, Motic, Hong Kong. However, this product does not seem to correspond to a calibration ruler.
  
  We double-checked. The catalog number 1101002300142 is correct and product details can be found at the following link:
  
  https://moticmicroscopes.com/products/calibration-slide-4-dots-1101002300142?srsltid=AfmBOorGYx9PcXtAlIMmSs_tEpxS4nX21qIcV8Kfn4qGwizQK3LYOQn3
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.24.604890v2
www.biorxiv.org www.biorxiv.org

New submission 26/09/2023, 09:10:48

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We are grateful to the reviewers for their appreciation of our study and their thoughtful comments. In response to the main concern raised by all reviewers regarding the potential influence of external noise factors on intuitive inference, such as external disturbances or imperfect observations, we have conducted three new experiments suggested by the reviewers. These experiments were designed to: (1) assess the influence of external forces on humans’ judgments by implementing a wall to block wind disturbances from one direction, (2) examine human accuracy in predicting the landing position of a falling ball when its trajectory is obscured, and (3) evaluate the effect of object geometry on human judgment of stability. The findings from these experiments consistently support our proposal of a stochastic world model of gravity embedded in the human mind. In addition, we have addressed the remaining comments from the reviewers point by point.
  
  Reviewer #1 (Recommendations For The Authors):
  
  As mentioned in the public review, I did not find it entirely convincing that the study shows evidence for a Gaussian understanding of gravity. There are two studies that would bolster this claim: 1. Replicate experiment 1, but also ask people to infer whether there was a hidden force. If people are truly representing gravity as proposed in the paper, you should get no force inferences. However, if the reason the Gaussian gravity model works is that people infer unseen forces, this should come out clearly in this study.
  
  Author response image 1.
  
  Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.
  
  R1: We thank the reviewer for this suggestion. To directly test whether participants’ judgments were influenced by their implicit assumptions about external forces, we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that any potential wind forces from the direction of the wall would not influence the collapse. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants tested (1 female; ages: 24-30), similar to the experiment without the wall (Supplementary Figure 4B). Therefore, the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, not shaped by external forces or explicit instructions.
  
  This new experiment has been added to the revised manuscript:
  
  Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”
  
  (2) Similarly, you can imagine a simple study where you drop an object behind a floating occluder and you check where people produce an anticipatory fixation (i.e., where do they think the object will come out?). If people have a stochastic representation of gravity, this should be reflected in their fixations. But my guess is that everyone will look straight down.
  
  Author response image 2.
  
  Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.
  
  R2: We thank the reviewer for suggesting this thought experiment. However, when predicting the landing point of a falling object, participants may rely more on learned knowledge that an unimpeded object continues to fall in a straight line, rather than drawing on their intuitive physics. To avoid this potential confounding factor, we designed a similar experiment where participants were asked to predict the landing point of a parabolic trajectory, obscured by an occluder (Author response image 2A). In each trial, participants used a mouse (clicking the left button) to predict the landing point of each parabolic trajectory, and there were 100 trials in total. This design not only limits the impact of direct visual cues but also actively engages the mental simulation of intuitive physics. All three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.
  
  (3) I believe the correct alternative model should be the one that has uncertainty over unseen forces, which better captures current proposals in the field, and controls for the amount of uncertainty in the models.
  
  R3: We thank the reviewers for the above-mentioned suggestions, and the findings from these two new experiments reinforce our proposal regarding the inherent stochastic characteristic of how the mind represents gravity.
  
  (4) I was not convinced that the RL framework was set up correctly to tackle the questions it claims to tackle. What this shows is that you can evolve a world model with Gaussian gravity in a setup that has no external perturbations. That does not imply that that is how humans evolved their intuitive physics, particularly when creatures have evolved in a world full of external perturbations. Showing that when (1) there are hidden perturbations, and (2) these perturbations are learnable, but (3) the model nonetheless just learns stochastic gravity, would be a more convincing result.
  
  R4: We completely agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity. In fact, introducing additional external noise into the RL framework would likely heighten the uncertainty in learning gravity’s direction, potentially amplifying, rather than diminishing, the stochastic nature of mental gravity.
  
  In revision, we have clarified the role of the RL framework:
  
  Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.
  
  Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”
  
  (5) Some comments on the writing:
  
  The word 'normality' is used to refer to people's judgments about whether a tower collapsed looked 'normal'. I was a bit confused by this because normality can also mean 'Gaussian' and the experiments are also sampling from Gaussian distributions. There were several points where it took me a second to figure out which sense of 'normality' the paper was using. I would recommend using a different term.
  
  R5: We are sorry for the confusion. In revision, the term “normality” has been replaced with “confidence level about normal trajectory”.
  
  (6) One small comment is that Newton's laws are not a faithful replica of the "physical laws of the world" they are a useful simplification that only works at certain timescales. I believe some people propose Newtonian physics as a model of intuitive physics in part because it is a rapid and useful approximation of complex physical systems, and not because it is an untested assumption of perfect correspondence.
  
  R6: We are sorry for the inaccurate expression. We have revised our statements in the manuscript Line 15-16: “We found that the world model on gravity was not a faithful replica of the physical laws, but instead encoded gravity’s vertical direction as a Gaussian distribution.”
  
  (7) Line 49-50: Based on Fig 1d, lower bound of possible configurations for 10 blocks is ~17 in log-space, which is about 2.5e7. But the line here says it's 3.72e19, which is much larger. Sorry if I am missing something.
  
  R7: We thank the reviewer for pointing out this error. We re-calculated the number of possible configurations using formula (3) in the appendix, and the number of configurations with 10 blocks is:
  
  Thus,
  
  This estimated number is much larger than that in our previous calculation, which has been corrected in the revised text.
  
  Line 827-829: “d) The lower bound of configurations’ possible number and the number of blocks in a stack followed an exponential relationship with a base of 10. The procedure can create at least 1.14×10^50 configurations for stacks consisting of 10 blocks.”
  
  Line 49-50: “… but the universal cardinality of possible configurations is at least 1.14×10^50 (Supplementary Figure 1), …”
  
  Line 1017-1018: “… the number of configurations can be estimated with formula (9), which is 1.14×10^50.”
  
  (8) Lines 77-78: "A widely adopted but not rigorously tested assumption is that the world model in the brain is a faithful replica of the physical laws of the world." This risks sounding like you are asserting that colleagues in the field do not rigorously test their models. I think you meant to say that they did not 'directly test', rather than 'rigorously test'. If you meant rigorous, you might want to say more to justify why you think past work was not rigorous.
  
  R8: We apologize for the inappropriate wording. The sentence has been revised, and we explain the motivation more comprehensively in the revised text:
  
  Line 76-92: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), hereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that this distinction of these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach.”
  
  (9) Lines 79-84 States that past models encode gravity downward. It then says that alternatively there is consensus that the brain uses data from sensory organs and adds meaning to them. I think there might be a grammatical error here because I did not follow why saying there is 'consensus' on something is a theoretical alternative. I also had trouble following why those two statements are in opposition. Is any work on physics engines claiming the brain does not take data from sensory organs and add meaning to them?
  
  R9: We are sorry for the confusion. Here we intend to contrast the deterministic model (i.e., the uncertainty comes from outside the model) with the stochastic model (i.e., the uncertainty is inherently built into the model). In revision, we have clarified the intention. For details, please see R8.
  
  (10) Lines 85-88: Following on the sentence above, you then conclude that the representation of the world may therefore not be the same as reality. I did not understand why this followed. It seems you are saying that, because the brain takes data from sensory organs, therefore its representations may differ from reality.
  
  R10: Again, we are sorry about the confusion. Please see the revised text in R8.
  
  (11) Lines 190-191: I had trouble understanding this sentence. I believe you are missing an adjective to clarify that participants were more inclined to judge taller stacks as more likely to collapse.
  
  R11: We are sorry for the confusion. What we intended to state here is that participants’ judgment was biased, showing a tendency to predict a collapse for stacks regardless of their actual stability. We have revised this confusing sentence in the revision. Line 202–204: “However, the participants showed an obvious bias towards predicting a collapse for stacks regardless of their actual stability, as the dots in Fig 2b are more concentrated on the lower side of the diagonal line.”
  
  (12) Line 201: I don't think it's accurate to say that MGS "perfectly captured participants' judgments" unless the results are actually perfect.
  
  R12: We agree, and in revision we have toned down the statement Line 213–214: “…, the MGS, in contrast to the NGS, more precisely reflected participants’ judgments of stability …”
  
  Reviewer #2 (Recommendations For The Authors):
  
  I think this is an impressive set of experiments and modeling work. The paper is nicely written and I appreciate the poetic license the authors took at places in the manuscript. I only have clarification points and suggest a simple experiment that could lend further support to their conclusions. 1. In my opinion, the impact of this work is twofold. First, the suggestion that gravity is represented as a distribution of the world and not a result of (inferred) external perturbations. Second, that the distribution is advantageous as it balances speed and accuracy, and lessens computational processing demands (i.e., number of simulations). The second point here is contingent on the first point, which is really only supported by the RL model and potentially the inverted scene condition. I am somewhat surprised that the RL model does not converge on a width much smaller than ~20 degrees after 100,000 simulations. From my understanding, it was provided feedback with collapses based on natural gravity (deterministically downward). Why is learning so slow and the width so large? Could it be the density of the simulated world model distribution? If the model distribution of Qs was too dense, then Q-learning would take forever. If the model distribution was too sparse, then its final estimate would hit a floor of precision. Could the authors provide more details on the distribution of the Qs for the RL model?
  
  Author response image 3.
  
  RL learning curves as a function of θ angle with different sampling densities and learning rates. Learning rates were adjusted to low (a), intermediate (b) and high (c) settings, while sampling densities were chosen at four levels: 5x5, 11x11, 31x31, and 61x61 shown from the left to the right. Two key observations emerged from the simulations as the reviewer predicted. First, higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances. Second, increased sampling density necessitated more iterations for convergence. Note that in all simulations, we limited the iterations to 1,000 times (as opposed to 100,000 times reported in the manuscript) to demonstrate the trend without excessive computational demands.
  
  R1: To illustrate the distribution of the Q-values for the RL model, we re-ran the RL model with various learning rates and sampling densities (Author response image 3). These results support the reviewer’s prediction that higher learning rates produce a more rapid decline in the learning curves but introduce larger variances, and that increased sampling density requires more iterations for convergence.
  
  This simulation also explains the slower learning observed in the experiment described in the text, where the force sphere was divided into 61x61 angle pairs and the learning rate was set to 0.15. This set of parameters ensured convergence within a reasonably brief timeframe while maintaining high-resolution force assessments.
  
  Besides, the width of the Gaussian distribution is mainly determined by the complexity of stacks. As shown in Figure 3c and Supplementary Figure 9, stacks with fewer blocks (i.e., less complex) caused a larger width, whereas those with more blocks resulted in a narrower spread. In the study, we used a collection of stacks varying from 2 to 15 blocks to simulate the range of stacks humans typically encounter in daily life.
  
  In revision, we have incorporated these insights suggested by the reviewer to clarify the performance of the RL framework:
  
  Line 634-639: “The angle density and learning rate are two factors that affect the learning speed. A larger angle density prolongs the time to reach convergence but enables a more detailed force space; a higher learning rate accelerates convergence but incurs larger variance during training. To balance speed and convergence, we utilized 100,000 configurations for the training.”
  
  Line 618-619: “…, separately divided them into 61 sampling angles across the spherical force space (i.e., the angle density).”
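
The trade-off described above can be illustrated with a toy example. The sketch below is not the authors' RL model: it tracks a single value estimate under the generic update Q ← Q + α(r − Q) with noisy binary feedback, and it assumes that angle pairs on an n×n grid are visited uniformly, so that each pair receives iterations/n² updates on average. The reward probability, seed, and function names are hypothetical.

```python
import random
import statistics


def run_q_updates(alpha, n_updates, p_reward=0.7, seed=0):
    """Track one value estimate under the update Q <- Q + alpha * (r - Q)."""
    rng = random.Random(seed)
    q = 0.0
    trace = []
    for _ in range(n_updates):
        reward = 1.0 if rng.random() < p_reward else 0.0  # noisy binary feedback
        q += alpha * (reward - q)
        trace.append(q)
    return trace


def expected_visits_per_pair(n_iterations, grid_side):
    """Average updates per angle pair, assuming pairs are visited uniformly."""
    return n_iterations / grid_side ** 2


# Learning rate: compare the late-training fluctuation of the estimate.
slow = run_q_updates(alpha=0.05, n_updates=2000)
fast = run_q_updates(alpha=0.50, n_updates=2000)
spread_slow = statistics.pstdev(slow[-1000:])
spread_fast = statistics.pstdev(fast[-1000:])
print(f"late-training spread: alpha=0.05 -> {spread_slow:.3f}, alpha=0.50 -> {spread_fast:.3f}")

# Sampling density: a denser grid leaves fewer updates per angle pair.
for side in (5, 11, 31, 61):
    visits = expected_visits_per_pair(100_000, side)
    print(f"{side}x{side} grid: about {visits:.0f} updates per angle pair")
```

In this toy setting the larger learning rate settles faster but fluctuates more, and the 61x61 grid leaves far fewer updates per pair than the 5x5 grid for the same number of iterations, which matches the qualitative pattern described in the response.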
  
  (2) Along similar lines, the authors discuss the results of the inverted scene condition as reflecting cognitive impenetrability. However, do they also interpret it as support for an intrinsically noisy distribution of gravity? I would be more convinced if they created a different scene that could have the possibility of affecting the direction of an (inferred) external perturbation - a previously held explanation of the noisy world model. For example, a relatively simple experiment would be to have a wall on one side of the scene such that an external perturbation would be unlikely to be inferred from that direction. In the external perturbation account, phi would then be affected resulting in a skewed distribution of angle pairs. However, in the authors' stochastic world model phi would remain unaffected resulting in the same uniform distribution of phi the authors observed. In my opinion, this would provide more compelling evidence for the stochastic world model.
  
  Author response image 4.
  
  Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.
  
  R2: We thank the reviewer for this suggestion. To address the reviewer’s concern, we designed an experiment with the addition of a wall on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, so that no wind forces from the direction of the wall could influence the collapse trajectory of the configurations. Participants were asked to judge whether the trajectory was normal. If participants’ judgments were influenced by external noise, we would expect to observe a skewed angle distribution. However, our results still showed a normal distribution across all participants tested, consistent with the experiment without the wall (Supplementary Figure 4B). This experiment suggests that the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, rather than shaped by external forces or explicit instructions.
  
  We revised the original manuscript and added this new experiment:
  
  Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”
  
  (3) I didn't completely follow the authors' explanation for the taller objects illusion. On lines 229-232, the authors state that deviations from gravity's veridical direction are likely to accumulate with the height of the objects. Is this because, in the stochastic world model account, each block gets its own gravity vector that is sampled from the distribution? The authors should clarify this more explicitly. If this is indeed the author's claim, then it would seem that it could be manipulated by varying the dimensions of the blocks (or whatever constitutes an object).
  
  R3: We are sorry for the confusion caused by the use of the term ‘accumulate’. In the study, there is only one gravity vector sampled from the distribution for the entire structure, rather than each block having a unique gravity vector. The height illusion is attributed to the fact that the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction. This is especially true for objects consisting of multiple blocks stacked atop one another. In revision, we have removed the confusing term ‘accumulate’ for clarification.
  
  Line 242-244: “…, because the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction during humans’ internal simulations.”
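
The geometric argument in R3 can be sketched for the simplest case. The example below assumes a single uniform rectangular block on a flat surface that cannot slide, a gravity direction tilted from vertical by an angle drawn from a one-dimensional zero-mean Gaussian, and tipping whenever the gravity line through the center of mass falls outside the base (tilt larger than atan(width/height)). The 20° standard deviation and all names are chosen only for illustration; this is not the authors' simulation of multi-block stacks.

```python
import math


def tipping_angle_deg(width, height):
    """Tilt of gravity (from vertical) beyond which a uniform rectangular block tips."""
    return math.degrees(math.atan2(width, height))


def tipping_probability(width, height, sigma_deg):
    """P(|theta| > tipping angle) for a tilt theta ~ Normal(0, sigma_deg)."""
    z = tipping_angle_deg(width, height) / (sigma_deg * math.sqrt(2.0))
    return math.erfc(z)  # two-sided Gaussian tail probability


sigma_deg = 20.0            # illustrative spread of the sampled gravity direction
heights = [1.0, 2.0, 4.0, 8.0]
probs = [tipping_probability(1.0, h, sigma_deg) for h in heights]

for h, p in zip(heights, probs):
    print(f"height/width = {h:.0f}: tipping angle {tipping_angle_deg(1.0, h):.1f} deg, "
          f"P(tip) = {p:.3f}")
```

With a single sampled gravity vector per object, the predicted probability of tipping rises with height because the tipping angle shrinks, without any accumulation of deviations across blocks.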
  
  (4) The authors refer to the RL simulations as agent-environment interactions, but in reality, the RL model does not interact with the blocks. Would experience-dependent or observation be more apropos?
  
  R4: We completely agree. Indeed, the RL model did not manipulate stacks; rather, it updated its knowledge of natural gravity based on the discrepancies between the RL model’s predictions and observed outcomes. In revision, we have removed the confusing term ‘agent-environment interactions’ and clarified its intended meaning.
  
  Line 19-22: “Furthermore, a computational model with reinforcement learning revealed that the stochastic characteristic likely originated from experience-dependent comparisons between predictions formed by internal simulations and the realities observed in the external world, …”
  
  Reviewer #3 (Public Review):
  
  (1) In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability that the block tips over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands, there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block, the smaller the deviation from a vertical gravitational field needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. 
For small friction values, the block is expected to slide down the incline, therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course, we don't, as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.
  
  Author response image 5.
  
  Differentiating Subjectivity from Objectivity. In both Experiment 1 (a) and Experiment 2 (b), participants were instructed to determine which shape appeared most stable. Objectively, in the absence of external forces, all shapes possess equal stability. Yet, participants typically perceived the shape on the left as the most stable because of its larger base area. The discrepancy between objective realities and subjective feelings, as we propose, is attributed to the human mind representing gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.
  
  R1: We agree with the reviewer that objects will remain stable until disturbed by external forces. However, in many cases, there is a clear discrepancy between objective realities and subjective feelings. For example, the electromagnetic waves associated with purple and red are the farthest apart in the electromagnetic spectrum, yet purple and red are the closest colors in color space. Similarly, as shown in Supplementary Figure 4, in reality all shapes possess equal stability in the absence of external forces. Yet, humans typically perceive the shape on the left as more stable because of its larger base area. In this study, we tried to explore the mechanism underlying this discrepancy by proposing that the human mind represents gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.
  
  In revision, we have clarified the rationale of this study:
  
  Line 76-98: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), hereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that this distinction of these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach. Here, we investigated these two alternative hypotheses regarding the construction of the world model in the brain by examining how gravity’s direction is represented in the world model when participants judged object stability.”
  
  (2) The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept 50% of the time a fall is normal when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint.
  
  R2: We thank the reviewer for highlighting several potential confounding factors in our study. We address each of these concerns point-by-point:
  
  (a) Misinterpretation of the 3D scene and motion. In Response Figure 4 shown above, there is no 3D structure, yet participants’ judgments of stability still deviated from objective realities. In addition, 3D motion was introduced to aid in understanding the stacks’ 3D structure. Previous studies without 3D motion have reported similar findings (Allen et al., 2020). Therefore, regardless of whether objects are presented in 2D or 3D, or as static or moving stimuli, humans’ judgments of object stability appear consistent.
  
  (b) Errors in perceived height. While there might be discrepancies between perceived and simulated heights, such errors are systematic across all conditions. Therefore, they may affect the width of the Gaussian distribution but do not fundamentally alter its existence.
  
  (c) The viewpoint. In one experiment, we inverted gravity’s direction to point upward, diverging from common daily experience. Despite this change in viewpoint, the Gaussian distribution was still observed. That is, the viewpoint does not appear to be a key factor in how gravity’s direction is represented as a Gaussian distribution in our mental world.
  
  In summary, both our study and previous studies (Allen et al., 2020; Battaglia et al., 2013) agree that humans’ subjective assessments of objects’ stability deviate from actual stability due to noise in mental simulation. Unlike previous studies, we suggest that this noise is intrinsic, rather than stemming from external forces or imperfect observations.
  
  (3) Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. 
What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.
  
  R3: Indeed, the external forces suggested by the reviewer certainly influence judgments of objects’ stability. The critical question, however, is whether humans’ judgments on objects’ stability accurately mirror the actual stability of objects in the absence of external forces. To address this question, we designed two new experiments.
  
  Experiment 1: we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). We explicitly informed the participants that the wall could block wind, ensuring that no potential wind from the direction of the wall could influence the configuration. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants (Age: 25-30, two females), which is similar to the experiment without the wall (Supplementary Figure 4B).
  
  Author response image 6.
  
  Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.
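
  The prediction tested in the wall experiment (a skewed response distribution if participants assumed a one-sided external force) can be made concrete with a small sketch. This is not the authors' analysis, and the data below are synthetic. `sample_skewness`, the 20-degree spread, and the one-sided construction are all assumptions made for illustration.

```python
import random
import statistics


def sample_skewness(values):
    """Population-form Fisher-Pearson skewness: mean of cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable

    # Synthetic "judged gravity angle" samples, in degrees (hypothetical).
    symmetric = [rng.gauss(0.0, 20.0) for _ in range(2000)]

    # Crude stand-in for a one-sided disturbance: every deviation is
    # pushed to the same side, which makes the distribution asymmetric.
    one_sided = [abs(a) for a in symmetric]

    print(f"symmetric responses: skewness = {sample_skewness(symmetric):+.2f}")
    print(f"one-sided responses: skewness = {sample_skewness(one_sided):+.2f}")
```

  A skewness close to zero is consistent with a symmetric response distribution, although with only three participants any such statistic should be read cautiously.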
  
  Experiment 2: The second experiment adopted another paradigm to test the hypothesis of stochastic mental simulation. Consider humans inferring the landing point of a parabolic trajectory that is obscured by an occluder (Author response image 2A). Stochastic mental simulation predicts that humans’ responses follow a Gaussian distribution. However, if humans’ judgments were influenced by external noise, the landing points would not be Gaussian. The experiment consisted of 100 trials in total, and in each trial participants used a mouse to predict the landing point of the trajectory by clicking the left button. We found that all three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment supports the stochastic nature of intuitive physics.
  
  Author response image 7.
  
  Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results support the notion of an inherent stochastic property of gravity represented in the mind.
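
  The logic of the trajectory experiment can be sketched with a toy simulation. It is not the authors' model or their stimulus parameters. It assumes a point mass launched from a hypothetical height and velocity, with the direction of gravity tilted in the plane of motion by an angle drawn from a zero-mean Gaussian (`sigma_deg` is an arbitrary illustrative value), and it records how far each simulated landing point falls from the landing point under vertical gravity.

```python
import math
import random
import statistics


def landing_x(v_x, v_y, height, tilt_deg, g=9.81):
    """Horizontal landing position of a point mass launched from
    (0, height) with velocity (v_x, v_y) when gravity is tilted by
    tilt_deg from vertical. Valid for |tilt_deg| < 90."""
    theta = math.radians(tilt_deg)
    a_x = g * math.sin(theta)  # sideways component of tilted gravity
    a_y = g * math.cos(theta)  # downward component of tilted gravity
    # Positive root of height + v_y * t - 0.5 * a_y * t**2 = 0.
    t = (v_y + math.sqrt(v_y ** 2 + 2.0 * a_y * height)) / a_y
    return v_x * t + 0.5 * a_x * t ** 2


if __name__ == "__main__":
    rng = random.Random(1)
    v_x, v_y, height = 3.0, 2.0, 5.0  # hypothetical launch parameters
    sigma_deg = 5.0                   # hypothetical spread of the gravity prior

    true_x = landing_x(v_x, v_y, height, 0.0)
    errors = [
        landing_x(v_x, v_y, height, rng.gauss(0.0, sigma_deg)) - true_x
        for _ in range(5000)
    ]
    print(f"mean error = {statistics.fmean(errors):+.3f}")
    print(f"sd of error = {statistics.pstdev(errors):.3f}")
```

  For small tilts the landing error is close to linear in the tilt angle, so a Gaussian spread in gravity's direction yields an approximately Gaussian spread of landing errors in this toy setting.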
  
  (4) The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event?
  
  R4: We agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity.
  
  In revision, we have clarified the role of the RL framework:
  
  Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.
  
  Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”
  
  (5) The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.
  
  Author response image 8.
  
  Perceptual uncertainty in 3D stack structures. (a) Experimental design. A pair of stacks with similar placements of blocks was presented sequentially to participants, who were instructed to judge whether the stacks were identical and to rate their confidence in this judgment. Each stack was presented on the screen for 2 seconds. (b) Behavioral performance. Three participants (2 males, age range: 24-30) were recruited for the experiment. The confidence in judging that a pair of stacks remained unchanged decreased rapidly even when each block had a very small displacement, suggesting that humans can keenly perceive small changes in configurations. The x-axis denotes the difference in block placement between stacks, with the maximum value (0.4) corresponding to the length of a block’s short side. The y-axis denotes humans’ confidence in reporting no change. The red curve illustrates the average confidence level across 4 runs, while the yellow curves show the confidence level of each run.
  
  R5: Indeed, uncertainty is inevitable when perceiving the external world, because our perception is not a faithful replica of external reality. A more critical question pertains to the accuracy of our perception in representing the 3D coordinates of a stack’s blocks. To address this question, we designed a straightforward experiment (Author response image 5a), where participants were instructed to determine whether a pair of stacks were identical. The position of each block was randomly changed horizontally. We found that all participants were able to accurately identify even minor positional variations in the 3D structure of the stacks (Author response image 5b). This level of perceptual precision is adequate for locating the difference between predictions from mental simulations and actual observations of the external world.
  
  (6) Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?
  
  R6: Yes. The NGS model achieved 80% accuracy as rapidly as the MGS model. However, the NGS model required a significantly longer period to reach the plateau crucial for decision-making. In revision, this information is now included.
  
  Line 348-350: “…, while the initial growth rates of both models were comparable, the MGS reached the plateau crucial for decision-making sooner than the NGS.”
  
  We greatly appreciate the thorough and insightful review provided by all three reviewers, which has considerably improved our manuscript, especially in terms of clarity in the presentation of the approach and further validation of the robustness implications of our results.
  
  Reference: Allen KR, Smith KA, Tenenbaum JB. 2020. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences 117:29302–29310.
  
  Battaglia PW, Hamrick JB, Tenenbaum JB. 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110:18327–18332.
  
  Friston K, Moran RJ, Nagai Y, Taniguchi T, Gomi H, Tenenbaum J. 2021. World model learning and inference. Neural Networks 144:573–590.
  
  Kriegeskorte N, Douglas PK. 2019. Interpreting encoding and decoding models. Current opinion in neurobiology 55:167–179.
  
  MacKay DM. 1956. The epistemological problem for automata. Automata Studies (AM-34), Volume 34. Princeton University Press. pp. 235–252.
  
  Matsuo Y, LeCun Y, Sahani M, Precup D, Silver D, Sugiyama M, Uchibe E, Morimoto J. 2022. Deep learning, reinforcement learning, and world models. Neural Networks.
  
  Naselaris T, Kay KN, Nishimoto S, Gallant JL. 2011. Encoding and decoding in fMRI. Neuroimage 56:400–410.
  
  Zhou L, Smith K, Tenenbaum J, Gerstenberg T. 2022. Mental Jenga: A counterfactual simulation model of physical support.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.12.30.522364v3
www.biorxiv.org www.biorxiv.org

Multi-level processing of emotions in life motion signals revealed through pupil responses

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment:
  
  This study presents an important finding on the implicit and automatic emotion perception from biological motion (BM). The evidence supporting the claims of the authors is solid, although inclusion of a larger number of samples and more evidence for the discrepancy between Intact and local emotional BMs would have strengthened the study. The work will be of broad interest to perceptual and cognitive neuroscience.
  
  We express our sincere gratitude for the positive and constructive evaluation of our manuscript. We have now included more participants and conducted a replication experiment to strengthen our results.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Tian et al. investigated the effects of emotional signals in biological motion on pupil responses. In this study, subjects were presented with point-light biological motion stimuli with happy, neutral, and sad emotions. Their pupil responses were recorded with an eye tracker. Throughout the study, emotion type (i.e., happy/sad/neutral) and BM stimulus type (intact/inverted/non-BM/local) were systematically manipulated. For intact BM stimuli, happy BM induced a larger pupil diameter than neutral BM, and neutral BM also induced a larger pupil diameter than sad BM. Importantly, the diameter difference between happy and sad BM correlated with the autistic trait of individuals. These effects disappeared for the inverted BM and non-BM stimuli. Interestingly, both happy and sad emotions show superiority in pupil diameter.
  
  Strengths:
  
  (1) The experimental conditions and results are very easy to understand.
  
  (2) The writing and data presentation are clear.
  
  (3) The methods are sound. I have no problems with the experimental design and results.
  
  Weaknesses:
  
  (1) My main concern is the interpretation of the intact and local condition results. The processing advantage of happy emotion is not surprising given a number of existing studies. However, the only difference here seems to be the smaller (or larger) pupil diameter for sad compared to neutral in the intact (or local, respectively) condition. The current form only reports this effect but lacks in-depth discussions and explanations as to why this is the case.
  
  Thank you for pointing this out; we apologize for not making this point clear. It has long been documented that pupil size reflects the degree of cognitive effort and attention input (Joshi & Gold, 2019; van der Wel & van Steenbergen, 2018), and indexes noradrenaline activity in emotion processing structures like the amygdala (Dal Monte et al., 2015; Harrison et al., 2006; Liddell et al., 2005). Accordingly, we proposed that the smaller pupil response observed under the sad condition as compared to the neutral condition arises because the sad biological motion (BM) could be less efficient in attracting visual attention and evoking emotional arousal. In line with this, it has been found that infants looked more at the neutral point-light walker when it was displayed in a pair with the sad walker (Ogren et al., 2019), suggesting that the sad BM is less effective in capturing visual attention than the neutral BM. Besides, neural studies have revealed that, compared with other emotions (anger, happiness, disgust, and fear), the processing of sad emotion failed to evoke heightened activity in any emotionally relevant brain regions including the amygdala, the extrastriate body area (EBA) and the fusiform body area (FBA) (Peelen et al., 2007). The current study echoes these previous findings by demonstrating a disadvantage for intact sad BM in evoking pupil responses. Notably, unlike the intact sad BM, the local sad BM instead induced stronger pupil responses than the neutral local BM. This distinctive pupil modulation effect observed in intact and local sad BM could be explained by a multi-level emotion processing model of BM. Specifically, even though both the intact and local BM conveyed important life information (Chang & Troje, 2008, 2009; Simion et al., 2008), the latter is deprived of the global form feature. 
Hence, the processing of emotions in local BM may occur at a more basic and preliminary level, responding to generally affectively salient emotional information (happy and sad) without detailed analysis. In fact, a similar dissociation in emotion processing has been observed in another important type of emotional signal with analogous function (i.e., facial expression). For example, happy and fearful faces elicited differential amygdala activations when perceived consciously. However, they elicited comparable amygdala activations when suppressed (Williams et al., 2004). Moreover, it has been proposed that there exist two parallel routes for facial expression processing: a quick but coarse subcortical route that detects affectively salient information without detailed analysis, and a fine-grained but slow cortical route that discriminates the exact emotion type. Similarly, the dissociated emotion processing in local and intact BM may function in the same manner, with the former serving as a primary emotion detection mechanism and the latter serving as a detailed emotion discrimination mechanism. Still, future studies adopting more diverse experimental paradigms and neuroimaging techniques are needed to further investigate this issue. We have added these points and more thoroughly discussed the potential mechanism in the revised text (see lines 329-339, 405-415, 418-420).
  
  References:
  
  Chang, D. H. F., & Troje, N. F. (2008). Perception of animacy and direction from local biological motion signals. Journal of Vision, 8(5), 3. https://doi.org/10.1167/8.5.3
  
  Chang, D. H. F., & Troje, N. F. (2009). Characterizing global and local mechanisms in biological motion perception. Journal of Vision, 9(5), 8–8. https://doi.org/10.1167/9.5.8
  
  Dal Monte, O., Costa, V. D., Noble, P. L., Murray, E. A., & Averbeck, B. B. (2015). Amygdala lesions in rhesus macaques decrease attention to threat. Nature Communications, 6(1). https://doi.org/10.1038/ncomms10161
  
  Harrison, N. A., Singer, T., Rotshtein, P., Dolan, R. J., & Critchley, H. D. (2006). Pupillary contagion: central mechanisms engaged in sadness processing. Social Cognitive and Affective Neuroscience, 1(1), 5–17. https://doi.org/10.1093/scan/nsl006
  
  Joshi, S., & Gold, J. I. (2019). Pupil size as a window on neural substrates of cognition. Trends in Cognitive Sciences, 24(6), 466–480. https://doi.org/10.31234/osf.io/dvsme
  
  Liddell, B. J., Brown, K. J., Kemp, A. H., Barton, M. J., Das, P., Peduto, A., Gordon, E., & Williams, L. M. (2005). A direct brainstem–amygdala–cortical ‘alarm’ system for subliminal signals of fear. NeuroImage, 24(1), 235–243.
  
  Ogren, M., Kaplan, B., Peng, Y., Johnson, K. L., & Johnson, S. P. (2019). Motion or emotion: infants discriminate emotional biological motion based on low-level visual information. Infant Behavior and Development, 57, 101324. https://doi.org/10.1016/j.infbeh.2019.04.006
  
  Peelen, M. V., Atkinson, A. P., Andersson, F., & Vuilleumier, P. (2007). Emotional modulation of body-selective visual areas. Social Cognitive and Affective Neuroscience, 2(4), 274–283. https://doi.org/10.1093/scan/nsm023
  
  Simion, F., Regolin, L., & Bulf, H. (2008). A predisposition for biological motion in the newborn baby. Proceedings of the National Academy of Sciences, 105(2), 809–813. https://doi.org/10.1073/pnas.0707021105
  
  van der Wel, P., & van Steenbergen, H. (2018). Pupil dilation as an index of effort in cognitive control tasks: a review. Psychonomic Bulletin & Review, 25(6), 2005–2015. https://doi.org/10.3758/s13423-018-1432-y
  
  Williams, M. A., Morris, A. P., McGlone, F., Abbott, D. F., & Mattingley, J. B. (2004). Amygdala responses to fearful and happy facial expressions under conditions of binocular suppression. Journal of Neuroscience, 24(12), 2898-2904.
  
  (2) I also found no systematic discussion and theoretical contributions regarding the correlation with the autistic traits. If the main point of this paper is to highlight an implicit and objective behavioral marker of the autistic trait, more interpretation and discussion of the links between the results and existing findings in ASD are needed.
  
  We thank the reviewer for this insightful suggestion. The perception of biological motion (BM) has long been considered an important hallmark of social cognition. Many studies have reported that individuals with social cognitive deficits (e.g., ASD) are impaired in BM perception (Blake et al., 2003; Freitag et al., 2008; Klin et al., 2009; Nackaerts et al., 2012). More recently, it has been pointed out that the extraction of more complex social information (e.g., emotions, intentions) from BM, as compared to basic BM recognition, could be more effective in detecting ASD (Federici et al., 2020; Koldewyn et al., 2009; Parron et al., 2008; Todorova et al., 2019). Specifically, a meta-analysis found that the effect size nearly doubled when the task required emotion recognition as compared to simple perception/detection (Todorova et al., 2019). However, high-functioning ASD individuals have been reported to perform comparably to the control group in explicitly labelling BM emotions, although their responses were rather delayed (Mazzoni et al., 2021). This suggests that ASD individuals could adopt compensatory strategies to complete the explicit BM labelling task, while their automatic behavioural responses remained impaired. This highlights the importance of using more objective measures that do not rely on active reports to investigate the intrinsic perception of emotions from BM and its relationship with ASD-related social deficits. The current study thus introduced pupil size measurement to this field, and we combined it with a passive viewing task to investigate the more automatic aspect of BM emotion processing. More importantly, in addition to clinically diagnosed ASD, the non-clinical general population also manifests autistic tendencies that follow a normal distribution and show substantial heritability (Hoekstra et al., 2007). 
Here, we focused on the autistic tendencies in the general population, and our results showed that pupil modulations by BM emotions were indicative of individual autistic traits. Specifically, passively viewing the happy BMs evoked larger pupil responses than the sad BMs, while such emotional modulation diminished with the increase of autistic tendencies. More detailed test-retest examination further illustrated such a correlation was driven by the general diminishment in pupil modulation effects by emotional BM (happy or sad) for individuals with high autistic tendencies. This finding demonstrated that the automatic emotion processing of BM stimuli was impaired in individuals with high autistic tendencies, lending support to previous studies (Hubert et al., 2006; Nackaerts et al., 2012; Parron et al., 2008). This indicated the utility of emotional BM stimuli and pupil measurement in identifying ASD-related tendencies in both clinical and non-clinical populations. We have added these points to the revised text (see lines 347-375).
  
  References:
  
  Blake, R., Turner, L. M., Smoski, M. J., Pozdol, S. L., & Stone, W. L. (2003). Visual recognition of biological motion is impaired in children with autism. Psychological Science, 14(2), 151–157. https://doi.org/10.1111/1467-9280.01434
  
  Federici, A., Parma, V., Vicovaro, M., Radassao, L., Casartelli, L., & Ronconi, L. (2020). Anomalous perception of biological motion in autism: a conceptual review and meta-analysis. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-61252-3
  
  Freitag, C. M., Konrad, C., Häberlen, M., Kleser, C., von Gontard, A., Reith, W., Troje, N. F., & Krick, C. (2008). Perception of biological motion in autism spectrum disorders. Neuropsychologia, 46(5), 1480–1494. https://doi.org/10.1016/j.neuropsychologia.2007.12.025
  
  Hoekstra, R. A., Bartels, M., Verweij, C. J. H., & Boomsma, D. I. (2007). Heritability of autistic traits in the general population. Archives of Pediatrics & Adolescent Medicine, 161(4), 372. https://doi.org/10.1001/archpedi.161.4.372
  
  Hubert, B., Wicker, B., Moore, D. G., Monfardini, E., Duverger, H., Fonséca, D. D., & Deruelle, C. (2006). Brief report: recognition of emotional and non-emotional biological motion in individuals with autistic spectrum disorders. Journal of Autism and Developmental Disorders, 37(7), 1386–1392. https://doi.org/10.1007/s10803-006-0275-y
  
  Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261. https://doi.org/10.1038/nature07868
  
  Koldewyn, K., Whitney, D., & Rivera, S. M. (2009). The psychophysics of visual motion and global form processing in autism. Brain, 133(2), 599–610. https://doi.org/10.1093/brain/awp272
  
  Mazzoni, N., Ricciardelli, P., Actis-Grosso, R., & Venuti, P. (2021). Difficulties in recognising dynamic but not static emotional body movements in autism spectrum disorder. Journal of Autism and Developmental Disorders, 52(3), 1092–1105. https://doi.org/10.1007/s10803-021-05015-7
  
  Nackaerts, E., Wagemans, J., Helsen, W., Swinnen, S. P., Wenderoth, N., & Alaerts, K. (2012). Recognizing biological motion and emotions from point-light displays in autism spectrum disorders. PLoS ONE, 7(9), e44473. https://doi.org/10.1371/journal.pone.0044473
  
  Parron, C., Da Fonseca, D., Santos, A., Moore, D. G., Monfardini, E., & Deruelle, C. (2008). Recognition of biological motion in children with autistic spectrum disorders. Autism, 12(3), 261–274. https://doi.org/10.1177/1362361307089520
  
  Todorova, G. K., Hatton, R. E. M., & Pollick, F. E. (2019). Biological motion perception in autism spectrum disorder: a meta-analysis. Molecular Autism, 10(1). https://doi.org/10.1186/s13229-019-0299-8
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Through a series of four experiments, Yuan, Wang and Jiang examined pupil size responses to emotion signals in point-light motion stimuli. Experiment 1 examined upright happy, sad and neutral point-light biological motion (BM) walkers. The happy BM induced a significantly larger pupil response than the neutral BM, whereas the sad BM evoked a significantly smaller pupil size than the neutral BM. Experiment 2 examined inverted BM walkers. Experiment 3 examined BM stimuli with acceleration removed. No significant effects of emotion were found in either Experiment 2 or Experiment 3. Experiment 4 examined scrambled BM stimuli, in which local motion features were preserved while the global configuration was disrupted. Interestingly, the scrambled happy and sad BM led to significantly greater pupil size than the scrambled neutral BM at a relatively early time, while no significant difference between the scrambled happy and sad BM was found. Thus, the authors argue that these results suggest multi-level processing of emotions in life motion signals.
  
  Strengths:
  
  The experiments were carefully designed and well-executed, with point-light stimuli that eliminate many potential confounding effects of low-level visual features such as luminance, contrast, and spatial frequency.
  
  Weaknesses:
  
  Correlation results with limited sample size should be interpreted with extra caution.
  
  Thanks for pointing this out. To strengthen the correlation results, we have conducted a replication experiment (Exp.1b) and added a test-retest examination to further assess the reliability of our measurements. Specifically, a new group of 24 participants (16 females, 8 males) were recruited to perform the identical experiment procedure as in Experiment 1. Then, after at least seven days, they were asked to return to the lab for a retest. The results successfully replicated the previously reported main effect of emotional condition in both the first test (F(2, 46) = 12.0, p < .001, ηp2 = 0.34, Author response image 1A) and the second test (F(2, 46) = 14.8, p < .001, ηp2 = 0.39, Author response image 1B). The happy BM induced a significantly larger pupil response than the neutral BM (First Test: t(23) = 2.60, p = .022, Cohen’s d = 0.53, 95% CI for the mean difference = [0.02, 0.14], Holm-corrected, p = .048 after Bonferroni correction, Author response image 1A; Second Test: t(23) = 3.36, p = .005, Cohen’s d = 0.68, 95% CI for the mean difference = [0.06, 0.24], Holm-corrected, p = .008 after Bonferroni correction, Author response image 1B). On the contrary, the sad BM induced a significantly smaller pupil response than the neutral BM (First Test: t(23) = -2.77, p = .022, Cohen’s d = 0.57, 95% CI for the mean difference = [-0.19, -0.03], Holm-corrected, p = .033 after Bonferroni correction; Second Test: t(23) = -3.19, p = .005, Cohen’s d = 0.65, 95% CI for the mean difference = [-0.24, -0.05], Holm-corrected, p = .012 after Bonferroni correction, Author response image 1B). 
Besides, the happy BM induced significantly larger pupil response than the sad BM (first test: t(23) = 4.23, p < .001, Cohen’s d = 0.86, 95% CI for the mean difference = [0.10, 0.28], Holm-corrected, p < .001 after Bonferroni correction, Author response image 1A; second test: t(23) = 4.26, p < .001, Cohen’s d = 0.87, 95% CI for the mean difference = [0.15, 0.44], Holm-corrected, p < .001 after Bonferroni correction, Author response image 1B). The results of the cluster-based permutation analysis were also similar (see Supplementary Material for more details).
  
  Author response image 1.
  
  Normalized mean pupil responses in the replication experiment (Experiment 1b) of Experiment 1a and its retest, using the neutral condition as baseline, plotted against happy and sad conditions. (A) In the first test, the group average pupil response to happy intact BM is significantly larger than that to sad and neutral BM, while the pupil response induced by sad BM is significantly smaller than that evoked by neutral BM, replicating the results of Experiment 1a. (B) Moreover, such results were similarly found in the second test.
  
  Notably, we successfully replicated the negative correlation between the happy over sad dilation effect and individual autistic traits in the first test (r(23) = -0.46, p = .023, 95% CI = [-0.73, -0.07], Author response image 2A). No other significant correlations were found (see Author response image 2B-C). Moreover, in the second test, such a correlation was similarly found and was even stronger (r(23) = -0.61, p = .002, 95% CI = [-0.81, -0.27], Author response image 2D). We’ve also performed a test-retest reliability analysis on the happy over sad pupil dilation effect and the AQ score. The results showed robust correlations. See Author response table 1 for more details.
  
  Author response table 1.
  
  Reliability of pupil size and AQ indices.
  
  Importantly, in the second test, we’ve also observed a significant negative correlation between AQ and the happy minus neutral pupil dilation effect (r(23) = -0.44, p = .032, 95% CI = [-0.72, -0.04], Author response image 2E), and a significant positive correlation between the sad minus neutral pupil size and AQ (r(23) = 0.50, p = .014, 95% CI = [0.12, 0.75], Author response image 2F). This indicated that the overall correlation between the happy over sad dilation effect and AQ was driven both by the diminished happy dilation effect and by the diminished sad constriction effect. Overall, our replication experiment consistently found a significant negative correlation between AQ and the happy over sad dilation effect in both the test and the retest. Moreover, it revealed that this effect reflected both a negative correlation between AQ and the happy-neutral pupil response and a positive correlation between AQ and the sad-neutral pupil response, demonstrating a general impairment in BM emotion perception (happy or sad) for individuals with high autistic tendencies. This also indicated the utility of adopting a test-retest pupil examination to more precisely detect individual autistic tendencies. We have added these points in the revised text (see lines 135-173, lines 178-180).
  
  Author response image 2.
  
  Correlation results for pupil modulation effects and AQ scores in the replication experiment (Experiment 1b) of Experiment 1a and its retest. (A) We replicated the negative correlation between the happy over sad pupil dilation effect and AQ in the first test. (B-C) No other significant correlations were found. (D) In the second test, the negative correlation between the happy over sad pupil dilation effect and AQ was similarly observed and even stronger. (E-F) Moreover, the happy vs. neutral pupil dilation effect and the sad vs. neutral pupil constriction effect respectively correlate with AQ in the second test.
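
The confidence intervals reported for the correlations above can be checked with a short calculation. The sketch below assumes the intervals were obtained with the standard Fisher z-transform and that all 24 participants entered each correlation; neither detail is stated in the response, and the function name `pearson_ci` is hypothetical. Under those assumptions the output comes out close to the reported intervals.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate two-sided 95% CI for a Pearson r via the Fisher z-transform."""
    z = math.atanh(r)               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)     # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Correlations reported for the first test and the retest.
# n = 24 is an assumption based on the stated sample size.
for r in (-0.46, -0.61):
    lo, hi = pearson_ci(r, n=24)
    print(f"r = {r:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

With only 24 participants the intervals are wide, which is why the replication and retest matter more than either single estimate.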
  
  It would be helpful to add discussions as a context to compare the current results with pupil size reactions to emotion signals in picture stimuli.
  
  Thanks for this thoughtful comment. The modulation of emotional information on pupil responses has been mostly investigated using picture stimuli. Bradley et al. (2008) first demonstrated that humans showed larger pupil responses towards emotional images as compared to neutral images, while no difference was observed between the positive and negative images. This was regarded as the result of increased sympathetic activity induced by emotional arousal that is independent of the emotional valence. Similar results have been replicated with different presentation durations, repetition settings, and tasks (Bradley & Lang, 2015; Snowden et al., 2016). However, the emotional stimuli adopted in these studies were mostly complicated scene images that conveyed rather general emotional information. When it comes to the specific emotion cues (e.g., fear, anger, happy, sad) delivered by our conspecifics through biologically salient signals (e.g., faces, gestures, voices), the results became intermixed. Some studies demonstrated that fearful, disgusted, and angry static faces induced larger pupil sizes than the neutral face, while sad and happy faces failed to induce such pupil dilatory effects (Burley et al., 2017). In contrast, other studies observed larger pupil responses for happy faces as compared to sad and fearful faces (Aktar et al., 2018; Burley & Daughters, 2020; Jessen et al., 2016). These conflicting results could be due to the low-level confounds of emotional faces (e.g., eye size) (Carsten et al., 2019; Harrison et al., 2006). Similar to faces, BM also conveyed salient clues concerning the emotional states of our interactive partners. However, they were highly simplified, deprived of various irrelevant visual confounders (e.g., body shape). Here, we reported that the happy BM induced a stronger pupil response than the neutral and sad BM, lending support to the happy dilation effect observed with faces (Burley & Daughters, 2020; Prunty et al., 2021). 
Moreover, it helps ameliorate the concern regarding the low-level confounding factors by identifying similar pupil modulations in another type of social signal with distinctive perceptual features. We have added these points to the revised text (see lines 301-321).
  
  References:
  
  Aktar, E., Mandell, D. J., de Vente, W., Majdandžić, M., Oort, F. J., van Renswoude, D. R., Raijmakers, M. E. J., & Bögels, S. M. (2018). Parental negative emotions are related to behavioral and pupillary correlates of infants’ attention to facial expressions of emotion. Infant Behavior and Development, 53, 101–111. https://doi.org/10.1016/j.infbeh.2018.07.004
  
  Bradley, M. M., & Lang, P. J. (2015). Memory, emotion, and pupil diameter: repetition of natural scenes. Psychophysiology, 52(9), 1186–1193. https://doi.org/10.1111/psyp.12442
  
  Bradley, M. M., Miccoli, L., Escrig, M. A., & Lang, P. J. (2008). The pupil as a measure of emotional arousal and autonomic activation. Psychophysiology, 45(4), 602–607. https://doi.org/10.1111/j.1469-8986.2008.00654.x
  
  Burley, D. T., & Daughters, K. (2020). The effect of oxytocin on pupil response to naturalistic dynamic facial expressions. Hormones and Behavior, 125, 104837. https://doi.org/10.1016/j.yhbeh.2020.104837
  
  Burley, D. T., Gray, N. S., & Snowden, R. J. (2017). As far as the eye can see: relationship between psychopathic traits and pupil response to affective stimuli. PLOS ONE, 12(1), e0167436. https://doi.org/10.1371/journal.pone.0167436
  
  Carsten, T., Desmet, C., Krebs, R. M., & Brass, M. (2019). Pupillary contagion is independent of the emotional expression of the face. Emotion, 19(8), 1343–1352. https://doi.org/10.1037/emo0000503
  
  Harrison, N. A., Singer, T., Rotshtein, P., Dolan, R. J., & Critchley, H. D. (2006). Pupillary contagion: central mechanisms engaged in sadness processing. Social Cognitive and Affective Neuroscience, 1(1), 5–17. https://doi.org/10.1093/scan/nsl006
  
  Jessen, S., Altvater-Mackensen, N., & Grossmann, T. (2016). Pupillary responses reveal infants’ discrimination of facial emotions independent of conscious perception. Cognition, 150, 163–169. https://doi.org/10.1016/j.cognition.2016.02.010
  
  Prunty, J. E., Keemink, J. R., & Kelly, D. J. (2021). Infants show pupil dilatory responses to happy and angry facial expressions. Developmental Science, 25(2). https://doi.org/10.1111/desc.13182
  
  Snowden, R. J., O’Farrell, K. R., Burley, D., Erichsen, J. T., Newton, N. V., & Gray, N. S. (2016). The pupil’s response to affective pictures: role of image duration, habituation, and viewing mode. Psychophysiology, 53(8), 1217–1223. https://doi.org/10.1111/psyp.12668
  
  Overall, I think this is a well-written paper with solid experimental results that support the claim of the authors, i.e., the human visual system may process emotional information in biological motion at multiple levels. Given the key role of emotion processing in normal social cognition, the results will be of interest not only to basic scientists who study visual perception, but also to clinical researchers who work with patients of social cognitive disorders. In addition, this paper suggests that examining pupil size responses could be a very useful methodological tool to study brain mechanisms underlying emotion processing.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The overarching goal of the authors was to understand whether emotional information conveyed through point-light biological motion can trigger automatic physiological responses, as reflected in pupil size.
  
  Strengths:
  
  This manuscript has several noticeable strengths: it addresses an intriguing research question that fills a gap in the existing literature, offers a clear and accurate presentation of the current literature, and conducts a series of experiments and control experiments with adequate sample size. Yet, it also entails several noticeable limitations - especially in the study design and statistical analyses.
  
  Weaknesses:
  
  (1) Study design:
  
  (1.1) Dependent variable:
  
  Emotional attention is known to modulate both microsaccades and pupil size. Given the existing pupillometry data that the authors have collected, it would be both possible and valuable to determine whether the rate of microsaccades is also influenced by emotional biological motion.
  
  We thank the reviewer for this advice. Microsaccades functioned as a mechanism to maintain visibility by continuously shifting the retinal image to overcome visual adaptation (Martinez-Conde et al., 2006). Moreover, it was found to be sensitive to attention processes (Baumeler et al., 2020; Engbert & Kliegl, 2003b; Meyberg et al., 2017), and could reflect the activity of superior colliculus (SC) and other related brain areas (Martinez-Conde et al., 2009, 2013). Previous studies have found that, compared with neutral and pleasant images, unpleasant images significantly inhibit early microsaccade rates (Kashihara, 2020; Kashihara et al., 2013). This is regarded as the result of retaining previous crucial information at the sacrifice of updating new visual input. We agree with the reviewer that it would be valuable to investigate whether emotional information conveyed by BM could modulate microsaccades. However, it should be noted that our data collection and experimental design are not optimized for this purpose. This is because we have only recorded the left eye’s data, while abundant methodological studies have doubted the reliability of using only one eye’s data to analyze microsaccades (Fang et al., 2018; Hauperich et al., 2020; Nyström et al., 2017) and suggested that the microsaccades should be defined by spontaneous binocular eye movement (Engbert & Kliegl, 2003a, 2003b). Besides, according to Kashihara et al. (2013), participants showed differential microsaccade rates after the stimuli disappeared so as to maintain the previously observed different emotional information. However, in the current study, we discarded the data after the stimuli disappeared, making it impossible to analyze the microsaccade data after the stimuli disappeared. Despite these disadvantages, we have attempted to analyze the microsaccade rate during the stimuli presentation using only the left eye’s data. Specifically, we applied the algorithm developed by Otero-Millan et al. 
(2014) (minimum duration = 6 ms, maximum amplitude = 1.5 degrees, maximum velocity = 150 degrees/sec) to the left eye’s data from 100 ms before to 4000 ms after stimulus onset. Subsequently, we calculated the microsaccade rates using a moving window of 100 ms (stepped in 1 ms) (Engbert & Kliegl, 2003b; Kashihara et al., 2013). The microsaccade rate displayed a typical curve, with suppression shortly after stimulus appearance (inhibition phase), followed by an increased rate of microsaccade occurrence (rebound phase). The cluster-based permutation analysis was then applied to explore the modulation of BM emotions on microsaccade rates. However, no significant differences among the emotional conditions (happy, sad, neutral) were found in any of the four experiments.
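
The moving-window rate computation described above can be sketched as follows. This is an illustrative sketch only, not the authors' code: it assumes microsaccade onsets have already been detected (for example by the clustering method cited above) and pooled across trials, and the function name `microsaccade_rate` and its parameters are hypothetical.

```python
import numpy as np

def microsaccade_rate(onsets_ms, n_trials, t_start=-100, t_end=4000, window_ms=100):
    """Microsaccade rate (per second, per trial) in a sliding window stepped in 1 ms.

    onsets_ms: onset times in ms relative to stimulus onset, pooled over trials.
    """
    times = np.arange(t_start, t_end + 1)                 # one sample per ms
    counts = np.zeros(times.size)
    idx = np.round(np.asarray(onsets_ms)).astype(int) - t_start
    idx = idx[(idx >= 0) & (idx < times.size)]            # drop onsets outside the epoch
    np.add.at(counts, idx, 1)                             # histogram of onsets at 1 ms resolution
    # Sum the counts inside a window centred (approximately) on each time point.
    # Windows near the epoch edges are truncated, so edge values are underestimated.
    windowed = np.convolve(counts, np.ones(window_ms), mode="same")
    rate_hz = windowed / n_trials / (window_ms / 1000.0)
    return times, rate_hz

# Toy example with made-up onsets from two trials
times, rate = microsaccade_rate([500, 520, 1500], n_trials=2)
print(rate[times == 510][0], rate[times == 3000][0])
```

The resulting time series per condition is what a cluster-based permutation test would then compare across emotional conditions.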
  
  Author response image 3.
  
  Time-series change in the microsaccade rates to happy, sad, and neutral BM in Experiments 1-4. Solid lines represent microsaccade rates under each emotional condition as a function of time (happy: red; sad: blue; neutral: gray); shaded areas represent the SEM between participants. No significant differences were found after cluster-based permutation correction for the four experiments.
  
  It is important to note that the microsaccade rate analysis was conducted on only the left eye’s data and that the experimental design is not optimized for this analysis; thus, extra caution should be exercised in interpreting the results. Still, we found it very innovative and important to combine the microsaccade index with the pupil size to holistically investigate the processing of emotional information in BM, and future studies with more suitable recording techniques and experimental designs are needed to further probe this issue. We have discussed this issue in the revised text (see lines 339-344).
  
  References:
  
  Baumeler, D., Schönhammer, J. G., & Born, S. (2020). Microsaccade dynamics in the attentional repulsion effect. Vision Research, 170, 46–52. https://doi.org/10.1016/j.visres.2020.03.009
  
  Engbert, R., & Kliegl, R. (2003a). Binocular coordination in microsaccades. In The Mind’s Eye (pp. 103–117). Elsevier. https://doi.org/10.1016/b978-044451020-4/50007-4
  
  Engbert, R., & Kliegl, R. (2003b). Microsaccades uncover the orientation of covert attention. Vision Research, 43(9), 1035–1045. https://doi.org/10.1016/s0042-6989(03)00084-1
  
  Fang, Y., Gill, C., Poletti, M., & Rucci, M. (2018). Monocular microsaccades: do they really occur? Journal of Vision, 18(3), 18. https://doi.org/10.1167/18.3.18
  
  Hauperich, A.-K., Young, L. K., & Smithson, H. E. (2020). What makes a microsaccade? a review of 70 years research prompts a new detection method. Journal of Eye Movement Research, 12(6). https://doi.org/10.16910/jemr.12.6.13
  
  Kashihara, K. (2020). Microsaccadic modulation evoked by emotional events. Journal of Physiological Anthropology, 39(1). https://doi.org/10.1186/s40101-020-00238-6
  
  Kashihara, K., Okanoya, K., & Kawai, N. (2013). Emotional attention modulates microsaccadic rate and direction. Psychological Research, 78(2), 166–179. https://doi.org/10.1007/s00426-013-0490-z
  
  Martinez-Conde, S., Macknik, S. L., Troncoso, X. G., & Dyar, T. A. (2006). Microsaccades counteract visual fading during fixation. Neuron, 49(2), 297–305. https://doi.org/10.1016/j.neuron.2005.11.033
  
  Martinez-Conde, S., Macknik, S. L., Troncoso, X. G., & Hubel, D. H. (2009). Microsaccades: a neurophysiological analysis. Trends in Neurosciences, 32(9), 463–475. https://doi.org/10.1016/j.tins.2009.05.006
  
  Martinez-Conde, S., Otero-Millan, J., & Macknik, S. L. (2013). The impact of microsaccades on vision: towards a unified theory of saccadic function. Nature Reviews Neuroscience, 14(2), 83–96. https://doi.org/10.1038/nrn3405
  
  Meyberg, S., Sinn, P., Engbert, R., & Sommer, W. (2017). Revising the link between microsaccades and the spatial cueing of voluntary attention. Vision Research, 133, 47–60. https://doi.org/10.1016/j.visres.2017.01.001
  
  Nyström, M., Andersson, R., Niehorster, D. C., & Hooge, I. (2017). Searching for monocular microsaccades – a red herring of modern eye trackers? Vision Research, 140, 44–54. https://doi.org/10.1016/j.visres.2017.07.012
  
  Otero-Millan, J., Castro, J. L. A., Macknik, S. L., & Martinez-Conde, S. (2014). Unsupervised clustering method to detect microsaccades. Journal of Vision, 14(2), 18–18. https://doi.org/10.1167/14.2.18
  
  (1.2) Stimuli:
  
  It appears that the speed of the emotional biological motion stimuli mimics the natural pace of the emotional walker. What is the average velocity of the biological motion stimuli for each condition?
  
  Thanks for pointing out this issue. The neutral and emotional (sad or happy) BM stimuli are equal in walking speed (one step per second, 1 Hz). We have also computed their physical velocity by calculating the Euclidean distance in pixel space of each key point between adjacent frames (Poyo Solanas et al., 2020). The velocity was 5.76 pixels/frame for the happy BM, 4.14 pixels/frame for the neutral BM, and 3.21 pixels/frame for the sad BM. This difference in velocity profile was considered an important signature for conveying emotional information, as the happy walker was characterized by a larger step pace and longer arm swing and the sad walker would instead exhibit a slouching gait with short slow strides and smaller arm movement (Barliya et al., 2012; Chouchourelou et al., 2006; Halovic & Kroos, 2018; Roether et al., 2009). More importantly, our current results could not be explained by the differences in velocities. This is because the inverted emotional BM with identical velocity characteristics failed to induce any modulations on pupil responses. Furthermore, the local sad and happy BM differed the most in velocity feature, while they induced similar modulations on pupil sizes. We have added these points in the revised text (see lines 254-257, 484-491).
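
The velocity measure described above (Euclidean distance of each key point between adjacent frames) can be illustrated with a minimal sketch. The response does not say how distances were aggregated across key points and frames, so averaging over both is an assumption here, and the function name `mean_speed` and the toy coordinates are hypothetical.

```python
import numpy as np

def mean_speed(keypoints):
    """Mean key-point displacement in pixels per frame.

    keypoints: array of shape (n_frames, n_points, 2) holding x, y pixel coordinates.
    """
    kp = np.asarray(keypoints, dtype=float)
    disp = np.diff(kp, axis=0)                 # frame-to-frame displacement vectors
    dist = np.linalg.norm(disp, axis=-1)       # Euclidean distance per point and frame pair
    return dist.mean()                         # assumed aggregation: mean over points and frames

# Toy walker with two key points over three frames:
# point 0 moves by (3, 4) pixels per frame, point 1 stays still.
toy = [
    [[0, 0], [10, 10]],
    [[3, 4], [10, 10]],
    [[6, 8], [10, 10]],
]
print(mean_speed(toy))
```

Applying the same function to the upright and inverted versions of a stimulus gives identical values, since inversion only flips the coordinates; this is the property the argument about the inverted BM relies on.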
  
  References:
  
  Barliya, A., Omlor, L., Giese, M. A., Berthoz, A., & Flash, T. (2012). Expression of emotion in the kinematics of locomotion. Experimental Brain Research, 225(2), 159–176. https://doi.org/10.1007/s00221-012-3357-4
  
  Chouchourelou, A., Matsuka, T., Harber, K., & Shiffrar, M. (2006). The visual analysis of emotional actions. Social Neuroscience, 1(1), 63–74. https://doi.org/10.1080/17470910600630599
  
  Halovic, S., & Kroos, C. (2018). Not all is noticed: kinematic cues of emotion-specific gait. Human Movement Science, 57, 478–488. https://doi.org/10.1016/j.humov.2017.11.008
  
  Poyo Solanas, M., Vaessen, M. J., & de Gelder, B. (2020). The role of computational and subjective features in emotional body expressions. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-63125-1
  
  Roether, C. L., Omlor, L., Christensen, A., & Giese, M. A. (2009). Critical features for the perception of emotion from gait. Journal of Vision, 9(6), 15–15. https://doi.org/10.1167/9.6.15
  
  When the authors used inverted biological motion stimuli, they didn't observe any modulation in pupil size. Could there be a difference in microsaccades when comparing inverted emotional biological motion stimuli?
  
  Thanks for this consideration. Both microsaccades and pupil size can provide valuable insights into the underlying neural dynamics of attention and cognitive control (Baumeler et al., 2020; Engbert & Kliegl, 2003; Meyberg et al., 2017). Notably, previous studies have shown that the microsaccades and pupil sizes could be similar and highly correlated in reflecting various cognitive processes, such as multisensory integration, inhibitory control, and cognitive load (Krejtz et al., 2018; Wang et al., 2017; Wang & Munoz, 2021). Moreover, the generation of both microsaccades and pupil responses would involve shared neural circuits, including the midbrain structure superior colliculus (SC) and the noradrenergic system (Hafed et al., 2009; Hafed & Krauzlis, 2012; Wang et al., 2012). However, the pupil size could be more sensitive than microsaccade rates in contexts such as affective priming (Krejtz et al., 2020) and decision formation (Strauch et al., 2018). Moreover, numerous previous studies have shown that inversion would significantly disrupt the perception of emotions from BM (Atkinson et al., 2007; Dittrich et al., 1996; Spencer et al., 2016; Yuan et al., 2022, 2023). Overall, it is unlikely for the microsaccade rates to show significant differences when comparing inverted emotional biological motion stimuli. Besides, we have attempted to analyze the microsaccade rate in the inverted BM situation, and our results showed no significant differences (see also Point 1.1, Author response image 3). Still, future studies should combine the microsaccade index and pupil size to provide a thorough understanding of BM emotion processing. We have discussed this issue in the revised text (see lines 339-344).
  
  References:
  
  Atkinson, A. P., Tunstall, M. L., & Dittrich, W. H. (2007). Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition, 104(1), 59–72. https://doi.org/10.1016/j.cognition.2006.05.005
  
  Baumeler, D., Schönhammer, J. G., & Born, S. (2020). Microsaccade dynamics in the attentional repulsion effect. Vision Research, 170, 46–52. https://doi.org/10.1016/j.visres.2020.03.009
  
  Dittrich, W., Troscianko, T., Lea, S., & Morgan, D. (1996). Perception of emotion from dynamic point-light displays represented in dance. Perception, 25(6), 727–738. https://doi.org/10.1068/p250727
  
  Engbert, R., & Kliegl, R. (2003). Microsaccades uncover the orientation of covert attention. Vision Research, 43(9), 1035–1045. https://doi.org/10.1016/s0042-6989(03)00084-1
  
  Hafed, Z. M., Goffart, L., & Krauzlis, R. J. (2009). A neural mechanism for microsaccade generation in the primate superior colliculus. Science, 323(5916), 940–943. https://doi.org/10.1126/science.1166112
  
  Hafed, Z. M., & Krauzlis, R. J. (2012). Similarity of superior colliculus involvement in microsaccade and saccade generation. Journal of neurophysiology, 107(7), 1904-1916.
  
  Krejtz, K., Duchowski, A. T., Niedzielska, A., Biele, C., & Krejtz, I. (2018). Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. Plos One, 13(9), e0203629. https://doi.org/10.1371/journal.pone.0203629
  
  Krejtz, K., Żurawska, J., Duchowski, A., & Wichary, S. (2020). Pupillary and microsaccadic responses to cognitive effort and emotional arousal during complex decision making. Journal of Eye Movement Research, 13(5). https://doi.org/10.16910/jemr.13.5.2
  
  Meyberg, S., Sinn, P., Engbert, R., & Sommer, W. (2017). Revising the link between microsaccades and the spatial cueing of voluntary attention. Vision Research, 133, 47–60. https://doi.org/10.1016/j.visres.2017.01.001
  
  Spencer, J. M. Y., Sekuler, A. B., Bennett, P. J., Giese, M. A., & Pilz, K. S. (2016). Effects of aging on identifying emotions conveyed by point-light walkers. Psychology and Aging, 31(1), 126–138. https://doi.org/10.1037/a0040009
  
  Strauch, C., Greiter, L., & Huckauf, A. (2018). Pupil dilation but not microsaccade rate robustly reveals decision formation. Scientific Reports, 8(1). https://doi.org/10.1038/s41598-018-31551-x
  
  Wang, C.-A., Blohm, G., Huang, J., Boehnke, S. E., & Munoz, D. P. (2017). Multisensory integration in orienting behavior: pupil size, microsaccades, and saccades. Biological Psychology, 129, 36–44. https://doi.org/10.1016/j.biopsycho.2017.07.024
  
  Wang, C.-A., Boehnke, S. E., White, B. J., & Munoz, D. P. (2012). Microstimulation of the monkey superior colliculus induces pupil dilation without evoking saccades. Journal of Neuroscience, 32(11), 3629–3636. https://doi.org/10.1523/jneurosci.5512-11.2012
  
  Wang, C.-A., & Munoz, D. P. (2021). Differentiating global luminance, arousal and cognitive signals on pupil size and microsaccades. European Journal of Neuroscience, 54(10), 7560–7574. https://doi.org/10.1111/ejn.15508
  
  Yuan, T., Ji, H., Wang, L., & Jiang, Y. (2022). Happy is stronger than sad: emotional information modulates social attention. Emotion. https://doi.org/10.1037/emo0001145
  
  Yuan, T., Wang, L., & Jiang, Y. (2023). Cross-channel adaptation reveals shared emotion representation from face and biological motion. In Emotion (p. In Press).
  
  (2) Statistical analyses
  
  (2.1) Multiple comparisons:
  
  There are many post-hoc comparisons throughout the manuscript. The authors should consider correction for multiple comparisons. Take Experiment 1, for example: the happy over neutral BM effect and the sad over neutral BM effect are no longer significant after Bonferroni correction, which is worth noting.
  
  Thanks for this suggestion. In our original analysis, we applied the Holm post-hoc correction for multiple comparisons. The Holm correction is a step-down method that is more powerful, and therefore less conservative, than the Bonferroni correction. We have now conducted the stricter Bonferroni post-hoc correction. In Experiment 1, the happy over neutral and happy over sad BM effects are still significant after the Bonferroni post-hoc correction (happy vs. neutral: p = .036; happy vs. sad: p = .009), and the sad over neutral comparison remains marginally significant after the Bonferroni post-hoc correction (p = .071). Importantly, the test-retest replication experiment also yielded significant results for the comparisons between happy and neutral (First Test: p = .022, Holm-corrected, p = .048, Bonferroni-corrected; Second Test: p = .005, Holm-corrected, p = .008, Bonferroni-corrected), sad and neutral (First Test: p = .022, Holm-corrected, p = .033, Bonferroni-corrected; Second Test: p = .005, Holm-corrected, p = .012, Bonferroni-corrected, Author response image 1B), and happy and sad BM (First test: p < .001, Holm-corrected, p < .001, Bonferroni-corrected; Second test: p < .001, Holm-corrected, p < .001, Bonferroni-corrected). These results provided support for the replicability and consistency of the reported significant contrasts. See also Point 2.3.
  
  In Experiment 4, the significance levels of all comparisons remained the same after Bonferroni post-hoc correction (happy vs. neutral: p = .011; sad vs. neutral: p = .007; happy vs. sad: p = 1.000). We have now added these results in the main text (See lines 119, 122, 124, 143, 145, 148, 150, 153, 155, 248, 251, 254).
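
To illustrate how the two corrections discussed above differ, the following is a minimal sketch and not the analysis code used in the study. The function names and the raw p-values are hypothetical and chosen only for illustration. Bonferroni multiplies every raw p-value by the number of tests, whereas Holm multiplies the smallest by m, the next by m - 1, and so on, while keeping the adjusted values non-decreasing.

```python
def bonferroni(pvals):
    """Multiply every raw p-value by the number of tests m (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down Holm adjustment: the p-value with 0-based rank i (ascending)
    is multiplied by (m - i); adjusted values are kept non-decreasing."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise contrasts (illustration only).
raw_p = [0.012, 0.030, 0.200]
print("Bonferroni:", bonferroni(raw_p))
print("Holm:      ", holm(raw_p))
```

In this sketch the smallest p-value receives the same adjustment under both methods, while every other Holm-adjusted value is no larger than its Bonferroni counterpart, which is why Holm is the more powerful of the two.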
  
  (2.2) The authors present the correlation between happy over sad dilation effect and the autistic traits in Experiment 1, but do not report such correlations in Experiments 2-4. Did the authors collect the Autistic Quotient measure in Experiments 2-4? It would be informative if the authors could demonstrate the reproducibility (or lack thereof) of this happy-sad index in Experiments 2-4.
  
  We apologize for not making it clear. We have collected the AQ scores in Experiments 2-4. However, it should be pointed out that the happy over sad pupil dilation effect was only observed in Experiment 1. Moreover, we again identified this happy over sad pupil dilation effect in the replication experiment (Experiment 1b), as well as its correlation with AQ. By contrast, no significant correlations between AQ and the happy-sad pupil index were found in Experiments 2-4; see Author response image 4 for more details. We have reported these correlations in the main text (see lines 157-173, 190-194, 212-216, 257-262).
  
  Author response image 4.
  
  Correlations between the happy over sad pupil dilation effect and AQ scores. (A) The happy over sad pupil dilation effect correlated negatively with individual autistic scores. (B-C) Such correlation was similarly observed in the test and retest of the replication experiment. (D-F) No such correlations were found for the inverted, nonbiological, and local BM stimuli.
  
  (2.3) The observed correlation between happy over sad dilation effect and the autistic traits in Experiment 1 seems rather weak. It could be attributed to the poor reliability of the Autistic Quotient measure or the author-constructed happy-sad index. Did the authors examine the test-retest reliability of their tasks or the Autistic Quotient measure?
  
  Thanks for this suggestion. We have now conducted a test-retest replication study to further confirm the observed significant correlations. Specifically, we recruited a new group of 24 participants (16 females, 8 males) to perform the same procedure as in Experiment 1, and they were asked to return to the lab for a retest after at least seven days. We’ve replicated the significant main effect of emotional conditions in both the first test (F(2, 46) = 12.0, p < .001, ηp² = 0.34) and the second test (F(2, 46) = 14.8, p < .001, ηp² = 0.39). We also replicated the happy minus neutral pupil dilation effect (First Test: t(23) = 2.60, p = .022, Cohen’s d = 0.53, 95% CI for the mean difference = [0.02, 0.14], Holm-corrected, p = .048 after Bonferroni correction; Second Test: t(23) = 3.36, p = .005, Cohen’s d = 0.68, 95% CI for the mean difference = [0.06, 0.24], Holm-corrected, p = .008 after Bonferroni correction), and the sad minus neutral pupil constriction effect (First Test: t(23) = -2.77, p = .022, Cohen’s d = 0.57, 95% CI for the mean difference = [-0.19, -0.03], Holm-corrected, p = .033 after Bonferroni correction; Second Test: t(23) = -3.19, p = .005, Cohen’s d = 0.65, 95% CI for the mean difference = [-0.24, -0.05], Holm-corrected, p = .012 after Bonferroni correction). Additionally, the happy BM still induced a significantly larger pupil response than the sad BM (first test: t(23) = 4.23, p < .001, Cohen’s d = 0.86, 95% CI for the mean difference = [0.10, 0.28], Holm-corrected, p < .001 after Bonferroni correction; second test: t(23) = 4.26, p < .001, Cohen’s d = 0.87, 95% CI for the mean difference = [0.15, 0.44], Holm-corrected, p < .001 after Bonferroni correction).
  
  Notably, we’ve successfully replicated the negative correlation between the happy over sad dilation effect and individual autistic traits (r(23) = -0.46, p = .023, 95% CI for the mean difference = [-0.73, -0.07]). Such a correlation was similarly found and was even stronger in the retest (r(23) = -0.61, p = .002, 95% CI for the mean difference = [-0.81, -0.27]). A test-retest reliability analysis was conducted on the happy over sad pupil dilation effect and the AQ score. The results showed robust correlations (r(happy-sad pupil size)= 0.56; r(AQ)= 0.90) and strong test-retest reliabilities (α(happy-sad pupil size)= 0.60; α(AQ)= 0.82). We have added these results to the main text (see lines 135-173). See also Response to Reviewer #2 Response 1 for more details.
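
For readers unfamiliar with the two reliability indices mentioned above, the following is a minimal sketch of how a test-retest correlation and Cronbach's α can be computed from two sessions. It is an illustration under stated assumptions, not the authors' analysis: the response does not specify how α was computed, so treating the two sessions as the two "items" of the α formula is an assumption, and the function names and scores are hypothetical.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(sessions):
    """Cronbach's alpha, treating each session as one 'item' (assumption).

    sessions: list of k lists, each holding one score per participant,
    with participants in the same order in every list.
    """
    k = len(sessions)
    sum_item_vars = sum(variance(s) for s in sessions)
    totals = [sum(scores) for scores in zip(*sessions)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))


# Hypothetical scores for five participants (illustration only).
first_test = [1, 2, 3, 4, 5]
second_test = [2, 1, 4, 3, 5]
print("test-retest r:", pearson_r(first_test, second_test))
print("Cronbach's alpha:", cronbach_alpha([first_test, second_test]))
```

With only two sessions of equal variance, α computed this way reduces to 2r / (1 + r), so the two indices carry closely related information.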
  
  (2.4) Relatedly, the happy over sad dilation effect is essentially a subtraction index. Without separately presenting the pupil size correlation with happy and sad BM in supplemental figures, it becomes challenging to understand what's primarily driving the observed correlation.
  
  Thanks for pointing this out. We have now presented the separate correlations between AQ and the pupil response towards happy and sad BM in Experiment 1 (see Author response image 5A), and the test-retest replication experiment of Experiment 1 (see Author response image 5B-C). No significant correlations were found. This is potentially because the raw pupil response is a mixed result of BM perception and emotion perception, while the variations in pupil sizes across emotional conditions could more faithfully reflect individual sensitivities to emotions in BM (Burley et al., 2017; Pomè et al., 2020; Turi et al., 2018).
  
  Author response image 5.
  
  No significant correlations between AQ and pupil response towards happy and sad intact BM were found in Experiment 1a and the test-retest replication experiment (Experiment 1b).
  
  To probe what's primarily driving the observed correlation between happy-sad pupil size and AQ, we instead used the neutral as the baseline and separately correlated AQ with the happy-neutral and the sad-neutral pupil modulation effects. No significant correlation was found in Experiment 1a (Author response image 6A-B) and the first test of the replication experiment (Experiment 1b) (Author response image 6C-D). Importantly, in the second test of the replication experiment, we found a significant negative correlation between AQ and the happy-neutral pupil size (r(23) = -0.44, p = .032, 95% CI for the mean difference = [-0.72, -0.04], Author response image 6E), and a significant positive correlation between AQ and the sad-neutral pupil size (r(23) = 0.50, p = .014, 95% CI for the mean difference = [0.12, 0.75], Author response image 6F). This suggested that the overall correlation between AQ and the happy over sad dilation effect was driven by diminished pupil modulations towards both the happy and sad BM for high AQ individuals, demonstrating a general deficiency in BM emotion perception (happy or sad) among individuals with high autistic tendencies. It further revealed the potential of adopting a test-retest pupil examination to more precisely detect individual autistic tendencies. We have reported these results in the main text (see lines 166-173).
  
  Author response image 6.
  
  Correlation results for pupil modulations and AQ scores. (A-B) In Experiment 1a, no significant correlation was observed between AQ and the happy pupil modulation effect, or between AQ and the sad pupil modulation effect. (C-D) Similarly, no significant correlations were found in the first test of the replication experiment (Experiment 1b). (E-F) Importantly, in the second test of Experiment 1b, the happy vs. neutral pupil dilation effect was negatively correlated with AQ, and the sad vs. neutral pupil constriction effect was positively correlated with AQ.
  
  References:
  
  Burley, D. T., Gray, N. S., & Snowden, R. J. (2017). As Far as the Eye Can See: Relationship between Psychopathic Traits and Pupil Response to Affective Stimuli. PLOS ONE, 12(1), e0167436. https://doi.org/10.1371/journal.pone.0167436
  
  Pomè, A., Binda, P., Cicchini, G. M., & Burr, D. C. (2020). Pupillometry correlates of visual priming, and their dependency on autistic traits. Journal of vision, 20(3), 3-3.
  
  Turi, M., Burr, D. C., & Binda, P. (2018). Pupillometry reveals perceptual differences that are tightly linked to autistic traits in typical adults. eLife, 7. https://doi.org/10.7554/elife.32399
  
  (2.5) For the sake of transparency, it is important to report all findings, not just the positive results, throughout the paper.
  
  Thanks for this suggestion. We have now reported all the correlation results between AQ and pupil modulation effects (happy-sad, happy-neutral, sad-neutral) in the main text (see lines 130-131, 157-162, 166-170, 190-194, 212-216, 257-262). Given that no significant correlations were observed between AQ and the raw pupil responses across the four experiments, we reported their correlations with AQ in the supplementary material. We have stated this point in the main text (see lines 132-134).
  
  (3) Structure
  
  (3.1) The Results section immediately proceeds to the one-way repeated measures ANOVA. This section could be more reader-friendly by including a brief overview of the task procedures and variables, e.g., shifting Fig. 3 to this section.
  
  Thanks for this advice. We have now added a brief overview of the task procedures and variables and we have also shifted the figure position (see lines 101-103).
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) I suggest that the authors first explain the task (i.e., Fig. 3) at the beginning of the results. It also seems more appropriate to show the time course figures (Fig. 2) before the bar plots (Fig. 1). If I understand correctly, the bar plots reflect the averaged data from the time course plots. Also, please clearly state the time window used to average the data. The results of the correlation analysis can be displayed in the last step.
  
  Thanks for this suggestion. We have now added a concise explanation of the task at the beginning of the results (see lines 101-103). We have also adjusted the figure positions and adjusted the order of our results according to the reviewer’s suggestion. The time window we used to average the data was from the onset of the stimuli until the end of the stimuli presentation. We have now clearly stated these issues in the revised text (see lines 111-112).
  
  (2) According to the above, I think a more reasonable arrangement should be Fig. 3, 2, and 1.
  
  Thanks for this suggestion. We have adjusted the figure positions accordingly.
  
  (3) Please include each subject's data points in the bar plots in Fig. 1.
  
  We have now presented each subject’s individual data point in the bar plot.
  
  (4) Lines 158-160 and 199-202 report interaction effects of the two-way ANOVA. This is good, but the direction of interaction effect should also be reported.
  
  We thank the reviewer for this suggestion. We have now reported the direction of the interaction effect. The significant interaction across Experiment 1 and Experiment 2 was mainly due to the reduced emotional modulation for inverted BM. The significant interaction across Experiment 1 and Experiment 3 was similarly caused by the lack of emotional modulation for nonbiological stimuli. The significant interaction across Experiment 1 and Experiment 4 could be primarily attributed to the absence of a pupil modulation effect between happy and sad local BM. We have specified these points in the revised text (see lines 198-199, 219-220, 267-269).
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) Number of experiments:
  
  As stated in the Methods section, this study seems to consist of five experiments (120/24=5) according to the description below. However, the current manuscript only reports findings from four of these experiments. Can the authors clarify on this matter?
  
  "A total of 120 participants (44 males, 76 females) ranging from 18 to 29 years old (M ± SD = 23.1 ± 2.5) were recruited, with 24 in each experiment."
  
  We apologize for not making it clear. The fifth group took part in a purely behavioral, explicit emotion classification experiment (N=24) that served as a prior test to confirm that the local BM stimuli conveyed recognizable emotional information. We have now stated this more carefully in the revised text (see lines 456-458).
  
  (2) Emotion processing mechanism of BM
  
  "Mechanism" is a very strong word, suggesting a causal relationship. In the setting of a passive viewing task that lacks any behavioral report, it is possible that the observed changes in pupil size could be epiphenomenal, rather than serving as the underlying mechanism.
  
  Thanks for this suggestion. We have now either changed “mechanism” into “phenomenon” or deleted it. We have also carefully discussed how future studies could incorporate various behavioral, physiological and neural indices to yield more robust causal evidence on the potential mechanism underlying the observed multi-level BM emotion processing phenomenon.
  
  (3) Data sharing
  
  The authors could improve their efforts in promoting data transparency to ensure a comprehensive view of the results. This implies sharing deidentified raw data instead of summary data in an Excel spreadsheet.
  
  Thanks for this suggestion. We have now uploaded the deidentified raw data (https://doi.org/10.57760/sciencedb.psych.00125).
  
URL

biorxiv.org/content/10.1101/2023.07.18.549471v2
www.biorxiv.org www.biorxiv.org

New submission 06/10/2023, 09:44:25

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This valuable work provides new insights into history-dependent biases in human perceptual decision-making. It provides compelling behavioral and MEG evidence that humans adapt their history-dependent biases to the correlation structure of uncertain sensory environments. Further neural data analyses would strengthen some of the findings, and the studied bias would be more accurately framed as a stimulus- or outcome-history bias than a choice-history bias because tested subjects are biased not by their previous choice, but by the previous feedback (indicating the category of the previous stimulus).
  
  Thank you for your constructive evaluation of our manuscript. We have followed your suggestion to frame the studied bias as ‘stimulus history bias’. We now use this term whenever referring to our current results. Please note that we instead use the generic term ‘history bias’ when referring to the history biases studied in the previous literature on this topic in general. This is because these biases were dependent on previous choice(s), previous stimuli, or previous outcomes, or combinations of some (or all) of these factors. We have also added several of your suggested neural data analyses so as to strengthen the support for our conclusions, and we have elaborated on the Introduction so as to clarify the gaps in the literature that our study aims to fill. Our revisions are detailed in our replies below. We also took the liberty to reply to some points in the Public Review, which we felt called for clarification of the main aims (and main contribution) of our study.
  
  Reviewer #1 (Public Review):
  
  This paper aims to study the effects of choice history on action-selective beta band signals in human MEG data during a sensory evidence accumulation task. It does so by placing participants in three different stochastic environments, where the outcome of each trial is either random, likely to repeat, or likely to alternate across trials. The authors provide good behavioural evidence that subjects have learnt these statistics (even though they are not explicitly told about them) and that they influence their decision-making, especially on the most difficult trials (low motion coherence). They then show that the primary effect of choice history on lateralised beta-band activity, which is well-established to be linked to evidence accumulation processes in decision-making, is on the slope of evidence accumulation rather than on the baseline level of lateralised beta.
  
  The strengths of the paper are that it is: (i) very well analysed, with compelling evidence in support of its primary conclusions; (ii) a well-designed study, allowing the authors to investigate the effects of choice history in different stochastic environments.
  
  Thank you for pointing out these strengths of our study.
  
  There are no major weaknesses to the study. On the other hand, investigating the effects of choice/outcome history on evidence integration is a fairly well-established problem in the field. As such, I think that this provides a valuable contribution to the field, rather than being a landmark study that will transform our understanding of the problem.
  
  Your evaluation of the significance of our work made us realize that we may have failed to bring across the main gaps in the literature that our current study aimed to fill. We have now unpacked this in our revised Introduction.
  
  Indeed, many previous studies have quantified history-dependent biases in perceptual choice. However, the vast majority of those studies used tasks without any correlation structure; only a handful of studies have quantified history biases in tasks entailing structured environments, as we have done here (Abrahamyan et al., 2016; Kim et al., 2017; Braun et al., 2018; Hermoso-Mendizabal et al., 2020). The focus on correlated environments matters from an ecological perspective, because (i) natural environments are commonly structured rather than random (a likely reason for history biases being so prevalent in the first place), and (ii) history biases that change flexibly with the environmental structure are a hallmark of adaptive behavior. Critically, the few previous studies that have used correlated environments and revealed flexible/adaptive history biases were purely behavioral. Ours is the first to characterize the neural correlates of adaptive history biases.
  
  Furthermore, although several previous studies have identified neural correlates of history biases in standard perceptual choice tasks in unstructured environments (see (Talluri et al., 2021) for a brief overview), most have focused on static representations of the bias in ongoing activity preceding the new decision; only a single monkey physiology study has tested for both a static bias in the pre-stimulus activity and a dynamic bias building up during evidence accumulation (Mochol et al., 2021). Ours is the first demonstration of a dynamic bias during evidence accumulation in the human brain.
  
  The authors have achieved their primary aims and I think that the results support their main conclusions. One outstanding question in the analysis is the extent to which the source-reconstructed patches in Figure 2 are truly independent of one another (as often there is 'leakage' from one source location into another, and many of the different ROIs have quite similar overall patterns of synchronisation/desynchronisation.).
  
  We do not assume (and nowhere state) that the different ROIs are “truly independent” of one another. In fact, patterns of task-related power modulations of neural activity would be expected to be correlated between many visual and action-related cortical areas even without leakage (due to neural signal correlations). So, one should not assume independence even for intracortically recorded local field potential data, fMRI data, or other data with minimal spatial leakage effects. That said, we agree that filter leakage will add a (trivial) component to the similarity of power modulations across ROIs, which can and should be quantified with the analysis you propose.
  
  A possible way to investigate this further would be to explore the correlation structure of the LCMV beamformer weights for these different patches, to ask how similar/dissimilar the spatial filters are for the different reconstructed patches.
  
  Thank you for suggesting this analysis, which provides a very useful context for interpreting the pattern of results shown in our Figure 2. We have now computed (Pearson) correlation coefficients of the LCMV beamformer weights across the regions of interest. The results are shown in the new Figure 2 – figure supplement 1. This analysis provided evidence for minor leakage between the source estimates for neighboring cortical regions (filter correlations ≤ 0.22 on average across subjects) and negligible leakage for more distant regions. We now clearly state this when referring to Figure 2.
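
The logic of this filter-similarity check can be sketched as follows. This is a minimal illustration, not the pipeline used for Figure 2 – figure supplement 1: it assumes each region of interest has already been reduced to a single weight vector over sensors, and the function names, ROI labels and weight values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long weight vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_filter_correlations(filters):
    """Correlate the spatial filters of every pair of ROIs.

    filters: dict mapping an ROI label to its list of sensor weights
    (assumed here to be one vector per ROI).
    Returns a dict mapping (roi_a, roi_b) to the Pearson correlation.
    """
    return {
        (a, b): pearson_r(filters[a], filters[b])
        for a, b in combinations(filters, 2)
    }


# Hypothetical weights over four sensors for three ROIs (illustration only).
example_filters = {
    "ROI_1": [0.9, 0.4, -0.2, -0.6],
    "ROI_2": [0.7, 0.5, -0.1, -0.8],
    "ROI_3": [-0.3, 0.8, 0.6, -0.5],
}
for pair, r in pairwise_filter_correlations(example_filters).items():
    print(pair, round(r, 2))
```

High correlations between the filters of two regions would indicate that their source estimates draw on similar sensor patterns and are therefore prone to leakage, whereas values near zero indicate largely separable estimates.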
  
  That said, we would also like to clarify our reasoning behind Figure 2. Our common approach to these source-reconstructed MEG data is to focus on the differences, rather than the similarities between ROIs, because the differences cannot be accounted for by leakage. Our analyses show clearly distinct, and physiologically plausible functional profiles across ROIs (motion coherence encoding in visual regions, action choice coding in motor regions), in line with other work using our general approach (Wilming et al., 2020; Murphy et al., 2021; Urai and Donner, 2022).
  
  Most importantly, our current analyses focus on the impact of history bias on the build-up of action-selective activity in downstream, action-related areas; and we chose to focus on M1 only in order to avoid hard-to-interpret comparisons between neighboring action-related regions. Figure 2 is intended as a demonstration of the data quality (showing sensible signatures for all ROIs) and as a context for the interpretation of our main neural results from M1 shown in the subsequent figures. So, all our main conclusions are unaffected by leakage between ROIs.
  
  We have now clarified these points in the paper.
  
  Reviewer #2 (Public Review):
  
  In this work, the authors use computational modeling and human neurophysiology (MEG) to uncover behavioral and neural signatures of choice history biases during sequential perceptual decision-making. In line with previous work, they see neural signatures reflecting choice planning during perceptual evidence accumulation in motor-related regions, and further show that the rate of accumulation responds to structured, predictable environments suggesting that statistical learning of environment structure in decision-making can adaptively bias the rate of perceptual evidence accumulation via neural signatures of action planning. The data and evidence show subtle but clear effects, and are consistent with a large body of work on decision-making and action planning.
  
  Overall, the authors achieved what they set out to do in this nice study, and the results, while somewhat subtle in places, support the main conclusions. This work will have impact within the fields of decision-making and motor planning, linking statistical learning of structured sequential effects in sense data to evidence accumulation and action planning.
  
  Strengths:
  
  The study is elegantly designed, and the methods are clear and generally state-of-the-art
  
  The background leading up to the study is well described, and the study itself conjoins two bodies of work - the dynamics of action-planning processes during perceptual evidence accumulation, and the statistical learning of sequential structure in incoming sense data
  
  Careful analyses effectively deal with potential confounds (e.g., baseline beta biases)
  
  Thank you for pointing out these strengths of our study.
  
  Weaknesses:
  
  Much of the study is primarily a verification of what was expected based on previous behavioral work, with the main difference (if I'm not mistaken) being that subjects learn actual latent structure rather than expressing sequential biases in uniform random environments.
  
  As we have stated in our reply to the overall assessment above, we realize that we may have failed to clearly communicate the novelty of our current results, and we have revised our Introduction accordingly. It is true that most previous studies of history biases in perceptual choice have used standard tasks without across-trial correlation structure. Only a handful of studies have quantified history biases in tasks entailing structured environments that varied from one condition to the next (Abrahamyan et al., 2016; Kim et al., 2017; Braun et al., 2018; Hermoso-Mendizabal et al., 2020), and showed that history biases change flexibly with the environmental structure. Our current work adds to this emerging picture, using a specific task setting analogous to one of these previous studies done in rats (Hermoso-Mendizabal et al., 2020).
  
  Critically, all the previous studies that have revealed flexible/adaptive history biases in correlated environments were purely behavioral. Ours is the first to characterize the neural correlates of adaptive history biases. And it is also the very first demonstration of a dynamic history-dependent bias (i.e., one that gradually builds up during evidence accumulation) in the human brain.
  
  Whether this difference - between learning true structure or superstitiously applying it when it's not there - is significant at the behavioral or neural level is unclear. Did the authors have a hypothesis about this distinction? If the distinction is not relevant, is the main contribution here the neural effect?
  
  We are not quite sure what exactly you mean by “is significant”, so we will reply to two possible interpretations of this statement.
  
  The first is that you may be asking for evidence for any difference between the estimated history biases in the structured (i.e., Repetitive, Alternating) vs. the unstructured (i.e., Neutral) environments used in our experiment. We do, in fact, provide quantitative comparisons between the history biases in the structured and Neutral environments at the behavioral level. Figure 1D and Figure 1 – figure supplement 2A and accompanying text show a robust and statistically significant difference in history biases. Specifically, the previous stimulus weights differ between each of the biased environments and the Neutral environment, and the weights shifted in the expected, opposite directions for the two structured environments, indicating a tendency to repeat the previous stimulus category in Repetitive and to alternate it in Alternating (Figure 1D). Going further, we also demonstrate that the adjustment of the history bias is behaviorally relevant in that it improves performance in the two structured environments, but not in the unstructured environment (Figure 1F and Figure 1 – figure supplement 2A and figure supplement 3).
  
  The second is that you refer to the question of whether the history biases are generated via different computations in structured vs. random environments. Indeed, this is a very interesting and important question. We cannot answer this question based on the available results, because we here used a statistical (i.e., descriptive) model. Addressing this question would require developing and fitting a generative model of the history bias and comparing the inferred latent learning processes between environments. This is something we are doing in ongoing work.
  
  The key effects (Figure 4) are among the more statistically on-the-cusp effects in the paper, and the Alternating group in 4C did not reliably go in the expected direction. This is not a huge problem per se, but does make the key result seem less reliable given the clear reliability of the behavioral results
  
  The model-free analyses in Figure 3C and 4B, C from the original version of our manuscript were never intended to demonstrate the “key effects”, but only as supplementary to the results from the model-based analyses in Figures 3C and 4D, E in our current version of the manuscript. The latter show the “key effects” because they are a direct demonstration of the shaping of build-up of action-selective activity by history bias.
  
  To clarify this, we now decided to focus Figures 3 and 4 on the model-based analyses only. This decision was further supported by noticing a confound in our model-independent analyses in new control analyses prompted by Reviewer #3.
  
  Please note that the alternating bias in the Alternating environment is also less strong at the behavioral level compared to the bias in the Repetitive condition (see Figure 1D). A possible explanation is that a sequence of repetitive stimuli produces stronger prior expectations (for repetition) than an equally long sequence of alternating stimuli (Meyniel et al., 2016). This might also induce the bias to repeat the previous stimulus category in the Neutral condition (Figure 1D). Moreover, this intrinsic repetition bias might counteract the bias to alternate the previous stimulus category in Alternating.
  
  The treatment of "awareness" of task structure in the study (via informal interviews in only a subsample of subjects) is wanting
  
  Agreed. We have now removed this statement from Discussion.
  
  Reviewer #3 (Public Review):
  
  This study examines how the correlation structure of a perceptual decision making task influences history biases in responding. By manipulating whether stimuli were more likely to be repetitive or alternating, they found evidence from both behavior and a neural signal of decision formation that history biases are flexibly adapted to the environment. On the whole, these findings are supported across an impressive range of detailed behavioral and neural analyses. The methods and data from this study will likely be of interest to cognitive neuroscience and psychology researchers. The results provide new insights into the mechanisms of perceptual decision making.
  
  The behavioral analyses are thorough and convincing, supported by a large number of experimental trials (~600 in each of 3 environmental contexts) in 38 participants. The psychometric curves provide clear evidence of adaptive history biases. The paper then goes on to model the effect of history biases at the single trial level, using an elegant cross-validation approach to perform model selection and fitting. The results support the idea that, with trial-by-trial accuracy feedback, the participants adjusted their history biases due to the previous stimulus category, depending on the task structure in a way that contributed to performance.
  
  Thank you for these nice words on our work.
  
  The paper then examines MEG signatures of decision formation, to try to identify neural signatures of these adaptive biases. Looking specifically at motor beta lateralization, they found no evidence that starting-level bias due to the previous trial differed depending on the task context. This suggests that the adaptive bias unfolds in the dynamic part of the decision process, rather than reflecting a starting level bias. The paper goes on to look at lateralization relative to the chosen hand as a proxy for a decision variable (DV), whose slope is shown to be influenced by these adaptive biases.
  
  This analysis of the buildup of action-selective motor cortical activity would be easier to interpret if its connection with the DV was more explicitly stated. The motor beta is lateralized relative to the chosen hand, as opposed to the correct response which might often be the case. It is therefore not obvious how the DV behaves in correct and error trials, which are combined together here for many of the analyses.
  
  We have now unpacked the connection of the action-selective motor cortical activity and decision variable in the manuscript, as follows:
  
  “This signal, referred to as ‘motor beta lateralization’ in the following, has been shown to exhibit hallmark signatures of the DV, specifically: (i) selectivity for choice and (ii) ramping slope that depends on evidence strength (Siegel et al., 2011; Murphy et al., 2021; O’Connell and Kelly, 2021).”
  
  Furthermore, we have added a figure of the time course of the motor beta lateralization separately for correct and error trials, locked to both stimulus onset and to motor response (Figure 2 – figure supplement 2). This signal reached statistical significance earlier for correct than error trials, and during the stimulus interval it ramped to a larger (i.e., more negative) amplitude for correct trials (Figure 2 – figure supplement 2, left). But the signal was indistinguishable in amplitude between correct and error trials around the time of the motor response (Figure 2 – figure supplement 2, right). This pattern matches what would be expected for a neural signature of the DV, because errors are more frequently made on weak-evidence trials than correct choices and because even for matched evidence strength, the DV builds up more slowly before error trials in accumulator models (Ratcliff and McKoon, 2008).
  
  --
  
  As you will see, all three reviewers found your work to provide valuable insights into history-dependent biases during perceptual decision-making. During consultation between reviewers, there was agreement that what is referred to as a choice-history bias in the current version of the manuscript should rather be framed as a stimulus- or outcome-history bias (despite the dominant use of the term 'choice-history' bias in the existing literature), and the reviewers pointed toward further analyses of the neural data which they thought would strengthen some of the claims made in the preprint. We hope that these comments will be useful if you wish to revise your preprint.
  
  We are pleased to hear that the reviewers think our work provides valuable insights into history-dependent biases in perceptual decision-making. We thank you for your thoughtful and constructive evaluation of our manuscript.
  
  We have followed your suggestion to frame the studied bias as ‘stimulus history bias’. We now use this term whenever referring to our current results. Please note that we instead use the generic term ‘history bias’ when referring to the history biases studied in the previous literature on this topic in general. This is because these biases were dependent on previous choice(s), previous stimuli, or previous outcomes, or combinations of some (or all) of these factors.
  
  We have also performed several of your suggested neural data analyses so as to strengthen the support for our conclusions.
  
  Reviewer #1 (Recommendations For The Authors):
  
  One suggestion is to explore the correlation structure of the LCMV beam former weights for the regions of interest in the study, for the reasons outlined in my public review.
  
  Again, thank you for suggesting this analysis, which provides a very useful context for interpreting the pattern of results shown in our Figure 2. We have now computed (Pearson) correlation coefficients of the LCMV beamformer weights across the regions of interest. The results are shown in the new Figure 2 – figure supplement 1. This analysis provided evidence for minor leakage between the source estimates for neighboring cortical regions (filter correlations ≤ 0.22 on average across subjects) and negligible leakage for more distant regions. We now clearly state this when referring to Figure 2.
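
  The leakage check described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each region of interest is summarized by a single spatial-filter weight vector over sensors, and it takes the absolute Pearson correlation between filters as the leakage measure. The ROI names and weights are hypothetical toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def filter_correlation_matrix(filters):
    """Absolute pairwise correlations between ROI filters.

    `filters` maps an ROI name to its weight vector over sensors.
    Taking the absolute value is an assumption made for this sketch.
    """
    names = sorted(filters)
    return {
        (a, b): abs(pearson(filters[a], filters[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy weights over five sensors for three ROIs.
toy_filters = {
    "M1": [0.9, 0.4, 0.1, -0.2, -0.5],
    "PMd": [0.8, 0.5, 0.0, -0.1, -0.6],
    "V1": [-0.1, 0.3, -0.4, 0.5, 0.2],
}
leakage = filter_correlation_matrix(toy_filters)
```

  In this toy example the two neighboring motor ROIs share similar filters and therefore correlate more strongly than the motor and visual ROIs, which is the qualitative pattern such an analysis looks for.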
  
  That said, we would also like to clarify our reasoning behind Figure 2. Our common approach to these source-reconstructed MEG data is to focus on the differences, rather than the similarities between ROIs, because the differences cannot be accounted for by leakage. Our analyses show clearly distinct, and physiologically plausible functional profiles across ROIs (motion coherence encoding in visual regions, action choice coding in motor regions), in line with other work using our general approach (Wilming et al., 2020; Murphy et al., 2021; Urai and Donner, 2022).
  
  Most importantly, our current analyses focus on the impact of history bias on the build-up of action-selective activity in downstream, action-related areas; and we chose to focus on M1 only in order to avoid hard-to-interpret comparisons between neighboring action-related regions. Figure 2 is intended as a demonstration of the data quality (showing sensible signatures for all ROIs) and as a context for the interpretation of our main neural results from M1 shown in the subsequent figures. So, all our main conclusions are unaffected by leakage between ROIs.
  
  We have now clarified also these points in the paper.
  
  I also wondered if the authors had considered:
  
  (i) the extent to which the bias changes across time, as the transition probabilities are being learnt across the experiment? given that these are not being explicitly instructed to participants, is any modelling possible of how the transition structure is itself being learnt over time, and whether this makes predictions of either behaviour or neural signals?
  
  We refer to this point in the discussion. How the transition probabilities are learned is a question that can and should be addressed. This requires generative models that capture the learning of the transition structure over time (Yu and Cohen, 2009; Meyniel et al., 2016; Glaze et al., 2018; Hermoso-Mendizabal et al., 2020).
  
  The fact that our current statistical modeling approach successfully captures the bias adjustment between environments implies that the learning must be sufficiently fast. Tracking this process explicitly would be an exciting and important endeavor for the future. We think it is beyond the scope of the present study focusing on the trial-by-trial effect of history bias (however generated) on the build-up of action-selective activity.
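
  To make concrete what "learning the transition structure over time" could mean, here is a deliberately simplified learner. It is not the model of any of the cited papers: it assumes the observer keeps leaky counts of repetitions and alternations and reads out their ratio as the estimated repetition probability. The function name and the `leak` and `prior_count` parameters are hypothetical.

```python
def track_repeat_probability(stimuli, leak=0.9, prior_count=1.0):
    """Leaky count-based estimate of P(repeat) after each transition.

    `stimuli` is a sequence of category labels. `leak` (< 1) discounts
    older transitions; `prior_count` is a symmetric pseudo-count.
    Both are illustrative assumptions, not fitted values.
    """
    repeats = prior_count
    alternations = prior_count
    estimates = []
    for previous, current in zip(stimuli, stimuli[1:]):
        # Forget a fraction of past evidence before adding the new transition.
        repeats *= leak
        alternations *= leak
        if current == previous:
            repeats += 1.0
        else:
            alternations += 1.0
        estimates.append(repeats / (repeats + alternations))
    return estimates


repetitive_run = track_repeat_probability([1] * 20)
alternating_run = track_repeat_probability([1, -1] * 10)
```

  With a leak below one, the estimate settles within a few dozen trials, which is one way learning could be "sufficiently fast" for a static per-environment model to capture the adjusted bias.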
  
  (ii) neural responses at the time of choice outcome - given that so much of the paper is about the update of information in different statistical environments, it seems a shame that no analyses are included of feedback processing, how this differs across the different environments, and how might be linked to behavioural changes at the next trial.
  
  We agree that the neural responses to feedback are a very interesting topic. We currently analyze these in another ongoing project on (outcome) history bias in a foraging task. We will consider re-analyzing the feedback component in the current data set, in this new study as well.
  
  However, this is distinct from the main question that is in the focus of our current paper – which, as elaborated above, is important to answer: whether and how adaptive history biases shape the dynamics of action-selective cortical activity in the human brain. While interesting and important, neural responses to feedback were not part of this question. So, we prefer to keep the focus of our paper on our original question.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor:
  
  -pg. 7: "inconstant"
  
  -some citations (e.g., Barbosa 2020) are missing from the bibliography
  
  Thank you for pointing this out. We have fixed these.
  
  -figure S2 is very useful! could probably go in main text.
  
  We agree that this figure is important. But we decided to show it in the Supplement (now Figure 1 – figure supplement 2) after careful consideration for two reasons. First, we wanted to put the reader’s focus on the stimulus weights, because it is those weights that are flexibly adjusted to the statistics of the environment, rather than the choice weights, which seem less adaptive (i.e., stereotypical across environments) and idiosyncratic. Second, plotting only the previous stimulus weights enabled us to add the individual weights in the Neutral condition, which would have been too cluttered to add to figure S2.
  
  For these reasons, we feel that this Figure is more suitable for expert readers with a special interest in the details of the behavioral analyses and would be better placed in the Supplement. These readers will certainly be able to find and interpret that information in the Supplement.
  
  Reviewer #3 (Recommendations For The Authors):
  
  I would suggest that a more in depth description of the previous literature that explains exactly how the features of the lateralized beta--as it is formulated here-- reflect the decision variable would assist with the readers' understanding. A demonstration of how the lateralized beta behaves under different coherence conditions, or for corrects vs errors, for example, might be helpful for readers.
  
  We now provide a more detailed description of how/why the motor beta lateralization is a valid proxy of DV in the revised paper.
  
  We have demonstrated the dependence of the ramping of the motor beta lateralization on the motion coherence using a regression model with current signed motion coherence as well as single trial bias as regressors. The beta weights describing the impact of the signed motion coherence on the amplitude as well as on the slope of the motor beta lateralization are shown in Figure 4G (now 4E). As expected, stronger motion coherence induces a steeper downward slope of the motor beta lateralization.
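
  The structure of such a regression can be sketched as below. This is a minimal ordinary-least-squares illustration, not the authors' analysis code: it assumes one slope value per trial and an intercept term, and all names and toy numbers are hypothetical. The sign of the coherence weight in the toy data is arbitrary.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_slope_regression(coherence, bias, slope):
    """OLS fit of slope = b0 + b1 * coherence + b2 * bias; returns [b0, b1, b2]."""
    design = [[1.0, c, h] for c, h in zip(coherence, bias)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, slope)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical noiseless toy trials generated with known weights.
toy_coherence = [-0.8, -0.4, 0.0, 0.4, 0.8, -0.2]
toy_bias = [0.3, -0.1, 0.2, -0.3, 0.1, 0.0]
toy_slope = [0.1 - 2.0 * c + 0.5 * h for c, h in zip(toy_coherence, toy_bias)]
weights = fit_slope_regression(toy_coherence, toy_bias, toy_slope)
```

  Because both regressors enter the same model, the weight on the single-trial bias reflects its contribution over and above the effect of the current evidence.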
  
  Furthermore, we have added a figure of the time course of the motor beta lateralization separately for correct and error trials, locked to both stimulus onset and to motor response (Figure 2 – figure supplement 2). This signal reached statistical significance earlier for correct than error trials, and during the stimulus interval it ramped to a larger (i.e., more negative) amplitude for correct trials (Figure 2 – figure supplement 2, left). But the signal was indistinguishable in amplitude between correct and error trials around the time of the motor response (Figure 2 – figure supplement 2, right). This pattern matches what would be expected for a neural signature of the DV, because errors are more frequently made on weak-evidence trials than correct choices and because even for matched evidence strength, the DV builds up more slowly before error trials in accumulator models (Ratcliff and McKoon, 2008).
  
  Finally, please note that our previous studies have demonstrated that the time course of the beta lateralization during the trial closely tracks the time course of a normative model-derived DV (Murphy et al., 2021) and that the motor beta ramping slope is parametrically modulated by motion coherence (de Lange et al., 2013), which is perfectly in line with the current results.
  
  Along similar lines, around figures 3c and 4B, some control analyses may be helpful to clarify whether there are differences between the groups of responses consistent and inconsistent with the previous trial (e.g. correctness, coherence) that differ between environments, and also could influence the lateralized beta.
  
  Thank you for pointing us to this important control analysis. We have done this, and indeed, it identified accuracy and motion strength as possible confounds (Author response image 1). Specifically, proportion correct as well as motion coherence were larger for consistent vs. inconsistent conditions in Repetitive and vice versa in Alternating. Those differences in accuracy and coherence might indeed influence the slope of the motor beta lateralization that our model-free analysis had identified, rendering the resulting difference between consistent and inconsistent difficult to interpret unambiguously in terms of bias. Thus, we have decided to drop the consistency (i.e., model-independent) analysis and focus completely on the model-based analyses.
  
  Author response image 1.
  
  Proportion correct and motion coherence split by environment and consistency of current choice and previous stimulus. In the Repetitive environment (Rep.), accuracy and motion coherence are larger for current choice consistent vs. inconsistent with previous stimulus category and vice versa in the Alternating environment (Alt.).
  
  Importantly, this decision has no implications for the conclusions of our paper: The model-independent analyses in the original versions of Figure 3 and 4 were only intended as a supplement to the most conclusive and readily interpretable results from the model-based analyses (now in Figs. 3C and 4D, E). The latter are the most direct demonstration of a shaping of build-up of action-selective activity by history bias, and they are unaffected by these confounds.
  
  In addition, I wondered whether the bin subsampling procedure to match trial numbers for choice might result in unbalanced coherences between the up and down choices.
  
  The subsampling itself did not cause any imbalance in coherences between the up and down choices, which we now show in Figure 4 – figure supplement 1. There was only a slight imbalance in coherences between up and down choices before the subsampling, which carried over into the subsampled trials. The distribution of coherences was the same after the subsampling as before.
  
  Also, please note that the purpose of this analysis was to make the neural bias directly “visible” in the beta lateralization data, rather than just regression weights. The issue does not pertain to the critical single-trial regression analysis, which yielded consistent results.
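
  The kind of check described in this response can be sketched as follows. This is an illustrative simplification, not the authors' procedure: it omits the binning step, equalizes trial counts between up and down choices by random subsampling, and then compares mean absolute coherence between the groups. The trial dictionaries and function names are hypothetical.

```python
import random


def subsample_to_match(trials, seed=0):
    """Randomly drop trials from the larger choice group so both groups have equal size."""
    rng = random.Random(seed)
    groups = {"up": [], "down": []}
    for trial in trials:
        groups[trial["choice"]].append(trial)
    n = min(len(group) for group in groups.values())
    return {choice: rng.sample(group, n) for choice, group in groups.items()}


def mean_abs_coherence(group):
    """Average unsigned motion coherence of a list of trials."""
    return sum(abs(t["coherence"]) for t in group) / len(group)


# Hypothetical toy trials: six "up" choices and four "down" choices.
toy_trials = (
    [{"choice": "up", "coherence": c} for c in (0.0, 0.1, 0.2, 0.4, 0.8, 0.1)]
    + [{"choice": "down", "coherence": c} for c in (0.0, -0.2, -0.4, -0.8)]
)
matched = subsample_to_match(toy_trials)
balance = {choice: mean_abs_coherence(group) for choice, group in matched.items()}
```

  Comparing `balance` computed on the full groups with `balance` computed on the matched groups indicates whether the subsampling step itself introduced any coherence imbalance.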
  
  References
  
  Abrahamyan A, Silva LL, Dakin SC, Carandini M, Gardner JL (2016) Adaptable history biases in human perceptual decisions. Proceedings of the National Academy of Sciences 113:E3548–E3557.
  
  Braun A, Urai AE, Donner TH (2018) Adaptive History Biases Result from Confidence-weighted Accumulation of Past Choices. The Journal of Neuroscience:2189–17.
  
  de Lange FP, Rahnev DA, Donner TH, Lau H (2013) Prestimulus Oscillatory Activity over Motor Cortex Reflects Perceptual Expectations. Journal of Neuroscience 33:1400–1410.
  
  Glaze CM, Filipowicz ALS, Kable JW, Balasubramanian V, Gold JI (2018) A bias–variance trade-off governs individual differences in on-line learning in an unpredictable environment. Nat Hum Behav 2:213–224.
  
  Hermoso-Mendizabal A, Hyafil A, Rueda-Orozco PE, Jaramillo S, Robbe D, de la Rocha J (2020) Response outcomes gate the impact of expectations on perceptual decisions. Nat Commun 11:1057.
  
  Kim TD, Kabir M, Gold JI (2017) Coupled Decision Processes Update and Maintain Saccadic Priors in a Dynamic Environment. The Journal of Neuroscience 37:3632–3645.
  
  Meyniel F, Maheu M, Dehaene S (2016) Human Inferences about Sequences: A Minimal Transition Probability Model Gershman SJ, ed. PLOS Computational Biology 12:e1005260.
  
  Mochol G, Kiani R, Moreno-Bote R (2021) Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Current Biology 31:1234-1244.e6.
  
  Murphy PR, Wilming N, Hernandez-Bocanegra DC, Prat-Ortega G, Donner TH (2021) Adaptive circuit dynamics across human cortex during evidence accumulation in changing environments. Nat Neurosci 24:987–997.
  
  O’Connell RG, Kelly SP (2021) Neurophysiology of Human Perceptual Decision-Making. Annu Rev Neurosci 44:495–516.
  
  Ratcliff R, McKoon G (2008) The Diffusion Decision Model: Theory and Data for Two-Choice Decision Tasks. Neural Computation 20:873–922.
  
  Siegel M, Engel AK, Donner TH (2011) Cortical Network Dynamics of Perceptual Decision-Making in the Human Brain. Frontiers in Human Neuroscience 5 Available at: http://journal.frontiersin.org/article/10.3389/fnhum.2011.00021/abstract [Accessed April 8, 2017].
  
  Talluri BC, Braun A, Donner TH (2021) Decision making: How the past guides the future in frontal cortex. Current Biology 31:R303–R306.
  
  Urai AE, Donner TH (2022) Persistent activity in human parietal cortex mediates perceptual choice repetition bias. Nat Commun 13:6015.
  
  Wilming N, Murphy PR, Meyniel F, Donner TH (2020) Large-scale dynamics of perceptual decision information across human cortex. Nat Commun 11:5109.
  
  Yu A, Cohen JD (2009) Sequential effects: Superstition or rational behavior. Advances in neural information processing systems 21:1873–1880.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.11.21.516403v4
www.biorxiv.org www.biorxiv.org

Speech and music recruit frequency-specific distributed and overlapping cortical networks

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We have specifically addressed the points of uncertainty highlighted in eLife's editorial assessment, which concerned the lack of low-level acoustics control, limitations of experimental design, and in-depth analysis. Regarding “the lack of low-level acoustics control, limitations of experimental design”, in response to Reviewer #1, we clarify that our study aimed to provide a broad perspective (one that includes both auditory and higher-level processes) on the similarities and distinctions in processing natural speech and music within an ecological context. Regarding “the lack of in-depth analysis”, in response to Reviewers #1 and #2, we have clarified that while model-based analyses are valuable, they pose fundamental challenges when comparing speech and music. Non-acoustic features inherently differ between speech and music (such as phonemes and pitch), making direct comparisons reliant on somewhat arbitrary choices. Our approach mitigates this challenge by analyzing the entire neural signal, thereby avoiding potential pitfalls associated with encoding models of non-comparable features. Finally, we provide some additional analyses suggested by the Reviewers.
  
  We sincerely appreciate your thoughtful and thorough consideration throughout the review process.
  
  eLife assessment
  
  This study presents valuable intracranial findings on how two important types of natural auditory stimuli - speech and music - are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid but somewhat incomplete since although the data analysis is thorough, the results are robust and the stimuli have ecological validity, important considerations such as low-level acoustics control, limitations of experimental design, and in-depth analysis, are lacking. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this study, the authors examined the extent to which the processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.
  
  Strengths:
  
  The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.
  
  Weaknesses:
  
  The weakness of this study, in my view, lies in its experimental design and reasoning:
  
  (1) Despite using longer stimuli, the study does not significantly enhance ecological validity compared to previous research. The analyses treat these long speech and music stimuli as stationary signals, overlooking their intricate musical or linguistic structural details and temporal variation across local structures like sentences and phrases. In previous studies, short, less ecological segments of music were used, maintaining consistency in content and structure. However, this study, despite employing longer stimuli, does not distinguish between neural responses to the varied contents or structures within speech and music. Understanding the implications of long-term analyses, such as spectral and connectivity analyses over extended periods of around 10 minutes, becomes challenging when they do not account for the variable, sometimes quasi-periodical or even non-periodical, elements present in natural speech and music. When contrasting this study with prior research and highlighting its advantages, a more balanced perspective would have been beneficial in the manuscript.
  
  Regarding ecological validity, we respectfully hold a differing perspective from the reviewer. In our view, a one-second music stimulus lacks ecological validity, as real-world music always extends much beyond such a brief duration. While we acknowledge the trade-off in selecting longer stimuli, limiting the diversity of musical styles, we maintain that only long stimuli afford participants an authentic musical listening experience. Conversely, shorter stimuli may lead participants to merely "skip through" musical excerpts rather than engage in genuine listening.
  
  Regarding the critique that we "did not distinguish between neural responses to the varied contents or structures within speech and music," we partly concur. Our TRF (temporal response function) analyses incorporate acoustic content, particularly the acoustic envelope, thereby addressing this concern to some extent. However, it is accurate to note that we did not model non-acoustic features. In acknowledging this limitation, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while assuming equivalence between phonemes and pitches may be reasonable, it in essence relies on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.
  
  Finally, we did take into account the reviewer’s remark and did our best to give a more balanced perspective of our approach and previous studies in the discussion.
  
  “While listening to natural speech and music rests on cognitively relevant neural processes, our analytical approach, extending over a rather long period of time, does not allow to directly isolate specific brain operations. Computational models -which can be as diverse as acoustic (Chi et al., 2005), cognitive (Giordano et al., 2021), information-theoretic (Di Liberto et al., 2020), or self-supervised neural network (Donhauser & Baillet, 2019 ; Millet et al., 2022) models- are hence necessary to further our understanding of the type of computations performed by our reported frequency-specific distributed networks. Moreover, incorporating models accounting for musical and linguistic structure can help us avoid misattributing differences between speech and music driven by unmatched sensitivity factors (e.g., arousal, emotion, or attention) as inherent speech or music selectivity (Mas-Herrero et al., 2013; Nantais & Schellenberg, 1999).”
  
  (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music.
  
  We acknowledge the reviewer's concern. Indeed, speech and music differ on various levels, including acoustic and cognitive aspects, and our analyses do not explicitly distinguish them. The aim of this study was to provide an overview of the similarities and differences between natural speech and music processing, in an ecological context. Future work is needed to explore further the different hierarchical levels or networks composing such listening experiences. Of note, however, we report whole-brain results with high spatial resolution (thanks to iEEG recordings), enabling the distinction between auditory, superior temporal gyrus (STG), and higher-level responses. Our findings clearly highlight that both auditory and higher-level regions predominantly exhibit shared responses, challenging the interpretation that our results can be attributed solely to differences in 'basic acoustic characteristics'.
  
  We have now more clearly pointed out this reasoning in the results section:
  
  “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”
  
  (3) The concept of selectivity - shared, preferred, and domain-selective - increases the risks of potentially overgeneralized interpretations and theoretical inaccuracies. The authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with post hoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not necessarily imply that a region is specifically selective to a type of stimulus like speech. The manuscript's narrative might lead to an overgeneralized interpretation that their findings apply broadly to speech or music. However, identifying differences in neural responses to a few sets of specific stimuli in one brain region does not robustly support such a generalization. This is because speech and music are inherently diverse, and specificity often relates more to the underlying functions than to observed neural responses to a limited number of examples of a stimulus type. See the next point.
  
  Exactly! Here, we present a precise operational definition of these terms, implemented with clear and rigorous statistical methods. It is important to note that in many cognitive neuroscience studies, the term "selective" is often used without a clear definition. By establishing operational definitions, we identified three distinct categories based on statistical testing of differences from baseline and between conditions. This approach provides a framework for more accurate interpretation of experimental findings, as now better outlined in the introduction:
  
  “Finally, we suggest that terms should be operationally defined based on statistical tests, which results in a clear distinction between shared, selective, and preferred activity. That is, be A and B two investigated cognitive functions, “shared” would be a neural population that (compared to a baseline) significantly and equally contributes to the processing of both A and B; “selective” would be a neural population that exclusively contributes to the processing of A or B (e.g. significant for A but not B); and “preferred” would be a neural population that significantly contributes to the processing of both A and B, but more prominently for A or B (Figure 1A).”
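
  The operational definitions quoted above amount to a small decision rule over three statistical tests. The sketch below is one possible reading, not the authors' implementation: it assumes each test has already been reduced to a boolean (significant or not), and the "unresponsive" label and the function name are hypothetical additions for completeness.

```python
def classify_response(sig_a, sig_b, sig_diff):
    """Label a neural population from three significance tests (booleans).

    sig_a:    response to A differs from baseline
    sig_b:    response to B differs from baseline
    sig_diff: responses to A and B differ from each other
    """
    if sig_a and sig_b:
        # Both domains drive the population; a reliable difference means "preferred".
        return "preferred" if sig_diff else "shared"
    if sig_a or sig_b:
        # Only one domain drives the population.
        return "selective"
    # Neither domain differs from baseline (label added for this sketch).
    return "unresponsive"


labels = [
    classify_response(True, True, False),
    classify_response(True, True, True),
    classify_response(True, False, True),
    classify_response(False, False, False),
]
```

  Under this reading, a heightened response to one domain in a region that also responds to the other yields "preferred", and "selective" is reserved for regions that respond to only one domain.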
  
  Regarding the risk of over-generalization, we want to clarify that our manuscript does not claim that a specific region or frequency band is selective to speech or music. Because we test only excerpts of speech and music, we employ the reverse logical reasoning: "if 10 minutes of instrumental music activates a region traditionally associated with speech selectivity, we can conclude that this region is NOT speech-selective." Our conclusions revolve around the absence of selectivity rather than the presence of selective areas or frequency bands. In essence, "one counterexample is enough to disprove a theory." We have now further elaborated on this point in the discussion section:
  
  “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”
  
  (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents issues. For instance, in the cochlea, different stimuli activate different parts of the basilar membrane due to the distinct spectral contents of speech and music, with each part being selective to certain frequencies. However, this phenomenon reflects the frequency selectivity of the basilar membrane - an important function, not an inherent selectivity for speech or music. Similarly, if cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.
  
  We completely agree with the last statement, as our primary goal was not to investigate the functional mechanisms underlying speech and music processing. However, the finding of a substantial portion of the cortical network as being shared between the two domains constrains our understanding of the underlying common operations. Regarding the initial part of the comment, we would like to clarify that in the framework we propose, if cortical regions show heightened responses to one type of stimulus over another, this falls into the ‘preferred’ category. The ‘selective’ (exclusive) category, on the other hand, would require that the region be unresponsive to one of the two stimuli.
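
  For illustration, the operational definitions above can be written as a small decision rule. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function name `classify_channel`, its boolean inputs (the outcomes of the three significance tests), and the `unresponsive` and `unclassified` labels for cases the framework does not name are all hypothetical. Following the statement that strength rather than direction matters, the preferred domain is taken here to be the one with the larger absolute effect.

```python
def classify_channel(speech_vs_base, music_vs_base, speech_vs_music,
                     speech_effect=0.0, music_effect=0.0):
    """Label one channel in one frequency band.

    speech_vs_base, music_vs_base: True if the domain differs
        significantly from baseline.
    speech_vs_music: True if the two domains differ significantly.
    speech_effect, music_effect: signed effect sizes relative to baseline
        (hypothetical inputs, used only to name the preferred domain).
    """
    if not speech_vs_base and not music_vs_base:
        # Neither domain responds: outside the three categories (assumption).
        return "unresponsive"
    if speech_vs_base and music_vs_base:
        if not speech_vs_music:
            return "shared"
        # Both respond but differ: compare strength, not direction.
        stronger = "speech" if abs(speech_effect) > abs(music_effect) else "music"
        return stronger + "-preferred"
    # Exactly one domain differs from baseline.
    if speech_vs_music:
        return "speech-selective" if speech_vs_base else "music-selective"
    # One domain responds but the domains do not differ: not named in the
    # framework, so it is left unclassified here (assumption).
    return "unclassified"
```

  Under this sketch, a channel with heightened responses to one stimulus but a significant response to both is labeled preferred, and only a channel that is unresponsive to the other stimulus can be labeled selective.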
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.
  
  Strengths:
  
  (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.
  
  (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.
  
  (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.
  
  Weaknesses:
  
  While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implication of the work.
  
  The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details in neural coding. Speech and music are not entirely orthogonal with each other at different levels of analysis: at the high-level abstraction, these are two different categories of cognitive processes; at the low-level acoustics, they overlap a lot; at intermediate levels, they may also share similar features. The selected musical stimuli, incorporating both vocals and multiple instrumental sounds, raise questions about the specificity of neural activation. For instance, it's unclear if the vocal elements in music and speech engage identical neural circuits. Additionally, the study doesn't adequately address whether purely melodic elements in music correlate with intonations in speech at a neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared, and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al. 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc) are represented in shared neural activities, could enhance our understanding of these processes and strengthen the study's conclusions.
  
  We appreciate the reviewer's acknowledgment that delving into the intricate details of neural coding of speech and music was beyond the scope of this work. To address some of the more precise issues raised, we have clarified in the manuscript that our musical stimuli do not contain vocals and are purely instrumental. We apologize if this was not clear initially.
  
  “In the main experimental session, patients passively listened to ~10 minutes of storytelling (577 secs, La sorcière de la rue Mouffetard; Gripari, 2004) and ~10 minutes of instrumental music (580 secs, Reflejos del Sur; Oneness, 2006), separated by 3 minutes of rest.”
  
  Furthermore, we now acknowledge the importance of modeling melodic, phonetic, or linguistic features in the discussion, and we have referenced the work of Sankaran et al. (2023) and McCarty et al. (2023) in this regard. However, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while treating phonemes and pitches as equivalent units may be reasonable, doing so ultimately relies on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.
  
  “These selective responses, not visible in primary cortical regions, seem independent of both low-level acoustic features and higher-order linguistic meaning (Norman-Haignere et al., 2015), and could subtend intermediate representations (Giordano et al., 2023) such as domain-dependent predictions (McCarty et al., 2023; Sankaran et al., 2023).”
  
  References:
  
  McCarty, M. J., Murphy, E., Scherschligt, X., Woolnough, O., Morse, C. W., Snyder, K., Mahon, B. Z., & Tandon, N. (2023). Intraoperative cortical localization of music and language reveals signatures of structural complexity in posterior temporal cortex. iScience, 26(7), 107223.
  
  Sankaran, N., Leonard, M. K., Theunissen, F., & Chang, E. F. (2023). Encoding of melody in the human auditory cortex. bioRxiv. https://doi.org/10.1101/2023.10.17.562771
  
  The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.
  
  We appreciate the reviewer's concern, but we do not view this as a weakness for our study's purpose. Every method inherently has limitations, and intracranial recordings currently offer the best possible spatial specificity and temporal resolution for studying the human brain. Studying cell assemblies thoroughly in humans is ethically challenging, and examining speech and music in non-human primates or rats raises questions about cross-species analogy. Therefore, despite its limitations, we believe intracranial recording remains the best option for addressing these questions in humans.
  
  Regarding the granularity of neural representation, while understanding how computations occur in the central nervous system is crucial, we question whether the single neuron scale provides the most informative insights. Single neurons seem more versatile (e.g., in terms of cell type or layer affiliation) than the local circuitry they contribute to, which appears to constitute the brain's building blocks (e.g., the laminar organization; see Mendoza-Halliday et al., 2024). Additionally, the population dynamics of these functional modules appear crucial for cognition and behavior (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023). Therefore, we emphasize the need for multi-scale research, as we believe that a variety of approaches can compensate for the weaknesses each has when taken individually. We clarified this in the introduction:
  
  “This approach rests on the idea that the canonical computations that underlie cognition and behavior are anchored in population dynamics of interacting functional modules (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023) and bound to spectral fingerprints consisting of network- and frequency-specific coherent oscillations (Siegel et al., 2012).”
  
  Importantly, we focus on the macro-scale and conclude that, at the anatomical region level, no speech or music selectivity can be observed during natural stimulation. This is stated in the discussion, as follows:
  
  “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”
  
  References:
  
  Mendoza-Halliday, D., Major, A.J., Lee, N. et al. A ubiquitous spectrolaminar motif of local field potential power across the primate cortex. Nat Neurosci (2024).
  
  Safaie, M., Chang, J.C., Park, J. et al. Preserved neural dynamics across animals performing similar behaviour. Nature 623, 765–771 (2023).
  
  Buzsáki, G., & Vöröslakos, M. (2023). Brain rhythms have come of age. Neuron, 111(7), 922-926.
  
  While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category, or ratio-based statistics, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing.
  
  To clarify, the metrics we are investigating (coherence, power, linear correlations) are continuous. Additionally, we conduct a comprehensive statistical analysis of these results. The statistical testing, which includes assessing differences from baseline and between the speech and music conditions using a statistical threshold, yields three categories. Of note, ratio-based statistics (a continuous metric) are provided in Figures S9 and S10 (Figures S8 and S9 in the original version of the manuscript).
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  Te Rietmolen et al., investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo EEG in humans. The authors address two basic questions: 1. Are speech and music responses localized in the brain or distributed; 2. Are these responses selective and domain-specific or rather domain-general and shared? To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain selective (one domain is significant from baseline and the other is not), domain preferred (both are significant from baseline but one is larger than the other and significantly different from each other). The authors employ this framework using neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations they use encoding models to predict neural responses (low and high frequency) given a feature space of the stimulus envelope or peak rate (by time delay) and find stronger encoding for both in the low-frequency neural responses. The top encoding electrodes are used as seeds for a pair-wise connectivity (coherence) in order to repeat the shared/selective/preferred analysis across the spectra, suggesting low selectivity. Spectral power and connectivity are also analyzed on the level of the regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses the authors consistently show a paucity of domain selective responses and when evident these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.
  
  Strengths:
  
  I found this manuscript to be rigorous providing compelling and clear evidence of shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity, and encoding analyses. The statistics and methods employed are rigorous and reliable, estimated based on permutation approaches, and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectra in both power, coherence, and encoding models provides a comprehensive view of responses that no doubt will benefit the community as an invaluable resource. Analysis of the level of patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.
  
  Weaknesses:
  
  I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data is limited to passively listening to naturalistic speech and music. The speech and music stimuli are not completely controlled with varying key acoustic features (inherent to the different domains). Overall, I found the differences in stimulus and lack of attentional controls (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.
  
  Thank you for this positive review of our work. We added these points as limitations and future directions in the discussion section:
  
  “Finally, in adopting here a comparative approach of speech and music – the two main auditory domains of human cognition – we only investigated one type of speech and of music also using a passive listening task. Future work is needed to investigate for instance whether different sentences or melodies activate the same selective frequency-specific distributed networks and to what extent these results are related to the passive listening context compared to a more active and natural context (e.g. conversation).”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The concepts of activation and deactivation within the study's context of selectivity are not straightforward to comprehend. It would be beneficial for the authors to provide more detailed explanations of how these phenomena relate to the selectivity of neural responses to speech and music. Such elaboration would aid readers in better understanding the nuances of how certain brain regions are selectively activated or deactivated in response to different auditory stimuli.
  
  The reviewer is right that the reported results are quite complex to interpret. The concepts of activation and deactivation are generally difficult to comprehend, as they are in part defined by the approach (e.g., method and/or metric) and the scale of observation (Pfurtscheller & Lopes da Silva, 1999). The power (or magnitude) of a time-frequency estimate is by definition a positive value. Deactivation (or desynchronization) is therefore relative to the comparison used (e.g., baseline, control, condition). This is further complicated by the scale of the measurement. For instance, during a simple limb movement, some areas of the sensorimotor cortex are activated, yet this phenomenon is accompanied at a finer scale by some desynchronization of the mu activity, and such desynchronization is a relative measure (e.g., before/after the movement). At a broader scale, it is not rare to see some form of balance between brain networks, with some being ‘inhibited’ so that others can be activated, as with the default mode network versus sensorimotor networks. In our case, when estimating selective responses, it is the strength of the signal that matters. The type of selectivity is then defined by the sign/direction of the comparison/subtraction. We now provide additional details about the sign of selectivity between domains and frequencies in the Methods and Results sections:
  
  Methods:
  
  “In order to explore the full range of possible selective, preferred, or shared responses, we considered both responses greater and smaller than the baseline. Indeed, as neural populations can synchronize or desynchronize in response to sensory stimulation, we estimated these categories separately for significant activations and significant deactivations compared to baseline.”
  
  Results:
  
  “We classified, for each canonical frequency band, each channel into one of the categories mentioned above, i.e. shared, selective, or preferred (Figure 1A), by examining whether speech and/or music differ from baseline and whether they differ from each other. We also considered both activations and deactivations, compared to baseline, as both index a modulation of neural population activity, and have been linked with cognitive processes (Pfurtscheller & Lopes da Silva, 1999; Proix et al., 2022). However, because our aim was not to interpret specific increase or decrease with respect to the baseline, we here simply consider significant deviations from the baseline. In other words, when estimating selectivity, it is the strength of the response that matters, not its direction (activation, deactivation).”
  
  “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”
  
  References:
  
  Lachaux, J. P., Jung, J., Mainy, N., Dreher, J. C., Bertrand, O., Baciu, M., Minotti, L., Hoffmann, D., & Kahane, P. (2008). Silence is golden: Transient neural deactivation in the prefrontal cortex during attentive reading. Cerebral Cortex, 18(2), 443–450.
  
  Pfurtscheller, G., & Da Silva, F. L. (1999). Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 110(11), 1842–1857.
  
  (2) The manuscript doesn't easily provide information about the control conditions, yet the conclusion significantly depends on these conditions as a baseline. It would be beneficial if the authors could clarify this information for readers earlier and discuss how their choice of control stimuli influences their conclusions.
  
  We added information in the Results section about the baseline conditions:
  
  “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”
  
  Of note, while the choice of different ‘basic auditory stimuli’ as baseline can change the reported results in regions involved in low-level acoustical analyses (auditory cortex), it will have no impact on the results observed in higher-level regions, which predominantly also exhibit shared responses. We have now more clearly pointed out this reasoning in the results section:
  
  “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”
  
  (3) The spectral analyses section doesn't clearly explain how the authors performed multiwise correction. The authors' selectivity categorization appears similar to ANOVAs with posthoc tests, implying the need for certain corrections in the p values or categorization. Could the authors clarify this aspect?
  
  We apologize that this was not in the original version of the manuscript. In the spectral analyses, the selectivity categorization depended on both (1) the difference effects between the domains and the baseline, and (2) the difference effect between domains. Channels were marked as selective when there was (1) a significant difference between domains and (2) only one domain significantly differed from the baseline. All difference effects were estimated using the paired sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the built-in tmax method to correct for multiple comparisons over channels (Nichols & Holmes, 2002; Groppe et al. 2011). We have now more clearly explained how we controlled family-wise error in the Methods section:
  
  “For each frequency band and channel, the statistical difference between conditions was estimated with paired sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the tmax method to control the family-wise error rate (Nichols and Holmes 2002; Groppe et al. 2011). In tmax permutation testing, the null distribution is estimated by, for each channel (i.e. each comparison), swapping the condition labels (speech vs music or speech/music vs baseline) between epochs. After each permutation, the most extreme t-scores over channels (tmax) are selected for the null distribution. Finally, the t-scores of the observed data are computed and compared to the simulated tmax distribution, as in parametric hypothesis testing. Because the chance of obtaining a large tmax (i.e. false discovery) increases with the number of comparisons, the test automatically becomes more conservative when making more comparisons, thereby correcting for multiple comparisons across channels.”
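
  The tmax procedure described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the mne-python implementation and not the authors' code: the function names, the sign-flipping of paired differences (equivalent to swapping condition labels within each pair), the use of one shared set of flips across channels, and the "+1" p-value convention are all assumptions made for the example. It also assumes that the differences within a channel are not all identical, since the t-statistic would otherwise be undefined.

```python
import math
import random


def t_stat(diffs):
    """One-sample t-statistic of paired differences (assumes non-zero variance)."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


def tmax_permutation_test(diffs_by_channel, n_permutations=1000, seed=0):
    """Two-sided tmax permutation test over channels.

    diffs_by_channel: list of channels, each a list of paired differences
        (condition A minus condition B), with epochs in the same order.
    Returns (observed t per channel, corrected p-value per channel).
    """
    rng = random.Random(seed)
    n_epochs = len(diffs_by_channel[0])
    observed = [t_stat(ch) for ch in diffs_by_channel]

    null_tmax = []
    for _ in range(n_permutations):
        # Swapping the two condition labels of an epoch flips the sign of
        # its difference; the same flips are applied to every channel.
        signs = [rng.choice((-1, 1)) for _ in range(n_epochs)]
        permuted = [
            abs(t_stat([s * d for s, d in zip(signs, ch)]))
            for ch in diffs_by_channel
        ]
        # Keep only the most extreme statistic over channels.
        null_tmax.append(max(permuted))

    # Each observed |t| is compared with the same tmax null distribution.
    p_values = [
        (1 + sum(t >= abs(obs) for t in null_tmax)) / (n_permutations + 1)
        for obs in observed
    ]
    return observed, p_values
```

  Because every channel is compared against the distribution of the maximum statistic, adding channels can only push the null distribution toward larger values, which is why the test becomes more conservative as the number of comparisons grows.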
  
  References:
  
  Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Parkkonen, L., & Hämäläinen, M. S. (2014). MNE software for processing MEG and EEG data. NeuroImage, 86, 446–460.
  
  Groppe, D. M., Bickel, S., Dykstra, A. R., Wang, X., Mégevand, P., Mercier, M. R., Lado, F. A., Mehta, A. D., & Honey, C. J. (2017). iELVis: An open source MATLAB toolbox for localizing and visualizing human intracranial electrode data. Journal of Neuroscience Methods, 281, 40–48.
  
  Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1–25.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Other suggestions:
  
  (1) The authors need to provide more details on how the sEEG electrodes were localized and selected. Are all electrodes included or only the ones located in the gray matter? If all electrodes were used, how to localize and label the ones that are outside of gray matter? In Figures 1C & 1D it seems that a lot of the electrodes were located in depth locations, how were the anatomical labels assigned for these electrodes?
  
  We apologize that this was not clear in the original version of the manuscript. Our electrode localization procedure was based on several steps described in detail in Mercier et al., 2022. Once electrodes were localized in a post-implant CT scan and the coordinates projected onto the pre-implant MRI, we were able to obtain the necessary information regarding brain tissues and anatomical regions. That is, first, the segmentation of the pre-implant MRI with SPM12 provided both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues) that allowed us to dismiss electrodes outside of the brain and select those in the gray matter. Second, the individual's brain was co-registered to a template brain, which allowed us to back-project atlas parcels onto the individual's brain and assign anatomical labels to each electrode. The result of this procedure allowed us to group channels by anatomical parcels as defined by the Brainnetome atlas (Figure 1D), which informed the analyses presented in section Population Prevalence (Methods, Figures 4, 9-10, S4-5). Because this study relies on stereotactic EEG, and not electrocorticography, recording sites include both gyri and sulci, while depth structures were not retained.
  
  We have now updated the “General preprocessing related to electrodes localisation” section in the Methods. The relevant part now states:
  
  “To precisely localize the channels, a procedure similar to the one used in the iELVis toolbox and in the fieldtrip toolbox was applied (Groppe et al., 2017; Stolk et al., 2018). First, we manually identified the location of each channel centroid on the post-implant CT scan using the Gardel software (Medina Villalon et al., 2018). Second, we performed volumetric segmentation and cortical reconstruction on the pre-implant MRI with the Freesurfer image analysis suite (documented and freely available for download online http://surfer.nmr.mgh.harvard.edu/). This segmentation of the pre-implant MRI with SPM12 provides us with both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues). This information allowed us to reject electrodes not located in the brain. Third, the post-implant CT scan was coregistered to the pre-implant MRI via a rigid affine transformation and the pre-implant MRI was registered to MNI152 space, via a linear and a non-linear transformation from SPM12 methods (Penny et al., 2011), through the FieldTrip toolbox (Oostenveld et al., 2011). Fourth, applying the corresponding transformations, we mapped channel locations to the pre-implant MRI brain that was labeled using the volume-based Human Brainnetome Atlas (Fan et al., 2016).”
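
  The selection and labeling steps described above can be illustrated with a short sketch. It is not the pipeline used in the study: the data layout (one dictionary per channel with hypothetical `name`, `parcel`, and `tissue_probs` fields) and the rule that a channel is kept when gray matter is its most probable tissue class are assumptions made for the example, since the exact selection criterion is not given here.

```python
from collections import defaultdict


def select_gray_matter(channels):
    """Keep channels whose most probable tissue class is gray matter.

    channels: list of dicts with hypothetical keys
        'name', 'parcel', and 'tissue_probs' (tissue class -> probability).
    """
    kept = []
    for ch in channels:
        probs = ch["tissue_probs"]
        # Assumed rule: the winning tissue class must be 'gray'.
        if max(probs, key=probs.get) == "gray":
            kept.append(ch)
    return kept


def group_by_parcel(channels):
    """Group channel names by the atlas parcel assigned to each channel."""
    groups = defaultdict(list)
    for ch in channels:
        groups[ch["parcel"]].append(ch["name"])
    return dict(groups)
```

  Grouping the retained channels by parcel gives the per-region channel lists on which a region-level analysis, such as the population prevalence analysis, can then operate.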
  
  Reference:
  
  Mercier, M. R., Dubarry, A.-S., Tadel, F., Avanzini, P., Axmacher, N., Cellier, D., Vecchio, M. D., Hamilton, L. S., Hermes, D., Kahana, M. J., Knight, R. T., Llorens, A., Megevand, P., Melloni, L., Miller, K. J., Piai, V., Puce, A., Ramsey, N. F., Schwiedrzik, C. M., … Oostenveld, R. (2022). Advances in human intracranial electroencephalography research, guidelines and good practices. NeuroImage, 260, 119438.
  
  (2) From Figures 5 and 6 (and also S4, S5), is it true that aside from the shared response, lower frequency bands show more music selectivity (blue dots), while higher frequency bands show more speech selectivity (red dots)? I am curious how the authors interpret this.
  
  The reviewer is right in noticing the asymmetric selective response to music and speech in lower and higher frequency bands. However, while this effect is apparent in the analyses wherein we inspected stronger synchronization (activation) compared to baseline (Figures 2 and S1), the pattern appears to reverse when examining deactivation compared to baseline (Figures 3 and S2). In other words, there seems to be an overall stronger deactivation for speech in the lower frequency bands and a relatively stronger deactivation for music in the higher frequency bands.
  
  We now provide additional details about the sign of selectivity between domains and frequencies in the Results section:
  
  “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”
  
  Note, however, that this pattern of results depends on only a small number of patients: when regional selective responses driven by as few as 2 to 4 patients are ignored, the pattern disappears (Figures 5-6). More precisely, ignoring regions explored by a small number of patients almost completely removes the selective responses for both speech and music. For this reason, we do not feel confident interpreting the possible asymmetry between low and high frequency bands in how speech and music are encoded (activation or deactivation).
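
  The idea of discarding regional effects that rest on too few patients can be sketched as a simple filter. This is a hypothetical illustration rather than the population prevalence analysis itself: the function name, the input layout (one `(patient, region)` pair per selective channel), the example region labels, and the `min_patients` threshold are assumptions, as the exact cut-off is not stated here.

```python
def regions_with_enough_patients(selective_channels, min_patients=5):
    """Count distinct patients per region and drop sparsely supported regions.

    selective_channels: iterable of (patient_id, region) pairs, one per
        channel showing a selective response.
    min_patients: hypothetical minimum number of distinct patients.
    Returns {region: number of distinct patients} for the retained regions.
    """
    patients_by_region = {}
    for patient, region in selective_channels:
        patients_by_region.setdefault(region, set()).add(patient)
    return {
        region: len(patients)
        for region, patients in patients_by_region.items()
        if len(patients) >= min_patients
    }
```

  Counting distinct patients rather than channels prevents a single patient with many contacts in one region from making an effect look widespread.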
  
  Minor:
  
  (1) P9 L234: Why only consider whether these channels were unresponsive to the other domain in the other frequency bands? What about the responsiveness to the target domain?
  
  We thank the reviewer for their interesting suggestion. The primary objective of the cross-frequency analyses was to determine whether domain-selective channels for a given frequency band remain unresponsive (i.e. exclusive) to the other domain across frequency bands, or whether the observed selectivity is confined to specific frequency ranges (i.e. frequency-specific). In other words, does a given channel exclusively respond to one domain and never, in whichever frequency band, to the other domain? The idea behind this question is that, for a channel to be selectively involved in the encoding of one domain, it does not necessarily need to be sensitive to all timescales underlying that domain as long as it remains unresponsive to any timescale in the other domain. However, if the channel is sensitive to information that unfolds slowly in one domain and faster in the other domain, then the channel is no longer globally domain selective, but the selectivity is frequency-specific to each domain.
  
  The proposed analyses answer a slightly different, albeit also meaningful, question: how many frequencies (or frequency bands) do selective responses span? From the results presented below, the reviewer can appreciate the overall steep decline in selective responses beyond a single frequency band, with only a few channels remaining selectively responsive across at most four frequency bands. That is, selective responses generally span one frequency band.
  
  Author response image 1.
  
  Cross-frequency channel selective responses. The top figure shows the results for the spectral analyses (baselined against the tones condition, including both activation and deactivation). The bottom figure shows the results for the connectivity analyses. For each plot, the first (leftmost) value corresponds to the percentage (%) of channels displaying a selective response in a specific frequency band. For each subsequent value, we remove the channels that no longer respond selectively to the target domain in the next frequency band. The black dots at the bottom of the graph indicate which frequency bands were successively included in the analysis.
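The successive-inclusion count described in this caption can be sketched as follows. This is a minimal illustration, assuming each band's selective channels are available as a set of channel ids; the band names, channel ids, and the function name `successive_selectivity` are hypothetical and are not taken from the manuscript.

```python
def successive_selectivity(selective, band_order, n_channels):
    """Percentage of channels still selective as bands are successively included.

    selective  : dict mapping band name -> set of channel ids that respond
                 selectively to the target domain in that band (assumed input)
    band_order : bands in the order in which they are added to the analysis
    n_channels : total number of recorded channels (the denominator)
    """
    remaining = set(selective[band_order[0]])
    percentages = [100.0 * len(remaining) / n_channels]
    for band in band_order[1:]:
        # Keep only channels that are also selective in the newly added band.
        remaining &= selective[band]
        percentages.append(100.0 * len(remaining) / n_channels)
    return percentages


# Hypothetical example: 10 channels, four illustrative bands.
selective = {
    "theta": {1, 2, 3, 4},
    "alpha": {2, 3, 4, 7},
    "beta": {3, 9},
    "HFa": {8},
}
curve = successive_selectivity(selective, ["theta", "alpha", "beta", "HFa"], n_channels=10)
print(curve)
```

A curve that drops quickly after the first value corresponds to selectivity that is mostly confined to a single band.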
  
  (2) P21 L623: "Population prevalence." The subsection title should be in bold.
  
  Done.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The authors chose to use pure tones and syllables as baselines. I wonder if they also tried the rest period between tasks, and if they could comment on how it differed and why they chose pure tones (above and beyond a more active auditory baseline).
  
  This is an interesting suggestion. The reason for not using the baseline between speech and music listening (or right after) is that it will be strongly influenced by the previous stimulus. Indeed, after listening to the story it is likely that patients keep thinking about the story for a while. Similarly after listening to some music, the music remains in “our head” for some time.
  
  This is why we did not use rest but other auditory stimulation paradigms. Concerning the choice of pure tones and syllables, these happen to be used for clinical purposes to assess functioning of auditory regions. They also corresponded to a passive listening paradigm, simply with more basic auditory stimuli. We clarified this in the Results section:
  
  “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”
  
  Discussion - you might want to address phase information in contrast to power. Your encoding models map onto low-frequency (bandpassed) activity which includes power and phase. However, the high-frequency model includes only power. The model comparison is not completely fair and may drive part of the effects in Figure 7a. I would recommend discussing this, or alternatively ruling out the effect with modeling power separately for the low frequency.
  
  We thank the reviewer for their recommendation. First, we would like to emphasize that the chosen signal extraction techniques that we used are those most frequently reported in previous papers (e.g. Ding et al., 2012; Di Liberto et al., 2015; Mesgarani and Chang, 2012).
  
  Low-frequency (LF) phase and high-frequency (HFa) amplitude are also known to track acoustic rhythms in the speech signal in a joint manner (Zion-Golumbic et al., 2013; Ding et al., 2016). This is possibly due to the fact that HFa amplitude and LF phase dynamics have a somewhat similar temporal structure (see Lakatos et al., 2005; Canolty and Knight, 2010).
  
  Still, the reviewer is correct in pointing out the somewhat unfair model comparison, and we appreciate the suggestion to rule out a potential confound. In Supplementary Figure S8, we now report a model comparison of LF amplitude vs. HFa amplitude to complement the findings displayed in Figure 7A. Overall, the reviewer can appreciate that using LF amplitude or phase does not change the results: LF (amplitude or phase) always captures acoustic features better than HFa amplitude.
  
  Author response image 2.
  
  TRF model comparison of low-frequency (LF) amplitude and high-frequency (HFa) amplitude. Models were investigated to quantify the encoding of the instantaneous envelope and the discrete acoustic onset edges (peakRate) by either the low frequency (LF) amplitude or the high frequency (HFa) amplitude. The ‘peakRate & LF amplitude’ model significantly captures the largest proportion of channels, and is, therefore, considered the winning model. Same conventions as in Figure 7A.
  
  References:
  
  Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. Trends in Cognitive Sciences, 14(11), 506–515.
  
  Di Liberto, G. M., O’Sullivan, J. A., & Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25(19), 2457–2465.
  
  Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.
  
  Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164.
  
  Zion Golumbic, E. M., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., ... & Schroeder, C. E. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77(5), 980–991.
  
  Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904–1911.
  
  Mesgarani, N., & Chang, E. F. (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature, 485(7397), 233-236.
  
  Similarly, the coherence analysis is affected by both power and phase, which are not dissociated. If the authors wished, they could repeat the coherence analysis with phase coherence (normalizing by the amplitude). Alternatively, this issue could be addressed in the discussion, as above.
  
  We agree with the Reviewer. We have now better clarified our choice in the Methods section:
  
  “Our rationale for using coherence as the functional connectivity metric was threefold. First, coherence analysis considers both magnitude and phase information. While the absence of dissociation can be criticized, signals with higher amplitude and/or SNR lead to better time-frequency estimates (which is not the case with a metric that would focus on phase only and therefore would be more likely to include estimates of various SNR). Second, we chose a metric that allows direct comparison between frequencies. Because the phase angle changes more quickly at high frequencies, phase alignment/synchronization is less likely than at lower frequencies. Third, we intended to align with previous work, which for the most part used coherence, most likely for the reasons explained above.”
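To make the distinction between coherence and an amplitude-normalized phase metric concrete, here is a minimal sketch. It assumes that each channel is summarized by one complex Fourier coefficient per trial at a single frequency; the function names and the toy data are illustrative only and do not reproduce the authors' pipeline.

```python
import cmath
import math


def coherence(x, y):
    """Coherence magnitude: trials contribute in proportion to their amplitude."""
    cross = sum(a * b.conjugate() for a, b in zip(x, y))
    power_x = sum(abs(a) ** 2 for a in x)
    power_y = sum(abs(b) ** 2 for b in y)
    return abs(cross) / math.sqrt(power_x * power_y)


def phase_locking_value(x, y):
    """Phase-only metric: every trial is normalized to unit amplitude first."""
    unit = [cmath.exp(1j * (cmath.phase(a) - cmath.phase(b))) for a, b in zip(x, y)]
    return abs(sum(unit)) / len(unit)


# Toy data: one high-amplitude trial with aligned phases and two
# low-amplitude trials with scattered phase differences.
x = [cmath.rect(10.0, 0.0), cmath.rect(1.0, 1.0), cmath.rect(1.0, -2.0)]
y = [cmath.rect(10.0, 0.0), cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0)]

coh = coherence(x, y)
plv = phase_locking_value(x, y)
print(round(coh, 3), round(plv, 3))
```

In this toy case the high-amplitude trial dominates the coherence estimate, whereas the phase-only metric weights the two low-amplitude trials equally with it. This illustrates the trade-off raised in the quoted paragraph.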
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.10.08.511398v5
www.biorxiv.org www.biorxiv.org

Nonlinear feedback modulation contributes to the optimization of flexible decision-making

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This valuable study by Wu and Zhou combined neurophysiological recordings and computational modelling to investigate the neural mechanisms that underpin the interaction between sensory evaluation and action selection. The neurophysiological results suggest non-linear modulation of decision-related LIP activity by action selection, but some further analysis would be helpful in order to understand whether these results can be generalised to LIP circuitry or might be dependent on specific spatial task configurations. The authors present solid computational evidence that this might be due to projections from choice target representations. These results are of interest for neuroscientists investigating decision-making.
  
  Strengths:
  
  Wu and Zhou combine awake behaving neurophysiology for a sophisticated, flexible visual-motion discrimination task and a recurrent network model to disentangle the contribution of sensory evaluation and action selection to LIP firing patterns. The correct saccade response direction for preferred motion direction choices is randomly interleaved between contralateral and ipsilateral response targets, which allows the dissociation of perceptual choice from saccade direction.
  
  The neurophysiological recordings from area LIP indicate non-linear interaction between motion categorisation decisions and saccade choice direction.
  
  The careful investigation of a recurrent network model suggests that feedback from choice target representations to an earlier sensory evaluation stage might be the source for this non-linear modulation and that it is an important circuit component for behavioural performance.
  
  The paper presents a possible solution to a central controversy about the role of LIP in perceptual decision-making, but see below.
  
  Weaknesses:
  
  The paper presents a possible solution to a central controversy about the role of LIP in perceptual decision-making. However, the authors could be more clear and upfront about their interpretational framework and potential alternative interpretations.
  
  Centrally, the authors' model and experimental data appear to test only that LIP carries out sensory evaluation in its RFs. The model explicitly parks the representation of choice targets outside the "LIP" module receiving sensory input. The feedback from this separate target representation then provides the non-linear modulation that matches the neurophysiology. However, they ignore the neurophysiological results showing that LIP neurons can also represent motor planning to a saccade target.
  
  The neurophysiological results with a modulation of the direction tuning by choice direction (contralateral vs ipsilateral) are intriguing. However, the evaluation of the neurophysiological results is difficult, because some of the information necessary to exclude alternative explanations is missing. It would be good to see the actual distributions and sizes of the RFs, which were determined based on visual responses, not with a delayed saccade task. There might be, for example, a simple spatial configuration (RF and preferred choice target in the same, contralateral, hemifield) for which there is an increase in firing. It is a shame that we do not see what these neurons would do if only a choice target were put in the RF, as has been done in so many previous LIP experiments. The authors also exclude some spatial task configurations (vertical direction decisions), which makes it difficult to judge whether these data and models can be generalised. The whole section is difficult to follow, partly because it appears to mix reporting results with interpretation (e.g. "feedback").
  
  The model and its investigation are very interesting and thorough, but given the neurophysiological literature on LIP, it is not clear that the target module would need to be in a separate brain area; it could instead be local circuitry within LIP between different neuron types.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this manuscript, the authors recorded activity in the posterior parietal cortex (PPC) of monkeys performing a perceptual decision-making task. The monkeys were first shown two choice dots of two different colors. Then, they saw a random dot motion stimulus. They had to learn to categorize the direction of motion as referring to either the right or left dot. However, the rule was based on the color of the dot and not its location. So, the red dot could either be to the right or left, but the rule itself remained the same. It is known from past work that PPC neurons would code the learned categorization. Here, the authors showed that the categorization signal depended on whether the executed saccade was in the same hemifield as the recorded PPC neuron or in the opposite one. That is, if a neuron categorized the two motion directions such that it responded stronger for one than the other, then this differential motion direction coding effect was amplified if the subsequent choice saccade was in the same hemifield. The authors then built a computational RNN to replicate the results and make further tests by simulated "lesions".
  
  Strengths:
  
  Linking the results to RNN simulations and simulated lesions.
  
  Weaknesses:
  
  Potential interpretational issues due to a lack of evidence on what happens at the time of the saccades.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The neurophysiological results with a modulation of the direction tuning by choice direction are intriguing. However, the evaluation of the neurophysiological results is difficult because some of the information necessary to exclude alternative explanations is missing.
  
  We thank the reviewer for the helpful comments. We have addressed this point in detail in the following response.
  
  (a) Clearly state in the results how the response field "RF", where the stimulus was placed, was mapped. The methods describe an "MGS" task ("i.e., spatial selectivity during stimulus presentation and delay") rather than the standard delayed saccade task, and also state: "while for those neurons which did not show a clear RF during the MGS task, we presented motion stimuli in the positions (always in the visual field contralateral to the recorded hemisphere) in which neurons exhibited the strongest response to the motion stimuli." All this sounds more like a sensory receptive field, not an eye movement response field. What was the exact task and criterion?
  
  We agree with the reviewer that the original description of how we mapped the response fields (RFs) of LIP neurons lacked sufficient detail. In this study, we used the memory-guided saccade (MGS) task to map the RFs of all isolated LIP neurons. Both MGS and delayed saccade tasks are commonly used to map a neuron's response field in previous decision-making studies.
  
  In the MGS task, monkeys initially fixate on the center of the screen. Subsequently, a dot randomly flashes at one of the eight possible locations surrounding the fixation dot at an eccentricity of 8 degrees, requiring the monkeys to memorize the location of the flashed dot. After a delay of 1000 ms, the monkeys are instructed to saccade to the remembered location once the fixation dot disappears. The MGS task is a standard behavioral task for mapping visual, memory, and motor RFs, particularly in brain regions involved in eye movement planning and control, such as LIP, FEF, and the superior colliculus.
  
  We believe the reviewer's confusion may stem from whether we mapped the visual, memory, or motor RFs of LIP neurons in the current study, as these "RFs" are not always consistent across individual neurons. In our study, we primarily mapped the visual and memory RFs of each LIP neuron by analyzing their activity during both the target presentation and delay periods. To focus on sensory evaluation-related activity, we presented the visual motion stimulus within the visual-memory RF of each neuron. For neurons that did not show a significant visual-memory RF, we used a different approach: we tested the neurons with the main task by altering the spatial configuration of the task stimuli to identify the visual field that elicited the strongest response when the motion stimulus was presented within it. This approach was used to guide the placement of the stimulus during the recording sessions.
  
  Following the reviewer’s suggestion, we have added the following clarification to the results section to better describe how we mapped the RF of LIP neurons:
  
  ‘We used the memory-guided saccade (MGS) task, which is commonly employed in LIP studies, to map the receptive fields (RFs) of all isolated LIP neurons. Specifically, we mapped both the visual and memory RFs of each neuron by analyzing their activity during the target presentation and delay periods of the MGS task (see Methods).’.
  
  (b) l.85 / l126: What do you mean by "orthogonal to the axis of the neural RF" - was the RF shape asymmetric, if so how did you determine this? OR do you mean the motion direction axis? Please explain.
  
  We realized that the original description of this point may have been unclear and could lead to confusion. The axis of the neural RF refers to the line connecting the center of the RF (which coincides with the center of the motion stimulus) to the fixation dot. We have revised this sentence in the revised manuscript as follows:
  
  ‘To examine the neural activity related to the evaluation of stimulus motion, we presented the motion stimuli within the RF of each neuron, while positioning the saccade targets at locations orthogonal to the line connecting the center of the RF (which also marks the center of the motion stimulus) and the fixation dot.’
  
  (c) Behavioural task. Figure 1 - is this an example session? Please state this clearly. Can you show the examples (psychometric function and reaction times) separately for trials where the correct choice direction aligned with the motion preference (within 90 degrees) and those where it did not?
  
  Figure 1 shows the averaged behavioral results from all recording sessions. We have added this detail in the revised legend of Figure 1.
  
  We are uncertain about the reviewer’s reference to the “correct choice direction aligning with the motion preference,” as the term “motion preference” is specific to the neuronal response, which differs between the neurons recorded simultaneously with a multichannel recording probe.
  
  Nonetheless, following the reviewer’s suggestion, we divided the trials of one example single-channel recording session into two groups based on the relationship between the saccade direction and the preferred motion direction of the identified LIP neuron. Both the RT and the performance accuracy for this example session are shown in the following figure.
  
  Author response image 1.
  
  Give also the performance averaged across all sites included in this study and its range. If performance does differ for different configurations, please show that the main modulatory effect does not align with this distinction.
  
  To clarify this point, we have plotted performance accuracy and RTs for horizontal, oblique, and vertical target position configurations separately, which are shown for both monkeys in the following figures. We did not observe any systematic influences of task configurations on the monkeys' performance accuracy. While the RTs did differ across different configurations, we believe these differences are likely attributable to several factors, such as varying levels of familiarity introduced by our training process and the intrinsic RT difference between different saccade directions.
  
  Author response image 2.
  
  (d) Show the distribution of RF positions and the direction preferences for the recording sites included in the quantitative analysis of this study. (And if available, separately those excluded).
  
  Following the reviewer’s suggestion, we have plotted the centers of the RFs for all neurons with identifiable RFs, categorizing them by their preferred motion directions. To determine each neuron’s RF, we analyzed the average firing rates from both the target presentation and delay periods during each trial of the memory-guided saccade (MGS) task. The RF centers of neurons with significant RFs were determined through a two-step process. First, we selected neurons that exhibited significant RFs in the MGS task based on the following criteria: 1) there must be a significant activity difference between the eight target locations, and 2) the mean activity during the selected periods should be significantly greater than the baseline activity during the fixation period. Second, we fitted the activity data from the eight conditions to a Gaussian distribution, using the center of the fitted distribution as the RF center. A significant proportion of neurons from both monkeys that exhibited significant responses to motion stimuli did not exhibit notable RFs based on our current method. The following figures show the distributions of RFs and motion direction preferences for all LIP neurons with identifiable RFs, separately for each monkey. Since this is not the focus of the current study, we are not planning to include this result in the revised manuscript.
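The second step, fitting a Gaussian to the responses at the eight MGS targets, might look like the sketch below. It makes several assumptions that the response does not confirm: the targets are taken to lie at 45-degree steps on a single ring, so the Gaussian is fitted over target angle only; the fit is a simple grid search rather than whatever optimizer the authors used; and the significance tests of the first step are omitted. All function and variable names are hypothetical.

```python
import math

TARGET_ANGLES = [45 * k for k in range(8)]  # assumed target directions, in degrees


def gaussian_profile(mu, sigma):
    """Gaussian over the wrapped angular distance between each target and mu."""
    profile = []
    for theta in TARGET_ANGLES:
        d = (theta - mu + 180) % 360 - 180  # wrapped difference in [-180, 180)
        profile.append(math.exp(-d * d / (2 * sigma * sigma)))
    return profile


def fit_rf_direction(rates, sigmas=(20, 30, 40, 50, 60)):
    """Grid-search center and width; solve amplitude and baseline by least squares."""
    n = len(rates)
    mean_r = sum(rates) / n
    best = None
    for mu in range(360):
        for sigma in sigmas:
            g = gaussian_profile(mu, sigma)
            mean_g = sum(g) / n
            var_g = sum((gi - mean_g) ** 2 for gi in g)
            cov = sum((gi - mean_g) * (ri - mean_r) for gi, ri in zip(g, rates))
            amp = cov / var_g
            if amp <= 0:  # only accept excitatory (peaked) response fields
                continue
            base = mean_r - amp * mean_g
            sse = sum((base + amp * gi - ri) ** 2 for gi, ri in zip(g, rates))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma, amp, base)
    return None if best is None else best[1:]


# Synthetic firing rates (spikes/s) generated from a known tuning curve.
rates = [5 + 20 * g for g in gaussian_profile(60, 40)]
mu, sigma, amp, base = fit_rf_direction(rates)
print(mu, sigma)
```

With all targets at one eccentricity, the radial position and width of a two-dimensional Gaussian trade off against each other, which is why this sketch estimates only the RF direction.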
  
  Author response image 3.
  
  (e) Following on from d), was there a systematic relationship between RF position or direction preference and modulation by choice direction? For instance could the responses be simply explained by an increase in modulation for choices into the same (contralateral) hemifield as where the stimulus was placed?
  
  The reviewer raised a good point. To address whether there was a systematic relationship between RF position or direction preference and modulation by choice direction, we calculated a modulation index for each neuron to quantify the influence of saccade direction on neuronal responses to motion stimuli. We then plotted the modulation index against the RF position for each LIP neuron, as shown below:
  
  Author response image 4.
  
  As shown in the figures above, neurons with RFs farther from the horizontal meridian were more likely to exhibit stronger modulation by the saccade direction, while neurons with RFs closer to the horizontal meridian showed inconsistent and weaker modulation. This is because when the RF was on the horizontal meridian, saccade directions were aligned with the vertical axis (with no contralateral or ipsilateral directions). This is consistent with the finding in Figure S3: there were no significant differences in direction selectivity between the CT and IT conditions in the sessions where the saccade targets were aligned close to the vertical direction. Since fewer than half of the identified neurons showed clear receptive fields using our method, the figure above does not include all the neurons used in the analysis in the manuscript. Therefore, we chose not to include this figure in the revised manuscript.
  
  Additionally, we quantified the relationship between the modulation index and direction preference for neurons in sessions where the monkeys’ saccades were aligned to either horizontal or oblique directions. As shown in the following figure, no systematic relationship was found between direction preference and modulation by the choice direction for LIP neurons at the population level.
  
  Author response image 5.
  
  We have added this result as Figure S2 in the revised manuscript.
  
  Notably, the observed modulation by saccade direction of LIP neurons’ responses to motion stimuli cannot be simply explained by saccade direction selectivity. We presented two further pieces of evidence to rule out this possibility in the original manuscript. First, the modulation effect we observed was nonlinear; specifically, the firing rate of neurons increased for the preferred motion direction but decreased for the non-preferred motion direction (Figure 2i and Figure S1A-D). This phenomenon is unlikely to be attributed to a linear gain modulation driven by saccade directions. Second, we plotted the averaged neural activity for contralateral and ipsilateral saccade directions separately, and found that LIP neurons showed similar levels of activity for the two saccade directions (revised Figure 2L).
  
  Additionally, we added a paragraph in the Methods section to describe the way we calculated modulation index as follows:
  
  “We calculated a modulation index (MI) for each neuron to reflect the influence of saccade direction on the neuron’s response to visual stimuli. The modulation index is calculated as:
  
  [equation not preserved in this copy]
  
  where the first term [symbol not preserved] represents the average firing rate from 50 ms to 250 ms after sample onset for all contralateral saccade trials with the neuron’s preferred motion direction of the visual stimuli. The naming conventions are the same for the remaining three terms [symbols not preserved]. An MI value between 0 and 1 indicates higher modulation in contralateral saccade trials, and an MI value between -1 and 0 indicates higher modulation in ipsilateral saccade trials.”
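The expression for the index is not reproduced in this copy of the response, so the following is only a hypothetical sketch of one normalized contrast that matches the stated conventions (four mean firing rates, values bounded between -1 and 1, positive when modulation is stronger for contralateral saccades). The formula, the function name, and the example rates are assumptions and should not be read as the authors' definition.

```python
def modulation_index(ct_pref, ct_nonpref, it_pref, it_nonpref):
    """Assumed form: contrast of direction selectivity between saccade conditions.

    Each argument is a mean firing rate (spikes/s) in the 50-250 ms window
    after sample onset: contralateral (ct) or ipsilateral (it) saccade trials,
    preferred or non-preferred motion direction.
    """
    ds_ct = ct_pref - ct_nonpref  # direction selectivity, contralateral saccades
    ds_it = it_pref - it_nonpref  # direction selectivity, ipsilateral saccades
    denom = abs(ds_ct) + abs(ds_it)
    if denom == 0:
        return 0.0  # no direction selectivity in either condition
    return (ds_ct - ds_it) / denom


# Hypothetical neuron with stronger selectivity on contralateral-saccade trials.
mi = modulation_index(ct_pref=30.0, ct_nonpref=10.0, it_pref=22.0, it_nonpref=14.0)
print(round(mi, 3))
```

The absolute values in the denominator keep this assumed index within [-1, 1] even if a neuron's selectivity reverses sign in one condition.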
  
  Please split Figures 2G,H,I J,K, by whether the RF was located contralaterally or ipsilaterally. If there are only a small number of ipsilateral RFs, please show these examples, perhaps in an appendix.
  
  This is a reasonable suggestion; however, it is not applicable to our study. Among all the neurons included in our analysis, only one neuron from each monkey exhibited ipsilateral receptive fields (RFs). Therefore, we believe it may not be necessary to plot the result for this outlier.
  
  (f) Were the choice targets always equi-distant from the stimulus and at what distance was this? Please give quantitative details in methods.
  
  The reviewer is correct that the choice targets were always equidistant from the stimulus. The distance between the motion stimulus and each target was typically 12-15 degrees. We have added these details to the revised Methods section as follows:
  
  ‘Therefore, the two saccade targets were equidistant from the stimulus, with the distance typically ranging from 12 to 15 degrees.’
  
  (2) For Figure 3E, how do you explain that there is an upregulation for contralateral choices before the stimulus onset, i.e. before the animal can make a decision? Is this difference larger for error trials?
  
  This is a good question, which we have attempted to clarify in the revised manuscript. We believe that the observed upregulation in neural activity for contralateral choices may reflect the monkeys’ internal choice bias or expectation (choice between two motion directions) prior to stimulus presentation, which could influence their subsequent decisions. In Figure 3E, we calculated the r-decision to assess the correlation between the neuron’s direction selectivity and the monkeys’ decisions on motion stimuli, separately for contralateral and ipsilateral choice conditions. The increased r-decision during the pre-stimulus period indicates stronger neural activity for trials in which the monkeys later reported that the upcoming stimulus was in the preferred direction, and weaker activity for trials where the stimulus was judged to be in the non-preferred direction. This correlation was more pronounced for contralateral choices than for ipsilateral ones. It is important to note that while the monkeys cannot predict the upcoming stimulus direction with greater-than-chance accuracy, these results suggest that pre-stimulus neural activity in LIP is correlated with the monkeys’ eventual decision for that trial. Furthermore, LIP neural activity was more strongly correlated with the monkeys’ decisions in the contralateral choice condition than in the ipsilateral one.
  
  Additionally, we clarify that the r-decision was calculated using both correct and error trials. When comparing Figure 2J with Figure 2K, the correlation between neural activity and the monkeys’ upcoming decision during the pre-stimulus period was most prominent in low- and zero-coherence trials, where the monkeys either made more errors or based decisions on guesswork. We infer that the monkeys' confidence in these decisions was likely lower compared to high-coherence trials. Thus, the decision process appears to be influenced by pre-stimulus neural activity, particularly in low-coherence and zero-coherence trials.
  
  Although it is unclear precisely what covert process this pre-stimulus activity reflects, similar patterns of choice-predictive pre-stimulus activity have been observed in LIP and other brain areas (Shadlen, M.N. and Newsome, W.T., 2001; Coe, B. et al., 2002; Basso, M.A. and Wurtz, R.H., 1998; Williams, Z.M. et al., 2003). We have clarified this point in the revised manuscript, including a revision of the relevant sentence in the Results section, as follows:
  
  “Furthermore, we used partial correlation analysis to examine decision- and stimulus-related components of DS (i.e., r-decision and r-stimulus, Figure 3E and 3F) using all four coherence levels. The decision-related component of LIP DS was significantly greater in the CT condition than in the IT condition (Figure 3E; nested ANOVA: P = 1.07e-6, F = 25.72), and this difference emerged even before motion stimulus onset. This suggests that the LIP DS was more closely correlated with monkeys’ decisions in the CT condition than in the IT condition. The upregulation in r-decision for contralateral choices may reflect the monkeys’ internal choice bias or expectation (choice between two motion directions) prior to stimulus presentation, which could influence their subsequent decisions more in the CT condition.”
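As an illustration of how a partial correlation separates decision-related from stimulus-related components, here is a minimal sketch using the standard first-order partial correlation formula. It assumes trial-wise firing rates, stimulus directions, and choices coded as +1/-1; the authors' exact variables and time windows are not specified here, and the data below are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented trials: +1 = preferred direction, -1 = non-preferred direction.
stimulus = [1, 1, 1, 1, -1, -1, -1, -1]
choice = [1, 1, 1, -1, -1, -1, -1, 1]      # two error trials
rate = [16, 14, 15, 5, 5, 4, 6, 15]        # firing follows the choice here

r_decision = partial_corr(rate, choice, stimulus)  # choice effect, stimulus held fixed
r_stimulus = partial_corr(rate, stimulus, choice)  # stimulus effect, choice held fixed
print(round(r_decision, 3), round(r_stimulus, 3))
```

Because the invented firing rates follow the animal's choice even on error trials, the decision-related component is high while the stimulus-related component is close to zero.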
  
  (3) Figure 2K: what is the very large condition-independent contribution? It almost seems as most of what these neurons code for is neither saccade or motion related.
  
  The condition-independent contribution is the time-dependent component that is unrelated to saccade, motion, or their interaction. Our findings are consistent with previous methodological studies, where this time-dependent component was shown to account for a significant portion of the variance in population activity (Kobak, D. et al., 2016).
  
  (4) Abstract:
  
  a) "We found that the PPC activity related to monkeys' abstract decisions about visual stimuli was nonlinearly modulated by monkeys' following saccade choices directing outside each neuron's response field."
  
  This sentence is not clear/precise in two regards:
  
  Should "directing" be "directed"?
  
  Also, it is not just saccades directed outside the RF, but towards the contralateral hemifield.
  
  We thank the reviewer for the suggestion. We agree that ‘directing’ should be ‘directed’ and revised it accordingly. However, we do not believe that ‘directed outside each neuron's response field’ should be replaced with “towards the contralateral hemifield”. There are two major reasons. First, the modulation effect was identified as the difference between contralateral and ipsilateral saccade directions. We cannot conclude that the modulation mainly happened in the contralateral saccade direction. Second, we used ‘directed outside each neuron's response field’ to emphasize that this modulation cannot be simply explained by saccade direction selectivity, whereas ‘towards the contralateral hemifield’ cannot fulfill this purpose.
  
  (b) " Recurrent neural network modeling indicated that the feedback connections, matching the learned stimuli-response associations during the task, mediated such feedback modulation."
  
  - should be "that feedback connection .... might mediate". A model can only ever give a possible explanation.
  
  Thanks again for the help with the writing! We have revised this sentence as follows: “Recurrent neural network modeling indicated that the feedback connections, matching the learned stimuli-response associations during the task, might mediate such feedback modulation.”
  
  (c) "thereby increasing the consistency of flexible decisions." I am not sure what is really meant by increasing the consistency of flexible decisions? More correct or more the same?
  
  We apologize for the confusion. In the manuscript, "decision consistency" refers to the degree of agreement in the model's decisions under specific conditions. A higher decision consistency indicates that the model is more likely to produce the same choice whenever it encounters a stimulus in that condition. We have incorporated your suggestion and revised this sentence to “thereby increasing the reliability of flexible decisions”. We have also clarified the definition of consistency in the main text as follows:
  
  “These disrupted patterns of saccade DS observed in the target module following projection-specific inactivation aligned with the decreased decision consistency of RNNs, where decision consistency reflects the degree of agreement in the model's choices under specific task conditions. This suggests a diminished reliance on sensory input and an increased dependence on internal noise in the decision-making process.”
  
  (5) Results: headers should be changed to reflect the actual results, not the interpretation:
  
  "Nonlinear feedback modulation of saccade choice on visual motion selectivity in LIP"
  
  "Feedback modulation specifically impacted the decision-correlated activity in LIP"
  
  These first parts of the results describe neurophysiological modulations of LIP activity, the source cannot be known from the presented data alone. I thought that this feedback is suggested by the modelling results in the last part of the results. It is confusing to the reader that the titles already refer to the source of the modulation as "feedback". The titles should more accurately describe what is found, not pre-judge the interpretation.
  
  We thank the reviewer for those valuable suggestions. We have updated the subtitles to: “Nonlinear modulation of saccade choice on visual motion selectivity in LIP” and “Decision-correlated but not stimulus-correlated activity was modulated in LIP.”
  
  (6) page 8, l366-380. Can you link the statements more directly to panels in Figure 6. For Figure 6H-K, it needs to be clarified that the headers for 6D-G also apply to H-K.
  
  We have added headers for Figure 6H-K in the revised version, and revised the corresponding results section as follows.
  
  ‘We further examined how the energy landscape in the 1-D subspace changed in relation to task difficulty (motion coherence). Consistent with prior findings, trials with lower decision consistency (trials using lower motion coherence) exhibited shallower attractor basins at the time of decision for all types of RNNs (Fig. 6H-K). However, both the depth and the positional separation of attractor basins in the network dynamics significantly decreased for all non-zero motion coherence levels after the ablation of all feedback connections (comparing Figure 6I with Figure 6H; P(depth) = 5.20e-25, F = 122.80; P(position) = 1.82e-27, F = 137.75; two-way ANOVA). Notably, this reduction in basin depth and separation was more pronounced in the specific group compared to the nonspecific groups after ablating the feedback connections (comparing Figure 6J with Figure 6K; P(depth) = 2.65e-13, F =57.35; P(position) = 3.73e-14, F = 61.79; two-way ANOVA). These results might underlie the computational mechanisms that explain the observed reduction in the decision consistency of RNNs following projection-specific inactivation: the shallower and closer attractor basins after ablating feedback connections resulted in less consistent decisions. This happened because the variability in neural activity made it more likely for population activity to stochastically shift out of the shallower basins and into nearby alternative ones.’
  
  (7) line 556-557: Please provide a reference or data for the assertion that nearby recording sites in LIP (100 microns apart) have similar RFs.
  
  The reviewer raised an interesting question that we are unable to address in depth with the current data, as we lack information on the specific cortical location for each recording session. In the original manuscript, we suggested that nearby recording sites in LIP have similar receptive fields (RFs), based on both our own experience with LIP recordings and previous studies. Specifically, we observed that neurons recorded within a single penetration using a single-channel electrode typically exhibited similar RFs. Similarly, the majority of neurons recorded from the same multichannel linear probe within a single session also showed comparable RFs. Additionally, several studies (both electrophysiological and fMRI) have reported topographic organization of RFs in LIP (Gaurav H. Patel et al., 2010; S. Ben Hamed et al., 2001; Gene J. Blatt et al., 1990).
  
  (8) Line 568, Methods: a response criterion of a maximum firing rate of 2 spikes/s seems very low, especially for LIP. How do the results change if this lifted to something more realistic like 5 spikes/s or 10 spikes/s?
  
  We chose this criterion to ensure we included as many neurons as possible in our analysis. To further clarify, we have plotted the distribution of maximum firing rates across all neurons. Based on our findings, raising this criterion to the suggested values is unlikely to affect the results, as the majority of neurons exhibit maximum firing rates well above 5 spikes/s, and many exceed 10 spikes/s. We hope this explanation addresses the concern.
  
  Author response image 6.
  
  Reviewer #2 (Recommendations For The Authors):
  
  In this manuscript, the authors recorded activity in the posterior parietal cortex (PPC) of monkeys performing a perceptual decision-making task. The monkeys were first shown two choice dots of two different colors. Then, they saw a random dot motion stimulus. They had to learn to categorize the direction of motion as referring to either the right or left dot. However, the rule was based on the color of the dot and not its location. So, the red dot could either be to the right or left, but the rule itself remained the same. It is known from past work that PPC neurons would code the learned categorization. Here, the authors showed that the categorization signal depended on whether the executed saccade was in the same hemifield as the recorded PPC neuron or in the opposite one. That is, if a neuron categorized the two motion directions such that it responded stronger for one than the other, then this differential motion direction coding effect was amplified if the subsequent choice saccade was in the same hemifield. The authors then built a computational RNN to replicate the results and make further tests by simulated "lesions".
  
  The data are generally interesting, and the manuscript is generally well written (but see some specific comments below on where I was confused). However, I'm still not sure about the conclusions. The way the experiment is setup, the "contra" saccade target is essentially in the same hemifield as the motion patch stimulus. Given that the RF's can be quite large, isn't it important to try to check whether the saccade itself contributed to the effects? i.e. if the RF is on the left side, and the "contra" saccade is to the left, then even if it is orthogonal to the location of the stimulus motion patch itself, couldn't the saccade still be part of a residual edge of the RF? This could potentially contribute to elevating the firing rate on the preferred motion direction trials. I think it would help to align the data on saccade onset to see what happens. It would also help to have fully mapped the neurons' movement fields by asking the monkeys to generate saccades to all screen locations in the monitor. The authors mention briefly that they used a memory-guided saccade task to map RF's, but it is also important to map with a visual target. And, in any case, it would be important to show the mapping results aligned on saccade onset.
  
  Another comment is that the authors might want to mention this other recent related paper by the Pack group: https://www.biorxiv.org/content/10.1101/2023.08.03.551852v2.full.pdf
  
  We thank the reviewer for the comments and realize that we did not explain our results clearly in the original manuscript. We agree with the reviewer that saccade direction selectivity might be a confounding factor for the modulation of LIP neurons’ responses to visual motion stimuli by saccade choice direction, because the RFs of LIP neurons might be large and the saccade target might fall within the edge of the RFs. However, we believe that the observed modulation of LIP neurons’ responses to motion stimuli by saccade direction cannot be simply explained by saccade direction selectivity. We present several pieces of evidence to rule out this possibility. First, the modulation effect we observed was not linear; specifically, the firing rate of neurons increased for the preferred motion direction but decreased for the non-preferred motion direction (Figure 2I and Figure S1A-D). This phenomenon is unlikely to be attributable to a linear gain modulation driven by saccade directions. Second, we plotted the averaged neural activity for contralateral and ipsilateral saccade directions separately, aligned the activity to either motion stimulus onset or saccade onset, and found that LIP neurons showed similar levels of activity for the contralateral and ipsilateral directions (revised Figure 2L), which is not consistent with obvious saccade direction selectivity.
  
  To better control for this confound, we have added figures plotting the mean neural activity aligned to saccade onset for both contralateral and ipsilateral saccades, which are now included in the revised main Figure 2. These figures are presented in the detailed response below. Additionally, we have revised the corresponding results section to clarify our points, as outlined below:
  
  “Figure 2A-2F shows three example LIP neurons that exhibited significant motion coherence correlated DS. Surprisingly, LIP neurons showed greater DS in the CT condition than in the IT condition, even though the same motion stimuli were used in the same spatial location for both conditions. The averaged population activity showed this DS difference between CT and IT conditions for all four coherence levels (Figure 2G, 2H). During presentation of their preferred motion direction, LIP neurons showed significantly elevated activity in the CT relative to the IT at all coherence levels (Figure S1A, S1B, nested ANOVA: P(high) = 0.0326, F = 4.65; P(medium) = 0.0088, F = 7.03; P(low) = 0.0076, F = 7.32; P(zero) = 0.0124, F = 6.4), and a trend toward lower activity to the nonpreferred direction for CT vs. IT (Figure S1C, S1D, nested ANOVA: P(high) = 0.0994, F = 2.75; P(medium) = 0.0649, F = 3.12; P(low) = 0.0311, F = 4.73; P(zero) = 0.0273, F = 4.96). Most of the LIP neurons (48 of 83) showed such opposing trends in activity modulation between the preferred and nonpreferred directions (Figure 2I). These results indicated a nonlinear modulation of saccade choice on motion DS in LIP, aligned precisely with the response property of each neuron. This is unlikely to be driven by a linear gain modulation of saccade direction selectivity. Receiver operating characteristic (ROC) analysis further confirmed significantly greater motion DS in the CT condition than in the IT condition (Figure 2J and 2K; nested ANOVA: P(high) = 5.0e-4, F = 12.44; P(medium) = 9.53e-6, F = 20.91; P(low) = 9.33e-7, F = 26.03; P(zero) = 2.56e-8, F = 34.3). Such DS differences were observed even before stimulus onset. Moreover, LIP neurons exhibited similar levels of mean activity between different saccade directions (CT vs. IT) before monkeys’ saccade choice (Figure 2L), further supporting that saccade direction selectivity did not significantly contribute to the observed modulation of LIP neurons’ responses to motion stimuli.”
  
  We also thank the reviewer for pointing out that this relevant study was missing. We have added the suggested reference to the revised discussion section as follows:
  
  ‘A recent study demonstrated that neurons in the middle temporal area responded more strongly to motion stimuli when monkeys saccaded toward their RFs in a standard decision task with a fixed mapping between motion stimuli and saccade directions. This modulation emerged through the training process and contributed causally to the monkeys' following saccade choices. Consistently, we found that the response of LIP neurons to motion stimuli was more strongly correlated with the monkeys' decisions in the CT condition (saccades toward RFs) than in the IT condition, in a more flexible decision task. Together, these results suggest that the modulation of action selection on sensory processing may be a general process in perceptual decision-making. However, the observed modulation of saccade direction on LIP neurons' responses to motion stimuli cannot be simply explained by saccade direction selectivity. Several lines of evidence argue against this possibility. First, the modulation effect was nonlinear; specifically, neuronal firing rates increased for preferred motion directions but decreased for non-preferred directions (Figure 2I and Figure S1). This pattern is unlikely to be driven by a linear gain modulation based on saccade directions. Second, we found that LIP neurons exhibited similar levels of activity in both the CT and IT conditions (Figure 2L), which is inconsistent with the presence of clear saccade direction selectivity.’
  
  Some more specific comments are below:
  
  - I had a bit of a hard time with the abstract. It does not appear to be crystal clear to me, and it is the first thing that I am reading after the title. For example, if there is a claim about both perceptual decision-making and later target selection, then I feel that the task should be explained a bit more clearly than saying "flexible decision" task. Also, "..modulated by monkeys' following saccade choices directing outside each neuron's response field" was hard to read. It needs to be rewritten. Maybe just say "...modulated by the subsequent eye movement choices, even when these eye movement choices always directed the eyes away from the recorded neuron's response field". Also, I don't fully understand what "selectivity-specific feedback" means. Then, the concept of "consistency" in flexible decisions is brought up, again without much context. The above are examples of why I had a hard time with the abstract.
  
  We realize that our original statement may have been unclear and potentially caused confusion for the readers. Following the reviewer’s suggestions, we have revised the abstract as follows:
  
  ‘Neural activity in the primate brain correlates with both sensory evaluation and action selection aspects of decision-making. However, the intricate interaction between these distinct neural processes and their impact on decision behaviors remains unexplored. Here, we examined the interplay of these decision processes in posterior parietal cortex (PPC) when monkeys performed a flexible decision task, in which they chose between two color targets based on a visual motion stimulus. We found that the PPC activity related to monkeys’ abstract decisions about visual stimuli was nonlinearly modulated by their subsequent saccade choices, which were directed outside each neuron’s response field. Recurrent neural network modeling indicated that the feedback connections, matching the learned stimuli-response associations during the task, might mediate such feedback modulation. Further analysis on network dynamics revealed that selectivity-specific feedback connectivity intensified the attractor basins of population activity underlying saccade choices, thereby increasing the reliability of flexible decisions. These results highlight an iterative computation between different decision processes, mediated primarily by precise feedback connectivity, contributing to the optimization of flexible decision-making.’
  
  Specifically, selectivity-specific feedback refers to the feedback connections with positive or negative weights between selectivity-matched and selectivity-nonmatched unit pairs, respectively.
  
  Regarding "decision consistency," we define it as the degree to which the model’s decisions remain congruent under specific conditions. A higher level of decision consistency indicates that the model is more likely to produce the same choice each time it is presented with a stimulus under those conditions; in other words, it reflects decision reliability. We have revised the corresponding results section to make these concepts clearer.
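  The verbal definition above can be made concrete with a small sketch. This is only one possible way to quantify "agreement across repeated trials" (the fraction of trials matching the most frequent choice); the exact metric used in the manuscript is not stated here, and the function name `decision_consistency` and the example choice labels are hypothetical.

```python
from collections import Counter


def decision_consistency(choices):
    """Fraction of repeated trials that agree with the most common choice.

    `choices` is a list of the model's decisions for repeated presentations
    of the same stimulus under the same task condition (assumed format).
    """
    if not choices:
        raise ValueError("need at least one trial")
    # Count of the single most frequent choice among the repeats.
    most_common_count = Counter(choices).most_common(1)[0][1]
    return most_common_count / len(choices)


# Illustrative inputs only: a mostly stable condition and a chance-level one.
stable = decision_consistency(["left"] * 9 + ["right"])
unstable = decision_consistency(["left", "right"] * 5)
print(stable, unstable)
```

  Under this assumed measure, a two-choice model that responds at random sits near 0.5, and a model that always gives the same answer to the same condition reaches 1, regardless of whether that answer is correct.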
  
  - Line 69: I'm not fully sure, but I think that some people might suggest that superior colliculus is also involved in the sensory aspect of the evaluation. But, I guess the sentence itself is correct as you write it. So, I don't think anyone should argue with it. However, if someone does argue with it, then they would flag the next sentence, since if the colliculus does both, then do the sensory and motor parts really employ distinct neural processes? Anyway, I think this is very minor.
  
  This is an interesting point. We have also noticed a recent study that demonstrates that the superior colliculus is causally involved in the sensory aspect of decision-making, specifically in visual categorization. However, the study also distinguishes between neural activity related to categorical decisions and that related to saccade planning. This suggests that the sensory and motor aspects of decision-making likely involve distinct neural processing, even within the same brain region—potentially reflecting separate populations of neurons. Therefore, we stand by our statement in the ‘next sentence’.
  
  - Line 79-80: you might want to look at this work because I feel that it is relevant to cite here: https://www.biorxiv.org/content/10.1101/2023.08.03.551852v2
  
  We have discussed this reference in the revised discussion section of the manuscript; please refer to the response above.
  
  - For a result like that shown in Fig. 2, I feel that it is important to show RF mapping with a saccade task alone. i.e. for the same neurons, have a monkey make a delayed visually guided saccade task to all possible locations on the display, and demonstrate that there is no modulation by saccades to the targets. Otherwise, the result in Fig. 2 could reflect first an onset response by a motion, and then the saccade-related response that would happen anyway, even without the decision task. So, I feel that now, it is not entirely clear whether the result reflects this so-called feedback modulation, or whether simply planning the saccade to the target itself activates the neurons. With large RF's, this is a distinct possibility in my opinion.
  
  - Line 174: this would also be predicted if the neurons were responding based on the saccade target plan independent of the motion stimulus
  
  - On a related note, I would recommend plotting all data also aligned on saccade onset. This can help establish what the cause of the effects described is
  
  We understand the reviewer’s concern that the modulation might be related to saccade planning, and we acknowledge that the original manuscript did not adequately address this potential confound. Unfortunately, we did not map the LIP neurons' receptive fields (RFs) using a saccade-only task. However, as mentioned earlier, we believe that the modulation of LIP neurons' responses to motion stimuli based on saccade choice direction cannot be simply attributed to saccade direction selectivity. Several lines of evidence support this conclusion. First, the modulation we observed was nonlinear: the firing rate of neurons increased for the preferred motion direction but decreased for the non-preferred motion direction (Figure 2I and Figure S1A-D). This pattern is inconsistent with a simple linear gain modulation driven by saccade direction selectivity. Second, we directly compared LIP neuronal activity for contralateral and ipsilateral target conditions, and found no significant differences between the two. This suggests that saccade direction selectivity is unlikely to be the primary contributor to the observed modulation. In the revised figure, we added a plot (Figure 2L) that aligns neural activity to saccade onset, in addition to the original alignment to motion stimulus onset (Figure S1E). This new analysis further supports our interpretation.
  
  Author response image 7.
  
  - Even when reading the simulation results, I'm still not 100% sure I understand what is meant by this idea of "consistency" of flexible decision-making
  
  We have addressed this issue in response to a previous comment; please refer to the response above.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.15.549136v3
www.biorxiv.org www.biorxiv.org

New submission 11/10/2023, 08:41:32

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We would like to thank you and the two reviewers for their constructive feedback on our manuscript entitled: "Substrate evaporation drives collective construction in termites".
  
  Here, we submit a revised version in which, we believe, we fill in the missing details identified by the reviewers and clarify the presentation of our results.
  
  From the eLife assessment we can identify a few main points that the reviewers found unclear or not well developed in our previous manuscript:
  
  • Insufficient details about computer simulation models. Is the match between simulations and experiments qualitative or quantitative?
  
  • Request for clarifications related to the wall stimulus: is evaporation stronger at the high-curvature wall corners or similar along all the wall edge? Why is there less consistency in the experimental results with the wall stimulus, with a minority of wall experiments in which something different happens?
  
  • Quantitative estimation of the humidity gradients in our experimental setup.
  
  • "Confirmation" that termites can sense humidity gradients of magnitude and scale comparable with those encountered in our experiments.
  
  • Request for additional background information about the considered termite species and their construction habits.
  
  The reviewers also made a number of interesting suggestions and other comments:
  
  • Suggestion of possible explanations and interpretations for a purported discrepancy with a previous work by Calovi and collaborators.
  
  • Suggestion of alternative experimental approaches (array of probes, alternative experimental setups).
  
  We address all these points below.
  
  Details about computer simulation models
  
  There are two different types of computer simulations in our study: (1) simulations of evaporation on the initial structure, and (2) simulations of structure growth based on curvature.
  
  1) Simulations of evaporation. We recall that these simulations rely on the hypothesis that humidity transport is diffusive, that is, the evaporation rate is proportional to the humidity gradient. New details on the implementation of these diffusive simulations have been added in section S.VI. We also adapted Figures 4A and 4B, which are now expressed in units more comparable to the expected humidity field in experiments. Essentially, we show that the model underestimates the absolute magnitude of the humidity gradient |∇ℎ| in our setup, while it correctly predicts the relative importance of the same field across the topography.
  
  First, it is instructive to report the value of |∇ℎ| predicted by diffusive simulations with the bottom boundary at 100% humidity (like the clay disk), and the top boundary of the simulation box at 70%, like our experimental room. Note that, at a given temperature, relative humidity and absolute humidity are proportional, so we will assume here that temperature is constant and always refer to relative humidity. Thus, the humidity gradient will be measured in mm⁻¹, exactly like curvature. One then has:
  
  • flat disk, |∇ℎ| ∼ 0.01 mm⁻¹
  
  • wall tips, |∇ℎ| ∼ 0.13 mm⁻¹
  
  • wall top edge, |∇ℎ| ∼ 0.1 mm⁻¹
  
  • pillar tips, |∇ℎ| ∼ 0.19 mm⁻¹
  
  First, we remark that the value of |∇ℎ| on the flat portion of the disk is about 10 times smaller than the estimate |∇ℎ|0 ∼ 0.5 mm⁻¹ of the same quantity in our experiments, which is now given in the manuscript and discussed in a specific paragraph below. This discrepancy arises because our simulations overestimate the size of the diffusive region (i.e. the simulation box), which is set to 18 mm, whereas we expect the diffusive layer to be much thinner (i.e. 𝛿 ∼ 2 mm). Note also that, as in all diffusive problems, the humidity gradient at any point of the bottom boundary (i.e. on the clay surface) depends on the distance of that point from the top boundary: the closer the boundaries, the stronger the gradient. This is a very general feature of diffusive problems: the gradient of the diffusing field depends on the distance from the boundaries, where the value of the field is prescribed. In principle, the size of the simulation box affects not only the overall magnitude of the humidity gradient but also its shape. However, in our simulations the topographic cues are only 30% closer to the top boundary than the flat bottom surface, yet the local gradient is 10 to 20 times larger. This suggests that the ‘curvature’ effect is much stronger than the ‘distance’ effect, and supports the view that our approximation does not significantly affect the estimate of the relative importance of the humidity gradient at the bottom surface. We therefore conclude that our diffusive simulations do not provide a correct estimate of the order of magnitude of |∇ℎ|, but capture its relative variations across the topography well.
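  The relative comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it normalises the four simulated gradient values quoted in this response by the flat-disk value, and it uses a flat, one-dimensional steady-state diffusion estimate (an assumption made here for simplicity; the actual simulations include the topography) to show how the gap between the two boundaries sets the gradient. The names `linear_gradient` and `simulated` are hypothetical.

```python
# Relative humidity is treated as a dimensionless fraction (1.00 = 100%),
# so all gradients are expressed in mm^-1.


def linear_gradient(h_bottom, h_top, gap_mm):
    """Gradient magnitude for steady 1-D diffusion between two flat boundaries.

    With fixed humidity on both boundaries, the steady-state profile is
    linear, so the gradient is simply the humidity difference over the gap.
    """
    if gap_mm <= 0:
        raise ValueError("gap must be positive")
    return abs(h_bottom - h_top) / gap_mm


# Simulated gradient magnitudes quoted in the response (mm^-1).
simulated = {
    "flat disk": 0.01,
    "wall tips": 0.13,
    "wall top edge": 0.10,
    "pillar tips": 0.19,
}

# Gradient at each location relative to the flat disk.
ratios = {name: value / simulated["flat disk"] for name, value in simulated.items()}

# Same humidity difference (100% at the clay, 70% in the room), two gap sizes:
box_gradient = linear_gradient(1.00, 0.70, 18.0)   # simulation box height
layer_gradient = linear_gradient(1.00, 0.70, 2.0)  # expected boundary-layer thickness

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x flat disk")
print(f"flat-plate gradient, 18 mm gap: {box_gradient:.3f} mm^-1")
print(f"flat-plate gradient, 2 mm gap: {layer_gradient:.3f} mm^-1")
```

  The flat-plate formula only illustrates the ‘distance’ effect; the much larger ratios at the wall and pillar tips come from the topography itself, which this one-dimensional estimate does not model.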
  
  2) Structure growth based on curvature. As observed by the reviewer, the dynamical simulations included here refer to a model that was developed in a previous study, so we chose not to include all the details of the simulations in the present one. At this stage, that model is still phenomenological: for example, we cannot provide a physical estimate of the dimensionless parameter 𝑑, which controls the typical size of the structures produced by simulations of the model. Thus, in principle, the comparison with real experiments can only be "qualitative". Indeed, pushing such a comparison further is not necessarily of interest, given the minimal, mean-field character of our model and the extreme complexity of the natural system studied here. However, our experimental setup was specifically designed to overcome this limit, by using topographies in which the curvature cues were modulated in an almost discrete way, with flat regions and regions where curvature is strong ‘for termites’, i.e. where the radius of curvature is of the order of termite body size. Our experimental results support this choice, because deposition patterns also show an almost ‘discrete’ shape, with specific regions attracting most of the deposition actions. We therefore claim that the agreement is significant, and we suggest that when stimuli and response both behave in a quasi-discrete manner, the difference between qualitative and quantitative is not well defined. Finally, we recall that throughout the discussion above curvature and humidity gradient can be interchanged, as we already pointed out in the manuscript. Consistently, the humidity gradient shows a strong variation between the curved regions and the flat ones.
  
  Results with the wall stimulus
  
  One important point emerging from the reviews is that we did not clearly present the results with the wall stimulus. These concerns are best summarized by a comment from reviewer 2, who states: “evaporation rates seem inconclusive in the wall geometry, yet the termites still deposit material at the high-curvature wall corners”.
  
  We acknowledge that the interpretation of the results of experiments with the wall stimulus must address three key points: 1- Salt deposition experiments are inconclusive in showing variation of the evaporation rate across the top of the wall; 2- A portion (4/11) of termite experiments do not show a clear pellet deposition pattern by termites; 3- Conversely, in the remaining portion (7/11), most experiments still show a clear pellet deposition on the corners of the wall, in spite of small differences in evaporation between the corners and the top edge (as in our Fig. 3B). These points are now addressed in the manuscript and discussed below.
  
  The variation of the humidity gradient between the corners of the wall and the wall’s top edge is relatively small, while both are regions of relatively high curvature and higher evaporation compared to the flat surface of the clay disk. We now report precise values of the humidity gradient from numerical simulations, as discussed above. These indicate that the humidity gradient at the wall corners and upper edge is respectively 10 and 7 times larger than on the flat bottom, but evaporation at the wall tips is only about 0.3 times (30%) larger than on the wall upper edge.
  
  Experiments with the saline solution qualitatively confirm the same result of an evaporation pattern more evenly distributed on the wall stimulus (point 1) than on the pillars.
  
  Taken together, these results might explain why not all wall experiments end up with depositions at the tips (point 2): simply, in the wall experiments the relative importance of the deposition cue between tips and wall upper edge is not high enough to always guide termite behavior in a deterministic way.
  
  But we should also point to the fact that the evaporation simulations presented in figure 4 and the experiments with the saline solution both reflect the humidity field on the clay templates before termite construction has started. As soon as termites start adding pellets to the wall, effectively starting to build a pillar, the humidity gradient will be reinforced at the locations of pellet deposition, and a self-reinforcing process is initiated, similar to our dynamical simulations based on local curvature. This explains why eventually termite activity can result in clear and localized depositions (point 3) also with the wall stimulus.
  
  Incidentally, we would like to include here another consideration: the nests of Coptotermes termites comprise a “scaffold” with multiple interconnected pillars. In other termite genera, the prevalent nest structure is made of surfaces rather than pillars (as in the nests of Nasutitermes, Apicotermes and Psammotermes, or in some fungus-growing structures in Macrotermes and Synacanthotermes). The fact that the wall stimulus has some potential to stimulate construction everywhere along its edge is intriguing, as it might provide some cues on the construction of different nest architectures.
  
  Quantitative estimation of the humidity gradient in our setup
  
  The moisture gradients in our experiments and simulations were only presented in a non-quantitative manner, because we were mainly interested in identifying locations of high and low evaporation. However, by combining scaling arguments already discussed in S.IX with the results of our evaporation simulations, one can produce a lower bound for the magnitude of the humidity gradient |∇ℎ|, predict its higher value at key positions in our setup, and compare it with the humidity variations experienced by termites in their natural environment. These considerations are now included in the manuscript and discussed below.
  
  First, we define a reference value |∇ℎ|0 for the humidity gradient on the (flat) clay disk, which can be estimated using the boundary layer thickness 𝛿 ∼ 2 mm (see section IX.A of the SI) and the variation of relative humidity Δℎ between the clay disk surface and the exterior, which was Δℎ = 30% (the difference between the fully wetted substrate and room air humidity at 70% saturation). Note that |∇ℎ|0 constitutes a lower boundary for the expected values of the humidity gradient in our setup, as confirmed by our experiments with saline solution. We can then write:
  
  |∇ℎ|0 ∼ Δℎ/𝛿 = 0.3/(2 mm) = 0.15 mm−1
  
  Next, the results of the diffusive simulations shown in figure 4A and 4B indicate that the humidity gradient at highly curved regions of the topographic cues is at least 10 times larger than |∇ℎ|0, which allows us to estimate an upper boundary for |∇ℎ| in our experimental setup, say |∇ℎ|𝑚𝑎𝑥 ∼ 1 mm−1.
  
  Humidity sensing capabilities of termites
  
  Our hypothesis that humidity gradients could guide termite building behavior implicitly assumes that termites can sense humidity gradients comparable with those existing in our experiments.
  
  Humidity is important to all termites because of their small size and unsclerotized body. Coptotermes termites in particular are wetwood termites that can only survive in high-humidity environments such as moist wood or soil. It is well documented that Coptotermes termites (like other termites and cockroaches) have humidity receptors in their antennae, and behavioral studies indicate that they can discriminate between chambers with different humidity content.
  
  For example, a study by Gautam and Henderson (2011, Environmental Entomology, 40:1232) provided chambers with different relative humidity and, after 12 hours, almost all termites were in the highest humidity chamber (98% RH), leaving the other chambers with 75% or less RH empty. These results (which are also similar to other results testing termite response to chambers with different soil moisture) indicate that, given a sufficient amount of time, termites can detect a difference of humidity from 75% to 98% over a spatial scale of centimeters.
  
  The quantitative estimation of the humidity gradient described above indicates that in our experimental setup termites can experience humidity variations of 15% over a distance of only 1 mm or even less, while the length of a single termite antenna is about 1.5 mm.
  
  In other words, the humidity gradients that we estimate for our experiments are well above those that termites were able to discriminate in previous experiments. Future experiments should aim to test the exact limits of resolution of the humidity-sensing ability of termites (e.g. in an environment where humidity is close to 100% everywhere), and the mechanisms by which they sense the gradient (e.g. by comparing information from the two antennae, or by integrating humidity information over time).
  
  By definition, |∇ℎ|0 corresponds to a variation of humidity between a fully saturated atmosphere (i.e. 100%), comparable to the nest interior, and a "humid" atmosphere (i.e. 70%) comparable to the natural environment where termites live (say the nest exterior), occurring over a distance (2mm) which is comparable with their body size.
  
  We can then conclude that even the lower boundary |∇ℎ|0 of the humidity gradient corresponds to a variation in atmosphere to which termites must be accustomed, i.e. nest interior vs nest exterior, happening across one body length. If we add that the upper boundary |∇ℎ|𝑚𝑎𝑥 is one order of magnitude higher, it appears extremely unlikely that they could not detect these gradients.
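  
  The arithmetic behind these bounds can be checked with a short script. This is a minimal sketch that uses only the values quoted above (Δℎ = 30%, 𝛿 ∼ 2 mm, a factor of at least 10 at highly curved regions, an antenna length of about 1.5 mm). All variable and function names are illustrative and do not come from the manuscript. The factor-of-10 product (1.5 mm−1) has the same order of magnitude as the rounded value of ∼1 mm−1 given in the text.

```python
# Values quoted in the response; all names are illustrative assumptions.
DELTA_H = 0.30            # humidity difference: saturated surface (100%) vs room air (70%)
BOUNDARY_LAYER_MM = 2.0   # boundary layer thickness delta, in mm
CURVATURE_FACTOR = 10.0   # "at least 10 times larger" at highly curved regions
ANTENNA_MM = 1.5          # approximate length of a single antenna, in mm


def reference_gradient(delta_h, thickness_mm):
    """Scaling estimate |grad h|_0 ~ delta_h / delta, in units of 1/mm."""
    return delta_h / thickness_mm


grad_0 = reference_gradient(DELTA_H, BOUNDARY_LAYER_MM)   # lower boundary
grad_curved = CURVATURE_FACTOR * grad_0                   # estimate at curved regions
drop_across_antenna = grad_0 * ANTENNA_MM                 # humidity change over one antenna

print(f"reference gradient: {grad_0:.2f} per mm")
print(f"gradient at curved regions: {grad_curved:.2f} per mm")
print(f"humidity change across one antenna: {drop_across_antenna:.3f}")
```

  Under these assumptions, even the lower-boundary gradient implies a humidity difference of more than 20 percentage points along the length of one antenna.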
  
  Additional background information about our considered termite species and their construction habits
  
  We have now added some details about the life history and nesting habits of termites in the Coptotermes genus in a new paragraph in section SI. Essentially, these are wetwood termites that nest in moist wood or soil, and their nests present a typical structure comprising a scaffold of interconnected pillars (we now show a picture of a typical structure from one of our lab-reared colonies).
  
  After the initial submission of our manuscript we have also obtained a more precise taxonomic identification of the termites we used, which indicated that our termites are better identified as Coptotermes gestroi than Coptotermes formosanus. The two species are extremely close and can also interbreed in the areas where they co-occur, but in this case C. gestroi is a better match. Hence, we have amended the name in the manuscript and in the supplementary material.
  
  Differences with previous results by Calovi and collaborators
  
  We believe that there is no real discrepancy between our results and those described by Calovi et al. (2019, Phil. Trans. Roy. Soc. B 374:20180374). What they measure, termite aggregation and activity, is similar to what we also observe in our experiments: termites aggregate in concave regions, such as at the base of the wall in our experiments, and they collect pellets at the locations that they visit more often. And, above all, we observe that concavities promote digging activity, which in turn promotes aggregation, as already observed in previous studies like Green et al. (2017, Proc. Roy. Soc. B 284:20162730). The main difference is that in our analyses we treat separately the three measurements of termite occupancy, pellet collection and pellet deposition, and in this way we identify a role of convexity for pellet deposition.
  
  It is possible that, apart from the differences in language and interpretations between our study and the study by Calovi, there were also real differences in termite building behavior between the two studies that we couldn’t fully appreciate from our own reading of the article by Calovi, but which the reviewer has spotted. The reviewer makes a very interesting suggestion that some of these differences might be due to the different humidity level used in our experiment, compared to the experiment by Calovi and collaborators. Room humidity was high, at around 70% in our experiments. The humidity in Calovi’s experiments was possibly even higher as they performed their experiments in a closed box, but we could not find precise reported information on the humidity level in their publication.
  
  Given that it is not clear that the building behavior in our experiments was qualitatively different from the building behavior in Calovi and collaborators’ experiments, and given that we don’t know the precise humidity value used in Calovi’s experiments (plus, we worked on different termite species that could have different sensitivity to humidity), we decided that, based on the information that we have, we could not meaningfully expand our discussion of similarities and differences with Calovi’s study in our manuscript.
  
  It is clear, though, and we completely agree with the referee on this point, that in light of Calovi’s and our own new results, it would now be extremely interesting if future experiments could characterize termite construction activity across a range of finely controlled air humidity values. Anecdotally, in preliminary experiments we did include some trials in which termites were hosted in a completely closed box, and we observed much reduced construction activity in those conditions. However, the fact that we could not easily track termite activity and pellet collections / depositions in those conditions (because of the box), together with the fact that the building activity itself was reduced, led us to converge towards the open arena experiments that we describe here.
  
  Suggestion of alternative experimental approaches
  
  One reviewer made interesting suggestions for alternative experiments, including using an array of humidity probes for measuring humidity, or a different experimental setup analogous to those used in previous experiments by Bardunias and collaborators. It is often the case that only at the end of a series of experiments do we identify an alternative, and possibly better, way of doing the same experiment. In future, if we have the opportunity to run other similar experiments again, we will likely experiment with these suggestions. When we first designed our own experiments, one of our priorities was to be able to film all termites in the arena at all times, so that potentially we could also study individual termite behavior and task specialization. This partly constrained the type of experimental setups that we could use.
  
  One aspect that clearly emerged from our work and from the revision process is that any future experiments related to this topic should achieve a very precise control of air humidity, and test a wider range of stimuli of more varied and controlled size, humidity and curvature. Since our own experiments were conducted, three of us have moved to different institutions, which imposes practical constraints for us on working on the same termites in a similar way, but the suggestions from the reviewers will be helpful as we are planning our future research.
  
  We hope that the explanations above and the details that we have changed in the manuscript itself have helped to clarify unclear aspects of our study.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.02.17.528984v4
www.biorxiv.org www.biorxiv.org

Robust variability of grid cell properties within individual grid modules enhances encoding of local space

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank the reviewers for their time and thoughtful comments. We believe that the further analyses suggested have made the results clearer and more robust. Below, we briefly highlight the key points addressed in the revision and the new evidence supporting them. Then, we address each reviewer’s critiques point-by-point.
  
  - Changes in variability with respect to time/experience
  
  Both reviewers #1 and #3 asked whether the variability in grid properties observed was dependent on time or experience. This is an important point, given that such a dependence on time could lead to interesting hypotheses about the underlying dynamics of the grid code. However, in the new analyses we performed, we do not observe changes in grid variability within a session (Fig S5 of the revised manuscript), suggesting that the grid variability seen is constant within the timescale of the data set.
  
  - The assumption of constant grid parameters in the literature
  
  Reviewer #2 pointed out that it had been appreciated by experimentalists that grid properties are variable within a module. We agree that we may have overstated the universality of this assumption in the original manuscript, and we have toned down the language in the revision. However, we note that many previous theoretical studies assumed these properties to be constant within a given module. We provide some examples below, and have added evidence of this assertion, with citations to the theoretical literature, to the revised manuscript.
  
  - Additional sources of variability
  
  Reviewer #3 pointed out additional sources that might explain the variability observed in the paper (beyond time and experience). These sources include field width, border location, and the impact of conjunctive cells. We have run additional analyses and have found no significant impact on the observed variability from any of these factors. We believe that these are important controls, and have added them to the manuscript (Fig S4-S7 of the revised manuscript).
  
  - Analysis of computational models
  
  Reviewer #3 noted that our results could be strengthened by performing similar analyses on the output of computational models of grid cells. This is a good idea. We have now measured the variability of grid properties in a recent normative recurrent neural network (RNN) model that develops grid cells when trained to perform path integration (Sorscher et al., 2019). This model has been shown to develop signatures of a 2D toroidal attractor (Sorscher et al., 2023) and achieves a high accuracy on a simple path integration task. Interestingly, the units with the greatest grid scores also exhibit a range of grid spacings and grid orientations (Fig S8 of the revised manuscript). Furthermore, by decreasing the amount of sparsity (through decreasing the weight decay regularization), we found an increase in the variability of the grid properties. This analysis demonstrates a heretofore unknown similarity between the RNN models trained to perform path integration and recorded grid cells from MEC. It additionally provides a framework for computational analysis of the emergence of grid property variability.
  
  Reviewer #1:
  
  (1) Is the variability in grid spacing and orientation that the authors found intrinsically organized or is it shaped by experience? Previous research has shown that grid representations can be modified through experience (e.g., Boccara et al., Science 2019). To understand the dynamics of the network, it would be important to investigate whether robust variability exists from the beginning of the task period (recording period) or whether variability emerges in an experience-dependent manner within a session.
  
  This is an interesting question that was not addressed in the paper. To test this, we performed additional analysis to resolve whether the variability changes across a session.
  
  Using a sliding window, we have measured changes in variability with respect to recording time (Fig S5A). To this end, we compute grid orientation and spacing over a time-window whose length is half the total length of the recording. From the population distribution of orientation and spacing values, we compute the standard deviation as a measure of variability. We repeat the same procedure, sliding the window forward until the variability for the second half of the recording is computed.
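  
  The sliding-window procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for Fig S5: the `estimate` callable is a hypothetical stand-in for the pipeline that computes a cell's grid orientation or spacing from the data in a time window, and the number of window positions (`n_windows`) is an assumption, since the step size is not given in the text. The sample standard deviation is used here; the text does not say whether the sample or population form was used.

```python
from statistics import stdev


def sliding_window_variability(cells, duration, estimate, n_windows=5):
    """Return (window_start, between-cell std) pairs for half-length windows.

    cells     : sequence of per-cell records (any type understood by `estimate`)
    duration  : total recording length, in seconds
    estimate  : hypothetical callable (cell, t_start, t_end) -> float, standing in
                for the rate-map and autocorrelogram pipeline
    n_windows : number of window positions (an assumption; not given in the text)
    """
    if n_windows < 2:
        raise ValueError("need at least two window positions")
    if len(cells) < 2:
        raise ValueError("need at least two cells to compute a standard deviation")

    window = duration / 2.0                       # window spans half the recording
    step = (duration - window) / (n_windows - 1)  # last window covers the second half

    results = []
    for i in range(n_windows):
        t_start = i * step
        t_end = t_start + window
        # One property value (orientation or spacing) per cell in this window.
        values = [estimate(cell, t_start, t_end) for cell in cells]
        # Spread of the population distribution = between-cell variability.
        results.append((t_start, stdev(values)))
    return results
```

  A flat trace of the returned standard deviations against window start time corresponds to the "no change in variability" outcome reported for recording R12.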
  
  We applied this approach to recording ID R12 (the same as in Figs 2-4) given that this recording session was significantly longer than the rest (nearly two hours). Results are shown in Fig S5B-C. For both orientation and spacing, no changes of variability with respect to time can be observed. Similar results were found for other modules (see caption of Fig S5 for statistics).
  
  We also note that the rats were already familiarized with the environment for 10-20 sessions prior to the recordings, so there may not be further learning during the period of the grid cell recordings. No changes in variability can be seen in Rat R across days (e.g., in Fig 5B R12 and R22 have similar distributions of variability). However, we note that it may be possible that there are changes in grid properties at time-scales greater than the recordings.
  
  (2) It is important to consider the optimal variability size. The larger the variability, the better it is for decoding. On the other hand, as the authors state in the Discussion, it is assumed that variability does not exist in the continuous attractor model. Although this study describes that it does not address how such variability fits the attractor theory, it would be better if more detailed ideas and suggestions were provided as to what direction the study could take to clarify the optimal size of variability.
  
  We appreciate this suggestion and agree that more discussion is warranted on how our results can be reconciled with previously observed attractor dynamics. To explore this, we studied the recurrent neural network (RNN) model from Sorscher et al. (2019), which develops grid responses when trained on path integration. This network has previously been found to develop signatures of toroidal topology (Sorscher et al., 2023), yet we find that its grid responses also contain heterogeneity in grid properties (Fig S8). By decreasing the strength of the weight decay regularization (which leads to denser connectivity in the recurrent layer), we find an increase in the grid property variability. Interestingly, decreasing the weight decay regularization has previously been found to lead to weaker grid responses and to a worse ability of the RNN to perform path integration in environments larger than the one it was trained on. This approach not only provides preliminary evidence for our claim that too much variability can lead to weaker continuous attractor structure, but also provides a modeling framework with which future work can explore this question in more detail. We have added discussion of this issue to the manuscript text (Discussion).
  
  Reviewer #2:
  
  (1) Even though theoreticians might have gotten the mistaken impression that grid cells are highly regular, this might be due to an overemphasis on regularity in a subset of papers. Most experimentalists working with grid cells know that many if not most grid cells show high variability of firing fields within a single neuron, though this analysis focuses on between neurons. In response to this comment, the reviewers should tone down and modify their statements about what are the current assumptions of the field (and if possible provide a short supplemental section with direct quotes from various papers that have made these assumptions).
  
  We agree that some experimentalists are aware of variability in the recorded grid response patterns and that this work may not come as a complete surprise to them. We have toned down our language in the Introduction, changing “our results challenge a long-held assumption” to “our results challenge a frequently made assumption in the theoretical literature”. Additionally, we have added a caveat that “experimentalists have been aware” of the observed variability in grid properties.
  
  We would like to emphasize that the lack of work carefully examining the robustness of this variability has prevented a firm understanding of whether this is an inherent property of grid cells or due to measurement noise. The impact of this can be seen in theoretical neuroscience work where a considerable number of articles (including recent publications) start with the assumption that all grid cells within a module have identical properties, with the exception of phase shift and noise. We have now cited a number of these papers in the Introduction, to provide specific references. To further illustrate the pervasiveness of this assumption being explicitly made in theoretical neuroscience, below we provide quotes from a few important papers:
  
  “Cells with a common spatial period also share a common grid orientation; their responses differ only by spatial translations, or different preferred firing phases, with respect to their common response period” (Sreenivasan and Fiete, 2011)
  
  “Grid cells are organized into discrete modules; within each module, the spatial scale and orientation of the grid lattice are the same, but the lattice for different cells is shifted in space.” (Stemmler et al., 2015)
  
  “Recently, it was shown that grid cells are organized in discrete modules within which cells share the same orientation and periodicity but vary randomly in phase” (Wei et al., 2015)
  
  “...cells within one module have receptive fields that are translated versions of one another, and different modules have firing lattices of different scales and orientations” (Dorrell et al., 2023)
  
  In these works, this assumption is used to derive properties relating to the computational properties of grid cells (e.g., error correction, optimal scaling between grid spacings in different modules).
  
  In addition, since grid cells are assumed to be identical in the computational neuroscience community, there has been little work on quantifying how much variability a given model produces. This makes it challenging to understand how consistent different models are with our observations. This is illustrated in our analysis of a recent recurrent neural network (RNN) model of grid cells (Fig S8), which does exhibit variability.
  
  (2) The authors state that "no characterization of the degree and robustness of variability in grid properties within individual modules has been performed." It is always dangerous to speak in absolute terms about what has been done in scientific studies. It is true that few studies have had the number of grid cells necessary to make comparisons within and between modules, but many studies have clearly shown the distribution of spacing in neuronal data (e.g. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Hardcastle et al., 2015) so the variability has been visible in the data presentations. Also, most researchers in the field are well aware that highly consistent grid cells are much rarer than messy grid cells that have unevenly spaced firing fields. This doesn't hurt the importance of the paper, but they need to tone down their statements about the lack of previous awareness of variability (specific locations are noted in the specific comments).
  
  We have toned down our language in the Introduction. However, we note that our point that no detailed analysis had been done on measuring the robustness of this variability stands. Thus, for the general community, it has not been clear whether this previously observed variability is noise or a real feature of the grid code.
  
  (3) The methods section needs to have a separate subheading entitled: "How grid cells were assigned to modules" that clearly describes how the grid cells were assigned to a module (i.e. was this done by Gardner et al., or done as part of this paper's post-processing?).
  
  We thank the reviewer for pointing out this missing information. We have added a new subsection in the Materials and Methods section, entitled “Grid module classification” to clarify how the grid cells are assigned to modules. In short, this was done by Gardner et al. (2022) using an unsupervised clustering approach that was viewed as enabling a less biased identification of modules. We did not perform any additional processing steps on module identity.
  
  Reviewer #3:
  
  (1) One possible explanation of the dispersion in lambda (not in theta) could be variability in the typical width of the field. For a fixed spacing, wider fields might push the six fields around the center of the autocorrelogram toward the outside, depending on the details of how exactly the position of these fields is calculated. We recommend authors show that lambda does not correlate with field width, or at least that the variability explained by field width is smaller than the overall lambda variability.
  
  We agree that this option had not been carefully ruled out by our previous analyses. To tackle this question, we compute the field width of a given cell using the value at the minima of its spatial autocorrelogram (Fig S4A-B). For all cells in recording ID R12, there is a non-significant negative linear correlation between grid field width and between-cell variability (Fig S4C). The variability explained by the width of the field is 4% of the total variability, as indicated by the R² value of the linear fit. Similar results were found for all other modules (see caption of Fig S4C for statistics). Therefore, we do not think that grid field width explains spacing variability.
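  
  The "variance explained" figure above is the R² of a simple linear fit, which equals the squared Pearson correlation between the two variables. The sketch below shows that computation in plain Python; the function name and the toy inputs are illustrative assumptions and are unrelated to the recorded data.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a least-squares line y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Toy usage with made-up numbers: field widths vs. a per-cell quantity of interest.
widths = [1.0, 2.0, 3.0]
quantity = [1.0, 3.0, 2.0]
print(r_squared(widths, quantity))
```

  An R² of 0.04, as reported for R12, would mean that a linear dependence on field width accounts for only 4% of the variance in the other quantity.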
  
  (2) An alternative explanation could be related to what happens at the borders. The authors tackle this issue in Figure S2 but introduce a different way of measuring lambda based on three fields, which in our view is not optimal. We recommend showing that the dispersions in lambda and theta remain invariant as one removes the border-most part of the maps but estimating lambda through the autocorrelogram of the remaining part of the map. Of course, there is a limit to how much can be removed before measures of lambda and theta become very noisy.
  
  We have performed additional analysis to explore the role of borders in grid property variability. To do so, we have followed the suggestion by the reviewer and have re-analyzed grid properties from the autocorrelogram when the border-most part of the maps is removed (Fig S6A-B). For all modules, we do not see any changes in variability (computed as the standard deviation of the population distribution) for either orientation or spacing. As predicted by the reviewer, after removing about 25% of the border-most part of the environment we start seeing changes in variability, as measures of theta and lambda become noisy and are computed over a smaller spatial range. This result holds for all other modules (Fig S6C-D).
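  
  The border-removal step can be illustrated with a small helper that trims the outer rows and columns of a binned rate map before the autocorrelogram is recomputed. This is a hedged sketch: the function name, the list-of-lists map representation, and the reading of the removed fraction as a share of each dimension split evenly between opposite sides are all assumptions, since the text does not specify how the 25% was measured.

```python
def crop_border(rate_map, fraction):
    """Trim the border-most bins of a 2D rate map (list of equal-length rows).

    `fraction` is the share of each dimension to remove in total, split evenly
    between the two opposite sides (an assumed convention).
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_rows = len(rate_map)
    n_cols = len(rate_map[0])
    drop_rows = int(n_rows * fraction / 2)   # bins removed from top and from bottom
    drop_cols = int(n_cols * fraction / 2)   # bins removed from left and from right
    return [row[drop_cols:n_cols - drop_cols]
            for row in rate_map[drop_rows:n_rows - drop_rows]]


# Toy usage: an 8 x 8 map whose entries encode their own (row, column) position.
toy_map = [[r * 8 + c for c in range(8)] for r in range(8)]
inner = crop_border(toy_map, 0.25)
print(len(inner), len(inner[0]))
```

  Grid orientation and spacing would then be re-estimated from the autocorrelogram of the cropped map, and the between-cell standard deviation compared across crop fractions.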
  
  (3) A third possibility is slightly more tricky. Some works (for example Kropff et al, 2015) have shown that fields anticipate the rat position, so every time the rat traverses them they appear slightly displaced opposite to the direction of movement. The amount of displacement depends on the velocity. Maps that we construct out of a whole session should be deformed in a perfectly symmetric way if rats traverse fields in all directions and speeds. However, if the cell is conjunctive, we would expect a deformation mainly along the cell's preferred head direction. Since conjunctive cells have all possible preferred directions, and many grid cells are not conjunctive at all, this phenomenon could create variability in theta and lambda that is not a legitimate one but rather associated with the way we pool data to construct maps. To rule away this possibility, we recommend the authors study the variability in theta and lambda of conjunctive vs non-conjunctive grid cells. If the authors suspect that this phenomenon could explain part of their results, they should also take into account the findings of Gerlei and colleagues (2020) from the Nolan lab, that add complexity to this issue.
  
  We appreciate the reviewer pointing out the possible role conjunctive cells may play. To investigate how conjunctive cells may affect the observed grid property variability, we have performed additional analyses taking into account if the grid cells included in the study are conjunctive. Comparing within- and between-cell variability of conjunctive vs. non-conjunctive cells in recording R12, we do not see any qualitative differences for either orientation or spacing (Fig S7A-B). When excluding conjunctive cells from the between-variability comparison, we do not see any significant difference compared to when these cells are included (Fig S7C-D). As such, it does not appear that conjunctive cells are the source of variability in the population.
  
  We further note that the number of putative conjunctive cells varied across modules and recordings. For instance, in recordings Q1 and Q2, Gardner et al. (2022) reported 3 (out of 97) and 1 (out of 66) conjunctive cells, respectively. Given that we see variability robustly across recordings (Fig 5), we do not believe that conjunctive cells can explain the presence of the variability we observe.
  
  (4) The results in Figure 6 are correct, but we are not convinced by the argument. The fact that grid cells fire in the same way in different parts of the environment and in different environments is what gives them their appeal as a platform for path integration since displacement can be calculated independently of the location of the animal. Losing this universal platform is, in our view, too much of a price to pay when the only gain is the possibility of decoding position from a single module (or non-adjacent modules) which, as the authors discuss, is probably never the case. Besides, similar disambiguation of positions within the environment would come for free by adding to the decoding algorithm spatial cells (non-hexagonal but spatially stable), which are ubiquitous across the entorhinal cortex. Thus, it seems to us that - at least along this line of argumentation - with variability the network is losing a lot but not gaining much.
  
  We agree that losing the continuous attractor network (CAN) structure and the ability to path integrate would be a very large loss. However, we do not believe that the variability we observe necessarily destroys either the CAN or path integration. We argue this for two reasons. First, the data we analyzed [from Gardner et al. (2022)] is exactly the data set that was found to have toroidal topology and is therefore viewed as consistent with a major prediction of CANs. Thus, the amount of variability in grid properties does not rule out the underlying presence of a continuous attractor. Second, path integration may still be possible with grid cells that have variable properties. To illustrate this, we analyzed data from the recurrent neural network (RNN) model of Sorscher et al. (2019), which was trained explicitly on path integration, and found that the grid representations that emerged had variability in spacing and orientation (see point #6 below).
  
  (5) In Figure 4 one axis has markedly lower variability. Is this always the same axis? Can the authors comment more on this finding?
  
  We agree that in Fig 4 the first axis has lower variability. We believe that this is specific to the module R12 and does not reflect any differences in axis or bias in the methods used to compute the axis metrics. To test this, we have performed the same analyses for other modules, finding that other recordings do not exhibit the same bias. Results for the modules with the most cells are shown below (Author response image 1).
  
  Author response image 1.
  
  Grid properties along Axis 1 are not less variable for many recorded grid modules. Same as Fig. 4C-D, but for four other recorded modules. Note that the variability along each axis is similar.
  
  (6) The paper would gain in depth if maps coming out of different computational models could be analyzed in the same way.
  
  We agree with the reviewer that examining computational models using the same approach would strengthen our results and we appreciate the suggestion. To address this, we have analyzed the results from a previous normative model for grid cells [Sorscher et al. (2019)] that trained a recurrent neural network (RNN) model to perform path integration and found that units developed grid-cell-like responses. These models have been found to exhibit signatures of toroidal attractor dynamics [Sorscher et al. (2023)] and exhibit a diversity of responses beyond pure grid cells, making them a good starting point for understanding whether models of MEC may contain uncharacterized variability in grid properties.
  
  We find that RNN units in these normative models exhibit similar amounts of variability in grid spacing and orientation as observed in the real grid cell recordings (Fig S8A-D). This provides additional evidence that this variability may be expected from a normative framework, and that the variability does not destroy the ability to path integrate (which the RNN is explicitly trained to perform).
  
  The RNN model offers possibilities to assess what might cause this variability. While we leave a detailed investigation of this to future work, we varied the weight decay regularization hyper-parameter. This value controls how sparse the weights in the hidden recurrent layer are. Large weight decay regularization strength encourages sparser connectivity, while small weight decay regularization strength allows for denser connectivity. We find that increasing this penalty (and enforcing sparser connectivity) decreases the variability of grid properties (Fig S8E-F). This suggests that the observed variability in the Gardner et al. (2022) data set could be due to the fact that grid cells are synaptically connected to other, non-grid cells in MEC.
  
  (7) Similarly, it would be very interesting to expand the study with some other data to understand if between-cell delta_theta and delta_lambda are invariant across environments. In a related matter, is there a correlation between delta_theta (delta_lambda) for the first vs for the second half of the session? We expect there should be a significant correlation, it would be nice to show it.
  
  We agree this would be interesting to examine. For this analysis, it is essential to have a large number of grid cells, and we are not aware of other published data sets with comparable cell numbers using different environments.
  
  Using a sliding window analysis, we have characterized changes in variability with respect to the recording time (Figure S5A). To do so, we compute grid orientation and spacing over a time-window whose length is half of the total length of the recording. From the population distribution of orientation and spacing values, we compute the standard deviation as a measure of between-cell variability. We repeat the same procedure, sliding the window forward until the variability for the second half of the recording is computed.
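
The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for Figure S5: `estimate` is a hypothetical callable standing in for the per-cell estimation of grid spacing or orientation within a time window, `toy_spacing` is made-up data, and the sample standard deviation is assumed as the variability measure.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple


def sliding_window_variability(
    cell_ids: Sequence[int],
    total_duration: float,
    step: float,
    estimate: Callable[[int, float, float], float],
) -> List[Tuple[float, float]]:
    """Return (window_start, between-cell SD) for each half-length window."""
    window = total_duration / 2.0  # window length is half of the recording
    # Number of forward shifts until the window covers the second half.
    n_shifts = math.floor((total_duration - window) / step + 1e-9)
    results = []
    for i in range(n_shifts + 1):
        start = i * step
        values = [estimate(cell, start, start + window) for cell in cell_ids]
        # Sample standard deviation across cells as between-cell variability.
        results.append((start, statistics.stdev(values)))
    return results


def toy_spacing(cell: int, t_start: float, t_end: float) -> float:
    """Hypothetical estimator: spacing differs by cell but not over time."""
    return 60.0 + cell


results = sliding_window_variability(range(4), 120.0, 20.0, toy_spacing)
for start, sd in results:
    print(f"window start {start:5.1f} min: between-cell SD = {sd:.3f}")
```

Because the toy estimator does not depend on time, the variability is identical in every window; a systematic drift in grid properties would instead show up as a trend across window start times.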
  
  We applied this approach to recording ID R12 (the same as in Figs 2-4) given that this recording session was significantly longer than the rest (almost two hours). Results are shown in Fig S5 B-C. For both orientation and spacing, no systematic changes of variability with respect to time were observed. Similar results were found for other modules (see caption of Fig S5 for statistics).
  
  We also note that the rats were already familiarized with the environment for 10-20 sessions prior to the recordings, so there may not be further learning during the period of the grid cell recordings. No changes in variability can be seen in Rat R across days (e.g., in Fig 5B R12 and R22 have similar distributions of variability). However, we note that it may be possible that there are changes in grid properties at time-scales greater than the recordings.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.27.582373v3
www.biorxiv.org www.biorxiv.org

Hexokinase regulates Mondo-mediated longevity via the PPP and organellar dynamics

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  In this manuscript entitled "Hexokinase regulates Mondo-mediated longevity via the PPP and organellar dynamics", Laboy and colleagues investigated upstream regulators of MML-1/Mondo, a key transcription factor that regulates aging and metabolism, using the nematode C. elegans and cultured mammalian cells. By performing a targeted RNAi screen for genes encoding enzymes in glucose metabolism, the authors found that two hexokinases, HXK-1 and HXK-2, regulate nuclear localization of MML-1 in C. elegans. The authors showed that knockdown of hxk-1 and hxk-2 suppressed longevity caused by germline-deficient glp-1 mutations. The authors demonstrated that genetic or pharmacological inhibition of hexokinases decreased nuclear localization of MML-1, via promoting mitochondrial β-oxidation of fatty acids. They found that genetic inhibition of hxk-2 changed the localization of MML-1 from the nucleus to mitochondria and lipid droplets by activating pentose phosphate pathway (PPP). The authors further showed that the inhibition of PPP increased the nuclear localization of mammalian MondoA in cultured human cells under starvation conditions, suggesting the underlying mechanism is evolutionarily conserved. This paper provides compelling evidence for the mechanisms by which novel upstream metabolic pathways regulate MML-1/Mondo, a key transcription factor for longevity and glucose homeostasis, through altering organelle communications, using two different experimental systems, C. elegans and mammalian cells. This paper will be of interest to a broad range of biologists who work on aging, metabolism, and transcriptional regulation.
  
  Reviewer #2 (Public Review):
  
  Raymond Laboy et al. explored how the transcriptional Mondo/Max-like complex (MML-1/MXL-2) is regulated by glucose metabolic signals using a germline-removal longevity model. They believed that MML-1/MXL-2 integrates multiple longevity pathways through nutrient sensing and therefore screened for glucose metabolic enzymes that regulate MML-1 nuclear localization. Hexokinase 1 and 2 were identified as the strongest regulators, which function through mitochondrial beta-oxidation and the pentose phosphate pathway (PPP), respectively. MML-1 localized to mitochondria associated with lipid droplets (LD), and MML-1 nuclear localization was correlated with LD size and metabolism. Their findings are interesting and may help us to further explore the mechanisms in multiple longevity models; however, the study is not complete and the working model remains obscure. For example, the exact metabolites that account for the direct regulation of MML-1 were not identified, and more detailed studies of the related cellular processes are needed.
  
  The identification of the responsible metabolites is necessary, since multiple pieces of evidence from the study suggest that lipid rather than glucose metabolites may be the direct regulators of MML-1, and that HXK regulates MML-1 indirectly by affecting lipid metabolism: 1) inhibiting the PPP is sufficient to rescue MML-1 function independent of G6P levels; 2) HXK-1 regulates MML-1 by increasing fatty acid beta-oxidation; 3) LD size correlates with MML-1 nuclear localization and LD metabolism can directly regulate MML-1. The identification of metabolites will be helpful for understanding the mechanism.
  
  Beta-oxidation and the PPP are involved in the regulation of MML-1 by HXK-1 and HXK-2, respectively. But how these two pathways participate in the regulation is not clear. Is it the beta-oxidation rate or the intermediate metabolites that matters? As for the PPP, it provides substrates for nucleotide synthesis and also its product NADPH is essential for redox balance. Is one of the metabolites or the NADPH levels involved in MML-1 regulation? More studies are needed to provide answers to these concerns.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Following are my comments that the authors may want to address to further improve this excellent paper.
  
  Major comments
  
  (1) Although the authors provided evidence that hexokinases in glucose metabolism are associated with germline-deficient glp-1(-) mutants, they did not mention why they focused on glp-1(-) mutants rather than other longevity mutants. In their previous study (Nakamura et al., 2016), they showed that MML-1 is required for multiple longevity pathways in C. elegans, including reduced mitochondrial respiration and insulin/IGF-1 signaling. Please discuss why the authors focused on glp-1(-) mutants in this paper. It will be even better if the authors test the roles of hexokinases in some other longevity regimens.
  
  Many thanks for this astute comment. Previously we had shown that mml-1 is required for glp-1, daf-2, and isp-1 longevity, and Johnson et al. had shown a requirement for eat-2, hence the idea that MML-1 is a convergent transcription factor. We first focused on glp-1 because that was the starting point of our screen, and the result was clear and simple: hexokinases regulate MML‑1 nuclear localization and activity in glp-1 and are required for longevity. Naturally, the question arises: do hexokinases behave like MML-1 as convergent longevity regulators across pathways? To address this, we examined the interaction of hxk-1 and hxk-2 with isp-1, daf-2, and raga-1. Specifically, we now show that:
  
  A. Like glp-1(e2141) mutants, isp-1(qm150) mutants stimulate MML-1 nuclear localization, and the hexokinases are required for isp-1 longevity (Figure 1G-H).
  
  B. daf-2(e1370) mutants do not further stimulate MML-1 nuclear localization beyond basal levels, yet MML-1 is strongly required for daf-2 longevity (Nakamura et al., 2016, Supplementary Figure 1L-M). However, the hexokinases are not required for daf-2 longevity (Supplementary Figure 1M), suggesting that the signaling pathway is wired differently in daf-2, and that other pathways regulate MML-1 activity.
  
  C. raga-1(ok701) mutants stimulate MML-1 nuclear localization and mml-1 is required for raga-1 longevity, suggesting that MML-1 acts downstream of TORC1 signaling (Supplementary Figure 1N-O). However, hexokinases are not required for raga-1 longevity, suggesting that raga-1 acts downstream or parallel to hexokinase signaling (Supplementary Figure 1P).
  
  D. We performed untargeted metabolomics in glp-1, daf-2, and mml-1 single and double mutants and observed that hexose phosphates, which have been shown to regulate MML-1 human homologs MondoA/ChREBP, were differentially regulated between mutants.
  
  Author response image 1.
  
  E. Altogether these experiments reveal that though MML-1 promotes longevity in most pathways, the hexokinases are only required in some (glp-1, isp-1), but not others (raga-1, daf-2). Furthermore, strong MML-1 nuclear localization is often but not always associated with longevity (e.g. daf-2), and the wiring of the signaling pathway is different for various longevity regimens. Consistently, mTOR and Insulin signaling are more functionally linked and therefore may show a more similar genetic profile. Differences in hexose phosphate between glp-1 and daf-2 could explain why MML-1 requires hexokinase function in glp-1 to promote longevity but not in daf-2. However, considerably more work is required to rigorously validate this hypothesis.
  
  (2) In figure 5, the authors investigated whether the association between PPP and MML‑1/MondoA, tested in C. elegans, is conserved in mammals under starvation conditions. The authors should clarify why they tested the MondoA localization upon starvation in cultured human cells. This comment is related to my comment #1 as the authors could determine the roles of hexokinases under dietary restriction (DR)-conditions or in DR-mimetic in eat-2(-) mutants.
  
  In this case, the actual translatability to a worm longevity pathway was not our goal. Rather, we examined MondoA in cell culture under conditions with contrasting MondoA subcellular localization: high-glucose media yielded cytosolic/nuclear localization, whereas starvation conditions yielded cytosolic localization. We then showed that, similar to our data in worms, PPP inhibition with 6-AN induced MondoA nuclear localization and activity. We now mention this rationale in the results section, lines 352-356.
  
  (3) In figure 2, the authors showed that HXK-2 regulates mitochondrial localization of MML-1, and HXK-1 regulates nuclear localization of MML-1 through mitochondrial β-oxidation in glp‑1(-) mutants. Can the authors test whether mitochondrial β-oxidation affects the effects of hxk RNAi on longevity of glp-1(-) mutants?
  
  Excellent suggestion. We tried to test this idea and found that acs-2 RNAi alone abolished glp-1 longevity, making epistasis experiments difficult to interpret. This is consistent with published data showing that glp-1 longevity requires NHR-49, a transcription factor that regulates mitochondrial β‑oxidation and drives acs-2 expression (Ratnappan et al., 2014). It could well be that β‑oxidation inhibition promotes MML-1 nuclear localization but abolishes lifespan extension because of epistatic effects on other transcription factors or processes. Elucidating the exact mechanism would require further investigation that goes beyond the scope of the paper.
  
  (4) The authors showed that 2-deoxy-glucose, which decreases the activity of HXK, decreased the nuclear localization of MML-1, and this is consistent with their genetic data. Based on these data, 2-deoxy-glucose is expected to decrease longevity. Interestingly, however, 2-deoxy-glucose has been reported to increase lifespan by restricting glucose, whereas extra glucose intake decreases lifespan in C. elegans, shown by multiple research groups, including M. Ristow, C. Kenyon, and S.J.V. Lee labs. This is seemingly paradoxical and worth discussing with key references, especially because MondoA and Chrebp are known as glucose-responsive transcription factors.
  
  Thank you for this important comment. 2-DG has been shown to extend lifespan by suppressing glucose metabolism at concentrations ranging from 0.1 to 5 mM, whereas higher concentrations ranging from 20 to 50 mM had the opposite effect, decreasing lifespan (Schulz et al., 2007). We tested 50 mM 2-DG and observed decreased MML-1 nuclear localization, which is consistent with the previous data showing decreased longevity. We now raise this point in the discussion, suggesting that mild inhibition of glucose metabolism has beneficial effects on longevity, while strong suppression shortens lifespan (lines 411-414).
  
  Minor comments
  
  (1) The current Introduction does not include an explicit statement that MML-1 and MondoA are homologs. Please clarify this, as naive readers may be confused.
  
  Thank you for pointing this out. We now say in the intro that MondoA and MML-1 are homologs (lines 59-60).
  
  (2) In figure 1, the effects of hxk-3 on nuclear localization of MML-1 is small compared to those of hxk-1 and hxk-2. Please add speculation about why HXK-3 has different roles in nuclear localization of MML-1 compared to HXK-1 and HXK-2.
  
  According to GExplore 1.4 (Hutter & Suh, 2016), hxk-3 expression declines during larval development and is low in the adult. Perhaps it has little effect in the young adult, and the other hexokinases suffice to support MML-1 nuclear localization. It also remains possible that hxk-3 is not required in glp-1, but is required in other longevity pathways.
  
  (3) The authors tested the effects of genetic inhibition of hxk-1 and hxk-2 on the regulation of MML-1 localization and lifespan of glp-1(-) mutants by using RNAi. I wonder whether the authors can perform the experiments with hxk-1 or hxk-2 loss (or reduction) of function mutants. If they cannot, please discuss the reason and the limitations of RNAi.
  
  This is an important point raised by the reviewer. We found that RNAi was most effective for phenotypes related to MML-1 nuclear localization and longevity, likely because it results in acute knockdown. We also showed that pharmacological inhibition of hexokinase function with 3BrP and 2‑DG (Supplementary Figure 1B and 1C) and the PPP with 6-AN (Figure 3B) had consistent results with our observation with RNAi.
  
  We generated hexokinase KO mutants by deleting the coding sequence of each hexokinase by CRISPR/Cas9. First, we measured the expression of each hexokinase isozyme in each mutant. Notably, the hxk-1(syb1271) null mutant had higher expression of hxk-2 and hxk-3, hxk-2(syb1261) did not significantly affect the expression of hxk-1 and hxk-3, and hxk-3(syb1267) had a mild increase in hxk-2 expression. We followed up on hxk-1(syb1271) and hxk-2(syb1261) and crossed these mutants with our MML-1::GFP reporter. We observed a modest but significant reduction in MML-1 nuclear localization in both strains. The effect with RNAi is much stronger than in the null mutants, potentially due to a compensatory upregulation of the other hexokinases in the mutants that we do not observe with RNAi (Supplementary Figure 1D-E). An alternative explanation is that there is a threshold in the effects of hexokinase function on MML-1 nuclear localization. We tried to generate an hxk-1; hxk-2 double mutant, but it was lethal and we therefore did not pursue this further.
  
  Author response image 2.
  
  (4) Please correct minor typos throughout the manuscript. Following are some examples.
  
  - On page 4, line 111, please correct "Supplementary Figure D-E" to "Supplementary Figure 1D-E".
  
  - On page 9, line 272, please correct "3A-B" to "4A-B".
  
  - On page 9, line 275, please correct "S4" to "4".
  
  - On page 10, line 309, please correct "4A" to "4B"
  
  Corrected.
  
  (5) In Fig. 3E, please add the information about the scale bars in figure legends.
  
  Corrected.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Here are some detailed suggestions for the authors:
  
  (1) Since MML-1/MXL-2 complex functions in multiple longevity models, e.g. DR, ILS, what are the roles of HXK-1 and HXK-2 in these models?
  
  We now show that although mml-1 is required in most longevity pathways, hxk-1 and hxk-2 are required in some pathways (glp-1, isp-1) but not others (daf-2, raga-1). See above for more details.
  
  (2) As for the metabolites screening, the lipid metabolic genes can be included. Not only for the above reasons, also previous study had found that the mml-1 mRNA levels and MML-1 GFP nuclear localization were all increased in the glp-1 model, while mml-1 mRNA levels were unaffected by hxk knockdown, suggesting more pathways be involved.
  
  We agree with the reviewer that understanding which metabolites regulate MML-1 nuclear localization and activity is an important, yet challenging question. Our studies demonstrate a role of glucose metabolism, in particular hexokinase, in this process, consistent with hexose-p being activators of MondoA. Our data also suggest that mechanisms beyond hexose-p regulate MML-1, since knockdown of the PPP components stimulates MML-1 even when hxk-2 is depleted and G6P is low, and inhibition of the PPP with 6-AN stimulates MondoA nuclear localization under starvation conditions in mammalian cell culture. We tested redox regulation, nucleoside, and lipid metabolism as candidate processes (see below). Notably, our data suggest this other mechanism is tied to lipid metabolism through droplet size, since various perturbations that impact LD size and number (atgl-1, dgat-2, tkt-1, Figure 4) affected MML-1 nuclear localization. It remains an open question whether MML-1 is regulated by other metabolites through a ligand-protein interaction or not. We cannot exclude that beyond lipid droplet regulation, specific lipids, other metabolites, or metabolic modules linked to the PPP might regulate MML-1 nuclear localization and activity.
  
  We employed genetic manipulation and pharmacological inhibition to understand the upstream signals that regulate MML-1. These approaches will not be sufficient to determine whether other metabolite(s) are involved in MML-1/MondoA translocation to the nucleus through a direct interaction. Novel technologies that determine protein-metabolite interactions (e.g. MIDAS) will help us answer this question in future work, but this goes beyond the scope of this paper. As a compromise, we discuss possible metabolites that may orchestrate this, based on our observations of MML‑1 subcellular localization at LD/mitochondria (including PPP and TCA cycle intermediates).
  
  (3) Line 238, it should be "NADPH".
  
  Corrected.
  
  (4) RNAi targeting enzymes of different branches of PPP can be performed
  
  In our initial screen, we examined the effect of various enzymes of the PPP on MML-1 nuclear localization (Figure 1A, Supplementary Table S1) and found that knockdown of enzymes in both the oxidative phase (PGDH/T25B9.9) and non-oxidative phase (transketolase/TKT-1) affects MML-1 nuclear localization. In line with this, 6-AN treatment, which affects the oxidative phase, also stimulated MML‑1 nuclear localization (Figure 3B). We also observed that knockdown of enzymes involved in ribose 5P conversion to ribose, ribose 1P, and phosphoribosyl pyrophosphate, an intermediate in nucleotide biosynthesis, decreased MML-1 nuclear localization (rpia-1, F07A11.5, Y43F4B.5, R151.2; Supplementary Table S1). Whether MML‑1/MondoA responds to the nucleotide pool remains elusive.
  
  (5) As for PPP, these are many possibilities that can be tested. For example, as PPP supplies NADPH for oxidative balance, does MML-1 respond to ROS? Also, it appears the genes in the non-oxidative arm of PPP regulate MML-1, so is nucleotide synthesis involved?
  
  Thank you for the suggestion. We tested other enzymes involved in NADPH production from the folate cycle and observed a mild but significant reduction of MML-1 nuclear localization upon dao-3i (Supplementary Table S1). Moreover, we tested whether MML-1 nuclear localization is responsive to ROS. While paraquat exposure induced oxidative stress, as measured by the transcriptional reporter gst‑4p::GFP (Supplementary Figure 3A), it did not significantly affect MML-1 nuclear localization (Supplementary Figure 3B). Therefore, we think it less likely that NADPH production acting through redox regulation is the main effect.
  
  We also tried supplementation with some of the metabolite outputs of PPP including ribose, ribulose, and xylulose, as well as nucleosides (see below), but saw no effect on MML-1 nuclear localization. We agree that further studies are required to pinpoint whether there is another metabolic moiety regulating MML-1 at the protein-ligand level, but this goes beyond the scope of the current investigation.
  
  Author response image 2.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.14.544948v2
www.biorxiv.org www.biorxiv.org

A syngeneic spontaneous zebrafish model of tp53-deficient, EGFRvIII, and PI3KCA H1047R-driven glioblastoma reveals inhibitory roles for inflammation during tumor initiation and relapse in vivo

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  “EGFRvIII is mainly associated with the classical subtype, so the mesenchymal subtype might be unexpected here. This could be commented on.”
  
  We acknowledge that EGFRvIII is most often associated with the classical subtype of glioblastoma and agree that mesenchymal subtype classification may be unexpected given the use of her4.1:EGFRvIII as a driver in our model. We would like to highlight the fact that our brain tumors do also express certain markers associated with the classical subtype including neural precursor and neural stem cell markers like sox2, ascl1b, and gli2 (Supplementary Fig 4, 5; Supplementary Table 1-3). However, our transcriptomic data was not found to significantly enrich for classical subtype gene expression, compared to normal brains. This could be due to a significant contribution of normal brain tissue to our analyses (bulk tumor burdened brains were harvested for RNA sequencing), as well as the significant contribution of mesenchymal subtype signatures and/or inflammatory gene expression in our brain tumor-positive samples. Because signatures associated with inflammation comprise some of the most highly upregulated genes in our samples, this could potentially dilute and/or lessen alternative subtype and/or signature gene expression. Importantly, it is now widely appreciated that patient tumors simultaneously consist of heterogeneous tumor cells reflecting multiple molecular subtypes (Couturier et al., 2020; Darmanis et al., 2017; Neftel et al., 2019), providing glioblastoma with a high level of phenotypic plasticity. We also demonstrate that the contribution of additional drivers not always present with EGFRvIII in patient glioblastoma enhances primary brain tumors in vivo. This result is consistent with more aggressive glioblastomas seen in patients with EGFRvIII variants and TP53 loss-of-function mutations (Ruano et al., 2009). It will therefore be interesting in the future to consider how single or multiple driver mutations contribute to subtype-specific gene expression in our model, as well as histopathology, relative to patients. We have included some of these discussion points in our revised manuscript.
  
  “Some more histologic characterization of the tumors would be helpful. Are they invasive, do larger tumors show necrosis and microvascular proliferation? This would help with understanding the full potential of the new model.”
  
  We have updated our manuscript to include more histopathological characterization and images (Supplementary Fig 2).
  
  “Current thinking in established glioblastoma is that the M1/M2 designations for macrophages are not relevant, with microglia macrophage populations showing a mixture of pro- and anti-inflammatory features. Ideally, there would be a much more detailed characterization of the intratumoral microglia/macrophage population here, as single markers can’t be relied upon.”
  
  We performed additional gene set enrichment analyses (GSEA) using our sequencing datasets and compared p53EPS gene expression to M1/M2 macrophage expression signatures and expression signatures from MCSF-stimulated macrophages at early and late (M2 polarized) time-points. From this analysis, we detected enrichment for markers of both pro- and anti-inflammatory features; however, enrichment was stronger and significant for gene expression signatures associated with classical pro-inflammatory M1 macrophages. We have included these GSEA plots and gene set enrichment lists as supplementary materials (Supplementary Fig 6, Supplementary Table 6). We also performed GSEA against a broad curated set of immunologic gene sets (C7: immunologic signature gene sets, Molecular Signatures Database, (Liberzon et al., 2011)) and have included the list of signatures and enrichment scores as a supplementary table (Supplementary Table 6).
  
  “Phagocytosis could have anti-tumor effects through removal of live cancer cells or could be cancer-promoting if apoptotic cells are being rapidly cleared with concomitant activation of an immunosuppressive phenotype in the phagocytes (ie. efferocytosis).”
  
  We looked at efferocytosis-associated gene expression in our sequencing dataset (124 “efferocytosis” genes, GeneCards), and while we detected upregulation of certain genes associated with efferocytosis in p53EPS brains, we did not detect significant enrichment for the entire gene set. Furthermore, we did not detect up-regulation of key efferocytosis receptors including Axl and Tyro3 (Supplementary Table 1, 2), compared to normal brains. While efferocytosis may contribute to tumor growth and evolution, this GSEA combined with our functional data supporting an inhibitory role for phagocytes in p53EPS tumor initiation and engraftment following transplantation (Fig 4, Fig 5, Supplementary Fig 7), suggests that efferocytosis is not a major driver of tumor formation in our model. However, how efferocytosis affects tumor progression in our model and/or relapse following therapy will be an interesting feature to explore in the future using temporal manipulations of phagocytes and/or treatments with chemical inhibitors.
  
  Author response image 1.
  
  Gene Set Enrichment Analysis (GSEA) for efferocytosis-associated gene expression (124 “efferocytosis” genes in GeneCards) in tp53EPS tumor brains, compared to normal zebrafish brains. Normalized enrichment score (NES) and p-value are indicated.
  
  “Do the irf7/8 and clodronate experiments distinguish between effects on microglia/macrophages and dendritic cells?”
  
  In addition to microglia/macrophages, the IRF8 transcription factor has been shown to control survival and function of dendritic cells (Sichien et al., 2016). Clodronate treatments are also used to deplete both macrophages and dendritic cells in vivo. Therefore, we cannot distinguish the effects of these manipulations on these cell types in our experiments and have updated our manuscript throughout to reflect this.
  
  Reviewer #2:
  
  “The authors state that oncogenic MAPK/AKT pathway activation drives glial-derived tumor formation. It would be important to include a wild-type or uninjected control for the pERK and pAKT staining shown in Fig1 I-K to aid in the interpretation of these results. Likewise, quantification of the pERK and pAKT staining would be useful to demonstrate the increase over WT, and would also serve to facilitate comparison with the similar staining in the KPG model (Supp Fig 2D).”
  
  We have updated Fig 1 and Supplementary Fig 3D (formerly Fig 2D), to include histology from tumor-free uninjected control animals, as well as quantifications of p-ERK and p-AKT staining to highlight increased MAPK/AKT signaling pathway activation in our tumor model.
  
  “The authors use a transplantation assay to further test the tumorigenic potential of dissociated cells from glial-derived tumors. Listing the percentage of transplants that generate fluorescent tumor would be helpful to fully interpret these data. Additionally, it was not clear based on the description in the results section that the transplantation assay was an “experimental surrogate” to model the relapse potential of the tumor cell. This is first mentioned in the discussion. The authors may consider adding a sentence for clarity earlier in the manuscript as it helps the reader better understand the logic of the assay.”
  
  We have clarified in the text the percentage of transplants that generated fluorescent tumor (16-25%, n=3 independent screens). This is also represented in Fig 5C,D. We also added text when introducing the transplantation assay, explaining that transplantation is frequently used as an experimental surrogate to assess relapse potential, and that our objective was to assess tumor cell propagation in the context of specific manipulations within the TME.
  
  “The authors nicely show high levels of immune cell infiltration and associations between microglia/macrophages and tumor cells. However, a quantification of the emergence of macrophages over time in relation to tumor initiation and growth would provide significant support to the observations of tumor suppressive activity of the phagocytes. Along these lines, the inclusion of a statement about when leukocytes emerge during normal development would be informative for those not familiar with the zebrafish model.”
  
  In zebrafish, microglia colonize the neural retina by 48 hpf, and the optic tectum by 84 hpf (Herbomel et al., 2001), prior to when we typically observe lesions in our p53EPS brains. To validate the emergence of microglia prior to tumor formation in p53EPS, we have now used live confocal imaging through the brains of uninjected control and p53EPS injected zebrafish at 5, 7 and 9 dpf. As expected, microglia were present throughout the cephalic region and in the brain at 5 dpf (120 hpf). At this stage, p53EPS injected zebrafish brains displayed mosaic cellular expression of her4.1:mScarlet; however, cells were sparse and diffuse, and no large intensely fluorescent tumor-like clusters were detected (n=12/12 tumor negative). At 7 dpf, microglia were observed in the brains of control and p53EPS zebrafish; however, at this stage we detected clusters of her4.1:mScarlet+ cells (n=5/9), indicative of tumor formation. Lesions were found to be surrounded and/or infiltrated by mpeg:EGFP+ microglia. Finally, at 9 dpf her4.1:mScarlet+ expression became highly specific to tumor lesions, and these lesions were associated with mpeg:EGFP+ microglia/macrophages (n=8/8 of tumor-positive zebrafish). These descriptions along with representative images have been added to Figure 3.
  
  “From the data provided in Figure 4G and Supp Fig 7b, the authors suggest that “increased p53EPS tumor initiation following Irf gene knock-down is a consequence of irf7 and irf8 loss-of-function in the TME.” Given the importance of the local microenvironment highlighted in this study, spatial information on the form of in situ hybridization to identify the relevant location of the expression change would be important to support this conclusion.”
  
  We performed fluorescent in situ hybridization (using HCR RNA-FISH, Molecular Instruments) on whole mount control and irf7 CRISPR-injected p53EPG animals (her4.1:EGFRvIII + her4.1:PI3KCAH1047R + her4.1:GFP; GFP was used in this case because of probe availability).
  
  Representative confocal projections through tumors, as well as single optical sections, are presented and discussed in Figure 4, highlighting the location of irf7 expression change following gene knock-down. We found significant irf7 signal in and surrounding p53EPS tumors at early stages of tumor formation. This expression was reduced and/or lost following irf7 CRISPR gene targeting, consistent with RT-PCR data (Supplementary Fig 7).
  
  “The authors used neutral red staining that labels lysosomal-rich phagocytes to assess enrichment at the early stages of tumor initiation. The images in Figure 3 panel A should be labeled to denote the uninjected controls to aid in the interpretation of the data. In Supplemental Figure 6, the neutral red staining in the irf8 CRISPR-injected larvae looks to be increased, counter to the quantification. Can the authors comment if the image is perhaps not representative?”
  
  We have updated Figure 3 and Supplementary Figure 6 to aid in the interpretation of our results. In Fig 3A, we used tumor-negative controls from our injected cohorts. This was done to control for exogenous transgene presence and/or over-expression prior to (or in the absence of) malignant transformation. In Supplementary Fig 6, our images are representative, but we have now used unprocessed images with arrowheads to highlight neutral-red positive foci for clarity. In our original manuscript the images contained software-generated markers, which could have obscured and/or confused the neutral red staining we were trying to highlight.
  
  Recommendations For the Authors:
  
  Reviewer #1:
  
  “The PI 3-kinase does a lot more than just activating mTOR and Akt – I would suggest modifying that sentence in the introduction.”
  
  We have adjusted text in the introduction to reflect the broad role for PI3K signaling.
  
  Reviewer #2:
  
  “In Supplemental Fig 1, it would be helpful for the authors to provide a co-stain, such as DAPI to label all nuclei, which would allow the reader to assess the morphology of the cells in the context of the surrounding tissue.”
  
  We have included brightfield images in Supplementary Fig 1, that together with her4.1:mScarlet fluorescence, should help readers assess tumor location and morphology in the context of surrounding tissue. Tumor cell morphology at high-resolution can be visualized in Fig 3, Movie 1 and Movie 2.
  
  “The authors state that oncogenic MAPK/AKT pathway activation drives glial-derived tumor formation. The authors may consider testing if the addition of an inhibitor of MAPK signaling may prevent or decrease the formation of glial-derived tumors in this context to further support their results.”
  
  To further assess the role for MAPK activation, we decided to test the effect of 50 µM AZD6244 MAPK inhibitor following transplantation of dissociated primary p53EPS cells into syngeneic CG1 strain zebrafish embryos, as previously described (Modzelewska et al., 2016). Following 5 days of drug treatments, we did not detect significant differences in tumor engraftment or in tumor size between DMSO control and AZD6244-treated cohorts, suggesting that MAPK inhibition is not sufficient to prevent p53EPS engraftment and growth in our model. In the future, assessments of on-target drug effects, possible resistance mechanisms, and/or testing MAPK inhibitors in combination with other targeted agents including Akt and/or mTOR inhibitors (Edwards et al., 2006; McNeill et al., 2017; Schreck et al., 2020) will enhance our understanding of potential therapeutic strategies.
  
  Author response image 2.
  
  Dorsal views of 8 dpf zebrafish larvae engrafted with her4.1:mScarlet+ p53EPS tumor cells following treatment from 3-8 dpf with 0.1% DMSO (control) or 50 µM AZD6244. Tumor cell injections were performed at 2 dpf into syngeneic CG1 strain embryos. The percentage of total animals with persisting engraftment following drug treatments, as well as tumor size (microns squared, quantified using Carl Zeiss ZEN software) are shown for control and AZD6244 treated larvae.
  
  “Have the authors tested if EGFR and PI3KCA driven by other neural promoters produce similar results, or not? This would help support the specificity of her4.1 neural progenitors and glia as the cell of origin in this model.”
  
  At this time, we have not tested other neural promoters. However, previous reports describe a zebrafish zic4-driven glioblastoma model with mesenchymal-like gene expression (Mayrhofer et al., 2017), supporting neural progenitors as a cell of origin. In the future it will be interesting to test sox2, nestin, and gfap promoters to further define and support her4.1-expressing neural progenitors and glia as the cell of origin in our model.
  
  “Other leukocyte populations, such as neutrophils, can also respond to inflammatory cues. Can the authors comment if neutrophils are also observed in the TME?”
  
  We performed initial assessments of neutrophils in the TME using our expression datasets as well as her4.1:EGFRvIII + her4.1:PI3KCAH1047R co-injection into Tg(mpx:EGFP) strain zebrafish. We observed tumor formation without significant infiltration of mpx:EGFP+ neutrophils. Future investigations will be important to assess differences in the contributions of different myeloid-derived lineages in the TME of p53EPS, as well as how heterogeneity may be altered depending on different oncogenic drivers and/or stage of tumor progression, as seen in human glioblastoma (Friedmann-Morvinski and Hambardzumyan, 2023). We have added text in the discussion section of our manuscript to indicate the possibility of neutrophils and/or other immune cell types contributing to p53EPS tumor biology.
  
  Author response image 3.
  
  Control-injected tumor-negative and tumor-positive Tg(mpx:EGFP) zebrafish at 10 dpf. Tg(mpx:EGFP) strain embryos were injected at the one-cell stage with her4.1:EGFRvIII + her4.1:PI3KCAH1047R + her4.1:mScarlet.
  
  “It is not clear if the transcriptomics data has been deposited in a publicly available database, such as the Gene Expression Omnibus (GEO). Sharing of these data would be a benefit to the field and facilitate use in other studies.”
  
  We have uploaded all transcriptomic data to GEO under accession GSE246295.
  
URL

biorxiv.org/content/10.1101/2023.10.17.562653v2
www.biorxiv.org www.biorxiv.org

New submission 27/11/2023, 09:12:32

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We thank all three Reviewers for their comments and have revised the manuscript accordingly.
  
  Reviewer #1 (Public Review):
  
  The main objective of this paper is to report the development of a new intramuscular probe that the authors have named Myomatrix arrays. The goal of the Myomatrix probe is to significantly advance the current technological ability to record the motor output of the nervous system, namely fine-wire electromyography (EMG). Myomatrix arrays aim to provide large-scale recordings of multiple motor units in awake animals under dynamic conditions without undue movement artifacts and maintain long-term stability of chronically implanted probes. Animal motor behavior occurs through muscle contraction, and the ultimate neural output in vertebrates is at the scale of motor units, which are bundles of muscle fibers (muscle cells) that are innervated by a single motor neuron. The authors have combined multiple advanced manufacturing techniques, including lithography, to fabricate large and dense electrode arrays with mechanical features such as barbs and suture methods that would stabilize the probe's location within the muscle without creating undue wiring burden or tissue trauma. Importantly, the fabrication process they have developed allows for rapid iteration from design conception to a physical device, which allows for design optimization of the probes for specific muscle locations and organisms. The electrical output of these arrays is processed through a variety of means to try to identify single motor unit activity. At the simplest, the approach is to use thresholds to identify motor unit activity. Of intermediate data analysis complexity is the use of principal component analysis (PCA, a linear second-order regression technique) to disambiguate individual motor units from the wide field recordings of the arrays, which benefits from the density and numerous recording electrodes. At the highest complexity, they use spike sorting techniques that were developed for Neuropixels, a large-scale electrophysiology probe for cortical neural recordings. Specifically, they use an estimation code called kilosort, which ultimately relies on clustering techniques to separate the multi-electrode recordings into individual spike waveforms.
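
  As an illustration of the simplest of the three approaches summarized above (threshold-based detection), the following is a minimal sketch only. The function name, the refractory window, and the toy signal are assumptions made for illustration and are not taken from the manuscript or its code repository.

```python
def detect_spikes(signal, threshold, refractory_samples):
    """Return sample indices where |signal| meets or exceeds `threshold`.

    Samples that fall within `refractory_samples` of the previous
    detection are ignored, so one waveform is not counted repeatedly.
    All parameter values are hypothetical and chosen by the caller.
    """
    spikes = []
    last = -refractory_samples - 1  # guarantees the first crossing is accepted
    for i, value in enumerate(signal):
        if abs(value) >= threshold and i - last > refractory_samples:
            spikes.append(i)
            last = i
    return spikes


# Toy single-channel trace (arbitrary units) with two supra-threshold events.
toy_trace = [0, 0, 5, 6, 0, 0, 0, -7, 0]
print(detect_spikes(toy_trace, threshold=4, refractory_samples=2))  # [2, 7]
```

  A sketch like this only works when one unit dominates a channel; it does not separate overlapping units, which is where the PCA-based and Kilosort-based methods come in.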
  
  The biggest strength of this work is the design and implementation of the hardware technology. It is undoubtedly a major leap forward in our ability to record the electrical activity of motor units. The myomatrix arrays trounce fine-wire EMGs when it comes to the quality of recordings, the number of simultaneous channels that can be recorded, their long-term stability, and resistance to movement artifacts.
  
  The primary weakness of this work is its reliance on kilosort in circumstances where most of the channels end up picking up the signal from multiple motor units. As the authors quite convincingly show, this setting is a major weakness for fine-wire EMG. They argue that the myomatrix array succeeds in isolating individual motor unit waveforms even in that challenging setting through the application of kilosort.
  
  Although the authors call the estimated signals as well-isolated waveforms, there is no independent evidence of the accuracy of the spike sorting algorithm. The additional step (spike sorting algorithms like kilosort) to estimate individual motor unit spikes is the part of the work in question. Although the estimation algorithms may be standard practice, the large number of heuristic parameters associated with the estimation procedure are currently tuned for cortical recordings to estimate neural spikes. Even within the limited context of Neuropixels, for which kilosort has been extensively tested, basic questions like issues of observability, linear or nonlinear, remain open. By observability, I mean in the mathematical sense of well-posedness or conditioning of the inverse problem of estimating single motor unit spikes given multi-channel recordings of the summation of multiple motor units. This disambiguation is not always possible. kilosort's validation relies on a forward simulation of the spike field generation, which is then truth-tested against the sorting algorithm. The empirical evidence is that kilosort does better than other algorithms for the test simulations that were performed in the context of cortical recordings using the Neuropixels probe. But this work has adopted kilosort without comparable truth-tests to build some confidence in the application of kilosort with myomatrix arrays.
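
  The conditioning concern raised above can be made concrete with a small sketch. It assumes a deliberately simplified model (two motor units, each with a fixed spatial footprint across channels, mixed linearly); the function names and footprints are hypothetical. For two unit-norm footprints with cosine similarity c, the mixing matrix has singular values sqrt(1 + c) and sqrt(1 - c), so its condition number is sqrt((1 + c) / (1 - c)).

```python
import math


def footprint_cosine(a, b):
    """Cosine similarity between two per-channel amplitude footprints."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mixing_condition_number(a, b):
    """Condition number of the 2-column matrix of normalized footprints.

    Values near 1 mean the two units are easy to separate; very large
    values mean the inverse problem is ill-conditioned.
    """
    c = abs(footprint_cosine(a, b))
    if c >= 1.0 - 1e-12:  # footprints are (numerically) collinear
        return math.inf
    return math.sqrt((1.0 + c) / (1.0 - c))


# Hypothetical footprints over three channels.
print(mixing_condition_number([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # well separated
print(mixing_condition_number([1.0, 1.0, 1.0], [1.0, 1.0, 1.1]))  # nearly identical
```

  In this toy model, units whose amplitudes are similar on most channels yield a large condition number, which is the setting the reviewer identifies as hardest to disambiguate.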
  
  Kilosort was developed to analyze spikes from neurons rather than motor units and, as Reviewer #1 correctly points out, despite a number of prior validation studies the conditions under which Kilosort accurately identifies individual neurons are still incompletely understood. Our application of Kilosort to motor unit data therefore demands that we explain which of Kilosort’s assumptions do and do not hold for motor unit data and explain how our modifications of the Kilosort pipeline account for important differences between neural and muscle recording. We summarize these points below and have included them in the revised manuscript.
  
  Additionally, both here and in the revised paper we emphasize that while the presented spike sorting methods (thresholding, PCA-based clustering, and Kilosort) robustly extract motor unit waveforms, spike sorting of motor units is still an ongoing project. Our future work will further elaborate how differences between cortical and motor unit data should inform approaches to spike sorting as well as develop simulated motor unit datasets that can be used to benchmark spike sorting methods.
  
  For our current revision, we have added detailed discussion (see “Data analysis: spike sorting”) of the risks and benefits of our use of Kilosort to analyze motor unit data, in each case clarifying how we have modified the Kilosort code with these issues in mind:
  
  “Modification of spatial masking: Individual motor units contain multiple muscle fibers (each of which is typically larger than a neuron’s soma), and motor unit waveforms can often be recorded across spatially distant electrode contacts as the waveforms propagate along muscle fibers. In contrast, Kilosort - optimized for the much more local signals recorded from neurons - uses spatial masking to penalize templates that are spread widely across the electrode array. Our modifications to Kilosort therefore include ensuring that Kilosort searches for motor unit templates across all (and only) the electrode channels inserted into a given muscle. In the GitHub repository linked above, this is accomplished by setting parameter nops.sigmaMask to infinity, which effectively eliminates spatial masking in the analysis of the 32 unipolar channels recorded from the injectable Myomatrix array schematized in Supplemental Figure 1g. In cases including chronic recording from mice where only a single 8-contact thread is inserted into each muscle, a similar modification can be achieved with a finite value of nops.sigmaMask by setting parameter NchanNear, which represents the number of nearby EMG channels to be included in each cluster, to equal the number of unipolar or bipolar data channels recorded from each thread. Finally, note that in all cases Kilosort parameter NchanNearUp (which defines the maximum number of channels across which spike templates can appear) must be reset to be equal to or less than the total number of Myomatrix data channels.”
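
  To illustrate why an infinite mask width removes the spatial penalty described above, here is a minimal sketch. It assumes the mask is a Gaussian function of each channel's distance from a template's peak channel; this is an illustrative assumption and is not Kilosort's code. The function name, distances, and the finite width of 30 µm are hypothetical.

```python
import math


def spatial_mask(distances_um, sigma_um):
    """Per-channel weight as a Gaussian function of distance (assumed form).

    With a finite `sigma_um`, distant channels are strongly down-weighted.
    With `sigma_um = math.inf`, every channel gets weight 1.0, so no
    channel is penalized for being far from the peak channel.
    """
    if math.isinf(sigma_um):
        return [1.0 for _ in distances_um]
    return [math.exp(-(d ** 2) / (2.0 * sigma_um ** 2)) for d in distances_um]


# Hypothetical distances (in microns) from a template's peak channel.
distances = [0.0, 100.0, 500.0]
print(spatial_mask(distances, 30.0))      # weights fall off steeply with distance
print(spatial_mask(distances, math.inf))  # [1.0, 1.0, 1.0]
```

  Under this assumed form, a motor unit waveform that propagates to contacts hundreds of microns away would be almost entirely suppressed by a narrow mask but fully retained by an infinite one.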
  
  “Allowing more complex spike waveforms: We also modified Kilosort to account for the greater duration and complexity (relative to neural spikes) of many motor unit waveforms. In the code repository linked above, Kilosort 2.5 was modified to allow longer spike templates (151 samples instead of 61), more spatiotemporal PCs for spikes (12 instead of 6), and more left/right eigenvector pairs for spike template construction (6 pairs instead of 3). These modifications were crucial for improving sorting performance in the nonhuman primate dataset shown in Figure 3, and in a subset of the rodent datasets (although they were not used in the analysis of mouse data shown in Fig. 1 and Supplemental Fig. 2a-f).”
  
  Furthermore, as the paper on the latest version of kilosort, namely v4, discusses, differences in the clustering algorithm are the likely reason for kilosort4 performing more robustly than kilosort2.5 (used in the myomatrix paper). Given such dependence on details of the implementation and the use of an older kilosort version in this paper, the evidence that the myomatrix arrays truly record individual motor units under all the types of data obtained is under question.
  
  We chose to modify Kilosort 2.5, which has been used by many research groups to sort spike features, rather than the just-released Kilosort 4.0. Although future studies might directly compare the performance of these two versions on sorting motor unit data, we feel that such an analysis is beyond the scope of this paper, which aims primarily to introduce our electrode technology and demonstrate that a wide range of sorting methods (thresholding, PCA-based waveform clustering, and Kilosort) can all be used to extract single motor units. Additionally, note that because we have made several significant modifications to Kilosort 2.5 as described above, it is not clear what a “direct” comparison between different Kilosort versions would mean, since the procedures we provide here are no longer identical to version 2.5.
  
  There is an older paper with a similar goal to use multi-channel recording to perform source localization that the authors have failed to discuss. Given the striking similarity of goals and the divergence of approaches (the older paper uses a surface electrode array), it is important to know the relationship of the myomatrix array to the previous work. Like myomatrix arrays, the previous work also derives inspiration from cortical recordings, in that case it uses the approach of source localization in large-scale EEG recordings using skull caps, but applies it to surface EMG arrays. Ref: van den Doel, K., Ascher, U. M., & Pai, D. K. (2008). Computed myography: three-dimensional reconstruction of motor functions from surface EMG data. Inverse Problems, 24(6), 065010.
  
  We thank the Reviewer for pointing out this important prior work, which we now cite and discuss in the revised manuscript under “Data analysis: spike sorting” [lines 318-333]:
  
  “Our approach to spike sorting shares the same ultimate goal as prior work using skin-surface electrode arrays to isolate signals from individual motor units but pursues this goal using different hardware and analysis approaches. A number of groups have developed algorithms for reconstructing the spatial location and spike times of active motor units (Negro et al. 2016; van den Doel, Ascher, and Pai 2008) based on skin-surface recordings, in many cases drawing inspiration from earlier efforts to localize cortical activity using EEG recordings from the scalp (Michel et al. 2004). Our approach differs substantially. In Myomatrix arrays, the close electrode spacing and very close proximity of the contacts to muscle fibers ensure that each Myomatrix channel records from a much smaller volume of tissue than skin-surface arrays. This difference in recording volume in turn creates different challenges for motor unit isolation: compared to skin-surface recordings, Myomatrix recordings include a smaller number of motor units represented on each recording channel, with individual motor units appearing on a smaller fraction of the sensors than typical in a skin-surface recording. Because of this sensor-dependent difference in motor unit source mixing, different analysis approaches are required for each type of dataset. Specifically, skin-surface EMG analysis methods typically use source-separation approaches that assume that each sensor receives input from most or all of the individual sources within the muscle as is presumably the case in the data. In contrast, the much sparser recordings from Myomatrix are better decomposed using methods like Kilosort, which are designed to extract waveforms that appear only on a small, spatially-restricted subset of recording channels.”
  
  The incompleteness of the evidence that the myomatrix array truly measures individual motor units is limited to the setting where multiple motor units have similar magnitude of signal in most of the channels. In the simpler data setting where one motor unit dominates in some channel (this seems to occur with some regularity), the myomatrix array is a major advance in our ability to understand the motor output of the nervous system. The paper is a trove of innovations in manufacturing technique, array design, suture and other fixation devices for long-term signal stability, and customization for different muscle sizes, locations, and organisms. The technology presented here is likely to achieve rapid adoption in multiple groups that study motor behavior, and would probably lead to new insights into the spatiotemporal distribution of the motor output under more naturally behaving animals than is the current state of the field.
  
  We thank the Reviewer for this positive evaluation and for the critical comments above.
  
  Reviewer #2 (Public Review):
  
  Motoneurons constitute the final common pathway linking central impulse traffic to behavior, and neurophysiology faces an urgent need for methods to record their activity at high resolution and scale in intact animals during natural movement. In this consortium manuscript, Chung et al. introduce high-density electrode arrays on a flexible substrate that can be implanted into muscle, enabling the isolation of multiple motor units during movement. They then demonstrate these arrays can produce high-quality recordings in a wide range of species, muscles, and tasks. The methods are explained clearly, and the claims are justified by the data. While technical details on the arrays have been published previously, the main significance of this manuscript is the application of this new technology to different muscles and animal species during naturalistic behaviors. Overall, we feel the manuscript will be of significant interest to researchers in motor systems and muscle physiology, and we have no major concerns. A few minor suggestions for improving the manuscript follow.
  
  We thank the Reviewer for this positive overall assessment.
  
  The authors perhaps understate what has been achieved with classical methods. To further clarify the novelty of this study, they should survey previous approaches for recording from motor units during active movement. For example, Pflüger & Burrows (J. Exp. Biol. 1978) recorded from motor units in the tibial muscles of locusts during jumping, kicking, and swimming. In humans, Grimby (J. Physiol. 1984) recorded from motor units in toe extensors during walking, though these experiments were most successful in reinnervated units following a lesion. In addition, the authors might briefly mention previous approaches for recording directly from motoneurons in awake animals (e.g., Robinson, J. Neurophys. 1970; Hoffer et al., Science 1981).
  
  We agree and have revised the manuscript to discuss these and other prior uses of traditional EMG, including here [lines 164-167]:
  
  “The diversity of applications presented here demonstrates that Myomatrix arrays can obtain high-resolution EMG recordings across muscle groups, species, and experimental conditions including spontaneous behavior, reflexive movements, and stimulation-evoked muscle contractions. Although this resolution has previously been achieved in moving subjects by directly recording from motor neuron cell bodies in vertebrates (Hoffer et al. 1981; Robinson 1970; Hyngstrom et al. 2007) and by using fine-wire electrodes in moving insects (Pfluger 1978; Putney et al. 2023), both methods are extremely challenging and can only target a small subset of species and motor unit populations. Exploring additional muscle groups and model systems with Myomatrix arrays will allow new lines of investigation into how the nervous system executes skilled behaviors and coordinates the populations of motor units both within and across individual muscles…”
  
  For chronic preparations, additional data and discussion of the signal quality over time would be useful. Can units typically be discriminated for a day or two, a week or two, or longer?
  
  A related issue is whether the same units can be tracked over multiple sessions and days; this will be of particular significance for studies of adaptation and learning.
  
  Although the yields of single units are greatest in the 1-2 weeks immediately following implantation, in chronic preparations we have obtained well-isolated single units up to 65 days post-implant. Anecdotally, in our chronic mouse implants we occasionally see motor units on the same channel across multiple days with similar waveform shapes and patterns of behavior-locked activity. However, because data collection for this manuscript was not optimized to answer this question, we are unable to verify whether these observations actually reflect cross-session tracking of individual motor units. For example, in all cases animals were disconnected from data collection hardware in between recording sessions (which were often separated by multiple intervening days) preventing us from continuously tracking motor units across long timescales. We agree with the reviewer that long-term motor unit tracking would be extremely useful as a tool for examining learning and plan to address this question in future studies.
  
  We have added a discussion of these issues to the revised manuscript [lines 52-59]:
  
  “…These methods allow the user to record simultaneously from ensembles of single motor units (Fig. 1c,d) in freely behaving animals, even from small muscles including the lateral head of the triceps muscle in mice (approximately 9 mm in length with a mass of 0.02 g; ref. 23). Myomatrix recordings isolated single motor units for extended periods (greater than two months, Supp. Fig. 3e), although highest unit yield was typically observed in the first 1-2 weeks after chronic implantation. Because recording sessions from individual animals were often separated by several days during which animals were disconnected from data collection equipment, we are unable to assess based on the present data whether the same motor units can be recorded over multiple days.”
  
  Moreover, we have revised Supplemental Figure 3 to show an example of single motor units recorded >2 months after implantation:
  
  Author response image 1.
  
  Longevity of Myomatrix recordings. In addition to isolating individual motor units, Myomatrix arrays also provide stable multi-unit recordings of comparable or superior quality to conventional fine wire EMG…. (e) Although individual motor units were most frequently recorded in the first two weeks of chronic recordings (see main text), Myomatrix arrays also isolate individual motor units after much longer periods of chronic implantation, as shown here where spikes from two individual motor units (colored boxes in bottom trace) were isolated during locomotion 65 days after implantation. This bipolar recording was collected from the subject plotted with unfilled black symbols in panel (d).
  
  It appears both single-ended and differential amplification were used. The authors should clarify in the Methods which mode was used in each figure panel, and should discuss the advantages and disadvantages of each in terms of SNR, stability, and yield, along with any other practical considerations.
  
  We thank the reviewer for the suggestion and have added text to all figure legends clarifying whether each recording was unipolar or bipolar.
  
  Is there likely to be a motor unit size bias based on muscle depth, pennation angle, etc.?
  
  Although such biases are certainly possible, the data presented here are not well-suited to answering these questions. For chronic implants in small animals, the target muscles (e.g. triceps in mice) are so small that the surgeon often has little choice about the site and angle of array insertion, preventing a systematic analysis of this question. For acute array injections in larger animals such as rhesus macaques, we did not quantify the precise orientation of the arrays (e.g. with ultrasound imaging) or the muscle fibers themselves, again preventing us from drawing strong conclusions on this topic. This question is likely best addressed in acute experiments performed on larger muscles, in which the relative orientations of array threads and muscle fibers can be precisely imaged and systematically varied to address this important issue.
  
  Can muscle fiber conduction velocity be estimated with the arrays?
  
  We sometimes observe fiber conduction delays up to 0.5 msec as the spike from a single motor unit moves from electrode contact to electrode contact, so spike velocity could be easily estimated given the known spatial separation between electrode contacts. However (closely related to the above question) this will only provide an accurate estimate of muscle fiber conduction velocity if the electrode contacts are arranged parallel to fiber direction, which is difficult to assess in our current dataset. If the arrays are not parallel, this computation will produce an overestimate of conduction velocity, as in the extreme case where a line of electrode contacts arranged perpendicular to the fiber direction might have identical spike arrival times, and therefore appear to have an infinite conduction velocity. Therefore, although Myomatrix arrays can certainly be used to estimate conduction velocity, such estimates should be performed in future studies only in settings where the relative orientation of array threads and muscle fibers can be accurately measured.
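
  The geometric argument above can be sketched numerically. The example assumes a simplified model in which the spike travels along the fiber at a fixed speed and the line of contacts is tilted by a known angle relative to the fiber; the function name and all numerical values (a 4 m/s true velocity and 1 mm contact spacing) are hypothetical and chosen only for illustration.

```python
import math


def apparent_velocity(true_velocity_m_s, spacing_mm, angle_deg):
    """Velocity inferred from inter-contact delay for a tilted contact line.

    The spike only has to travel the along-fiber component of the contact
    spacing, so the measured delay shrinks by cos(angle) and the naive
    estimate (spacing / delay) is inflated by 1 / cos(angle).
    """
    along_fiber_mm = spacing_mm * abs(math.cos(math.radians(angle_deg)))
    if along_fiber_mm < 1e-12:  # contacts perpendicular to the fiber
        return math.inf         # identical arrival times, infinite estimate
    delay_ms = along_fiber_mm / true_velocity_m_s  # mm / (m/s) equals ms
    return spacing_mm / delay_ms                   # mm / ms equals m/s


for angle in (0.0, 30.0, 60.0, 90.0):
    print(angle, apparent_velocity(4.0, 1.0, angle))
```

  In this model the estimate is exact only at 0 degrees, doubles at 60 degrees, and diverges at 90 degrees, which is why the relative orientation of array threads and muscle fibers needs to be measured before conduction velocity is reported.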
  
  The authors suggest their device may have applications in the diagnosis of motor pathologies. Currently, concentric needle EMG to record from multiple motor units is the standard clinical method, and they may wish to elaborate on how surgical implantation of the new array might provide additional information for diagnosis while minimizing risk to patients.
  
  We thank the reviewer for the suggestion and have modified the manuscript’s final paragraph accordingly [lines 182-188]:
  
  “Applying Myomatrix technology to human motor unit recordings, particularly by using the minimally invasive injectable designs shown in Figure 3 and Supplemental Figure 1g,i, will create novel opportunities to diagnose motor pathologies and quantify the effects of therapeutic interventions in restoring motor function. Moreover, because Myomatrix arrays are far more flexible than the rigid needles commonly used to record clinical EMG, our technology might significantly reduce the risk and discomfort of such procedures while also greatly increasing the accuracy with which human motor function can be quantified. This expansion of access to high-resolution EMG signals – across muscles, species, and behaviors – is the chief impact of the Myomatrix project.”
  
  Reviewer #3 (Public Review):
  
  This work provides a novel design of implantable and high-density EMG electrodes to study muscle physiology and neuromotor control at the level of individual motor units. Current methods of recording EMG using intramuscular fine-wire electrodes do not allow for isolation of motor units and are limited by the muscle size and the type of behavior used in the study. The authors of Myomatrix arrays had set out to overcome these challenges in EMG recording and provided compelling evidence to support the usefulness of the new technology.
  
  Strengths:
  
  • They presented convincing examples of EMG recordings with high signal quality using this new technology from a wide array of animal species, muscles, and behavior.
  
  • The design included suture holes and pull-on tabs that facilitate implantation and ensure stable recordings over months.
  
  • Clear presentation of specifics of the fabrication and implantation, recording methods used, and data analysis.
  
  We thank the Reviewer for these comments.
  
  Weaknesses:
  
  The justification for the need to study the activity of isolated motor units is underdeveloped. The study could be strengthened by providing example recordings from studies that try to answer questions where isolation of motor unit activity is most critical. For example, there is immense value for understanding muscles with smaller innervation ratio which tend to have many motor neurons for fine control of eyes and hand muscles.
  
  We thank the Reviewer for the suggestion and have modified the manuscript accordingly [lines 170-174]:
  
  “…how the nervous system executes skilled behaviors and coordinates the populations of motor units both within and across individual muscles. These approaches will be particularly valuable in muscles in which each motor neuron controls a very small number of muscle fibers, allowing fine control of oculomotor muscles in mammals as well as vocal muscles in songbirds (Fig. 2g), in which most individual motor neurons innervate only 1-3 muscle fibers (Adam et al. 2021).”
  
  Reviewer #1 (Recommendations for The Authors):
  
  I would urge the authors to consider a thorough validation of the spike sorting piece of the workflow. Barring that weakness, this paper has the potential to transform motor neuroscience. The validation efforts of kilosort in the context of Neuropixels might offer a template for how to convince the community of the accuracy of myomatrix arrays in disambiguating individual motor unit waveforms.
  
  I have a few minor detailed comments, that the authors may find of some use. My overall comment is to commend the authors for the precision of the work as well as the writing. However, exercising caution associated with kilosort could truly elevate the paper by showing where there is room for improvement.
  
  We thank the Reviewer for these comments - please see our summary of our revisions related to Kilosort in our reply to the public reviews above.
  
  L6-7: The relationship between motor unit action potential and the force produced is quite complicated in muscle. For example, recent work has shown how decoupled the force and EMG can be during nonsteady locomotion. Therefore, it is not a fully justified claim that recording motor unit potentials will tell us what forces are produced. This point relates to another claim made by the authors (correctly) that EMG provides better quality information about muscle motor output in isometric settings than in more dynamic behaviors. That same problem could also apply to motor unit recordings and their relationship to muscle force. The relationship is undoubtedly strong in an isometric setting. But as has been repeatedly established, the electrical activity of muscle is only loosely related to its force output and lacks in predictive power.
  
  This is an excellent point, and our revised manuscript now addresses this issue [lines 174-176]:
  
  “…Of further interest will be combining high-resolution EMG with precise measurement of muscle length and force output to untangle the complex relationship between neural control, body kinematics, and muscle force that characterizes dynamic motor behavior. Similarly, combining Myomatrix recordings with high-density brain recordings….”
  
  L12: There is older work that uses an array of skin mounted EMG electrodes to solve a source location problem, and thus come quite close to the authors' stated goals. However, the authors have failed to cite or provide an in-depth analysis and discussion of this older work.
  
  As described above in the response to Reviewer 1’s public review comments, we now cite and discuss these papers.
  
  L18-19: "These limitations have impeded our understanding of fundamental questions in motor control, ..." There are two independently true statements here. First is that there are limitations to EMG based inference of motor unit activity. Second is that there are gaps in the current understanding of motor unit recruitment patterns and modification of these patterns during motor learning. But the way the first few paragraphs have been worded makes it seem like motor unit recordings is a panacea for these gaps in our knowledge. That is not the case for many reasons, including key gaps in our understanding of how muscle's electrical activity relates to its force, how force relates to movement, and how control goals map to specific movement patterns. This manuscript would in fact be strengthened by acknowledging and discussing the broader scope of gaps in our understanding, and thus more precisely pinpointing the specific scientific knowledge that would be gained from the application of myomatrix arrays.
  
  We agree and have revised the manuscript to note this complexity (see our reply to this Reviewer’s other comment about muscle force, above).
  
  L140-143: The estimation algorithms yield potential spikes, but without validation of the sorting algorithms it is not justifiable to conclude that the myomatrix arrays have already provided information about individual motor units.
  
  Please see our replies to Reviewer #1's public comments (above) regarding motor unit spike sorting.
  
  L181-182: "These methods allow very fine pitch escape routing (<10 µm spacing), alignment between layers, and uniform via formation." I find this sentence hard to understand. Perhaps there is some grammatical ambiguity?
  
  We have revised this passage as follows [lines 194-197]:
  
  "These methods allow very fine pitch escape routing (<10 µm spacing between the thin “escape” traces connecting electrode contacts to the connector), spatial alignment between the multiple layers of polyimide and gold that constitute each device, and precise definition of “via” pathways that connect different layers of the device.”
  
  L240: What is the rationale for choosing this frequency band for the filter?
  
  Individual motor unit waveforms have peak energy at roughly 0.5-2.0 kHz, although units recorded at very high SNR often have voltage waveform features at higher frequencies. The high- and lowpass cutoff frequencies should reflect this, although there is nothing unique about the 350 Hz and 7,000 Hz cutoffs we describe, and in all recordings similar results can be obtained with other choices of low/high frequency cutoffs.
  
  L527-528: There are some key differences between the electrode array design presented here and traditional fine-wire EMG in terms of features used to help with electrode stability within the muscle. A barb-like structure is formed in traditional fine-wire EMG by bending the wire outside the canula of the needle used to place it within the muscle. But when the wire is pulled out, it is common for the barb to break off and be left behind. This is because of the extreme (thin) aspect ratio of the barb in fine wire EMG and low-cycle fatigue fracture of the wire. From the schematic shown here, the barb design seems to be stubbier and thus less prone to breaking off. This raises the question of how much damage is inflicted during the pull-out and the associated level of discomfort to the animal as a result. The authors should present a more careful statement and documentation with regard to this issue.
  
  We have updated the manuscript to highlight the ease of inserting and removing Myomatrix probes, and to clarify that in over 100 injectable insertions/removal there have been zero cases of barbs (or any other part) of the devices breaking off within the muscle [lines 241-249]:
  
  “…Once the cannula was fully inserted, the tail was released, and the cannula slowly removed. After recording, the electrode and tail were slowly pulled out of the muscle together. Insertion and removal of injectable Myomatrix devices appeared to be comparable or superior to traditional fine-wire EMG electrodes (in which a “hook” is formed by bending back the uninsulated tip of the recording wire) in terms of both ease of injection, ease of removal of both the cannula and the array itself, and animal comfort. Moreover, in over 100 Myomatrix injections performed in rhesus macaques, there were zero cases in which Myomatrix arrays broke such that electrode material was left behind in the recorded muscle, representing a substantial improvement over traditional fine-wire approaches, in which breakage of the bent wire tip regularly occurs (Loeb and Gans 1986).”
  
  Reviewer #2 (Recommendations For The Authors):
  
  The Abstract states the device records "muscle activity at cellular resolution," which could potentially be read as a claim that single-fiber recording has been achieved. The authors might consider rewording.
  
  The Reviewer is correct, and we have removed the word “cellular”.
  
  The supplemental figures could perhaps be moved to the main text to aid readers who prefer to print the combined PDF file.
  
  After finalizing the paper, we will upload all main-text and supplemental figures as a single PDF on bioRxiv for readers who prefer that format. However, given that the supplemental figures provide more technical and detailed information than the main-text figures, for the paper on the eLife site we prefer the current eLife format, in which supplemental figures are associated with individual main-text figures online.
  
  Reviewer #3 (Recommendations For The Authors):
  
  • The work could be strengthened by showing examples of simultaneous recordings from different muscles.
  
  Although Myomatrix arrays can indeed be used to record simultaneously from multiple muscles, in this manuscript we have decided to focus on high-resolution recordings that maximize the number of recording channels and motor units obtained from a single muscle. Future work from our group will introduce larger Myomatrix arrays optimized for recording from many muscles simultaneously.
  
  • The implantation did not include mention of testing the myomatrix array during surgery by using muscle stimulation to verify correct placement and connection.
  
  As the Reviewer points out, electrical stimulation is a valuable tool for confirming successful EMG placement. However, we did not use this approach in the current study, relying instead on anatomical confirmation of muscle targeting (e.g. intrasurgical and postmortem inspection in rodents) and on implanting large, easy-to-target arm muscles (in primates) where the risk of mis-targeting is extremely low. Future studies will examine both electrical stimulation and ultrasound methods for confirming the placement of Myomatrix arrays.
  
  References cited above
  
  Adam, I., A. Maxwell, H. Rossler, E. B. Hansen, M. Vellema, J. Brewer, and C. P. H. Elemans. 2021. 'One-to-one innervation of vocal muscles allows precise control of birdsong', Curr Biol, 31: 3115-24 e5.
  
  Hoffer, J. A., M. J. O'Donovan, C. A. Pratt, and G. E. Loeb. 1981. 'Discharge patterns of hindlimb motoneurons during normal cat locomotion', Science, 213: 466-7.
  
  Hyngstrom, A. S., M. D. Johnson, J. F. Miller, and C. J. Heckman. 2007. 'Intrinsic electrical properties of spinal motoneurons vary with joint angle', Nat Neurosci, 10: 363-9.
  
  Loeb, G. E., and C. Gans. 1986. Electromyography for Experimentalists, First edi (The University of Chicago Press: Chicago, IL).
  
  Michel, C. M., M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave de Peralta. 2004. 'EEG source imaging', Clin Neurophysiol, 115: 2195-222.
  
  Negro, F., S. Muceli, A. M. Castronovo, A. Holobar, and D. Farina. 2016. 'Multi-channel intramuscular and surface EMG decomposition by convolutive blind source separation', J Neural Eng, 13: 026027.
  
  Pfluger, H. J., and M. Burrows. 1978. 'Locusts use the same basic motor pattern in swimming as in jumping and kicking', Journal of experimental biology, 75: 81-93.
  
  Putney, Joy, Tobias Niebur, Leo Wood, Rachel Conn, and Simon Sponberg. 2023. 'An information theoretic method to resolve millisecond-scale spike timing precision in a comprehensive motor program', PLOS Computational Biology, 19: e1011170.
  
  Robinson, D. A. 1970. 'Oculomotor unit behavior in the monkey', J Neurophysiol, 33: 393-403.
  
  van den Doel, Kees, Uri M Ascher, and Dinesh K Pai. 2008. 'Computed myography: three-dimensional reconstruction of motor functions from surface EMG data', Inverse Problems, 24: 065010.
  
URL

biorxiv.org/content/10.1101/2023.02.21.529200v2
www.biorxiv.org www.biorxiv.org

New submission 12/01/2024, 09:20:32

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Experiments regarding the inducible expression of MukBEF: The authors should provide western blots or rt-qPCR for MukBEF expression at 40 min and 2H.
  
  We provide now a western blot of MukB in non-induced and induced conditions as Figure 1-figure supplement 1D.
  
  Experiments with RiTer and LiTer constructs:
  
  a. Authors compare the mukB deletion against wild type (Fig. 2C). It would be additionally informative if these comparisons are made for matP deletion and wild type as well. This will strengthen the conclusion that long-range interactions in ter do increase in the absence of matP.
  
  We agree that the matP mutant may help the reader to compare the effect of the translocation in different backgrounds and have added it to the figure. This strengthens the conclusion that long-range interactions in ter do increase in the absence of matP in a rearranged chromosome, as observed in the WT configuration (Lioy et al., 2018).
  
  b. Additionally, in Fig. 2C, it appears that there is some decrease in long-range interactions in the absence of mukB in ter1 (Riter). Is this a significant change?
  
  The change observed is not significant. The results shown in Fig. 2C have been obtained using a 3C approach, which generated slightly more variability than Hi-C. Furthermore, we measured the range of contacts for the segment corresponding to Ter1 in RiTer (matS12-matS28), in different genetic contexts and different configurations. The results show that this level of variation is not significant (see graph below reporting two independent experiments).
  
  Author response image 1.
  
  Range of interactions measured on the interval matS12-matS18 in different genetic contexts and different configurations (MG1655 WT(1 and 2), ∆mukB, RiTer, RiTer ∆mukB).
  
  Experiments with various matS organizations: These experiments are interesting and an important part of the paper. However, it is rather hard to visualize the chromosome conformations in the strains after transposition. To aid the reader (particularly with panel E), authors can provide schematics of the chromosome conformations and anticipated/ observed chromosomal interactions. Circular interaction plots would be useful here.
  
  We thank the reviewer for this interesting remark; we have tried in the past to represent these interactions using a circular representation (see for example the web site of Ivan Junier; https://treetimc.github.io/circhic/index.html). However, this representation is not trivial to apprehend for nonspecialists, especially in strains with a rearranged chromosome configuration. Nonetheless, we have added graphical circular representations of the chromosome configurations to help the reader.
  
  ChIP experiments:
  
  a. This section of the manuscript needs to be further strengthened. It is not clear whether the ChIP signal observed is significant (for example, at T10 or T20 min, the peak value does not appear to go above 1.1-fold). Can the authors be sure that this small increase is not simply a consequence of an increase in copy number of the loci around the origin, as replication has initiated?
  
  The basal value of the ChIP on the non-replicated sequences (between 0-3.5 Mb for 10 minutes and 0-3 Mb for 20 minutes) is 0.8 and 0.7, respectively, whereas the mean value of the replicated sequence is 1.6 and 1.45. The enrichment observed for these two time points is therefore about 2-fold, not 1.1-fold, and it is 4-fold for t40min. These values were obtained by dividing the number of normalized reads in the ChIP (the number of reads at each position divided by the total number of reads) by the normalized reads of the input. Therefore, the increase in copy number is accounted for in the calculation. Furthermore, we added a supplementary figure (Figure Sup9) in which we performed a ChIP without tags on synchronized cells, and in this case, we did not observe any enrichment triggered by replication.
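  
  The arithmetic in this reply can be written out explicitly. The block below is only a sketch: the symbols S, r, N and E are notation introduced here for illustration and do not appear in the manuscript. It restates the normalization described above and the two fold-enrichment values that follow from the quoted basal and replicated means. Because a locus that has been replicated gains reads in both the ChIP and the input, a copy-number change should cancel in the ratio S(x).

```latex
% S(x): normalized ChIP/input signal at position x (notation assumed for illustration)
% r(x): reads at position x, N: total reads in the corresponding library
\[
S(x) = \frac{r_{\mathrm{ChIP}}(x) / N_{\mathrm{ChIP}}}{r_{\mathrm{input}}(x) / N_{\mathrm{input}}}
\]
% E: fold enrichment = mean S over replicated sequence / basal S over non-replicated sequence
\[
E_{10\,\mathrm{min}} = \frac{1.6}{0.8} = 2.0,
\qquad
E_{20\,\mathrm{min}} = \frac{1.45}{0.7} \approx 2.07
\]
```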
  
  b. Authors make a conclusion that MukB loads behind the replication fork. However, the time resolution of the presented experiments is not sufficient to be certain of this. Authors would need to perform more time-resolved experiments for the same.
  
  Reviewer 1 is correct; we attempted to discriminate whether the observed enrichment is (i) associated with the replication fork, since we observed a decrease in the center of the enrichment at oriC as the maximum enrichment moves away with the replication fork after 20 and 40 minutes, or (ii) associated with the newly replicated sequence. To investigate this, we attempted to induce a single round of replication by shifting the cells back to 40°C after 10 minutes at 30°C. Unfortunately, replication initiation is not immediately halted by shifting the cells to 40°C, and we were unable to induce a single round of replication. To clarify our conclusions, we modified our manuscript to read:
  
  “Altogether, these findings indicate that MukBEF is loaded into regions newly replicated either at the replication fork or even further behind it, except in the Ter region from which it would be excluded.”
  
  c. Authors conclude that in the LiTer7 strain, MukB signal is absent from Ter2. However, when compared with the ChIP profiles by eye across panels in A and B, this does not seem to be significant. In the same results sections, authors state that there is a 3-fold increase in MukB signal in other regions. The corresponding graph does not show the same.
  
  Rather than relying solely on the enrichment levels, which can be challenging to compare across different strains due to slight variations in replication levels, we believe there is a clear disruption in this profile that corresponds to the Ter2 sequence. Furthermore, this discontinuity in enrichment relative to the replication profile is also observable in the WT configuration. At T40min, MukB ChIPseq signals halt at the Ter boundary, even though Ter is actively undergoing replication, as evidenced by observations in the input data.
  
  Regarding the fold increase of MukB, Reviewer 1 is correct; we overestimated this enrichment in the text and have now corrected it.
  
  d. Authors should provide western blot of MukB-Flag.
  
  We have added Supplementary Figure 1 D, which contains a Western blot of MukB-Flag.
  
  The bioinformatic analysis of matS site distribution is interesting, but this is not followed upon. The figure (Fig 5) is better suited in the supplement and used only as a discussion point.
  
  We acknowledge the reviewer's point, but we used this section to attempt to extend our findings to other bacteria and emphasize the observation that even though a few matS sites are necessary to inhibit MukBEF, the Ter domains are large and centered on dif even in other bacteria.
  
  The discussion section is lacking many references and key papers have not been cited (paragraph 1 of discussion for example has no references).
  
  The possibility that SMC-ScpAB and MukBEF can act independent of replication has been suggested previously, but are not cited or discussed. Similarly, there is some evidence for SMC-ScpAB association with newly replicated DNA (PMID 21923769).
  
  We have added references to the suggested paragraph and highlighted the fact that MukBEF's activity independent of replication was already known. However, we believe that the situation is less clear for SMC-ScpAB in B. subtilis or C. crescentus. Similarly, we found no clear evidence in the referenced studies that SMC-ScpAB is associated with newly replicated DNA.
  
  To clarify and enrich the discussion section, we have added a paragraph that provides perspective on the loading mechanisms of SMC-ScpAB and MukBEF.
  
  There are minor typographical errors that should be corrected. Some are highlighted here:
  
  a. Abstract: L5: "preferentially 'on' instead of 'in'"
  
  b. Introduction: Para 1 L8: "features that determine"
  
  c. Introduction: Para 2 L1: please check the phrasing of this line
  
  d. Results section 2: L1: Ter "MD" needs to be explained
  
  e. Page 8: Para 2: L6: "shows that 'a'"
  
  g. Page 13: Para 2: "MukBEF activity...". This sentence needs to be fixed.
  
  i. Figure 4: "input" instead of "imput"
  
  We thank Reviewer 1 for pointing out all these grammatical or spelling mistakes. We have corrected them all.
  
  f. Page 12: Para 2: "Xer" instead of "XDS"?
  
  We added a reference to clarify the term.
  
  h. Methods: ChIP analysis: Authors state "MatP peaks", however, reported data is for MukB
  
  This description pertains to the matP peak detection shown in Supplementary Figure 3. We have incorporated this clarification into the text.
  
  j. Supplementary figure legends need to be provided (currently main figure legends appear to be pasted twice)
  
  Supplementary figure legends are provided at the end of the manuscript, and we have edited the manuscript to remove one copy of the figure legends.
  
  k. Authors should ensure sequencing data are deposited in an appropriate online repository and an accession number is provided.
  
  We waited for the appropriate timing in the editing process to upload our data, which we have now done. Additionally, we have added a data availability section to the manuscript, including sequence references on the NCBI.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The authors largely avoid speculation on what might be the physiological relevance of the exclusion of MukBEF (and Smc-ScpAB) from the replication termination region (and the coordination with DNA replication). At this stage it would be helpful to present possible scenarios even if not yet supported by data. The authors should for example consider the following scenario: loop extrusion of a dif site in a chromosome dimer followed by dimer resolution by dif recombination leads to two chromosomes that are linked together by MukBEF (equivalent to cohesin holding sister chromatids together in eukaryotes but without a separase). This configuration (while rare) will hamper chromosome segregation. Is MatP particularly important under conditions of elevated levels of chromosome dimers? Could this even be experimentally tested? Other scenarios might also be entertained.
  
  Even though we prefer to avoid speculations, we agree that we may attempt to propose some hypotheses to the reader. To do so, we have added a few sentences at the end of our discussion. “We may speculate, based on in vitro observations (Kumar et al., 2022), that MukBEF could interfere with TopIV activity and delay potential chromosome decatenation. Another possibility is that chromosome dimers resolved at the dif site may become trapped in loops formed by MukBEF, thus delaying segregation. But none of these possible scenarios are supported by data yet, and a major challenge for the future is to determine whether and how MukBEF may interfere with one or both of these processes.”
  
  The manuscript text is well written. However, the labeling of strains in figures and text is sometimes inconsistent which can be confusing (LiTer Liter liter; e.g Riter Fig 2C). For consistency, always denote the number of matS sites in LiTer strains and also in the RiTer strain. The scheme denoting LiTer and RiTer strains should indicate the orientation of DNA segments so it is clear that the engineering does not involve inversion (correct?). Similarly: Use uniform labelling for time points: see T40mn vs 40mn vs T2H vs 2H
  
  We have reviewed the manuscript to standardize our labeling. Additionally, we have included a schema in Figure 2, indicating the matS numbers at the Ter border to emphasize that the transposition events do not involve inversion.
  
  matS sites do not have identical sequences and bind different levels of MatP (suppl fig 3). Does this possibly affect the interpretation of some of the findings (when altering few or only a single matS site). Maybe a comment on this possibility can be added.
  
  We agree with the referee; we do not want to conclude too strongly about the impact of matS density, so we have added this sentence at the end of the section titled 'matS Determinants to Prevent MukBEF Activity':
  
  “Altogether, assuming that differences in the matS sequences do not modify MatP's ability to bind to the chromosome and affect its capacity to inhibit MukBEF, these results suggested that the density of matS sites in a small chromosomal region has a greater impact than dispersion of the same number of matS sites over a larger segment”
  
  Figure 5: show selected examples of matS site distribution in addition to the averaged distribution (as in supplemental figure)?
  
  Figure 5 shows the median of the matS distribution based on the matS positions of 16 species as displayed in the supplementary figure. We believe that this figure is interesting as it represents the overall matS distribution across the Enterobacterales, Pasteurellales, and Vibrionales.
  
  How do authors define 'background levels' (page 9)in their ChIP-Seq experiments? Please add a definition or reword.
  
  We agree that the term 'background level' here could be confusing, so we have modified it to 'basal level' to refer to the non-replicating sequence. The background level can be observed in Supplementary Figure 9 in the ChIP without tags, and, on average, the background level is 1 throughout the entire chromosome in these control experiments.
  
  This reviewer would naively expect the normalized ChIP-Seq signals to revolve around a ratio of 1 (Fig. 4)? They do in one panel (Figure 4B) but not in the others (Figure 4A). Please provide an explanation.
  
  We thank the referee for this pertinent observation. An error was made during the smoothing of the data in Figure 4A, which resulted in an underestimation of the input values. This mistake does not alter the profile of the ChIP (it's a division by a constant) and our conclusions. We provide a revised version of the figure.
  
  Inconsistent axis labelling: e.g Figure 4
  
  Enterobacterals should be Enterobacterales (?)
  
  KB should be kb
  
  MB should be Mb
  
  Imput should be Input
  
  FlaG should be Flag
  
  We have made the suggested modifications to the text.
  
  'These results unveiled that fluorescent MukBEF foci previously observed associated with the Ori region were probably not bound to DNA' Isn't the alternative scenario that MukBEF bound to distant DNA segments colocalize an equally likely scenario? Please rephrase.
  
  Since we lack evidence regarding what triggers the formation of a unique MukB focus associated with the origin and what this focus could represent, we have removed this sentence.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The text is well-written and easy to follow, but I would suggest several improvements to make things clearer:
  
  Many plots are missing labels or legends. (I) All contact plots such as Fig. 1C should have a color legend. It is not clear how large the signal is and whether the plots are on the same scale. (II) Ratiometric contact plots such as in Fig. 1D should indicate what values are shown. Is this a log ratio?
  
  As indicated in the materials and methods section, the ratio presented in this manuscript was calculated for each point on the map by dividing the number of contacts in one condition by the number of contacts in the other condition. The Log2 of the ratio was then plotted using a Gaussian filter.
  
  Genotypes and strain names are often inconsistent. Sometimes ΔmukB, ΔmatP, ΔmatS is used, other times it is just mukB, matP, matS; There are various permutations of LiTer, Liter, liter etc.
  
  These inconsistencies have been corrected.
  
  The time notation is unconventional. I recommend using 0 min, 40 min, 120 min etc. instead of T0, T40mn, T2H.
  
  As requested, we have standardized and used conventional annotations.
  
  A supplemental strain table listing detailed genotypes would be helpful.
  
  A strain table has been added, along with a second table recapitulating the positions of matS in the different strains.
  
  Fig. 1A: Move the IPTG labels to the top? It took me a while to spot them.
  
  We have moved the labels to the top of the figure and increased the font size to make them more visible.
  
  Fig 1C: Have these plots been contrast adjusted? If so, this should be indicated. The background looks very white and the transitions from diagonal to background look quite sharp.
  
  No, these matrices haven't been contrast-adjusted. They were created in MATLAB, then exported as TIFF files and directly incorporated into the figure. Nevertheless, we noticed that the color code of the matrix in Figure 3 was different and subsequently adjusted it to achieve uniformity across all matrices.
  
  Fig 1C: What is the region around 3 Mb and 4 Mb? It looks like the contacts there are somewhat MukBEF-independent.
  
  The referee is right. In the presence of the plasmid pPSV38 (carrying the MukBEF operon or not), we repeatedly observed an increase of long range contacts around 3 Mb. The origin of these contacts is unknown.
  
  Fig 1D: Have the log ratios been clipped at -1 and 1 or was some smoothing filter applied? I would expect the division of small and noisy numbers in the background region to produce many extreme values. This does not appear to be the case.
  
  The referee is right, dividing two matrices generates a ratio with extreme values. To avoid this, the Log2 of the ratio is plotted with a Gaussian filter, as described before (Lioy et al., 2018).
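  
  The ratio-then-smooth step described in this reply can be illustrated with a short sketch. This is not the authors' code (their matrices were produced in MATLAB); it is a minimal pure-Python illustration under stated assumptions. The function names, the kernel width, the handling of empty bins (set to 0.0) and the edge clamping are all hypothetical choices made here, not details taken from the manuscript.

```python
import math


def log2_ratio(map_a, map_b):
    """Element-wise log2(a / b) of two contact maps of equal shape.

    Assumption: bins with a zero count in either map are set to 0.0
    (no change); the manuscript does not say how such bins are handled.
    """
    return [
        [math.log2(a / b) if a > 0 and b > 0 else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights covering [-radius, radius]."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(matrix, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n_cols = len(matrix[0])
    smoothed = []
    for row in matrix:
        new_row = []
        for j in range(n_cols):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(j + k - radius, 0), n_cols - 1)
                acc += w * row[idx]
            new_row.append(acc)
        smoothed.append(new_row)
    return smoothed


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def gaussian_filter_2d(matrix, sigma=1.0, radius=2):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows_done = smooth_rows(matrix, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))
```

  Smoothing after taking the log means that a single extreme value, produced by dividing two small and noisy counts, is averaged with its neighbours instead of dominating the plot, which is consistent with the reviewer's observation that few extreme values are visible.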
  
  Fig 1E: I recommend including a wild-type reference trace as a point of reference.
  
  We have added the WT profile to the figure.
  
  Fig 2: I feel the side-by-side cartoon from Supplemental Fig. 2A could be included in the main figure to make things easier to grasp.
  
  We added a schematic representation of the chromosome configuration on top of the matrices to aid understanding.
  
  Fig. 2C: One could put both plots on the same y-axis scale to make them comparable.
  
  We have modified the axes as required.
  
  Fig. 3C: The LiTer4 ratio plot has two blue bands in the 3-4.5 Mb region. I was wondering what they might be. These long-range contacts seem to be transposition-dependent and suppressed by MatP, is that correct?
  
  The referee is right. This indicates that in the absence of MatP, one part of the Ter was able to interact with a distal region of the chromosome, albeit with a low frequency. The origin is not yet known.
  
  Fig. 3E: It is hard to understand what is a strain label and what is the analyzed region of interest. The plot heading and figure legend say Ter2 (but then, there are different Ter2 variants), some labels say Ter, others say Ter2, sometimes it doesn't say anything, some labels say ΔmatS or ΔmatP, others say matS or matP, and so on.
  
  We have unified our notation and added more description to the legend to clarify this figure:
  
  “Ter” corresponds to the range of contacts over the entire Ter region, in the WT strain (WT Ter) or in the ΔmatP strain (ΔmatP Ter). The column WT matSX-Y corresponds to the range of contacts between the designated matS sites in the WT configuration. This portion of the Ter can be compared with the same Ter segment in the transposed strain (Ter2). Additionally, the matS20-28 segment corresponds to Ter2 in LiTer9, just as matS22-28 corresponds to Ter2 in LiTer7, and matS25-28 to Ter2 in LiTer4. The range of contacts of this segment was also measured in a ΔmatP or ΔmatS background.”
  
  Fig. 4 and p.9: "Normalized ChIP-seq experiments were performed by normalizing the quantity of immuno-precipitated fragments to the input of MukB-Flag and then divide by the normalized ChIP signals at t0 to measure the enrichment trigger by replication."
  
  This statement and the ChIP plots in Fig. 4A are somewhat puzzling. If the data were divided by the ChIP signal at t0, as stated in the text, then I would expect the first plot (t0) to be a flat line at value 1. This is not the case. I assume that normalized ChIP is shown without the division by t0, as stated in the figure legend.
  
  The referee is right. This sentence has been corrected, and as described in the Methods section, Figure 4 shows the ChIP normalized by the input.
  
  If that's true and the numbers were obtained by dividing read-count adjusted immunoprecipitate by read-count adjusted input, then I would expect an average value of 1. This is also not the case. Why are the numbers so low? I think this needs some more details on how the data was prepared.
  
  The referee is right; we thank him for this remark. Our data are processed using the following method: the value of each read is divided by the total number of reads. A sliding window of 50 kb is applied to these normalized values to smooth the data. Then, the resulting signal from the ChIP is divided by the resulting signal from the input. This is what is shown in Figure 4. Unfortunately, for some of our results, the sliding window was not correctly applied to the input data. This did not alter the ChIP profile but did affect the absolute values. We have resolved this issue and corrected the figure.
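  
  The three processing steps listed in this reply (per-library normalization, sliding-window smoothing, ChIP divided by input) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the function names are hypothetical, the window is expressed in bins because the bin size is not given here, and wrapping the window around the ends of the array (to mimic a circular chromosome) is an assumption.

```python
def normalize(counts):
    """Divide each per-position read count by the library's total read count."""
    total = sum(counts)
    return [c / total for c in counts]


def sliding_mean(values, window):
    """Centred moving average over an odd number of bins.

    Assumption: indices wrap around, treating the chromosome as circular.
    """
    if window % 2 == 0:
        raise ValueError("window must be an odd number of bins")
    n = len(values)
    half = window // 2
    return [
        sum(values[(i + k) % n] for k in range(-half, half + 1)) / window
        for i in range(n)
    ]


def chip_over_input(chip_counts, input_counts, window):
    """Smoothed, normalized ChIP signal divided by smoothed, normalized input.

    Assumes every smoothed input bin is non-zero.
    """
    chip = sliding_mean(normalize(chip_counts), window)
    inp = sliding_mean(normalize(input_counts), window)
    return [c / i for c, i in zip(chip, inp)]
```

  With both tracks normalized and smoothed in the same way, a ChIP profile that simply mirrors the input gives a ratio of 1 at every position, which is the baseline the reviewer expected. Smoothing only one of the two tracks would leave the two sides of the division processed differently.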
  
  Another potential issue is that it's not clear what the background signal is and whether it is evenly distributed. The effect size is rather small. Negative controls (untagged MukB for each timepoint) would help to estimate the background distribution, and calibrator DNA could be used to estimate the signal-to-background ratio. There is the danger that the apparent enrichment of replicated DNA is due to increased "stickiness" rather than increased MukBEF binding. If any controls are available, I would strongly suggest to show them.
  
  To address this remark, a ChIP experiment with a non-tagged strain under comparable synchronization conditions has been performed. The results are presented as Supplementary Figure 9; they reveal that the enrichment shown in Figure 4 is not attributable to nonspecific antibody binding or 'stickiness'.
  
  Fig. 4A, B: The y-axes on the right are unlabeled and the figure legends mention immunoblot analysis, which is not shown.
  
  We labeled the y-axes as 'anti-Flag ChIP/input' and made corrections to the figure legend.
  
  Fig. 4B: This figure shows a dip in enrichment at the Ter2 region of LiTer7, which supports the authors' case. Having a side-by-side comparison with WT at 60 min would be good, as this time point is not shown in Fig. 4A.
  
  Cell synchronization can be somewhat challenging, and we have observed that the timing of replication restart can vary depending on the genetic background of the cells. This delay is evident in the case of LiTer7. To address this, we compared LiTer7 after 60 minutes to the wild type strain (WT) after 40 minutes of replication. Even though the duration of replication is 20 minutes longer in LiTer7, the replication profiles of these two strains under these two different conditions (40 minutes and 60 minutes) are comparable and provide a better representation of similar replication progression.
  
  Fig. 4C: Highlighting the position of the replication origin would help to interpret the data.
  
  We now highlight the oriC position with a red dashed line.
  
  Fig. 4C: One could include a range-of-contact plot that compares the three conditions (similar to Fig. 1E).
  
  We have added this quantification to Supplemental Figure 8.
  
  Supplemental Fig. 2A: In the LiTer15 cartoon, the flanking attachment sites do not line up. Is this correct? I would also recommend indicating the direction of the Ter1 and Ter2 regions before and after recombination.
  
  In this configuration, attB and attR, as well as attL and attB', should be aligned, but the remaining attR and attL may not be. We have corrected this misalignment. To clarify the question of sequence orientation, we have included in the figure legend that all transposed sequences maintain their original orientation.
  
  Supplemental Fig. 3: One could show where the deleted matS sites are.
  
  We added red asterisks to the ChIP representation to highlight the positions of the missing matS.
  
  Supplemental Fig. 3B: The plot legend is inconsistent with panel A (What is "WT2")?
  
  We have corrected it.
  
  Supplemental Fig. 3C: The E-value notation is unusual. Is this 8.9 x 10^-61?
  
  The value is 8.9 x 10^-61; we modified the annotation.
  
  23) Abstract: "While different features for the activity of the bacterial canonical SMC complex, SmcScpAB, have been described in different bacteria, not much is known about the way chromosomes in enterobacteria interact with their SMC complex, MukBEF."
  
  Could this be more specific? What features are addressed in this manuscript that have been described for Smc-ScpAB but not MukBEF? Alternatively, one could summarize what MukBEF does to capture the interest of readers unfamiliar with the topic.
  
  We modified these first sentences.
  
  p.5 "was cloned onto a medium-copy number plasmid under control of a lacI promoter" Is "lacI promoter" correct? My understanding is that the promoter of the lacI gene is constitutive, whereas the promoter of the downstream lac operon is regulated by LacI. I would recommend providing an annotated plasmid sequence in supplemental material to make things clearer.
  
  We modified it and replaced “lacI promoter” with the correct annotation, pLac.
  
  p. 5 heading "MukBEF activity does not initiate at a single locus" and p. 6 "Altogether, the results indicate that the increase in contact does not originate from a specific position on the chromosome but rather appears from numerous sites". Although this conclusion is supported by the follow-up experiments, I felt it is perhaps a bit too strong at this point in the text. Perhaps MukBEF loads slowly at a single site, but then moves away quickly? Would that not also lead to a flat increase in the contact plots? One could consider softening these statements (at least in the section header), and then be more confident later on.
  
  We used 'indicate' and 'suggesting' at the end of this results section, and we feel that we have not overreached in our conclusions at this point. While it's true that we can consider other hypotheses, we believe that, at this stage, our suggestion that MukBEF is loaded over the entire chromosome is the simplest and most likely explanation.
  
  p.7: "[these results] also reveal that MukBEF does not translocate from the Ori region to the terminus of the chromosome as observed with Smc-ScpAB in different bacteria."
  
  This isn't strictly true for single molecules, is it? Some molecules might translocate from Ori to Ter. Perhaps clarify that this is about the bulk flux of MukBEF?
  
  At this point, our conclusion that MukBEF does not travel from the ori to Ter is global and refers to the results described in this section. However, the referee is correct in pointing out that we cannot exclude the possibility that in a WT configuration (without a Ter in the middle of the right replicore), a specific MukBEF complex can be loaded near Ori and travel all along the chromosome until the Ter. To clarify our statement, we have revised it to 'reveal that MukBEF does not globally translocate from the Ori region to the terminus of the chromosome.' This change is intended to highlight the fact that we are drawing a general conclusion about the behavior of MukBEF and to facilitate its comparison with Smc-ScpAB in B. subtilis.
  
  p. 10: The section title "Long-range contacts correlate with MukBEF binding" and the concluding sentence "Altogether, these results indicate that MukBEF promotes long-range DNA contacts independently of the replication process even though it binds preferentially in newly replicated regions" seem to contradict each other. I would rephrase the title as "MukBEF promotes long-range contacts in the absence of replication" or similar.
  
  We agree with this suggestion and have used the proposed title.
  
  p. 13: I recommend reserving the name "condensin" for the eukaryotic condensin complex and using "MukBEF" throughout.
  
  We used MukBEF throughout.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.30.555477v3
www.biorxiv.org www.biorxiv.org

Human eIF2A has a minimal role in translation initiation and in uORF-mediated translational control in HeLa cells

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Beyond what is stated in the title of this paper, not much needs to be summarized. eIF2A in HeLa cells promotes translation initiation of neither the main ORFs nor short uORFs under any of the conditions tested.
  
  Strengths:
  
  Very comprehensive, in fact, given the huge amount of purely negative data, an admirably comprehensive and well-executed analysis of the factor of interest.
  
  Weaknesses:
  
  The study is limited to the HeLa cell line, focusing primarily on KO of eIF2A and neglecting the opposite scenario, higher eIF2A expression which could potentially result in an increase in non-canonical initiation events.
  
  We thank the reviewer for the positive evaluation. As suggested by the reviewer in the detailed recommendations, we will clarify in the title, abstract and text that our conclusions are limited to HeLa cells. Furthermore, as suggested we will test the effect of eIF2A overexpression on the luciferase reporter constructs, and will upload a revised manuscript.
  
  Reviewer #2 (Public review):
  
  Summary
  
  Roiuk et al describe a work in which they have investigated the role of eIF2A in translation initiation in mammals without much success. Thus, the manuscript focuses on negative results. Further, the results, while original, are generally not novel, but confirmatory, since related claims have been made before independently in different systems, with the Gaikwad et al study recently published in eLife being the most relevant.
  
  Despite this, we find this work highly important. This is because of a massive wealth of unreliable information and speculation regarding eIF2A's role in translation, arising from a series of artifacts that began at the moment of eIF2A discovery. This, in combination with its unfortunate naming (eIF2A is often mixed up with the alpha subunit of eIF2, eIF2S1), has generated widespread confusion among researchers who are not experts in eukaryotic translation initiation. Given this, it is not only justifiable but critical to make independent efforts to clear up this confusion and I very much appreciate the authors' efforts in this regard.
  
  Strengths
  
  The experimental investigation described in this manuscript is thorough, appropriate and convincing.
  
  Weaknesses
  
  However, we are not entirely satisfied with the presentation of this work which we think should be improved.
  
  We thank the reviewer for the positive evaluation. We will revise the manuscript according to the reviewer's suggestions made in the detailed recommendations.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This is a valuable study providing solid evidence that the putative non-canonical initiation factor eIF2A has little or no role in the translation of any expressed mRNAs in cultured human (primarily HeLa) cells. Previous studies have implicated eIF2A in GTP-independent recruitment of initiator tRNA to the small (40S) ribosomal subunit, a function analogous to canonical initiation factor eIF2, and in supporting initiation on mRNAs that do not require scanning to select the AUG codon or that contain near-cognate start codons, especially upstream ORFs with non-AUG start codons, and may use the cognate elongator tRNA for initiation. Moreover, the detected functions for eIF2A were limited to, or enhanced by, stress conditions where canonical eIF2 is phosphorylated and inactivated, suggesting that eIF2A provides a back-up function for eIF2 in such stress conditions. CRISPR gene editing was used to construct two different knockout cell lines that were compared to the parental cell line in a large battery of assays for bulk or gene-specific translation in both unstressed conditions and when cells were treated with inhibitors that induce eIF2 phosphorylation. None of these assays identified any effects of eIF2A KO on translation in unstressed or stressed cells, indicating little or no role for eIF2A as a back-up to eIF2 and in translation initiation at near-cognate start codons, in these cultured cells.
  
  The study is very thorough and generally well executed, examining bulk translation by puromycin labeling and polysome analysis and translational efficiencies of all expressed mRNAs by ribosome profiling, with extensive utilization of reporters equipped with the 5'UTRs of many different native transcripts to follow up on the limited number of genes whose transcripts showed significant differences in translational efficiencies (TEs) in the profiling experiments. They also looked for differences in translation of uORFs in the profiling data and examined reporters of uORF-containing mRNAs known to be translationally regulated by their uORFs in response to stress, going so far as to monitor peptide production from a uORF itself. The high precision and reproducibility of the replicate measurements instil strong confidence that the myriad of negative results they obtained reflects the lack of eIF2A function in these cells rather than data that would be too noisy to detect small effects of the eIF2A mutations. They also tested and found no evidence for a recent claim that eIF2A localizes to the cytoplasm in stress and exerts a global inhibition of translation. Given the numerous papers that have been published reporting functions of eIF2A in specific and general translational control, this study is important in providing abundant, high-quality data to the contrary, at least in these cultured cells.
  
  Strengths:
  
  The paper employed two CRISPR knock-out cell lines and subjected them to a combination of high-quality ribosome profiling experiments, interrogating both main coding sequences and uORFs throughout the translatome, which was complemented by extensive reporter analysis, and cell imaging in cells both unstressed and subjected to conditions of eIF2 phosphorylation, all in an effort to test previous conclusions about eIF2A functioning as an alternative to eIF2.
  
  Weaknesses:
  
  There is some question about whether their induction of eIF2 phosphorylation using tunicamycin was extensive enough to state forcefully that eIF2A has little or no role in the translatome when eIF2 function is strongly impaired. Also, similar conclusions regarding the minimal role of eIF2A were reached previously for a different human cell line from a study that also enlisted ribosome profiling under conditions of extensive eIF2 phosphorylation; although that study lacked the extensive use of reporters to confirm or refute the identification by ribosome profiling of a small group of mRNAs regulated by eIF2A during stress.
  
  We thank the reviewer for the positive evaluation. We will revise the manuscript according to the recommendations made in the detailed recommendations. Regarding the two points mentioned here:
  
  (1) The reason eIF2alpha phosphorylation does not increase appreciably is because unfortunately the antibody is very poor. The fact that the Integrated Stress Response (ISR) is induced by our treatment can be seen, for instance, by the fact that ATF4 protein levels increase strongly (in the very same samples where eIF2alpha phosphorylation does not increase much, in Suppl. Fig. 5E). We will strengthen the conclusion that the ISR is indeed activated with additional experiments/data as suggested by the reviewer.
  
  (2) We agree that our results are in line with results from the previous study mentioned by the reviewer, so we will revise the manuscript to mention this other study more extensively in the discussion.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) I suggest to state (already in the abstract, but perhaps also even in the title, definitely in the rest of the paper) that this analysis is limited to the HeLa cell line.
  
  As suggested, we have now specified in both the title and the abstract that the work is done in HeLa cells.
  
  (2) In my view, it is a pity that the authors - given the tools are available - did not check the impact of high eIF2A levels on expression of individual mRNAs under normal and stress conditions. I am not suggesting to repeat ribo-seq in this setup, it would be too much to ask for, but re-examining some of the many reporters the authors generated with eIF2A overexpressed may point to some function, e.g. increased number of non-canonical initiation events (non-AUG-initiated)? If anything, the use of HeLa and the primary focus on eIF2A KO neglecting the prospective impact of eIF2A overexpression should be mentioned as two main limitations of this study.
  
  We thank the reviewer for the good suggestion to test our synthetic reporters with eIF2A overexpression. New Suppl. Fig. 4G now shows that overexpression of eIF2A does not affect translation of synthetic reporters carrying an ATG start codon in different initiation contexts, or carrying near-cognate start codons, in agreement with a lack of effect on translation which we previously observed with loss of eIF2A.
  
  (3) Ribo-seq with eIF2A. Did the authors focus on ORFs that are known, or whose isoforms are known, to be non-AUG initiated? Would the loss of eIF2A decrease FPs in their CDSes under at least some conditions?
  
  We have now assessed the read distribution on the eIF4G2 transcript in both the control and tunicamycin conditions (Author response image 1). In our hands, eIF4G2 is one of the best examples of non-AUG initiation in human cells, since the main coding sequence starts with GTG and the CDS is well translated. Nonetheless, we do not observe any significant changes in read distribution (panels A-B) or overall translation efficiency of eIF4G2 upon eIF2A loss (panels C-D).
  
  Author response image 1.
  
  (A-B) Average read occupancy on the eIF4G2 (ENST0000339995) transcript in DMSO treated (panel A, n=3) or tunicamycin treated samples (panel B, n=2) derived from either control (black) or eIF2A-KO (red) HeLa cells. Read counts were normalized to sequencing depth and averaged between either 3 (DMSO-treated) or 2 (tunicamycin-treated) replicates. Graphs were then smoothed with a sliding window of 3 nt. (C-D) The total number of reads mapping to the eIF4G2 CDS, normalized to library sequencing depth per replicate, was quantified. No significant difference between control and eIF2A-KO cells was observed in either DMSO treated (panel C) or tunicamycin treated (panel D) cells. Significance by unpaired, two-sided t-test. ns = not significant.
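  
  A minimal sketch of how such an occupancy profile could be computed, under stated assumptions: depth normalization is taken here to mean reads per million, and the per-nucleotide counts and library sizes are hypothetical values chosen for illustration. The legend does not specify the exact normalization or software used.

```python
def reads_per_million(coverage, library_size):
    """Scale per-nucleotide read counts to reads per million mapped reads."""
    return [c * 1e6 / library_size for c in coverage]


def average_replicates(profiles):
    """Position-wise mean across replicate profiles of equal length."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def smooth(profile, window=3):
    """Centered moving average; the window is truncated at transcript ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical replicates: (per-nt counts over a 6-nt stretch, library size).
replicates = [
    ([0, 2, 4, 2, 0, 0], 2_000_000),
    ([0, 1, 2, 1, 0, 0], 1_000_000),
]
normalized = [reads_per_million(cov, size) for cov, size in replicates]
occupancy = smooth(average_replicates(normalized), window=3)
```

  The two toy replicates differ twofold in raw counts but become identical after depth normalization, which is the reason for normalizing before averaging.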
  
  Thank you for giving me the opportunity to review this article.
  
  Reviewer #2 (Recommendations for the authors):
  
  While some of our suggestions below may be considered subtle, in our opinion they are important and it would be good if the authors considered them for their revision. We also have a couple of technical suggestions.
  
  (1) Abstract.
  
  The authors failed to identify the role of eIF2A in translation initiation and have provided compelling evidence that eIF2A is not involved in recognition of non-AUG codons as start codons nor in recruitment of initiator tRNA during stress conditions which are two activities most commonly misattributed to eIF2A. However, they have not exhausted all possible potential functions of eIF2A, see below, it is also possible that eIF2A may have a role not yet suggested by anyone and it may function in translation initiation in special circumstances that have not been tested yet. The authors indeed discuss such possibility in the Discussion section. Given that there is genetic evidence (that is unaffected by biochemical impurities) linking eIF2A to other initiation factors (5B and 4E), we are not yet convinced that eIF2A does not have any role in translation initiation and therefore we find the last sentence of the abstract premature. We suggest to soften this statement into something like this: whether eIF2A has any role in translation remains unknown, it may even have a role in a different aspect of RNA Biology.
  
  We agree with the reviewer. We changed the last sentence of the abstract to read as follows:
  
  “It is possible that eIF2A plays a role in translation regulation in specific conditions that we have not tested here, or that it plays a role in a different aspect of RNA biology.”
  
  (2) Recently eIF2A has been implicated in ribosomal frameshifting, see Wei et al 2023 DOI: 10.1016/j.celrep.2023.112987
  
  Could the authors look into the PEG10 mRNA ribosome profile to see if there are detectable statistically significant changes in footprint density downstream of the frameshift site between WT and eIF2A KOs? It is likely that the coverage will be insufficient to give a definitive answer, but it is worth checking, it would be a pity to miss it.
  
  We thank the reviewer for this suggestion. We have now looked at the distribution of ribosome footprints on the PEG10 transcript variant that is expressed in HeLa cells (ENST00000482108) and indeed observe coverage downstream of the annotated stop codon, consistent with a frameshifting event that results in an extended protein isoform being translated. Visual assessment of the read distribution between the main ORF and the "ORF extension" does not show a substantial difference between control and eIF2A knock-out cells (Author response image 2A-B). Additionally, we quantified the ratio of reads mapping to the PEG10 ORF upstream of the slippery site versus those mapping downstream, extending into the predicted longer protein. Nonetheless, we could not detect significant changes between control and eIF2A-KO cells in either tested condition (Author response image 2C-D).
  
  Author response image 2.
  
  (A-B) Average read occupancy on the PEG10 (ENST00000482108) transcript in DMSO treated (panel A, n=3) or tunicamycin treated samples (panel B, n=2) derived from either control (black) or eIF2A-KO (red) HeLa cells is shown. Read counts were normalized to sequencing depth and averaged between either 3 (DMSO-treated) or 2 (tunicamycin-treated) replicates. Graphs were then smoothed with a sliding window of 3 nt. (C-D) The ratio of reads mapping to the ORF upstream of the slippery site to reads mapping to the predicted extended protein downstream of the slippery site is shown. Read counts were normalized to the sequencing depth. Neither DMSO treated samples (panel C) nor tunicamycin treated samples (panel D) had a significant difference between control and eIF2A-KO cells. Significance by unpaired, two-sided t-test. ns = not significant.
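  
  The quantification in panels C-D could be sketched as follows. This is an illustrative assumption rather than the authors' code: the coverage vector, the slippery-site index and the per-replicate ratios are invented, and the pooled-variance (Student's) form of the unpaired t-test is assumed because the legend does not say whether equal variances were assumed.

```python
from math import sqrt
from statistics import mean, variance


def upstream_downstream_ratio(coverage, slip_pos):
    """Reads upstream of the slippery site divided by reads downstream of it."""
    upstream = sum(coverage[:slip_pos])
    downstream = sum(coverage[slip_pos:])
    return upstream / downstream


def student_t(a, b):
    """Unpaired two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-nucleotide coverage with a frameshift site at index 4.
example_ratio = upstream_downstream_ratio([10, 10, 10, 10, 2, 2, 2, 2], slip_pos=4)

# Hypothetical per-replicate ratios for control and eIF2A-KO samples.
control_ratios = [4.0, 5.0, 6.0]
ko_ratios = [5.0, 6.0, 7.0]
t_stat, dof = student_t(control_ratios, ko_ratios)

# Two-sided critical value of the t distribution at alpha = 0.05 for 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
```

  With only two or three replicates per group the critical value is large, so modest differences in the ratio are not expected to reach significance.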
  
  (3) Introduction
  
  Given the volume of unreliable claims regarding eIF2A in the literature and the overall confusion, it is very difficult (it may even be impossible) to write a clear, coherent introduction to the topic. Nonetheless, there are a few points that need to be taken into account.
  
  The authors state that eIF2A is capable of recruiting initiator tRNA, citing Zoll et al 2002. This activity was later shown to be a biochemical artefact (which was most likely reproduced by Kim et al 2018): the eIF2A fraction was contaminated with eIF2D, which does bind tRNAs in a GTP-independent manner. eIF2A purified from RRL separates from initiator tRNA binding activity, see Dmitriev et al 2010 DOI: 10.1074/jbc.M110.119693. This point is also relevant to the second paragraph of the Discussion; it should be acknowledged that it has been shown previously that eIF2A does not bind the initiator tRNA.
  
  We appreciate the advice provided by the reviewer. We have modified both the introduction and the 2nd paragraph of the discussion to reflect that the tRNA-binding activity is due to contaminating eIF2D rather than eIF2A.
  
  In many cases the authors describe certain claims as facts even though they refute them themselves. For example
  
  "Such eIF2A-driven non-AUG initiation events were shown to play a crucial role in different aspects of cell physiology and disease progression: cellular adaptation during the integrated stress response (Chen et al., 2019; Starck et al., 2016)" While non-AUG initiation events do play crucial roles in different aspects of cell physiology (reviewed in Andreev et al 2023 doi: 10.1186/s13059-022-02674-2), eIF2A has nothing to do with it, as the authors show themselves. Therefore different language should be used, e.g. "eIF2A has been suggested (or proposed or reported) to be responsible for non-AUG initiation events that were shown to play ..."
  
  The word "shown" is used in many other instances for the claims that the authors refute. "Shown" is only appropriate for strong evidence that leaves little doubt.
  
  We agree with the reviewer and made the suggested changes in the text.
  
  (4) Supplementary Fig. 1.
  
  Panel C is used to argue that eIF2A has a higher concentration in the cytoplasm than in the nucleus; perhaps it is worth explaining how this conclusion was drawn. If levels in the cytoplasm are comparable to GAPDH and Tubulin but less than c-Myc in the nucleus, does it really mean that there is less eIF2A in the nucleus than in the cytoplasm? This is not obvious to us. Also, presumably WCL stands for Whole Cell Lysate; it would be nice to introduce this abbreviation somewhere.
  
  To compare levels of eIF2A in the nuclear and cytosolic fractions, we lysed the two fractions in equal volumes of buffer (i.e. the cytosolic fraction was extracted in 200 µl of hypotonic buffer, and the nuclear fraction was extracted in 200 µl of cell extraction buffer). This assures that per microliter of lysate we have the same number of "cytosols" or nuclei. Hence, equal intensity bands in the cytosolic and nuclear fractions would mean that half of the protein is in the nucleus and half is in the cytosol. We originally described this in the Methods section, but now also mention it in the Results and in the figure legend.
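  
  The reasoning above amounts to a simple proportion, sketched here under two labeled assumptions that are not stated in the response: band intensity scales linearly with protein amount, and equal volumes of each lysate are loaded per lane. The function name and the example intensities are hypothetical.

```python
def nuclear_fraction(cyto_intensity, nuc_intensity,
                     cyto_volume_ul=200.0, nuc_volume_ul=200.0):
    """Estimate the fraction of a protein that is nuclear.

    Intensities are per equal loaded volume; multiplying by the extraction
    volume of each fraction gives a relative total amount per fraction.
    """
    cyto_total = cyto_intensity * cyto_volume_ul
    nuc_total = nuc_intensity * nuc_volume_ul
    return nuc_total / (cyto_total + nuc_total)


equal_bands = nuclear_fraction(1.0, 1.0)      # equal band intensities
mostly_cytosolic = nuclear_fraction(9.0, 1.0)  # cytosolic band 9x stronger
```

  Because both fractions are extracted in the same volume, equal band intensities correspond to a nuclear fraction of one half, as stated in the response; unequal extraction volumes would have to be entered explicitly.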
  
  We replaced WCL with "whole cell" in the figure.
  
  (5) The differential translation analysis is described very briefly: "To obtain values of translation efficiency, log2 fold changes, and adjusted p values the DESeq2 software package was used". Was TE calculated based on ribosome footprint to RNA-seq ratios? How exactly was DESeq2 used here? TE measured in this way spuriously correlates with RNA-seq values, see Larsson et al 2010 DOI: 10.1073/pnas.1006821107; perhaps it would be worth assessing differential translation with anota2seq (Oertlin et al 2019 doi: 10.1093/nar/gkz223)? Anota2seq avoids calculating the ratios and enables comprehensive analysis of differential translation, including detection of buffered translation, which might be the case here, while avoiding artefacts that may arise from varying RNA levels.
  
  We have now specified in more detail in the Methods section how we analyzed the data. Indeed, DESeq2 was used on translation efficiency values, which we calculated as the ratio of ribosome footprints to RNA-seq.
  
  As suggested, we have now also performed the analysis using anota2seq (Suppl. Fig. 3C) and this analysis identified zero transcripts that are translationally regulated, in agreement with our analysis.
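  
  To make the ratio-based definition concrete, here is a minimal arithmetic sketch of translation efficiency and its log2 fold change. It is not DESeq2 or anota2seq, both of which fit statistical models to count data; the function names and the counts are hypothetical, and nonzero depth-normalized counts are assumed.

```python
from math import log2


def translation_efficiency(footprints, rna):
    """TE as the ratio of ribosome footprint signal to RNA-seq signal."""
    return footprints / rna


def log2_te_change(fp_ko, rna_ko, fp_ctrl, rna_ctrl):
    """log2 fold change in TE between knockout and control."""
    return log2(translation_efficiency(fp_ko, rna_ko)) - log2(
        translation_efficiency(fp_ctrl, rna_ctrl)
    )


# Case 1: footprints and mRNA both double, so TE is unchanged.
unchanged = log2_te_change(fp_ko=200, rna_ko=200, fp_ctrl=100, rna_ctrl=100)

# Case 2 (buffering): mRNA doubles but footprints stay constant.
# The ratio reports a TE decrease although ribosome occupancy did not change.
buffered = log2_te_change(fp_ko=100, rna_ko=200, fp_ctrl=100, rna_ctrl=100)
```

  The second case shows why a ratio can change purely through the RNA-seq denominator, which is the kind of behaviour the reviewer's comment about varying RNA levels and buffered translation refers to.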
  
  (6) Section "eIF2a-inactivating stresses do not redirect tRNA delivery function to eIF2A."
  
  The description of the ISR mechanism is a bit inaccurate. Strictly speaking, eIF2alpha phosphorylation does not inactivate eIF2alpha. It results in formation of a very stable eIF2*GDP*eIF2B complex, thus severely depleting eIF2B, which serves as a GEF for eIF2. This in turn reduces the ternary complex (eIF2*GTP*tRNAi) concentration since there is no free eIF2B to exchange GDP for GTP. Without getting into much detail, we think it would be more accurate to say that eIF2alpha phosphorylation leads to ternary complex depletion instead of saying that stress inactivates eIF2alpha.
  
  We agree with the reviewer - we were trying to use simple, compact wording. We have now reworded the section title to "No detectable role for eIF2A in translation when eIF2 is inhibited" and rephrased the subsequent text to be correct.
  
  Also the subtitle uses eIF2a with small a that stands for alpha which potentially could lead to substantial confusion since in this case the difference between eIF2alpha and eIF2A is only in capitalisation of the last letter, many text-mining engines such as modern LLMs may not be able to pick the differences. Perhaps it would be better to refer to eIF2alpha by the HGNC approved name of its gene - eIF2S1 to avoid further confusions. For clarity it may be stated at the beginning that eIF2S1 is commonly known as eIF2alpha.
  
  We thank the reviewer for this point. We have removed all instances of eIF2a (with lowercase a) from the manuscript to avoid this source of confusion. In the first instance of eIF2alpha we also added the official HGNC gene name. However, we prefer to use eIF2alpha instead of eIF2S1 because people outside the translation field tend to know the subunit as eIF2alpha, and we think it is important that people outside the translation field also read this manuscript, since some of the questionable papers on eIF2A come from labs working at the interface between translation and other fields.
  
  Minor
  
  Introduction
  
  (7) "uses the CAT anticodon" change CAT to CAU
  
  We corrected CAT to CAU.
  
  (8) "In the canonical initiation pathway", change "canonical" to "most common", canonical is somewhat a judgemental statement that originates in theology. Same applies to numerous occurrences of "canonical AUG", simply using "AUG" would be simpler and more accurate as you will avoid giving impression that there are "non-canonical AUGs".
  
  Done.
  
  (9) "eIF2A was initially considered to be a functional analogue of prokaryotic IF2 (Merrick and Anderson, 1975), however later this role was reassigned to the above-mentioned heterotrimeric factor eIF2 (a,b,g) (Levin et al., 1973)." - there is a chronological contradiction within this sentence, the initial consideration is attributed to 1975 while its later reassignment to 1973.
  
  We are grateful to the reviewer for spotting this mistake. There was a citation problem; we fixed it and now cite the correct paper for the initial discovery of eIF2A (Shafritz et al 1970, PMID 5472357).
  
  (10) "On the other hand, studies on the role of eIF2A on viral IRES translation have arrived at conflicting results." Remove "On the other hand" since conflicting results have been mentioned above. In fact the entire sentence is somewhat redundant given prior "For example, eIF2A has been studied in the context of internal ribosome entry sites (IRES), where it was found to act both as a suppressor and an activator of IRES-mediated initiation."
  
  We have rewritten the paragraph to make it more coherent.
  
  (11) Fig. 1C-D uses the abbreviation CHX for cycloheximide; this needs to be mentioned in the legend or elsewhere in the text. Otherwise CHX may not be clear to a reader uninitiated in ribosome profiling.
  
  We now mention in the figure legend that CHX stands for cycloheximide and indicate that it was used as a negative control to block translation.
  
  (12) Page 7, section "Ribosome profiling reveals a few eIF2A-dependent transcripts"
  
  In this section you describe ribosome profiling experiments and identify a few transcripts whose translation seems to be changing based on ribosome profiling data. Then you attempt to verify them using gene expression reporters and reasonably suggest that these are false positives. In essence this section argues that there are no eIF2A-dependent transcripts; therefore the title of this subsection is misleading. It makes sense to rename it so that it better reflects the content of this section.
  
  We agree and have renamed the section to "Ribosome profiling identifies no eIF2A-dependent transcripts".
  
  (13) Page 8, top. Rephrase "To do this, we performed ribosome profiling on control and eIF2A-KO cells, which sequences the mRNA footprints protected by ribosomes."
  
  Fixed.
  
  (14) Page 10, bottom. "Several studies have reported that eIF2A can delivery alternative initiator tRNAs to uORFs with near-cognate start codons". Change "delivery" to "deliver".
  
  Thanks for spotting it. We corrected it to “deliver”.
  
  (15) Page 13 "This suggests that, as in non-stressed conditions, eIF2A has a minimal effect on global translation also when eIF2a activity is low." - rephrase to avoid impression that eIF2alpha activity is low in normal conditions, also please see comment #6 above.
  
  We fixed this sentence to read: “This suggests that, as in non-stressed conditions, eIF2A has a minimal effect on global translation also when the integrated stress response is active.”
  
  Reviewer #3 (Recommendations for the authors):
  
  - The experimental data in Fig. S5E do not support the claim of increased eIF2 phosphorylation on TM treatment; although, comparing Fig. S5A with Fig. 1B supports a marked reduction in bulk translation and the reporter data in Fig. 4A show the expected induction of the uORF-containing reporters by TM. Because these are the conditions employed for ribosome profiling in stress conditions shown in Fig. 4B, it would be reassuring to document TM-induced translational efficiencies of ATF4 and the other known mRNAs resistant to eIF2 phosphorylation in the ribosome profiling data, including gene browser images of the replicate experiments. If the induction of TEs by TM for such mRNAs was not robust, it would be valuable to repeat the analysis using arsenite (SA) treatment, which produces a greater inhibition of bulk translation.
  
  Unfortunately, the eIF2alpha antibody is not very good and also detects the non-phosphorylated protein, causing high background and poor apparent induction in response to tunicamycin. That the ISR was activated is evident from the induction of ATF4, which was assessed by western blot in Suppl. Fig. 5E. To ensure that our ribosome profiling libraries also captured ISR activation, we built single-gene plots for ATF4 in both control and eIF2A-KO HeLa cells. As shown in Author response image 3A and B, tunicamycin treatment led to the induction of ATF4 in both cell lines. This can also be seen in the 4-fold induction of ATF4 translation efficiency in response to tunicamycin in both WT and eIF2A-KO cells (Author response image 3C). Additionally, we checked that another marker induced by tunicamycin, HSPA5, is also translationally upregulated in both cell lines, as is the downstream target of ATF4, PPP1R15B (Author response image 3C).
  
  Author response image 3.
  
  (A-B) Average read occupancy on the ATF4 (ENST00000674920) transcript in DMSO-treated (n=3) or tunicamycin-treated (n=2) samples derived from either control (panel A) or eIF2A-KO (panel B) HeLa cells. Read counts were normalized to sequencing depth and averaged across replicates. Graphs were then smoothed with a sliding window of 3 nt. (C) Scatter plot of log2(fold change) of translation efficiency (TM/DMSO) for control cells on the x-axis versus eIF2A-KO cells on the y-axis. The induction of ATF4 and of its downstream target PPP1R15B is indicated, as is the upregulation of HSPA5 translation, the other hallmark of ER stress induced by tunicamycin treatment.
  
  - It should be pointed out in the text that in both published studies being cited here of cells lacking eIF2A, that by Gaikwad et al. on a yeast eIF2A deletion mutant, and that by Ichihara et al. on human HEK293 CRISPR KO cells, the analyses included stress conditions in which eIF2 phosphorylation is induced (amino acid starvation or SA treatment, respectively), as was conducted here.
  
  Good point - we added this information into the introduction:
  
  "Furthermore, loss of eIF2A in several systems did not recapitulate these effects on non-AUG initiation in either non-stressed or stress conditions (caused either by amino acid depletion or sodium arsenate treatment) (Gaikwad et al., 2024; Ichihara et al., 2021)."
  
  - The Ichihara et al. (2021) study just mentioned reached some of the same conclusions for HEK cells obtained here by conducting ribosome profiling in untreated and SA-treated cells, finding only 1 mRNA (untreated) or four mRNAs (SA-treated cells) that showed significantly reduced TEs in the eIF2A knockout vs. parental cells. It seems appropriate for the authors to expand their treatment of this prior work by summarizing its findings in some detail and also noting how their study goes beyond this previous one.
  
  We have added a paragraph to the discussion pointing out that our data agree fully with Ichihara et al. (2021), and that Ichihara et al. (2021) also found only very few mRNAs that change in TE upon loss of eIF2A in either non-stressed or stressed conditions.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.11.20.624465v2
www.biorxiv.org www.biorxiv.org

Modulation of α-Synuclein Aggregation Amid Diverse Environmental Perturbation

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1
  
  Summary:
  
  In this paper, the authors performed molecular dynamics (MD) simulations to investigate the molecular basis of the association of alpha-synuclein chains under molecular crowding and salt conditions. Aggregation of alpha-synuclein is linked to the pathogenesis of Parkinson's disease, and the liquid-liquid phase separation (LLPS) is considered to play an important role in the nucleation step of the alpha-synuclein aggregation. This paper re-tuned the Martini3 coarse-grained force field parameters, which allows long-timescale MD simulations of intrinsically disordered proteins with explicit solvent under diverse environmental perturbation. Their MD simulations showed that alpha-synuclein does not have a high LLPS-forming propensity, but the molecular crowding and salt addition tend to enhance the tendency of droplet formation and therefore modulate the alpha-synuclein aggregation. The MD simulation results also revealed important intra- and inter-molecule conformational features of the alpha-synuclein chains in the formed droplets and the key interactions responsible for the stability of the droplets. These MD simulation data add biophysical insights into the molecular mechanism underlying the association of alpha-synuclein chains, which is important for understanding the pathogenesis of Parkinson's disease.
  
  Strengths:
  
  (1) The re-parameterized Martini 3 coarse-grained force field enables the large-scale MD simulations of the intrinsically disordered proteins with explicit solvent, which will be useful for a more realistic description of the molecular basis of LLPS.
  
  (2) This paper showed that molecular crowding and salt contribute to the modulation of the LLPS through different means. The molecular crowding minimally affects surface tension, but adding salt increases surface tension. It is also interesting to show that the aggregation pathway involves the disruption of the intra-chain interactions arising from C-terminal regions, which potentially facilitates the formation of inter-chain interactions.
  
  We thank the reviewer for pointing out the strengths of our study.
  
  Weaknesses:
  
  (1) Although the authors emphasized the advantage of the Martini3 force field for its explicit description of solvent, the whole paper did not discuss the water's role in the aggregation and LLPS.
  
  We thank the reviewer for pointing this out. We agree that we have not explored or discussed the role of water in αS aggregation or LLPS. We intend to explore this in detail in a separate study. However, we have updated the “Discussion” section with the following lines to convey to readers the importance of water in the aggregation and LLPS of αS.
  
  Page 24: “The significance of the solvent in alpha-synuclein (αS) aggregation remains underexplored. Recent studies [26, 55] underscore the pivotal role of water as a solvent in LLPS. It suggests that comprehending the solvent’s role, particularly water, is essential for attaining a deeper grasp of the thermodynamic and physical aspects of αS LLPS and aggregation. By delving into the solvent’s contribution, researchers can uncover additional factors influencing αS aggregation. Such insights hold the potential to advance our comprehension of protein aggregation phenomena, crucial for devising strategies to address diseases linked to protein misfolding and aggregation, notably Parkinson’s disease. Future investigations focusing on elucidating the interplay between αS, solvent (especially water), and other environmental elements could yield valuable insights into the mechanisms underlying LLPS and aggregation. Ultimately, this could aid in the development of therapeutic interventions or preventive measures for Parkinson’s and related diseases.”
  
  (2) This paper discussed the effects of crowders and salt on the surface tension of the droplets.
  
  The calculation of the surface tension relies on the droplet shape. However, for the formed clusters in the MD simulations, the typical size is <10, which may be too small to rigorously define the droplet shape. As shown in previous work cited by this paper [Benayad et al., J. Chem. Theory Comput. 2021, 17, 525−537], the calculated surface tension becomes stable when the chain number is larger than 100.
  
  We appreciate the insightful feedback from the reviewer. However, we would like to emphasize that the αS droplets exhibit highly liquid-like behavior, characterized by frequent exchange of chains between the dense and dilute phases, alongside a slow aggregation process. In the study by Benayad et al. (2020, JCTC) [ref. 30], FUS-LCD was the protein of choice, at concentrations in the mM range. FUS-LCD is known to undergo very rapid LLPS at concentrations lower than 100 μM, whereas for αS the critical concentration for LLPS is 500 μM and aggregation is slower than for FUS. Moreover, the diffusion constant of αS inside newly formed droplets (before any liquid-to-solid phase transition has occurred) has been estimated to be 0.23-0.58 μm²/s (Ray et al., 2020, Nat. Comm.), while that of FUS-LCD inside LLPS droplets has been estimated to be 0.17 μm²/s (Murthy et al., 2023, Nat. Struct. and Mol. Biol.). These values indicate that αS forms droplets that are less viscous than those formed by FUS-LCD. This dynamic nature impedes the formation of large droplets in the simulations, making it challenging to rigorously calculate surface tension from the interfacial width, which, in turn, necessitates the computation of g(r) between water and the droplet.
  
  Furthermore, it's essential to note that our primary aim in calculating surface tension was not to determine its absolute value. Rather, we aimed to compare surface tensions obtained for the three distinct environments explored in this study. Hence, our primary objective is to compare the distributions of surface tensions rather than focusing solely on the mean values obtained. The distributions shown in Figure 4a clearly show a trend which we have stated in the article.
  
  (3) In this work, the Martini 3 force field was modified by rescaling the LJ parameters \epsilon and \sigma with a common factor \lambda. It has not been very clearly described in the manuscript why these two different parameters can be rescaled by a common factor and why it is necessary to separately tune these two parameters, instead of just tuning the coefficient \epsilon as did in a previous work [Larsen et al., PLoS Comput Biol 16: e1007870].
  
  We thank the reviewer for the comment. We think that the distance of the first hydration layer should also have an impact on aggregation/LLPS, so here we scale both epsilon and sigma. A higher epsilon for water-protein interactions means that more energy is required to remove water molecules (dehydration) when a chain goes from the dilute to the dense phase. A higher sigma, on the other hand, means that the hydration shell sits at a larger distance, making dehydration easier. Moreover, tuning both (whether by the same or by different factors) required changing the overall protein-water interaction by only 1%, and thus only a minimal change in the force field parameters (compared with tuning epsilon alone, which required a 6-10% change from its original value). We therefore think that optimizing both epsilon and sigma is one way of tuning water-protein interactions that requires minimal retuning of Martini 3. Whether a single scaling parameter is good enough requires further exploration and is outside the scope of the current study. More importantly, scaling epsilon and sigma separately would introduce another free parameter into the system, and the fewer free parameters, the better. For this study, a single parameter sufficed, as depicted in Figure 9. To inform readers of why we chose to scale both sigma and epsilon, we have added the following to the main text:
  
  Page 25-26: “Increasing the ϵ value of water-protein interactions results in a higher energy demand for removing water molecules (dehydration) as a chain transitions from the dilute to the dense phase. Conversely, a higher σ value implies that the hydration shell will be at a greater distance, facilitating dehydration if a chain moves into the dilute phase. Therefore, adjusting water-protein interactions based on the protein’s single-chain behavior may not significantly influence the protein’s phase behavior. Furthermore, fine-tuning both ϵ and σ parameters only requires a minimal change in the overall protein-water interaction (1%). As a result, this adjustment minimally alters the force field parameters.”
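  
  The effect of a common scaling factor on a 12-6 Lennard-Jones pair can be sketched as follows. This is an illustration only: it assumes the factor λ multiplies both ϵ and σ, and the function names and numerical values are hypothetical placeholders rather than Martini 3 parameters or the authors' code. It shows that such a scaling deepens the well and shifts its minimum outward by the same factor.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at separation r (same length unit as sigma)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def scale_lj(epsilon, sigma, lam):
    """Apply one common factor to both parameters (assumed multiplicative)."""
    return lam * epsilon, lam * sigma


def well_minimum(epsilon, sigma):
    """Analytic minimum of the 12-6 form: r_min = 2^(1/6) * sigma, U_min = -epsilon."""
    return 2.0 ** (1.0 / 6.0) * sigma, -epsilon


# Placeholder water-protein bead pair; NOT actual force field values.
EPS0, SIG0 = 1.0, 0.5   # arbitrary energy unit, nm
LAM = 1.01              # illustrative 1% common rescaling

eps1, sig1 = scale_lj(EPS0, SIG0, LAM)
r_min0, u_min0 = well_minimum(EPS0, SIG0)
r_min1, u_min1 = well_minimum(eps1, sig1)

print("well depth ratio:", u_min1 / u_min0)
print("minimum position ratio:", r_min1 / r_min0)
```

  Under this assumption the two opposing effects described above (a deeper well versus a more distant hydration shell) are controlled by one number, which is why no second free parameter appears.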
  
  (4) Both the sizes and volume fractions of the crowders can affect the protein association. It will be interesting to perform MD simulations by adding crowders with various sizes and volume fractions. In addition, in this work, the crowders were modelled by fullerenes, which contribute to protein aggregation mainly by entropic means as discussed in the manuscript. It is not very clear how the crowder effect is sensitive to the chemical nature of the crowders (e.g., inert crowders with excluded volume effect or crowders with non-specific attractive interactions with proteins, etc) and therefore the force field parameters.
  
  We thank the reviewer for suggesting a potential future direction. In this investigation our main focus was to model crowders as inert, to ensure that only their entropic effects are explored. Although this study focuses on the factors that enable αS to form aggregates or undergo LLPS under different environmental conditions, it would be interesting to explore systematically the mechanism of action of crowders of varying shapes, sizes and interactions. We have therefore added the following lines to the “Discussion” section to let readers know that this is a prospect for future investigation.
  
  Page 22: “Under physiological conditions, crowding effects emerge prominently. While crowders are commonly perceived to be inert, as has been considered in this investigation, the morphology, dimensions, and chemical interactions of crowding agents with αS in both dilute and dense phases may potentially exert considerable influence on its LLPS. Hence, a comprehensive understanding through systematic exploration is another avenue that warrants extensive investigation.”
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Figure S1. The title of the figure and the description in the figure caption are inconsistent?
  
  We thank the reviewer for the comment and we have updated the article with the correct caption.
  
  (2) Page 14, line 3, the authors may want to provide more descriptions of the "ms1", "ms2", and "ms3" for better understanding.
  
  We are grateful to the reviewer for pointing this out. We have added a line describing in brief what “ms1”, “ms2” and “ms3” represent. It reads “Subsequent to the investigation, we utilize three representative conformations, each corresponding to one of the macrostates. We designate these macrostates as 1 (ms1), 2 (ms2), and 3 (ms3) (Figure S7)” (Page 28)
  
  (3) Page 20, the authors may want to briefly explain how the normalized Shannon entropy was calculated.
  
  We thank the reviewer for pointing this out. This is plain Shannon Entropy and the word “normalized” should not have been there. To avoid confusion we have provided the equation we have used to calculate the Shannon entropy (Eq 8) (Page 21).
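  
  For reference, the plain (unnormalized) Shannon entropy of a set of discrete observations can be sketched as below. Eq 8 of the manuscript is not reproduced in this response, so the natural logarithm and the function name here are assumptions; the sketch only illustrates the standard definition H = -Σ p ln p over whatever discrete states are being counted.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Plain Shannon entropy (in nats) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Only observed states enter the sum, so p is never zero here.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical state assignments, for illustration only.
uniform_states = ["a", "b", "c", "d"]
single_state = ["a"] * 5

print(shannon_entropy(uniform_states))
print(shannon_entropy(single_state))
```

  A normalized variant would divide by the logarithm of the number of accessible states; the response above indicates that no such division was applied.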
  
  Reviewer #2 (Public Review):
  
  In the manuscript "Modulation of α-Synuclein Aggregation Amid Diverse Environmental Perturbation", Wasim et al. describe coarse-grained molecular dynamics (cgMD) simulations of α-Synuclein (αS) at several concentrations and in the presence of molecular crowding agents or high salt. They begin by benchmarking their cgMD against all-atom simulations by Shaw. They then carry out 2.4-4.3 µs cgMD simulations under the above-noted conditions and analyze the data in terms of protein structure, interaction network analysis, and extrapolated fluid mechanics properties. This is an interesting study because a molecular-scale understanding of protein droplets is currently lacking, but I have a number of concerns about how it is currently executed and presented.
  
  We thank the reviewer for finding our study interesting.
  
  (1) It is not clear whether the simulations have reached a steady state. If they have not, it invalidates many of their analysis methods and conclusions.
  
  We have used the last 1 μs (1.5-2.5 μs) from each simulation for further analysis in this study. To assess whether the simulations have reached a steady state, we plot the time profile of the protein concentration in the dilute phase for all three cases.
  
  Author response image 1.
  
  Except for the scenario with only αS (Figures a and b), the rest show very steady concentrations across various sections of the trajectory (Figures c-f). The larger sudden fluctuations observed in Figures a and b arise because αS alone undergoes very slow spontaneous aggregation and because the dense phase itself is very fluxional: the addition or removal of a few chains between the dense and dilute phases registers as a large fluctuation in the protein concentration in the dilute phase. For the other two scenarios (Figures c-f), aggregation is accelerated by the presence of crowders/salt, which causes larger aggregates to form. Therefore the addition or removal of one or two chains does not significantly affect the concentration, and we do not see such sudden large jumps. In summary, the large jumps seen in Figures a and b are due to slow, fluxional aggregation of pure αS and finite size effects. However, as these are still only fluctuations, we posit that the systems have reached steady states. This claim is further supported by the following figure, where the time profiles of a few useful system-wide macroscopic properties show no change between 1.5 and 2.5 µs.
  
  We also have added a brief discussion in the Methods section (Page 29-30) with these figures in the Supplementary Information.
  
  Author response image 2.
  
  “In this study, we utilized the final 1 µs from each simulation for further analysis. To ascertain whether the simulations have achieved a steady state, we plotted the time profile of protein concentration in the dilute phase for all three cases. Except for minor intermittent fluctuation involving only αS in neat water (Figures S8a and S8b), the remaining cases exhibit notably stable concentrations throughout various segments of the trajectory (Figures S8 c-f). The relatively higher fluctuations observed in Figures S8a and b stem from the slow, spontaneous aggregation of αS alone, compounded by the inherently ambiguous nature of the dense phase.
  
  Consequently, the addition or removal of a few chains from the dense to the dilute phase results in significant fluctuations in protein concentration within the dilute phase. Conversely, in the other two scenarios (Figures S8c-f), aggregation is expedited by the presence of crowders/salt, leading to the formation of larger aggregates. Consequently, the addition or removal of one or two chains has negligible impact on concentration, thereby mitigating sudden large jumps. In summary, the conspicuous jumps depicted in Figures S8a and b arise from the gradual, fluctuating aggregation of pure αS and finite size effects. However, since these remain within the realm of fluctuations, we assert that the systems have indeed reached steady states. This assertion is bolstered by the subsequent figure, where the time profile of several pertinent system-wide macroscopic properties reveals no discernible change between 1.5-2.5 µs (Figures S9).”
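  
  One simple way to put a number on "fluctuations without drift" is to fit a straight line to the time series over the analysis window and compare the net drift of that line with the spread of the data. The sketch below is a generic check under that assumption, not the analysis used in the manuscript; the function name and the synthetic series are hypothetical.

```python
import statistics


def drift_ratio(times, values):
    """Net drift of a least-squares line across the window, divided by the
    population standard deviation of the series. Values well below 1 suggest
    fluctuation about a constant level; values near or above 1 suggest a trend."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    drift = slope * (times[-1] - times[0])
    spread = statistics.pstdev(values)
    if spread == 0.0:
        return 0.0  # a perfectly constant series has no drift
    return abs(drift) / spread


# Synthetic stand-ins for a dilute-phase concentration trace (arbitrary units).
frames = list(range(1000))
fluctuating = [(-1) ** i for i in frames]             # oscillates about a fixed level
drifting = [0.01 * i + (-1) ** i for i in frames]     # same noise on a steady rise

print("fluctuating:", drift_ratio(frames, fluctuating))
print("drifting:", drift_ratio(frames, drifting))
```

  Any threshold separating the two regimes is a judgment call, and for small systems a single chain exchange can dominate the spread, as noted above for αS in neat water.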
  
  (2) The benchmarking used to validate their cgMD methods is very minimal and fails to utilize a large amount of available all-atom simulation and experimental data.
  
  We disagree with the reviewer on this point. We have cited multiple previous studies [26, 27] that chose Rg as the metric for benchmarking coarse-grained models and used a reference (experimental or otherwise) to tune Martini force fields. Notable examples in which Rg was used as a benchmark during the development of new coarse-grained force fields are the works by Dignon et al. (PLoS Comp. Biol.) [ref. 25], Regy et al. (Protein Science, 2021) [ref. 26], Joseph et al. (Nature Computational Science, 2021) [ref. 27] and Tesei et al. (Open Research Europe, 2022) [ref. 28]. From a polymer physics perspective, tuning water-protein interactions simply changes the solvent characteristics for the biopolymer, and Rg has generally been considered a suitable metric for coarse-grained models. Moreover, we try to match the distribution of Rg rather than only its mean value. This suggests that, at the single-molecule level, cgMD simulations at the optimum water-protein interactions would allow the protein to sample the conformations present in the reference ensemble. We use the extensively sampled 70 μs all-atom data from DE Shaw Research to obtain the reference Rg distribution. We also perform a cross-validation by comparing the fraction of bound states in all-atom and cgMD dimer simulations, which agree well with each other at optimum water-protein interactions. To help readers understand the rationale behind choosing Rg, we have added a section to the Methods (Page 25) that explains why Rg is plausibly a good metric for tuning water-protein interactions in Martini 3, at least when dealing with IDPs.
  
  Our optimized model is further supported by the FRET experiments by Ray et al. [6]. They found that interchain NAC-NAC interactions drive LLPS. Residue level contact maps obtained from our simulations also show decreased intrachain NAC-NAC interactions with an increased interchain NAC-NAC interactions inside the droplet. This corroborates well with the experimental observations and furthermore validates the metrics we have used for optimization of the water-protein interactions. However the comparison with the FRET data by Ray et al. was not present earlier and we have added the following lines in the updated draft.
  
  Page 17: “Thus we observed that increased inter-chain NAC-NAC regions facilitate the formation of αS droplets which also have previously been seen from FRET experiments on αS LLPS droplets [6].”
  
  (3) They also miss opportunities to compare their simulations to experimental data on aSyn protein droplets.
  
  We thank the reviewer for pointing this out. We have tried to compare the results from our simulations to existing experimental FRET data on αS. Please see the previous response where we have described our comparison with FRET observations.
  
  (4) Aspects such as network analysis are not contextualized by comparison to other protein condensed phases.
  
  For a proper comparison between other protein condensed phases, we would require the position phase space of such condensates which is not readily available. Therefore we tried to explain it in a simpler manner to paint a picture of how αS forms an interconnecting network inside the droplet phase.
  
  (5) Data are not made available, which is an emerging standard in the field.
  
  We thank the reviewer for mentioning this. We have provided the trajectories between 1.5-2.5 μs, which we used for the analysis presented in the article, via a zenodo repository along with other relevant files related to the simulations (https://zenodo.org/records/10926368).
  
  Firstly, it is not clear that these systems are equilibrated or at a steady state (since protein droplets are not really equilibrium systems). The authors do not present any data showing time courses that indicate the system to be reaching a steady state. This is problematic for several of their data analysis procedures, but particularly in determining free energy of transfer between the condensed and dilute phases based on partitioning.
  
  We have addressed this concern as stated previously in the response. We have updated the article accordingly.
  
  Secondly, the benchmarking that they perform against the 73 µs all-atom simulation of aSyn monomer by Shaw and coworkers provides only very crude validation of their cgMD models based on reproducing Rg for the monomer. The authors should make more extensive comparisons to the specific conformations observed in the DE Shaw work. Shaw makes the entire trajectory publicly available. There is also a wealth of experimental data that could be used for validation with more molecular detail. See, for example, NMR and FRET data used to benchmark Monte Carlo simulations of aSyn monomer (as well as extensive comparisons to the Shaw MD trajectory) in Ferrie et al.: A Unified De Novo Approach for Predicting the Structures of Ordered and Disordered Proteins, J. Phys. Chem. B 124 5538-5548 (2020)
  
  DOI:10.1021/acs.jpcb.0c02924
  
  I note that NMR measurements of aSyn in liquid droplets are available from Vendruscolo: Observation of an α-synuclein liquid droplet state and its maturation into Lewy body-like assemblies, Journal of Molecular Cell Biology, Volume 13, Issue 4, April 2021, Pages 282-294, https://doi.org/10.1093/jmcb/mjaa075.
  
  In addition, there are FRET studies by Maji: Spectrally Resolved FRET Microscopy of α-Synuclein Phase-Separated Liquid Droplets, Methods Mol Biol 2023:2551:425-447. doi: 10.1007/978-1-0716-2597-2_27.
  
  So the authors are missing opportunities to better validate the simulations and place their structural understanding in greater context. This is just based on my own quick search, so I am sure that additional and possibly better experimental comparisons can be found.
  
  We have performed a comparison with the existing FRET measurements by Ray et al. (2020), as discussed in a previous response, and have updated the article accordingly. The DOI provided by the reviewer (10.1007/978-1-0716-2597-2_27) is, however, for a book on methods to characterize protein aggregates and does not contain any information regarding observations from FRET experiments. The other DOI (https://doi.org/10.1093/jmcb/mjaa075), for the article from the Vendruscolo group, does not contain information directly relevant to this study. Moreover, NMR measurements cannot be predicted from cgMD, since full atomic resolution is lost upon coarse-graining of the protein. A past literature survey by the authors found very little scientific literature on molecular-level characterization of αS LLPS droplets.
  
  Thirdly, the small-world network analysis is interesting, but hard to contextualize. For instance, the 8 Å cutoff used seems arbitrary. How does changing the cutoff affect the value of S determined? Also, how does the value of S compare to other condensed phases like crystal packing or amyloid forms of aSyn?
  
  The 8 Å cutoff is indeed arbitrary, since distance-based clustering always requires a cutoff that is decided empirically. However, 8 Å is quite large compared with other cutoffs used for distance-based clustering; for example, in ref. 26, 5 Å was used as the cutoff for calculating protein clusters. Larger cutoffs will lead to denser network structures. However, we used the same cutoff for all distance-based clustering, which makes the resulting networks comparable. Our aim was to compare the networks formed by αS under different environmental conditions.
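  
  The reviewer's question about cutoff sensitivity can be probed by rebuilding the contact graph at several cutoffs and watching how its connectivity changes. The sketch below is a minimal illustration with hypothetical node coordinates and helper names; it is not the authors' clustering code, and it does not compute S itself, only the edge set on which any such network measure would depend.

```python
import math
from itertools import combinations


def contact_edges(coords, cutoff):
    """Return index pairs (i, j) whose Euclidean distance is within the cutoff."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    ]


def mean_degree(n_nodes, edges):
    """Average number of contacts per node in an undirected graph."""
    return 2.0 * len(edges) / n_nodes


# Hypothetical node positions in angstroms (five points spaced 3 Å apart on a line).
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0),
             (9.0, 0.0, 0.0), (12.0, 0.0, 0.0)]

for cutoff in (5.0, 8.0):
    edges = contact_edges(positions, cutoff)
    print(cutoff, len(edges), mean_degree(len(positions), edges))
```

  Because every pair within a smaller cutoff is also within a larger one, the edge set can only grow with the cutoff, which is why a single fixed cutoff is needed for networks from different conditions to be comparable.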
  
  Fourthly, I see no statement on data availability. The emerging standard in the computational field is to make all data publicly available through Github or some similar mechanism.
  
  We thank the reviewer for pointing this out and we have provided the raw data between 1.5-2.5 μs for each scenario along with other relevant files via a zenodo repository (https://zenodo.org/records/10926368).
  
  Finally, on page 16, they discuss the interactions of aSyn(95-110), but the sequence that they give is too long (seeming to contain repeated characters, but also not accurate). aSyn(95-110) = VKKDQLGKNEEGAPQE. Presumably this is just a typo, but potentially raises concerns about the simulations (since without available data, one cannot check that the sequence is accurate) and data analysis elsewhere.
  
  This indeed is a typographical error. We have updated the article with the correct sequence. The validity of the simulations can be verified from the data we have shared via the zenodo repository (https://zenodo.org/records/10926368).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.10.19.563053v3
www.biorxiv.org www.biorxiv.org

Future movement plans interact in sequential arm movements

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1:
  
  Mehrdad Kashefi et al. investigated whether future reaches can be planned while the execution of the current reach is being controlled. Through a series of experiments employing a novel sequential arm reaching paradigm they developed, the authors made several findings: 1) participants demonstrate the capability to plan future reaches in advance, thereby accelerating the execution of the reaching sequence, 2) planning processes for future movements are not independent of one another, but neither do they form a single chunk, 3) interaction among these planning processes optimizes the current movement for the movement that comes after it.
  
  The question of this paper is very interesting, and the conclusions of this paper are well supported by data. However, certain aspects require further clarification and expansion.
  
  We thank Reviewer #1 for their evaluation of the work.
  
  (1) The question of this study is whether future reach plans are available during an ongoing reach. In the abstract, the authors summarized that "participants plan at least two future reaches simultaneously with an ongoing reach and that the planning processes of the two future reaches are not independent of one another" and presented the evidence in the following sentences. However, that evidence concerns the relationship between the ongoing reach and future plans, not the relationship between the future plans themselves (Line 52-55), whereas the last sentence (Line 55-58) mentions only interactions between future plans. There are some discrepancies between the sentences. Could you make the abstract clearer by mentioning interference 1) between the ongoing movement and future plans and 2) between the future plans themselves?
  
  We thank the reviewer for their comment. We have separated the longer sentence in the original abstract into two shorter ones. This should clarify that the two pieces of evidence pertain to the interaction of planning processes.
  
  (2) From the results of the first experiment (Figure 2), I understood that the ongoing reach and future reaches are not independent. In Horizon 1, only the target for the current reach is shown; in Horizon 2, a current and a future target are shown on the screen. The inter-reach interval was significantly reduced from H1 to H2 (Figure 2). The authors state that "these results suggest that participants can plan two targets (I guess +1 and +2) ahead of the current reach (I guess +0)". But I think these results suggest that participants can plan one target (+1) ahead of the current reach (+0), because participants could see the current (+0) and one future target (+1) in H2. Could the authors please clarify this point?
  
  We thank the reviewer for raising this point. Our conclusion that “participants can plan two targets ahead of the current reach” is supported by the reduction in Inter-Response Interval (IRI) observed when comparing H2 to H3 in the 75 ms Dwell time condition. Specifically, on average, participants were 16 ms faster when they could see two future targets on the screen (H3) than when they could see only one (H2). To clarify this in the paper, we have revised the wording in line 124 to explicitly state that the conclusion pertains to the 75 ms Dwell time condition. Additionally, we emphasize that the strongest evidence for planning two future targets comes from the experiment shown in Figure 3.
  
  (3) Movement correction for a jump of the +1 target takes longer in H3 than in H2 (Figure 4). Does this perturbation have any effect on the reach to the +2 target? If the +1 jump doesn't affect the reach to the +2 target, then, combined with the result that a jump of the +2 target didn't affect the movement time to the +1 target (Figure 3C), a perturbation (target jump) affects only the movement directly perturbed. Is this interpretation correct? If so, do these results argue against future reaches being planned as a motor chunk? I would like to know the authors' thoughts about this.
  
  In the experiment presented in Figure 4, once we jumped the +1 target, the reach to that target was changed and participants replanned a corrective movement to the new location of the +1 target. This was usually followed by a longer-than-usual pause at the new location of the +1 target before participants resumed the sequence and finished the trial. Consequently, in these jump trials, it was impossible to compare the +2 reach to no-jump trials, as the normal sequence of movement was disrupted, and the reach to the +2 target originated from a different starting location. Nevertheless, we addressed the possibility that the two future reaches were planned as a chunk by the analysis shown in Figure 5: there we showed that a displacement of the +2 target did not influence the reach to the +1 target, indicating that the movement plans could be updated independently.
  
  (4) Any discussion about Saccade position (Figure 7)?
  
  We thank reviewer 1 for this important comment. The following discussion section is added for the gaze position results.
  
  In our sequence task, participants switched their gaze location only once per reach, suggesting that information about the location of the next target is perceived parafoveally (Figure 7A). This observation aligns with previous studies (Clavagnier et al., 2007; González-Alvarez et al., 2007; Sivak and MacKenzie, 1990) that found participants keep their visual attention on the current sequence item and can perceive the location of spatial targets even when foveal vision is occluded. However, when comparing gaze locations for conditions Horizon >1, we observed that participants systematically biased their gaze location based on the sequence context. The gaze position shifted toward the next target, potentially allowing for more accurate location estimation (Figures 7C-D). Notably, changes in gaze location were observed even in Horizon 2, despite no changes in the curvature of hand movements in this horizon (Figure 6B). This suggests that information about the next target may first be available in the circuitry that controls eye movements and later in the cortical areas that control voluntary upper limb movements. Further control studies are required to investigate this hypothesis.
  
  Reviewer #2:
  
  Summary:
  
  In this work, Kashefi et al. investigate the planning of sequential reaching movements and how the additional information about future reaches affects planning and execution. This study, carried out with human subjects, extends a body of research in sequential movements to ask important questions: How many future reaches can you plan in advance? And how do those future plans interact with each other?
  
  The authors designed several experiments to address these questions, finding that information about future targets makes reaches more efficient in both timing and path curvature. Further, with some clever target jump manipulations, the authors show that plans for a distant future reach can influence plans for a near future reach, suggesting that the planning for multiple future reaches is not independent. Lastly, the authors show that information about future targets is acquired parafoveally--that is, subjects tend to fixate mainly on the target they are about to reach to, acquiring future target information by paying attention to targets outside the fixation point.
  
  The study opens up exciting questions about how this kind of multi-target planning is implemented in the brain. As the authors note in the manuscript, previous work in monkeys showed that preparatory neural activity for a future reaching movement can occur simultaneously with a current reaching movement, but that study was limited to the monkey only knowing about two future targets. It would be quite interesting to see how neural activity partitions preparatory activity for a third future target, given that this study shows that the third target's planning may interact with the second target's planning.
  
  Strengths:
  
  A major strength of this study is that the experiments and analyses are designed to answer complementary questions, which together form a relatively complete picture of how subjects act on future target information. This complete description of a complex behavior will be a boon to future work in understanding the neural control of sequential, compound movements.
  
  We thank the reviewer for their thorough reading of our work.
  
  Weaknesses:
  
  I found no real glaring weaknesses with the paper, though I do wish that there had been some more discussion of what happens to planning with longer dwell times in target. In the later parts of the manuscript, the authors mention that the co-articulation result (where reaches are curved to make future target acquisition more efficient) was less evident for longer dwell times, likely because for longer dwell times, the subject needs to fully stop in target before moving to the next one. This result made me wonder if the future plan interaction effect (tested with the target jumps) would have been affected by dwell time. As far as I can tell, the target jump portion only dealt with the shorter dwell times, but if the authors had longer dwell time data for these experiments, I would appreciate seeing the results and interpretations.
  
  We thank the reviewer for raising this point. In our timing (Figure 2) and curvature (Figure 6) analyses, we collected data with five levels of the horizon and three levels of dwell time to explore the space of parameters and to see if there is any interaction between dwell time and the horizon of planning the future targets. A priori, we expected that the full stop in each target imposed by the 400 ms dwell time would be long enough to remove any effect of future targets on how the current move is executed. In line with our initial hypothesis, the systematic curvature of reaches based on the future target was smaller in longer dwell times (Figure 6E). Nevertheless, we observed a significant curvature even in 400 ms dwell time. Based on this observation, we expect that running the jump experiments (Figures 4 and 5) with longer dwell times would lead to the same pattern of results but with a smaller effect size, since longer dwells break the interdependence of sequence elements (Kalidindi & Crevecoeur, 2023). In the end, for the jump experiments, we limited our experimental conditions to the fastest dwell time (75 ms dwell) since we were conceptually interested in situations where movements in the sequence are maximally dependent on each other.
  
  Beyond this, the authors also mentioned in the results and discussion the idea of "neural resources" being assigned to replan movements, but it's not clear to me what this might actually mean concretely. I wonder if the authors have a toy model in mind for what this kind of resource reassignment could mean. I realize it would likely be quite speculative, but I would greatly appreciate a description or some sort of intuition if possible.
  
  Our use of the term "neural resources" is inspired by classic psychology literature on how cognitive resources such as attention and working memory are divided between multiple sequence components. Early studies on working memory suggest that human participants can retain and manipulate a fixed number of abstract items in working memory (Miller, 1956). However, more recent literature postulates that working memory is not limited by a specific number of items; rather, it is limited by a finite attentional resource that is softly allocated to task items.
  
  Here we borrowed the same notion of soft distribution of resources for the preparation of multiple sequence items. A large portion of our observation in this paper and also previous work on sequence production can be explained by a simple model that assumes one central planning resource that is “softly” divided between sequence elements when participants see future items of the sequence (Author Response Image 1). The first sequence element receives the majority of the resources and is planned the most. The rest of the sequence receives the remaining planning resources in an exponentially decaying manner for preparation of the movement during the execution of the ongoing movement. Once the ongoing movement is over, the resource is then transferred to the next sequence item and this process is repeated until the sequence is over. Assignment of planning resources to future items explains why participants are faster when seeing future items (Figure 2). But this comes with a cost – if the ongoing movement is perturbed, the replanning process is delayed since some of the resources are occupied by future planning (Figure 4). This naturally leads to the question of how this resource allocation is implemented in neural tissue. To address this, we are conducting the same sequence task with the horizon in non-human primates (NHPs), and the investigation of these neural implementation questions will be the focus of future studies.
  
  Author response image 1.
  
  Basic diagram showing a soft distribution of a limited planning resource. The diagram shows a Horizon 3 condition in which two future reaches (+1 and +2) are planned while executing a movement (+0). The majority of resources is assigned to the execution of the ongoing movement while the rest is distributed for planning future movements. Once the movement is over, the chain of preparation and execution moves forward.
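
  The soft allocation described above can be sketched numerically. The snippet below is only an illustration, not the authors' model: the function name, the `decay` ratio of 0.5, and the normalisation of the shares to a fixed total are all assumptions made for the example. It splits one limited resource between the ongoing movement (+0) and the visible future movements, with each successive share shrinking geometrically.

```python
def allocate_planning_resource(horizon, decay=0.5, total=1.0):
    """Split a fixed resource across the movements visible at a given horizon.

    horizon: number of visible targets (1 -> only +0; 3 -> +0, +1, +2).
    decay:   hypothetical ratio between successive shares (0 < decay < 1).
    total:   size of the single, limited planning resource.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Unnormalised weights 1, decay, decay**2, ... for +0, +1, +2, ...
    weights = [decay ** k for k in range(horizon)]
    norm = sum(weights)
    # Rescale so that the shares always add up to `total`.
    return [total * w / norm for w in weights]


for h in (1, 2, 3):
    shares = allocate_planning_resource(h)
    print(f"H{h}:", [round(s, 3) for s in shares])
```

  Under these assumptions, the share left for the ongoing movement shrinks as the horizon grows, which is one way to picture why replanning a perturbed movement could be slower when more future targets are visible.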
  
  Recommendations for the author:
  
  Reviewer #1
  
  We thank reviewer one for these comments regarding the clarity and consistency of figures and terminology.
  
  (1) Figure 3. Are "+1 Move" in Fig. 3B and "+ 1 Movement" in Fig. 3C the same as "E + 1" in Fig. 3A? Also, does "Dwell" in Fig. 3B mean the same as "+1 Dwell" in Fig. 3C? Consistent terminology would help readers to understand the figure.
  
  “+1 Move” in Figure 3B is the same as +1 movement in Figure 3C. “Dwell” in Figure 3B is the same as +1 Dwell in Figure 3C. We changed the figure for more consistency.
  
  (2) Figure 3. A typo in the second-to-last line of the legend, "pre-jump target for no-jump and jump and condition". The second "and" isn't necessary.
  
  The typo is corrected. Thank you.
  
  (3) Figure 4C. Is "Movement time" equivalent with "E + 1"?
  
  “Movement time” is equivalent to E+1 only in no-jump conditions. When the jump occurs,
  
  Movement time contains all the
  
  (4) Figure 6B. Is the gray circle in between the graph and target positions there by mistake?
  
  We fixed this typo. Thank you.
  
  (5) Figure 6E. It's hard to distinguish H2-H5 from the color differences.
  
  We changed the H5 to full white with a black stroke to improve the contrast. Thank you.
  
  (6) Figure 7A. Blue dots are almost invisible.
  
  We added a black stroke to blue circles for more visibility. Thank you.
  
  Reviewer #2
  
  I found this manuscript to be engaging and well written--many of the questions I had while reading were answered promptly in the next section. As such, my comments are mostly minor and primarily geared towards improving clarity in the manuscript.
  
  (1) One major recurring confusion I had while reading the manuscript was how to think about H1, H2, and H3. It was clearly explained in the text, and the explanations of the results were generally clear once I read through it all, but I found it strangely confusing at times when trying to interpret the figures for myself (e.g., in H2, 2 targets are on screen, but the second target can only be planned during the reach toward the first target). This confusion may just be me reading the manuscript over two days, but I wonder if it could be made clearer with some semantic iconography associated with each horizon added to the later figures alongside the H labels. As one option, perhaps the planning timeline part of Fig 1D could be simplified and shrunk down to make an icon for each horizon that clearly shows when planning overlaps for each horizon.
  
  (Please see the response to point #2 below)
  
  (2) Regarding Fig 1D: I like this figure, but it's unclear to me how the exact preparation and execution times are determined. Is this more of a general schematic of overlaps, or is there specific information about timing in here?
  
  We thank reviewer 2 for their important feedback. The role of Figure 1D was to summarize the timing of the experiments for different horizons. That is, to clarify the relative timing of the targets appearing on the screen (shown with a small circle above the horizontal line) and targets being captured by participants (the ticks and their associated number on the line). Execution is shown as the time interval that the hand is moving between the targets and planning is the potential planning time for participants from the target appearing on the screen until initiation of the reach to that target. We added the relevant parts of Figure 1D to the subplots for each subsequent experiment, to summarize the timing of other experiments and their analyses. For the experiments with target jump, a small vertical arrow shows the time of the target jump relative to other events.
  
  However, this figure will be less useful if the connection between the timing dots and ticks is not communicated. We agree that in the original manuscript, this important figure was only briefly explained in the caption of Figure 1. We expanded the explanation in the caption of Figure 1 and referenced the dots and ticks in the main text.
  
  (3) Fig 6B - for some reason I got confused here: I thought the central target in this figure was the start target, and it took me embarrassingly long to figure out that the green target was the start target. This is likely because I'm used to seeing center-out behavioral figures. Incidentally, I wasn't confused by 7c (in fact, seeing 7c is what made me understand 6b), so maybe the solution is to clearly mark a directionality to the reach trajectories, or to point an arrow at the green target like in previous figures. Also, the bottom left gray target in the figure blends into the graph on the left--I didn't notice it until rereading. Because there's white space between that target and the green one, it might be good to introduce some white space to separate the graph from the targets more. The target arrangement makes more sense in panel C, but by the time I got there, I had already been a bit confused.
  
  Thanks for raising this point. As shown in Figure 6C, we used the reach to the +1 target for the curvature analysis. The confusion about Figure 6B is probably due to continuing the reach trajectories after the +1 target. That also explains why Figure 7C seemed more straightforward. To solve this issue, we modified Figure 6B such that the reaches are shown with full opacity right until the +1 target and then shown with more transparency. We believe this change focuses the reader's attention on the reach initiated from the +0 target to the +1 target.
  
  As for the gray target in Figure 6B, we originally had the gray target as it is a potential start location for the reach to the +0 target, and for having similar visuals between the plots. The gray target is now removed from Figure 6B.
  
  (4) Line 253 - I'm not sure I understand the advantage over simple averaging that the authors mention here--would be nice to get a bit more intuition.
  
  Thanks for raising this point. We used a two-factor model in our analysis, with each factor representing the angle of the last and next target, respectively. Both factors had five levels: -120, -60, 0, 60, and 120 degrees relative to the +1 reach. In a balanced two-factor design, where each combination of factor levels has an equal number of trials, using a linear model and simple averaging would yield equivalent results. However, when the number of trials for the combinations of the two factors is unbalanced, simple averaging can lead to misleading differences in the levels of the second factor. Additionally, the linear model allows us to investigate potential interactions between the two factors, which is not possible with simple averaging.
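
  The point about unbalanced designs can be illustrated with a small noise-free example. This is a sketch under stated assumptions rather than the analysis used in the paper: it uses only two hypothetical levels per factor, made-up effect sizes and trial counts, and it replaces the fitted linear model with the unweighted mean of cell means (which is what the marginal means of a saturated two-factor model reduce to). The "next target" factor has no true effect here, yet simple averaging reports one because the "last target" factor is unevenly distributed across its levels.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical additive effects on curvature (arbitrary units).
last_effect = {-120: -1.0, 120: 1.0}   # the last-target angle matters
next_effect = {0: 0.0, 60: 0.0}        # the next-target angle has NO true effect

# Unbalanced trial counts per (last, next) combination (made up).
n_trials = {(-120, 0): 9, (-120, 60): 1, (120, 0): 1, (120, 60): 9}

# Noise-free trials: (last angle, next angle, measured value).
trials = [(last, nxt, last_effect[last] + next_effect[nxt])
          for (last, nxt), n in n_trials.items() for _ in range(n)]


def simple_average(trials):
    """Average all trials that share a next-target level, ignoring the last target."""
    by_next = defaultdict(list)
    for _, nxt, y in trials:
        by_next[nxt].append(y)
    return {nxt: mean(ys) for nxt, ys in by_next.items()}


def balanced_marginal_means(trials):
    """Average within each cell first, then average the cell means per next-target level."""
    cells = defaultdict(list)
    for last, nxt, y in trials:
        cells[(last, nxt)].append(y)
    by_next = defaultdict(list)
    for (_, nxt), ys in cells.items():
        by_next[nxt].append(mean(ys))
    return {nxt: mean(ms) for nxt, ms in by_next.items()}


print("simple averaging:      ", simple_average(trials))
print("balanced marginal means:", balanced_marginal_means(trials))
```

  In this toy data set, simple averaging attributes part of the last-target effect to the next-target factor, whereas weighting every cell equally does not.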
  
  (5) Fig 7a - I would have liked to see the traces labeled in figure (i.e. hand trajectory vs. eye trajectory)
  
  Hand and eye trajectories are now labeled in the figure.
  
  (6) Fig 7c - very minor, but the hexagon of targets is rotated 30 degrees from all previous hexagons shown (also, this hex grid target arrangement can't lead to the trajectory shown in 7a, so it can't be that this was a different experimental grid). I'm guessing this was a simple oversight.
  
  We used the same grid in the eye-tracking experiment. The targets are now rotated to visually match the previous plots. Thank you for raising this point.
  
  Reference
  
  Clavagnier, S., Prado, J., Kennedy, H., & Perenin, M.-T. (2007). How humans reach: distinct cortical systems for central and peripheral vision. The Neuroscientist: A Review Journal Bringing Neurobiology, Neurology and Psychiatry, 13(1), 22–27.
  
  González-Alvarez, C., Subramanian, A., & Pardhan, S. (2007). Reaching and grasping with restricted peripheral vision. Ophthalmic & Physiological Optics: The Journal of the British College of Ophthalmic Opticians, 27(3), 265–274.
  
  Kalidindi, H. T., & Crevecoeur, F. (2023). Task dependent coarticulation of movement sequences (p.2023.12.15.571847). https://doi.org/10.1101/2023.12.15.571847
  
  Miller, G. A. (1956). The magical number seven plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
  
  Sivak, B., & MacKenzie, C. L. (1990). Integration of visual information and motor output in reaching and grasping: the contributions of peripheral and central vision. Neuropsychologia, 28(10), 1095–1116.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.05.24.542099v3
www.biorxiv.org www.biorxiv.org

Unveiling the signaling network of FLT3-ITD AML improves drug sensitivity prediction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This useful study could potentially represent a step forward towards personalized medicine by combining cell-based data and a prior-knowledge network to derive Boolean-based predictive logic models to uncover altered protein/signaling networks within cancer cells. However, the level of evidence supporting the conclusions is inadequate, and further validation of the reported approach is required. If properly validated, these findings could be of interest to medical biologists working in the field of cancer and would inform drug development and treatment choices in the field of oncology.
  
  We thank the editor and the reviewer for their constructive comments, which helped us to improve our story. We have now performed new analyses and experiments to further support our proposed approach.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  (1) The authors deploy a combination of their own previously developed computational methods and databases (SIGNOR and CellNOptR) to model the FLT3 signaling landscape in AML and identify synergistic drug combinations that may overcome the resistance of AML cells harboring ITD mutations in the TKI domain of FLT3 to FLT3 inhibitors. I did not closely evaluate the details of these computational models since they are outside of my area of expertise and have been previously published. The manuscript has significant issues with data interpretation and clarity, as detailed below, which, in my view, call into question the main conclusions of the paper.
  
  The authors train the model by including perturbation data where TKI-resistant and TKI-sensitive cells are treated with various inhibitors and the activity (i.e. phosphorylation levels) of the key downstream nodes is evaluated. Specifically, in the Results section (p. 6) they state "TKIs sensitive and resistant cells were subjected to 16 experimental conditions, including TNFa and IGF1 stimulation, the presence or absence of the FLT3 inhibitor, midostaurin, and in combination with six small-molecule inhibitors targeting crucial kinases in our PKN (p38, JNK, PI3K, mTOR, MEK1/2 and GSK3)". I would appreciate more details on which specific inhibitors and concentrations were used for this experiment. More importantly, I was very puzzled by the fact that this training dataset appears to contain, among other conditions, the combination of midostaurin with JNK inhibition, i.e. the very combination of drugs that the authors later present as being predicted by their model to have a synergistic effect. Unless my interpretation of this is incorrect, it appears to be a "self-fulfilling prophecy", i.e. an inappropriate use of the same data in training and verification/test datasets.
  
  We thank the reviewer for this comment. We have now extensively revised Figure 2B and edited the text to clarify and better describe the experimental conditions of our multiparametric analysis. As the reviewer stated, we have used different combinations of drugs, including midostaurin and a JNK inhibitor, to generate two cell-specific predictive models recapitulating the main signal transduction events, downstream of FLT3, occurring in resistant (FLT3ITD-TKD) and sensitive (FLT3ITD-JMD) cells. These experiments were performed by treating cells at very early time points to obtain a picture of the signaling response of FLT3-ITD positive cells. Indeed, we have measured the phosphorylation level of signaling proteins, because at these early time points (90 minutes) we do not expect a modulation of crucial downstream phenotypes, including apoptosis or proliferation. To infer perturbations impacting the apoptosis or proliferation phenotypes, we applied a computational two-step strategy:
  
  (1) We extracted key regulators of ‘apoptosis’ and ‘proliferation’ hallmarks from SIGNOR database.
  
  (2) We applied our recently developed ProxPath algorithm to retrieve significant paths linking nodes of our two optimized models to ‘proliferation’ and ‘apoptosis’ phenotypes.
  
  This allowed us to evaluate in silico the “proliferation” and “apoptosis” rates upon inactivation of each node of the network. With the proposed approach, we identified JNK as a potential drug target to use in combination with FLT3 inhibition to restore sensitivity (i.e. in silico inducing apoptosis and reducing proliferation) of FLT3 ITD-TKD cells. We here want to stress once more that although the first piece of information (the effect of JNK and FLT3 inhibition on sentinel readouts) was provided in the training dataset, the second piece of information (the effect of this treatment on the entire model and, as a consequence, on the cellular phenotype) was purely the result of our computational models. As such, we hope that the reviewer will agree that this could not represent a “self-fulfilling prophecy”.
  
  That said, we understand that this aspect was not clearly defined in the manuscript. For this reason, we have now 1) extensively revised the Figure 2B; 2) edited the text (pg. 6) to clarify the purpose and the results of our approach; and 3) described in further detail (pg. 16-18) the experimental conditions of our multiparametric analysis.
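
  To make the idea of in silico node inactivation concrete, the following sketch simulates a tiny Boolean network to a steady state with and without knocked-out nodes. It is an illustration only and does not reproduce CellNOptR, ProxPath, or the authors' optimized models: the node names, the wiring, and the synchronous update scheme are all hypothetical choices made for this example.

```python
# Hypothetical logic rules: each node's next value as a function of the current state.
RULES = {
    "kinase_a": lambda s: s["receptor_a"],
    "kinase_b": lambda s: s["receptor_b"],
    "survival": lambda s: s["kinase_a"] or s["kinase_b"],  # two redundant routes
    "apoptosis": lambda s: not s["survival"],
}

# Hypothetical starting state: both receptors active, everything else off.
INITIAL = {
    "receptor_a": True, "receptor_b": True,
    "kinase_a": False, "kinase_b": False,
    "survival": False, "apoptosis": False,
}


def simulate(rules, initial, knocked_out=frozenset(), max_steps=50):
    """Synchronously update the network until the state stops changing.

    Nodes in `knocked_out` are clamped to False, mimicking an inhibitor.
    Nodes without a rule (the inputs) keep their initial value.
    """
    state = {node: (False if node in knocked_out else value)
             for node, value in initial.items()}
    for _ in range(max_steps):
        updated = dict(state)
        for node, rule in rules.items():
            updated[node] = False if node in knocked_out else rule(state)
        if updated == state:
            return state
        state = updated
    raise RuntimeError("no steady state reached within max_steps")


for targets in (set(), {"kinase_a"}, {"kinase_a", "kinase_b"}):
    result = simulate(RULES, INITIAL, knocked_out=targets)
    print(sorted(targets), "-> apoptosis:", result["apoptosis"])
```

  In this toy wiring, inactivating one kinase leaves the phenotype unchanged because the second route compensates, while inactivating both switches the apoptosis node on. The phenotype is read out from the model as a whole rather than from any single measured node.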
  
  (2) My most significant criticism is that the proof-of-principle experiment evaluating the combination effects of midostaurin and SP600125 in FLT3-ITD-TKD cell line model does not appear to show any synergism, in my view. The authors' interpretation of the data is that the addition of SP600125 to midostaurin rescues midostaurin resistance and results in increased apoptosis and decreased viability of the midostaurin-resistant cells. Indeed, they write on p.9: "Strikingly, the combined treatment of JNK inhibitor (SP600125) and midostaurin (PKC412) significantly increased the percentage of FLT3ITD-TKD cells in apoptosis (Fig. 4D). Consistently, in these experimental conditions, we observed a significant reduction of proliferating FLT3ITD- TKD cells versus cells treated with midostaurin alone (Fig. 4E)." However, looking at Figs 4D and 4E, it appears that the effects of the midostaurin/SP600125 combination are virtually identical to SP600125 alone, and midostaurin provides no additional benefit. No p-values are provided to compare midostaurin+SP600125 to SP600125 alone but there seems to be no appreciable difference between the two by eye. In addition, the evaluation of synergism (versus additive effects) requires the use of specialized mathematical models (see for example Duarte and Vale, 2022). That said, I do not appreciate even an additive effect of midostaurin combined with SP600125 in the data presented.
  
  We agree with the reviewer that the JNK inhibitor and midostaurin have neither a synergistic nor an additive effect, and we have now revised the text accordingly. Whether FLT3ITD-TKD AML cells benefit from midostaurin treatment is highly debated in the scientific community. In a recently published retrospective study by K. Döhner et al. (Rücker et al., 2022), the authors investigated the prognostic and predictive impact of the FLT3-ITD insertion site (IS) in 452 patients randomized within the RATIFY trial, which evaluated midostaurin in addition to intensive chemotherapy. Their study clearly showed that “Midostaurin exerted a significant benefit only for JMDsole” patients. In agreement with this result, we have demonstrated that midostaurin treatment had no effect on apoptosis of blasts derived from FLT3ITD-TKD patients (Massacci et al., 2023). On the other hand, we and others observed that midostaurin triggers apoptosis in FLT3ITD-TKD cells to a lesser extent than in FLT3ITD-JMD cells (Arreba-Tutusaus et al., 2016). The data presented here (Fig. 4) and our previously published papers (Massacci et al., 2023; Pugliese et al., 2023) indicate that hitting cell cycle regulators (WEE1, CDK7, JNK) induces a significant apoptotic response in TKI-resistant FLT3ITD-TKD cells. Prompted by the reviewer's comment, we have now revised the text and discussion (pg. 9; 14), highlighting the crucial role of JNK in apoptosis induction.
  
  (3) In my view, there are significant issues with clarity and detail throughout the manuscript. For example, additional details and improved clarity are needed, in my view, with respect to the design and readouts of the signaling perturbation experiments (Methods, p. 15 and Fig 2B legend). For example, the Fig 2B legend states: "Schematic representation of the experimental design: FLT3 ITD-JMD and FLT3 ITD-JMD cells were cultured in starvation medium (w/o FBS) overnight and treated with selected kinase inhibitors for 90 minutes and IGF1 and TNFa for 10 minutes. Control cells are starved and treated with PKC412 for 90 minutes, while "untreated" cells are treated with IGF1 100ng/ml and TNFa 10ng/ml with PKC412 for 90 minutes.", which does not make sense to me. The "untreated" cells appear to be treated with more agents than the control cells. The logic behind cytokine stimulation is not adequately explained and it is not entirely clear to me whether the cytokines were used alone or in combination. Fig 2B is quite confusing overall, and it is not clear to me what the horizontal axis (i.e. columns of "experimental conditions", as opposed to "treatments") represents. The Method section states "Key cell signaling players were analyzed through the X-Map Luminex technology: we measured the analytes included in the MILLIPLEX assays" but the identities of the evaluated proteins are not given in the Methods. At the same time, the Results section states "TKIs sensitive and resistant cells were subjected to 16 experimental conditions" but these conditions do not appear to be listed (except in Supplementary data; and Fig 2B lists 9 conditions, not 16). In my subjective view, the manuscript would benefit from a clearer explanation and depiction of the experimental details and inhibitors used in the main text of the paper, as opposed to various Supplemental files/Figures. 
The lack of clarity on what exactly were the experimental conditions makes the interpretation of Fig 2 very challenging. In the same vein, in the PCA analysis (Fig 2C) there seems to be no reference to the cytokine stimulation status while the authors claim that PC2 stratifies cells according to IGF1 vs TNFalpha. There are numerous other examples of incomplete or confusing legends and descriptions which, in my view, need to be addressed to make the paper more accessible.
  
  We thank the reviewer for his/her comment. We have now extensively revised the text of the manuscript (pg. 6), Fig. 2B (now Fig. 2C) and the methods (pg. 16-18) to improve the clarity of our manuscript, making the take-home messages more accessible. We believe that the revised versions of the text and of Figure 2 better explain our strategy and clarify the experimental setup: we added details on the choice of experimental conditions and propose a better graphical representation of the analysis.
  
  (4) I am not sure that I see significant value in the patient-specific logic models because they are not supported by empirical evidence. Treating primary cells from AML patients with relevant drug combinations would be a feasible and convincing way to validate the computational models and evaluate their potential benefit in the clinical setting.
  
  We thank the reviewer for this comment. We have now performed additional experiments in a small cohort of FLT3-ITD positive patient-derived primary blasts. Specifically, we have treated blasts from 2 FLT3ITD-TKD patients and 3 FLT3ITD-JMD+TKD patients with PKC412 (100nM) 24h and/or 10μM SP600125 (JNK inhibitor). After 24h of treatment we have measured the apoptotic rate. As shown below and in the new Fig. 4F (see pg.10, main text), midostaurin triggers higher levels of apoptosis in FLT3ITD-JMD+TKD blasts as compared to FLT3ITD-TKD blasts. Importantly, treatment with the JNK inhibitor SP600125 alone triggers apoptosis in FLT3ITD-TKD blasts, validating the crucial role of JNK in FLT3ITD-TKD cell survival and TKI resistance. The combined treatment of midostaurin and SP600125 increases the percentage of apoptotic cells as compared to midostaurin treatment alone but to a lesser extent than single agent treatment. This result is in agreement with the current debate in the scientific community on the actual beneficial effect of midostaurin treatment in FLT3ITD-TKD AML patients.
  
  Author response image 1.
  
  Primary samples from AML patients with the FLT3ITD-TKD mutation (n=2, yellow bars) or the FLT3ITD-JMD/TKD mutation (n=3, blue bars) were exposed to Midostaurin (100nM, PKC412), and JNK inhibitor (10µM, SP600125) for 48 hours, or combinations thereof. The specific cell death of gated AML blasts was calculated to account for treatment-unrelated spontaneous cell death. The bars on the graph represent the mean values with standard errors.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This manuscript by Latini et al describes a methodology to develop Boolean-based predictive logic models that can be applied to uncover altered protein/signalling networks in cancer cells and discover potential new therapeutic targets. As a proof-of-concept, they have implemented their strategy on a hematopoietic cell line engineered to express one of two types of FLT3 internal tandem mutations (FLT3-ITD) found in patients, FLT3-ITD-TKD (which are less sensitive to tyrosine kinase inhibitors/TKIs) and FLT3-ITD-JMD (which are more sensitive to TKIs).
  
  Strengths:
  
  This useful work could potentially represent a step forward towards personalised targeted therapy, by describing a methodology using Boolean-based predictive logic models to uncover altered protein/signalling networks within cancer cells. However, the weaknesses highlighted below severely limit the extent of any conclusions that can be drawn from the results.
  
  Weaknesses:
  
  While the highly theoretical approach proposed by the authors is interesting, the potential relevance of their overall conclusions is severely undermined by a lack of validation of their predicted results in real-world data. Their predictive logic models are built upon a set of poorly explained initial conditions, drawn from data generated in vitro from an engineered cell line, and no attempt was made to validate the predictions in independent settings. This is compounded by a lack of sufficient experimental detail or clear explanations at different steps. These concerns considerably temper one's enthusiasm about the conclusions that could be drawn from the manuscript.
  
  We thank the reviewer for the thorough review and kind comments about our manuscript. We hope the changes and new data we provide further strengthen it in his or her eyes.
  
  Some specific concerns include:
  
  (1) It remains unclear how robust the logic models are, or conversely, how affected they might be by specific initial conditions or priors that are chosen. The authors fail to explain the rationale underlying their input conditions at various points. For example: - at the start of the manuscript, they assert that they begin with a pre-PKN that contains "76 nodes and 193 edges", though this is then ostensibly refined with additional new edges (as outlined in Fig 2A). However, neither the reason these edges were added nor model performance comparisons against the basal model are presented, precluding an evaluation of whether this model is better.
  
  We understand the reviewer’s concern. We have now complemented the manuscript with an extended version of the proposed modelling strategy offering a detailed description of the pipeline and the rationale behind each choice (Supplementary material, pg.14-19). Furthermore, we also referenced the manuscript to a GitHub repository where users can follow and reproduce each step of the pipeline (https://github.com/SaccoPerfettoLab/FLT3ITD_driven_AML_Boolean_models).
  
  At a later step (relevant to Fig S4 and Fig 3), they develop separate PKNs, for each of the mutation models, that contain "206 [or] 208 nodes" and "756 [or] 782 edges", without explaining how these seemingly arbitrary initial conditions were arrived at. Their relation to the original parameters in the previous model is also not investigated, raising concerns about model over-fitting and calling into question the general applicability of their proposed approach. The authors need to provide a clearer explanation of the logic underlying some of these initial parameter selections, and also investigate the biological/functional overlap between these sets of genes (nodes).
  
  We thank the reviewer for raising this question. Very briefly, the proposed optimization strategy belongs to a branch of modelling in which the predictive model is, indeed, driven by the data (Blinov and Moraru, 2012). From this point of view, the purpose of the optimization is to fit the experimental data as well as possible. To achieve this, we followed standard practices (Dorier et al., 2016; Traynard et al., 2017). To address the issue of “calling into question the general applicability of their proposed approach”, we have compared the activity status of nodes in the models with ‘real data’ extracted from cell lines and patients’ samples to confirm the robustness and scalability of the strategy (please see below, response to point 3 pg. 9).
  
  Finally, as mentioned in the previous point, we have now provided a detailed supplementary material, where we have described all the aspects mentioned by the reviewer: step-by-step changes in the PKN, the choice of the parameters and other details can be traced over the novel text and are also available in the GitHub repository (https://github.com/SaccoPerfettoLab/FLT3-ITD_driven_AML_Boolean_models).
  
  (2) There is concern about the underlying experimental data underpinning the models that were generated, further compounded by the lack of a clear explanation of the logic. For example, data concerning the status of signalling changes as a result of perturbation appears to be generated from multiplex LUMINEX assays using phosphorylation-specific antibodies against just 14 "sentinel" proteins. However, very little detail is provided about the rationale underlying how these 14 were chosen to be "sentinels" (and why not just 13, or 15, or any other number, for that effect?). How reliable are the antibodies used to query the phosphorylation status? What are the signal thresholds and linear ranges for these assays, and how would these impact the performance/reliability of the logic models that are generated from them?
  
  We thank the reviewer for this comment as it gives us the opportunity to clarify and better explain the criteria behind the experimental data generation.
  
  Overall, we revised the main text at page 6 and Figure 2B to improve the clarity of our experimental design. Specifically, the sentinels were chosen because they were considered indirect or direct downstream effectors of the perturbations and were conceived to serve as both a benchmarking system of the study and a readout of the global perturbation of the system. To clarify this aspect, we have added a small network (compressed PKN) in Figure 2B to show that the proteins (green nodes) we chose to measure in the LUMINEX multiplex assay are “sentinels” of the activity of almost all the pathways included in the prior knowledge network. Moreover, we expanded the methods section “Multiparametric experiment of signaling perturbation” (pg. 16-18), where we added details about the antibodies used in the assay paired with the target phosphosites and their functional role (Table 3). We also better specified the filtering process based on the number of beads detected for each antibody used (pg. 18). Regarding the reliability of the measurements, the quality of the perturbation data greatly affects the logic models’ performance. xMAP technology has already been used by the scientific community to generate highly reproducible and reliable multiparametric datasets for model training (Terfve et al., 2012). Additionally, we checked that for each sentinel we could measure a fully active state, a fully inactive state and intermediate states. The modulation of individual analytes is displayed in Figure S3.
  
  Author response image 2.
  
  Partial Figure of the normalization of analyte activity through Hill curves. Experimental data were normalized and scaled from 0 to 1 using analyte-specific Hill functions. Raw data are reported as triangles, normalized data as squares. Partial Figure representing three plots of the FLT3 ITD-JMD data (Complete Figure in Supplementary material Fig S3).
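  As an illustration of the normalization described in this caption, the sketch below maps raw intensities onto a 0-1 scale with a Hill function. The response does not report the fitted parameters, so the half-maximal constant `k`, the Hill coefficient `n` and the example intensities are hypothetical placeholders, not values from the study.

```python
def hill_normalize(x, k, n):
    """Scale a non-negative raw signal x into [0, 1) with a Hill function.

    k is the signal giving a normalized value of 0.5 and n controls the
    steepness. Both are assumed to be fitted per analyte; the values used
    below are placeholders.
    """
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("x must be >= 0 and k, n must be > 0")
    xn = x ** n
    return xn / (k ** n + xn)


# Hypothetical raw intensities for one analyte (arbitrary units).
raw = [50.0, 500.0, 5000.0]
normalized = [hill_normalize(v, k=500.0, n=2.0) for v in raw]
```

  With these placeholder parameters, a raw value equal to `k` maps to 0.5, and values far below or above `k` approach 0 or 1, which corresponds to the fully inactive, intermediate and fully active states mentioned in the response.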
  
  (3) In addition, there are publicly available quantitative proteomics datasets from FLT3-mutant cell lines and primary samples treated with TKIs. At the very least, these should have been used by the authors to independently validate their models, selection of initial parameters, and signal performance of their antibody-based assays, to name a few unvalidated, yet critical, parameters. There is an overwhelming reliance on theoretical predictions without taking advantage of real-world validation of their findings. For example, the authors identified a set of primary AML samples with relevant mutations (Fig 5) that could potentially have provided a valuable experimental validation platform for their predictions of effective drug combination. Yet, they have performed Boolean simulations of the predicted effects, a perplexing instance of adding theoretical predictions on top of a theoretical prediction!
  
  Additionally, there are datasets of drug sensitivity on primary AML samples where mutational data is also known (for example, from the BEAT-AML consortia), that could be queried for independent validation of the authors' models.
  
  We thank the reviewer for this comment that helped us to significantly strengthen our story. Prompted by his/her comment, we have now queried three different datasets for independent validation of our logic models. Specifically, we have taken advantage of quantitative phosphoproteomics datasets of FLT3-ITD cell lines treated with TKIs (Massacci et al., 2023), phosphoproteomic data of FLT3-ITD positive patient-derived primary blasts (Kramer et al., 2022) and drug sensitivity data on primary FLT3-ITD positive AML samples (BEAT-AML consortia).
  
  Comparison with phosphoproteomic data of FLT3-ITD cell lines treated with TKIs (Massacci et al., 2023)
  
  Here, we compared the steady state of our model upon FLT3 inhibition with the phosphoproteomic data describing the modulation of 16,319 phosphosites in FLT3-ITD BaF3 cells (FLT3ITD-TKD and FLT3ITD-JMD) upon TKI treatment (i.e. quizartinib, a highly selective FLT3 inhibitor). As shown in the table below and new Figure S5A, the activation status of the nodes in the two generated models is highly comparable with the level of regulatory phosphorylations reported in the reference dataset. Briefly, to determine the agreement between each model and the independent dataset, we focused on the phosphorylation level of specific residues that (i) regulate the functional activity of sentinel proteins (denoted in the ‘Mode of regulation’ column) and (ii) were measured in this work to train the model. We then cross-referenced the sentinel protein status in the FLT3 inhibition simulation (as denoted in the 'Model simulation of FLT3 inhibition' column) with the functional impact of phosphorylation measured in the Massacci et al. dataset (as denoted in the 'Functional impact in quizartinib dataset' column). Points of congruence were summarized in the 'Consensus' column. As an example, if the phosphorylation level of an activating residue decreases (e.g., Y185 of Mapk1), we can conclude that the protein is inhibited (‘Down-reg’) and this is coherent with the model simulation in which Mapk1 is ‘Inactive’.
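  The cross-referencing logic described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the string labels for the residue's mode of regulation and direction of change are assumptions, and only the Mapk1 Y185 example is taken from the response.

```python
def functional_impact(mode, change):
    """Infer the protein-level effect of a phosphorylation change.

    mode   : 'activating' or 'inhibiting' (role of the phosphoresidue)
    change : 'up' or 'down' (direction of the phosphorylation change)
    """
    if mode not in ("activating", "inhibiting"):
        raise ValueError(f"unknown mode: {mode}")
    if change not in ("up", "down"):
        raise ValueError(f"unknown change: {change}")
    # More phosphate on an activating site, or less on an inhibiting site,
    # both imply a more active protein.
    protein_up = (mode == "activating") == (change == "up")
    return "Up-reg" if protein_up else "Down-reg"


def is_consistent(model_state, impact):
    """True when the simulated node state agrees with the measured impact."""
    return (model_state, impact) in {("Active", "Up-reg"),
                                     ("Inactive", "Down-reg")}


# Example from the text: Y185 of Mapk1 is an activating residue whose
# phosphorylation decreases; the model simulates Mapk1 as 'Inactive'.
impact = functional_impact("activating", "down")
consensus = is_consistent("Inactive", impact)
```

  Encoding the comparison as two small functions keeps the residue-level inference separate from the model-versus-data check, so each 'Consensus' entry can be traced back to its inputs.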
  
  Author response image 3.
  
  Comparison with phosphoproteomic data of FLT3-ITD patient-derived primary blasts (Kramer et al., 2022)
  
  Using the same criteria, we extended our validation efforts by comparing the activity status of the proteins in the “untreated” simulation (i.e. reproducing the tumorigenic state where FLT3, IGF1R and TNFR are set to be active) with their phosphorylation levels in the dataset by Kramer et al. (Kramer et al., 2022). Briefly, this dataset gathers phosphoproteomic data from a cohort of 44 AML patients and we restricted the analysis to 11 FLT3-ITD-positive patients. Importantly, all patients carry the ITD mutation in the juxta membrane domain (JMD), thus allowing for the comparison with FLT3 ITD-JMD specific Boolean model, exclusively.
  
  The results are shown in the heatmap below. Each cell in the heatmap reports the phosphorylation level of sentinel proteins’ residues in the indicated patient (red and blue indicate up- or down-regulated phosphoresidues, respectively). Patients were clustered according to Pearson correlation. We observed a good level of agreement between the patients’ phosphoproteomics data and our model (reported in the column “Tumor simulation steady state”) for a subset of patients highlighted within the black rectangle. However, for the remaining patients, the level of agreement is poor. The main reason is that our work focuses on FLT3-ITD signaling, and a systematic translation of the Boolean modeling approach to the entire cohort of AML patients would require the inclusion of the impact of other driver mutations in the network. This is a current and future line of investigation of our group. We have revised the discussion, taking this result into consideration.
  
  Author response image 4.
  
  Comparison with drug sensitivity data on primary FLT3-ITD positive AML samples (BEAT-AML consortia)
  
  Here we took advantage of the Beat AML programme on a cohort of 672 tumour specimens collected from 562 patients. The BEAT AML consortium provides whole-exome sequencing, RNA sequencing and analyses of ex vivo drug sensitivity of this large cohort of patient-derived primary blasts. We focused on drug sensitivity screening on 134 patients carrying the typical FLT3-ITD mutation in the JMD region. Unfortunately, the ITD insertion in the TKD region is less characterized and additional in-depth sequencing studies are required to identify FLT3ITD-TKD positive blasts in this cohort. Next, we focused on those compounds hitting nodes present in the FLT3ITD-JMD Boolean model. Specifically, we selected drugs inhibiting FLT3, PI3K, mTOR, JNK and p38 and we calculated the average IC50 of FLT3ITD-JMD patient-derived primary blasts for each drug. These results are reported as a bar graph in the new Fig. S5B and below (upper panel) and were compared with the apoptotic and proliferation rates measured in the in silico simulation of the FLT3ITD-JMD Boolean model. Drug sensitivity screening on primary FLT3ITD-JMD blasts revealed that inhibition of FLT3, PI3K and mTOR induces cell death at low drug concentrations, in contrast with JNK and p38 inhibitors, which show higher IC50 values. These observations are consistent with our simulation results of the FLT3ITD-JMD model. As expected, in silico inhibition of FLT3 greatly impacts apoptosis and proliferation. Additionally, in silico suppression of mTOR and, to a lesser extent, PI3K and p38 affects apoptosis and proliferation. Of note, JNK inhibition does not seem to affect the viability of FLT3ITD-JMD cells either in silico or in vitro.
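  The per-target averaging step described above can be sketched as below. The record layout, the function name and every IC50 value are hypothetical placeholders chosen only to make the example run; they are not BEAT-AML data and do not reproduce Fig. S5B.

```python
from collections import defaultdict
from statistics import mean


def mean_ic50_by_target(records):
    """Average IC50 values grouped by the drug's target.

    records: iterable of (target, ic50) pairs, one per patient sample.
    """
    grouped = defaultdict(list)
    for target, ic50 in records:
        grouped[target].append(ic50)
    return {target: mean(values) for target, values in grouped.items()}


# Placeholder measurements in arbitrary concentration units.
records = [
    ("FLT3", 0.25), ("FLT3", 0.75),
    ("JNK", 8.0), ("JNK", 10.0),
]
averages = mean_ic50_by_target(records)
```

  A lower average IC50 for a target would then be read as higher sensitivity of the blasts to its inhibition, which is the quantity compared with the simulated apoptosis and proliferation levels.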
  
  Author response image 5.
  
  Altogether these publicly available datasets independently validate our models, strengthening the reliability and robustness of our approach.
  
  We have now revised the main text (pg. 8; 9) and added a new Figure (Fig. S5) in the supplementary material; we collected the results of the analysis in TableS6.
  
  (4) There are additional examples of insufficient experimental detail that preclude a fuller appreciation of the relevance of the work. For example, it is alluded that RNA-sequencing was performed on a subset of patients, but the entire methodological section detailing the RNA-seq amounts to just 3 lines! It is unclear which samples were selected for sequencing nor where the data has been deposited (or might be available for the community - there are resources for restricted/controlled access to deidentified genomics/transcriptomics data).
  
  We apologize for the lack of description regarding the RNA sequencing of patient samples. We have now added details of this approach in the methods section (pg. 24) and clearly explained in the text how we selected the patients for the analysis. Additionally, the data have now been deposited in the GEO database (accession number: GSE247483).
  
  The sentences we have rephrased are below:
  
  “We analyzed the mutational and expression profiles of 262 genes (Table S7), relevant to hematological malignancies in a cohort of 14 FLT3-ITD positive de novo AML patients (Fig. 5A, panel a). Since follow-up clinical data were available for 10 out of 14 patients (Fig. 5B, Table S9), we focused on this subset of patients. Briefly, the classification of these 10 patients according to their ITD localization (see Methods) was as follows: 8 patients with FLT3ITD-JMD, 4 with FLT3ITD-JMD+TKD, and 2 with FLT3ITD-TKD (Fig. 5A, panel b). The specific insertion sites of the ITD in the patient cohort are shown in Table S8.”
  
  Similarly, in the "combinatory treatment inference" methods, it states "...we computed the steady state of each cell line best model....." and "Then we inferred the activity of "apoptosis" and "proliferation" phenotypes", without explaining the details of how these were done. The outcomes of these methods are directly relevant to Fig 4, but with such sparse methodological detail, it is difficult to independently assess the validity of the presented data.
  
  Overall, the theoretical nature of the work is hampered by a lack of real-world validation, and insufficient methodological details limit a fuller appreciation of the overall relevance of this work.
  
  We thank the reviewer for the insightful feedback regarding the methodology in our paper. Regarding ‘real-world validation’, we have extensively replied to this issue in point 3 (pg. 9-14 of this document). Regarding the ‘insufficient methodological details’, we have made substantial improvements to enhance clarity and reproducibility, which encompass: (i) revisions in the main text and in the Materials and Methods section; (ii) a detailed explanation of each step and of the decisions taken, which can be accessed either as an extended Materials and Methods section (Supplementary material, pg. 14-19) or through our GitHub repository (https://github.com/SaccoPerfettoLab/FLT3-ITD_driven_AML_Boolean_models). We sincerely hope this addition addresses concerns and facilitates a more thorough and independent assessment of our work.
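  For readers unfamiliar with the terms discussed in this exchange, the sketch below shows one generic way to compute the steady state of a Boolean model and read out a phenotype node. It is not the authors' pipeline: the three-node network, its rules, the synchronous update scheme and the node names (`receptor`, `kinase`, `survival`, `apoptosis`) are all invented for illustration, and the actual procedure is documented in the supplementary methods and repository cited above.

```python
def steady_state(rules, state, clamped=None, max_steps=100):
    """Iterate synchronous Boolean updates until a fixed point is reached.

    rules   : dict mapping node -> function(state) -> bool
    state   : dict mapping node -> bool (initial condition)
    clamped : optional dict of nodes held at a fixed value, e.g. to
              mimic an inhibitor switching its target off
    """
    clamped = dict(clamped or {})
    state = {**state, **clamped}
    for _ in range(max_steps):
        new_state = dict(state)          # input nodes keep their value
        for node, rule in rules.items():
            new_state[node] = rule(state)
        new_state.update(clamped)        # re-apply the perturbation
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no fixed point reached (the model may oscillate)")


# Toy network with invented rules, for illustration only.
rules = {
    "kinase": lambda s: s["receptor"],
    "survival": lambda s: s["kinase"],
    "apoptosis": lambda s: not s["survival"],
}
initial = {"receptor": True, "kinase": False,
           "survival": False, "apoptosis": False}

untreated = steady_state(rules, initial)
inhibited = steady_state(rules, initial, clamped={"receptor": False})
```

  In this toy example the phenotype node `apoptosis` is off at the untreated steady state and on when the receptor is clamped off, which is the kind of readout the quoted methods text refers to when it mentions inferring the activity of phenotypes.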
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The paper "Unveiling the signaling network of FLT3-ITD AML improves drug sensitivity prediction" reports the combination of prior knowledge signaling networks, multiparametric cell-based data on the activation status of 14 crucial proteins emblematic of the cell state downstream of FLT3 obtained under a variety of perturbation conditions and Boolean logic modeling, to gain mechanistic insight into drug resistance in acute myeloid leukemia patients carrying the internal tandem duplication in the FLT3 receptor tyrosine kinase and predict drug combinations that may reverse pharmacoresistant phenotypes. Interestingly, the utility of the approach was validated in vitro, and also using mutational and expression data from 14 patients with FLT3-ITD positive acute myeloid leukemia to generate patient-specific Boolean models.
  
  Strengths:
  
  The model predictions were positively validated in vitro: it was predicted that the combined inhibition of JNK and FLT3, may reverse resistance to tyrosine kinase inhibitors, which was confirmed in an appropriate FLT3 cell model by comparing the effects on apoptosis and proliferation of a JNK inhibitor and midostaurin vs. midostaurin alone.
  
  Whereas the study does have some complexity, readability is enhanced by the inclusion of a section that summarizes the study design, plus a summary Figure. Availability of data as supplementary material is also a high point.
  
  We thank the reviewer for his/her constructive comments about our manuscript. We believe that our story has been significantly strengthened by the changes and new data we provided.
  
  Weaknesses:
  
  (1) Some aspects of the methodology are not properly described (for instance, no methodological description has been provided regarding the clustering procedure that led to Figs. 2C and 2D).
  
  We apologize for the lack of proper description of the methodology. We have extensively revised the methods section and worked to improve the clarity. We have now added a description of the clustering procedures used for the new Fig. S2D and Fig. S2E in the methods section (pg. 19).
  
  It is not clear in the manuscript whether the patients gave their consent to the use of their data in this study, or the approval from an ethical committee. These are very important points that should be made explicit in the main text of the paper.
  
  We thank the reviewer for this comment. We have now added the following sentence (pg. 24): “Peripheral blood (PB) samples from 14 AML patients were obtained upon patient’s informed consent.”
  
  The authors claim that some of the predictions of their models were later confirmed in the follow-up of some of the 14 patients, but it is not crystal clear whether the models helped the physicians to make any decisions on tailored therapeutic interventions, or if this has been just a retrospective exercise and the predictions of the models coincide with (some of) the clinical observations in a rather limited group of patients. Since the paper presents this as additional validation of the models' ability to guide personalized treatment decisions, it would be very important to clarify this point and expand the presentation of the results (comparison of observations vs. model predictions).
  
  As described in the introduction section, this study was inspired by an urgent clinical problem in AML research: patients carrying the ITD in the TKD domain of the FLT3 receptor display poor prognosis and do not respond to current therapy: Midostaurin (which on the other hand is effective in patients with the ITD in the JMD domain).
  
  To fill this gap, we gathered a team of 18 participants, of which 7 have a clinical background and have expertise in the diagnosis, treatment and management of AML patients and 5 are experts in Boolean modeling. The scope of the project is the development of a computational approach to identify possible alternative solutions for FLT3ITD-TKD AML patients, generating future lines of investigation. Drug combinations are currently under investigation as a potential means of avoiding drug resistance and achieving more effective and durable treatment responses. However, it is impractical to test for potential synergistic properties among all available drugs using empirical experiments alone. With our approach, we developed models that recreated in silico the main differences in the signaling of sensitive and resistant cells to support the prioritization of novel therapies. Prompted by the reviewer's suggestions, we have now extended the validation of our models through the comparison with publicly available cell line and patient-derived datasets. We have also confirmed our results by performing in vitro experiments in patient-derived primary blasts treated with midostaurin and/or JNK inhibitor. Importantly, we have already demonstrated that hitting cell cycle regulators in FLT3ITD-TKD cells can be an effective approach to kill resistant leukemia cells (Massacci et al., 2023; Pugliese et al., 2023). We are aware that changing the clinical practice and the therapies for patients requires a proper clinical study, which goes far beyond the scope of this manuscript.
  
  However, we hope that our results can be translated soon from “bench-to-bed”. Importantly, we believe that our study can open lines of investigations aimed at the application of our approach to identify promising therapeutic strategies in other clinical settings.
  
  Recommendations for the authors
  
  The reviewers have highlighted significant issues regarding the inadequate level of evidence to support some of the conclusions, plus lack of an exhaustive methodological description that may jeopardize reproducibility.
  
  We hope that the editor and the reviewers will appreciate the extensive revision we made and new data and analysis we provided to strengthen our story.
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) In Fig 2D the hierarchical tree is off-set in relation to the treatment symbols and names in the middle of the Figure. In addition, I do not see FLT3i combination with JNKi in the JMD cells (perhaps, a coloring error?).
  
  We thank the reviewer for this observation. We have now revised the hierarchical tree, which is now in Figure S2D, we have aligned the tree with the symbols and names and corrected the colouring error for the sample FLT3i+JNKi in JMD cells.
  
  (2) Midostaurin and PKC412 refer to the same drug and are used interchangeably in the manuscript. Using one name consistently would improve readability.
  
  We have now improved the readability of the text and the Figures by choosing “Midostaurin” when we refer to the FLT3 inhibitor.
  
  (3) It is not clear to me why the FLT3-ITD-JMD cells are not presented in Fig. 4B. Perhaps their values are 0? In that case, the readability would be improved by including a thin blue line representing zero values. Additionally, on p.8 the authors state "Interestingly, in the FLT3ITD-TKD model, the combined inhibition of JNK and FLT3, exclusively, in silico restores the TKI sensitivity, as revealed by the evaluation of the apoptosis and proliferation levels (Fig. 4B-C)." but Fig. 4C shows no differential effects of JNK inhibition in sensitive versus resistant cells.
  
  To address the reviewer's point, we have added a thin blue line representing the zero values of the FLT3ITD-JMD cells in the simulation results in Figure 4B. Regarding Figure 4C, the reviewer is right in saying that there is no difference in terms of proliferation between sensitive and resistant cells upon JNKi and FLT3i co-inhibition. However, we can see lower proliferation levels in both cell lines as compared to the “untreated” condition. Indeed, the simulation suggests that by combining JNK and FLT3 inhibition we lower the proliferation rate of the resistant cells to the level of the TKI-sensitive cells.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I have addressed a number of concerns in the public review. Much better effort needs to be made to provide sufficient methodological detail (to permit independent validation by a sufficiently capable and motivated party) and explain the rationale of important parameter selections. Furthermore, I urge the authors to take advantage of the plethora of publicly available real-world data to validate their predicted outcomes.
  
  We are grateful to the reviewer for the careful revisions. All the aspects raised have been discussed in the specific sections of the public review. In summary, we have provided more methodological details by revising the text and the methods section, by adding a new step-by-step description of the modelling strategy, the parameters and the criteria adopted in each phase (supplementary methods), and by referring to the entire code developed. Prompted by the reviewer's suggestions, we have performed a novel and extensive comparison of our model with three different publicly available datasets. This analysis significantly strengthens our story, and a new supplementary Figure (Fig. S5) summarizes our findings (pg. 9-14 of this document).
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) At first sight, the distribution of the data points in the PCA space does not really seem to speak of nice clustering. Have the authors computed any clustering validation metric to assess if their clustering strategy is adequate and how informative the results are? Further analysis of this point of the article is precluded by the absence of a clear methodological description.
  
  Here we have used PCA to obtain a global view of our complex multiparametric data. We have now worked on the PCA to improve its readability. As shown in the new Figure 2D, PCA showed that the activity level of sentinel proteins stratifies cells according to FLT3 activation status (component 1: presence vs absence of FLT3i) and cytokine stimulation (component 2: IGF1 vs TNFα). We have now added new experimental details on this part in the methods section (pg. 19) and we deposited the code used for the clustering strategy on the GitHub repository (https://github.com/SaccoPerfettoLab/FLT3ITD_driven_AML_Boolean_models).
  
  (2) Whereas scientists and medical professionals who work in the field of oncology may be familiar with some of the abbreviations used here, it would be good for improved readability by a more general audience to make sure that all the abbreviations (e.g., TKI) are properly defined the first time that they appear in the text.
  
  We thank the reviewer for this observation. To improve the readability of the text, we properly defined all the abbreviations in their first appearance, and we added the “Abbreviation” paragraph at page 15 of the manuscript to summarize them all.
  
  (3) How were the concentrations of the combined treatments chosen in the cell assays used as validation?
  
  We thank the reviewer for giving us the chance to clarify this point. We expanded the Methods with additional information about the treatments used in the validations. We detailed the SP600125 IC50 evaluation and usage in our cell lines (pg.22): IC50 values are approximately 1.5 µM in FLT3-ITD mutant cell lines; the SP600125 treatment affects cell viability, reaching a plateau phase of cell death at about 2 µM. We used the minimal dose of SP600125 (10 µM) required to properly inhibit JNK (Kim et al., 2010; Moon et al., 2009).
  
  We also specified (pg.22) that the concentration of Midostaurin was chosen based on the previously published work (Massacci et al., 2022): FLT3 ITD-TKD cells treated with Midostaurin 100nM show lower apoptotic rate and higher cell viability compared to FLT3 ITD-JMD cells.
  
  The concentration of SB203580 and UO126 was chosen based on previous data available in the lab and set up experiments (pg.22).
  
  (4) The authors say that "we were able to derive patient-specific signaling features and enable the identification of potential tailored treatments restoring TKI resistance" and that "our predictions were confirmed by follow-up clinical data for some patients". However, the results section on this part of the manuscript is rather scarce (the main text should be much more descriptive about the results summarized in Fig. 5, which are not self-explanatory).
  
  We thank the reviewer for this observation. We have now expanded the text to provide a more comprehensive description of the results about personalized Boolean model generation and usage and the content presented in Fig. 5 (pg.10-12).
  
  (5) I do not really agree with the final conclusion about this paper being "the proof of concept that our personalized informatics approach described here is clinically valid and will enable us to propose novel patient-centered targeted drug solutions". First, the clinical data used here belongs to a rather low number of patients. Second, as mentioned before, it is not clear if the models have been used to make any prospective decision or if this conclusion is drawn from an in vitro assay plus a retrospective analysis on a limited number of patients. Moreover, a description of the results and the discussion of the part of the manuscript dealing with patient-specific models is rather scarce, and it is difficult to see how the authors support their conclusions. Also, the statement "In principle, the generalization of our strategy will enable to obtain a systemic perspective of signaling rewiring in different cancer types, driving novel personalized approaches" may be a bit overoptimistic if one considers that so far, the approach has only been applied to a single type of drug-resistant cancer.
  
  We thank the reviewer for this comment. We agree with the referees that the clinical data we used belong to a rather low number of patients. However, during the revision we have extensively worked to support the clinical relevance of our models and our discoveries. Specifically, we have compared our Boolean logic models with two different publicly available datasets on phosphoproteomics and drug sensitivity of FLT3ITD-JMD and FLT3ITD-TKD cell lines and blasts (Fig. S5 and answer to reviewer 2, point 3). Importantly, these datasets independently validated our models, highlighting that our approach has a translational value. Additionally, we have performed novel experiments by measuring the apoptotic rate of patient-derived primary blasts upon pharmacological suppression of JNK (Fig. 4H, pg. 10 of main text). Our data highlight that our approach has the potential to suggest novel effective treatments.
  
  That said, we have now revised the discussion to avoid overstatements.
  
  References
  
  Arreba-Tutusaus, P., Mack, T.S., Bullinger, L., Schnöder, T.M., Polanetzki, A., Weinert, S., Ballaschk, A., Wang, Z., Deshpande, A.J., Armstrong, S.A., Döhner, K., Fischer, T., Heidel, F.H., 2016. Impact of FLT3-ITD location on sensitivity to TKI-therapy in vitro and in vivo. Leukemia 30, 1220–1225. https://doi.org/10.1038/leu.2015.292
  
  Blinov, M.L., Moraru, I.I., 2012. Logic modeling and the ridiculome under the rug. BMC Biol 10, 92. https://doi.org/10.1186/1741-7007-10-92
  
  Dorier, J., Crespo, I., Niknejad, A., Liechti, R., Ebeling, M., Xenarios, I., 2016. Boolean regulatory network reconstruction using literature based knowledge with a genetic algorithm optimization method. BMC Bioinformatics 17, 410. https://doi.org/10.1186/s12859-016-1287-z
  
  Kramer, M.H., Zhang, Q., Sprung, R., Day, R.B., Erdmann-Gilmore, P., Li, Y., Xu, Z., Helton, N.M., George, D.R., Mi, Y., Westervelt, P., Payton, J.E., Ramakrishnan, S.M., Miller, C.A., Link, D.C., DiPersio, J.F., Walter, M.J., Townsend, R.R., Ley, T.J., 2022. Proteomic and phosphoproteomic landscapes of acute myeloid leukemia. Blood 140, 1533–1548. https://doi.org/10.1182/blood.2022016033
  
  Massacci, G., Venafra, V., Latini, S., Bica, V., Pugliese, G.M., Graziosi, S., Klingelhuber, F., Krahmer, N., Fischer, T., Mougiakakos, D., Boettcher, M., Perfetto, L., Sacco, F., 2023. A key role of the WEE1-CDK1 axis in mediating TKI-therapy resistance in FLT3-ITD positive acute myeloid leukemia patients. Leukemia 37, 288–297. https://doi.org/10.1038/s41375-022-01785-w
  
  Pugliese, G.M., Venafra, V., Bica, V., Massacci, G., Latini, S., Graziosi, S., Fischer, T., Mougiakakos, D., Boettcher, M., Perfetto, L., Sacco, F., 2023. Impact of FLT3-ITD location on cytarabine sensitivity in AML: a network-based approach. Leukemia 37, 1151–1155. https://doi.org/10.1038/s41375-023-01881-5
  
  Rücker, F.G., Du, L., Luck, T.J., Benner, A., Krzykalla, J., Gathmann, I., Voso, M.T., Amadori, S., Prior, T.W., Brandwein, J.M., Appelbaum, F.R., Medeiros, B.C., Tallman, M.S., Savoie, L., Sierra, J., Pallaud, C., Sanz, M.A., Jansen, J.H., Niederwieser, D., Fischer, T., Ehninger, G., Heuser, M., Ganser, A., Bullinger, L., Larson, R.A., Bloomfield, C.D., Stone, R.M., Döhner, H., Thiede, C., Döhner, K., 2022. Molecular landscape and prognostic impact of FLT3-ITD insertion site in acute myeloid leukemia: RATIFY study results. Leukemia 36, 90–99. https://doi.org/10.1038/s41375-021-01323-0
  
  Terfve, C., Cokelaer, T., Henriques, D., MacNamara, A., Goncalves, E., Morris, M.K., van Iersel, M., Lauffenburger, D.A., Saez-Rodriguez, J., 2012. CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Syst Biol 6, 133. https://doi.org/10.1186/1752-0509-6-133
  
  Traynard, P., Tobalina, L., Eduati, F., Calzone, L., Saez-Rodriguez, J., 2017. Logic Modeling in Quantitative Systems Pharmacology. CPT Pharmacometrics Syst. Pharmacol. 6, 499–511. https://doi.org/10.1002/psp4.12225
  
URL

biorxiv.org/content/10.1101/2023.06.22.546072v2
www.biorxiv.org www.biorxiv.org

New submission 05/05/2023, 09:36:22

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We thank you for the time you took to review our work and for your feedback!
  
  The major changes to the manuscript are:
  
  1) Prompted by multiple reviewers, we have replaced the statistical analysis in Figure 1L with a bootstrap analysis, added an ANOVA (in Table S1), and have also added the same analysis with mice as a statistical unit as Figure S4J to the manuscript.
  
  2) In response to reviewer 1, comment 3, we have replaced the response latency maps previously shown in Figures 3B, 3C, 3E and 3F with response amplitude maps.
  
  3) In response to reviewer 2, comment 1, we have added a variant of the response traces shown in Figures 3B, 3C, 3E and 3F with mice as the statistical unit as Figures S2C and S2D.
  
  4) In response to reviewer 2, public review, we have added data from additional experiments as Figures S6F-S6H, that control for the effect of a saline injection.
  
  A detailed point-by-point response to all reviewer concerns is provided in the following.
  
  Reviewer #1 (Public Review):
  
  The authors present a study of visuo-motor coupling primarily using wide-field calcium imaging to measure activity across the dorsal visual cortex. They used different mouse lines or systemically injected viral vectors to allow imaging of calcium activity from specific cell-types with a particular focus on a mouse-line that expresses GCaMP in layer 5 IT (intratelencephalic) neurons. They examined the question of how the neural response to predictable visual input, as a consequence of self-motion, differed from responses to unpredictable input. They identify layer 5 IT cells as having a different response pattern to other cell-types/layers in that they show differences in their response to closed-loop (i.e. predictable) vs open-loop (i.e. unpredictable) stimulation whereas other cell-types showed similar activity patterns between these two conditions. They analyze the latencies of responses to visuomotor prediction errors obtained by briefly pausing the display while the mouse is running, causing a negative prediction error, or by presenting an unpredicted visual input causing a positive prediction error. They suggest that neural responses related to these prediction errors originate in V1; however, I would caution against overinterpretation of this finding as judging the latency of slow calcium responses in wide-field signals is very challenging and this result was not statistically compared between areas. Surprisingly, they find that presentation of a visual grating actually decreases the responses of L5 IT cells in V1. They interpret their results within a predictive coding framework that the last author has previously proposed. The response pattern of the L5 IT cells leads them to propose that these cells may act as 'internal representation' neurons that carry a representation of the brain's model of its environment, though this is rather speculative. They subsequently examine the responses of these cells to anti-psychotic drugs (e.g. clozapine) with the reasoning that a leading theory of schizophrenia is a disturbance of the brain's internal model and/or a failure to correctly predict the sensory consequences of self-movement. They find that anti-psychotic drugs strongly enhance responses of L5 IT cells to locomotion while having little effect on other cell-types. Finally, they suggest that anti-psychotics reduce long-range correlations between (predominantly) L5 cells and reduce the propagation of prediction errors to higher visual areas and suggest this may be a mechanism by which these drugs reduce hallucinations/psychosis.
  
  This is a large study containing a screening of many mouse-lines/expression profiles using wide-field calcium imaging. Wide-field imaging has its caveats, including a broad point-spread function of the signal and susceptibility to hemodynamic artifacts, which can make interpretation of results difficult. The authors acknowledge these problems and directly address the hemodynamic occlusion problem. It was reassuring to see supplementary 2-photon imaging of soma to complement this data-set, even though this is rather briefly described in the paper. Overall the paper's strengths are its identification of a very different response profile in the L5 IT cells compared to other layers/cell-types which suggests an important role for these cells in handling integration of self-motion generated sensory predictions with sensory input. The interpretation of the responses to anti-psychotic drugs is more speculative but the result appears robust and provides an interesting basis for further studies of this effect with more specific recording techniques and possibly behavioral measures.
  
  We thank the reviewer for the feedback and the help with improving the manuscript. We agree, the findings presented in this study are merely a starting point. The two questions we are currently pursuing in follow up work are:
  
  1) Do the findings generalize to all known antipsychotic drugs?
  
  2) What is the mechanism by which these drugs induce a decorrelation of activity, specifically in layer 5 neurons?
  
  But we suspect these questions will take at least a few more years of research to answer.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This work investigates the effects of various antipsychotic drugs on cortical responses during visuomotor integration. Using wide-field calcium imaging in a virtual reality setup, the researchers compare neuronal responses to self-generated movement during locomotion-congruent (closed loop) or locomotion-incongruent (open loop) visual stimulation. Moreover, they probe responses to unexpected visual events (halt of visual flow, sudden-onset drifting grating). The researchers find that, in contrast to a variety of excitatory and inhibitory cell types, genetically defined layer 5 excitatory neurons distinguish between the closed and the open loop condition and exhibit activity patterns in visual cortex in response to unexpected events, consistent with unsigned prediction error coding. Motivated by the idea that prediction error coding is aberrant in psychosis, the authors then inject the antipsychotic drug clozapine, and observe that this intervention specifically affects closed loop responses of layer 5 excitatory neurons, blunting the distinction between the open and closed loop conditions. Clozapine also leads to a decrease in long-range correlations between L5 activity in different brain regions, and similar effects are observed for two other antipsychotics, aripiprazole and haloperidol, but not for the stimulant amphetamine. The authors suggest that altered prediction error coding in layer 5 excitatory neurons due to reduced long-range correlations in L5 neurons might be a major effect of antipsychotic drugs and speculate that this might serve as a new biomarker for drug development.
  
  Strengths:
  
  Relevant and interesting research question:
  
  The distinction between expected and unexpected stimuli is blunted in psychosis but the neural mechanisms remain unclear. Therefore, it is critical to understand whether and how antipsychotic drugs used to treat psychosis affect cortical responses to expected and unexpected stimuli. This study provides important insights into this question by identifying a specific cortical cell type and long-range interactions as potential targets. The authors identify layer 5 excitatory neurons as a site where functional effects of antipsychotic drugs manifest. This is particularly interesting as these deep layer neurons have been proposed to play a crucial role in computing the integration of predictions, which is thought to be disrupted in psychosis. This work therefore has the potential to guide future investigations on psychosis and predictive coding towards these layer 5 neurons, and ultimately improve our understanding of the neural basis of psychotic symptoms.
  
  Broad investigation of different cell types and cortical regions:
  
  One of the major strengths of this study is its quasi-systematic approach towards cell types and cortical regions. By analysing a wide range of genetically defined excitatory and inhibitory cell types, the authors were able to identify layer 5 excitatory neurons as exhibiting the strongest responses to unexpected vs. expected stimuli and being the most affected by antipsychotic drugs. Hence, this quasi-systematic approach provides valuable insights into the functional effects of antipsychotic drugs on the brain, and can guide future investigations towards the mechanisms by which these medications affect cortical neurons.
  
  Bridging theory with experiments
  
  Another strength of this study is its theoretical framework, which is grounded in the predictive coding theory. The authors use this theory as a guiding principle to motivate their experimental approach connecting visual responses in different layers with psychosis and antipsychotic drugs. This integration of theory and experimentation is a powerful approach to tie together the various findings the authors present and to contribute to the development of a coherent model of how the brain processes visual information both in health and in disease.
  
  Weaknesses:
  
  Unclear relevance for psychosis research
  
  From the study, it remains unclear whether the findings might indeed be able to normalise altered predictive coding in psychosis. Psychosis is characterised by a blunted distinction between predicted and unpredicted stimuli. The results of this study indicate that antipsychotic drugs further blunt the distinction between predicted and unpredicted stimuli, which would suggest that antipsychotic drugs would deteriorate rather than ameliorate the predictive coding deficit found in psychosis. However, these findings were based on observations in wild-type mice at baseline. Given that antipsychotics are thought to have little effect in health but potent antipsychotic effects in psychosis, it seems possible that the presented results might be different in a condition modelling a psychotic state, for example after a dopamine-agonistic or an NMDA-antagonistic challenge. Therefore, future work in models of psychotic states is needed to further investigate the translational relevance of these findings.
  
  Incomplete testing of predictive coding interpretation
  
  While the investigation of neuronal responses to different visual flow stimuli is interesting, it remains open whether these responses indeed reflect internal representations in the framework of predictive coding. While the responses are consistent with internal representation as defined by the researchers, i.e., unsigned prediction error signals, an alternative interpretation might be that responses simply reflect sensory bottom-up signals that are more related to some low-level stimulus characteristics than to prediction errors. Moreover, this interpretational uncertainty is compounded by the fact that the used experimental paradigms were not suited to test whether behaviour is impacted as a function of the visual stimulation, which makes it difficult to assess what the internal representation of the animal actually was. For these reasons, the observed effects might reflect simple bottom-up sensory processing alterations and not necessarily have any functional consequences. While this potential alternative explanation does not detract from the value of the study, future work would be needed to explain the effect of antipsychotic drugs on responses to visual flow. For example, experimental designs that systematically vary the predictive strength of coupled events or that include a behavioural readout might be more suited to draw conclusions about whether antipsychotic drugs indeed alter internal representations.
  
  Methodological constraints of experimental design
  
  While the study findings provide valuable insights into the potential effects of antipsychotic drugs, it is important to acknowledge that there may be some methodological constraints that could impact the interpretation of the results. More specifically, the experimental design does not include a negative control condition or different doses. These conditions would help to ensure that the observed effects are not due to unspecific effects related to injection-induced stress or time, and not confined to a narrow dose range that might or might not reflect therapeutic doses used in humans. Hence, future work is needed to confirm that the observed effects indeed represent specific drug effects that are relevant to antipsychotic action.
  
  Conclusion:
  
  Overall, the results support the idea that antipsychotic drugs affect neural responses to predicted and unpredicted stimuli in deep layers of cortex. Although some future work is required to establish whether this observation can indeed be explained by a drug-specific effect on predictive coding, the study provides important insights into the neural underpinnings of visual processing and antipsychotic drugs, which is expected to guide future investigations on the predictive coding hypothesis of psychosis. This will be of broad interest to neuroscientists working on predictive coding in health and in disease.
  
  We thank the reviewer for the feedback and the help with improving the manuscript.
  
  Regarding the concern of a lack of a negative control, we have repeated the correlation measurement experiments in a cohort of Tlx3-Cre x Ai148 mice that received injections of saline. This analysis is now shown in Figure S6F-S6H. Saline injections did not change correlations in L5 IT neurons. Combined with the absence of changes in the L5 IT correlation structure following amphetamine injections (Figures 7G – 7I), this suggests that unspecific effects related to stress of injection, or simply time, cannot explain the observed decorrelation effect of the antipsychotic drugs.
  
  And we fully agree, a lot more work is needed to confirm that the observed effects are specific and relevant to antipsychotic action.
  
  Reviewer #3 (Public Review):
  
  The study examines how different cell types in various regions of the mouse dorsal cortex respond to visuomotor integration and how antipsychotic drugs impact these responses. Specifically, in contrast to most cell types, the authors found that activity in Layer 5 intratelencephalic neurons (Tlx3+) and Layer 6 neurons (Ntsr1+) differentiated between open loop and closed loop visuomotor conditions. Focussing on Layer 5 neurons, they found that the activity of these neurons also differentiated between negative and positive prediction errors during visuomotor integration. The authors further demonstrated that the antipsychotic drugs reduced the correlation of Layer 5 neuronal activity across regions of the cortex, and impaired the propagation of visuomotor mismatch responses (specifically, negative prediction errors) across Layer 5 neurons of the cortex, suggesting a decoupling of long-range cortical interactions.
  
  The data when taken as a whole demonstrate that visuomotor integration in deeper cortical layers is different than in superficial layers and is more susceptible to disruption by antipsychotics. Whilst it is already known that deep layers integrate information differently from superficial layers, this study provides more specific insight into these differences. Moreover, this study provides a first step into understanding the potential mechanism by which antipsychotics may exert their effect.
  
  Whilst the paper has several strengths, the robustness of its conclusions is limited by its questionable statistical analyses. A summary of the paper's strengths and weaknesses follows.
  
  Strengths:
  
  The authors perform an extensive investigation of how different cortical cell types (including Layer 2/3, 4, 5, and 6 excitatory neurons, as well as PV, VIP, and SST inhibitory interneurons) in different cortical areas (including primary and secondary visual areas as well as motor and premotor areas), respond to visuomotor integration. This investigation provides strong support to the idea that deep layer neurons are indeed unique in their computational properties. This large data set will be of considerable interest to neuroscientists interested in cortical processing.
  
  The authors also provide several lines of evidence that visuomotor information is differentially integrated in deep vs. superficial layers. They show that this is true across experimental paradigms of visuomotor processing (open loop, closed loop, mismatch, drifting grating conditions) and experimental manipulations, with the demonstration that Layer 5 visuomotor integration is more sensitive to disruption by the antipsychotic drug clozapine, compared with cortex as a whole.
  
  The study further uses multiple drugs (clozapine, aripiprazole and haloperidol) to bolster its conclusion that antipsychotic drugs disrupt correlated cortical activity in Layer 5 neurons, and further demonstrates that this disruption is specific to antipsychotics, as the psychostimulant amphetamine shows no such effect.
  
  In widefield calcium imaging experiments, the authors effectively control for the impact of hemodynamic occlusions in their results, and try to minimize this impact using a crystal skull preparation, which performs better than traditional glass windows. Moreover, they examine key findings in widefield calcium imaging experiments with two-photon imaging.
  
  Weaknesses:
  
  A critical weakness of the paper is its statistical analysis. The study does not use mice as its independent unit for statistical comparisons but rather relies on other definitions, without appropriate justification, which results in an inflation of sample sizes. For example, in Figure 1, independent samples are defined as locomotion onsets, leading to sample sizes of approx. 400-2000 despite only using 6 mice for the experiment. This is only justified if the data from locomotion onsets within a mouse is actually statistically independent, which the authors do not test for, and which seems unlikely. With such inflated sample sizes, it becomes more likely to find spurious differences between groups as significant. It also remains unclear how many locomotion onsets come from each mouse; the results could be dominated by a small subset of mice with the most locomotion onsets. The more disciplined approach to statistical analysis of the dataset is to average the data associated with locomotion onsets within a mouse, and then use the mouse as an independent unit for statistical comparison. A second example, for instance, is in Figure 2L, where the independent statistical unit is defined as cortical regions instead of mice, with the left and right hemispheres counting as independent samples; again this is not justified. Is the activity of cortical regions within a mouse and across cortical hemispheres really statistically independent? The problem is apparent throughout the manuscript and for each data set collected. An additional statistical issue is that it is unclear if the authors are correcting for the use of multiple statistical tests (as in for example Figure 1L and Figure 2B,D). In general, the use of statistics by the authors is not justified in the text.
  
  Finally, it is important to note that whilst the study demonstrates that antipsychotics may selectively impact visuomotor integration in L5 neurons, it does not show that this effect is necessary or sufficient for the action of antipsychotics; though this is likely beyond the scope of the study it is something for readers to keep in mind.
  
  We thank the reviewer for the feedback and the help with improving the manuscript.
  
  Regarding the concerns of statistical analysis, this may partially be a misunderstanding. We apologize for the lack of clarity. For example, the data in Figures 1F-1K is indeed shown as averaged over locomotion onsets, but there is no statistical analysis performed in these panels. The unit for the statistical analysis shown in Figure 1L is brain area (not locomotion onset). A central tenet of the analysis shown in Figures 1L and 2 is that the effect of differential activation during closed and open loop locomotion onsets is not specific to visual areas of cortex. In visual areas of cortex, one would expect to find a difference. In essence, the surprising finding here is the lack of a difference in other cell types but L5 IT neurons. Thus, in the analyses of those figure panels we are testing whether the effect is present on average across all cortical areas. Hence, we chose the statistical unit of Figure 1L to be cortical areas, not mice. We have added the same analysis with mice as a statistical unit as Figure S4J.
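The per-mouse approach the reviewer describes (average all locomotion onsets within a mouse, then compare conditions with the mouse as the independent unit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used for Figure S4J: the response values, the mouse labels, and the choice of an exact sign-flip test are all hypothetical.

```python
from itertools import product

def per_mouse_means(onsets_by_mouse):
    """Collapse all locomotion-onset responses of each mouse to a single mean."""
    return {m: sum(v) / len(v) for m, v in onsets_by_mouse.items()}

def sign_flip_p_value(differences):
    """Exact two-sided sign-flip (paired permutation) test on per-mouse differences."""
    n = len(differences)
    observed = abs(sum(differences)) / n
    count = 0
    # Enumerate every assignment of +/- signs to the per-mouse differences.
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, differences))) / n
        if flipped >= observed - 1e-12:
            count += 1
    return count / 2 ** n

# Hypothetical single-onset responses for 6 mice (values made up for illustration).
closed_loop = {"m1": [1.2, 1.0, 1.1], "m2": [0.9, 1.1], "m3": [1.4, 1.3, 1.5, 1.2],
               "m4": [1.0, 1.2], "m5": [1.3, 1.1, 1.2], "m6": [0.8, 1.0]}
open_loop = {"m1": [0.7, 0.9], "m2": [0.6, 0.8, 0.7], "m3": [1.0, 0.9],
             "m4": [0.8, 0.6], "m5": [0.9, 0.7, 0.8], "m6": [0.5, 0.7]}

closed_means = per_mouse_means(closed_loop)
open_means = per_mouse_means(open_loop)
diffs = [closed_means[m] - open_means[m] for m in closed_means]
p_value = sign_flip_p_value(diffs)
print(len(diffs), p_value)
```

With six mice the smallest two-sided p-value such a test can return is 2/64, which makes explicit how little statistical resolution remains once onsets are no longer treated as independent samples.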
  
  Reviewer #1 (Recommendations For The Authors):
  
  I have a few concerns and questions that I would like to see addressed:
  
  1) Figure 1L - the statistics are a little unusual here as the errors are across visual areas rather than across mice or hemispheres. This isn't ideal as ideally, we want to generalize the results across animals, not areas, and the results seem to be driven mostly by V1/RSC. I would like to see comparisons using mice as the statistical unit either in an ANOVA with areas as factors or post-hoc comparisons per area.
  
  Based on the assumption that visual cortex should respond to visual stimuli, we would have expected to find a difference between closed and open loop locomotion onset responses in all cell types in visual areas of cortex (a closed loop locomotion onset being the combination of locomotion and visual flow onset, while an open loop locomotion onset lacks the visual flow component). Thus, the first surprise was that in most cell types we found very little difference between these two locomotion onset types. Conversely, in Tlx3-positive L5 IT neurons the difference was apparent well outside of the visual areas of cortex (even though the difference was indeed strongest in V1/RSC). To quantify the extent to which closed and open loop locomotion onsets result in different activity patterns across dorsal cortex we performed the analyses shown in Figures 1L and 2. To make the point that the effect was observable on average across cortical areas, we used cortical area as a unit in Figure 1L. We have added the analysis shown in Figure 1L with mice as the statistical unit as Figure S4J and have added the ANOVA information to Table S1, as suggested.
  
  2) The reduction of activity of L5 IT cells in V1 after the presentation of gratings is curious. The authors suggest it might have been due to one population of cells tuned for the orientation of the presented grating suppressing the remaining cells leading to an aggregate negative response. However, they also observed this negative response in the 2p signal for individual somata. Presumably in the 2p data they could check their hypothesis - is there a group of cells that were tuned for the grating? Is it possible that for some reason the L5 IT cells in the 2p were not being activated by the grating because of their RF locations? How large were the gratings - I didn't see this in the methods section?
  
  We can certainly identify neurons that selectively increase activity to one particular grating. See Author response image 1, for vertical and horizontal gratings. The gratings were presented full-field on a toroidal screen that surrounded the mouse (240 degrees horizontal and 100 degrees vertical coverage of the visual field). This covered a large fraction of the field of view of the mouse. While we did not map receptive fields of individual neurons in this study, it is unlikely that the receptive fields of the neurons recorded were outside the stimulated area. We have made this clearer in the manuscript.
  
  Author response image 1.
  
  The population L5 IT neuron response to full-field drifting grating stimuli was a decrease of activity, yet there were increasing responses in a subset of neurons. (A) Heatmap of responses of all L5 IT neuron somata recorded with two-photon imaging in 7 Tlx3-Cre x Ai148 mice to drifting gratings of vertical orientation, sorted by their response. Data were sorted on odd trials and plotted on even trials to avoid regression to the mean artifacts. Dashed black box marks the top 10% responsive neurons. The data are a subset of the data shown in Figure S3D. (B) As in A, but for responses to drifting gratings of horizontal orientation. (C) Responses of top 10% vertical grating responsive neurons (dashed black box in A) to vertical (orange) or horizontal gratings (green). Neurons were selected on odd trials, and the average response of even trials is shown. (D) As in A, but sorted to the response of horizontal drifting gratings. (E) As in D, but for the horizontal grating stimulus. (F) As in C, but for the top 10% horizontal grating responsive neurons.
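The odd/even split mentioned in the caption guards against a selection artifact: if neurons are ranked and plotted on the same trials, even pure noise yields an apparently responsive top group. A minimal sketch on simulated noise follows; the neuron count, trial count, and function names are hypothetical and are not taken from the study's analysis code.

```python
import random

def mean(values):
    return sum(values) / len(values)

def split_half_selection(responses, top_fraction=0.1):
    """responses: one list of single-trial responses per neuron.

    Neurons are ranked on odd trials (1st, 3rd, ...) only; the selected
    group is then evaluated on both odd and held-out even trials.
    """
    odd = [mean(trials[0::2]) for trials in responses]
    even = [mean(trials[1::2]) for trials in responses]
    n_top = max(1, int(len(responses) * top_fraction))
    ranked = sorted(range(len(responses)), key=lambda i: odd[i], reverse=True)
    top = ranked[:n_top]
    return top, mean([odd[i] for i in top]), mean([even[i] for i in top])

# Pure noise: 200 simulated neurons, 20 trials each, no true response.
rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]
top, selected_mean, held_out_mean = split_half_selection(noise)
print(len(top), selected_mean, held_out_mean)
```

On the selection trials the top group shows a clearly positive mean even though no neuron responds; on the held-out trials that apparent response largely disappears, which is the regression to the mean the caption refers to.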
  
  3) I would caution against over-interpretation of latencies from wide-field GCaMP activity (Figure 3). A weaker response in a smaller population of neurons that has the same latency as a strong response in a large population of neurons will appear to have different latencies when convolved with the GCaMP kernel. Also there doesn't appear to be any statistical support for different latencies in different cortical areas. Either this should be correctly treated (ideally with linear mixed effects models to account for the increased correlation within animals) or the latency conclusions should be removed from the manuscript (my recommendation).
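The reviewer's argument about latency can be illustrated with a toy simulation: two underlying responses start at the same time but differ in amplitude, both are convolved with a slow indicator kernel, and latency is read out as the first crossing of a fixed threshold. Everything here is an assumption made for illustration (a single-exponential kernel with a 0.5 s decay and instantaneous rise, a noiseless square pulse, a fixed threshold); it is not a model of GCaMP6 kinetics or of the study's data.

```python
import math

DT = 0.01          # sample interval in seconds
N = 300            # number of samples (3 s)
ONSET_IDX = 50     # both responses start at sample 50 (0.5 s)
PULSE_LEN = 50     # underlying response lasts 50 samples (0.5 s)
TAU = 0.5          # assumed indicator decay constant in seconds

# Single-exponential kernel, scaled by DT to approximate a continuous integral.
kernel = [math.exp(-i * DT / TAU) * DT for i in range(N)]

def pulse(amplitude):
    """Square pulse of the given amplitude starting at ONSET_IDX."""
    return [amplitude if ONSET_IDX <= i < ONSET_IDX + PULSE_LEN else 0.0
            for i in range(N)]

def convolve_causal(signal, kern):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(kern))):
            acc += signal[i - j] * kern[j]
        out.append(acc)
    return out

def first_index_at_or_above(trace, level):
    """Index of the first sample reaching the level, or None if never reached."""
    for i, value in enumerate(trace):
        if value >= level:
            return i
    return None

strong = convolve_causal(pulse(1.0), kernel)
weak = convolve_causal(pulse(0.2), kernel)

THRESHOLD = 0.05
onset_strong = first_index_at_or_above(strong, 1e-9)
onset_weak = first_index_at_or_above(weak, 1e-9)
latency_strong = first_index_at_or_above(strong, THRESHOLD) * DT
latency_weak = first_index_at_or_above(weak, THRESHOLD) * DT
print(onset_strong, onset_weak, latency_strong, latency_weak)
```

Both traces begin to rise at the same sample, yet the weaker one crosses the threshold later, so a threshold-based latency map confounds amplitude with timing.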
  
  We suspect that by “latency conclusions” the reviewer means “latency analysis”. The only time we mention latency differences is to state that: “In C57BL/6 mice that expressed GCaMP brain wide, both visuomotor mismatch and grating stimuli resulted in increases of activity that were strongest and appeared first in visual regions of dorsal cortex (Figures 3A-3C).”
  
  Nevertheless, we agree with the reviewer that response latency and response amplitude are not independent in our measurements and have replaced the latency plots in Figures 3B, 3C, 3E and 3F with average response maps.
  
  4) Given that the data is baseline corrected, is it possible that the effect of the anti-psychotic drugs on L5 IT cells was due to a change in the baseline activity of this population?
  
  While we do find a small increase in average activity as a result of antipsychotic drug injections (Author response image 2), these effects are much smaller than those on locomotion onset responses.
  
  Author response image 2.
  
  On average, activity was increased in dorsal cortex after administration of antipsychotic drugs. Average calcium activity over the entire recording session before (naïve) and after (antipsy.) the administration of antipsychotic drugs. Colored lines indicate paired data for individual mice (Blue: 5 mice that had received clozapine, green: 3 mice that had received aripiprazole, red: 3 mice that had received haloperidol).
  
  To illustrate that the clozapine induced change in locomotion related activity cannot be explained by baseline activity differences, we have replotted the responses shown in Figures 4D and 4E, S3B, S5F without baseline subtraction (Author response image 3).
  
  Author response image 3.
  
  Antipsychotic drug injection only modestly shifts the baseline before locomotion onsets. (A) Average response expressed as F/F0 (wherein F0 was defined as the median of a recording session) during closed (solid line, 1101 onsets) and open loop (dashed line, 348 onsets) locomotion onsets in 5 Tlx3-Cre x Ai148 mice that expressed GCaMP6 in L5 IT neurons. Shading indicates SEM over onsets. Dashed horizontal line marks a value of F/F0 of 1.005 for comparison with panel B. Underlying data were the same as in Figures 4D and 4E. (B) As in A, but after a single intraperitoneal injection of the drug clozapine and for 707 closed and 350 open loop locomotion onsets. (C) Average response expressed as F/F0 (wherein F0 was defined as the median of a recording session) of L5 soma in V1, recorded with two-photon imaging in 7 Tlx3-Cre x Ai148 mice that expressed GCaMP6 in L5 IT neurons, during either closed (solid) or open loop (dashed) locomotion onsets. Shading indicates SEM over 8434 neurons. Dashed horizontal line marks a value of F/F0 of 1.045 for comparison with panel D. Underlying data were the same as in Figure S3B. (D) As in C, but for the 3 Tlx3-Cre x Ai148 mice that had received a single intraperitoneal injection of clozapine. Underlying data were from Figure S5F.
  
  5) Figure 5/Figure S6 - Do the results really reflect an effect of distance, or are they driven by areas from different hemispheres? Does the result hold if they factor out the effect of hemisphere or calculate the results within hemisphere?
  
  The effect appears qualitatively unchanged when we exclude interhemispheric connections from the analysis (Author response image 4).
  
  Author response image 4.
  
  As in Figures 6D-6F, but with the exclusion of interhemispheric connections. The decorrelation effect appears qualitatively unchanged.
  
  Reviewer #2 (Recommendations For The Authors):
  
  In addition to my public review, I only have one statistics-related and a few minor editing suggestions for the abstract. I hope that these might help the authors to improve their manuscript.
  
  1) It seems that the researchers are combining observations across different subjects, as seen in Figure 1F-L as well as in all of the other figures. While this has been a common practice in their field, it is now widely recognized that this approach can result in biased statistical inferences since it violates the assumptions of most statistical tests (see this recent discussion: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906290/). As such, it may be beneficial for the authors to consider utilizing statistical tests that are designed to accurately deal with hierarchical data sets, like linear mixed models or hierarchical bootstrap, to confirm their key results. Additionally or alternatively, presenting data grouped by subject would help demonstrate the consistency of their findings across subjects.
  
  Please note, in Figures 1F-1K, there are no statistical tests – but the data are indeed averaged over locomotion onsets across all mice. We could use hierarchical sampling to calculate a bootstrap estimate of the mean response curves and show those instead, but that is also not standard practice in the field. We suspect this is also not what the reviewer is suggesting. In Figure 1L, the unit is indeed brain areas (see also our response to comment 1 of reviewer 1), but it is not areas x mice (i.e., the analysis is not hierarchical).
  
  We have now added a supplementary panel (Figure S4J) that shows the data of Figure 1L with mouse as the statistical unit (note, this is also not hierarchical). We have replaced the statistical test with bootstrapping, as the reviewer suggests. This information can be found in Table S1.
  
  In Figures 2B and 2D, we have replaced the statistical test with hierarchical bootstrap, and updated the corresponding information in Table S1.
  
  For Figure 3, in which we show mismatch and grating onset responses averaged using onsets as the base unit, we have added supplementary panels (Figure S2) that show the same analysis using mice as the statistical unit. This did not change any of the conclusions. Note, there was no statistical testing in Figure 3.
  
  For the decorrelation effect of the different antipsychotic drugs that we show in Figures 6 and 7, the statistical unit is mice x region pairs (that is, while the structure is hierarchical, all mice contribute the same number of pairs). Our data are underpowered to use hierarchical bootstrap for testing the drug effects individually. However, if we combine all antipsychotic drug data (clozapine, aripiprazole, and haloperidol), we reach the same conclusions with hierarchical bootstrap as with the statistical tests (t-test and rank-sum test) used in the paper (Author response image 5).
  
  Author response image 5.
  
  Hierarchical bootstrap of the combined distribution of correlation values shown in Figures 6F, 7C and 7F did not change the conclusion that administration of antipsychotic drugs reduces L5 IT neuron correlations. Statistical comparisons using hierarchical bootstrap: Short-range vs no change, p < 0.001; long-range vs no change, p < 0.001; short-range vs long-range, p < 0.05.
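
The hierarchical bootstrap referred to above can be sketched as follows. This is a minimal illustration in Python, not the authors' analysis code: the function names, the two-level resampling scheme (mice first, then observations within each resampled mouse), and the one-sided comparison against zero are all assumptions chosen to show the general idea.

```python
import random
import statistics


def hierarchical_bootstrap_means(groups, n_boot=2000, seed=0):
    """Bootstrap the grand mean of nested data.

    groups: list of lists, one inner list per mouse (hypothetical layout),
    e.g. the change in correlation for each region pair of that mouse.
    """
    rng = random.Random(seed)
    n_mice = len(groups)
    means = []
    for _ in range(n_boot):
        pooled = []
        # Level 1: resample mice with replacement.
        for _ in range(n_mice):
            mouse = groups[rng.randrange(n_mice)]
            # Level 2: resample observations within the chosen mouse.
            pooled.extend(rng.choices(mouse, k=len(mouse)))
        means.append(statistics.fmean(pooled))
    return means


def p_value_vs_zero(boot_means):
    """One-sided p-value for a decrease: share of bootstrap means at or above zero."""
    return sum(m >= 0.0 for m in boot_means) / len(boot_means)
```

Resampling at the mouse level first is what lets between-mouse variability widen the resulting distribution, which is the point of using a hierarchical rather than a pooled bootstrap.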
  
  2) Given the impressive amount of data, I found it sometimes a little difficult to follow the manuscript. The authors might want to consider including a high-level overview of their results and rationales at the end of the introduction, and start each Results subsection with a sentence referring back to that high-level overview ("To test whether X, we did Y and present it in this section.")
  
  We have attempted to improve the writing along these lines.
  
  3) Some suggestions that might further improve the clarity of writing.
  
  Abstract: Does the brain really distinguish between different "activity patterns", or would externally generated and self-generated "stimuli" be a slightly more accurate term to describe the observed alterations in schizophrenia?
  
  We would argue that (outside of sensory organs) the brain only has access to activity patterns, not stimuli directly. We would prefer to keep the phrasing with activity patterns here.
  
  Line 12: It might be easier to follow if the authors explicitly related that sentence back to the previous sentence "their ability to identify self-generated activity patterns" -> "their ability to distinguish between externally and self/internally generated ..."
  
  Absolutely correct – we have improved the writing here.
  
  Line 14: It remains unclear how visuomotor integration relates to the problem of distinguishing between self- and externally generated stimuli.
  
  We have attempted to expand on this in the abstract.
  
  Line 26: it remains unclear how the results support the activation of "internal representations" as this term has not been defined previously
  
  We have removed “internal representation” from the abstract.
  
  Results, line 80ff: I was confused by the description of all the different investigated cell types, as the first figure panels then only talk about brain wide and L5. Maybe the authors might find that shortening this with a reference to the methods might improve the flow.
  
  We have moved the list of cell types and mouse lines to the methods, as suggested.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The authors should strongly consider reassessing their statistics as outlined in the Public Review.
  
  Specifically:
  
  1) They should justify their definition of independent statistical unit; if this is not the mouse, they should justify why another definition (i.e. locomotion onset) is used, and show that their defined statistical unit achieves the requirements of being statistically independent (i.e. variance of the unit within a mouse is statistically indistinguishable from variance found between mice; more formally they could calculate the intraclass correlation (ICC)).
  
  We assume the reviewer is referring mainly to Figure 1 and therein to panel 1L.
  
  Since we did not perform statistical tests on the calcium traces, we are not sure why we would need to justify the choice of the unit we were showing. Moreover, Figure S2 shows the data of the V1 ROI averaged over mice to address this concern. As also mentioned to reviewer 2, we have amended Figure S2 to include the mouse-averaged traces of the V1 ROI data shown in main Figure 3.
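
For readers unfamiliar with the intraclass correlation (ICC) that the reviewer mentions, a minimal sketch of the one-way random-effects form, ICC(1), is given below. It assumes a balanced design (the same number of observations per mouse); the function name and data layout are hypothetical, and this is not an analysis the authors report having run.

```python
import statistics


def icc_oneway(groups):
    """ICC(1) for a balanced design; groups is a list of equal-length lists."""
    k = len(groups[0])  # observations per group (e.g. onsets per mouse)
    if any(len(g) != k for g in groups):
        raise ValueError("balanced design assumed: equal observations per group")
    n = len(groups)  # number of groups (e.g. mice)
    grand = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    # Mean squares from a one-way ANOVA decomposition.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (n * (k - 1))
    # Undefined (division by zero) if every value in the data is identical.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

An ICC near zero would indicate that observations within a mouse are no more alike than observations from different mice, which is the independence condition the reviewer describes; values near one indicate strong clustering by mouse.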
  
  3) They should justify the statistical tests they use and whether they corrected for multiple comparisons; why for example was an ANOVA not used for Figure 1L and Figure 2B,D?
  
  We did not rely on ANOVA statistics for Figure 1L because we were mainly interested in carving out that Tlx3- (and Ntsr1-) positive mice inhabit a unique space when comparing the similarity of activity during closed and open loop locomotion onsets. We appreciate the reviewer taking a slightly different point of view on the data and now additionally report the ANOVA test result in Table S1. We have also opted to replace the statistical test in Figure 1L with bootstrapping. Lastly, we added Figure S4J which now shows the data in Figure 1L but with mice as the statistical unit.
  
  With similar logic, in Figure 2, we were not interested in comparing how the correlation of activity in cortical regions with locomotion behavior evolves over regions within a visuomotor feedback condition (closed loop, open loop or dark) but rather how a given region compares across feedback conditions.
  
  Still, we have opted to replace the statistical test in Figures 2B and 2D with hierarchical bootstrap, as also suggested by reviewer #2, comment 1. This did not change the significance indicator bars. We have accordingly updated Table S1 in which we report the full statistics.
  
URL

biorxiv.org/content/10.1101/2022.01.31.478462v4

Separable Dorsal Raphe Dopamine Projections Mimic the Facets of a Loneliness-like State

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors had previously found that brief social isolation could increase the activity of these neurons, and that manipulation of these neurons could alter social behavior in a social rank-dependent fashion. This manuscript explored which of the outputs were responsible for this, identifying the central nucleus of the amygdala as the key output region. The authors identified some discrete behavior changes associated with these outputs, and found that during photostimulation of these outputs, neuronal activity appeared altered in 'social response' neurons.
  
  Strengths:
  
  Rigorous analysis of the anatomy. Careful examination of the heterogenous effects on cell activity due to stimulation, linking the physiology with the behavior via photostimulation during recording in vivo.
  
  Weaknesses:
  
  (1) There are some clear imbalances in the sample size across the different regions parsed. The CeA has a larger sample size, likely due in part to the previous work suggesting differential effects depending on social rank/dominance. Given the potential variance, it may be hard to draw conclusions about the impact of stimulation across different social ranks for other groups.
  
  While it may be difficult to draw conclusions about the impact of stimulation across different social ranks, we believe that the dominance-induced variance in our dataset reveals key insights into how social history may affect the function of these circuits. However, we do recognize that there are imbalances in sample size across the different circuits that we probed. To test whether we could detect a significant effect in our DRN<sup>DAT</sup>-CeA:ChR2 group with a sample size matched to the DRN<sup>DAT</sup>-BLP:ChR2 group (the lowest sample size of the three circuits probed), we subsampled and ran tests for statistical significance using the following MATLAB code:
  
  Author response image 1.
  
  We found that out of 1000 subsamples, we detected a statistically significant effect 40.5% of the time (Author response image 2A). This suggests that the optogenetic effect exists, though it is moderate and is variable across mice (as explained by the significant correlation between social rank and optogenetic effect).
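
The subsampling procedure described above can be sketched as follows. The authors' MATLAB code is not reproduced in this text, so this is an independent Python illustration under stated assumptions: mice are drawn without replacement, significance is judged by comparing the paired t statistic with the two-sided 5% critical value for 13 degrees of freedom (2.160, which fits a subsample of 14 only), and all function and parameter names are hypothetical.

```python
import math
import random
import statistics

T_CRIT_DF13 = 2.160  # two-sided 5% critical value of Student's t, 13 degrees of freedom


def paired_t_statistic(off, on):
    """Paired t statistic for ON minus OFF scores of the same mice."""
    diffs = [b - a for a, b in zip(off, on)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def fraction_significant(off, on, n_sub=14, n_iter=1000,
                         t_crit=T_CRIT_DF13, seed=0):
    """Share of random subsamples of n_sub mice with |t| above t_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Draw n_sub distinct mice (sampling without replacement).
        picked = rng.sample(range(len(off)), n_sub)
        t = paired_t_statistic([off[i] for i in picked],
                               [on[i] for i in picked])
        if abs(t) > t_crit:
            hits += 1
    return hits / n_iter
```

Calling `fraction_significant(off_scores, on_scores)` with the per-mouse OFF and ON social preference scores would return a proportion comparable to the percentages quoted in the response. A different subsample size would need a matching critical value passed as `t_crit`.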
  
  To test whether these inconsistent effects may be an effect of variance induced by social rank, we wrote the following MATLAB code to maintain the distribution of social rank in our subsamples:
  
  Author response image 2.
  
  P-values from subsampling analysis show a moderately reproducible social preference effect in DRN<sup>DAT</sup>-CeA:ChR2 mice, but not in DRN<sup>DAT</sup>-BNST:ChR2 mice. (A-D) Histograms showing distribution of paired t-test p-values comparing OFF and ON social preference scores (as shown in Figure 4A-I) in subsampled groups (to match the sample size of the DRN<sup>DAT</sup>-BLP:ChR2 group). (A) 14 DRN<sup>DAT</sup>-CeA:ChR2 mice were randomly subsampled, a paired t-test was performed, and the resulting p-values were binned and plotted. (B) Same as (A), but ensuring that the proportion of subordinate, intermediate, and dominant mice in the subsampled groups were the same as the original distribution. (C) Same as (A), but with DRN<sup>DAT</sup>-BNST:ChR2 mice. (D) Same as (B), but with DRN<sup>DAT</sup>-BNST:ChR2 mice.
  
  Author response image 3.
  
  We found that out of 1000 subsamples, we detected a statistically significant effect 45.5% of the time when we maintained the original distribution of social rank in DRN<sup>DAT</sup>-CeA:ChR2 mice (Author response image 2B). This suggests that reducing the sample size to N=14 reduces the statistical power and indeed can make an effect harder to reliably detect. The reviewer is correct in saying that sample imbalance may skew conclusions. However, given the rank-dependent optogenetic effect on social preference seen in DRN<sup>DAT</sup>-CeA:ChR2 mice (N=29 mice, p=0.002, Figure 4H) that is notably absent in DRN<sup>DAT</sup>-BLP:ChR2 mice (N=14 mice, p=0.806, Figure 4I), we hypothesize that we would not see a significant effect of photoactivating the DRN<sup>DAT</sup>-BLP circuit on social preference, even with a larger sample size. While we acknowledge there may be evidence that there could be an effect in the DRN<sup>DAT</sup>-BLP projection, this analysis reveals that this effect is not as robust as the effect we see in the DRN<sup>DAT</sup>-CeA projection, which is the focus of this study. An in-depth exploration of the DRN<sup>DAT</sup> projection to the BLP is certainly warranted in future studies.
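
Maintaining the rank distribution in each subsample amounts to stratified sampling. The sketch below shows one way to do this in Python; it is not the authors' MATLAB code. The largest-remainder rounding of per-rank quotas and the function names are assumptions, and the rank labels are whatever categories the caller supplies.

```python
import random
from collections import defaultdict


def stratum_quotas(ranks, n_sub):
    """Split n_sub slots across rank categories in proportion to their size."""
    counts = defaultdict(int)
    for r in ranks:
        counts[r] += 1
    exact = {r: n_sub * c / len(ranks) for r, c in counts.items()}
    quotas = {r: int(x) for r, x in exact.items()}
    leftover = n_sub - sum(quotas.values())
    # Give the remaining slots to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda r: exact[r] - quotas[r], reverse=True)
    for r in by_remainder[:leftover]:
        quotas[r] += 1
    return quotas


def stratified_subsample(ranks, n_sub, rng):
    """Return sorted indices of a subsample that preserves rank proportions."""
    members = defaultdict(list)
    for i, r in enumerate(ranks):
        members[r].append(i)
    picked = []
    for r, k in stratum_quotas(ranks, n_sub).items():
        # Sample without replacement inside each rank category.
        picked.extend(rng.sample(members[r], k))
    return sorted(picked)
```

The returned indices can then be passed to whatever paired test is used for the unstratified subsamples, so that the two analyses differ only in how mice are drawn.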
  
  Interestingly, the same analysis approach applied to DRN<sup>DAT</sup>-BNST:ChR2 mice suggests a reliably negative result, with subsampling only resulting in a significant result 1.1% of the time (Author response image 2C) and 1.7% of the time if maintaining the original rank distribution (Author response image 2D).
  
  (2) It is somewhat unclear why only the 'social object ratio' was used to assess the effects versus more direct measurements of social behavior.
  
  We decided to use ‘social:object ratio’ as we felt that measurement more directly supported our claim of increased social preference through optogenetic manipulation; however, in the revised manuscript, we have included direct measurements of social behavior (Figure 4—figure supplement 1) and have updated the legend to reflect this addition (lines 1679-1684; 1698-1708).
  
  (3) Somewhat related, while it is statistically significant, it is unclear if the change seen in face investigation is biologically significant; on average, it looks like a few-seconds difference, and that was not modulated by social rank.
  
  While the effect size is relatively small (4.19 seconds, 2.32% of the session), we believe we should report any statistically significant findings we discover. However, due to the small effect size, we have de-emphasized our claims regarding this finding in the text (line 172).
  
  (4) There are several papers studying these neurons that have explored behaviors examined here, as well as the physiological connectivity that are not cited that would provide important context for this work. In particular, multiple groups have found a dopamine-mediated IPSP in the BNST, in contrast to this work. There are technical differences that may drive these differences, but not addressing them is a major weakness.
  
  In the revised text, we have cited the groups who have found different dopamine-mediated effects in the ovBNST (specifically Krawczyk et al., 2011, Maracle et al., 2018, and Yu et al., 2021) and reconciled these results with those from our study (lines 422-432).
  
  (5) The inclusion of some markers for receptors for some of these outputs is interesting, and the authors suggest that this may be important, but this is somewhat disconnected from the rest of the work performed.
  
  We agree that we cannot make any causal signaling mechanism claims with the current downstream receptor RNA expression data (and we are careful in avoiding making those claims in the text), but we include these data to offer a potential mechanism and hope that these descriptive data will be useful to the field for follow up studies.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors perform a series of studies to follow up on their previous work, which established a role for dorsal raphe nucleus (DRN) dopamine neurons in the regulation of social-isolation-induced rebound in mice. In the present study, Lee et al. use a combination of modern circuit tools to investigate putatively distinct roles of DRN dopamine transporter-containing (DAT) projections to the bed nucleus of the stria terminalis (BNST), central amygdala (CeA), and posterior basolateral amygdala (BLP). Notably, they reveal that optogenetic stimulation of distinct pathways confers specific behavioral states, with DRN<sup>DAT</sup>-BLP driving aversion, DRN<sup>DAT</sup>-BNST regulating non-social exploratory behavior, and DRN<sup>DAT</sup>-CeA promoting sociability. A combination of electrophysiological and in situ hybridization studies reveals heterogeneous dopamine and neuropeptide expression and different firing properties, providing further evidence of pathway-specific neural properties. Lastly, the authors combine optogenetics and calcium imaging to resolve social encoding properties in the DRN<sup>DAT</sup>-CeA pathway, which correlates observed social behavior with socially engaged neural ensembles.
  
  Collectively, these studies provide an interesting way of dissecting out separable features of a complex multifaceted social-emotional state that accompanies social isolation and the perception of 'loneliness.' The main conclusions of the paper provide an important and interesting set of findings that increase our understanding of these distinct DRN projections and their role in a range of social (e.g., prosocial, dominance), non-social, and emotional behaviors. However, as noted below, the examination of these circuits within a homeostatic framework is limited given that a number of the datasets did not include an isolated condition. The DRN<sup>DAT</sup>-CeA pathway was investigated with respect to social homeostatic states in the present study for some of the datasets.
  
  Strengths:
  
  (1) The authors perform a comprehensive and elegant dissection of the anatomical, behavioral, molecular, and physiological properties of distinct DRN projections relevant to social, non-social, and emotional behavior, to address multifaceted and complex features of social state.
  
  (2) This work builds on prior findings of isolation-induced changes in DRN neurons and provides a working framework for broader circuit elements that can be addressed across the social homeostatic state.
  
  (3) This work characterizes a broader circuit implicated in social isolation and provides a number of downstream targets to explore, setting a nice foundation for future investigation.
  
  (4) The studies account for social rank and anxiety-like behavior in several of the datasets, which are important considerations for the interpretation of social motivation states, especially in male mice with respect to dominance behavior.
  
  Weaknesses:
  
  (1) The conceptual framework of the study is based on the premise of social isolation and perceived 'loneliness' under the framework of social homeostasis, analogous to hunger. In this framework, social isolation should provoke an aversive state and compensatory social contact behavior. In the authors' prior work, they demonstrate synaptic changes in DRN neurons and social rebound following acute social isolation. Thus, the prediction would be that downstream projections also would show state-dependent changes as a function of social housing conditions (e.g., grouped vs. isolated). In the current paper, a social isolation condition was not included for the majority of the studies conducted (e.g., Figures 1-6 do not include an isolated condition, Figures 7-8 do include an isolated condition). Thus, while Figures 1-6 add a very interesting and compelling set of data that is of high value to the social behavior field with respect to social and emotional processing and general circuit characterization, these studies do not directly investigate the impacts of dynamic social homeostatic state. The main claim of the paper, including the title (e.g., separable DRN projections mediate facets of loneliness-like state), abstract, intro, and discussion, presents this work under the framework of dynamic social homeostatic states, which should be interpreted with caution, as the majority of the work in the paper did not include a social isolation comparison.
  
  In previous studies, loneliness-like phenotypes have been characterized across species as having the key dimensions of an aversive state that increases prosociality[1–5]. These two features are amplified by photostimulation of DRN DA neurons and, as we show in this manuscript, are separable across different projections to each target, allowing us to distinctly mimic different aspects of the constellation of features we characterize as “loneliness.”
  
  However, we agree with the reviewer, and we do not intend to imply that the mouse currently feels lonely. Indeed, isolating the animals would occlude our ability to see photostimulation-induced mimicry of specific features of the loneliness-like phenotype, and this is precisely why we did not isolate animals for our ChR2 gain-of-function experiments. To address the reviewer’s concern, we will change the title of our manuscript from a claim of “mediating” (which we agree would rely more heavily on actual, ethologically induced loneliness) to one of “mimicry” of (photostimulation-induced) behaviors associated with a loneliness-like phenotype. We have changed the language regarding this claim throughout our manuscript (Lines 1, 83, 285, 369).
  
  For the ChR2 experiments in particular, we intended the optogenetic manipulation to be a gain-of-function one to test the hypothesis that activation of these circuits is sufficient to recapitulate different facets of a loneliness-like state (i.e. prosociality, aversion, and increased exploratory behavior). That is why we included only group-housed conditions for these experiments: to mimic the phenotype of social isolation without social isolation. To test the necessity of these circuits in mediating different facets of a loneliness-like state, we agree that silencing the studied projections in an isolated state is critical, which is what we show in Figure 8. We agree that the addition of an isolated condition to understand the circuit-specific impact of dynamic social homeostatic state is important (particularly through in vivo recordings of these specific circuits during relevant behaviors), and would be a great follow-up to this study.
  
  (2) In Figure 1, the authors confirm collaterals in the BNST and CeA via anatomical tracing studies. The goal of the optogenetic studies is to dissociate the functional/behavioral roles of distinct projections. However, one limitation of optogenetic projection targeting is the possibility of back-propagating action potentials (stimulation of terminals in one region may back-propagate to activate cell bodies, and then afferent projections to other regions), and/or stimulation of fibers of passage. Therefore, one limitation in the dataset for the optogenetic stimulation studies is the possibility of non-specific unintended activation of projections other than those intended (e.g., DRN<sup>DAT</sup>-CeA). This can be dealt with by administering lidocaine to prevent back-propagating action potentials.
  
  While back-propagating action potentials are potentially confounding for the manipulation techniques presented in this paper, we do show circuit-specific optogenetic behavioral effects despite significant collateralization (specifically between DRN<sup>DAT</sup> neurons projecting to the CeA and BNST; Figure 1H), suggesting circuit-specificity. Namely, we see that stimulation of DRN<sup>DAT</sup> terminals in CeA promotes social preference (Figure 4E,K) whereas stimulation of DRN<sup>DAT</sup> terminals in BNST promotes rearing (exploratory) behavior (Figure 3G). There is a non-negligible chance that we are stimulating DRN<sup>DAT</sup> fibers of passage, which we have addressed in a caveat disclaimer included in the revised discussion (lines 345-347).
  
  (3) It is unclear from the text, but in the subjects' section of the methods, it appears that only male animals were included in the study, with no mention of female subjects. It should be clear to the reader that this was conducted in males only if that is the case, with consideration or discussion of female subjects and sex as a biological variable.
  
  In the revised manuscript, we have included discussion about sex as a biological variable (lines 342-345).
  
  (4) Averaged data are generally reported throughout the study in the form of bar graphs, across most figures. Individual data points would increase the transparency of the data.
  
  In an effort to increase the transparency of the data, we have prepared source data for each data panel in the final version of the manuscript and will upload it to eLife.
  
  REFERENCES
  
  (1) Cacioppo, J.T., Hughes, M.E., Waite, L.J., Hawkley, L.C., and Thisted, R.A. (2006). Loneliness as a specific risk factor for depressive symptoms: cross-sectional and longitudinal analyses. Psychol Aging 21, 140–151. https://doi.org/10.1037/0882-7974.21.1.140.
  
  (2) Cacioppo, S., Capitanio, J.P., and Cacioppo, J.T. (2014). Toward a Neurology of Loneliness. Psychol Bull 140, 1464–1504. https://doi.org/10.1037/a0037618.
  
  (3) Baumeister, R.F., and Leary, M.R. (1995). The need to belong: Desire for interpersonal attachments as a fundamental human motivation. Psychological Bulletin 117, 497–529. https://doi.org/10.1037/0033-2909.117.3.497.
  
  (4) Niesink, R.J., and Van Ree, J.M. (1982). Short-term isolation increases social interactions of male rats: A parametric analysis. Physiology & Behavior 29, 819–825. https://doi.org/10.1016/0031-9384(82)90331-6.
  
  (5) Panksepp, J., and Beatty, W.W. (1980). Social deprivation and play in rats. Behavioral & Neural Biology 30, 197–206. https://doi.org/10.1016/S0163-1047(80)91077-8.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors investigated the role of dopaminergic neurons (dopamine transporter expressing, DAT) in the dorsal raphe nucleus (DRN) in regulating social and affective behavior through projections to the central nucleus of the amygdala (CeA), bed nucleus of the stria terminalis (BNST), and the posterior subdivision of the basolateral amygdala. The largest effect observed was in the DRN-DAT projections to the CeA. Augmenting previously published results from this group (Matthews et al., 2016), the comprehensive behavioral analysis relative to social dominance, gene expression analysis, electrophysiological profiling, and in vivo imaging provides novel insights into how DRN-DAT projections to the CeA influence the engagement of social behavior in the contexts of group-housed and socially isolated mice.
  
  Strengths:
  
  Correlational analysis with social dominance is a nice addition to the study. The overall computational analyses performed are well-designed and rigorous.
  
  Weaknesses:
  
  (1) Analysis of dopamine receptor expression did not include Drd3, Drd4, or Drd5 which may provide more insights into how dopamine modulates downstream targets. This is particularly relevant to the BNST projection in which the densest innervation did not robustly co-localize with the expression of either Drd1 or Drd2. It is also possible that dopamine release from DRN-DAT neurons in any or all of these structures modulates neurotransmitter release from inputs to these regions that contain D2 receptors on their terminals.
  
  Although we find that there is more Vipr2 and Npbwr1 expression compared to Drd1 and Drd2 expression in ovBNST, we still do find that a substantial proportion of cells in ovBNST express dopamine receptors (particularly D2 dopamine receptors, as shown in Figure 5C). In our revised manuscript, we have discussed potential functional mechanisms through D3, D4, and D5 dopamine receptors, as well as pre-synaptic dopamine receptor expression (lines 459-461).
  
  (2) Although not the focus of this study, without pharmacological blockade of dopamine receptors, it is not possible to assess what the contribution of dopamine is to the behavioral outcomes. Given the co-release of glutamate and GABA from these neurons, it is possible that dopamine plays only a marginal role in the functional connectivity of DRN-DAT neurons.
  
  We agree with the reviewer’s comments and are careful to avoid making claims about dopamine-mediated physiological and behavioral effects of DRN<sup>DAT</sup> neurons (even though these neurons are genetically identified through the expression of dopamine transporter [DAT]), as mentioned in lines 222-228 of the text.
  
  (3) Photostimulation parameters used during the behavioral studies (8 pulses of light delivered at 30 Hz for several minutes) could lead to confounding results limiting data interpretation. As shown in Figure 6J, 8 pulses of light delivered at 30 Hz result in a significant attenuation of the EPSC amplitude in the BLP and CeA projection. Thus, prolonged stimulation could lead to significant synaptic rundown resulting in an overall suppression of connectivity in the later stages of the behavioral analyses.
  
  Despite attenuation of EPSC amplitude in BLP and CeA projections and potential synaptic rundown, we still observe significant behavioral effects through optogenetic manipulation of these circuits (increasing the likelihood of capturing a ‘true positive’ rather than a ‘false negative’ effect). In general, we attempt to reduce the duty cycle by sparingly delivering trains of optogenetic stimulation (eight 5-ms pulses every 5 seconds). Additionally, in the real time place preference task where stimulation of the DRN<sup>DAT</sup>-BLP projection significantly reduces the time spent in the “ON” chamber, stimulation is only delivered when the mouse is in the “ON” compartment of the apparatus. However, we do share the reviewer’s concern that EPSC attenuation and potential synaptic rundown may explain the robust place avoidance effects in DRN<sup>DAT</sup>-BLP:ChR2 mice in the first half of the session (Figure 2G). Importantly, we show in our previously published work (Matthews et al., 2016, Cell; Figure 3) through fast-scan cyclic voltammetry (FSCV) that dopamine transients were consistently recorded in response to eight pulses of 30 Hz DRN<sup>TH</sup> stimulation delivered every 5 seconds in the BNST, though less consistently in the CeA.
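
As a rough check on the duty-cycle argument, the timing can be worked out from the stated parameters (eight 5-ms pulses at 30 Hz, one train every 5 seconds). The calculation below assumes that 30 Hz refers to the spacing between pulse onsets within a train, which the response does not state explicitly.

```latex
\begin{align*}
t_{\text{on}} &= 8 \times 5\,\text{ms} = 40\,\text{ms} \\
\text{duty cycle} &= \frac{40\,\text{ms}}{5000\,\text{ms}} = 0.8\% \\
t_{\text{train}} &= 7 \times \tfrac{1}{30}\,\text{s} + 5\,\text{ms} \approx 238\,\text{ms}
\end{align*}
```

Under these assumptions the light is on for under 1% of the session, and each train occupies roughly a quarter of a second out of every 5-second cycle.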
  
URL

biorxiv.org/content/10.1101/2025.02.03.636224v2

Structure of human PIEZO1 and its slow inactivating channelopathy mutants

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This useful manuscript shows a set of interesting data including the first cryo-EM structures of human PIEZO1 as well as structures of disease-related mutants in complex with the regulatory subunit MDFIC, which generate different inactivation phenotypes. The molecular basis of PIEZO channel inactivation is of great interest due to its association with several pathologies. This manuscript provides some structural insights that may help to ultimately build a molecular picture of PIEZO channel inactivation. While the structures are of use and clear conformational differences can be seen in the presence of the auxiliary subunit MDFIC, the strength of the evidence supporting the conclusions of the paper, especially the proposed role for pore lipids in inactivation, is incomplete and there is a lack of data to support them.
  
  We thank the editors and reviewers for taking the time and effort to review our manuscript. The evidence supporting the key role of pore lipids in hPIEZO1 activation is as follows. i. Compared with wild-type hPIEZO1, the hydrophobic acyl chain tails of the pore lipids are retracted from the hydrophobic pore region in the slower-inactivating mutant hPIEZO1-A1988V (Fig. 7a-b). ii. Previous electrophysiological functional studies revealed that substituting this hydrophobic pore formed by I2447, V2450, and F2454 with a hydrophilic pore prolongs the inactivation time for both PIEZO1 and PIEZO2 channels (PMID: 30628892). iii. In the structure of the HX channelopathy mutant R2456H, the interaction between the hydrophilic phosphate head group of the pore lipids and R2456 is disrupted, remodeling the blade and pore module and resulting in significantly slower inactivation. iv. The interaction between pore lipids and lipidated MDFIC stabilizes the pore lipids to reseal the pore upon activation of the hPIEZO1-MDFIC complex.
  
  According to previously proposed models for the role of pore lipids in mechanosensitive ion channels, such as MscS (PMID: 33568813), MS K2P (PMID: 25500157) and OSCA channels (PMID: 37402734), the pore lipids seal the channel pores in the closed state and can be removed in the open state by membrane deformation induced by mechanical force, which obeys the force-from-lipids principle. Therefore, in our putative model, the pore lipids seal the hydrophobic pore of hPIEZO1 in the closed state. Upon activation of hPIEZO1, the pore lipids retract from the hydrophobic pore and interact with multi-lipidated MDFIC, which stabilizes the inactivated state. In the mild channelopathy mutants, the pore lipids retract from the hydrophobic pore, making the channel harder to close upon activation. In the severe channelopathy mutant, the interaction between the pore lipids and R2456 is disrupted, resulting in loss of the pore lipids and significantly slower inactivation. We fully understand the concern about the role of pore lipids in our proposed model. Therefore, we have toned down our putative model.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript by Shan, Guo, Zhang, Chen et al. shows a raft of interesting data including the first cryo-EM structures of human PIEZO1. Clearly, the molecular basis of PIEZO channel inactivation is of great interest and as such this manuscript provides some valuable extra information that may help to ultimately build a molecular picture of PIEZO channel inactivation. However, the current manuscript does not provide any compelling evidence for a detailed mechanism of PIEZO inactivation.
  
  Strengths:
  
  This manuscript documents the first cryo-EM structures of human PIEZO1 and the gain of function mutants associated with hereditary anaemia. It is also the first evidence showing that PIEZO1 gain of function mutants are also regulated by the auxiliary subunit MDFIC.
  
  We thank reviewer #1 for the encouragement.
  
  Weaknesses:
  
  While the structures are interesting and clear differences can be seen in the presence of the auxiliary subunit MDFIC, the major conclusions and central tenets of the paper, especially a role for pore lipids in inactivation, lack data to support them. The post-translational modification of the PIEZO1 auxiliary subunit MDFIC is not modelled as a covalent interaction.
  
  We fully understand the concern about the role of pore lipids in our proposed model. Therefore, we have toned down our putative model.
  
  The lipid densities of the post-translational modification of the PIEZO1 auxiliary subunit MDFIC are shown below. As the lipid densities cannot be assigned confidently, we use only single-chain lipids to represent them. The lipidation of MDFIC was demonstrated in the MDFIC identification paper.
  
  Author response image 1.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Mechanically activated ion channels PIEZOs have been widely studied for their role in mechanosensory processes like touch sensation and red blood cell volume regulation. PIEZO in vivo roles are further exemplified by the presence of gain-of-function (GOF) or loss-of-function (LOF) mutations in humans that lead to disease pathologies. Hereditary xerocytosis (HX) is one such disease, caused by GOF mutations in human PIEZO1 that are characterized by slow inactivation kinetics (inactivation being the ability of a channel to close in the presence of stimulus). But how these mutations alter PIEZO1 inactivation or even the underlying mechanisms of channel inactivation remains unknown. Recently, MDFIC (myoblast determination family inhibitor proteins) was shown to directly interact with mouse PIEZO1 as an auxiliary subunit to prolong inactivation and alter gating kinetics. Furthermore, while lipids are known to play a role in the inactivation and gating of other mechanosensitive channels, whether this mechanism is conserved in PIEZO1 is unknown. Thus, the structural basis of the PIEZO1 inactivation mechanism and whether lipids play a role in it represent important outstanding questions in the field and have strong implications for human health and disease.
  
  To get at these questions, Shan et al. use cryogenic electron microscopy (Cryo-EM) to investigate the molecular basis underlying differences in inactivation and gating kinetics of PIEZO1 and human disease-causing PIEZO1 mutations. Notably, the authors provide the first structure of human PIEZO1 (hPIEZO1), which will facilitate future studies in the field. They reveal that hPIEZO1 has a more flattened shape than mouse PIEZO1 (mPIEZO1) and has lipids that insert into the hydrophobic pore region. To understand how PIEZO1 GOF mutations might affect this structure and the underlying mechanistic changes, they solve structures of hPIEZO1 as well as two HX-causing mild GOF mutations (A1988V and E756del) and a severe GOF mutation (R2456H). Unable to glean too much information due to poor resolution of the mutant channels, the authors also attempt to resolve MDFIC-bound structures of the mutants. These structures show that MDFIC inserts into the pore region of hPIEZO1, similar to its interaction with mPIEZO1, and results in a more curved and contracted state than hPIEZO1 on its own. The authors use these structures to hypothesize that differences in curvature and pore lipid position underlie the differences in inactivation kinetics between wild-type hPIEZO1, hPIEZO1 GOF mutations, and hPIEZO1 in complex with MDFIC.
  
  Strengths:
  
  This is the first human PIEZO1 structure. Thus, these studies become the stepping stone for future investigations to better understand how disease-causing mutations affect channel gating kinetics.
  
  We thank reviewer #2 for the positive comments.
  
  Weaknesses:
  
  Many of the hypotheses made in this manuscript are not substantiated with data and are extrapolated from mid-resolution structures.
  
  We fully understand the concern about the role of pore lipids in our proposed model. Therefore, we have toned down our putative model.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript, the authors used structural biology approaches to determine the molecular mechanism underlying the inactivation of the PIEZO1 ion channel. To this end, the authors presented structures of human PIEZO1 and its slow-inactivating mutants. The authors also determined the structures of these PIEZO1 constructs in complexes with the auxiliary subunit MDFIC, which substantially slows down PIEZO1 inactivation. From these structures, the authors suggested an anti-correlation between the inactivation kinetics and the resting curvature of PIEZO1 in detergent. The authors also observed a unique feature of human PIEZO1 in which the lipid molecules plugged the channel pore. The authors proposed that these lipid molecules could stabilize human PIEZO1 in a prolonged inactivated state.
  
  We thank reviewer #3 for the summary.
  
  Strengths:
  
  Notably, this manuscript reported the first structures of a human PIEZO1 channel, its channelopathy mutants, and their complexes with MDFIC. The evidence that lipid molecules could occupy the channel pore of human PIEZO1 is solid. The authors' proposals to correlate PIEZO1 resting curvature and pore-resident lipid molecules with the inactivation kinetics are novel and interesting.
  
  Thanks for the positive comments.
  
  Weaknesses:
  
  However, in my opinion, additional evidence is needed to support the authors' proposals.
  
  (1) The authors determined the apo structure of human PIEZO1, which showed a more flattened architecture than that of the mouse PIEZO1. Functionally, the inactivation kinetics of human PIEZO1 is faster than its mouse counterpart. From this observation (and some subsequent observations such as the complex with MDFIC), the authors proposed the anti-correlation between curvature and inactivation kinetics. However, the comparison between human and mouse PIEZO1 structure might not be justified. For example, the human and mouse structures were determined in different detergent environments, and the choice of detergent could influence the resting curvature of the PIEZO structures.
  
  We apologize for the misleading statement about the anti-correlation between curvature and inactivation kinetics of PIEZOs. Based on structural studies and electrophysiological assays alone, we cannot conclude that the curvature difference between mPIEZO1 and hPIEZO1 is related to their inactivation kinetics. What we want to state is the structural difference between mPIEZO1 and hPIEZO1. To avoid misleading readers, we have revised the manuscript.
  
  For the concern about detergent, we cannot fully exclude its influence on the curvature of PIEZOs. However, previously reported structures of mPiezo1 (PDB: 7WLT, 5Z10, 6B3R) were determined in different detergent environments or in a lipid bilayer, yet the curvature of mPiezo1 is similar across them, as shown below. Considering the high sequence similarity between mPiezo1 and hPiezo1, we hypothesize that the curvature of both hPiezo1 and mPiezo1 may be unaffected by the detergent.
  
  Author response image 2.
  
  Overall structural comparison of curved mPIEZO1 in the lipid bilayer (PDB: 7WLT), mPiezo1 in CHAPS (PDB: 6B3R) and mPiezo1 in Digitonin (PDB: 5Z10).
  
  (2) Related to point 1), the 3.7 Å structure of the A1988V mutant presented by the authors showed a similar curvature as the WT but has a slower inactivating kinetics.
  
  Based on the structural comparison between hPIEZO1 and its A1988V mutant, the retraction of pore lipids from the hydrophobic center pore in hPIEZO1-A1988V is mainly responsible for its slower inactivation kinetics.
  
  (3) Related to point 1), the authors stated that human PIEZO1 might not share the same mechanism as mouse PIEZO1 due to its unique properties. For example, MDFIC only modifies the curvature of human PIEZO1, and lipid molecules were only observed in the pore of the human PIEZO1. Therefore, it may not be justified to draw any conclusions by comparing the structures of PIEZO1 from humans and mice.
  
  Thanks for the constructive suggestion. To avoid misleading readers, we have revised the manuscript.
  
  (4) Related to point 1), it is well established that PIEZO1 opening is associated with a flattened structure. If the authors' proposal were true, in which a more flattened structure led to faster inactivation, we would have the following prediction: more opening is associated with faster inactivation. In this case, we would expect a pressure-dependent increase in the inactivation kinetics.
  
  Could the authors provide such evidence, or provide other evidence along this direction?
  
  We appreciate the reviewer’s comment. We are not claiming a relationship between the flattened structure and activation/inactivation. We only present the results of the structure of wild-type/mutant PIEZO1.
  
  (5) In Figure S2, the authors showed representative experiments of the inactivation kinetics of PIEZO1 using whole-cell poking. However, poking experiments have high cell-to-cell variability.
  
  The authors should also show statistics of experiments obtained from multiple cells.
  
  We have shown the statistics of representative electrophysiology experiments obtained from multiple cells in Figure S2.
  
  (6) In Figure 2 and Figure 5, when the authors show the pore diameter, it could be helpful to also show the side chain densities of the pore lining residues.
  
  We appreciate the reviewer’s suggestion. The side chains of the pore-lining restricted residues are shown in Figure 2 and Figure 5, and the densities of the pore domain are shown in Figure S4 and S14. Interestingly, the pore-lining restricted residues in mPIEZO1 and hPIEZO1 are highly conserved.
  
  (7) The authors observed pore-plugging lipids in slow inactivating conditions such as channelopathy mutations or in complex with MDFIC. The authors propose that these lipid molecules stabilize a "deep resting state" of PIEZO1, making it harder to open and harder to inactivate once opened. This will lead to the prediction that the slow-inactivating conditions will lead to a higher activation threshold, such as the mid-point pressure in the activation curve. Is this true?
  
  Yes, it is true. In Figure S2, the MDFIC-induced slow-inactivation conditions in hPIEZO1-MDFIC, hPIEZO1-A1988V-MDFIC, hPIEZO1-E756del-MDFIC and hPIEZO1-R2456H-MDFIC result in larger half-activation thresholds than hPIEZO1, hPIEZO1-A1988V, hPIEZO1-E756del and hPIEZO1-R2456H, respectively.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  I document the major issues below:
  
  (1) Mouse vs Human inactivation
  
  Line 21- "than the slower inactivating curved mouse PIEZO1 (mPIEZO1)."
  
  Where is the data in this paper or any other paper that human PIEZO1 inactivates faster than mouse PIEZO1? This is central to the way the authors present the paper. In fact, the tau quoted for the hPIEZO1 of ~10 ms is similar to that often measured for mPIEZO1. The reference in the discussion for mouse vs human inactivation times is a review of mechanotransduction. Either the authors need to directly compare the tau of mP1 vs hP1 or quote the relevant primary literature if it exists.
  
  As measured in HEK-PIKO cells transfected with mPiezo1, the inactivation time of mPiezo1 is 13 ± 1 ms (PMID: 29261642) at -80 mV.
  
  The tau is also voltage-dependent. The tau is beyond 20 ms at -60 mV for mPIEZO1 (PMID: 20813920), whereas for hPIEZO1 it is still around 10 ms.
  
  (2) MDFIC-lipidation
  
  Without seeing the PDB or EMDB I can't guarantee this, but from Figure 6d it seems like the S-acylation in the distal C-terminus of MDFIC is not modelled as a covalent interaction; these lipids are covalently added to the Cys residues in S-acylation via zDHHC enzymes. This should be modelled correctly.
  
  Thanks for this suggestion. As the lipid densities of the post-translational modification of the PIEZO auxiliary subunit MDFIC cannot be assigned confidently, we use only single-chain lipids to represent them. The lipidation of MDFIC was demonstrated in the MDFIC identification paper (PMID: 37590348).
  
  (3) Pore lipids and inactivation
  
  The lipids close to the pore are interesting and the density for a lipid is also seen in the mouse MDFIC-PIEZO1 complex from Zhou, Ma et al, 2023. However, there is no data provided by the authors that the lipid is functionally relevant to anything. There is not even a correlation with inactivation in Figure 7. P1+MDFIC inactivates slowest yet the lipids are present within the pore. Second, there is no evidence for what these structures are: closed, or inactivated? In fact, the Xiao lab is now interpreting the 7WLU structure as inactivated.
  
  The evidence supporting the key role of pore lipids in hPIEZO1 activation is as follows. i. Compared with wild-type hPIEZO1, the hydrophobic acyl chain tails of the pore lipids are retracted from the hydrophobic pore region in the slower-inactivating mutant hPIEZO1-A1988V (Fig. 7a-b). ii. Previous electrophysiological functional studies revealed that substituting this hydrophobic pore formed by I2447, V2450, and F2454 with a hydrophilic pore prolongs the inactivation time for both PIEZO1 and PIEZO2 channels (PMID: 30628892). iii. In the structure of the HX channelopathy mutant R2456H, the interaction between the hydrophilic phosphate head group of the pore lipids and R2456 is disrupted, remodeling the blade and pore module and resulting in significantly slower inactivation. iv. The interaction between pore lipids and lipidated MDFIC stabilizes the pore lipids to reseal the pore upon activation of the hPIEZO1-MDFIC complex. Overall, the pore lipids are involved in inactivation, and we have toned down the statement.
  
  (4) Cytosolic plug
  
  There is additional cytosolic density for the human PIEZO1 that the authors intimate could be from a different binding partner. Is it possible to refine this density? Is it from the PIEZO1-tag? At the very least a little more information about this density should be given if it is going to be mentioned like this.
  
  Our purification result shows that the protein is tag-free. We are also curious about the extra cytosolic density, but we do not know what it is.
  
  (5) Reduced sensitivity of PIEZO1 in the presence of MDFIC and its regulatory mechanism
  
  This was reported in the first article however no data is presented by the authors to support MDFIC increasing the mechanical energy required to open PIEZO1. The sentence in the discussion; "MDFIC enables hPIEZO1 to respond to different forces by modifying the pore module through lipid interactions." is not supported by any functional data and seems to be an over-interpretation of the structures.
  
  We appreciate this suggestion. The half-activation thresholds of hPIEZO1 and hPIEZO1-MDFIC were measured to be 7 μm and 9 μm, respectively (Fig. S2). In addition, the mechanical current amplitude of hPIEZO1-MDFIC is extremely small compared to that of WT reaching the nA level (Fig. S2). Therefore, the less mechanosensitive hPIEZO1-MDFIC may require more mechanical energy to open than PIEZO1 WT.
  
  (6) Both referencing of the PIEZO1 literature and prose could be improved.
  
  Thanks for the suggestion. We have improved the referencing and prose.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The authors speculate that the difference in curvature between human and mouse PIEZO1 results in its fast inactivation but do not provide experimental evidence to support this idea. This claim would have been bolstered by showing that the GOF human mutations have a more curved structure, but these proved too structurally unstable to be solved at high resolution. However, the authors state that the 3.7 angstrom map solved for hPIEZO1-A1988V does have an overall similar architecture as wild-type hPIEZO1; thus, contradicting their hypothesis.
  
  We apologize for the misleading statement. In our revised manuscript, we do not claim a relationship between the flattened structure and activation/inactivation. We only present the results of the structure of wild-type/mutant PIEZO1.
  
  The structure comparison between the A1988V mutant and WT shows a similar architecture but a different occupancy pattern of pore lipids. Therefore, we suggested that the A1988V mutant has slightly slower inactivation kinetics, mainly due to the exit of pore lipids from the pore.
  
  (2) The authors show that interaction with MDFIC alters hPIEZO1 structure to be more curved and use this to support their idea that changing the curvature of the protein underlies the prolonged inactivation kinetics. It has been previously shown that MDFIC does not change the structure of mPIEZO1 but does alter its inactivation and gating kinetics. How does this discrepancy fit into the inactivation model proposed by the authors? Similarly, their claim that MDFIC slows hPIEZO1 inactivation and weakens mechanosensitivity just by affecting the pore module and changing blade curvature is made based on observation and no experimental data to test it.
  
  We have revised the manuscript to avoid suggesting a misleading link between the curvature and the inactivation kinetics of hPIEZO1. It was reported previously that substitution of the hydrophobic pore, formed by I2447, V2450, and F2454, with a hydrophilic pore prolongs the inactivation time for both PIEZO1 and PIEZO2 channels (PMID: 30628892). In addition, the severe HX channelopathy mutant R2456H, wherein the interaction between the hydrophilic phosphate group head and R2456 is disrupted, leads to remodeling of the blade and pore module. Indeed, our observation is limited and further experiments will be performed to support our model.
  
  (3) How does their model fit in cell types that have PIEZO1 (or GOF mutant PIEZO1) but not MDFIC?
  
  In cell types that have PIEZO1 or GOF mutant PIEZO1 but not MDFIC, PIEZO1 or GOF mutant PIEZO1 may have a faster inactivation rate than those that bind to MDFIC. This is supported by the finding that overexpressed PIEZOs exhibit faster inactivation kinetics than those in some native cell types with MDFIC expression (PMID: 20813920, 30132757).
  
  (4) Figure S2 is missing quantification of the electrophysiology data. The authors should show summary data in addition to their representative traces including the Imax for all conditions, tau for data shown in b, and sample size for all conditions, and related statistics. The text claims that MDFIC decreases mechanosensitivity (line 156) but there is no data to support this.
  
  For the electrophysiological assay in Figure S2, we referred to previously reported mPIEZO1 mutants (PMID: 23487776, 28716860). We confirmed that the slower inactivation phenotypes of these mutations of hPIEZO1 are similar to those of mPIEZO1.
  
  The half-activation thresholds of hPIEZO1 and hPIEZO1-MDFIC were measured to be 7 μm and 9 μm, respectively. This tendency toward an increased half-activation threshold of hPIEZO1 upon binding MDFIC is also seen in the electrophysiological results for the hPIEZO1 channelopathy mutants.
  
  (5) In line 144, the authors mention that they were able to validate the MDFIC density with multilipidated cysteines on the C-terminal amphipathic helix, but they do not show the density with fitted lipids. While individual densities for some of the lipids are shown in extended Figure 12, it would be helpful to include a figure where they show the map for MDFIC with fitted lipids in it.
  
  Thanks for the valuable suggestion. As the lipid densities of the post-translational modification of the PIEZO auxiliary subunit MDFIC cannot be assigned confidently, we use only single-chain lipids to represent them. The lipidation of MDFIC was demonstrated in the MDFIC identification paper.
  
  (6) The authors show that R2456 interacts with a lipid at the pore module and hypothesize that this underlies the fast inactivation of hPIEZO1. While they did not obtain a high-resolution structure of this mutant, this hypothesis could be tested by substituting R for side chains with different charges and performing electrophysiology to determine the effects on inactivation.
  
  Thanks for the constructive suggestion. We will perform the electrophysiology assay for R2456 mutants with different side chains.
  
  (7) Figure 4 shows the overall structure of hPIEZO1 GOF mutations A1988V and E756del in complex with MDFIC. Other than showing an overall similar structure to wild-type hPIEZO1, the authors do not show how the human mutation A1988V alters the structure of the protein at the site of change. Understanding how these mutations affect the local architecture of the protein has important relevance for human physiology.
  
  As the GOF channelopathy mutant hPIEZO1-A1988V is structurally unstable, the density at the site of A1988V is too weak to figure out the related interaction in the structure of the hPIEZO1-A1988V mutant.
  
  Minor comment:
  
  In general, the manuscript will benefit from heavy copy editing. For example, the word cartoon is misspelled in many of the figure legends.
  
  We apologize for the mistake. The manuscript has been checked and revised.
  
  Reviewer #3 (Recommendations for the authors):
  
  Some portions of this manuscript were not well written. For example, at the end of the 3rd paragraph in the introduction, the authors talked about HX mutations and their correlation with malaria infection and plasma iron. This is irrelevant information and will only distract the readers. It would be ideal if the authors could go through the entire manuscript and improve its clarity.
  
  Thanks for the suggestion. We have revised the sentences about HX mutations as suggested and improved the entire manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.14.603468v2
www.biorxiv.org

A direct neural signature of serial dependence in working memory

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  We were delighted by the reviewers' general comments. We thank the reviewers for their thoughtful reviews, constructive criticism, and analysis suggestions. We have carefully addressed each of their points during the revision of the manuscript.
  
  Unfortunately, after the paper was submitted to eLife, the first author, who ran all the analyses, left academia. We have now realized that we do not currently have sufficient resources to perform all additional analyses as requested by the reviewers.
  
  The following is the authors’ response to the original reviews:
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  This study uses MEG to test for a neural signature of the trial history effect known as 'serial dependence.' This is a behavioral phenomenon whereby stimuli are judged to be more similar than they really are, in feature space, to stimuli that were relevant in the recent past (i.e., the preceding trials). This attractive bias is prevalent across stimulus classes and modalities, but a neural source has been elusive. This topic has generated great interest in recent years, and I believe this study makes a unique contribution to the field. The paper is overall clear and compelling, and makes effective use of data visualizations to illustrate the findings. Below, I list several points where I believe further detail would be important to interpreting the results. I also make suggestions for additional analyses that I believe would enrich understanding but are inessential to the main conclusions.
  
  (1) In the introduction, I think the study motivation could be strengthened, to clarify the importance of identifying a neural signature here. It is clear that previous studies have focused mainly on behavior, and that the handful of neuroscience investigations have found only indirect signatures. But what would the type of signature being sought here tell us? How would it advance understanding of the underlying processes, the function of serial dependence, or the theoretical debates around the phenomenon?
  
  Thank you for pointing this out. Our MEG study was designed to address two questions: 1) we asked whether we could observe a direct neural signature of serial dependence, and 2) if so, whether this signature occurs at the encoding or post-encoding stage of stimulus processing in working memory. This second question directly concerns the current theoretical debate on serial dependence.
  
  Previous studies have found only indirect signatures of serial dependence such as reactivations of information from the previous trial or signatures of a repulsive bias, which were in contrast to the attractive bias in behavior. Thus, it remained unclear whether an attractive neural bias can be observed as a direct reflection of the behavioral bias. Moreover, previous studies observed the neuronal repulsion during early visual processes, leading to the proposal that neural signals become attracted only during later, post-encoding processes. However, these later processing stages were not directly accessible in previous studies. To address these two questions, we combined MEG recordings with an experimental paradigm with two items and a retro-cue. This design allowed us to record neural signals during separable encoding and post-encoding task phases and so to pinpoint the task phase at which a direct neural signature of serial dependence occurred that mirrored the behavioral effect.
  
  We have slightly modified the Introduction to strengthen the study motivation.
  
  (1a) As one specific point of clarification, on p. 5, lines 91-92, a previous study (St. John-Saaltink et al.) is described as part of the current study motivation, stating that "as the current and previous orientations were either identical or orthogonal to each other, it remained unclear whether this neural bias reflected an attraction or repulsion in relation to the past." I think this statement could be more explicit as to why/how these previous findings are ambiguous. The St. John-Saaltink study stands as one of very few that may be considered to show evidence of an early attractive effect in neural activity, so it would help to clarify what sort of advance the current study represents beyond that.
  
  Thank you for this comment. In the study by St. John-Saaltink et al. (2016), two gratings oriented at 45° and 135° were always presented to either the left or right side of a central fixation point in a trial (90° orientation difference). As only the left/right position of the 45° and 135° gratings varied across trials, the target stimulus in the current trial was either the same or differed by exactly 90° from the previous trial. In consequence, this study could not distinguish whether the observed bias was attractive or repulsive, which concerned both the behavioral effect and the V1 signal. Furthermore, the bias in the V1 signal was partially explained by the orientation that was presented at the same position in the previous trial, which could reflect a reactivation of the previous orientation rather than an actual altered orientation.
  
  We have changed the Introduction accordingly.
  
  References:
  
  St. John-Saaltink E, Kok P, Lau HC, de Lange FP (2016) Serial Dependence in Perceptual Decisions Is Reflected in Activity Patterns in Primary Visual Cortex. Journal of Neuroscience 36: 6186–6192.
  
  (1b) The study motivation might also consider the findings of Ranieri et al. (2022, J. Neurosci), Fornaciai, Togoli, & Bueti (2023, J. Neurosci), and Lou & Collins (2023, J. Neurosci), who all test various neural signatures of serial dependence.
  
  Thank you. As all listed findings showed neural signatures revealing a reactivation of the previous stimulus or a response during the current trial, we have added them to the paragraph in the Introduction referring to this class of evidence for the neural basis for serial dependence.
  
  (2) Regarding the methods and results, it would help if the initial description of the reconstruction approach, in the main text, gave more context about what data is going into reconstruction (e.g., which sensors), a more conceptual overview of what the 'reconstruction' entails, and what the fidelity metric indexes. To me, all of that is important to interpreting the figures and results. For instance, when I first read, it was unclear to me what it meant to "reconstruct the direction of S1 during the S2 epoch" (p. 10, line 199)? As in, I couldn't tell how the data/model knows which item it is reconstructing, as opposed to just reporting whatever directional information is present in the signal.
  
  (2a) Relatedly, what does "reconstruction strength" reflect in Figure 2a? Is this different than the fidelity metric? Does fidelity reflect the strength of the particular relevant direction, or does it just mean that there is a high level of any direction information in the signal? In the main text explain what reconstruction strength and what fidelity is?
  
  Thank you for pointing this out. We applied the inverted encoding model (IEM) method to MEG data from all active sensors (271) within defined time windows of 100 ms length. MEG data were recorded in two sessions on different days. Specifically, we constructed an encoding model with 18 motion direction-selective channels. Each channel was designed to show peak sensitivity to a specific motion direction, with gradually decreasing sensitivity to less similar directions. In a training step, the encoding model was fitted to the MEG data of one session to obtain a weight matrix that indicates how well the sensor activity can be explained by the modeled direction. In the testing step, the weight matrix was inverted and applied to the MEG data of the other session, resulting in a response profile of ‘reconstruction strengths’, i.e., how strongly each motion direction was present in a trial. When a specific motion direction was present in the MEG signal, the reconstruction strengths peaked at that specific direction and decreased with increasing direction difference. If no information was present, reconstruction strengths were comparable across all modeled directions, i.e., the response profile was flat. To integrate response profiles across trials, single-trial profiles were aligned to a common center direction (i.e., 180°) and then averaged.
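  
  The training and testing steps described above can be illustrated with a minimal sketch on simulated data. It is not the analysis code used in the study. The shape of the channel tuning function (a half-wave-rectified cosine raised to the power of the number of channels minus one), the noise level, the trial counts and all function names are assumptions made for illustration only.

```python
import numpy as np

N_CHANNELS = 18   # direction-selective channels, as described in the text
N_SENSORS = 271   # active MEG sensors, as described in the text

def channel_responses(directions_deg, n_channels=N_CHANNELS):
    """Idealized channel activations, shape (n_trials, n_channels).

    Assumption: half-wave-rectified cosine raised to the power
    n_channels - 1. The text does not specify the tuning function.
    """
    centers = np.arange(n_channels) * 360.0 / n_channels
    diff = np.deg2rad(np.asarray(directions_deg, dtype=float)[:, None] - centers)
    return np.maximum(np.cos(diff), 0.0) ** (n_channels - 1)

def train_weights(train_data, train_dirs):
    """Least-squares fit of train_data ~ C @ W.T; returns W (sensors x channels)."""
    C = channel_responses(train_dirs)
    W_T, *_ = np.linalg.lstsq(C, train_data, rcond=None)
    return W_T.T

def reconstruct(test_data, W):
    """Invert the weights to estimate channel responses (trials x channels)."""
    return test_data @ np.linalg.pinv(W).T

# Simulated stand-in for two sessions (hypothetical data, not the study's).
rng = np.random.default_rng(0)
W_true = rng.normal(size=(N_SENSORS, N_CHANNELS))

def simulate(dirs):
    signal = channel_responses(dirs) @ W_true.T
    return signal + 0.2 * rng.normal(size=signal.shape)

train_dirs = rng.uniform(0, 360, size=600)   # "session 1"
test_dirs = np.full(200, 120.0)              # "session 2": every trial shows 120 deg
W_hat = train_weights(simulate(train_dirs), train_dirs)
profile = reconstruct(simulate(test_dirs), W_hat).mean(axis=0)
peak_deg = np.argmax(profile) * 360.0 / N_CHANNELS
```

  In this toy example, `profile` holds the average reconstruction strength per channel and `peak_deg` is the preferred direction of the channel with the strongest response.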
  
  To quantify the accuracy of each IEM reconstruction, i.e., how well the response profile represents a specific motion direction relative to all other directions, we computed the ‘reconstruction fidelity’. Fidelity was obtained by projecting the polar vector of the reconstruction at every direction angle (in steps of 1°) onto the common center (180°) and averaging across all direction angles (Rademaker et al., 2019; Sprague, Ester & Serences, 2016). As such, ‘reconstruction fidelity’ is a summary metric, with fidelity greater than zero indicating an accurate reconstruction.
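  
  The fidelity computation can be sketched as follows. This is a minimal illustration, assuming a response profile sampled in 1° steps; the function name and the two toy profiles are hypothetical.

```python
import numpy as np

def reconstruction_fidelity(profile, center_deg=180.0):
    """Mean projection of the profile's polar vectors onto the center direction.

    profile: reconstruction strengths at equally spaced directions
    (here 0, 1, ..., 359 deg).
    """
    profile = np.asarray(profile, dtype=float)
    angles = np.arange(profile.size) * 360.0 / profile.size
    projection = profile * np.cos(np.deg2rad(angles - center_deg))
    return float(np.mean(projection))

angles = np.arange(360)
peaked = np.exp(-0.5 * ((angles - 180) / 30.0) ** 2)  # bump at the common center
flat = np.ones(360)                                   # no direction information

fidelity_peaked = reconstruction_fidelity(peaked)
fidelity_flat = reconstruction_fidelity(flat)
```

  A profile that peaks at the common center yields a positive fidelity, a flat profile yields a fidelity of approximately zero, and a profile that peaks opposite to the center yields a negative value.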
  
  How does the model know which direction to reconstruct? Our modelling procedure was informed about the stimulus in question during both the training and the testing step. Specifically, we informed our model during the training step about, e.g., the current S2. Then, we fit the model to training data from the S2 epoch and applied it to testing data from the S2 epoch. Crucially, during the testing step the motion direction in question, i.e., the current S2, becomes relevant again. For example, when S2 was 120°, the reconstructions were shifted by 60° in order to align with the common center, i.e., 180°. In addition, we also tested whether we could reconstruct the motion direction of S1 during the S2 epoch. Here, we again used the MEG data from the S2 epoch, but now trained the model on the S1 labels, i.e., the model was informed about the S1 direction. Accordingly, the recentering step during testing was done with regard to the S1 direction. Similarly, we also reconstructed the motion direction of the previous target (i.e., the previous S1 or S2), e.g., during the S2 epoch.
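  
  The recentering step can be illustrated with a short sketch, assuming profiles sampled in 1° steps. The function name and the toy profile are hypothetical.

```python
import numpy as np

CENTER_DEG = 180  # common center used for alignment

def recenter(profile, label_deg, center_deg=CENTER_DEG):
    """Circularly shift a 1-deg-resolution profile so label_deg lands on center_deg."""
    shift = int(round(center_deg - label_deg))
    return np.roll(np.asarray(profile), shift)

# Toy single-trial profile with all of its strength at 120 deg.
trial_profile = np.zeros(360)
trial_profile[120] = 1.0

aligned = recenter(trial_profile, 120)   # shifted by 60 deg, as in the example above
peak_after = int(np.argmax(aligned))
```

  The same function applies when the profile is recentered with respect to a different label (for instance, the S1 direction during the S2 epoch): only the `label_deg` argument changes, not the data.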
  
  Together, the multi-variate pattern of MEG activity across all sensors during the S2 epoch could contain information about the currently presented direction of S2, the direction of the preceding S1 and the direction of the target stimulus from the previous trial (i.e., either previous S1 or previous S2) at the same time. An important exception from this regime was the cross-reconstruction analysis (Appendix 1—figure 2). Here we trained the encoding model on the currently relevant item (S1 during the S1 epoch, S2 during the S2 epoch and the cued item during the retro-cue epoch) of one MEG session and reconstructed the previous target on the other MEG session.
  
  Finally, to examine shifts of the neural representation, single-trial reconstructions were assigned to two groups, those with a previous target that was oriented clockwise (CW) in relation to the currently relevant item and those with a previous target that was oriented counter-clockwise (CCW). The CCW reconstructions were flipped along the direction space, hence, a negative deviation of the maximum of the reconstruction from 180° indicated an attraction toward the previous target, whereas a positive deviation indicated a repulsion. Those reconstructions were then first averaged within each possible motion direction and then across them to account for different presentation numbers of the directions, resulting in one reconstruction per participant, epoch and time point. To examine systematic shifts, we then tested if the maximum of the reconstruction was systematically different from the common center (180°). For display purposes, we subtracted the reconstructed maximum from 180° to compute the direction shifts. A positive shift thus reflected attraction and a negative shift reflected repulsion.
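  
  The flipping and shift computation can be sketched as follows. This is a simplified illustration: it averages directly across trials and omits the intermediate averaging within each motion direction. Which sign of the wrapped direction difference corresponds to CW or CCW is an assumption here, and all names and toy profiles are hypothetical.

```python
import numpy as np

def signed_difference(previous_deg, current_deg):
    """Previous minus current direction, wrapped to [-180, 180)."""
    return (previous_deg - current_deg + 180.0) % 360.0 - 180.0

def mirror_about_center(profile):
    """Mirror a 360-point profile (1-deg steps) about index 180."""
    return np.roll(np.asarray(profile)[::-1], 1)

def attraction_shift(profiles, previous_deg, current_deg):
    """Return 180 minus the peak of the mean aligned profile.

    Profiles must already be centered on the current item at 180 deg.
    Positive values indicate attraction toward the previous target.
    """
    aligned = []
    for profile, prev, cur in zip(profiles, previous_deg, current_deg):
        # Assumption: trials with a positive wrapped difference are the
        # ones that get flipped, so the previous target always ends up
        # on the side below 180 deg.
        if signed_difference(prev, cur) > 0:
            profile = mirror_about_center(profile)
        aligned.append(profile)
    mean_profile = np.mean(aligned, axis=0)
    return 180 - int(np.argmax(mean_profile))

angles = np.arange(360)

def bump(peak):
    return np.exp(-0.5 * ((angles - peak) / 30.0) ** 2)

# Two toy trials with the current item at 100 deg, each pulled 4 deg
# toward its previous target (130 deg and 70 deg, respectively).
profiles = [bump(184), bump(176)]
shift = attraction_shift(profiles, previous_deg=[130, 70], current_deg=[100, 100])
```

  Without the flip, the two opposite deviations in this toy example would cancel and the mean profile would peak at 180°.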
  
  We have updated the Results accordingly.
  
  References:
  
  Rademaker RL, Chunharas C, Serences JT (2019) Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience. 22: 1336-1344.
  
  Sprague TC, Ester EF, Serences JT (2016) Restoring Latent Visual Working Memory Representations in Human Cortex. Neuron. 91: 694-707
  
  (3) Then in the Methods, it would help to provide further detail still about the IEM training/testing procedure. For instance, it's not entirely clear to me whether all the analyses use the same model (i.e., all trained on stimulus encoding) or whether each epoch and timepoint is trained on the corresponding epoch and timepoint from the other session. This speaks to whether the reconstructions reflect a shared stimulus code across different conditions vs. that stimulus information about various previous and current trial items can be extracted if the model is tailored accordingly.
  
  As reported above, our modeling procedure was informed about the same stimulus during both the training and the testing step, except for the cross-reconstruction analysis.
  
  Regarding the training and testing data, the model was always trained on data from one session and tested on data from the other session, so that each MEG session once served as the training data set and once as the test data set, hence, training and test data were independent. Importantly, training and testing was always performed in an epoch- and time point-specific way: For example, the model that was trained on the first 100-ms time bin from the S1 epoch of the first MEG session was tested on the first 100-ms time bin from the S1 epoch of the second MEG session.
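  
  The cross-session scheme can be sketched as a generic loop. This is a minimal illustration, assuming that each session is stored as a dictionary keyed by epoch and time bin; the data layout, the function name and the `fit` and `apply` callables are hypothetical placeholders for the actual model fitting and reconstruction steps.

```python
def cross_session_analysis(session_a, session_b, fit, apply):
    """Train on one session and test on the other, in both directions,
    separately for every (epoch, time_bin) key.

    session_a, session_b: dicts mapping (epoch, time_bin) -> (data, labels).
    fit(data, labels) returns a model; apply(model, data, labels) returns a result.
    """
    results = {}
    for train, test in ((session_a, session_b), (session_b, session_a)):
        for key, (train_data, train_labels) in train.items():
            test_data, test_labels = test[key]  # same epoch and same time bin
            model = fit(train_data, train_labels)
            results.setdefault(key, []).append(apply(model, test_data, test_labels))
    return results

# Toy sessions in which strings stand in for the MEG data of one time bin.
session_1 = {("S1", 0): ("A:S1:0", None), ("S1", 1): ("A:S1:1", None)}
session_2 = {("S1", 0): ("B:S1:0", None), ("S1", 1): ("B:S1:1", None)}

pairs = cross_session_analysis(
    session_1,
    session_2,
    fit=lambda data, labels: data,                  # the "model" just remembers its training bin
    apply=lambda model, data, labels: (model, data),
)
```

  Each key in `pairs` then lists two (training bin, testing bin) pairs, one per training direction, and both always refer to the same epoch and time bin.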
  
  Specifically, when you say "aim of the reconstruction" (p. 31, line 699), does that simply mean the reconstruction was centered in that direction (that the same data would go into reconstructing S1 or S2 in a given epoch, and what would differentiate between them is whether the reconstruction was centered to the S1 or S2 direction value)?
  
  As reported above, during testing the reconstruction was centered at the currently relevant direction. The encoding model was trained with the direction labels of S1, S2 or the target item, corresponding to the currently relevant direction, i.e., S1 in S1 epochs, S2 in S2 epochs and target item (S1 or S2) in the retro-cue epoch. The only exception was the reconstruction of S1 during the S2 epoch. Here the encoding model was trained on the S1 direction, but with data from the S2 epoch and then applied to the S2 epoch data and recentered to the S1 direction. So here, S1 and S2 were indeed trained and tested separately for the same epoch.
  
  (4) I think training and testing were done separately for each epoch and timepoint, but this could have important implications for interpreting the results. Namely if the models are trained and tested on different time points, and reference directions, then some will be inherently noisier than others (e.g., delay period more so than encoding), and potentially more (or differently) susceptible to bias. For instance, the S1 and S2 epochs show no attractive bias, but they may also be based on more high-fidelity training sets (i.e., encoding), and therefore less susceptible to the bias that is evident in the retrocue epoch.
  
  Thanks for pointing this out. Training and testing were performed in an epoch- and time point-specific way. Thus, potential differences in the signal-to-noise ratio between different task phases could cause quality differences between the corresponding reconstructed MEG signals. However, we did not observe such differences. Instead, we found comparable time courses of the reconstruction fidelities and the averaged reconstruction strengths between epochs (Figure 2b and 2c, respectively). Fig. 2b, e.g., shows that reconstruction fidelity for motion direction stimuli built up slowly during the stimulus presentation, reaching its maximum only after stimulus offset. This observation may contrast with other stimulus materials that show faster build-ups, such as the orientation of a Gabor.
  
  We agree with the reviewer that, regardless of the comparable but not perfectly equal reconstruction fidelities, there are good arguments to assume that the neural representation of the stimulus during its encoding is typically less noisy than during its post-encoding processing and that this difference could be one of the reasons why serial dependence emerged in our study only during the retro-cue epoch. However, the argument could also be reversed: a biased representation, which represents a small and hard-to-detect neural effect, might be easier to observe for less noisy data. So, the fact that we found a significant bias only during the potentially “noisier” retro-cue epoch makes the effect even more noteworthy.
  
  We mentioned the limitation related to our stimulus material already at the end of the Discussion. We have now added a new paragraph to the Discussion to address the two opposing lines of reasoning.
  
  (4) I believe the work would benefit from a further effort to reconcile these results with previous findings (i.e., those that showed repulsion, like Sheehan & Serences), potentially through additional analyses. The discussion attributes the difference in findings to the "combination of a retro-cue paradigm with the high temporal resolution of MEG," but it's unclear how that explains why various others observed repulsion (thought to happen quite early) that is not seen at any stage here. In my view, the temporal (as well as spatial) resolution of MEG could be further exploited here to better capture the early vs. late stages of processing. For instance, by separately examining earlier vs. later time points (instead of averaging across all of them), or by identifying and analyzing data in the sensors that might capture early vs. late stages of processing. Indeed, the S1 and S2 reconstructions show subtle repulsion, which might be magnified at earlier time points but then shift (toward attraction) at later time points, thereby counteracting any effect. Likewise, the S1 reconstruction becomes biased during the S2 epoch, consistent with previous observations that the SD effects grow across a WM delay. Maybe both S1 and S2 would show an attractive bias emerging during the later (delay) portion of their corresponding epoch? As is, the data nicely show that an attractive bias can be detected in the retrocue period activity, but they could still yield further specificity about when and where that bias emerges.
  
  We are grateful for this suggestion. Before going into detail, we would like to explain our motivation for choosing the present analysis approach that included averaging time points within an epoch of interest.
  
  Our aim was to detect a neuronal signature of serial dependence, which manifests as an attractive shift of about 3.5° within the 360° direction space. To detect such a small effect in the neural data, given the limited resolution of the reconstruction method and the noisy MEG signals, we needed to maximize the signal-to-noise ratio. A common way to achieve this is to average data points. In our study, we asked subjects to perform 1022 trials, down-sampled the MEG data from the recorded sampling rate of 1200 Hz to 10 Hz (one data point per 100 ms) for the estimation of reconstruction fidelity, and calculated the final neural shift estimates by averaging across time points that showed a robust reconstruction fidelity and thus represented interpretable data points.
  
  Our procedure to maximize the signal-to-noise ratio was successful as we were able to reliably reconstruct the presented and remembered motion direction in all epochs (Figure 1a and 1b in the manuscript). However, the reconstruction did not work equally well for all time points within each epoch. In particular, there were time points with a non-significant reconstruction fidelity. In consequence, for the much smaller neural shift effect we did not expect to observe reliable time-resolved results, i.e., when considering each time point separately. Instead, we used the reconstruction results to define the time window in order to calculate the neural shift, i.e., we averaged across all time points with a significant reconstruction fidelity.
  
  Author response image 1 depicts the neural shift separately for each time point during the retro-cue epoch. Importantly, the gray parts of the time courses indicate time points where the reconstruction of the presented or cued stimulus was not significant. This means that the reconstructed maxima at those time points were very variable/unreliable and therefore the neural shifts were hardly interpretable.
  
  Author response image 1.
  
  Time courses of the reconstruction shift reveal a tendency for an attractive bias during the retrocue phase. Time courses of the neural shift separately for each time point during the S1 (left panel), S2 (middle panel) and retro-cue epochs (right panel). Gray lines indicate time points with non-significant reconstruction fidelities and therefore very variable and non-interpretable neural reconstruction shifts. The colored parts of the lines correspond to the time periods of significant reconstruction fidelities with interpretable reconstruction shifts. Error bars indicate the middle 95% of the resampling distribution. Time points with less than 5% (equaling p < .05) of the resampling distribution below 0° are indicated by a colored circle. N = 10.
  
  First, the time courses in Author response image 1 show that the neural bias varied considerably between subjects at given time points, as revealed by the resampling distributions. In this resampling procedure, we drew 10 participants with replacement in each of 10,000 iterations and calculated the reconstruction shift based on the mean reconstruction of the resampled participants. The observed variability stresses the necessity to average the values across all time points that showed a significant reconstruction fidelity to increase the signal-to-noise ratio.
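  
  The resampling procedure can be sketched as follows. This is a minimal illustration on simulated profiles, not the study's code or data; the individual shift values, the profile width and all names are made up for the example.

```python
import numpy as np

def bootstrap_shifts(profiles, n_iter=10_000, seed=0):
    """Resample participants with replacement and compute the shift
    (180 minus the peak of the mean profile) in every iteration.

    profiles: array (n_participants, 360), centered on 180 deg.
    """
    rng = np.random.default_rng(seed)
    n = profiles.shape[0]
    shifts = np.empty(n_iter)
    for i in range(n_iter):
        sample = profiles[rng.integers(0, n, size=n)]
        shifts[i] = 180 - np.argmax(sample.mean(axis=0))
    return shifts

# Simulated participants (hypothetical values): Gaussian profiles whose
# peaks deviate from 180 deg by an individual shift.
angles = np.arange(360)
individual_shifts = np.array([6, 2, 5, -1, 4, 3, 7, 0, 4, 5])
peaks = 180 - individual_shifts
profiles = np.exp(-0.5 * ((angles[None, :] - peaks[:, None]) / 30.0) ** 2)

shifts = bootstrap_shifts(profiles)
lower, upper = np.percentile(shifts, [2.5, 97.5])  # middle 95% of the distribution
fraction_below_zero = np.mean(shifts < 0)          # compared against .05
```

  The interval between `lower` and `upper` corresponds to the error bars described for Author response image 1, and `fraction_below_zero` to the criterion used to mark individual time points.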
  
  Second, despite this high variability/low signal-to-noise ratio, Author response image 1 (right panel) shows that our choice for this procedure was sensible, as it revealed a clear tendency of an attractive shift at almost all time points from 300 to 1500 ms after retro-cue onset, with only a few individual time points showing a significant effect (uncorrected for multiple comparisons). It is worth mentioning that this time course did not overlap with the time course of previous target cross-reconstruction (Appendix 1—figure 2, right panel), as there was no significant target cross-reconstruction during the retro-cue epoch, with an almost flat profile around zero. Also, there was no overlap with previous target decoding in the retro-cue epoch (Figure 5 in the manuscript). Here, the previous target was reactivated significantly only at early time points of 200 and 300 ms post cue onset (i.e., at time points with a non-significant reconstruction fidelity and therefore no interpretable neural shift), while the nominally highest values of the attractive neural shift were visible at later time points that also showed a significant reconstruction fidelity (Figure 2b in the manuscript).
  
  Third, Author response image 1 (left and middle panel) shows the time courses of the neural shift during the S1 and S2 epochs. While no neural shift could be observed for S1, during the S2 epoch the time-resolved analysis indicated an initial attractive shift followed by a (nonsignificant) tendency for a repulsive shift. After averaging neural shifts across time points with a significant reconstruction fidelity, there was no significant effect with an overall tendency for repulsion, as reported in the paper. The attractive part of the neural shift during the S2 epoch was nominally strongest at very early time points (at 100-300 ms after S2 onset) and overlapped perfectly with the reactivation of the previous target as shown by the cross-reconstruction analysis (Appendix 1—figure 2, middle panel). This overlap suggests that the neural attractive shift did not reflect an actual bias of the early S2 representation, but rather a consequence of the concurrent reactivation of the previous target in the same neural code as the current representation. Finally, this neural attractive shift during S2 presentation did not correlate with the behavioral error (single trial-wise correlation: no significant time points during S2 epoch) or the behavioral bias (subject-wise correlation). In contrast, for the retro-cue epoch, we observed a significant correlation between the neural attractive shift and behavior.
  
  Together, the time-resolved results show a clear tendency for an attractive neural bias during the retro-cue phase, thus supporting our interpretation that the attractive shift during the retro-cue phase reflects a direct neuronal signature of serial dependence. However, these additional analyses also demonstrated a large variability between participants and across time points, warranting a cautious interpretation. We conclude that our initial approach of averaging across time points was an appropriate way of reducing the high level of noise in the data and revealed the reported significant and robust attractive neural shift in the retrocue phase.
  
  (5) A few other potentially interesting (but inessential) considerations: A benchmark property of serial dependence is its feature-specificity, in that the attractive bias occurs only between current and previous stimuli that are within a certain range of similarity to each other in feature space. I would be very curious to see if the neural reconstructions manifest this principle - for instance, if one were to plot the trialwise reconstruction deviation from 0, across the full space of current-previous trial distances, as in the behavioral data. Likewise, something that is not captured by the DoG fitting approach, but which this dataset may be in a position to inform, is the commonly observed (but little understood) repulsive effect that appears when current and previous stimuli are quite distinct from each other. As in, Figure 1b shows an attractive bias for direction differences around 30 degrees, but a repulsive one for differences around 170 degrees - is there a corresponding neural signature for this component of the behavior?
  
  We appreciate the reviewer's idea to split the data. However, given that our results strongly relied on the inclusion of all data points, i.e., including all distances in motion direction between the current S1, S2 or target and the previous target and requiring data averaging, we are concerned that our study was vastly underpowered to be able to inform whether the attractive bias occurs only within a certain range of inter-stimulus similarity. To address this important question, future studies would require neural measurements with much higher signal-to-noise-ratio than the present MEG recordings with two sessions per participant and 1022 trials in total.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The study aims to probe the neural correlates of visual serial dependence - the phenomenon that estimates of a visual feature (here motion direction) are attracted towards the recent history of encoded and reported stimuli. The authors utilize an established retro-cue working memory task together with magnetoencephalography, which allows to probe neural representations of motion direction during encoding and retrieval (retro-cue) periods of each trial. The main finding is that neural representations of motion direction are not systematically biased during the encoding of motion stimuli, but are attracted towards the motion direction of the previous trial's target during the retrieval (retro-cue period), just prior to the behavioral response. By demonstrating a neural signature of attractive biases in working memory representations, which align with attractive behavioral biases, this study highlights the importance of post-encoding memory processes in visual serial dependence.
  
  Strengths:
  
  The main strength of the study is its elegant use of a retro-cue working memory task together with high temporal resolution MEG, enabling to probe neural representations related to stimulus encoding and working memory. The behavioral task elicits robust behavioral serial dependence and replicates previous behavioral findings by the same research group. The careful neural decoding analysis benefits from a large number of trials per participant, considering the slow-paced nature of the working memory paradigm. This is crucial in a paradigm with considerable trial-by-trial behavioral variability (serial dependence biases are typically small, relative to the overall variability in response errors). While the current study is broadly consistent with previous studies showing that attractive biases in neural responses are absent during stimulus encoding (previous studies reported repulsive biases), to my knowledge it is the first study showing attractive biases in current stimulus representations during working memory. The study also connects to previous literature showing reactivations of previous stimulus representations, although the link between reactivations and biases remains somewhat vague in the current manuscript. Together, the study reveals an interesting avenue for future studies investigating the neural basis of visual serial dependence.
  
  Weaknesses:
  
  (1) The main weakness of the current manuscript is that the authors could have done more analyses to address the concern that their neural decoding results are driven by signals related to eye movements. The authors show that participants' gaze position systematically depended on the current stimuli's motion directions, which together with previous studies on eye movement-related confounds in neural decoding justifies such a concern. The authors seek to rule out this confound by showing that the consistency of stimulus-dependent gaze position does not correlate with (a) the neural reconstruction fidelity and (b) the repulsive shift in reconstructed motion direction. However, both of these controls do not directly address the concern. If I understand correctly the metric quantifying the consistency of stimulus-dependent gaze position (Figure S3a) only considers gaze angle and not gaze amplitude. Furthermore, it does not consider gaze position as a function of continuous motion direction, but instead treats motion directions as categorical variables. Therefore, assuming an eye movement confound, it is unclear whether the gaze consistency metric should strongly correlate with neural reconstruction fidelity, or whether there are other features of eye movements (e.g., amplitude differences across participants, and tuning of gaze in the continuous space of motion directions) which would impact the relationship with neural decoding. Moreover, it is unclear whether the consistency metric, which does not consider history dependencies in eye movements, should correlate with attractive history biases in neural decoding. 
It would be more straightforward if the authors would attempt to (a) directly decode stimulus motion direction from x-y gaze coordinates and relate this decoding performance to neural reconstruction fidelity, and (b) investigate whether gaze coordinates themselves are history-dependent and are attracted to the average gaze position associated with the previous trials' target stimulus. If the authors could show that (b) is not the case, I would be much more convinced that their main finding is not driven by eye movement confounds.
  
  The reviewer is correct that our eye-movement analysis approach considered gaze angle (direction) and not gaze amplitude. We considered gaze direction to be the more important feature to control for when investigating the neural basis of serial dependence that manifests, given the stimulus material used in our study, as a shift/deviation of angle/direction of a representation towards the previous target motion direction. To directly relate gaze direction and MEG data to each other, we matched the temporal resolution of the eye-tracking data to that of the MEG data. Specifically, our analysis of gaze direction provided a measure of the extent to which the variance of the gaze directions was reduced compared with random gaze direction patterns, in relation to the specific stimulus direction within each 100 ms time bin. Importantly, this procedure revealed not only systematic gaze directions that were in accordance with the stimulus direction or the opposite direction, but also all other stimulus-related gaze directions, even if the relation differed across participants or time.
  
  Our analysis approach was highly sensitive to detect stimulus-related gaze directions during all task phases (Appendix 1—figure 3). As expected, we found systematic gaze directions when S1 and S2 were presented on the screen, and they were reduced thereafter, indicating a clear relationship between stimulus presentation and eye movement. Systematic gaze directions were also present in the retro-cue phase where no motion direction was presented. Here they showed a clearly different temporal dynamic as compared to the S1 and S2 phases. They appeared at later time points and with a higher variability between participants, indicating that they coincided with retrieving the target motion direction from working memory.
  
  To relate gaze directions with MEG results, we calculated Spearman rank correlations. We found that there was no systematic relationship at any time point between the stimulus related reconstruction fidelity and the amount of stimulus-related gaze direction. Even more, the correlation varied strongly from time point to time point revealing its random nature. In addition to the lack of significant correlations, we observed clearly distinct temporal profiles for gaze direction (Appendix 1—figure 3a and Appendix 1—figure 3b) and the reconstruction fidelities (Figure 2b in the manuscript, Appendix 1—figure 3c), in particular in the critical retro-cue phase.
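  
  The correlation analysis can be illustrated with a short sketch that computes one across-participant Spearman rank correlation per time bin. It is a simplified stand-in for the actual analysis: the ranking helper assumes no tied values, significance testing is omitted, and the array layout and all names are assumptions.

```python
import numpy as np

def ranks(values):
    """Ranks 0..n-1. Assumption: no ties (ties would need average ranks)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    out = np.empty(values.size)
    out[order] = np.arange(values.size)
    return out

def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

def rho_per_timepoint(fidelity, gaze_measure):
    """Across-participant correlation for every time bin.

    fidelity, gaze_measure: arrays of shape (n_participants, n_timepoints).
    """
    n_timepoints = fidelity.shape[1]
    return np.array(
        [spearman_rho(fidelity[:, t], gaze_measure[:, t]) for t in range(n_timepoints)]
    )

# Hypothetical random data: 10 participants, 20 time bins of 100 ms each.
rng = np.random.default_rng(1)
rho = rho_per_timepoint(rng.normal(size=(10, 20)), rng.normal(size=(10, 20)))
```

  With unrelated inputs, as in this toy example, the resulting coefficients fluctuate around zero from one time bin to the next.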
  
  We favored this analysis approach over one that directly decoded stimulus motion direction from x-y gaze coordinates, as we considered it hardly feasible to compute an inverted encoding model with only two eye-tracker channels as an input (in comparison to 271 MEG sensors), and to our knowledge, this has not been done before. Other decoding methods have previously been applied to x-y gaze coordinates. However, in contrast to the inverted encoding model, they did not provide a measure of the representation shift which would be crucial for our investigation of serial dependence.
  
  We appreciate the suggestion to conduct additional analyses on eye tracking data (including different temporal and spatial resolution and different features) and their relation to MEG data. However, the first author, who ran all the analyses, has in the meantime left academia. Unfortunately, we currently do not have sufficient resources to perform additional analyses.
  
  While the presented eye movement control analysis makes us confident that our MEG finding was not crucially driven by stimulus-related gaze directions, we agree with the reviewer that we cannot completely exclude that other eye movement-related features could have contributed to our MEG findings. However, we would like to stress that whatever the main source of the observed MEG effect was (a shift of the neuronal stimulus representation, (other) features of gaze movement, or a shift of the neuronal stimulus representation that leads to systematic gaze movement), our study still provided clear evidence that serial dependence emerged at a later post-encoding stage of object processing in working memory. This central finding of our study is hard to observe with behavioral measures alone and is not affected by the possible effects of eye movements.
  
  We have slightly modified our conclusion in the Results and Appendix 1. Please see also our response to comment 1 from reviewer 3.
  
  (2) I am not convinced by the across-participant correlation between attractive biases in neural representations and attractive behavioral biases in estimation reports. One would expect a correlation with the behavioral bias amplitude, which is not borne out. Instead, there is a correlation with behavioral bias width, but no explanation of how bias width should relate to the bias in neural representations. The authors could be more explicit in their arguments about how these metrics would be functionally related, and why there is no correlation with behavioral bias amplitude.
  
  We are grateful for this suggestion. We correlated the individual neuronal shift with the two individual parameter fits of the behavior shift, i.e., amplitude (a) and tuning width (w). We found a significant correlation between the individual neural bias and the w parameter (r = .70, p = .0246) but not with the a parameter (r = -.35, p = .3258) during the retro-cue period (Appendix 1—figure 1). This indicates that a broader tuning width of the individual bias (as reflected by a smaller w parameter) was associated with a stronger individual neural attraction.
  
  It is important to note that for the calculation of the neural shift, all trials entered the analysis to increase the signal-to-noise ratio, i.e., it included many trials where current and previous targets were separated by, e.g., 100° or more. These trials were unlikely to produce serial dependence. Subjects with a more broadly tuned serial dependence had more interitem differences that showed a behavioral attraction and therefore more trials affected by serial dependence that entered the calculation of the neural shift. In contrast, individual differences in the amplitude (a) parameter were most likely too small, and higher individual amplitude did not involve more trials as compared to smaller amplitude to affect the neural bias in a way to be observed in a significant correlation.
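  
  The argument about tuning width can be illustrated numerically. The sketch below assumes the first-derivative-of-Gaussian (DoG) parametrization that is commonly used in the serial dependence literature, in which a smaller w corresponds to broader tuning; the exact parametrization used for the fits is not stated in this response, so the formula, the parameter values and the function names are assumptions.

```python
import numpy as np

def dog(x_deg, a, w):
    """Assumed DoG form: y = x * a * w * c * exp(-(w * x)**2),
    with c = sqrt(2) / exp(-0.5), so that the peak height equals a."""
    c = np.sqrt(2.0) / np.exp(-0.5)
    x = np.asarray(x_deg, dtype=float)
    return x * a * w * c * np.exp(-((w * x) ** 2))

def peak_location(w):
    """Direction difference (deg) at which the assumed DoG peaks."""
    return 1.0 / (np.sqrt(2.0) * w)

def fraction_of_biased_trials(a, w):
    """Share of direction differences (uniform over -180..179 deg) whose
    predicted bias exceeds half of the amplitude."""
    x = np.arange(-180, 180)
    return float(np.mean(np.abs(dog(x, a, w)) > 0.5 * abs(a)))

broad = fraction_of_biased_trials(a=3.5, w=0.02)    # broader tuning (smaller w)
narrow = fraction_of_biased_trials(a=3.5, w=0.04)   # narrower tuning (larger w)
```

  Under these assumptions, the share of direction differences with a sizeable predicted bias grows as w decreases, whereas changing the amplitude a leaves this share unchanged.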
  
  We have added this explanation to Appendix 1.
  
  (3) The sample size (n = 10) is definitely at the lower end of sample sizes in this field. The authors collected two sessions per participant, which partly alleviates the concern. However, given that serial dependencies can be very variable across participants, I believe that future studies should aim for larger sample sizes.
  
  We want to express our appreciation for raising this issue. We apologize that we did not explicitly explain and justify the choice of sample size in our paper, in particular as we had in fact performed a formal a-priori power analysis.
  
  At the time of the sample size calculation, there were no comparable EEG or MEG studies to inform our power calculation. Thus, we based our calculation merely on the behavioral effect reported in the literature and, in particular, observed in a behavioral study from our lab that included four different experiments with overall more than 100 participants with 1632 trials each (see Fischer et al., 2020), in which the behavioral serial dependence effect (target vs. nontarget) was very robust. Based on the contrast between target and non-target with an effect size of 1.359 in Experiment 1, a power analysis with 80% desired power led to a small, estimated sample size of 6 subjects.
  
  However, we expected that the detection of the neural signature of this effect would require more participants. Therefore, we based our power calculation on a much smaller behavioral effect, i.e. the modulation of serial dependence by the context-feature congruency that we observed in our previous study (Fischer et al., 2020). In particular, we focused on Experiment 1 of the previous study that used color as the feature for retro-cueing, as we planned to use exactly the same paradigm for the MEG study. In contrast to the serial dependence effect, its modulation by color resulted in a more conservative power estimate: Based on an effect size of 0.856 in that experiment, a sample size of n = 10 should yield a power of 80% with two MEG sessions per subject.
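  
  The logic of such an a-priori calculation can be sketched as follows. This is a minimal illustration using a normal approximation for a one-sample or paired comparison; the function name `required_n`, the two-sided test, and the alpha level of 0.05 are assumptions, not details given in the response. Because the software, test type, sidedness, and handling of the two sessions are not specified here, the sketch is not expected to reproduce the reported sample sizes of 6 and 10 exactly.

```python
from math import ceil
from statistics import NormalDist


def required_n(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size for a one-sample or paired comparison.

    Uses the normal approximation n = ((z_alpha + z_beta) / d) ** 2.
    A t-based calculation gives slightly larger values for small n.
    All defaults are illustrative assumptions.
    """
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail)   # critical value of the test
    z_beta = z.inv_cdf(power)       # quantile matching the desired power
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)


# Effect sizes quoted in the response (Fischer et al., 2020, Experiment 1).
n_main_effect = required_n(1.359)   # target vs. non-target contrast
n_modulation = required_n(0.856)    # modulation by context-feature congruency
```

  The smaller effect size requires the larger sample, which is why basing the calculation on the modulation effect is the more conservative choice.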
  
  At the time when we conducted our study, two other studies were published that investigated serial dependence on the neural level. Both studies included a smaller number of data points than our study: Sheehan & Serences (2022) recorded about 840 trials in each of 6 participants, resulting in fewer data points both on the participant and on the trial level. Hajonides et al. (2023) measured 20 participants with 400 trials each, again resulting in fewer datapoints than our study (10 participants with 1022 trials each). Taken together, our a-priori sample size estimation resulted in comparable if not higher power as compared to other similar studies, making us feel confident that the estimated sample was sufficient to yield reliable results.
  
  We have now included this description and the results of this power analysis in the Materials and Methods section.
  
  Despite this, we fully agree with the reviewer that our study would profit from higher power. With the knowledge of the results from this study, future projects should attempt to substantially increase the signal-to-noise ratio, in particular by increasing the number of trials, in order to observe, e.g., robust time-resolved effects (see our comments to review 1).
  
  References:
  
  Fischer C, Czoschke S, Peters B, Rahm B, Kaiser J, Bledowski C (2020) Context information supports serial dependence of multiple visual objects across memory episodes. Nature Communications 11: 1932.
  
  Sheehan TC, Serences JT (2022) Attractive serial dependence overcomes repulsive neuronal adaptation. PLOS Biology 20: e3001711.
  
  Hajonides JE, Van Ede F, Stokes MG, Nobre AC, Myers NE (2023) Multiple and Dissociable Effects of Sensory History on Working-Memory Performance. Journal of Neuroscience 43: 2730–2740.
  
  (4) It would have been great to see an analysis in source space. As the authors mention in their introduction, different brain areas, such as PPC, mPFC, and dlPFC have been implicated in serial biases. This begs the question of which brain areas contribute to the serial dependencies observed in the current study. For instance, it would be interesting to see whether attractive shifts in current representations and pre-stimulus reactivations of previous stimuli are evident in the same or different brain areas.
  
  We appreciate this suggestion. As mentioned above, we currently do not have sufficient resources to perform a MEG source analysis.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This study identifies the neural source of serial dependence in visual working memory, i.e., the phenomenon that recall from visual working memory is biased towards recently remembered but currently irrelevant stimuli. Whether this bias has a perceptual or postperceptual origin has been debated for years - the distinction is important because of its implications for the neural mechanism and ecological purpose of serial dependence. However, this is the first study to provide solid evidence based on human neuroimaging that identifies a post-perceptual memory maintenance stage as the source of the bias. The authors used multivariate pattern analysis of magnetoencephalography (MEG) data while observers remembered the direction of two moving dot stimuli. After one of the two stimuli was cued for recall, decoding of the cued motion direction re-emerged, but with a bias towards the motion direction cued on the previous trial. By contrast, decoding of the stimuli during the perceptual stage was not biased.
  
  Strengths:
  
  The strengths of the paper are its design, which uses a retrospective cue to clearly distinguish the perceptual/encoding stage from the post-perceptual/maintenance stage, and the rigour of the careful and well-powered analysis. The study benefits from high within-participant power through the use of sensitive MEG recordings (compared to the more common EEG), and the decoding and neural bias analysis are done with care and sophistication, with appropriate controls to rule out confounds.
  
  Weaknesses:
  
  A minor weakness of the study is the remaining (but slight) possibility of an eye movement confound. A control analysis shows that participants make systematic eye movements that are aligned with the remembered motion direction during both the encoding and maintenance phases of the task. The authors go some way to show that this eye gaze bias seems unrelated to the decoding of MEG data, but in my opinion do not rule it out conclusively. They merely show that the strengths of the gaze bias and the strength of MEG-based decoding/neural bias are uncorrelated across the 10 participants. Therefore, this argument seems to rest on a null result from an underpowered analysis.
  
  Our MEG as well as eye-movement analyses showed that both were sensitive enough to robustly pick up stimulus-related effects, both for presented and remembered motion directions. When relating both signals to each other by correlating MEG reconstruction strength with gaze direction, we found a null effect, as pointed out by the reviewer. Importantly, there was also a null effect when the shift of the reconstruction (representing our main finding) was correlated with gaze direction. Furthermore, an examination of the individual time courses of gaze direction and individual MEG reconstruction strength revealed that the lack of a relationship between MEG and gaze data did not rest on a singular observation but was present across all time points. Moreover, the temporal profile of the correlation varied strongly from time point to time point, revealing its random nature and indicating that there was no hint of a pattern that just failed to reach significance. Taking these observations together, our MEG findings were unlikely to be explained by eye position.
  
  Nevertheless, we agree with the reviewer that there is a general problem of interpreting a null effect with a limited number of observations (and an analysis approach that focused on one out of many possible features of the gaze movement). Thus, we admit that there is a (slight) possibility that eye movements contributed to the observed MEG effects. This possibility, however, did not affect our novel finding that serial dependence occurred during the post-encoding stage of object processing in working memory.
  
  Please see also our response to point 1 from reviewer 2.
  
  Impact:
  
  This important study contributes to the debate on serial dependence with solid evidence that biased neural representations emerge only at a relatively late post-perceptual stage, in contrast to previous behavioural studies. This finding is of broad relevance to the study of working memory, perception, and decision-making by providing key experimental evidence favouring one class of computational models of how stimulus history affects the processing of the current environment.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Minor concerns:
  
  The significance statement opens "Our perception is biased towards sensory input from the recent past." This is a semantic point, but it seems a somewhat odd statement, given there is so much debate about whether serial dependence is perceptual vs. decisional, and that the current work indeed claims that it emerges at a late, post-encoding stage.
  
  Thank you for this point. We agree. “Visual cognition is biased towards sensory input from the recent past.” would be a more appropriate statement. According to the Journal's guidelines, however, the paragraph with the Significance Statement will not be included in the final manuscript.
  
  It would be preferable for data and code to be available at review so that reviewers might verify some procedural points for clarity.
  
  Code and preprocessed data used for the presented analyses are now available on OSF via http://osf.io/yjc93/. Due to storage limitations, only the preprocessed MEG data for the main IEM analyses focusing on the current direction are uploaded. For access to additional data, please contact the authors.
  
  For instance, I could use some clarification on the trial sequence. The methods first say the direction was selected randomly, but then later say each direction occurred equally often, and there were restrictions on the relationships between current and previous trial items. So it seems it couldn't have truly been random direction selection - was the order selected randomly from a predetermined set of possibilities?
  
  For the S1/S2 stimuli in a trial, the dots moved fully coherently in a direction randomly drawn from a pool of directions between 5° and 355° spaced 10° from one another, therefore avoiding cardinal directions. Across trials, there was a predetermined set of possible differences in motion direction between the current and the previous target. This set included 18 motion direction differences, ranging from -170° to 180°, in steps of 10°. Trial sequences were balanced in a way that each of these differences occurred equally often during a MEG session.
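  
  The direction pool and the wrapped difference between previous and current target can be illustrated with a short sketch. The names `direction_pool` and `signed_circular_diff`, and the sign convention (previous minus current), are assumptions made for illustration; the response does not state how the difference was signed or how balanced sequences were generated.

```python
# Pool of motion directions: 5°, 15°, ..., 355° (10° spacing, no cardinals).
direction_pool = list(range(5, 360, 10))


def signed_circular_diff(previous_deg, current_deg):
    """Previous minus current direction, wrapped into (-180, 180].

    The sign convention is an illustrative assumption.
    """
    diff = (previous_deg - current_deg) % 360
    return diff - 360 if diff > 180 else diff


# Differences between pool directions are always multiples of 10°.
example = signed_circular_diff(5, 355)  # crossing 0° wraps to a small value
```

  Wrapping matters because directions such as 5° and 355° are only 10° apart on the circle, even though their raw difference is 350°.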
  
  I could also use some additional assurance the sample size (participants or data points) is sufficient for the analysis approach deployed here.
  
  We performed a formal a-priori power analysis to justify our choice for the sample size. Please see our response to reviewer 2, point 3, where we explained the procedure of the a-priori power analysis in detail. We have now included this description and the results of this power analysis in the Materials and Methods.
  
  Did you consider a decoding approach, instead of reconstruction, to test what information predominates the signal, in an unbiased way?
  
  Thank you for this argument. We believe our analysis approach based on the inverted encoding model to be unbiased, since we first reconstructed whether the MEG signal contained information about the presented and remembered motion direction. Only in the next step did we test whether this reconstructed signal showed an offset and, if so, whether this offset was biased towards or away from the previous target. A decoding approach aims to answer classification questions and is not suitable to reveal the actual shifts of the neural information. In our study, we could decode, e.g., the current direction or the previous target, but this would not answer the question of whether and at which stage of object processing the current representation was biased towards the past. Moreover, in a decoding approach to reveal which information predominates in the signal, we would have to classify different options (e.g. current information vs previous), thereby biasing the possible set of results more than in our chosen analysis.
  
  I think the claim of a "direct" neural signature may come off as an overstatement when the spatial and temporal aspects of the attractive bias are still so coarsely specified here.
  
  Thank you for pointing this out. We agree that the term “direct neural signature” can be seen as an overstatement when it is interpreted to indicate a narrowly defined activity of a brain region (ideally via “direct” invasive recordings) that reflects serial dependence. Our definition of the term “direct” referred to the observation of an attractive shift in a neural representation of the current target motion direction item towards the previous target. This was in contrast to previous “indirect” evidence for the neural basis of serial dependence based on either repulsive shifts of neural representations that were opposite to the attractive bias in behavior or on a reactivation of previous information in the current trial without presenting evidence for the actual neural shift. With this definition in mind, we consider the title of our study a valid description of our findings.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I was wondering why the authors chose a bootstrap test for their neural bias analysis instead of a permutation test, similar to the one they used for their behavioral analysis. As far as I know, bootstrap tests do not provide guaranteed type-1 error rate control. The procedure for the permutation test would be quite straightforward here, randomly permuting the sign of each participant's neural shift and recording the group-average shift in a permutation distribution. This test seems more adequate and more consistent with the behavioral analysis.
  
  Thank you for this comment. We adapted a resampling approach (bootstrapping) that was similar to that by Ester et al. (2020) who also investigated categorical biases and also applied a reconstruction method (Inverted Encoding Model) to assess significance of a bias of the reconstructed orientation against zero in a certain direction. The bootstrapping method relied on a) detecting an offset against zero and b) evaluating the robustness of the observed effect across participants. In contrast, a permutation approach, as suggested by the reviewer, assesses whether an empirical neural shift is more extreme than the permutation distribution. The permutation approach seems more suited to assess the magnitude of the shift which in our study was not a priority. Therefore, we reasoned that the bootstrapping for our inference statistics was better suited to assess the direction of the neural shift and its robustness across participants.
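  
  The two procedures contrasted here can be sketched side by side. This is a minimal illustration, not the authors' pipeline: the function names, the one-sided direction of both tests, the number of bootstrap resamples, and the example shift values are all hypothetical.

```python
import random
from itertools import product


def sign_flip_p(shifts):
    """Exact one-sided sign-flip permutation test on the group mean shift.

    Enumerates every assignment of signs to the participants' shifts and
    returns the fraction whose summed shift is at least the observed one.
    """
    observed = sum(shifts)
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(shifts)):
        n_total += 1
        if sum(s * x for s, x in zip(signs, shifts)) >= observed:
            n_extreme += 1
    return n_extreme / n_total


def bootstrap_p(shifts, n_boot=10_000, seed=0):
    """Fraction of participant resamples whose mean shift is not positive."""
    rng = random.Random(seed)
    n = len(shifts)
    not_positive = 0
    for _ in range(n_boot):
        resample = rng.choices(shifts, k=n)  # draw participants with replacement
        if sum(resample) / n <= 0:
            not_positive += 1
    return not_positive / n_boot


# Hypothetical per-participant neural shifts in degrees (not real data).
hypothetical_shifts = [12.0, 8.5, 15.1, 3.2, 9.9, 11.7, 6.4, 14.0, 2.8, 10.3]
p_perm = sign_flip_p(hypothetical_shifts)
p_boot = bootstrap_p(hypothetical_shifts)
```

  With 10 participants the sign-flip test has only 2^10 = 1024 possible sign assignments, so its smallest attainable one-sided p-value is 1/1024, whereas the bootstrap proportion depends on how consistently the shift has the same sign across resampled participants.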
  
  We have added this additional information to the Materials and Methods.
  
  References:
  
  Ester EF, Sprague TC, Serences JT (2020) Categorical biases in human occipitoparietal cortex. Journal of Neuroscience 40:917–931.
  
  The manuscript could be improved by more clearly spelling how the training and testing data were labelled, particularly for the reactivation analyses. If I understood correctly, in the first reactivation analysis the authors train and test on current trial data, but label both training and testing data according to the previous trial's motion direction. In the second analysis, they label the training data according to the current motion direction, but label the testing data according to the previous motion direction. Is that correct?
  
  Yes, this is correct. Please see also our response to reviewer 1, points 2 and 3, for a detailed description.
  
  I was surprised to see that the shift in the reconstructed direction is about three times larger than the behavioral attraction bias. Would one not expect these to be comparable in magnitude? It would be helpful to address and discuss this in the discussion section.
  
  Thank you for pointing this out. We agree with the reviewer that as both measures provided an identical metric (angle degree), one would expect that their magnitudes should be directly comparable. However, we speculate that these magnitudes inform only about the direction of the bias and their significant difference from zero, thus they operate on different scales and are not directly comparable. For example, Hallenbeck et al. (2022) showed that fMRI-based reconstructed orientation bias and behavioral bias correlated on both individual and group level, despite strong magnitude differences. This is in line with our observation and supports the speculation that the magnitudes of neural and behavioral biases operate on different scales and, thus, are not directly comparable.
  
  We have updated the Discussion accordingly.
  
  References:
  
  Hallenbeck GE, Sprague TC, Rahmati M, Sreenivasan KK, Curtis CE (2022) Working memory representations in visual cortex mediate distraction effects. Nature Communications 12: 471.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) It may be worth showing that the gaze bias towards the current/cued stimulus is not biased towards the previous target. One option might be to run the same analysis pipeline used for the MEG decoding but on the eye-tracking data. Another could be to remove all participants with significant gaze bias, but given the small sample size, this might not be feasible.
  
  We appreciate this suggestion. However, as mentioned above, we currently do not have sufficient resources to conduct additional analyses on the eye tracking data.
  
  (2) Minor typo: Figure 3c - bias should be 11.7º, not -11.7º.
  
  Corrected. Thank you!
  
  Note on data/code availability: The authors state that preprocessed data and analysis code will be made available on publication, but are not available yet.
  
  Code and preprocessed data used for the present analyses are now available on OSF via http://osf.io/yjc93/. Due to storage limitations, only the preprocessed MEG data for the main IEM analyses focusing on the current direction are uploaded. For access to additional data, please contact the authors.
  

URL

biorxiv.org/content/10.1101/2024.05.13.593912v2
www.biorxiv.org www.biorxiv.org

The mechanism of mammalian proton-coupled peptide transporters

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.
  
  We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we have prepared a revised manuscript, but before that we address some of the comments made above in the general assessment:
  
  “lack of incorporation of a protonation coordinate in the free energy landscape”.
  
  We acknowledge that of course it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.
  
  “possibility of protonation of the substrate”.
  
  The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it will become computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we have amended our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve and thus it may be the case that for now only a semi-quantitative conclusion is all that can be obtained.
  
  “errors with the chosen constant pH MD method for membrane proteins”.
  
  We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such have added a cautionary note to our paper. We also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we have promoted this validation, which was in the supplementary figures, into the main text in the revised version). We therefore remain reasonably confident in the results presented with regard to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.
  
  “dismissal of hysteresis emerging from the MEMENTO method”.
  
  We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD for path generation, and find this improvement again for PepT2 in this study. We address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.
  
  “the likelihood of other residues being affected by peptide binding”.
  
  In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We have now made our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.
  
  As for the additional suggested changes in presentation, we provide the requested details on the CpHMD analysis. Furthermore, we use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we have opted to not do so in figures S10 and S15 which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures). We have also changed the colour schemes of these plots in our revision to improve accessibility. We have additionally taken the opportunity to fix some typos and further clarify some other statements throughout the manuscript, besides the requests from the reviewers.
  
  Reviewer #1 (Public Review):
  
  The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.
  
  We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.
  
  (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.
  
  a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.
  
  We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as higher-dimensional free energy landscapes obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regard to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:
  
  “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”
  
  Where even sampling the proton transfer step itself in an essentially static protein conformation would be pushing the boundaries of what has been achieved in the field, we believe that considering the current state-of-the-art, a fully coupled investigation of large-scale conformational changes and proton-transfer reaction is not yet feasible in a realistic/practical time frame. We also note this limitation already when we say that:
  
  “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.
  
  Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.
  
  In our revision, we have expanded on our discussion of the reasoning behind employing a non-reactive approach and the limitations that imposes on what questions can be answered in this study.
  
  Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.
  
  The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we now make this clear in the appropriate figure captions.
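  
  The point that superimposing the profiles carries no information about their relative free energies can be illustrated with a small sketch. A PMF is only defined up to an additive constant, so shifting each profile to zero at a chosen end state is a plotting convention. The function names and the profile values below are hypothetical and are not taken from the manuscript.

```python
def align_to_reference(pmf, ref_index):
    """Shift a PMF so that its value at the reference bin is zero."""
    offset = pmf[ref_index]
    return [value - offset for value in pmf]


def barrier_height(pmf, start_index=0):
    """Highest point at or after the start bin, relative to the start bin."""
    return max(pmf[start_index:]) - pmf[start_index]


# Hypothetical 1D profiles along an OCC -> OF coordinate (arbitrary units).
pmf_deprotonated = [0.0, 3.0, 1.0, 5.0, 2.0]
pmf_protonated = [4.0, 5.0, 3.0, 2.0, 1.0]

# Align both at the final (OF) bin purely for visual comparison.
aligned_deprotonated = align_to_reference(pmf_deprotonated, -1)
aligned_protonated = align_to_reference(pmf_protonated, -1)
```

  Shape features such as barrier heights are unchanged by the shift, whereas the vertical offset between two aligned profiles is set by the choice of reference bin and says nothing about which protonation state is more stable.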
  
  b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).
  
  This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the previous version indicate explicitly that this may involve the substrate. We make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We now make note of this point in the revised manuscript.
  
  As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”
  
  We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is additionally “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.
  
  (2) I have more serious concerns about the CpHMD employed in the study.
  
  a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.
  
  We discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way.
  
  In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This was figure S20 before, though in the revised version we have moved this piece of validation into a new panel of figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
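
The link between the two binding free energies and the pKa shift follows from closing the thermodynamic cycle: the difference in substrate affinity between the protonated and deprotonated protein equals the difference in protonation free energy between the holo and apo protein. A minimal sketch of that conversion is given below. The function name, the temperature of 310 K and the example free energies are illustrative assumptions, not values taken from the manuscript.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def pka_shift_from_ddg(dg_bind_prot, dg_bind_deprot, temperature=310.0):
    """Return pKa(holo) - pKa(apo) implied by two binding free energies.

    dg_bind_prot / dg_bind_deprot: substrate binding free energies (kcal/mol)
    to the protonated / deprotonated protein. Closing the cycle gives
    ddG = dG_bind(prot) - dG_bind(deprot) = -RT ln(10) * (pKa_holo - pKa_apo).
    """
    ddg = dg_bind_prot - dg_bind_deprot
    return -ddg / (R_KCAL * temperature * math.log(10))


# Hypothetical numbers for illustration only: tighter binding to the
# protonated form corresponds to an upward pKa shift on substrate binding.
example_shift = pka_shift_from_ddg(-9.0, -7.6)
```

At 310 K, one pKa unit corresponds to roughly 1.4 kcal/mol, so a modest affinity difference between the two protonation states is enough to produce a shift of the size usually resolved by constant-pH simulations.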
  
  Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.
  
  Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in coupled minima of protonation state and local conformation (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state; Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.
  
  Author response image 1.
  
  All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.
  
  Author response image 2.
  
  Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1.
  
  b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.
  
  In this study, we implicitly assumed from the outset that the substrate does not get protonated, which, as noted in our response to the comment above, we now acknowledge explicitly. This potential limitation for the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protonatable protein residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes 10s to 100s of nanoseconds in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We discuss such considerations in the revised paper.
  
  Reviewer #2 (Public Review):
  
  This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.
  
  Strengths:
  
  This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.
  
  Weaknesses:
  
  Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.
  
  We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this, denote it with question marks in the mechanistic overview we give in Figure 8 and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.
  
  Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?
  
  Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (ie, full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and have added details to the latter sentence to help clarify better the nature of the occluded state.
  
  The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.
  
  In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct as lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state on both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 of the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we added more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.
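
To make the two analyses concrete, the sketch below shows one common way of turning deprotonated fractions sampled at several pH values into a pKa estimate, namely a linearised Hill fit, which can be applied either to a whole replicate or to each short chunk before pooling. The response does not spell out the fitting procedure, so this is an illustrative assumption rather than the analysis actually used, and the function and variable names are hypothetical.

```python
import math
import statistics


def fit_hill(ph_values, deprot_fractions):
    """Estimate (pKa, Hill coefficient) from deprotonated fractions.

    Uses the linearised Hill equation log10(f / (1 - f)) = n * (pH - pKa)
    and an ordinary least-squares line through the usable points.
    """
    points = [
        (ph, math.log10(f / (1.0 - f)))
        for ph, f in zip(ph_values, deprot_fractions)
        if 0.0 < f < 1.0  # fully (de)protonated windows carry no slope information
    ]
    if len(points) < 2:
        raise ValueError("need at least two pH windows with 0 < fraction < 1")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope, slope


def summarise(pka_estimates):
    """Mean and sample standard deviation over replicates or chunks."""
    return statistics.mean(pka_estimates), statistics.stdev(pka_estimates)
```

Applying `fit_hill` per replicate and passing the results to `summarise` corresponds to a bar-plot style analysis, whereas applying it to every chunk and histogramming the estimates corresponds to the pooled view. A wider spread among chunk estimates than among replicate means is what one would expect when individual chunks are short relative to the slow motions coupled to protonation.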
  
  We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.
  
  The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.
  
  Some of the key results include:
  
  (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.
  
  (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.
  
  (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.
  
  Strengths:
  
  (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.
  
  (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.
  
  (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.
  
  Weaknesses:
  
  (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.
  
  The reviewer is right to point out that the statement and Figure S3 as they were do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, did indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We have also remade the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
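
The distinction between a pooled occupancy and its time dependence can be illustrated with a short sketch. The code below computes the overall fraction of frames in which a salt bridge is formed and the same fraction in consecutive time blocks. The 4.0 Å distance cutoff and the function names are assumptions chosen for illustration and are not taken from the manuscript.

```python
def contact_occupancy(distances, cutoff=4.0):
    """Fraction of frames in which the distance is at or below the cutoff."""
    if not distances:
        raise ValueError("empty distance series")
    return sum(d <= cutoff for d in distances) / len(distances)


def block_occupancy(distances, n_blocks=5, cutoff=4.0):
    """Occupancy in consecutive, equally sized time blocks of one trajectory.

    Frames left over after dividing into n_blocks are ignored.
    """
    size = len(distances) // n_blocks
    if size == 0:
        raise ValueError("more blocks than frames")
    return [
        contact_occupancy(distances[i * size:(i + 1) * size], cutoff)
        for i in range(n_blocks)
    ]
```

Two trajectories with the same pooled occupancy of about 70% can look very different in the block view: one may be formed throughout and then break for good, another may form and break repeatedly. Only the time-resolved view separates these cases, which is why a time series is more informative than a pooled violin plot here.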
  
  (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.
  
  If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.
  
  Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We have revised the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.
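
The point that a 0/3 versus 1/3 comparison cannot carry statistical weight can be checked directly. The sketch below implements a two-sided Fisher exact test for a 2x2 table and applies it to the counts discussed above (gate opened or not, in three replicates per condition). It is offered purely as an illustration of why no significance claim is made; it is not an analysis from the manuscript.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Rows: "WT H87-prot" and "D342A H87-prot"; columns: opened, not opened.
p_value = fisher_exact_two_sided(0, 3, 1, 2)
```

With three replicates per condition, the two possible placements of the single opening event are equally likely under the null hypothesis, so the p-value is 1. Even the most extreme outcome possible with this design (3/3 versus 0/3) gives a two-sided p-value of 0.1.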
  
  (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.
  
  We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.
  
  Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.
  
  There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Method section and Figure S8), before moving to 2D-umbrella sampling (figure 3). Here, we think, the reviewer’s point seems to bear. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regards to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.
  
  We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.
  
  We hope that the reviewer will be satisfied by our revision, where we replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).
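
For readers unfamiliar with morphing, the sense in which an interpolated path reaches both end states by construction can be seen in a few lines. The sketch below performs plain linear interpolation between two sets of Cartesian coordinates. It illustrates only the initial interpolation step mentioned in the review, not the MEMENTO procedure itself, which goes on to reconstruct and modify the intermediates, and the function name is hypothetical.

```python
def linear_morph(start, end, n_intermediates):
    """Linearly interpolate between two coordinate sets.

    start, end: lists of (x, y, z) tuples with matching atom order.
    Returns n_intermediates + 2 frames, including both end states.
    """
    if len(start) != len(end):
        raise ValueError("coordinate sets must contain the same atoms")
    frames = []
    for k in range(n_intermediates + 2):
        t = k / (n_intermediates + 1)  # runs from 0 at start to 1 at end
        frames.append([
            tuple(s + t * (e - s) for s, e in zip(p, q))
            for p, q in zip(start, end)
        ])
    return frames
```

The first and last frames reproduce the supplied end states, so a path seeded this way cannot fall short of the target conformation in the way a steered trajectory can. Whether the intermediates are close to the true transition region is a separate question that interpolation alone does not answer.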
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations For The Authors):
  
  Figure S1: it would be useful to label the panels.
  
  We have now done this.
  
  At the bottom of page 4, it is written that "the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." But it is hard to interpret that from the figure.
  
  See also our response to reviewer #3. We have revised the wording of this statement, and also highlight in Figure S5 the crucial runs we are referring to, in order to make them easier to discern.
  
  At the bottom of page 5, and top of page 6, there is a lot of "other" information shown, which is inserted for the record - this is a bit glossed over and hard to follow.
  
  The “other” information refers to further conditions we had calculated PMFs for and that gave some insight, but which were secondary for drawing our key conclusions. We thank the reviewer for their feedback that this section needs clarification. We have revised this paragraph to make it easier to follow and highlight better the conclusions we draw from the data.
  
  In Figure 7 it looks as though the asterisks have shifted.
  
  We are indebted to the reviewer for spotting this error: the asterisks are indeed shifted one bar to the right of their intended position. The revised version fixes this issue.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Minor points: In Figure 1a, The 7PMY label and arrow are slightly misplaced.
  
  Figure 1a is a schematic diagram to show the available structures of PepT2 homologues (see also the response to reviewer #2 above). The 7PMY label placement is intentional to indicate a partially occluded inwards-facing state. As we write in the figure caption: “Intermediate positions between states indicate partial gate opening”.
  
URL

biorxiv.org/content/10.1101/2024.02.04.578827v2
www.biorxiv.org www.biorxiv.org

New submission 14/12/2023, 08:24:25

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the latest reviews.
  
  A revised version of the manuscript models "slope-based" excitability changes in addition to "threshold-based" changes. This serves to address the above concern that as constructed here changes in excitability threshold are not distinguishable from changes in input. However, it remains unclear what the model would do should only a subset of neurons receive a given, fixed input. In that case, are excitability changes sufficient to induce drift? This remains an important question that is not addressed by the paper in its current form.
  
  Thank you for this important point. In the simulation of two memories (Fig. S6), we stimulated half of the neural population for each of the two memories. We therefore also showed that drift happens when only a subset of neurons is stimulated.
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Current experimental work reveals that brain areas implicated in episodic and spatial memory have a dynamic code, in which activity representing familiar events/locations changes over time. This paper shows that such reconfiguration is consistent with underlying changes in the excitability of cells in the population, which ties these observations to a physiological mechanism.
  
  Delamare et al. use a recurrent network model to consider the hypothesis that slow fluctuations in intrinsic excitability, together with spontaneous reactivations of ensembles, may cause the structure of the ensemble to change, consistent with the phenomenon of representational drift. The paper focuses on three main findings from their model: (1) fluctuations in intrinsic excitability lead to drift, (2) this drift has a temporal structure, and (3) a readout neuron can track the drift and continue to decode the memory. This paper is relevant and timely, and the work addresses questions of both a potential mechanism (fluctuations in intrinsic excitability) and purpose (time-stamping memories) of drift.
  
  The model used in this study consists of a pool of 50 all-to-all recurrently connected excitatory neurons with weights changing according to a Hebbian rule. All neurons receive the same input during stimulation, as well as global inhibition. The population has heterogeneous excitability, and each neuron's excitability is constant over time apart from a transient increase on a single day. The neurons are divided into ensembles of 10 neurons each, and on each day, a different ensemble receives a transient increase in the excitability of each of its neurons, with each neuron experiencing the same amplitude of increase. Each day for four days, repetitions of a binary stimulus pulse are applied to every neuron.
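
The excitability protocol described in this paragraph can be written down compactly. The sketch below builds, for each day, the excitability of every neuron as a fixed heterogeneous baseline plus a transient increase of equal amplitude for that day's ensemble. It is a reading of the reviewer's summary only: the Gaussian baseline distribution, its width, the assignment of consecutive neuron indices to ensembles, and all names are assumptions, and the network dynamics and Hebbian learning are not included.

```python
import random


def excitability_schedule(n_neurons=50, ensemble_size=10, n_days=4,
                          amplitude=1.0, seed=0):
    """Return (baseline, schedule) for the daily excitability protocol.

    baseline: one fixed excitability value per neuron (distribution assumed).
    schedule[day][i]: excitability of neuron i on that day, equal to its
    baseline plus `amplitude` if neuron i belongs to the day's ensemble.
    """
    if n_days * ensemble_size > n_neurons:
        raise ValueError("not enough neurons for one ensemble per day")
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 0.1) for _ in range(n_neurons)]
    schedule = []
    for day in range(n_days):
        first = day * ensemble_size
        boosted = set(range(first, first + ensemble_size))
        schedule.append([
            b + (amplitude if i in boosted else 0.0)
            for i, b in enumerate(baseline)
        ])
    return baseline, schedule
```

Writing the protocol this way makes the simplifications questioned by the reviewer explicit: the increase is synchronous within an ensemble, identical in amplitude for each member, and returns fully to baseline on the following day.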
  
  The modeling choices focus on the parameter of interest, the excitability, and other details are generally kept as straightforward as possible. That said, I wonder if certain aspects may be overly simple. The extent of the work already performed, however, does serve the intended purpose, and so I think it would be sufficient for the authors to comment on these choices rather than to take more space in this paper to actually implement these choices. What might happen were more complex modeling choices made? What is the justification for the choices that are made in the present work?
  
  The two specific modeling choices I question are (1) the excitability dynamics and (2) the input stimulus. The ensemble-wide synchronous and constant-amplitude excitability increase, followed by a return to baseline, seems to be a very simplified picture of the dynamics of intrinsic excitability. At the very least, justification for this simplified picture would benefit the reader, and I would be interested in the authors' speculation about how a more complex and biologically realistic dynamics model might impact the drift in their network model. Similarly, the input stimulus being binary means that, on the single-neuron level, the only type of drift that can occur is a sort of drop-in/drop-out drift; this choice excludes the possibility of a neuron maintaining significant tuning to a stimulus but changing its preferred value. How would the use of a continuous input variable influence the results?
  
  (1) In our model, neurons tend to compete for allocation to the memory ensemble: neurons with higher excitability tend to be preferentially allocated and neurons with lower excitability do not respond to the stimulus. Because relative, but not absolute excitability biases this competition, we suggest that the exact distribution of excitability would not impact the results qualitatively. On the other hand, the results might vary if excitability was considered dependent on the activity of the neurons as previously reported experimentally (Cai 2016, Rachid 2016, Pignatelli 2019). An increase in excitability following neural activity might induce higher correlation among ensembles on consecutive days, decreasing the drift.
  
  (2) We thank the reviewer for this very good point. Indeed, two recent studies (Geva 2023, Khatib 2023) have highlighted distinct mechanisms for a drift of the mean firing rate and the tuning curve. We extended the last part of the discussion to include this point: “Finally, we intended to model drift in the firing rates, as opposed to a drift in the tuning curve of the neurons. Recent studies suggest that drifts in the mean firing rate and tuning curve arise from two different mechanisms [33, 34]. Experience drives a drift in neurons' tuning curve while the passage of time drives a drift in neurons' firing rate. In this sense, our study is consistent with these findings by providing a possible mechanism for a drift in the mean firing rates of the neurons driven by a dynamical excitability. Our work suggests that drift can depend on any experience having an impact on excitability dynamics such as exercise as previously shown experimentally [9, 35] but also neurogenesis [9, 31, 36], sleep [37] or increase in dopamine level [38].”
  
  Result (1): Fluctuations in intrinsic excitability induce drift
  
  The two choices highlighted above appear to lead to representations that never recruit the neurons in the population with the lowest baseline excitability (Figure 1b: it appears that only 10 neurons ever show high firing rates) and produce networks with very strong bidirectional coupling between this subset of neurons and weak coupling elsewhere (Figure 1d). This low recruitment rate may not necessarily be problematic, but it stands out as a point that should at least be commented on. The fact that only 10 neurons (20% of the population) are ever recruited in a representation also raises the question of what would happen if the model were scaled up to include more neurons.
  
  This is a very good point. To test how the model depends on the network size, we plotted the drift index against the size of the ensemble. With this current implementation, we did not observe a significant correlation between the drift rate and size of the initial ensemble (Figure S2).
  
  Author response image 1.
  
  The rate of the drift does not depend on the size of the engram. Drift rate against the size of the original engram. Each dot shows one simulation (Methods). n = 100 simulations.
  
  Result (2): The observed drift has a temporal structure
  
  The authors then demonstrate that the drift has a temporal structure (i.e., that activity is informative about the day on which it occurs), with methods inspired by Rubin et al. (2015). Rubin et al. (2015) compare single-trial activity patterns on a given session with full-session activity patterns from each session. In contrast, Delamare et al. here compare full-session patterns with baseline excitability (E = 0) patterns. This point of difference should be motivated. What does a comparison to this baseline excitability activity pattern tell us? The ordinal decoder, which decodes the session order, gives very interesting results: that an intermediate amplitude E of excitability increase maximizes this decoder's performance. This point is also discussed well by the authors. As a potential point of further exploration, the use of baseline excitability patterns in the day decoder had me wondering how the ordinal decoder would perform with these baseline patterns.
  
  This is a good point. Here, we aimed at dissociating the role of excitability from that of the recurrent currents. We introduced a time decoder that compares the pattern with baseline excitability (E = 0), in order to test whether the temporal information was encoded in the ensemble, i.e., in the recurrent weights. By contrast, because the neural activity is by construction biased towards excitability, a time decoder performed on the full session would work in a trivial way.
  
  Result (3): A readout neuron can track drift
  
  The authors conclude their work by connecting a readout neuron to the population with plastic weights evolving via a Hebbian rule. They show that this neuron can track the drifting ensemble by adjusting its weights. These results are shown very neatly and effectively and corroborate existing work that they cite very clearly.
  
  Overall, this paper is well-organized, offers a straightforward model of dynamic intrinsic excitability, and provides relevant results with appropriate interpretations. The methods could benefit from more justification of certain modeling choices, and/or an exploration (either speculative or via implementation) of what would happen with more complex choices. This modeling work paves the way for further explorations of how intrinsic excitability fluctuations influence drifting representations.
  
  Reviewer #2 (Public Review):
  
  In this computational study, Delamare et al. identify slow neuronal excitability as one mechanism underlying representational drift in recurrent neuronal networks and show that the drift is informative about the temporal structure of the memory and when it was formed. The manuscript is very well written and addresses a timely and important topic in current neuroscience, namely the mechanisms that may underlie representational drift.
  
  The study is based on an all-to-all recurrent neuronal network with synapses following Hebbian plasticity rules. On the first day, a cue-related representation is formed in that network and on the next 3 days it is recalled spontaneously or due to a memory-related cue. One major observation is that representational drift emerges day-by-day based on intrinsic excitability, with the most excitable cells showing the highest probability of replacing previously active members of the assembly. By using a day decoder, the authors state that they can infer the order in which the reactivation of cell assemblies happened, but only if the excitability state was not too high. By applying a read-out neuron, the authors observed that this cell can track the drifting ensemble, which is based on changes of the synaptic weights across time. The few questions that emerged, and that could be addressed either theoretically or in the discussion, are as follows:
  
  Would similar results be obtained if, instead of all-to-all recurrent connections, more realistic connectivity profiles, such as those estimated for CA1 and CA3, had been modeled?
  
  This is a very interesting point. We performed further simulations to show that the results are not dependent on the exact structure of the network. In particular, we show that all-to-all connectivity is not required to observe a drift of the ensemble. We found similar results when the recurrent weights matrix was made sparse (Fig. S4a-c, Methods). Similarly to all-to-all connectivity, we found that the ensemble is informative about its temporal history (Fig. S4d) and that an output neuron can decode the ensemble continuously (Fig. S4e).
  
  Author response image 2.
  
  Sparse recurrent connectivity shows similar drifting behavior as all-to-all connectivity. The same simulation protocol as Fig. 1 was used while the recurrent weights matrix was made 50% sparse (Methods). a) Firing rates of the neurons across time. The red traces correspond to neurons belonging to the first assembly, namely those with a firing rate higher than the active threshold after the first stimulation. The black bars show the stimulation and the dashed line shows the active threshold. b) Recurrent weights matrices after each of the four stimuli show the drifting assembly. c) Correlation of the patterns of activity between the first day and every other day. d) Student's test t-value of the ordinal time decoder, for the real (blue) and shuffled (orange) data and for different amplitudes of excitability E. e) Center of mass of the distribution of the output weights (Methods) across days. c-e) Data are shown as mean ± s.e.m. for n = 10 simulations.
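
One way to picture the 50% sparse recurrent matrix described in this caption is a random binary mask applied to the weights. The sketch below is only an illustration under stated assumptions: the function name `sparsify_weights`, the exclusion of self-connections, the random seed, and the network size are hypothetical and are not taken from the authors' Methods.

```python
import numpy as np

def sparsify_weights(weights, sparsity=0.5, seed=0):
    """Zero out a random fraction of recurrent weights.

    Returns the masked weights and the boolean mask (True = synapse kept).
    """
    rng = np.random.default_rng(seed)
    n = weights.shape[0]
    # Keep each synapse independently with probability 1 - sparsity.
    mask = rng.random((n, n)) >= sparsity
    # Assumption: no self-connections.
    np.fill_diagonal(mask, False)
    return weights * mask, mask

# Hypothetical usage: 50 neurons, uniform weights before masking.
w_full = np.ones((50, 50))
w_sparse, mask = sparsify_weights(w_full, sparsity=0.5)
```

In a network with plastic synapses, the same mask would presumably need to be reapplied after every Hebbian update so that absent connections stay absent.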
  
  How does the number of excited cells that could potentially contribute to an engram influence the representational drift and the decoding quality?
  
  This is indeed a very good question. We did not observe a significant correlation between the drift rate and size of the initial ensemble (Fig. S2).
  
  Author response image 3.
  
  The rate of the drift does not depend on the size of the engram. Drift rate against the size of the original engram. Each dot shows one simulation (Methods). n = 100 simulations.
  
  How does the rate of the drift influence the quality of readout from the read-out neuron?
  
  We thank the reviewer for this interesting question. We introduced a measure of the “read-out quality” and plotted this value against the rate of the drift. We found a small correlation between the two quantities. Indeed, the read-out quality decreases with the rate of the drift.
  
  Author response image 4.
  
  The quality of the read-out decreases with the rate of the drift. Read-out quality computed on the firing rate of the output neuron against the rate of the drift (Methods). Each dot shows one simulation. n = 100 simulations.
  
  Reviewer #3 (Public Review):
  
  The authors explore an important question concerning the underlying mechanism of representational drift, which despite intense recent interest remains obscure. The paper explores the intriguing hypothesis that drift may reflect changes in the intrinsic excitability of neurons. The authors set out to provide theoretical insight into this potential mechanism.
  
  They construct a rate model with all-to-all recurrent connectivity, in which recurrent synapses are governed by a standard Hebbian plasticity rule. This network receives a global input, constant across all neurons, which can be varied with time. Each neuron also is driven by an "intrinsic excitability" bias term, which does vary across cells. The authors study how activity in the network evolves as this intrinsic excitability term is changed.
  
  They find that after initial stimulation of the network, those neurons where the excitability term is set high become more strongly connected and are in turn more responsive to the input. Each day the subset of neurons with high intrinsic excitability is changed, and the network's recurrent synaptic connectivity and responsiveness gradually shift, such that the new high intrinsic excitability subset becomes both more strongly activated by the global input and also more strongly recurrently connected. These changes result in drift, reflected by a gradual decrease across time in the correlation of the neuronal population vector response to the stimulus.
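
The drift measure described here, a falling correlation between the day-1 population vector and later days, can be sketched in a few lines. The example is a toy illustration: the function name `drift_correlations` and the binary patterns (an active ensemble of five neurons that shifts by two neurons per day) are assumptions made for the example, not the authors' simulation.

```python
import numpy as np

def drift_correlations(patterns):
    """Pearson correlation of each day's population vector with day 1."""
    patterns = np.asarray(patterns, dtype=float)
    reference = patterns[0]
    return [float(np.corrcoef(reference, day)[0, 1]) for day in patterns]

# Toy data: 20 neurons, 4 days, ensemble of 5 neurons shifting by 2 per day.
n_neurons, width = 20, 5
patterns = np.zeros((4, n_neurons))
for day in range(4):
    patterns[day, 2 * day : 2 * day + width] = 1.0

corrs = drift_correlations(patterns)
```

With these toy patterns the correlation with day 1 decreases monotonically as the overlap between ensembles shrinks.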
  
  The authors are able to build a classifier that decodes the "day" (i.e. which subset of neurons had high intrinsic excitability) with perfect accuracy. This is despite the fact that the excitability bias during decoding is set to 0 for all neurons, and so the decoder is really detecting those neurons with strong recurrent connectivity, and in turn strong responses to the input. The authors show that it is also possible to decode the order in which different subsets of neurons were given high intrinsic excitability on previous "days". This second result depends on the extent by which intrinsic excitability was increased: if the increase in intrinsic excitability was either too high or too low, it was not possible to read out any information about past ordering of excitability changes.
  
  Finally, using another Hebbian learning rule, the authors show that an output neuron, whose activity is a weighted sum of the activity of all neurons in the network, is able to read out the activity of the network. What this means specifically, is that although the set of neurons most active in the network changes, the output neuron always maintains a higher firing rate than a neuron with randomly shuffled synaptic weights, because the output neuron continuously updates its weights to sample from the highly active population at any given moment. Thus, the output neuron can readout a stable memory despite drift.
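
The tracking behaviour summarised above can be illustrated with a minimal linear readout. This is a sketch under stated assumptions, not the authors' learning rule: the Hebbian update with normalisation of the total weight, the learning rate, the toy binary activity, and the names `hebbian_readout_step` and `center_of_mass` are all hypothetical choices made for the example.

```python
import numpy as np

def hebbian_readout_step(weights, rates, lr=0.1):
    """One Hebbian update of a linear readout, followed by normalisation."""
    y = weights @ rates                  # readout firing rate
    weights = weights + lr * y * rates   # Hebbian term: pre * post
    return weights / weights.sum()       # keep total synaptic weight fixed

def center_of_mass(weights):
    """Weighted mean neuron index of the readout weights."""
    idx = np.arange(weights.size)
    return float(idx @ weights / weights.sum())

n_neurons, width = 20, 5
weights = np.full(n_neurons, 1.0 / n_neurons)
centers = []
for day in range(4):
    rates = np.zeros(n_neurons)
    rates[2 * day : 2 * day + width] = 1.0   # ensemble shifts by two neurons per day
    for _ in range(50):
        weights = hebbian_readout_step(weights, rates)
    centers.append(center_of_mass(weights))
```

In this toy setting the centre of mass of the readout weights follows the mean index of the active ensemble from day to day, which is the sense in which a continuously updated readout can follow a drifting representation.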
  
  Strengths:
  
  The authors are clear in their description of the network they construct and in their results. They convincingly show that when they change their "intrinsic excitability term", upon stimulation, the Hebbian synapses in their network gradually evolve, and the combined synaptic connectivity and altered excitability result in drifting patterns of activity in response to an unchanging input (Fig. 1, Fig. 2a). Furthermore, their classification analyses (Fig. 2) show that information is preserved in the network, and their readout neuron successfully tracks the active cells (Fig. 3). Finally, the observation that only a specific range of excitability bias values permits decoding of the temporal structure of the history of intrinsic excitability (Fig. 2f and Figure S1) is interesting, and as the authors point out, not trivial.
  
  Weaknesses:
  
  The way the network is constructed, there is no formal difference between what the authors call "input", Δ(t), and what they call "intrinsic excitability" Ɛ_i(t) (see Equation 3). These are two separate terms that are summed (Eq. 3) to define the rate dynamics of the network. The authors could have switched the names of these terms: Δ(t) could have been considered a global "intrinsic excitability term" that varied with time and Ɛ_i(t) could have been the external input received by each neuron i in the network. In that case, the paper would have considered the consequence of "slow fluctuations of external input" rather than "slow fluctuations of intrinsic excitability", but the results would have been the same. The difference is therefore semantic. The consequence is that this paper is not necessarily about "intrinsic excitability", rather it considers how a Hebbian network responds to changes in excitatory drive, regardless of whether those drives are labeled "input" or "intrinsic excitability".
  
  This is a very good point. We performed further simulations to model “slope-based”, instead of “threshold-based”, changes in excitability (Fig. S5a, Methods). In this new definition of excitability, we changed the slope of the activation function, which is initially sampled from a random distribution. By introducing a varying excitability, we found results very similar to those obtained when excitability was varied as the threshold of the activation function (Fig. S5b-d). We also found similarly that the ensemble is informative about its temporal history (Fig. S5e) and that an output neuron can decode the ensemble continuously (Fig. S5f).
  
  Author response image 5.
  
  Change of excitability as a variable slope of the input-output function shows similar drifting behavior as considering a change in the threshold. The same simulation protocol as Fig. 1 was used while the excitability changes were modeled as a change in the activation function slope (Methods). a) Schema showing two different ways of defining excitability, as a threshold (top) or slope (bottom) of the activation function. Each line shows one neuron and darker lines correspond to neurons with increased excitability. b) Firing rates of the neurons across time. The red traces correspond to neurons belonging to the first assembly, namely those with a firing rate higher than the active threshold after the first stimulation. The black bars show the stimulation and the dashed line shows the active threshold. c) Recurrent weights matrices after each of the four stimuli show the drifting assembly. d) Correlation of the patterns of activity between the first day and every other day. e) Student's test t-value of the ordinal time decoder, for the real (blue) and shuffled (orange) data and for different amplitudes of excitability E. f) Center of mass of the distribution of the output weights (Methods) across days. d-f) Data are shown as mean ± s.e.m. for n = 10 simulations.
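
The two definitions of excitability contrasted in panel a) can be sketched with a generic sigmoid. The sigmoid itself, the parameter names `slope` and `theta`, and the function names are assumptions made for illustration; the authors' activation function is defined in their Methods and may differ.

```python
import numpy as np

def rate_threshold(current, excitability, slope=1.0, theta=1.0):
    """Sigmoid rate where excitability lowers the effective threshold."""
    return 1.0 / (1.0 + np.exp(-slope * (current - (theta - excitability))))

def rate_slope(current, excitability, slope=1.0, theta=1.0):
    """Sigmoid rate where excitability steepens the input-output slope."""
    return 1.0 / (1.0 + np.exp(-(slope + excitability) * (current - theta)))
```

In this sketch, both definitions raise the firing rate for inputs above threshold, but only the threshold-based version also raises it for inputs below threshold; a steeper slope lowers the rate there. The two definitions are therefore not interchangeable in general, even when they produce similar drift.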
  
  Given how the learning rule that defines input to the readout neuron is constructed, it is trivial that this unit responds to the most active neurons in the network, more so than a neuron assigned random weights. What would happen if the network included more than one "memory"? Would it be possible to construct a readout neuron that could classify two distinct patterns? Along these lines, what if there were multiple, distinct stimuli used to drive this network, rather than the global input the authors employ here? Does the system, as constructed, have the capacity to provide two distinct patterns of activity in response to two distinct inputs?
  
  This is an interesting point. In order to model multiple memories, we introduced non-uniform feedforward inputs, defining different “contexts” (Methods). We adapted our model so that two contexts target two random sub-populations in the network. We also introduced a second output neuron to decode the second memory. The simulation protocol was adapted so that each of the two contexts are stimulated every day (Fig. S6a). We found that the network is able to store two ensembles that drift independently (Fig. S6 and S7a). We were also able to decode temporal information from the patterns of activity of both ensembles (Fig. S7b). Finally, both memories could be decoded independently using two output neurons (Fig. S7c and d).
  
  Author response image 6.
  
  Two distinct ensembles can be encoded and drift independently. a) and b) Firing rates of the neurons across time. The red traces in panel b) correspond to neurons belonging to the first assembly and the green traces to the second assembly on the first day. They correspond to neurons having a firing rate higher than the active threshold after the first stimulation of each assembly. The black bars show the stimulation and the dashed line shows the active threshold. c) Recurrent weights matrices after each of the eight stimuli showing the drifting of the first (top) and second (bottom) assembly.
  
  Author response image 7.
  
  The two ensembles are informative about their temporal history and can be decoded using two output neurons. a) Correlation of the patterns of activity between the first day and every other day, for the first assembly (red) and the second assembly (green). b) Student's test t-value of the ordinal time decoder, for the first (red, left) and second ensemble (green, right) for different amplitudes of excitability E. Shuffled data are shown in orange. c) Center of mass of the distribution of the output weights (Methods) across days for the first ($W_1^{out}$, red) and second ($W_2^{out}$, green) ensemble. a-c) Data are shown as mean ± s.e.m. for n = 10 simulations. d) Output neurons' firing rates across time for the first ensemble ($y_1$, top) and the second ensemble ($y_2$, bottom). The red and green traces correspond to the real output. The dark blue, light blue and yellow traces correspond to the cases where the output weights were randomly shuffled for every time point after presentation of the first, second and third stimulus, respectively.
  
  Impact:
  
  Defining the potential role of changes in intrinsic excitability in drift is fundamental. Thus, this paper represents a potentially important contribution. Unfortunately, given the way the network employed here is constructed, it is difficult to tease apart the specific contribution of changing excitability from changing input. This limits the interpretability and applicability of the results.
  
URL

biorxiv.org/content/10.1101/2023.03.16.532958v2
www.biorxiv.org www.biorxiv.org

The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  In addition to our responses to reviewer suggestions below, a minor bug in the calculation of CAIS was brought to our attention by a reader of our preprint. We have corrected this bug and rerun analyses, whose results became slightly stronger as noise was removed. While we were doing that, someone pointed out to us that our equations were almost the same as Kullback-Leibler divergence, which explains why our metric performed so well. We have made the numerically trivial (see before vs. after figure below) mathematical change to use Kullback-Leibler divergence instead, and now have a better story, with a solid basis in information theory, as to why CAIS works.
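
For readers unfamiliar with the quantity mentioned here, the Kullback-Leibler divergence between an observed and an expected codon frequency distribution can be computed as in the sketch below. This is a generic illustration only; the function name `kl_divergence` and the example frequencies are hypothetical, and the exact way CAIS weights and combines such terms is defined in the manuscript, not here.

```python
import numpy as np

def kl_divergence(observed, expected):
    """Kullback-Leibler divergence D(observed || expected) in nats."""
    p = np.asarray(observed, dtype=float)
    q = np.asarray(expected, dtype=float)
    p, q = p / p.sum(), q / q.sum()   # normalise counts to frequencies
    nonzero = p > 0                   # terms with p = 0 contribute zero
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / q[nonzero])))

# Hypothetical two-codon example: observed usage vs. a GC-based expectation.
d = kl_divergence([0.5, 0.5], [0.25, 0.75])
```

The divergence is zero when observed usage matches the expectation and grows as usage departs from it, which is the property that makes it a natural measure of departure from a mutation-only baseline.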
  
  Author response image 1.
  
  Unfortunately, we discovered a second bug that caused our PIC correction code to fail to perform the needed correction for phylogenetic confounding. The previously reported correlation between CAIS (or ENC) and body mass no longer survives PIC correction. We have therefore removed this analysis from the manuscript. Our story now rests more on the theoretical basis of CAIS and ENC, and less on post facto validation, than it previously did. We now also present CAIS and ENC on a more equal footing. ENC results are slightly stronger, while CAIS has the complementary advantage of correcting for amino acid frequencies.
  
  The work involved in these changes, as well as some of the responses to reviews below, justifies changing the second author into a co-first author, and adding an additional coauthor (Hanon McShea) who discovered the second bug.
  
  Reviewer #1 (Public Review):
  
  In this manuscript, the authors propose a new codon adaptation metric, Codon Adaptation Index of Species (CAIS), which they present as an easily obtainable proxy for effective population size. To permit between-species comparisons, they control for both amino acid frequencies and genomic GC content, which distinguishes their approach from existing ones. Having confirmed that CAIS negatively correlates with vertebrate body mass, as would be expected if small-bodied species with larger effective populations experience more efficient selection on codon usage, they then examine the relationship between CAIS and intrinsic structural disorder in proteins.
  
  The idea of a robust species-level measure of codon adaptation is interesting. If CAIS is indeed a reliable proxy for the effectiveness of selection, it could be useful to analyze species without reliable life history- or mutation rate data (which will apply to many of the genomes becoming available in the near future).
  
  A key question is whether CAIS, in fact, measures adaptation at the codon level. Unfortunately, CAIS is only validated indirectly by confirming a negative correlation with body mass. As a result, the observations about structural disorder are difficult to evaluate.
  
  As discussed in the preamble above, we have replaced the body mass validation with a stronger theoretical basis in information theory.
  
  A potential problem is that differences in GC between species are not independent of life history. Effective population size can drive compositional differences due to the effects of GC-biased gene conversion (gBGC). As noted by Galtier et al. (2018), genomic GC correlates negatively with body mass in mammals and birds. It would therefore be important to examine how gBGC might affect CAIS, and to what extent it could explain the relationship between CAIS and body mass.
  
  Suppose that gBGC drives an increase in GC that is most pronounced at 3rd codon positions in high-recombination regions in small-bodied species. In this case, could observed codon usage depart more strongly from expectations calculated from overall genomic GC in small vertebrates compared to large ones? The authors also report that correcting for local intergenic GC was unsuccessful, based on the lack of a significant negative relationship with body mass (Figure 3D). In principle, this could also be consistent with local GC providing a relatively more appropriate baseline in regions with high recombination rates. Considering these scenarios would clarify what exactly CAIS is capturing.
  
  Figure 3 (previously Supplementary Figures S5A and S5B) shows that CAIS is negligibly correlated with %GC (not robust to multiple comparisons correction), and ENC not at all. We believe this is evidence against the possibility brought up by the reviewer, i.e. that Ne might affect gBGC (and hence global %GC). This relationship, if present, could act as a confounding effect, but it is not present within our species dataset.
  
  Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that non-selective forces, including gBGC as well as conventional mutation biases, vary among species, and that they, rather than selection, determine each species’ genome-wide %GC. By correcting for genome-wide %GC, CAIS and ENC correct for both mutation bias and gBGC, in order to isolate the effects of selection.
  
  This argument, based on an average genomic region, is vulnerable to gene-rich genomic regions having differentially higher recombination rates and hence GC-biased gene conversion. However, we do not see the expected positive correlation between |local GC - global GC| and CAIS (see new Figure 5), again suggesting that gene conversion strength is not a confounding factor acting on CAIS.
  
  Given claims about "exquisitely adapted species", the case for using CAIS as a measure of codon adaptation would also be stronger if a relationship with gene expression could be demonstrated. RSCU is expected to be higher in highly expressed genes. Is there any evidence that the equivalent GC-controlled measure behaves similarly?
  
  Correlations with gene expression are outside the scope of the current work, which is focused on producing and exploiting a single value of codon adaptation per species. It is indeed possible that our general approach of using Kullback-Leibler divergence to correct for genomic %GC could be useful in future work investigating differences among genes.
  
  The manuscript is overall easy to follow, though some additional context may be helpful for the general reader. A more detailed discussion of how this work compares to the approach taken by Galtier et al. (2018), which accounted for GC content and gBGC when examining codon preferences, would be appropriate, for example. In addition, it would have been useful to mention past work that has attempted to explicitly quantify selection on codon usage.
  
  One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences as a function of species. Our approach might therefore be robust to scenarios where different genes have different codon preferences (see Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne.
  
  Reviewer #2 (Public Review):
  
  ## Summary
  
  The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species' protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection $sN_e$ when the mutation bias changes across species.
  
  ## Strengths
  
  (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this).
  
  We now cite Cope et al. as an example of how amino acid composition can act as a confounding factor.
  
  (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected.
  
  Unfortunately, our previous PIC correction code was buggy, and in fact the relationship with body size does not survive PIC correction (although it is strong prior to PIC correction). We have therefore removed it from the paper. However, the more novel result on protein disorder remains strong.
  
  (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences.
  
  ## Weaknesses
  
  (1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, a comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s $S$ statistic, which is a function of tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to $S$, CAIS has a noted advantage that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences.
  
  The main limitation of dos Reis’s test in our view is that, like the better versions of CAI, it requires comparable orthologs across species. See also the discussion below re the benefits of proteome-wide approach. We now also note the advantage of not needing tRNA gene copy numbers and abundances.
  
  Simulated datasets would be great, but we think it a nice addition rather than must-have, in particular because we are skeptical about whether our understanding of all relevant processes is good enough such that simulations would add much to our more heuristic argument along the lines of Figure 2. E.g. the complications of Gingold et al. 2014 cited above are pertinent, but incorporating them would make simulations quite involved. Instead, we now have a stronger theoretical justification for CAIS grounded in information theory. We have significantly expanded discussion of Figure 2 to give a clearer idea of the conceptual underpinnings of CAIS and ENC.
  
  The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher $N_e$ results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?"
  
  Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.
  
  I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is
  
  where $\Delta M$ is the mutation bias, $\Delta\eta$ is the strength of selection scaled by the strength of drift, and $\phi_g$ is the gene expression level of gene $g$. In this case, $\Delta M$ and $\Delta\eta$ reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which $\Delta M = \Delta\eta = 0$. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), the $E_{i,g}$ could be considered the expected observed frequency of codon $i$ in gene $g$, $E[O_{i,g}]$.
  
  Let’s re-write $E_i$ in the form of Gilchrist et al., such that it is a function of mutation bias $\Delta M$. For simplicity we will consider just the two-codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms $g_r$ and $1 - g_r$ can be written as
  
  where µx→y is the mutation rate from nucleotides x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias .This can be expressed in terms of the equilibrium GC content by recognizing that
  
  As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process.
  
  If we do this, then
  
  Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for $\Delta\eta$ in equation (1).
  
  We can then calculate the expected RSCUS using equation (1) (using notation $E[O_i]$) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG}, \Delta\eta_{NNG} = 0$).
  
  This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection *against* a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay.
  
  Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids.
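  The dependence on mutation bias described above can be checked numerically. The sketch below is illustrative only: it assumes the multinomial-logit form usually attributed to Gilchrist et al. (2015), in which the frequency of a codon is proportional to $e^{-\Delta M - \Delta\eta\,\phi}$ with NNG as the reference codon, and it treats RSCUS as the ratio of the expected frequency with selection to the expected frequency under mutation bias alone. The function names and the choice of $\phi = 1$ for a gene of average expression are assumptions made for this example, not taken from the manuscript or the review.

```python
import math

def codon_freq(delta_m, delta_eta, phi=1.0):
    """Expected frequency of the non-reference codon in a two-codon amino acid.

    Assumed form: p = w / (1 + w), with w = exp(-delta_m - delta_eta * phi).
    The reference codon (NNG here) has delta_m = delta_eta = 0, so its weight is 1.
    """
    w = math.exp(-delta_m - delta_eta * phi)
    return w / (1.0 + w)

def expected_rscus(delta_m, delta_eta, phi=1.0):
    """Expected frequency with selection divided by the mutation-only expectation."""
    return codon_freq(delta_m, delta_eta, phi) / codon_freq(delta_m, 0.0, phi)

# Same selection strength (delta_eta = -1, i.e. the codon is favored),
# different mutation biases: the ratio changes with delta_m.
for dm in (-0.5, 0.0, 0.5):
    print(dm, round(expected_rscus(dm, -1.0), 3))
```

  Under these assumptions the ratio equals 1 when $\Delta\eta = 0$, grows as the codon becomes more strongly favored, and differs between mutation-bias values even when $\Delta\eta$ is held fixed, which is the caution raised in the paragraph above.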
  
  We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. While we keep our more heuristic presentation, our revised manuscript now more clearly acknowledges that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. The reason we believe our approach worked despite this is that we think the phenomenon is driven by what is shown in Figure 2. I.e., where Ne makes a difference is by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene. We have made multiple changes to the text to make this point clearer.
  
  Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method.
  
  Genome-wide %GC values are hard-coded because they were taken from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The more complicated code used to calculate the intergenic %GC, and the code used to calculate amino acid frequencies is located at https://github.com/MaselLab/CodonAdaptation-Index-of-Species. Luckily, someone else just wrote a simpler end to end pipeline for us, on the basis of our preprint. We now note this in the Acknowledgements, and link to it: https://github.com/gavinmdouglas/handy_pop_gen/blob/main/CAIS.py.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.03.02.530449v2
www.biorxiv.org www.biorxiv.org

Biallelic pathogenic variants in DNAH3 cause male infertility in humans and mice

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  (1) Combined Public Reviews:
  
  Strengths:
  
  This work investigates the role of DNAH3 in sperm mobility and male infertility and utilised gold-standard molecular biology techniques, showing strong evidence of its role in male infertility. All aspects of the study design and methods are well described and appropriate to address the main question of the manuscript. The conclusions drawn are consistent with the analyses conducted and supported by the data.
  
  We extend our sincere gratitude to the expert reviewers for their valuable comments and insightful suggestions.
  
  Weaknesses:
  
  (1.1) The manuscript lacks a comparison with previous studies on DNAH3 in the Discussion section.
  
  We thank the reviewers for their comments.
  
  Recently, Meng et al. identified bi-allelic variants in DNAH3 from patients diagnosed with asthenoteratozoospermia, revealing multiple morphological defects and a disrupted "9+2" arrangement in the patients' sperm (https://doi.org/10.1093/hropen/hoae003, PMID: 38312775). Furthermore, they generated Dnah3 KO mice, which were infertile and exhibited moderate morphological abnormalities with a normally structured “9 + 2” microtubule arrangement. In our study, we also observed similar phenotypic differences between DNAH3-deficient patients and Dnah3 KO mice. These findings indicate that DNAH3 may play crucial yet distinct roles in human and mouse male reproduction. Additionally, our TEM analysis demonstrated a notable absence of IDAs in sperm from both DNAH3-deficient patients and Dnah3 KO mice, resembling the findings of Meng et al. To investigate further, we conducted immunofluorescent staining and western blotting to assess the levels of IDA-associated proteins (DNAH1, DNAH6 and DNALI1) and ODA-associated proteins (DNAH8, DNAH17 and DNAI1) in sperm samples from both our DNAH3-deficient patients and Dnah3 KO mice. Our data revealed a reduction in IDA-associated protein levels and comparable ODA-associated protein levels in comparison to normal controls and WT mice, respectively, thus corroborating the TEM observations. These results suggest that DNAH3 is involved in sperm flagellar development in humans and mice, specifically through its role in the assembly of IDAs.
  
  Intriguingly, in our study, none of the patients with DNAH3 deficiency reported experiencing any of the principal symptoms associated with PCD. Additionally, our Dnah3 KO mice exhibited normal ciliary development in the lung, brain, eye, and oviduct. Similarly, Meng et al. did not mention any PCD symptoms in their DNAH3-deficient patients, and their Dnah3 KO mice also demonstrated normal ciliary morphology in the trachea and brain. These combined observations suggest that DNAH3 may play a more significant role in sperm flagellar development than in other motile cilia functions. Given that DNAH3 is expressed in ciliary tissues, its role in these tissues remains intriguing and could be elucidated through sequencing of larger cohorts of individuals with PCD.
  
  We have added these discussions in lines 267 to 283 and lines 300 to 303.
  
  (1.2) The variants of DNAH3 in four infertile men were identified through whole-exome sequencing. Providing an overview of the WES data would be beneficial to offer additional insights into whether other variants may contribute to the infertility. This could also help explain why ICSI only works for two out of four patients with DNAH3 variants.
  
  We thank the reviewer for the helpful suggestions.
  
  We have deposited the raw whole-exome sequencing data in the National Genomics Data Center (NGDC) (https://ngdc.cncb.ac.cn/, accession number: HRA007467). The clean reads, sequencing depth, sequencing coverage, and mapping quality of the WES on the patients are listed below (Table R1). A summary of WES has been presented in Table S1.
  
  Author response table 1.
  
  Quality of whole exome sequencing on infertile men.
  
  The variants identified through WES were annotated and filtered using Exomiser. Next, the variants were screened to obtain candidate variants based on the following criteria: (1) the allele frequency in the East Asian population was less than 1% in any database, including the ExAC Browser, gnomAD, and the 1000 Genomes Project; (2) the variants affected coding exons or canonical splice sites; (3) the variants were predicted to be possibly pathogenic or damaging.
  
  Following filtering and screening, the numbers of candidate variants obtained were as follows: Patient 1: 98, Patient 2: 101, Patient 3: 67, and Patient 4: 91 (Table S1). Subsequently, we utilized the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) and Mouse Genome Informatics (MGI) database (https://informatics.jax.org/) to analyze the expression patterns of the corresponding genes. Variants whose corresponding genes were not expressed in the human or mouse testis were excluded from further consideration. We also consulted the OMIM database and reviewed relevant literature to exclude variants associated with diseases unrelated to male infertility. Additionally, assuming a recessive inheritance pattern, we excluded all monoallelic variants. Ultimately, only bi-allelic variants in DNAH3 (NG_052617.1, NM_017539.2, NP_060009.1) remained, suggesting that they are the pathogenic variants responsible for the infertility of the patients (Table S1). These DNAH3 variants were verified by Sanger sequencing on DNA from the patients' families.
  
  We have added the overview of the WES in Table S1 and supplemented the analysis process of WES data in lines 100 to 106 and lines 348 to 360.
  
  Additionally, we did not identify any pathogenic variants associated with fertilization failure and early embryonic development in the two patients with failed ICSI outcomes. Therefore, these different ICSI outcomes might be attributed to additional unexplained factors from the female partners.
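  The screening logic described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' pipeline: the `Variant` fields, the consequence labels, and the toy records are invented for the example; "less than 1% in any database" is read here as the highest frequency across the databases being below 1%; and summing affected alleles per gene ignores phasing. The OMIM and literature review step is a manual curation step and is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    max_east_asian_af: float   # highest East Asian allele frequency across databases (assumed field)
    consequence: str           # e.g. "missense", "splice_site", "intronic" (assumed labels)
    predicted_damaging: bool   # summary of in silico predictions (assumed field)
    testis_expressed: bool     # gene expressed in human or mouse testis (assumed field)
    n_alleles_affected: int    # 1 = heterozygous, 2 = homozygous

CODING_OR_SPLICE = {"missense", "nonsense", "frameshift", "splice_site"}

def candidate_variants(variants, af_cutoff=0.01):
    """Apply the three screening criteria: rare, coding or splice-site, predicted damaging."""
    return [
        v for v in variants
        if v.max_east_asian_af < af_cutoff
        and v.consequence in CODING_OR_SPLICE
        and v.predicted_damaging
    ]

def biallelic_genes(variants):
    """Testis-expressed genes with at least two affected alleles among the candidates."""
    counts = {}
    for v in candidate_variants(variants):
        if v.testis_expressed:
            counts[v.gene] = counts.get(v.gene, 0) + v.n_alleles_affected
    return sorted(g for g, n in counts.items() if n >= 2)

# Toy records with placeholder gene names, for illustration only.
toy = [
    Variant("GENE_A", 0.0005, "missense", True, True, 1),
    Variant("GENE_A", 0.002, "splice_site", True, True, 1),
    Variant("GENE_B", 0.0001, "missense", True, True, 1),
    Variant("GENE_C", 0.05, "nonsense", True, True, 2),
    Variant("GENE_D", 0.0001, "frameshift", True, False, 2),
]
print(biallelic_genes(toy))
```

  In this toy set only the gene carrying two rare, damaging, coding or splice-site variants and expressed in testis survives the recessive filter.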
  
  (1.3) Quantification of images would help substantiate the conclusions, particularly in Figures 2, 3, 4, and 6. Improved images in Figures 3A, 4B, and 4C, would help increase confidence in the claims made.
  
  We thank the reviewer for the valuable suggestions. We presume that the reviewer means quantification of the images in Figure S6, not Figure 6.
  
  We have compiled statistics for results shown in Figures 2, 3, 4, and S6. Specifically:
  
  - The percentages of abnormal flagellar morphology in normal controls and patients, associated with the observations in Figure 2A, have been shown in Figure S1A.
  
  - The percentages of aberrant axonemal ultrastructure in different cross-sections of sperm from normal controls and patients, corresponding to the findings in Figure 3A, have been presented in Figure S1B.
  
  - The percentages of abnormal flagellar morphology in WT mice and Dnah3 KO mice have been shown in Figure S7A.
  
  - The percentages of aberrant axonemal arrangement in different cross-sections of sperm from WT mice and Dnah3 KO mice, corresponding to the findings in Figure 4B, have been presented in Figure S7C.
  
  - The percentages of microtubule doublets presenting IDAs in sperm from WT mice and Dnah3 KO mice, related to Figure 4B, have been detailed in Figure S7D.
  
  - The percentages of malformed mitochondria in the midpiece of sperm from WT mice and Dnah3 KO mice, associated with the observations in Figure 4C, have been presented in Figure S7E.
  
  Moreover, we have revised Figures 3A, 4B, and 4C by replacing the unclear TEM images.
  
  (2) Reviewer #1 (Recommendations for The Authors):
  
  (2.1) Please add reference(s) that support what is claimed in lines 83-84.
  
  We are very grateful for the reviewer's careful comments. We have added a reference that describes the homology and expression of DNAH3.
  
  (2.2) In line 286, change "suggested" to "suggest".
  
  Thanks for the reviewer's comments. We have corrected the grammar.
  
  (2.3) Please add reference(s) that support what is claimed in lines 359-360.
  
  According to the reviewer’s suggestions, we have included references detailing the STA-PUT velocity sedimentation for isolation of single human and mouse testicular cells.
  
  (2.4) In line 365, change "in" to "into".
  
  Thanks for the reviewer’s careful comments. We have corrected this word.
  
  (2.5) In Figure 7, I suggest changing "patients" to "wife or partners of patient". Given that the results are indeed from the spouses of the infertile men, I suggest making this small change to keep the consistency and clarity of what the authors did.
  
  In response to the reviewer’s kind suggestions, we have replaced “Patient” with “partners of Patient” and revised Figure 7.
  
  (3) Reviewer #2 (Recommendations for The Authors):
  
  (3.1) A summary of the WES data would be needed (i.e. number of reads, mapping quality, etc). As mentioned in the public review, it would be beneficial to present a summary of all variants identified in the data and clarify whether DNAH3 is the only gene that contains variants and whether these variants have been validated.
  
  Many thanks for the reviewer’s kind suggestions.
  
  The clean reads, sequencing depth, sequencing coverage, and mapping quality of the WES on the patients are listed in Author response table 1. A summary of WES has been presented in Table S1.
  
  The variants identified through WES were annotated and filtered using Exomiser. Next, the variants were screened to obtain candidate variants based on the following criteria: (1) the allele frequency in the East Asian population was less than 1% in any database, including the ExAC Browser, gnomAD, and the 1000 Genomes Project; (2) the variants affected coding exons or canonical splice sites; (3) the variants were predicted to be possibly pathogenic or damaging.
  
  Following filtering and screening, the numbers of candidate variants obtained were as follows: Patient 1: 98, Patient 2: 101, Patient 3: 67, and Patient 4: 91 (Table S1). Subsequently, we utilized the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) and Mouse Genome Informatics (MGI) database (https://informatics.jax.org/) to analyze the expression patterns of the corresponding genes. Variants whose corresponding genes were not expressed in the human or mouse testis were excluded from further consideration. We also consulted the OMIM database and reviewed relevant literature to exclude variants associated with diseases unrelated to male infertility. Additionally, assuming a recessive inheritance pattern, we excluded all monoallelic variants. Ultimately, only bi-allelic variants in DNAH3 (NG_052617.1, NM_017539.2, NP_060009.1) remained, suggesting that they are the pathogenic variants responsible for the infertility of the patients (Table S1). These DNAH3 variants were verified by Sanger sequencing on DNA from the patients' families.
  
  We have added the overview of the WES in Table S1 and supplemented the analysis process of WES data in lines 100 to 106 and lines 348 to 360.
  
  (3.2) It would be beneficial to the scientific community if the raw data of WES could be uploaded to a public data repository, such as GEO.
  
  According to the reviewer's suggestion, we have deposited the raw whole-exome sequencing data in the National Genomics Data Center (NGDC) (https://ngdc.cncb.ac.cn/, accession number: HRA007467) and described its availability in the "Data Availability" section.
  
  (3.3) In line 115, it is not clear how the prediction was made. Clarifying them by adding citations or describing methods that predict these pathways/functions would help strengthen it.
  
  Thanks for the reviewer's comments.
  
  SIFT, PolyPhen-2, MutationTaster and CADD assess the deleteriousness of genetic variants by considering genomic features and evolutionary constraint of the surrounding sequence, or structural and chemical property alterations caused by the amino acid substitutions. We have added websites and references for these tools in the manuscript (lines 116 to 118).
  
  Here are the principles of these tools.
  
  - SIFT considers the position at which the change occurred and the type of amino acid change to predict whether an amino acid substitution in a protein will affect protein function [https://sift.bii.a-star.edu.sg/, PMID: 12824425].
  
  - PolyPhen-2 predicts the impact of an amino acid substitution on a human protein by considering several features, including sequence, phylogenetic, and structural information [http://genetics.bwh.harvard.edu/pph2/, PMID: 20354512].
  
  - MutationTaster utilizes a Bayes classifier to predict the functional consequences of amino acid substitutions, intronic and synonymous changes, short insertions/deletions (indels), etc. [https://www.mutationtaster.org/, PMID: 24681721].
  
  - The CADD scores are based on diverse genomic features derived from surrounding sequence context, gene model annotations, evolutionary constraint, epigenetic measurements, and functional predictions [https://cadd.gs.washington.edu/, PMID: 30371827].
  
  (4) Reviewer #3 (Recommendations for The Authors):
  
  (4.1) Please ensure that all gene names used in your manuscript have been approved by the HUGO nomenclature committee. For example, "c.3590C>T (p.P1197L)" should be described as "c.3590C>T (Pro1197Leu)".
  
  In response to the reviewer's suggestion, we have revised all gene and variant names according to the HUGO nomenclature committee and the HGVS Variant Nomenclature Committee, respectively.
  
  (4.2) For Table 1, the authors should provide the rates of abnormal sperm morphologies using the sperm cells from normal male controls.
  
  Thanks for the reviewer’s careful comments. Consistent with the WHO laboratory manual (World Health Organization. WHO laboratory manual for the examination and processing of human semen. World Health Organization, 2021.), our routine semen analysis establishes 4% as the minimum rate of sperm with normal morphology but does not define the maximum rate of various tail defects. However, we reviewed the routine semen analysis on the normal controls in our study, and the approximate distribution of sperm with various flagellar morphologies in the normal controls was as follows: normal flagella, 78.6%; absent flagella, 1.7%; short flagella, 0.6%; coiled flagella, 12.5%; bent flagella, 7.9%; irregular flagella, 1.8%.
  
  (4.3) In Table 2, "Mutation Tester" or "Mutation Taster"?
  
  We thank the reviewer for the comment. It should be "MutationTaster", and we have corrected this mistake in Table 2 and the manuscript.
  
  (4.4) In Figure 2B, the bars for patient 1 should be aligned.
  
  Following the reviewer's valuable suggestion, we have ensured consistent scale bar alignment in Figure 2B and implemented this alignment throughout all other figures.
  
  (4.5) In Figure 3A, what about the ultrastructure for sperm heads in DNAH3 deficient sperm cell? The authors previously mentioned abnormalities in sperm head morphologies (Figure 2B) in patients with DNAH3 mutations.
  
  We thank the reviewers for their kind comments. A small fraction of abnormal sperm heads from our patients was captured under TEM, manifesting as round heads with loose chromatin (Author response image 1).
  
  Author response image 1.
  
  Ultrastructure of sperm head from DNAH3-deficient infertile men. TEM analysis revealed a fraction of round head with loose chromatin in patients harboring DNAH3 variants. Scale bars, 200 nm.
  
  (4.6) In Figure S6, the authors should provide the rates of abnormal sperm morphologies for Dnah3 KO male mice.
  
  In response to the reviewer's valuable suggestion, we have quantified morphological defects in spermatozoa from both Dnah3 KO and WT mice. Compared to about 17% morphological abnormalities in sperm from WT mice, the morphological abnormalities in sperm from Dnah3 KO mice were about 37%. The results are presented in the revised Figure S7.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.19.580977v2
www.biorxiv.org www.biorxiv.org

Progressively shifting patterns of co-modulation among premotor cortex neurons carry dynamically similar signals during action execution and observation

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Major changes in the revised manuscript include:
  
  (1) The distinction between condition-dependent versus condition-independent variation in neural activity has been clarified.
  
  (2) Principal angle calculations have been added.
  
  (3) Neurons modulated during action execution but not during action observation have been analyzed to compare and contrast with mirror neurons.
  
  (4) Canonical correlation analysis has been extended to three dimensions.
  
  (5) Speculations have been moved to and modified in the Discussion.
  
  (6) Computational details have been expanded in the Methods.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary and strengths. This paper starts with an exceptionally fair and balanced introduction to a topic, the mirror neuron literature, which is often debated and prone to controversies even in the choice of the terminology. In my opinion, the authors did an excellent job in this regard, and I really appreciated it. Then, they propose a novel method to look at population dynamics to compare neural selectivity and alignment between execution and observation of actions performed with different types of grip.
  
  Thank you.
  
  Weakness.
  
  Unfortunately, the goal and findings within this well-described framework are less clear to me. The authors aimed to investigate, using a novel analytic approach, whether and to what extent a match exists between population codes and neural dynamics when a monkey performs an action or observes it performed by an experimenter. This motivation stems from the fact that the general evidence in the literature is that the match between visual and motor selectivity of mirror neuron responses is essentially at a chance level. While the approach devised by the author is generally well-described and understandable, the main result obtained confirms this general finding of a lack of matching between the two contexts in 2 out of the three monkeys. Nevertheless, the authors claim that the patterns associated with execution and observation can be re-aligned with canonical correlation, indicating that these distinct neural representations show dynamical similarity that may enable the nervous system to recognize particular actions. This final conclusion is hardly acceptable to me, and constitutes my major concern, at least without a more explicit explanation: how do we know that this additional operation can be performed by the brain?
  
  Point taken. In the Discussion, we now have clarified that this is our speculation rather than a conclusion and we also offer an alternative interpretation (lines 724 to 744):
  
  “One classic interpretation of similar latent dynamics in the PM MN population during execution and observation would be that this similarity provides a means for the brain to recognize similar movements performed by the monkey during execution and by the experimenter during observation. Through some process akin to a communication subspace (Semedo et al., 2019), brain regions beyond PM might recognize the correspondence between the latent dynamics of the executed and observed actions.
  
  Alternatively, given that observation of another individual can be considered a form of social interaction, PM MN population activity during action observation, rather than representing movements made by another individual similar to one’s own movements, instead may represent different movements one might execute oneself in response to those made by another individual (Ninomiya et al., 2020; Bonini et al., 2022; Ferrucci et al., 2022; Pomper et al., 2023). This possibility is consistent with the finding that the neural dynamics of PM MN populations are more similar during observation of biological versus non-biological movements than during execution versus observation (Albertini et al., 2021). Though neurons active only during observation of others (AO units) have been hypothesized to drive observation activity in MNs, the present AO populations were too small to analyze with the approaches we applied here. Nevertheless, the similar relative organization of the execution and observation population activity in PM MNs revealed here by alignment of their latent dynamics through CCA could constitute a correspondence between particular movements that might be made by the subject in response to particular movements made by the other individual, i.e. responsive movements which would not necessarily be motorically similar to the observed movements.”
  
  Is this a computational trick to artificially align something that is naturally non-aligned, or can it capture something real and useful?
  
  We feel this is more than a trick. In the Introduction, we now have clarified (lines 166 to 170):
  
  “Such alignment would indicate that the relationships among the trajectory segments in the execution subspace are similar to the relationships among the trajectory segments in the observation subspace, indicating a corresponding structure in the latent dynamic representations of execution and observation movements by the same PM MN population.”
  
  In the Results we give the following example (lines 446 to 455):
  
  “Such alignment would indicate that neural representations of trials involving the four objects bore a similar relationship to one another in neural space during execution and observation, even though they occurred in different subspaces. For example, the trajectories of PMd+M1 neuron populations recorded from two different monkeys during center-out reaching movements could be aligned well (Safaie et al., 2023). CCA showed, for example, that in both brains the neural trajectory for the movement to the target at 0° was closer to the trajectory for movement to the target at 45° than to the trajectory for the movement to the target at 180°. Relationships among these latent dynamic representations of the eight movements thus were similar even though the neural populations were recorded from two different monkeys.”
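  The idea that trajectories lying in different subspaces can still share the same relational structure can be illustrated with a toy canonical correlation analysis. The sketch below is not the authors' implementation: it uses synthetic data, the variable names are invented, and it computes canonical correlations with a standard QR-plus-SVD approach using NumPy. Two "views" are generated as different linear mixtures of the same three latent signals, so they occupy different coordinates yet should align almost perfectly.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two (samples x dims) matrices.

    Columns are mean-centered, each matrix is orthonormalized with a reduced
    QR decomposition, and the singular values of Qx.T @ Qy are returned.
    Assumes both matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 3))        # shared latent signals (synthetic)
exe = latent @ rng.standard_normal((3, 3))    # hypothetical "execution" view
obs = latent @ rng.standard_normal((3, 3))    # hypothetical "observation" view
print(canonical_correlations(exe, obs))
```

  When the two views are instead independent noise of the same shape, the canonical correlations are expected to be much lower, which is the contrast that makes high alignment informative.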
  
  And in the Discussion we now compare (lines 677 to 686):
  
  “Corresponding neural representations of action execution and observation during task epochs with higher neural firing rates have been described previously in PMd MNs and in PMv MNs using representational similarity analysis (RSA) (Papadourakis and Raos, 2019). And during force production in eight different directions, neural trajectories of PMd neurons draw similar “clocks” during execution, cooperative execution, and passive observation (Pezzulo et al., 2022). Likewise in the present study, despite execution and observation trajectories progressing through largely distinct subspaces, in all three monkeys execution and observation trajectory segments showed some degree of alignment, particularly the Movement and Hold segments (Figure 8C), indicating similar relationships among the latent dynamic representations of the four RGM movements during execution and observation.”
  
  Based on the accumulated evidence on space-constrained coding of others' actions by mirror neurons (e.g., Caggiano et al. 2009; Maranesi et al. 2017), recent evidence also cited by the authors (Pomper et al. 2023), and the most recent views supported even by the first author of the original discovery (i.e., Vittorio Gallese, see Bonini et al. 2022 on TICS), it seems that one of the main functions of these cells, especially in monkeys, might be to prepare actions and motor responses during social interaction rather than recognizing the actions of others - something that visual brain areas could easily do better than motor ones in most situations. In this perspective, and given the absence of causal evidence so far, the lack of visuo-motor congruence is a potentially relevant feature of the mechanism rather than something to be computationally cracked at all costs.
  
  We agree that this perspective provides a valuable interpretation of our findings. In the Discussion, we have added the following paragraph (lines 730 to 744):
  
  “Alternatively, given that observation of another individual can be considered a form of social interaction, PM MN population activity during action observation, rather than representing movements made by another individual similar to one’s own movements, instead may represent different movements one might execute oneself in response to those made by another individual (Ninomiya et al., 2020; Bonini et al., 2022; Ferrucci et al., 2022; Pomper et al., 2023). This possibility is consistent with the finding that the neural dynamics of PM MN populations are more similar during observation of biological versus non-biological movements than during execution versus observation (Albertini et al., 2021). Though neurons active only during observation of others (AO units) have been hypothesized to drive observation activity in MNs, the present AO populations were too small to analyze with the approaches we applied here. Nevertheless, the similar relative organization of the execution and observation population activity in PM MNs revealed here by alignment of their latent dynamics through CCA could constitute a correspondence between particular movements that might be made by the subject in response to particular movements made by the other individual, i.e. responsive movements which would not necessarily be motorically similar to the observed movements.”
  
  Specific comments on Results/Methods:
  
  I can understand, based on the authors' hypothesis, that they employed an ANOVA to preliminarily test whether and which of the recorded neurons fit their definition of "mirror neurons". However, given the emphasis on the population level, and the consolidated finding of highly different execution and observation responses, I think it could be interesting to apply the same analysis on (at least also) the whole recorded neuronal population, without any preselection-based on a single neuron statistic. Such preselection of mirror neurons could influence the results of EXE-OBS comparisons since all the neurons activated only during EXE or OBS are excluded. Related to this point, the authors could report the total number of recorded neurons per monkey/session, so that also the fraction of neurons fitting their definition of mirror neuron is explicit.
  
  We are aware that a number of recent studies from other laboratories already have analyzed the entire population of neurons during execution versus observation, without selectively analyzing neurons active during both execution and observation (Jiang et al., 2020; Albertini et al., 2021). However, our focus lies not in how the entire PM neural population encodes execution versus observation, but in the differential activity of the mirror neuron subpopulation in these two contexts. Our new Table 2 presents the numbers of mirror neurons (MN), action execution only neurons (AE), action observation only neurons (AO), and neurons not significantly task-related during either execution or observation (NS). Although we often recorded substantial numbers of AE neurons, very few AO neurons were found in our recordings. In analyzing the AE subpopulation, we found unexpected differences in canonical correlation alignment between and within the MN and AE neuron populations. In view of the editors’ comment that “…the reviewers provided several specific recommendations of new analyses to include. However, now the paper feels extremely long…”, we have chosen to focus on comparing AE neurons with MNs.
  
  Furthermore, the comparison of the dynamics of the classification accuracy in figures 4 and 5, and therefore the underlying assumption of subspace shifts in execution and observation, respectively, reveals substantial similarities between monkeys despite the different contexts, which are clearly greater than the similarities among neural subspace shifts across task epochs: to me, this suggests that the main result is driven by the selected neural populations in different monkeys/implants rather than by an essential property of the neuronal dynamics valid across animals. Could the authors comment on this issue? This could easily explain the "strange" result reported in figure 6 for monkey T.
  
  We have taken the general approach of emphasizing findings common across individual animals, but also reporting individual differences. We have added the following in the Discussion (lines 645 to 654):
  
  “We did not attempt to classify neurons in our PM MN populations as strictly congruent, broadly congruent, or non-congruent. Nevertheless, the minimal overlap we found in instantaneous execution and observation subspaces would be consistent with a low degree of congruence in our PM MN populations. Particularly during one session monkey T was an exception in this regard, showing a considerable degree of overlap between execution and observation subspaces, not unlike the shared subspace found in other studies that identified orthogonal execution and observation subspaces as well (Jiang et al., 2020). Although our microelectrode arrays were placed in similar cortical locations in the three monkeys, by chance monkey T’s PM MN population may have included a substantial proportion of congruent neurons.”
  
  Reviewer #2 (Public Review):
  
  In this work, the authors set out to identify time-varying subspaces in the premotor cortical activity of monkeys as they executed/observed a reach-grasp-hold movement of 4 different objects. Then, they projected the neural activity to these subspaces and found evidence of shifting subspaces in the time course of a trial in both conditions, executing and observing. These shifting subspaces appear to be distinct in execution and observation trials. However, correlation analysis of neural dynamics reveals the similarity of dynamics in these distinct subspaces. Taken together, Zhao and Schieber speculate that the condition-dependent activity studied here provides a representation of movement that relies on the actor.
  
  This work addresses an interesting question. The authors developed a novel approach to identify instantaneous subspaces and decoded the object type from the projected neural dynamics within these subspaces. As interesting as these results might be, I have a few suggestions and questions to improve the manuscript:
  
  (1) Repeating the analyses in the paper, e.g., in Fig5, using non-MN units only or the entire population, and demonstrating that the results are specific to MNs would make the whole study much more compelling.
  
  We have added analyses of those non-MNs modulated significantly during action execution but not during observation, which we refer to as AE neurons. The additional findings from these analyses are spread throughout the manuscript:
  
  Lines 284-293:
  
  “We also examined the temporal progression of the instantaneous subspace of AE neurons. As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3). During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D). After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset. As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”
  
  Lines 411-419:
  
  “During execution trials, classification accuracy for AE populations (Figure 6I-L) showed a time course quite similar to that for MN populations, though amplitudes were lower overall, most likely because of the smaller population sizes. During observation, AE populations showed only low-amplitude, short-lived peaks of classification accuracy around times I, G, M, and H (Figure 6 – figure supplement 1). Given that individual AE neurons showed no statistically significant modulation during observation trials, even these small peaks might not have been expected. Previous studies have indicated, however, that neurons not individually related to task events nevertheless may contribute to a population response (Shenoy et al., 2013; Cunningham and Yu, 2014; Gallego et al., 2017; Jiang et al., 2020).”
  
  Lines 495-508:
  
  “Although MNs are known to be present in considerable numbers in both the primary motor cortex and premotor cortex (see Introduction), most studies of movement-related cortical activity in these areas make no distinction between neurons with activity only during action execution (AE neurons) and those with activity during both execution and observation (MNs). This reflects an underlying assumption that during action execution, mirror neurons function in parallel with AE neurons, differing only during observation. We therefore tested the hypothesis that MN and AE neuron execution trajectory segments from the same session would align well. Figure 8C (blue) shows the mean CCs between MN and AE execution trajectory segments across 8 alignments (MN/AE; 2 R, 3 T, 3 F), which reached the highest values for the Hold segments . All three of these coefficients were substantially lower than those for the MN execution vs. observation alignments given above. Surprisingly, the alignment of AE neuron execution trajectory segments with those of the simultaneously recorded MN population was weaker than the alignment of MN trajectories during execution vs. observation.
  
  Did these differences in MN:1/2, MN:E/O, and MN/AE alignment result from consistent differences in their respective patterns of co-modulation, or from greater trial-by-trial variability in the patterns of co-modulation among MNs during observation than during execution, and still greater variability among AE neurons during execution? The bootstrapping approach we used for CCA (see Methods) enabled us to evaluate the consistency of relationships among trajectory segments across repeated samplings of trials recorded from the same neuron population in the same session and in the same context (execution or observation). We therefore performed 500 iterations of CCA between two different random samples of MN execution (MN:E/E), MN observation (MN:O/O), or AE execution (AE:E/E) trajectory segments from a given session (2 R, 3 T, 3 F). This within-group alignment of MN execution trajectory segments from the same session (Figure 8D, MN:E/E, gray, Hold: () was as strong as between session alignment (Figure 8C, MN/1:2, black). But within-group alignment of MN observation trajectory segments (Figure 8D, MN:O/O, orange, Hold: () was lower than that found with MN execution segments (Figure 8C, MN:E/O, red, . Likewise, within-group alignment of AE neuron trajectory segments (Figure 8D, AE:E/E, light blue, Hold: () was lower than their alignment with MN execution segments (Figure 8C, MN/AE, blue, Hold: (). Whereas MN execution trajectories were relatively consistent within sessions, MN observation trajectories and AE execution trajectories were less so.”
  
  And in the Discussion we now suggest (lines 682 to 698):
  
  “Based on the assumption that AE neurons and MNs function as a homogenous neuron population during action execution, we had expected AE and MN execution trajectory segments to align closely. During execution trials, the progression of instantaneous condition-dependent subspaces and of classification accuracy in AE populations was quite similar to that in MN populations. We were surprised to find, therefore, that alignment between execution trajectory segments from AE populations and from the simultaneously recorded MN populations was even lower than alignment between MN execution and observation segments (Figure 8C, blue versus red). Moreover, whereas within-group alignment of MN execution trajectory segments was high, within-group alignment of AE neuron execution trajectory segments was low (Figure 8D, gray versus light blue). These findings indicate that the predominant patterns of co-modulation among MNs during execution are quite consistent within sessions, but the patterns of comodulation among AE neurons are considerably more variable. Together with our previous finding that modulation of MNs leads that of non-mirror neurons in time, both at the single neuron level and at the population level (Mazurek and Schieber, 2019), this difference in consistency versus variability leads us to speculate that during action execution, while MNs carry a consistent forward model of the intended movement, AE neurons carry more variable feedback information.”
  
  (2) The method presented here is similar and perhaps related to principal angles (https://doi.org/10.2307/2005662). It would be interesting to confirm these results with principal angles. For instance, instead of using the decoding performance as a proxy for shifting subspaces, principal angles could directly quantify the 'shift' (similar to Gallego et al, Nat Comm, 2018).
  
  Point taken. We now have calculated the principal angles as a function of time and present them as a new section of the Results including new figure 4 (lines 237 to 293).
  
  “Instantaneous subspaces shift progressively during both execution and observation
  
  We identified an instantaneous subspace at each one millisecond time step of RGM trials. At each time step, we applied PCA to the 4 instantaneous neural states (i.e. the 4 points on the neural trajectories representing trials involving the 4 different objects each averaged across 20 trials per object, totaling 80 trials), yielding a 3-dimensional subspace at that time (see Methods). Note that because these 3-dimensional subspaces are essentially instantaneous, they capture the condition-dependent variation in neural states, but not the common, condition-independent variation. To examine the temporal progression of these instantaneous subspaces, we then calculated the principal angles between each 80-trial instantaneous subspace and the instantaneous subspaces averaged across all trials at four behavioral time points that could be readily defined across trials, sessions, and monkeys: the onset of the instruction (I), the go cue (G), the movement onset (M), and the beginning of the final hold (H). This process was repeated 10 times with replacement to assess the variability of the principal angles. The closer the principal angles are to 0°, the closer the two subspaces are to being identical; the closer to 90°, the closer the two subspaces are to being orthogonal.
  
  Figure 4A-D illustrate the temporal progression of the first principal angle of the mirror neuron population in the three sessions (red, green, and blue) from monkey R during execution trials. As illustrated in Figure 4 – figure supplement 1 (see also the related Methods), in each session all three principal angles, each of which could range from 0° to 90°, tended to follow a similar time course. In the Results we therefore illustrate only the first (i.e. smallest) principal angle. Solid traces represent the mean across 10-fold cross validation using the 80-trial subsets of all the available trials; shading indicates ±1 standard deviation. As would be expected, the instantaneous subspace using 80 trials approaches the subspace using all trials at each of the four selected times—I, G, M, and H—indicated by the relatively narrow trough dipping toward 0°. Of greater interest are the slower changes in the first principal angle in between these four time points. Figure 4A shows that after instruction onset (I) the instantaneous subspace shifted quickly away from the subspace at time I, indicated by a rapid increase in principal angle to levels not much lower than what might be expected by chance alone (horizontal dashed line). In contrast, throughout the remainder of the instruction and delay epochs (from I to G), Figure 4B and C show that the 80-trial instantaneous subspace shifted gradually and concurrently, not sequentially, toward the all-trial subspaces that would be reached at the end of the delay period (G) and then at the onset of movement (M), indicated by the progressive decreases in principal angle. As shown by Figure 4D, shifting toward the H subspace did not begin until the movement onset (M). 
  To summarize, these changes in principal angles indicate that after shifting briefly toward the subspace present at the time the instruction appeared (I), the instantaneous subspace shifted progressively throughout the instruction and delay epochs toward the subspace that would be reached at the time of the go cue (G), then further toward that at the time of movement onset (M), and only thereafter shifted toward the instantaneous subspace that would be present at the time of the hold (H).
  
  Figure 4E-H show the progression of the first principal angle of the mirror neuron population during observation trials. Overall, the temporal progression of the MN instantaneous subspace during observation was similar to that found during execution, particularly around times I and H. The decrease in principal angle relative to the G and M instantaneous subspaces during the delay epoch was less pronounced during observation than during execution. Nevertheless, these findings support the hypothesis that the condition-dependent subspace of PM MNs shifts progressively over the time course of RGM trials during both execution and observation, as illustrated schematically in Figure 1A.
  
  We also examined the temporal progression of the instantaneous subspace of AE neurons. As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3). During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D). After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset. As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”
  
  The related Methods are now described in subsection “Subspace Comparisons—Principal Angles”
  
  Relatedly, why is the decoding of the 'object type' used to establish the progressive shifting of the subspaces? I would be interested to see the authors' argument.
  
  We have clarified the reason for our decoding analysis as follows (lines 295 to 297):
  
  “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity. The neural trajectories during trials involving different objects separated increasingly as trials progressed in time.”
  
  And… (lines 332 to 348):
  
  “Decodable information changes progressively during both execution and observation
  
  As RGM trials proceeded in time, the condition-dependent neural activity of the PM MN population thus changed in two ways. First, the instantaneous condition-dependent subspace shifted, indicating that the patterns of firing-rate co-modulation among neurons representing the four different RGM movements changed progressively, both during execution and during observation. Second, as firing rates generally increased, the neural trajectories representing the four RGM movements became progressively more separated, more so during execution than during observation.
  
  To evaluate the combined effects of these two progressive changes, we clipped 100 ms single-trial trajectory segments beginning at times I, G, M, or H, and projected these trajectory segments from individual trials into the instantaneous 3D subspaces at 50 ms time steps. At each of these time steps, we trained a separate LSTM decoder to classify individual trials according to which of the four objects was involved in that trial. We expected that the trajectory segments would be classified most accurately when projected into instantaneous subspaces near the time at which the trajectory segments were clipped. At other times we reasoned that classification accuracy would depend both on the similarity of the current instantaneous subspace to that found at the clip time as evaluated by the principal angle (Figure 4), and on the separation of the four trajectories at the clip time (Figure 5).”
  
  The object type should be much more decodable during movement or hold than during instruction, which is probably why the chance-level decoding performance (horizontal lines) for the movement segment is twice that for the instruction segment.
  
  Indeed, the object type is more decodable during the movement and hold than during instruction or delay epochs.
  
  (3) Why aren't execution and observation subspaces compared together directly? Especially given that there are both types of trials in the same session with the same recorded population of neurons. Using instantaneous subspaces, or the principal angles between manifolds during exec trials vs obs trials.
  
  Point taken. We now have added comparison of the execution and observation subspaces using the principal angles between instantaneous subspaces (lines 421 to 436):
  
  “Do PM mirror neurons progress through the same subspaces during execution and observation?
  
  Having found that PM mirror neuron populations show similar progressive shifts in their instantaneous neural subspace during execution and observation of RGM trials, as well as similar changes in decodable information, we then asked whether this progression passes through similar subspaces during execution and observation. To address this question, we first calculated the principal angles between the instantaneous mirror-neuron execution subspace at selected times I, G, M, or H and the entire time series of instantaneous mirror-neuron observation subspaces (Figure 7A-D). Conversely, we calculated the principal angles between the instantaneous observation subspaces at selected times I, G, M, or H and the entire time series of instantaneous execution subspaces (Figure 7E-H). Although the principal angles were slightly smaller than might be expected from chance alone, indicating some minimal overlap of execution and observation instantaneous subspaces, the instantaneous observation subspaces did not show any progressive shift toward the I, G, M, or H execution subspace (Figure 7A-D), nor did the instantaneous execution subspaces shift toward the I, G, M, or H observation subspace (Figure 7E-H).”
  
  (4) The definition of the instantaneous subspaces is a critical point in the manuscript. I think it is slightly unclear: based on the Methods section #715-722 and the main text #173-#181, I gather that the subspaces are based on trial averaged neural activity for each of the 4 objects, separately. So for each object and per timepoint, a vector of size (1, n) -n neurons- is reduced to a vector of (1, 2 or 3 -the main text says 2, methods say 3-) which would be a single point in the low-d space. Is this description accurate? This should be clarified in the manuscript.
  
  In the Methods, we now have clarified (lines 849 to 859):
  
  “Instantaneous subspace identification
  
  Instantaneous neural subspaces were identified at 1 ms intervals. At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects—sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step. PCA then was performed on these four points. Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace. Each instantaneous 3D subspace can be considered a filter described by a matrix, W, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, W_i, forming a time series of filters (Figure 1B).”
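
  The subspace identification described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the array layout (`rates` as trials by neurons at one time step, `labels` as an object index per trial), the function name, and the synthetic data are all hypothetical.

```python
import numpy as np

def instantaneous_subspace(rates, labels):
    """Return an (N, 3) orthonormal basis W for a single time step.

    rates:  (n_trials, N) firing rates at one time step (hypothetical layout)
    labels: (n_trials,) object index 0..3 for each trial
    """
    # Average trials separately for each object: four points in N-dimensional space.
    means = np.stack([rates[labels == k].mean(axis=0) for k in range(4)])
    # PCA via SVD of the mean-centered points; four points span at most 3 dimensions.
    centered = means - means.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:3].T  # columns are the three principal directions

rng = np.random.default_rng(0)
rates = rng.normal(size=(80, 30))      # 80 trials, 30 neurons (synthetic data)
labels = np.repeat(np.arange(4), 20)   # 20 trials per object
W = instantaneous_subspace(rates, labels)
latent = rates @ W                     # project single trials into the 3D subspace
```

  Because the four centered means sum to zero, they span at most three dimensions, so the three retained directions capture all of the between-object variance at that time step.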
  
  (5) Isn't the process of projecting segments of neural dynamics and comparing the results equivalent to comparing the projection matrices in the first place? If so, that might have been a more intuitive avenue to follow.
  
  As described in more detail in our responses to item 2, above, we have added analyses of principal angles to compare the projection matrices directly. However, “the process of projecting segments of neural dynamics and comparing the results” incorporates the progressively increasing separation of the trajectory segments and hence is not simply equivalent to comparing the subspaces with principal angles.
  
  (6) Lines #385-#389: This process seems unnecessarily complicated. Also, given the number of trials available, this sometimes doesn't make sense. E.g. Monkey R exec has only 8 trials of one of the objects, so bootstrapping 20 trials 500 times would be spurious. Why not, as per Gallego et al, Nat Neurosci 2020 and Safaie et al, Nat 2023 which are cited, concatenate the trials?
  
  In the Methods we now clarify that (lines 953 to 969):
  
  “To provide an estimate of variability, we used a bootstrapping approach to CCA. From each of two data sets we randomly selected 20 trials involving each target object (totaling 80 trials) with replacement, clipped trajectory segments from each of those trials for 100 ms (100 points at 1 ms intervals) after the instruction onset, go cue, movement onset, or beginning of the final hold, and performed CCA as described above. (Note that because session 1 from monkey R included only 8 button trials (Table 1), we excluded this session from CCA analyses.) With 500 iterations, we obtained a distribution of the correlation coefficients (CCs) between the two data sets in each of the three dimensions of the aligned subspace, which permitted statistical comparisons. We then used this approach to evaluate alignment of latent dynamics between different sessions (e.g. execution trials on two different days), between different contexts (e.g. execution and observation), and between different neural populations (e.g. MNs and AE neurons). This bootstrapping approach further enabled us to assess the consistency of relationships among neural trajectories within a given group—i.e. the same neural population during the same context (execution or observation) in the same session—by drawing two separate random samples of 80 trials from the same population, context, and session (Figure 8D), which would not have been possible had we concatenated trajectory segments from all trials in the session (Gallego et al., 2020; Safaie et al., 2023).”
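
  The trial resampling step of this bootstrap might look like the following sketch. It assumes a hypothetical per-trial object index array `labels`; the helper name and the synthetic trial counts are illustrative only, and the CCA step that would follow each resample is omitted.

```python
import numpy as np

def bootstrap_trials(labels, n_per_object=20, rng=None):
    """Indices of one resample: n_per_object trials drawn with replacement per object."""
    rng = np.random.default_rng() if rng is None else rng
    picks = [
        rng.choice(np.flatnonzero(labels == k), size=n_per_object, replace=True)
        for k in np.unique(labels)
    ]
    return np.concatenate(picks)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), [25, 31, 22, 28])  # synthetic trial counts per object

# Two independent 80-trial resamples from the same session, as used for the
# within-group comparison; repeating this 500 times yields a distribution.
idx_1 = bootstrap_trials(labels, rng=rng)
idx_2 = bootstrap_trials(labels, rng=rng)
```

  Drawing the two samples independently is what allows alignment to be assessed within a single population, context, and session.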
  
  And we report results that could not have been obtained by concatenating all the trials (lines 522 to 541):
  
  “Did these differences in MN:1/2, MN:E/O, and MN/AE alignment result from consistent differences in their respective patterns of co-modulation, or from greater trial-by-trial variability in the patterns of co-modulation among MNs during observation than during execution, and still greater variability among AE neurons during execution? The bootstrapping approach we used for CCA (see Methods) enabled us to evaluate the consistency of relationships among trajectory segments across repeated samplings of trials recorded from the same neuron population in the same session and in the same context (execution or observation). We therefore performed 500 iterations of CCA between two different random samples of MN execution (MN:E/E), MN observation (MN:O/O), or AE execution (AE:E/E) trajectory segments from a given session (2 R, 3 T, 3 F). This within-group alignment of MN execution trajectory segments from the same session (Figure 8D, MN:E/E, gray, Hold: () was as strong as between session alignment (Figure 8C, MN/1:2, black). But within-group alignment of MN observation trajectory segments (Figure 8D, MN:O/O, orange, Hold: () was lower than that found with MN execution segments (Figure 8C, MN:E/O, red, . Likewise, within-group alignment of AE neuron trajectory segments (Figure 8D, AE:E/E, light blue, Hold: () was lower than their alignment with MN execution segments (Figure 8C, MN/AE, blue, Hold: (). Whereas MN execution trajectories were relatively consistent within sessions, MN observation trajectories and AE execution trajectories were less so.”
  
  Because only 8 button trials were available in Session 1 from Monkey R, we excluded this session from the CCA analyses. Sessions 2 and 3 from monkey R provide valid results, however. For example, we now state explicitly (lines 468 to 472):
  
  “As a positive control, we first aligned MN execution trajectory segments from two different sessions in the same monkey (which we abbreviate as MN:1/2). The 2 sessions in monkey R provided only 1 possible comparison, but the 3 sessions in monkeys T and F each provided 3 comparisons. For each of these 7 comparisons, we found the bootstrapped average of CC1, of CC2, and of CC3.”
  
  (7) Related to the CCA analysis, what behavioural epoch has been used here, the same as the previous analyses, i.e. 100ms? How many data points is that in time? Given that CCA is essentially a correlation value, too few data points make it rather meaningless. If that's the case, I encourage using, let's say, one window combining I and G until movement, and one window of movement and hold, such that they are both easier to interpret. Indeed low values of exec-exec in CC2 compared to Gallego et al, Nat Neurosci, 2020 might be a sign of a methodological error.
  
  In the Methods described for CCA, we now have clarified that (lines 953 to 961):
  
  “To provide an estimate of variability, we used a bootstrapping approach to CCA. From each of two data sets we randomly selected 20 trials involving each target object (totaling 80 trials) with replacement, clipped trajectory segments from each of those trials for 100 ms (100 points at 1 ms intervals) after the instruction onset, go cue, movement onset, or beginning of the final hold, and performed CCA as described above. (Note that because session 1 from monkey R included only 8 button trials (Table 1), we excluded this session from CCA analyses.) With 500 iterations, we obtained a distribution of the correlation coefficients (CCs) between the two data sets in each of the three dimensions of the aligned subspace, which permitted statistical comparisons.”
  
  And in the Results we report that (lines 475 to 480):
  
  “The highest values for MN:1/2 correlations were obtained for the Movement trajectory segments . These values indicate consistent relationships among the Movement neural trajectory segments representing the four different RGM movements from session to session, as would have been expected from previous studies (Gallego et al., 2018; Gallego et al., 2020; Safaie et al., 2023).”
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In their study, Zhao et al. investigated the population activity of mirror neurons (MNs) in the premotor cortex of monkeys either executing or observing a task consisting of reaching to, grasping, and manipulating various objects. The authors proposed an innovative method for analyzing the population activity of MNs during both execution and observation trials. This method made it possible to isolate the condition-dependent variance in neural data and to study its temporal evolution over the course of single trials. The method proposed by the authors consists of building a time series of "instantaneous" subspaces with single time step resolution, rather than a single subspace spanning the entire task duration. As these subspaces are computed on an instant time basis, projecting neural activity from a given task time into them results in latent trajectories that capture condition-dependent variance while minimizing the condition-independent one. The authors then analyzed the time evolution of these instantaneous subspaces and revealed that a progressive shift is present in subspaces of both execution and observation trials, with slower shifts during the grasping and manipulating phases compared to the initial preparation phase. Finally, they compared the instantaneous subspaces between execution and observation trials and observed that neural population activity did not traverse the same subspaces in these two conditions. However, they showed that these distinct neural representations can be aligned with Canonical Correlation Analysis, indicating dynamic similarities of neural data when executing and observing the task. The authors speculated that such similarities might facilitate the nervous system's ability to recognize actions performed by oneself or another individual.
  
  Strengths:
  
  Unlike other areas of the brain, the analysis of neural population dynamics of premotor cortex MNs is not well established. Furthermore, analyzing population activity recorded during non-trivial motor actions, distinct from the commonly used reaching tasks, serves as a valuable contribution to computational neuroscience. This study holds particular significance as it bridges both domains, shedding light on the temporal evolution of the shift in neural states when executing and observing actions. The results are moderately robust, and the proposed analytical method could potentially be used in other neuroscience contexts.
  
  Weaknesses:
  
  While the overall clarity is satisfactory, the paper falls short in providing a clear description of the mathematical formulas for the different methods used in the study.
  
  We have added the various mathematical formulas in the Methods.
  
  For Cumulative Separation (lines 864 to 871):
  
  “To quantify the separation between the four trial-averaged trajectory segments involving the different objects in a given instantaneous subspace, we then calculated their cumulative separation ($CS$) as:
  
  $$CS = \frac{1}{T} \sum_{t=1}^{T} \sum_{i<j} d_{ij}(t)$$
  
  where $d_{ij}(t)$ is the 3-dimensional Euclidean distance between the $i$-th and $j$-th trajectories at time point $t$. We summed the 6 pairwise distances between the 4 trajectory segments across time points and normalized by the number of time points, $T = 100$. The larger the $CS$, the greater the separation of the trajectory segments.”
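
  As a worked illustration of the cumulative separation just described, the sketch below sums the 6 pairwise Euclidean distances at every time point and divides by the number of time points. The array layout (objects by time points by 3 dimensions) and the toy trajectories are assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def cumulative_separation(traj):
    """traj: (4, T, 3) trial-averaged trajectory segments in a 3D subspace."""
    n_obj, n_time, _ = traj.shape
    total = 0.0
    for i, j in combinations(range(n_obj), 2):  # the 6 object pairs
        # Euclidean distance between trajectories i and j at every time point
        total += np.linalg.norm(traj[i] - traj[j], axis=1).sum()
    return total / n_time

# Toy check: four constant trajectories at the corners of a unit square (z = 0).
corners = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
traj = np.repeat(corners[:, None, :], 100, axis=1)  # shape (4, 100, 3)
cs = cumulative_separation(traj)
```

  For this toy case the four sides contribute 1 each and the two diagonals contribute √2 each, so `cs` equals 4 + 2√2 (about 6.83) regardless of the number of time points.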
  
  For principal angles (lines 877 to 884):
  
  “For example, given the 3-dimensional instantaneous subspace at the time of movement onset, W<sub>M</sub> and at any other time, W<sub>i</sub>, we calculated their 3x3 inner product matrix and performed singular value decomposition to obtain:
  
  $$W_M^{T} W_i = P_M \, C \, P_i^{T}$$
  
  where 3x3 matrices P<sub>M</sub> and P<sub>i</sub> define new manifold directions which successively minimize the 3 principal angles specific to the two subspaces being compared. The elements of diagonal matrix 𝐶 then are the ranked cosines of the principal angles, 𝜃𝑖 , ordered from smallest to largest:
  
  $$C = \mathrm{diag}(\cos\theta_1, \cos\theta_2, \cos\theta_3)$$”
  
  For CCA (lines 945 to 952):
  
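  “CCA was performed as follows: The original latent dynamics, L<sub>A</sub> and L<sub>B</sub>, first were transformed and decomposed as L<sub>A</sub><sup>T</sup> = Q<sub>A</sub>R<sub>A</sub> and L<sub>B</sub><sup>T</sup> = Q<sub>B</sub>R<sub>B</sub>. The first m = 3 column vectors of each Q<sub>i</sub> provide an orthonormal basis for the column vectors of L<sub>i</sub><sup>T</sup> (where 𝑖 = 𝐴, 𝐵). Singular value decomposition on the inner product matrix of Q<sub>A</sub> and Q<sub>B</sub> then gives Q<sub>A</sub><sup>T</sup>Q<sub>B</sub> = USV<sup>T</sup>, and new manifold directions that maximize pairwise correlations are provided by M<sub>A</sub> = R<sub>A</sub><sup>−1</sup>U and M<sub>B</sub> = R<sub>B</sub><sup>−1</sup>V. We then projected the original latent dynamics into the new, common subspace: L<sub>A</sub><sup>T</sup>M<sub>A</sub> = Q<sub>A</sub>U and L<sub>B</sub><sup>T</sup>M<sub>B</sub> = Q<sub>B</sub>V. Pairwise correlation coefficients between the aligned latent dynamics sorted from largest to smallest then are given by the elements of the diagonal matrix S.”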
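
A QR-based CCA of this kind might look as follows in Python. This is a sketch of the standard QR-then-SVD formulation rather than the authors' implementation: the function name `cca_align`, the (dimensions x time points) input layout, and the explicit mean-centering step are all assumptions made for the example.

```python
import numpy as np

def cca_align(L_A, L_B):
    """QR-based canonical correlation analysis (hypothetical helper).

    L_A, L_B: arrays of shape (m, T), m latent dimensions by T time points.
    Returns the two aligned sets of latent dynamics, each (T, m), and the
    canonical correlations sorted from largest to smallest.
    """
    # Assumption: center each latent dimension over time so that the
    # singular values below can be read as correlation coefficients.
    A = (L_A - L_A.mean(axis=1, keepdims=True)).T  # (T, m)
    B = (L_B - L_B.mean(axis=1, keepdims=True)).T  # (T, m)

    # Reduced QR: Q has orthonormal columns spanning the columns of A (or B).
    Q_A, R_A = np.linalg.qr(A)
    Q_B, R_B = np.linalg.qr(B)

    # SVD of the inner product matrix of the two orthonormal bases.
    U, S, Vt = np.linalg.svd(Q_A.T @ Q_B)

    # New directions that maximize pairwise correlations.
    M_A = np.linalg.solve(R_A, U)      # equals inv(R_A) @ U
    M_B = np.linalg.solve(R_B, Vt.T)   # equals inv(R_B) @ V

    # Project the original latent dynamics into the common subspace.
    return A @ M_A, B @ M_B, S
```

Because the aligned columns are unit-norm and zero-mean in this formulation, the correlation between the k-th aligned pair equals the k-th singular value.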
  
  Moreover, it was not immediately clear why the authors did not consider a (relatively) straightforward metric to quantify the progressive shift of the instantaneous subspaces, such as computing the angle between consecutive subspaces, rather than choosing a (in my opinion) more cumbersome metric based on classification of trajectory segments representing different movements.
  
  Point taken. We now have calculated the principal angles as a function of time and present them as a new section of the Results including new figure 4 (lines 237 to 293).
  
  “Instantaneous subspaces shift progressively during both execution and observation
  
  We identified an instantaneous subspace at each one millisecond time step of RGM trials. At each time step, we applied PCA to the 4 instantaneous neural states (i.e. the 4 points on the neural trajectories representing trials involving the 4 different objects each averaged across 20 trials per object, totaling 80 trials), yielding a 3-dimensional subspace at that time (see Methods). Note that because these 3-dimensional subspaces are essentially instantaneous, they capture the condition-dependent variation in neural states, but not the common, condition-independent variation. To examine the temporal progression of these instantaneous subspaces, we then calculated the principal angles between each 80-trial instantaneous subspace and the instantaneous subspaces averaged across all trials at four behavioral time points that could be readily defined across trials, sessions, and monkeys: the onset of the instruction (I), the go cue (G), the movement onset (M), and the beginning of the final hold (H). This process was repeated 10 times with replacement to assess the variability of the principal angles. The closer the principal angles are to 0°, the closer the two subspaces are to being identical; the closer to 90°, the closer the two subspaces are to being orthogonal.
  
  Figure 4A-D illustrate the temporal progression of the first principal angle of the mirror neuron population in the three sessions (red, green, and blue) from monkey R during execution trials. As illustrated in Figure 4 – figure supplement 1 (see also the related Methods), in each session all three principal angles, each of which could range from 0° to 90°, tended to follow a similar time course. In the Results we therefore illustrate only the first (i.e. smallest) principal angle. Solid traces represent the mean across 10-fold cross validation using the 80-trial subsets of all the available trials; shading indicates ±1 standard deviation. As would be expected, the instantaneous subspace using 80 trials approaches the subspace using all trials at each of the four selected times—I, G, M, and H—indicated by the relatively narrow trough dipping toward 0°. Of greater interest are the slower changes in the first principal angle in between these four time points. Figure 4A shows that after instruction onset (I) the instantaneous subspace shifted quickly away from the subspace at time I, indicated by a rapid increase in principal angle to levels not much lower than what might be expected by chance alone (horizontal dashed line). In contrast, throughout the remainder of the instruction and delay epochs (from I to G), Figure 4B and C show that the 80-trial instantaneous subspace shifted gradually and concurrently, not sequentially, toward the all-trial subspaces that would be reached at the end of the delay period (G) and then at the onset of movement (M), indicated by the progressive decreases in principal angle. As shown by Figure 4D, shifting toward the H subspace did not begin until the movement onset (M). 
To summarize, these changes in principal angles indicate that after shifting briefly toward the subspace present at the time the instruction appeared (I), the instantaneous subspace shifted progressively throughout the instruction and delay epochs toward the subspace that would be reached at the time of the go cue (G), then further toward that at the time of movement onset (M), and only thereafter shifted toward the instantaneous subspace that would be present at the time of the hold (H).
  
  Figure 4E-H show the progression of the first principal angle of the mirror neuron population during observation trials. Overall, the temporal progression of the MN instantaneous subspace during observation was similar to that found during execution, particularly around times I and H. The decrease in principal angle relative to the G and M instantaneous subspaces during the delay epoch was less pronounced during observation than during execution. Nevertheless, these findings support the hypothesis that the condition-dependent subspace of PM MNs shifts progressively over the time course of RGM trials during both execution and observation, as illustrated schematically in Figure 1A.
  
  We also examined the temporal progression of the instantaneous subspace of AE neurons. As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3). During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D). After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset. As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”
  
  The related Methods are now described in subsection “Subspace Comparisons—Principal Angles”
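
For readers who want to reproduce this kind of comparison, principal angles between two subspaces can be obtained from a single SVD. The sketch below is illustrative only: the function name `principal_angles_deg` is hypothetical, and it assumes each subspace is supplied as a matrix with orthonormal columns (as PCA loadings are).

```python
import numpy as np

def principal_angles_deg(W_a, W_b):
    """Principal angles, in degrees, between two subspaces (hypothetical helper).

    W_a, W_b: arrays of shape (N, k) whose columns are orthonormal bases,
    e.g. the three PCA loading vectors of two instantaneous subspaces.
    """
    # The singular values of the k x k inner product matrix are the cosines
    # of the principal angles, returned in descending order.
    cosines = np.linalg.svd(W_a.T @ W_b, compute_uv=False)
    # Guard against rounding slightly outside [-1, 1] before arccos.
    cosines = np.clip(cosines, -1.0, 1.0)
    # Descending cosines give ascending angles: smallest angle first.
    return np.degrees(np.arccos(cosines))
```

Identical subspaces give angles near 0° and mutually orthogonal subspaces give 90°, matching the interpretation stated in the quoted Results. SciPy's `scipy.linalg.subspace_angles` offers a numerically more careful alternative; it returns angles in radians, largest first.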
  
  Specific comments:
  
  In the methods, it is stated that instantaneous subspaces are found with 3 PCs. Why does it say 2 here?
  
  We now have clarified. (lines 295 to 310):
  
  “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity. The neural trajectories during trials involving different objects separated increasingly as trials progressed in time. To illustrate this increasing separation, we clipped 100 ms segments of high-dimensional MN population trial-averaged trajectories beginning at times I, G, M, and H, for trials involving each of the four objects. We then projected the set of four object-specific trajectory segments clipped at each time into each of the four instantaneous 3D subspaces at times I, G, M, and H. This process was repeated separately for execution trials and for observation trials.
  
  For visualization, we projected these trial-averaged trajectory segments from an example session into the PC1 vs PC2 planes (which consistently captured > 70% of the variance) of the I, G, M, or H instantaneous 3D subspaces. In Figure 5, the trajectory segments for each of the four objects (sphere – purple, button – cyan, coaxial cylinder – magenta, perpendicular cylinder – yellow) sampled at different times (rows) have been projected into each of the four instantaneous subspaces defined at different times (columns). Rather than appearing knotted as in Figure 3, these short trajectory segments are distinct when projected into each instantaneous subspace.”
  
  And in the legend for Figure 5 we now clarify that:
  
  “Each set of these four segments then was projected into the PC1 vs PC2 plane of the instantaneous 3D subspace present at four different times (columns: I, G, M, H).”
  
  Another doubt on how instantaneous subspaces are computed: in the methods you state that you apply PCA on trial-averaged activity at each 50ms time step. From the next sentence, I gather that you apply PCA on an Nx4 data matrix (N being the number of neurons, and 4 being the trial-averaged activity of the four objects) every 50 ms. Is this right? It would help to explicitly specify the dimensions of the data matrix that goes into PCA computation.
  
  We apologize for this confusion. Although the LSTM decoding was performed in 50 ms time steps, the instantaneous subspaces were calculated at 1 ms intervals. In the Methods we now have clarified (lines 849 to 859):
  
  “Instantaneous subspace identification
  
  Instantaneous neural subspaces were identified at 1 ms intervals. At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects— sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step. PCA then was performed on these four points. Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace. Each instantaneous 3D subspace can be considered a filter described by a matrix, W, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, W_i, forming a time series of filters (Figure 1B).”
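
The identification step described in this passage amounts to PCA on four points in N-dimensional space. A minimal sketch follows; the function name `instantaneous_subspace` and the (4 x N) input layout are assumptions, and PCA is implemented here through an SVD of the mean-centered data rather than through any particular toolbox.

```python
import numpy as np

def instantaneous_subspace(mean_rates):
    """PCA on four condition means at one time step (hypothetical helper).

    mean_rates: array of shape (4, N), the trial-averaged firing rates of N
    neurons for each of the four objects at a single time step.
    Returns W, an (N, 3) matrix with orthonormal columns, and the singular values.
    """
    # Centering removes the mean across the four objects at this instant,
    # i.e. the activity shared by all conditions at that time step.
    centered = mean_rates - mean_rates.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    # Four centered points span at most three dimensions.
    W = Vt[:3].T
    return W, singular_values

# Usage: project a (T, N) block of neural activity into the 3D subspace.
# low_dim = activity @ W
```

The fourth singular value is zero up to rounding error, which is the numerical counterpart of the statement that three dimensions capture all the variance of four points.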
  
  It would help to include some equations in the methods section related to the LSTM decoding. Just to make sure I understood correctly: after having identified the instantaneous subspaces (every 50 ms), you projected the Instruction, Go, Movement, and Holding segments from individual trials (each containing 100 samples, since they are sampled from a 100ms window) onto each instantaneous subspace. So you have four trajectories for each subspace. In the methods, it is stated that a single LSTM classifier is trained for each subspace. Do you also have a separate classifier for each trajectory segment? What is used as input to the classifier? Each trajectory segment should be a 100x3 matrix once projected in an instantaneous subspace. Is that what (each of) the LSTMs take as input? And lastly, what is the LSTM trained to predict exactly? Just a label indicating the type of object that was manipulated in that trial? I apologize if I overlooked any detail, but I believe a clearer explanation of the LSTM, preferably with mathematical formulas, would greatly help readers understand this section.
  
  LSTM decoding is not readily described with a set of equations. However, we have expanded our description to provide the information requested (lines 910 to 937):
  
  “Decodable information—LSTM
  
  As illustrated schematically in Figure 1B, the same segment of high-dimensional neural activity projected into different instantaneous subspaces can generate low-dimensional trajectories of varying separation. The degree of separation among the projected trajectory segments will depend, not only on their separation at the time when the segments were clipped, but also on the similarity of the subspaces into which the trajectory segments are projected. To quantify the combined effects of trajectory separation and projection into different subspaces, we projected high-dimensional neural trajectory segments (each including 100 points at 1 ms intervals) from successful trials involving each of the four different target objects into time series of 3-dimensional instantaneous subspaces at 50 ms intervals. In each of these instantaneous subspaces, the neural trajectory segment from each trial thus became a 100 point x 3 dimensional matrix. For each instantaneous subspace in the time series, we then trained a separate long short-term memory (LSTM, (Hochreiter and Schmidhuber, 1997)) classifier to attribute each of the neural trajectories from individual trials to one of the four target object labels: sphere, button, coaxial cylinder, or perpendicular cylinder. Using MATLAB’s Deep Learning Toolbox, each LSTM classifier had 3 inputs (instantaneous subspace dimensions), 20 hidden units in the bidirectional LSTM layer, and a softmax layer preceding the classification layer which had 4 output classes (target objects). The total number of successful trials available in each session for each object is given in Table 1. To avoid bias based on the total number of successful trials, we used the minimum number of successful trials across the four objects in each session, selecting that number from the total available randomly with replacement. 
Each LSTM classifier was trained with MATLAB’s adaptive moment estimation (Adam) optimizer on 40% of the selected trials, and the remaining 60% were decoded by the trained classifier. The success of this decoding was used as an estimate of classification accuracy from 0 (no correct classifications) to 1 (100% correct classifications). This process was repeated 10 times and the mean ± standard deviation across the 10 folds was reported as the classification accuracy at that time. Classification accuracy of trials projected into each instantaneous subspace at 50 ms intervals was plotted as a function of trial time.”
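
The trial-balancing and train/test split described above could be sketched as follows. This is not the authors' MATLAB code, and it covers only the data selection, not the LSTM itself. The function name `balanced_split` is hypothetical, the split is stratified by object as an assumption, and trials are drawn without replacement within one repetition so that no trial lands in both sets; whether this matches the authors' "with replacement" is also an assumption.

```python
import numpy as np

def balanced_split(labels, train_fraction=0.4, rng=None):
    """Select equal numbers of trials per object and split them (hypothetical helper).

    labels: 1D array with the object label of each successful trial.
    Returns index arrays (train_idx, test_idx) into `labels`.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Use the minimum trial count across objects to avoid bias.
    n_min = min(int(np.sum(labels == c)) for c in classes)
    n_train = int(round(train_fraction * n_min))
    train_idx, test_idx = [], []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # Assumption: draw without replacement within a single repetition.
        chosen = rng.choice(idx, size=n_min, replace=False)
        train_idx.extend(chosen[:n_train])
        test_idx.extend(chosen[n_train:])
    return np.array(train_idx), np.array(test_idx)
```

Calling the function 10 times with fresh random draws, and training and scoring one classifier per call, would give the mean and standard deviation of classification accuracy reported in the text.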
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Here are some more specific comments.
  
  Abstract. Line 41. "same action" is not justified: there is plenty of evidence showing that the action does not need to be the same (or even to be an action at all). Rephrasing or substituting with "similar" is necessary, especially in the light of the subsequent sentence (which is totally correct).
  
  Thank you for pointing this out. As recommended, we have changed “same” to “similar” (lines 40 to 41):
  
  “Many neurons in the premotor cortex show firing rate modulation whether the subject performs an action or observes another individual performing a similar action.”
  
  Introduction. A relevant, missing reference in the otherwise exhaustive introduction is Albertini et al. 2021 J Neurophysiol, showing that neural dynamics and similarities between biological and nonbiological movements in premotor areas are greater than those between the same executed and observed movements.
  
  Thank you for pointing out this important finding. After revision, we felt it was now cited most appropriately in the revised Discussion as follows (lines 730 to 736):
  
  “Alternatively, given that observation of another individual can be considered a form of social interaction, PM MN population activity during action observation, rather than representing movements made by another individual similar to one’s own movements, instead may represent different movements one might execute oneself in response to those made by another individual (Ninomiya et al., 2020; Bonini et al., 2022; Ferrucci et al., 2022; Pomper et al., 2023). This possibility is consistent with the finding that the neural dynamics of PM MN populations are more similar during observation of biological versus non-biological movements than during execution versus observation (Albertini et al., 2021).”
  
  In Line 85, the sentence about Papadourakis and Raos 2019 has to be generalized to PMv, as they show that the proportion of congruent MNs is at chance in both PMd and PMv.
  
  Point taken. We have rephrased this sentence as follows (lines 88 to 89):
  
  “And in both PMv and PMd, the proportion of congruent neurons may not be different from that expected by chance alone (Papadourakis and Raos, 2019).”
  
  Lines 122-132. The initial sentence was unclear to me at first glance. I was wondering how subspaces could be "at other times over the course of the trial" if they are instantaneous. I could imagine that the subspaces referred to corresponding behavioral intervals of execution and observation conditions (and this may be what they will later call "condition dependent" activity), but nevertheless, they could hardly be understood as "instantaneous". I grasped the author's idea only when reading the results, with the statement "no-time dependent variance is captured". The idea is to take a static snapshot of the evolution of population activity at each checkpoint (i.e. I, G, M, and H): I suggest clarifying this point immediately in the introduction to improve readability.
  
  We have clarified this point by adding two paragraphs to the Introduction first defining condition independent versus condition-dependent variance and then explaining the use of instantaneous subspaces (lines 125 to 153):
  
  “A relevant but often overlooked aspect of such dynamics in neuron populations active during both execution and observation has to do with the distinction between condition-independent and condition-dependent variation in neuronal activity (Kaufman et al., 2016; Rouse and Schieber, 2018). The variance in neural activity averaged across all the conditions in a given task context is condition-independent. For example, in an 8-direction center-out reaching task, averaging a unit’s firing rate as a function of time across all 8 directions may show an initially low firing rate that increases prior to movement onset, peaks during the movement, and then declines during the final hold, irrespective of the movement direction. Subtracting this condition-independent activity from the unit’s firing rate during each trial gives the remaining variance, and averaging separately across trials in each of the 8 directions then averages out noise variance, leaving the condition-dependent variance that represents the unit’s modulation among the 8 directions (conditions). Alternatively, condition-independent, condition-dependent, and noise variance can be partitioned through demixed principal component analysis (Kobak et al., 2016; Gallego et al., 2018). The extent to which neural dynamics occur in a subspace shared by execution and observation versus subspaces unique to execution or observation may differ for the condition-independent versus condition-dependent partitions of neural activity. Here, we tested the hypothesis that the condition-dependent activity of PM mirror neuron populations progresses through distinct subspaces during execution versus observation, which would indicate distinct patterns of co-modulation amongst mirror neurons during execution versus observation.
  
  Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach. Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements. Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time-dependent) variance.”
  
  Results.
  
  Regarding the execution-observation alignment, as explained in my initial comment, it does not sound convincing. Applying a CCA to align EXE and OBS activities (which the authors had just shown being essentially not aligned), even separately for each epoch segment (line 396), seems to be a trick to show that they nonetheless share some similarities. Couldn't this be applied to any pairs of differently encoded conditions to create some sort of artificial link between them? Is the similarity in the neural data or rather in the method used to realign them?
  
  CCA would not align arbitrary sets of neural data. The similarity is in the data, not in the method. For example, in an 8-direction center-out task, the neural representation of movement to the 45° target is between the neural representations of the 0° and the 90° targets. If the same is true in a second data set, then CCA will give high correlation coefficients. But if in the second data set the neural representation of the 45° target is between the 135° and 180° targets, CCA will give low correlation coefficients.
  
  In the end, what does this tell us about the brain?
  
  In the Introduction we now clarify that (lines 166 to 170):
  
  “Such alignment would indicate that the relationships among the trajectory segments in the execution subspace are similar to the relationships among the trajectory segments in the observation subspace, indicating a corresponding structure in the latent dynamic representations of execution and observation movements by the same PM MN population.”
  
  And in the Results (lines 449 to 455):
  
  “For example, the trajectories of PMd+M1 neuron populations recorded from two different monkeys during center-out reaching movements could be aligned well (Safaie et al., 2023). CCA showed, for example, that in both brains the neural trajectory for the movement to the target at 0° was closer to the trajectory for movement to the target at 45° than to the trajectory for the movement to the target at 180°. Relationships among these latent dynamic representations of the eight movements thus were similar even though the neural populations were recorded from two different monkeys.”
  
  In relation to Figure 8 (lines 461 to 467):
  
  “But when both sets of trajectory segments are projected into another common subspace identified with CCA, as shown in Figure 8B, a similar relationship among the neural representations of the four movements during execution and observation is revealed. In both behavioral contexts the neural representation of movements involving the sphere (purple) is now closest to the representation of movements involving the coaxial cylinder (magenta) and farthest from that of movements involving the button (cyan). The two sets of trajectory segments are more or less “aligned.””
  
  And in the Discussion (lines 665 to 674):
  
  “Corresponding neural representations of action execution and observation during task epochs with higher neural firing rates have been described previously in PMd MNs and in PMv MNs using representational similarity analysis (RSA) (Papadourakis and Raos, 2019). And during force production in eight different directions, neural trajectories of PMd neurons draw similar “clocks” during execution, cooperative execution, and passive observation (Pezzulo et al., 2022). Likewise in the present study, despite execution and observation trajectories progressing through largely distinct subspaces, in all three monkeys execution and observation trajectory segments showed some degree of alignment, particularly the Movement and Hold segments (Figure 12A), indicating similar relationships among the latent dynamic representations of the four RGM movements during execution and observation.”
  
  Concerning the discussion, I would like to reconsider it after having seen the authors' response to the comments above and to my general concern about the relevance of the findings from the neurophysiological point of view.
  
  Certainly, please do.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Here are a few issues that I want to bring to the authors' attention (in no particular order):
  
  • I am not clear on what is meant by "condition-dependent". Is the condition exec vs obs, or the object types?
  
  In the Introduction, we now clarify (lines 125 to 144):
  
  “A relevant but often overlooked aspect of such dynamics in neuron populations active during both execution and observation has to do with the distinction between condition-independent and condition-dependent variation in neuronal activity (Kaufman et al., 2016; Rouse and Schieber, 2018). The variance in neural activity averaged across all the conditions in a given task context is condition-independent. For example, in an 8-direction center-out reaching task, averaging a unit’s firing rate as a function of time across all 8 directions may show an initially low firing rate that increases prior to movement onset, peaks during the movement, and then declines during the final hold, irrespective of the movement direction. Subtracting this condition-independent activity from the unit’s firing rate during each trial gives the remaining variance, and averaging separately across trials in each of the 8 directions then averages out noise variance, leaving the condition-dependent variance that represents the unit’s modulation among the 8 directions (conditions). Alternatively, condition-independent, condition-dependent, and noise variance can be partitioned through demixed principal component analysis (Kobak et al., 2016; Gallego et al., 2018). The extent to which neural dynamics occur in a subspace shared by execution and observation versus subspaces unique to execution or observation may differ for the condition-independent versus condition-dependent partitions of neural activity. Here, we tested the hypothesis that the condition-dependent activity of PM mirror neuron populations progresses through distinct subspaces during execution versus observation, which would indicate distinct patterns of co-modulation amongst mirror neurons during execution versus observation.”
  
  And in the Results, we have added a new Figure 3 to illustrate condition-independent versus condition-dependent activity using an example from the present data sets (lines 208 to 236):
  
  “Condition-dependent versus condition-independent neural activity in PM MNs
  
  Whereas a large fraction of condition-dependent neural variance during reaching movements without grasping can be captured in a two-dimensional subspace (Churchland et al., 2012; Ames et al., 2014), condition-dependent activity in movements that involve grasping is more complex (Suresh et al., 2020). In part, this may reflect the greater complexity of controlling the 24 degrees of freedom in the hand and wrist as compared to the 4 degrees of freedom in the elbow and shoulder (Sobinov and Bensmaia, 2021). Figure 3 illustrates this complexity in a PM MN population during the present RGM movements. Here, PCA was performed on the activity of a PM MN population across the entire time course of execution trials involving all four objects. The colored traces in Figure 3A show neural trajectories averaged separately across trials involving each of the four objects and then projected into the PC1 vs PC2 plane of the total neural space. Most of the variance in these four trajectories consists of a shared rotational component. The black trajectory, obtained by averaging trajectories from trials involving all four objects together, represents this condition-independent (i.e. independent of the object involved) activity. The condition-dependent (i.e. dependent on which object was involved) variation in activity is reflected by the variation in the colored trajectories around the black trajectory. The condition-dependent portions can be isolated by subtracting the black trajectory from each of the colored trajectories. The resulting four condition-dependent trajectories have been projected into the PC1 vs PC2 plane of their own common subspace in Figure 3B. Rather than exhibiting a simple rotational motif, these trajectories appear knotted. To better understand how these complex, condition-dependent trajectories progress over the time course of RGM trials, we chose to examine time series of instantaneous subspaces.”
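
The subtraction described in this passage (removing the across-object average from each object's trajectory) can be written compactly. The sketch below is illustrative; the function name `partition_condition_activity` and the (conditions x time points x neurons) array layout are assumptions.

```python
import numpy as np

def partition_condition_activity(trial_avg):
    """Split trial-averaged activity into two parts (hypothetical helper).

    trial_avg: array of shape (n_conditions, T, N), firing rates averaged
    across trials separately for each condition (here, each object).
    """
    trial_avg = np.asarray(trial_avg, dtype=float)
    # Average across conditions: the condition-independent time course
    # (the "black trajectory" in the description above).
    condition_independent = trial_avg.mean(axis=0)            # (T, N)
    # What remains for each condition is its condition-dependent part.
    condition_dependent = trial_avg - condition_independent   # broadcast over conditions
    return condition_independent, condition_dependent
```

By construction the condition-dependent parts average to zero across conditions at every time point, and adding the condition-independent part back recovers the original trial averages.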
  
  While there is an emphasis on the higher complexity of manipulating objects compared to just reaching movements in the Abstract, the majority of the analysis relates to the instruction, movement initiation, and grasp, and there is no specific analyses looking at manipulation and how those presumably more complex dynamics compare to the reaching dynamics, and how they differ from reaching in the mirror neurons.
  
  We have clarified that (lines 178 to 187):
  
  “Because we chose to study relatively naturalistic movements, the reach, grasp, and manipulation components were not performed separately, but rather in a continuous fluid motion during the movement epoch of the task sequence (Figure 2B). In previous studies involving a version of this task without separate instruction and delay epochs, we have shown that joint kinematics, EMG activity, and neuron activity in the primary motor cortex, all vary throughout the movement epoch in relation to both reach location and object grasped, with location predominating early in the movement epoch and object predominating later (Rouse and Schieber, 2015, 2016a, b). The present task, however, did not dissociate the reach, the hand shape used to grasp the object, and the manipulation performed on the object.”
  
  • The analysis in Fig3C,D is interesting, however, in my opinion, requires control. For instance, what would these values look like if you projected the segments to a subspace defined by the activity during the entire length of the trial, or if you projected the activity during intertrials, just to get a sense of how meaningful these values are?
  
  This material is now presented in Figure 5 – figure supplement 1. In the legend to this figure supplement, we have clarified that (lines 327 to 328):
  
  “CS values, which we use only to characterize the phenomenon of trajectory separation,….”
  
  • MN is used (#85) before definition (#91). Similar for RGM, I believe.
  
  Thanks for catching this problem. We have now defined these abbreviations at first use as follows:
  
  In lines 89 to 92:
  
  “Though many authors apply the term mirror neurons strictly to highly congruent neurons, here we will refer to all neurons modulated during both contexts—execution and observation—as mirror neurons (MNs).”
  
  And in lines 148 to 150:
  
  “We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements.”
  
  • I believe in the Intro when presenting the three hypotheses, there is a First, and a Third, but no Second.
  
  We have revised this part of the Introduction without numbering our hypotheses as follows (lines 145 to 173):
  
  “Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach. Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements. Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time-dependent) variance.
  
  We then tested the hypothesis that the condition-dependent subspace shifts progressively over the time course of behavioral trials (Figure 1A) by calculating the principal angles between four selected instantaneous subspaces that occurred at times easily defined in each behavioral trial—instruction onset (I), go cue (G), movement onset (M), and the beginning of the final hold (H)—and every other instantaneous subspace in the time series. Initial analyses showed that condition-dependent neural trajectories for the four RGM movements tended to separate increasingly over the course of behavioral trials. We therefore additionally examined the combined effects of i) the progressively shifting subspaces and ii) the increasing trajectory separation, by decoding neural trajectory segments sampled for 100 msec after times I, G, M, and H and projected into the time series of instantaneous subspaces (Figure 1B).
  
  Finally, we used canonical correlation to ask whether the prevalent patterns of mirror neuron co-modulation showed similar relationships among the four RGM movements during execution and observation (Figure 1C). Such alignment would indicate that the relationships among the trajectory segments in the execution subspace are similar to the relationships among the trajectory segments in the observation subspace, indicating a corresponding structure in the latent dynamic representations of execution and observation movements by the same PM MN population. And finally, because we previously have found that during action execution the activity of PM mirror neurons tends to lead that of non-mirror neurons which are active only during action execution (AE neurons) (Mazurek and Schieber, 2019), we performed parallel analyses of the instantaneous state space of PM AE neurons.”
  
  • The use of the term 'instantaneous subspaces' in the abstract confused me initially, as I wasn't sure what it meant. It might be a good idea to define or rephrase it.
  
  In the Abstract we now state (lines 51 to 52):
  
  “Rather than following neural trajectories in subspaces that contain their entire time course, we identified time series of instantaneous subspaces …”
  
  And in the Introduction, we have clarified (lines 145 to 153):
  
  “Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach. Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements. Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time-dependent) variance.”
  
  And in the Methods (lines 849 to 859):
  
  “Instantaneous subspace identification
  
  Instantaneous neural subspaces were identified at 1 ms intervals. At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects— sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step. PCA then was performed on these four points. Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace. Each instantaneous 3D subspace can be considered a filter described by a matrix, 𝑊, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, 𝑊𝑖, forming a time series of filters (Figure 1B).”
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) Page 4, lines 127-131. In the introduction, it was not immediately clear to me what you meant by 'separation' and 'decoding' of the projected neural activity. You do mention that you are separating/decoding trajectory segments representing different movements at the end of this paragraph, but at this point of the paper it was not very clear to me what those different movements were (I only understood that after reading the results section). I suggest briefly expanding on these concepts here.
  
  To clarify these points in the Introduction, we have expanded exposition of these concepts (lines 145 to 163):
  
  “Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach. Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements. Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time-dependent) variance.
  
  We then tested the hypothesis that the condition-dependent subspace shifts progressively over the time course of behavioral trials (Figure 1A) by calculating the principal angles between four selected instantaneous subspaces that occurred at times easily defined in each behavioral trial—instruction onset (I), go cue (G), movement onset (M), and the beginning of the final hold (H)—and every other instantaneous subspace in the time series. Initial analyses showed that condition-dependent neural trajectories for the four RGM movements tended to separate increasingly over the course of behavioral trials. We therefore additionally examined the combined effects of i) the progressively shifting subspaces and ii) the increasing trajectory separation, by decoding neural trajectory segments sampled for 100 msec after times I, G, M, and H and projected into the time series of instantaneous subspaces (Figure 1B).”
  
  (2) Page 6, line 175. In the methods, it is stated that instantaneous subspaces are found with 3 PCs. Why does it say 2 here?
  
  Thank you for noticing this discrepancy. In the Methods, we have clarified that the instantaneous subspaces are 3-dimensional (see our reply to the next comment), but in Figure 5 (previously Figure 3), for purposes of visualization, we are projecting trajectory segments into the PC1-PC2 plane (lines 295 to 308):
  
  “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity. The neural trajectories during trials involving different objects separated increasingly as trials progressed in time. To illustrate this increasing separation, we clipped 100 ms segments of high-dimensional MN population trial-averaged trajectories beginning at times I, G, M, and H, for trials involving each of the four objects. We then projected the set of four object-specific trajectory segments clipped at each time into each of the four instantaneous 3D subspaces at times I, G, M, and H. This process was repeated separately for execution trials and for observation trials.
  
  For visualization, we projected these trial-averaged trajectory segments from an example session into the PC1 vs PC2 planes (which consistently captured > 70% of the variance) of the I, G, M, or H instantaneous 3D subspaces. In Figure 5, the trajectory segments for each of the four objects (sphere – purple, button – cyan, coaxial cylinder – magenta, perpendicular cylinder – yellow) sampled at different times (rows) have been projected into each of the four instantaneous subspaces defined at different times (columns).”
  
  And in the legend for Figure 5 we now clarify that:
  
  “Each set of these four segments then was projected into the PC1 vs PC2 plane of the instantaneous 3D subspace present at four different times (columns: I, G, M, H).”
  
  Another doubt on how instantaneous subspaces are computed: in the methods you state that you apply PCA on trial-averaged activity at each 50ms time step. From the next sentence, I gather that you apply PCA on an Nx4 data matrix (N being the number of neurons, and 4 being the trial-averaged activity of the four objects) every 50 ms. Is this right? It would help to explicitly specify the dimensions of the data matrix that goes into PCA computation.
  
  Thank you for catching an error: The instantaneous subspaces were computed at 1 ms intervals. (It is the LSTM decoding that was done in 50 ms time steps). We have clarified how the instantaneous subspaces were computed in the Methods (lines 849 to 859):
  
  “Instantaneous subspace identification
  
  Instantaneous neural subspaces were identified at 1 ms intervals. At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects— sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step. PCA then was performed on these four points. Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace. Each instantaneous 3D subspace can be considered a filter described by a matrix, 𝑊, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, 𝑊𝑖, forming a time series of filters (Figure 1B).”
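The data-matrix question above can be made concrete with a minimal sketch. It assumes the four trial-averaged firing-rate vectors at one time step are stacked as a 4 x N array (conditions as rows, neurons as columns) and that PCA is done by SVD of the mean-centered array. The function name `instantaneous_subspace` and the random example data are hypothetical, not taken from the authors' code.

```python
import numpy as np

def instantaneous_subspace(cond_means):
    """Sketch of PCA on four condition means at a single time step.

    cond_means: array of shape (4, N), one trial-averaged firing-rate
    vector per object. Returns W with shape (N, 3), whose columns are
    orthonormal principal axes, and the fraction of variance captured
    by each of the three axes.
    """
    # Center across the four conditions; the centered rows sum to zero,
    # so the matrix has rank at most 3.
    centered = cond_means - cond_means.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    W = vt[:3].T
    variance = s ** 2
    return W, variance[:3] / variance.sum()

# Hypothetical example: 50 neurons, random condition means.
rng = np.random.default_rng(0)
cond_means = rng.normal(size=(4, 50))
W, explained = instantaneous_subspace(cond_means)
```

Because four centered points span at most three dimensions, the three returned fractions sum to 1, which is the sense in which three components "capture all the variance of four points".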
  
  (3) Page 7, line 210-212. I am not sure if I missed it in the discussion, but have you speculated on why the greatest separation in observation trials was observed during the holding phase while in execution trials during the movement phase?
  
  This was a consistent finding, and we therefore point it out as a difference between execution and observation. Of course, this reflects greater condition-dependent variance in the PM MN population in the movement epoch than in the hold epoch during execution, whereas the reverse is true during observation. We have no clear speculation as to why this occurs, however.
  
  (4) Figure 3. Add a legend with color scheme for each object in panels A and B. Also, please specify what metric is represented by the colorbar of panels C, D, E, F (write it down next to the colorbar itself and not just in the caption).
  
  This is now Figure 5. We have added a color legend for A and B. Panels C, D, E, and F, now have been moved to Figure 5 – figure supplement 1, where we have indicated that the colorbar represents cumulative separation.
  
  (5) Page 9, line 228. I found the description of this decoding analysis a bit confusing initially (and perhaps still do), this should be clarified.
  
  We have clarified our decoding analysis in the Methods (lines 910 to 937):
  
  “Decodable information—LSTM
  
  As illustrated schematically in Figure 1B, the same segment of high-dimensional neural activity projected into different instantaneous subspaces can generate low-dimensional trajectories of varying separation. The degree of separation among the projected trajectory segments will depend, not only on their separation at the time when the segments were clipped, but also on the similarity of the subspaces into which the trajectory segments are projected. To quantify the combined effects of trajectory separation and projection into different subspaces, we projected high-dimensional neural trajectory segments (each including 100 points at 1 ms intervals) from successful trials involving each of the four different target objects into time series of 3-dimensional instantaneous subspaces at 50 ms intervals. In each of these instantaneous subspaces, the neural trajectory segment from each trial thus became a 100 point x 3 dimensional matrix. For each instantaneous subspace in the time series, we then trained a separate long short-term memory (LSTM, (Hochreiter and Schmidhuber, 1997)) classifier to attribute each of the neural trajectories from individual trials to one of the four target object labels: sphere, button, coaxial cylinder, or perpendicular cylinder. Using MATLAB’s Deep Learning Toolbox, each LSTM classifier had 3 inputs (instantaneous subspace dimensions), 20 hidden units in the bidirectional LSTM layer, and a softmax layer preceding the classification layer which had 4 output classes (target objects). The total number of successful trials available in each session for each object is given in Table 1. To avoid bias based on the total number of successful trials, we used the minimum number of successful trials across the four objects in each session, selecting that number from the total available randomly with replacement. 
Each LSTM classifier was trained with MATLAB’s adaptive moment estimation (Adam) optimizer on 40% of the selected trials, and the remaining 60% were decoded by the trained classifier. The success of this decoding was used as an estimate of classification accuracy from 0 (no correct classifications) to 1 (100% correct classifications). This process was repeated 10 times and the mean ± standard deviation across the 10 folds was reported as the classification accuracy at that time. Classification accuracy of trials projected into each instantaneous subspace at 50 ms intervals was plotted as a function of trial time.”
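The data preparation described in this passage can be sketched as follows; the LSTM itself (built by the authors in MATLAB) is not reproduced. The sketch assumes each single-trial segment is a 100 x N array and each instantaneous subspace is an N x 3 matrix W. The helper names, the trial counts, and the absence of mean-centering before projection are all assumptions made for illustration.

```python
import numpy as np

def project_segment(segment, W):
    """Project a (100, N) single-trial segment into a 3D subspace.

    W is an (N, 3) matrix with orthonormal columns; the result is the
    100 point x 3 dimensional matrix that would be fed to a classifier.
    """
    return segment @ W

def balanced_trial_indices(trial_counts, rng):
    """Draw the same number of trials per object, with replacement.

    trial_counts maps object name to the number of successful trials.
    The number drawn is the minimum count across objects.
    """
    n_min = min(trial_counts.values())
    return {obj: rng.choice(n, size=n_min, replace=True)
            for obj, n in trial_counts.items()}

# Hypothetical example values, not taken from Table 1.
rng = np.random.default_rng(1)
n_neurons = 40
W, _ = np.linalg.qr(rng.normal(size=(n_neurons, 3)))  # orthonormal columns
segment = rng.normal(size=(100, n_neurons))
projected = project_segment(segment, W)
counts = {"sphere": 62, "button": 58, "coaxial": 60, "perpendicular": 55}
picked = balanced_trial_indices(counts, rng)
```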
  
  (6) Page 9, line 268. This might be trivial, but can you speculate on why the accuracy for Instruction segments had a lower peak compared to the rest of the segments? Is it because there is less 'distinct' information embedded in neural data about the type of object manipulated until you are actually reaching toward it or holding it? The latter seems straightforward, but the former not so much.
  
  Thank you for asking this question. We have added the following speculations (lines 592 to 604):
  
  “Short bursts of “signal” related discharge are known to occur in a substantial fraction of PMd neurons beginning at latencies of ~60 ms following an instructional stimulus (Weinrich et al., 1984; Cisek and Kalaska, 2004). Here we found that the instantaneous subspace shifted briefly toward the subspace present at the time of instruction onset (I), similarly during execution and observation. This brief trough in principal angle (Figure 4A) and the corresponding peak in classification accuracy (Figure 7A) in part may reflect smoothing of firing rates with a 50 ms Gaussian kernel. We speculate, however, that the early rise of this peak at the time of instruction onset also reflects the anticipatory activity often seen in PMd neurons in expectation of an instruction, which may not be entirely non-specific, but rather may position the neural population to receive one of a limited set of potential instructions (Mauritz and Wise, 1986). We attribute the relatively low amplitude of peak classification accuracy for Instruction trajectory segments to the likely possibility that only the last 40 ms of our 100 ms Instruction segments captured signal related discharge.”
  
  (7) Figure 8. Shouldn't the plots in panel A resemble those in Figure 3? Here you are projecting the hold trajectory segments into the subspace at time H, which should be the same as in Fig. 3A/B bottom right panel.
  
  The previous Figure 8 is now Figure 8 panels A and B, and the previous Figure 3 is now Figure 5. The data used in these two figures come from two different recording sessions in two different monkeys. The current Figure 8A,B uses data from monkey F, session 2, whereas Figure 5 uses data from monkey T, session 3, which we now state in the legend to each figure, respectively. Consequently, the relative arrangement of the trajectory segments in the instantaneous subspace at time H differs. The session used in Figure 8A,B, which we now show in three dimensions, better illustrates how CCA identifies a common subspace in which execution versus observation segments show alignment (Figure 8B) that was not evident in their original subspaces (Figure 8A).
  
  (8) Page 14, line 369. Are you computing CCA using only 2 components? I thought the subspaces were 3 dimensional. Why not align all three dimensions?
  
  We have expanded this analysis to use all three dimensions, as illustrated in Figure 8 above.
  
  (9) Page 14, line 407. Does this mean that instantaneous subspaces between execution and observation trials are more similar to each other during the Movement and Holding phase? Is this related to the fact that in those moments there is a smaller progressive shift of the subspaces within execution and observation trials?
  
  Our new analyses of principal angles (see our reply to your comment 11, below) show that the progressive shifting of the instantaneous subspace continues through the movement and hold epochs. We now discuss this better alignment of the Movement and Hold trajectory segments as follows (lines 656 to 664):
  
  “Given the complexity of condition-dependent neural trajectories across the entire time course of RGM trials (Figure 3B), rather than attempting to align entire neural trajectories, we applied canonical correlation to trajectory segments clipped for 100 ms following four well defined behavioral events: Instruction onset, Go cue, Movement onset, and the beginning of the final Hold. In all cases, alignment was poorest for Instruction segments, somewhat higher for Go segments, and strongest for Movement and Hold segments. This progressive increase in alignment likely reflects a progressive increase in the difference between average neuron firing rates for trials involving different objects (Figure 6) relative to the trial-by-trial variance in firing rate for a given object.”
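As a rough illustration of the alignment measure, canonical correlations between two sets of projected segments can be computed with a QR decomposition followed by an SVD. The sketch assumes the execution and observation segments are each arranged as a (samples x 3) array with matching rows and full column rank; that arrangement, the function name, and the synthetic data are assumptions, not the authors' implementation.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y.

    X and Y have shape (samples, dims) with matching rows and full
    column rank. Values near 1 mean the two data sets can be linearly
    mapped onto each other; values near 0 mean they are unrelated.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic example: 4 objects x 100 time points = 400 samples, 3 dims.
rng = np.random.default_rng(2)
execution = rng.normal(size=(400, 3))
# A permuted and rescaled copy is perfectly alignable.
A = 2.0 * np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
observation_aligned = execution @ A
observation_unrelated = rng.normal(size=(400, 3))

cc_aligned = canonical_correlations(execution, observation_aligned)
cc_unrelated = canonical_correlations(execution, observation_unrelated)
```

In this toy case the aligned pair gives correlations of 1 even though the raw coordinates differ, which mirrors the point that alignment can exist in a common subspace without being evident in the original subspaces.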
  
  (10) page 15, line 431. Typo, it should be Table 3.
  
  We have removed Table 3 which no longer applies.
  
  (11) A more general observation: did you try to compute another metric to assess the progressive shift of subspaces over time? I am thinking of something like computing the principal angles between consecutive subspaces. If it is true that the shifts happen over time, but it slows down during movement and hold, you should be able to conclude it from principal angles as well. Am I missing something? Is there any reason you went with classification accuracy instead of a metric like this?
  
  Point taken. We now have calculated the principal angles as a function of time and have presented them as a new section of the Results including new Figure 4 and Figure 4 – figure supplement 3 (lines 237 to 293).
  
  “Instantaneous subspaces shift progressively during both execution and observation
  
  We identified an instantaneous subspace at each one millisecond time step of RGM trials. At each time step, we applied PCA to the 4 instantaneous neural states (i.e. the 4 points on the neural trajectories representing trials involving the 4 different objects each averaged across 20 trials per object, totaling 80 trials), yielding a 3-dimensional subspace at that time (see Methods). Note that because these 3-dimensional subspaces are essentially instantaneous, they capture the condition-dependent variation in neural states, but not the common, condition-independent variation. To examine the temporal progression of these instantaneous subspaces, we then calculated the principal angles between each 80-trial instantaneous subspace and the instantaneous subspaces averaged across all trials at four behavioral time points that could be readily defined across trials, sessions, and monkeys: the onset of the instruction (I), the go cue (G), the movement onset (M), and the beginning of the final hold (H). This process was repeated 10 times with replacement to assess the variability of the principal angles. The closer the principal angles are to 0°, the closer the two subspaces are to being identical; the closer to 90°, the closer the two subspaces are to being orthogonal.
  
  Figure 4A-D illustrate the temporal progression of the first principal angle of the mirror neuron population in the three sessions (red, green, and blue) from monkey R during execution trials. As illustrated in Figure 4 – figure supplement 1 (see also the related Methods), in each session all three principal angles, each of which could range from 0° to 90°, tended to follow a similar time course. In the Results we therefore illustrate only the first (i.e. smallest) principal angle. Solid traces represent the mean across 10-fold cross validation using the 80-trial subsets of all the available trials; shading indicates ±1 standard deviation. As would be expected, the instantaneous subspace using 80 trials approaches the subspace using all trials at each of the four selected times—I, G, M, and H—indicated by the relatively narrow trough dipping toward 0°. Of greater interest are the slower changes in the first principal angle in between these four time points. Figure 4A shows that after instruction onset (I) the instantaneous subspace shifted quickly away from the subspace at time I, indicated by a rapid increase in principal angle to levels not much lower than what might be expected by chance alone (horizontal dashed line). In contrast, throughout the remainder of the instruction and delay epochs (from I to G), Figure 4B and C show that the 80-trial instantaneous subspace shifted gradually and concurrently, not sequentially, toward the all-trial subspaces that would be reached at the end of the delay period (G) and then at the onset of movement (M), indicated by the progressive decreases in principal angle. As shown by Figure 4D, shifting toward the H subspace did not begin until the movement onset (M). 
To summarize, these changes in principal angles indicate that after shifting briefly toward the subspace present at the time the instruction appeared (I), the instantaneous subspace shifted progressively throughout the instruction and delay epochs toward the subspace that would be reached at the time of the go cue (G), then further toward that at the time of movement onset (M), and only thereafter shifted toward the instantaneous subspace that would be present at the time of the hold (H).
  
  Figure 4E-H show the progression of the first principal angle of the mirror neuron population during observation trials. Overall, the temporal progression of the MN instantaneous subspace during observation was similar to that found during execution, particularly around times I and H. The decrease in principal angle relative to the G and M instantaneous subspaces during the delay epoch was less pronounced during observation than during execution. Nevertheless, these findings support the hypothesis that the condition-dependent subspace of PM MNs shifts progressively over the time course of RGM trials during both execution and observation, as illustrated schematically in Figure 1A.
  
  We also examined the temporal progression of the instantaneous subspace of AE neurons. As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3). During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D). After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset. As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”
  
  The related Methods are now described in the subsection “Subspace Comparisons—Principal Angles”.
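For readers unfamiliar with the measure, principal angles between two subspaces can be obtained from the singular values of the product of their orthonormal bases. The sketch below is a generic NumPy version under that standard definition; it is not the authors' code, and the function name and example bases are hypothetical. SciPy's `scipy.linalg.subspace_angles` offers a numerically more careful alternative for very small angles.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (degrees, ascending) between span(A) and span(B).

    A and B have shape (N, k) with linearly independent columns.
    0 degrees means a shared direction; 90 degrees means orthogonal.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    cosines = np.clip(cosines, -1.0, 1.0)
    return np.sort(np.degrees(np.arccos(cosines)))

# Two 3D subspaces of a 10-dimensional space.
basis = np.eye(10)
sub_a = basis[:, 0:3]
sub_b = basis[:, 3:6]           # orthogonal to sub_a
sub_a_mixed = sub_a @ np.array([[1.0, 2.0, 0.0],
                                [0.0, 1.0, 3.0],
                                [1.0, 0.0, 1.0]])  # same span as sub_a

angles_orthogonal = principal_angles_deg(sub_a, sub_b)
angles_identical = principal_angles_deg(sub_a, sub_a_mixed)
```

The first element of the returned array corresponds to the first (smallest) principal angle, the quantity plotted in Figure 4.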
  
  Is there any reason you went with classification accuracy instead of a metric like this?
  
  We now point out that (lines 295 to 297):
  
  “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity. The neural trajectories during trials involving different objects separated increasingly as trials progressed in time.”
  
  And we further clarify this as follows (lines 331 to 348):
  
  “Decodable information changes progressively during both execution and observation
  
  As RGM trials proceeded in time, the condition-dependent neural activity of the PM MN population thus changed in two ways. First, the instantaneous condition-dependent subspace shifted, indicating that the patterns of firing-rate co-modulation among neurons representing the four different RGM movements changed progressively, both during execution and during observation. Second, as firing rates generally increased, the neural trajectories representing the four RGM movements became progressively more separated, more so during execution than during observation.
  
  To evaluate the combined effects of these two progressive changes, we clipped 100 ms single-trial trajectory segments beginning at times I, G, M, or H, and projected these trajectory segments from individual trials into the instantaneous 3D subspaces at 50 ms time steps. At each of these time steps, we trained a separate LSTM decoder to classify individual trials according to which of the four objects was involved in that trial. We expected that the trajectory segments would be classified most accurately when projected into instantaneous subspaces near the time at which the trajectory segments were clipped. At other times we reasoned that classification accuracy would depend both on the similarity of the current instantaneous subspace to that found at the clip time as evaluated by the principal angle (Figure 4), and on the separation of the four trajectories at the clip time (Figure 5).”
  
URL

biorxiv.org/content/10.1101/2023.11.06.565833v4
www.biorxiv.org www.biorxiv.org

The neural dynamics of positive and negative expectations of pain

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank the reviewers for their careful and overall positive evaluation of our work and the constructive feedback! To address the main concerns, we have:
  
  – Clarified a major misunderstanding of our instructions: Participants were only informed that they would receive different stimuli of medium intensity and were thus not aware that the stimulation temperature remained constant
  
  – Implemented a new analysis to evaluate how participants rated their expectation and pain levels in the control condition
  
  – Added a paragraph in the discussion in which we argue that our paradigm is comparable to previous studies
  
  Below, we provide responses to each of the reviewers’ comments on our manuscript.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this important paper, the authors investigate the temporal dynamics of expectation of pain using a combined fMRI-EEG approach. More specifically, by modifying the expectations of higher or lower pain on a trial-to-trial basis, they report that expectations largely share the same set of activations before the administration of the painful stimulus, and that the coding of the valence of the stimulus is observed only after the nociceptive input has been presented. fMRI-informed EEG analysis suggested that the temporal sequence of information processing involved the dorsolateral prefrontal cortex (DLPFC), the anterior insula, and the anterior cingulate cortex. The strength of evidence is convincing, and the methods are solid, but a few alternative interpretations about the findings related to the control group, as well as a more in-depth discussion on the correlations between the BOLD and EEG signals would strengthen the manuscript.
  
  Thank you for your positive evaluation! In the revised version of the manuscript, we elaborated on the control condition and the BOLD-EEG correlations in more detail.
  
  Strengths:
  
  In line with open science principles, the article presents the data and the results in a complete and transparent fashion.
  
  From a theoretical standpoint, the authors make a step forward in our understanding of how expectations modulate pain by introducing a combination of spatial and temporal investigation. It is becoming increasingly clear that our appraisal of the world is dynamic, guided by previous experiences, and mapped on a combination of what we expect and what we get. New research methods, questions, and analyses are needed to capture these evolving processes.
  
  Thank you very much for these positive comments!
  
  Weaknesses:
  
  The control condition is not so straightforward. Across the manuscript it is defined as "no expectation", and in the legend of Figure 1 it is mentioned that the third state would be "no prediction". However, it is difficult to conceive that participants would not have any expectations or predictions. Indeed, in the description of the task it is mentioned that participants were instructed that they would receive stimuli during "intermediate sensitive states". The results of the pain scores and expectations might support the idea that the control condition is situated in between the placebo and nocebo conditions. However, since this control condition was not part of the initial conditioning, and participants had no reference to previous stimuli, one might expect that some ratings might have simply "regressed to the mean" for a lack of previous experience.
  
  General considerations and reflections:
  
  Inducing expectations in the desired direction is not a straightforward task, and results might depend on the exact experimental conditions and the comparison group. In this sense, the authors' choice of having 3 groups of positive, negative, and "neutral" expectations is to be praised. On the other hand, also control groups form their expectations, and this can constitute a confounder in every experiment using expectation manipulation, if not appropriately investigated.
  
  Thank you for raising these important concerns! Firstly, as it seems that we did not explain the experimental procedure in a clear fashion, there appeared to be a general misunderstanding regarding our instructions. We want to emphasize that we did not tell participants that the stimulus intensity would always be the same, but that pain stimuli would be different temperatures of medium intensity. Furthermore, our instruction did not necessarily imply that our algorithm detected a state of medium sensitivity, but that the algorithm would not make any prediction, e.g., due to highly fluctuating states of pain sensitivity, or no clear-cut state of high or low pain sensitivity. We changed this in the Methods (ll. 556-560, 601-606, 612-614) and Results (ll. 181-192) sections of the manuscript to clarify these important features of our procedure.
  
  Then, we absolutely agree that participants explicitly and implicitly form expectations regarding all conditions over time, including the control condition. We carefully considered your feedback and rephrased the control condition, no longer framing it as eliciting “no expectations” but as “neutral expectations” in the revised version of the manuscript. This follows the more common phrasing in the literature and acknowledges that participants indeed build up expectations in the control condition. However, we do still think that we can meaningfully compare the placebo and nocebo condition to the control condition to investigate the neuronal underpinnings of expectation effects. Independently of whether participants build up an expectation of “medium” intensities in the control condition, which caused them to perceive stimuli in line with this expectation, or if they simply perceived the stimuli as they were (of medium intensity) with limited effects of expectations, the crucial difference to the placebo and nocebo conditions is that there was no alteration of perception due to previous experiences or verbal information and no shift of perception from the actual stimulus intensity towards any direction in the control condition. This allowed us to compare the neural basis of a modulation of pain perception in either direction to a condition in which this modulation did not take place.
  
  Author response image 1.
  
  Variability within conditions over time. Relative variability index for expectation (left) and pain ratings (right) per condition and measurement block.
  
  Lastly, we want to highlight that our finding of the control condition being rated in between the placebo and nocebo condition is in line with many previous studies that included similar control conditions and advanced our understanding of pain-related expectations (Bingel et al., 2011; Colloca et al., 2010; Shih et al., 2019). We thank the reviewer for the very interesting idea to evaluate the development of ratings in the control condition in more detail and added a new analysis to the manuscript in which we compared how much intra-subject variance the ratings showed in each of the three conditions and how much this variance changed over time. To this end, we computed the relative variability index (Mestdagh et al., 2018), a measure that quantifies intra-subject variation over multiple ratings, and compared it between the three conditions and the three measurement blocks. We observed differences in variances between conditions for both expectation (F(2,96) = 8.14, p < .001) and pain ratings (F(2,96) = 3.41, p = .037). For both measures, post-hoc tests revealed that there was significantly more variance in the placebo compared to the control condition (both p_holm < .05), but no difference between control and nocebo. The substantial and comparable variation in pain and expectation ratings in all three conditions (or at least between control and nocebo) shows that participants did not always expect and perceive the same intensity within conditions. Variance in expectation ratings decreased from the first block compared to the other two blocks (F(1.35,64.64) = 5.69, p = .012; both p_holm < .05), which was not the case for pain ratings. Most importantly, there was no interaction effect of block and condition for either expectation (F(2.65,127.06) = 0.40, p = .728) or pain ratings (F(4,192) = 0.48, p = .748), which implies that expectations were similarly dynamically updated in all conditions over the course of the experiment. This speaks against a “regression to the mean” in the control condition and shows that control ratings fluctuated from trial to trial. We included this analysis and a more in-depth discussion of the choice of conditions in the Results (ll. 219-232) and Discussion (ll. 452-486) sections of the revised manuscript.
  
  In addition, although fMRI is still (probably) the best available tool we have to understand the spatial representation of cortical processing, limitations about not only the temporal but even the spatial resolution should be acknowledged. Given the anatomical and physiological complexity of the cortical connections, as we know from the animal world, it is still well possible that subcircuits are activated also for positive and negative expectations, but cannot be observed due to the limitation of our techniques. Indeed, on an empirical/evolutionary basis it would remain unclear why we should have a system that waits for the valence of a stimulus to show differential responses.
  
  We agree that the spatial resolution of fMRI is limited and that our signal is often not able to dissociate different subcircuits. Whether differential processes occurred at this level cannot be observed with fMRI, but it is indeed possible. We now include this reasoning in our Discussion (ll. 373-377):
  
  “Importantly, the spatial resolution of fMRI is limited when it comes to discriminating whether the same pattern of activity is due to identical activation or to activation in different sub-circuits within the same area. Nonetheless, the overlap of areas is an indicator for similar processes involved in a more general preparation process.”
  
  Also, moving in a dimension of network and graph theory, one would not expect single areas to be responsible for distinct processes, but rather that they would integrate information in a shared way, potentially with different feedback and feedforward communications. As such, it becomes more difficult to assume the insula is a center for coding potential pain, perhaps more of a node in a system that signals potential dangers for the integrity of the body.
  
  We appreciate the feedback on our interpretation of our results and agree that the overall network activity most likely determines how a large part of expectations and pain are coded. We therefore adjusted the Discussion, embedding the results in an interpretation considering networks (ll. 427-430, 432-435, 438-442).
  
  The authors analyze the EEG signal between 0.5 to 128 Hz, finding significant results in the correlation between single-trial BOLD and EEG activity in the higher gamma range (see Figure 6 panel C). It would be interesting to understand the rationale for including such high frequencies in the signal, and the interpretation of the significant correlation in the high gamma range.
  
  On a technical level, we adapted our EEG processing pipeline from Hipp et al. (2011), who similarly investigated signals up to 128 Hz. Of note, the spectral smoothing was adjusted to match 3/4 octave, meaning that the frequency resolution at 128 Hz is rather broad and the estimate does not only contain oscillations at exactly 128 Hz. Gamma oscillations in general have repeatedly been reported in relation to pain and feedforward signals reflecting noxious information (e.g. Ploner et al., 2017; Strube et al., 2021). Strube et al. (2021) reported the highest effects of pain stimulus intensity and prediction error processing at high gamma frequencies (100 and 98 Hz, respectively). These findings could also serve as a basis to interpret our results in this frequency range: If anticipatory activation in the ACC is linked to high gamma oscillations, which appear to play an important role in feedforward signaling of pain intensity and prediction errors, this could indicate that later processing of intensity in this area is already pre-modulated before the stimulus actually occurs. Of note: although not significant, it looks as if the cluster extends further into pain processing on a descriptive level. We added additional explanation regarding the interpretation of the correlation in the Discussion (ll. 414-425):
  
  “The link between anticipatory activity in the ACC and EEG oscillatory activity was observed in the high gamma band, which is consistent with findings that demonstrate a connection between increased fMRI BOLD signals and a relative shift from lower to higher frequencies (Kilner et al., 2005). Gamma oscillations have been repeatedly reported in the context of pain and expectations and have been interpreted as reflecting feedforward signals of noxious information (e.g. Ploner et al., 2017; Strube et al., 2021). In combination with our findings, this might imply that high frequency oscillations may not only signal higher actual or perceived pain intensity during pain processing (Nickel et al., 2022; Ploner et al., 2017; Strube et al., 2021; Tu et al., 2016), but might also be instrumental in the transfer of directed expectations from anticipation into pain processing.”
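  
  To give a sense of how broad a 3/4-octave band is at 128 Hz, the short Python sketch below computes band edges under the assumption that the band is symmetric around the center frequency on a logarithmic axis. This is an illustrative assumption about the band definition, not a description of the actual pipeline, and the function name is hypothetical.

```python
def octave_band(center_hz, width_octaves=0.75):
    """Lower and upper edge of a band that is symmetric around the
    center frequency on a log2 frequency axis (assumed definition)."""
    half = width_octaves / 2
    return center_hz * 2 ** (-half), center_hz * 2 ** half


low_edge, high_edge = octave_band(128.0)
```

  Under this assumption the band centered on 128 Hz spans roughly 99 to 166 Hz, which is why an effect at this center frequency should be read as broadband high gamma activity rather than as a narrow oscillation.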
  
  Reviewer #2 (Public Review):
  
  I think this is a very promising paper. The combination of EEG and fMRI is unique and original. However, I also have some suggestions that I think could help improve the manuscript.
  
  This manuscript reports the findings of an EEG-fMRI study (n = 50) on the effects of expectations on pain. The combination of EEG with fMRI is extremely original and well-suited to study the transition from expectation to perception. However, I think that the current treatment of the data, as well as the way that the manuscript is currently written, does not fully capitalize on the potential of this unique dataset. Several findings are presented but there is currently no clear message coming out of this manuscript.
  
  First, one positive point is that the experimental manipulation clearly worked. However, it should be noted that the instructions used are not typical of studies on placebo/nocebo. Participants were not told that the stimulations would be of higher/lower intensity. Rather, they were told that objective intensities were held constant, but that EEG recordings could be used to predict whether they would perceive the stimulus as more or less intense. I think that this is an interesting way to manipulate expectations, but there could have been more justification in the introduction for why the authors have chosen this unusual procedure.
  
  Most importantly, we want to emphasize again that participants were not aware that the stimulation temperature was always the same but were informed that they would receive different stimuli of medium intensity. We now clarify this in the revised Results (ll. 190-192) and Methods (ll. 612-614) sections.
  
  While we agree that our procedure was not typical, we think that the manipulation is comparable to previous studies on pain-related expectations. To our knowledge, either expectations regarding a treatment that changes pain perception (treatment expectancy) or expectations regarding stimulus intensities (stimulus expectancy) are manipulated (see Atlas & Wager, 2014). In our study, participants received a cue that induced expectations in regard to a “treatment”, although in this case the “treatment” came from changes in their own brain activity. This is comparable to studies using TENS devices that are supposedly changing peripheral pain transmission (Skvortsova et al., 2020). Thus, although not typical, our paradigm could be classified as targeting treatment expectancies and allowed us to examine effects on a trial-by-trial level within subjects. We added a paragraph regarding the comparability of our paradigm with previous studies in the Discussion of the revised manuscript (ll. 452-464).
  
  Also, the introduction mentions that little is known about potential cerebral differences between expectations of high vs. low pain expectations. I think the fear conditioning literature could be cited here. Activations in ACC, SMA, Ins, parahippocampal gyrus, PAG, etc. are often associated with upcoming threat, whereas activations vmPFC/default mode network are associated with safety.
  
  We thank you for your suggestion to add literature on fear conditioning. We agree there is some overlap between fear conditioning and expectation effects in humans, but we also believe there are fundamental differences regarding their underlying processes and paradigms. For example, expectation effects are not driven by classical learning algorithms but act to a large extent as self-fulfilling prophecies (see e.g. Jepma et al., 2018). However, we now acknowledge the similarities between the two, e.g. in the recruitment of the insula and the vmPFC, in our Introduction (ll. 132-136).
  
  The fact that the authors didn't observe a clearer distinction between high and low expectations here could be related to their specific instructions that imply that the stimulus is the same and that it is the subjective perception that is expected to change. In any case, this is a relatively minor issue that is easy to address.
  
  We apologize again for the lack of clarity in our instructions: Participants were unaware that they would receive the exact same stimulus. The clear effects of the different conditions on expectation and pain ratings also challenge the notion that participants always expected the same level of stimulation and/or perception. Additionally, if participants were indeed expecting a consistent level of intensity in all conditions, one would also assume to see the same anticipatory activation in the control condition as in the placebo and nocebo conditions, which is not the case. Thus, we respectfully disagree that the common effects might be explained by our instructions but would argue that they indeed reflect common (anticipatory) processes of positive and negative expectations.
  
  Towards the end of the introduction, the authors present the aims of the study in mainly exploratory terms:
  
  (1) What are the differences between anticipation and perception?
  
  (2) What regions display a difference between high and low expectations (high > low or low > high) vs. an effect of expectation regardless of the direction (high and low different than neutral)?
  
  I think these are good questions, but the authors should provide more justification, or framework, for these questions. More specifically, what will they be able to conclude based on their observations?
  
  For instance (note that this is just an example to illustrate my point. I encourage the authors to come up with their own framework/predictions) :
  
  (1) Possibility #1: A certain region encodes expectations in a directed fashion (high > low) and that same region also responds to perception in the same direction (high > low). This region would therefore modulate pain by assimilating perception towards expectations.
  
  (2) Possibility # 2: different regions are involved in expectation and perception. Perhaps this could mean that certain regions influence pain processing through descending facilitation for instance...
  
  Thank you for pointing out that our hypotheses were not crafted carefully enough. We tried to give better explanations for the possible interpretations of our hypotheses. Additionally, we interpreted our results against the background of a broader framework for placebo and nocebo effects (predictive coding) to derive possible functions of the described brain areas. We embedded this in our Introduction (ll. 74-86, 158-175) and Discussion (ll. 384-388), interpreting the anticipatory activity and the activity during pain processing in the context of expectation formation as described in Büchel et al. (2014).
  
  Interpretation derived from our framework (ll. 384-388):
  
  e.g.: “Following the framework of predictive coding, our results would suggest that the DPMS is the network responsible for integrating ascending signals with descending signals in the pain domain and that this process is similar for positive and negative valences during anticipation of pain but differentiates during pain processing.”
  
  Regarding analyses, I think that examining the transition from expectations to perception is a strong angle of the manuscript given the EEG-fMRI nature of the study. However, I feel that more could have been done here. One problem is that the sequence of analyses starts by identifying an fMRI signal of interest and then attempts to find its EEG correlates. The problem is that the low temporal resolution of fMRI makes it difficult to differentiate expectation from perception, which doesn't make this analysis a good starting point in my opinion. Why not start by identifying an EEG signal that differentiates perception vs expectation, and then look for its fMRI correlates?
  
  We appreciate your feedback on the transition from expectations to perception and also think that additional questions could be answered with our data set. However, based on the literature we had specific hypotheses regarding specific brain areas. We therefore decided to start from the fMRI data, with its superior spatial resolution, and used EEG to focus on the temporal dynamics within the areas important for anticipatory processes. We share the view that many different approaches to analyzing our data are possible. On the other hand, identifying relevant areas based on EEG characteristics carries even more uncertainty due to the spatial filtering of the EEG signal. For the research question of this study, a more accurate evaluation of the involved areas and the related representation was more important. We therefore decided to only implement the procedure already present in the manuscript.
  
  Finally, I found the hypotheses on "valenced" vs. "absolute" effects a little bit more difficult to follow. This is because "neutral" is not really neutral: it falls in between low and high. If I follow correctly, participants know that the temperature is always the same. Therefore, if they are told that the machine cannot predict whether their perception is going to be low or high, then it must be because it is likely to be in between. Ratings of expectation and pain ratings confirm that. The neutral condition is not "devoid" of expectations as the authors suggest.
  
  Therefore, it would make sense to look at regions with the following pattern low > neutral > high, or vice-versa, low < neutral < high. Low & high being different than neutral is more difficult to interpret. I don't think that you can say that it reflects "absolute" expectations because neutral is also the expectation of a medium temperature. Perhaps it reflects "certainty/uncertainty" or something like that, but it is not clear that it reflects "expectations".
  
  Thank you for your valuable feedback! We considered your concerns about the interpretation of our results and completely agree that the control condition cannot be interpreted as void of expectations (ll. 119-123). We therefore evaluated the control condition in more detail in a separate analysis (ll. 219-232) and integrated a new assessment of the conditions into the Discussion (ll. 465-486). We changed the phrasing of our control condition to “neutral expectations”, as we agree that the control condition is not void of expectations and this phrasing is more in line with other studies (e.g. Colloca et al., 2010; Freeman et al., 2015; Schmid et al., 2015). We would argue that the neutral expectations can still be meaningfully compared to positive and negative expectations because only the latter shift expectations and perception in one direction. Thus, we changed our wording throughout the manuscript to acknowledge that we indeed did not test for general effects of expectations vs. no expectations, but for effects of directed expectations. Please also see our reasoning regarding the control condition in response to Reviewer 1, in which we addressed the interpretation of the control condition. We therefore still believe that the contrasts that we calculated between conditions are valid. The proposed new contrast largely overlaps with our differential contrast low>high and vice versa already reported in the manuscript (for additional results also see Supplements).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Figure 6, panel C. The figure mentions Anterior Cingulate Cortex R, whereas the legend mentions left ACC. Please check.
  
  Thanks for catching this, we changed the figure legend accordingly.
  
  Reviewer #2 (Recommendations For The Authors):
  
  - I don't think that activity during the rating of expectations is easily interpretable. I think I would recommend not reporting it.
  
  The majority of participants completed the expectation rating relatively quickly (M = 2.17 s, SD = 0.35 s), which resulted in the overlap between the DLPFC EEG cluster and the expectation rating encompassing only a limited portion of the cluster (~ 1 s). We agree that this activity still is more difficult to interpret, yet we have decided to report it for reasons of completeness.
  
  - The effects on SIIPS are interesting. I think that it is fine to present them as a "validation" of what was observed with pain ratings, but it also seems to give a direction to the analyses that the authors don't end up following. For instance, why not try other "signatures" like the NPS or signatures of pain anticipation? Also, why not try to look at EEG correlates of SIIPS? I don't think that the authors "need" to do any of that, but I just wanted to let them know that SIIPS results may stir that kind of curiosity in the readers.
  
  While this would be indeed very interesting, these additional analyses are not directly related to our current research question. We fear that too many analyses could be confusing for the readers. Nonetheless, we are grateful for your suggestion and will implement additional brain signatures in future studies.
  
  - The shock was calibrated to be 60%. Why not have high (70%) and low (30%) conditions at equal distances from neutral, like 80% and 40% for instance? The current design makes it hard to distinguish high from control. Perhaps the "common" effects of high + low are driven by a deactivation for low (30%)?
  
  We appreciate your feedback! We adjusted the temperature during the test phase to counteract the habituation that typically occurs with heat stimuli. We believe that this was a good measure, as participants rated the control condition at roughly VAS 50 (M = 51.40), which was our target and is equidistant from the VAS 70 and VAS 30 used during conditioning, when no habituation should have taken place yet. We further tested whether participants rated placebo and nocebo trials at equal distances from the control condition and found no bias for either of the conditions. To do this, we computed the individual placebo effect (control minus placebo) and nocebo effect (nocebo minus control) for each participant during the test phase and statistically compared whether they differed in terms of magnitude. There was no significant difference between placebo and nocebo effects for either expectation (placebo effect M = 14.25 vs. nocebo effect M = 17.22, t(49) = 1.92, p = .061) or pain ratings (placebo effect M = 6.52 vs. nocebo effect M = 5.40, t(49) = -1.11, p = .274). This suggests that our expectation manipulation resulted in comparable shifts in expectation and pain ratings away from the control condition for both the placebo and nocebo condition and thus argues against any bias of the conditioning temperatures. Please also note that the analysis of the common effects was masked for differences between the high and low conditions; the effects therefore cannot be driven by one condition by itself.
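  
  The comparison described above can be sketched in a few lines of Python. This is a minimal illustration of the logic, assuming one mean rating per participant and condition. The function names and the ratings of the four participants are made up for the example and are not data from the study.

```python
import math
import statistics


def placebo_nocebo_effects(control, placebo, nocebo):
    """Per-participant placebo effect (control minus placebo) and
    nocebo effect (nocebo minus control)."""
    placebo_effect = [c - p for c, p in zip(control, placebo)]
    nocebo_effect = [n - c for n, c in zip(nocebo, control)]
    return placebo_effect, nocebo_effect


def paired_t(a, b):
    """t statistic and degrees of freedom for a paired comparison of a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample SD (n - 1).
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


# Made-up mean ratings for four hypothetical participants.
control = [50.0, 52.0, 48.0, 55.0]
placebo = [44.0, 47.0, 41.0, 50.0]
nocebo = [56.0, 55.0, 54.0, 62.0]

placebo_effect, nocebo_effect = placebo_nocebo_effects(control, placebo, nocebo)
t_value, df = paired_t(nocebo_effect, placebo_effect)
```

  The p value follows from the t distribution with the returned degrees of freedom, for instance via `scipy.stats.ttest_rel`. Because both effects are expressed as distances from the control condition, a non-significant difference indicates shifts of comparable size in both directions.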
  
  - If I understand correctly, all fMRI contrasts were thresholded with FWE. This is fine, but very strict. The authors could have opted for FDR. Maybe I missed something here....
  
  While it is true that FDR is the more liberal approach, it is not valid for spatially correlated fMRI data and is no longer available in SPM for the correction of multiple comparisons. The newly implemented topological peak-based FDR correction is comparably sensitive to the FWE correction (see Chumbley et al. BELEG). We opted for the slightly more conservative approach in our preregistration (p_FWE < .05); a change of the correction is therefore not possible.
  
  Altogether, I think that this is a great study. The combination of EEG and fMRI is truly unique and affords many opportunities to examine the transition from expectations to perception. The experimental manipulation of expectations seems to have worked well, and there seem to be very promising results. However, I think that more could have been done. At least, I would recommend trying to give more of a theoretical framework to help interpret the results.
  
  We are very grateful for your positive feedback. We took your suggestion seriously and tried to implement a more general framework from the literature (see Büchel et al., 2014) to provide a better explanation for our results.
  
  References
  
  Atlas, L. Y., & Wager, T. D. (2014). A meta-analysis of brain mechanisms of placebo analgesia: Consistent findings and unanswered questions. Handbook of Experimental Pharmacology, 225, 37–69. https://doi.org/10.1007/978-3-662-44519-8_3
  
  Bingel, U., Wanigasekera, V., Wiech, K., Ni Mhuircheartaigh, R., Lee, M. C., Ploner, M., & Tracey, I. (2011). The effect of treatment expectation on drug efficacy: Imaging the analgesic benefit of the opioid remifentanil. Science Translational Medicine, 3(70), 70ra14. https://doi.org/10.1126/scitranslmed.3001244
  
  Büchel, C., Geuter, S., Sprenger, C., & Eippert, F. (2014). Placebo analgesia: A predictive coding perspective. Neuron, 81(6), 1223–1239. https://doi.org/10.1016/j.neuron.2014.02.042
  
  Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain, 151(2), 430–439. https://doi.org/10.1016/j.pain.2010.08.007
  
  Freeman, S., Yu, R., Egorova, N., Chen, X., Kirsch, I., Claggett, B., Kaptchuk, T. J., Gollub, R. L., & Kong, J. (2015). Distinct neural representations of placebo and nocebo effects. NeuroImage, 112, 197–207. https://doi.org/10.1016/j.neuroimage.2015.03.015
  
  Hipp, J. F., Engel, A. K., & Siegel, M. (2011). Oscillatory synchronization in large-scale cortical networks predicts perception. Neuron, 69(2), 387–396. https://doi.org/10.1016/j.neuron.2010.12.027
  
  Jepma, M., Koban, L., van Doorn, J., Jones, M., & Wager, T. D. (2018). Behavioural and neural evidence for self-reinforcing expectancy effects on pain. Nature Human Behaviour, 2(11), 838–855. https://doi.org/10.1038/s41562-018-0455-8
  
  Kilner, J. M., Mattout, J., Henson, R., & Friston, K. J. (2005). Hemodynamic correlates of EEG: A heuristic. NeuroImage, 28(1), 280–286. https://doi.org/10.1016/j.neuroimage.2005.06.008
  
  Nickel, M. M., Tiemann, L., Hohn, V. D., May, E. S., Gil Ávila, C., Eippert, F., & Ploner, M. (2022). Temporal-spectral signaling of sensory information and expectations in the cerebral processing of pain. Proceedings of the National Academy of Sciences of the United States of America, 119(1). https://doi.org/10.1073/pnas.2116616119
  
  Ploner, M., Sorg, C., & Gross, J. (2017). Brain Rhythms of Pain. Trends in Cognitive Sciences, 21(2), 100–110. https://doi.org/10.1016/j.tics.2016.12.001
  
  Schmid, J., Bingel, U., Ritter, C., Benson, S., Schedlowski, M., Gramsch, C., Forsting, M., & Elsenbruch, S. (2015). Neural underpinnings of nocebo hyperalgesia in visceral pain: A fMRI study in healthy volunteers. NeuroImage, 120, 114–122. https://doi.org/10.1016/j.neuroimage.2015.06.060
  
  Shih, Y.‑W., Tsai, H.‑Y., Lin, F.‑S., Lin, Y.‑H., Chiang, C.‑Y., Lu, Z.‑L., & Tseng, M.‑T. (2019). Effects of Positive and Negative Expectations on Human Pain Perception Engage Separate But Interrelated and Dependently Regulated Cerebral Mechanisms. Journal of Neuroscience, 39(7), 1261–1274. https://doi.org/10.1523/JNEUROSCI.2154-18.2018
  
  Skvortsova, A., Veldhuijzen, D. S., van Middendorp, H., Colloca, L., & Evers, A. W. M. (2020). Effects of Oxytocin on Placebo and Nocebo Effects in a Pain Conditioning Paradigm: A Randomized Controlled Trial. The Journal of Pain, 21(3-4), 430–439. https://doi.org/10.1016/j.jpain.2019.08.010
  
  Strube, A., Rose, M., Fazeli, S., & Büchel, C. (2021). The temporal and spectral characteristics of expectations and prediction errors in pain and thermoception. eLife, 10. https://doi.org/10.7554/eLife.62809
  
  Tu, Y., Zhang, Z., Tan, A., Peng, W., Hung, Y. S., Moayedi, M., Iannetti, G. D., & Hu, L. (2016). Alpha and gamma oscillation amplitudes synergistically predict the perception of forthcoming nociceptive stimuli. Human Brain Mapping, 37(2), 501–514. https://doi.org/10.1002/hbm.23048
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.05.583509v2
www.biorxiv.org www.biorxiv.org

New submission 25/09/2023, 09:01:35

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The current manuscript provides a timely contribution to the ongoing discussion about the mechanism of the apical sodium/bile acid transporters (ASBTs). Recent structures of the mammalian ASBT transporters exhibited a substrate binding mode with few interactions with the core domain (classically associated with substrate binding), prompting an unusual proposal for the transport mechanism. Early structures of ASBT homologues from bacteria also exhibit unusual substrate binding in which the core substrate binding domain is less engaged than expected. Due to the ongoing questions of how substrate binding and mechanism are linked in these transporters, the authors set out to deepen our understanding of a model ASBT homologue from the bacterium N. meningitidis (ASBT-NM).
  
  The premise of the current paper is that the bacterial ASBT homologues are probably not physiological bile acid transporters, and that structural elucidation of a natively transported substrate might provide better mechanistic information. In the current manuscript, the authors revisit the first BASS homologue to be structurally characterized, ASBT-NM. Based on bacteriological assays in the literature, the authors identify the coenzyme A precursor pantoate as a more likely substrate for ASBT-NM than taurocholate, the substrate in the original structure. A structure of ASBT-NM with pantoate exhibits interesting structural differences. The structures are complemented with MD simulations, and the authors propose that the structures are consistent with a classical elevator transport mechanism.
  
  The structural experiments are generally solid, although showing omit maps would bolster the identification of the substrate binding site.
  
  We have added an omit map in Fig S2.
  
  One shortcoming is that, although pantoate binding is observed, the authors do not show transport of this substrate, undercutting the argument that the pantoate structure represents binding of a "better" or more native substrate. Mechanistic proposals, like the proposed role of T112 in unlocking the transporter, would be much better supported by transport data.
  
  In the absence of being able to source radiolabelled pantoate at a reasonable cost, we decided to focus on binding studies, relying on the fact that pantoate/pyruvate uptake has been shown in other BASS transporters. While we agree that transport needs to be substantiated, our crystallographic and molecular dynamics studies combined provide a picture of sodium ions stabilising the substrate binding site to enable the binding of the substrate, which in turn induces further conformational changes. Such changes would be consistent with a mechanism of sodium driven transport with clear coupling of the sodium ions to substrate translocation. We are not saying this is a “better” substrate but rather that a substrate binding like this would be able to elicit the conformational changes necessary for transport – something that has been missing from previous studies.
  
  Reviewer #2 (Public Review):
  
  The manuscript starts with a demonstration of pantoate binding to ASBTnm using a thermostability assay and ITC, and follows with structure determinations of ASBTnm with or without pantoate. The structure of ASBTnm in the presence of pantoate pinpoints the binding site of pantoate to the "crossover" region formed by the partially unwound helices TM4 and TM9. Binding of pantoate induces modest movements of side chain and backbone atoms at the crossover region that are consistent with providing coordination of the substrate. The structures also show movement of TM1 that opens the substrate binding site to the cytosol and mobility of loops between the TMs. MD simulations of the ASBT structure embedded in a lipid bilayer suggest a stabilizing effect of the two sodium ions that are known to co-transport with the substrate. A binding study on pantoate analogs further demonstrates the specificity of pantoate as a substrate.
  
  The weakness of the manuscript includes a lack of transport assay for pantoate and a lack of demonstration that the observed conformational changes in TM1 and the loops are relevant to the binding or transport of pantoate.
  
  We agree that the manuscript would have been bolstered by transport data (see response to reviewer 1). The take-home message from the movement of TM1 and the loops is that they are flexible. It is probably unlikely that TM1 moves like this during the transport cycle and we have avoided overplaying the significance of this movement. Instead, we have focussed on the conformational changes in the pantoate binding site. We have made an additional movie concentrating on the binding site and not including TM1.
  
  Overall, the structural, functional and computational studies are solid and rigorous, and the conclusions are well justified. In addition, the authors discussed the significance of the current study in a broader perspective relevant to recent structures of mammalian BASS members.
  
  Reviewer #3 (Public Review)
  
  The manuscript describes new ligand-bound structures within the larger bile acid sodium symporter family (BASS). This is the primary advance in the manuscript, together with molecular simulations describing how sodium and the bile acids sit in the structure when thermalized. What I think is fairly clear is that the ligands are more stable when the sodiums are present, with a marked reduction in RMSD over the course of repeated trajectories. This would be consistent with a transport model where sodium ions bind first, and then the bile acid binds, followed by a conformational change to another state where the ligands unbind.
  
  While the authors mention that BASS transporters are thought to undergo an elevator transport mechanism, this is not tested here. In my reading, all the crystal structures describe the same conformational state, and the simulations do not make an attempt to induce a transition on accessible simulation timescales. Instead, there is a morph between two states where different substrates are bound, which induces a conformational change that looks unrelated to the transport cycle.
  
  To make our conclusions clearer we have added another movie showing a morph between the structure without substrate (instead of using the structure with taurocholate, which we were using as a representative of the unbound structure) and that with pantoate and have omitted the panel domain including TM1. While both of these structures are inward-facing, there are significant conformational changes within TM4 that we have described in the article.
  
  Instead, the focus is on what kinds of substrates bind to this transporter, interrogating this with isothermal calorimetry together with mutations. With a Kd in the micromolar range, even the best binder, pantoate, actually isn't a particularly tight binder in the pharmaceutical sense. For a transporter, tight binding is not actually desirable, since the substrate needs to be able to leave after conformational change places it in a position accessible to the other side.
  
  As the referee points out, the Kd that we observe would be consistent with those for substrates of other transporters.
  
  There is one really important point that readers and authors should be aware of. In Figure 2A, the names are not consistent with the chemical structure. "-ate" denotes when a carboxylic acid is in the deprotonated form, creating a charged carboxylate. What is drawn is pantoic acid, ketopantoic acid, and pantothenic acid. Less importantly, the wedges and hashes for the methyl group are arguably not appropriate, since the carbon they are attached to is not a chiral center. For the crystallization, this makes no difference, since under near-neutral pKas the carboxylic acid will spontaneously deprotonate, and the carboxylate form will be the most common. However, if the structures in Figure 2A were used for classical molecular simulation, that would be a big problem, since now that would be modeling the much rarer neutral form rather than the charged state. I am reasonably sure based on Figure 5 that the MD correctly modeled the deprotonated form with a carboxylate, but that is inconsistent with Figure 2A. Otherwise, the structure and simulation analysis falls into the mainstream of modern structural biology work.
  
  We have corrected the inconsistency of the protonation state in the naming of the molecular structures. Thank you for pointing this out – though the names represented the predominant form in solution, the more aesthetically pleasing protonated form got the better of us in our representations. The correct form was used in the MD.
  
  Reviewer #1 (Recommendations For The Authors):
  
  1) Omit maps (Fo-Fc) should be shown for pantoate and for the sodiums in the structure.
  
  This has been added to supplementary Figure 2.
  
  2) Line 86 - could you briefly describe the alternative mechanism proposed for the mammalian NTPCs?
  
  We have added an extra line to describe this deviation from the classical alternating access model.
  
  3) Line 124 - where is the lipid like molecule, and does it interact with either the kinked helix or the substrate? A supplemental figure would be helpful.
  
  The lipid like molecule lies between the substrate and the kinked helix, but doesn’t interact strongly with either. It would appear that the lipid would bind in the crevice rather than causing the crevice. We add Author response image 1 here but have not added it to the supplementary figures. The maps and PDB file are available for download.
  
  Author response image 1.
  
  The 2mFo-DFc density is at 1σ, the mFo-DFc density is at 2.5σ.
  
  4) I notice that the apo and pantoate structures are crystallized in different space groups. How does this compare to the original TCH structure? Is there any chance that crystal packing is altering the TM1 geometry or loop 1?
  
  We cannot rule out the effect of the crystallisation conditions on the movement of the TM1. We have now solved a number of different structures of ASBTNM and this is the first time we observe TM1 in this conformation. As stated above we have refrained from overplaying the significance of the movement of TM1 to transport, other than to say that some adjustments need to be made to accommodate the pantoate.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor comments:
  
  Pg 3, "... with a 5-fold inverted repeat...", Should be 2-fold?
  
  Changed, thank you.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Is there any chance that the MD simulations (even in a reduced form) could be uploaded to Zenodo or a similar repository?
  
  We have taken up this suggestion and added the information in the paper: MD trajectories in the GROMACS XTC format were deposited in the OSF.io repository under DOI 10.17605/OSF.IO/KFDT5 under the open CC-BY Attribution 4.0 International license. The trajectories contain all atoms and were subsampled at 5-ns intervals. GROMACS run input files (TPR format) and initial coordinate files (GRO format) together with topology files (GROMACS format) are also included.
  
  Watch the "Å" symbol in Figures 5, S6, S7. This looks like they were made in matplotlib, and probably used something like: "$\AA$", which puts the symbol in math mode. This makes the Å symbol in italics. Matplotlib has gotten better UTF-8 support.
  
  Changed, thank you.
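
  One way to get an upright symbol is to pass the Unicode character directly instead of going through math mode. The following is a minimal sketch, assuming a plain matplotlib line plot; the helper names `axis_label` and `plot_rmsd`, the axis quantities, and the output path are illustrative and are not taken from the manuscript's plotting scripts.

```python
ANGSTROM = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE, rendered in the normal text font


def axis_label(quantity: str) -> str:
    """Build an axis label such as 'RMSD (Å)' without entering math mode."""
    return f"{quantity} ({ANGSTROM})"


def plot_rmsd(times_ns, rmsd, path="rmsd.png"):
    """Hypothetical helper: plot an RMSD trace with an upright angstrom symbol."""
    # matplotlib is imported lazily so the label helper works without it
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(times_ns, rmsd)
    ax.set_xlabel("Time (ns)")
    ax.set_ylabel(axis_label("RMSD"))  # plain text label, not "$\AA$"
    fig.savefig(path)
    plt.close(fig)
```

  Because the label is an ordinary string, it follows the figure's text font rather than the italic math font.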
  
  Your citation for LINCS duplicates the citation for PME. I think you want the Hess 1998 paper. 10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H
  
  Changed, thank you.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.02.543391v2
www.biorxiv.org www.biorxiv.org

Discovering Root Causal Genes with High Throughput Perturbations

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer 1:
  
  (1) The notion of a “root” causal gene - which the authors define based on a graph theoretic notion of topologically sorting graphs - requires a graph that is directed and acyclic. It is the latter that constitutes an important weakness here - it simply is a large simplification of human biology to draw out a DAG including hundreds of genes and a phenotype Y and to claim that the true graph contains no cycles.
  
  We agree that real causal graphs in biology often contain cycles. We now include additional experimental results with cyclic directed graphs in the Supplementary Materials. RCSP outperformed the other algorithms even in this setting, but we caution the reader that the theoretical interpretation of the RCS score may not coincide with a root causal effect when cycles exist:
  
  “We also evaluated the algorithms on directed graphs with cycles. We generated a linear SEM over p + 1 = 1000 variables in . We sampled the coefficient matrix β from a Bernoulli (1/(p − 1)) distribution but did not restrict the non-zero coefficients to the upper triangular portion of the matrix. We then proceeded to permute the variable ordering and weight each entry as in the Methods for the DAG. We repeated this procedure 30 times and report the results in Supplementary Figure 3.
  
  RCSP again outperformed all other algorithms even in the cyclic case. The results suggest that conditioning on the surrogate ancestors also estimates the RCS well even in the cyclic case. However, we caution that an error term E<sub>i</sub> can affect the ancestors of when cycles exist. As a result, the RCS may not isolate the causal effect of the error term and thus not truly coincide with the notion of a root causal effect in cyclic causal graphs.”
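
  The sampling step in the quoted passage can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the matrix holds only the 0/1 support (the subsequent weighting and permutation steps are omitted), self-loops are excluded, and an entry at row i, column j is read as an edge from variable i to variable j.

```python
import random


def sample_support(n_vars, edge_prob, rng, allow_cycles=True):
    """Sample a 0/1 support matrix with independent Bernoulli(edge_prob) entries.

    With allow_cycles=False only the upper triangle is filled, which
    guarantees a DAG. With allow_cycles=True every off-diagonal entry may
    be non-zero, so directed cycles can appear.
    """
    beta = [[0] * n_vars for _ in range(n_vars)]
    for i in range(n_vars):
        for j in range(n_vars):
            if i == j:
                continue  # assumption: no self-loops
            if not allow_cycles and j < i:
                continue  # restrict to the upper triangle
            if rng.random() < edge_prob:
                beta[i][j] = 1
    return beta


def has_cycle(beta):
    """Detect a directed cycle with an iterative three-colour depth-first search."""
    n = len(beta)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * n
    for start in range(n):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, 0)]  # (node, next child index to inspect)
        while stack:
            node, nxt = stack.pop()
            advanced = False
            for j in range(nxt, n):
                if not beta[node][j]:
                    continue
                if colour[j] == GREY:
                    return True  # back edge closes a cycle
                if colour[j] == WHITE:
                    stack.append((node, j + 1))  # resume this node later
                    colour[j] = GREY
                    stack.append((j, 0))
                    advanced = True
                    break
            if not advanced:
                colour[node] = BLACK  # all descendants explored
    return False


if __name__ == "__main__":
    n_vars = 1000          # p + 1 variables, as in the quoted text
    p = n_vars - 1
    beta = sample_support(n_vars, 1.0 / (p - 1), random.Random(0))
    print("contains a directed cycle:", has_cycle(beta))
```

  Restricting the non-zero entries to the upper triangle is what enforces acyclicity in the DAG setting; lifting that restriction is the only change needed to admit cycles.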
  
  (2) I also encourage the authors to consider more carefully when graph structure learned from Perturb-seq can be ported over to bulk RNA-seq. Presumably this structure is not exactly correct - to what extent is the RCSP algorithm sensitive to false edges in this graph? This leap - from cell line to primary human cells - is also not modeled in the simulation. Although challenging - it would be ideal for the RCSP to model or reflect the challenges in correctly identifying the regulatory structure.
  
  We now include additional experimental results, where we gradually increased the incongruence between the DAG modeling the Perturb-seq and the DAG modeling the bulk RNA-seq using a mixture of graphs. The performance of RCSP degraded gradually, rather than abruptly, with increasing incongruence. We therefore conclude that RCSP is robust to differences between the causal graphs representing Perturb-seq and bulk RNA-seq:
  
  “We next assessed the performance of RCSP when the DAG underlying the Perturb-seq data differs from the DAG underlying the bulk RNA-seq data. We considered a mixture of two random DAGs in bulk RNA-seq, where one of the DAGs coincided with the Perturb-seq DAG and the second, alternate DAG did not. We instantiated and simulated samples from each DAG as per the previous subsection. We generated 0%, 25%, 50%, 75%, and 100% of the bulk RNA-seq samples from the alternate DAG, and the rest from the Perturb-seq DAG. We ideally would like to see the performance of RCSP degrade gracefully, as opposed to abruptly, as the percent of samples derived from the alternate DAG increases.
  
  We summarize results in Supplementary Figure 4. As expected, RCSP performed the best when we drew all samples from the same underlying DAG for Perturb-seq and bulk RNA-seq. However, the performance of RCSP also degraded slowly as the percent of samples increased from the alternate DAG. We conclude that RCSP can accommodate some differences between the underlying DAGs in Perturb-seq and bulk RNA-seq with only a mild degradation in performance.”
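
  The mixture design described in the quoted passage can be sketched as below. This is a simplified illustration, not the authors' code: it assumes a linear SEM with standard Gaussian error terms and variables already listed in topological order, and the names `sample_linear_sem` and `sample_mixture` are hypothetical.

```python
import random


def sample_linear_sem(weights, rng):
    """Draw one sample from X_j = sum_i weights[i][j] * X_i + E_j.

    Assumes the variables are in topological order, so only entries with
    i < j are used, and that E_j is standard Gaussian (an assumption).
    """
    p = len(weights)
    x = [0.0] * p
    for j in range(p):
        parents = sum(weights[i][j] * x[i] for i in range(j))
        x[j] = parents + rng.gauss(0.0, 1.0)
    return x


def sample_mixture(w_main, w_alt, n_samples, frac_alt, rng):
    """Draw n_samples bulk samples, a fraction frac_alt from the alternate DAG."""
    n_alt = round(n_samples * frac_alt)
    samples = [sample_linear_sem(w_alt, rng) for _ in range(n_alt)]
    samples += [sample_linear_sem(w_main, rng) for _ in range(n_samples - n_alt)]
    rng.shuffle(samples)  # hide which DAG produced which sample
    return samples, n_alt


if __name__ == "__main__":
    rng = random.Random(0)
    w_main = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]  # toy Perturb-seq DAG
    w_alt = [[0.0, 0.0, 0.7], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # toy alternate DAG
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        samples, n_alt = sample_mixture(w_main, w_alt, 200, frac, rng)
        print(frac, n_alt, len(samples))
```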
  
  (3) It should also be noted that in most Perturb-seq experiments, the entire genome is not perturbed, and frequently important TFs (that presumably are very far “upstream” and thus candidate “root” causal genes) are not expressed highly enough to be detected with scRNA-seq. In that context - perhaps slightly modifying the language regarding RCSP’s capabilities might be helpful for the manuscript - perhaps it would be better to describe it as an algorithm for causal discovery among a set of genes that were perturbed and measured, rather than a truly complete search for causal factors. Perhaps more broadly it would also benefit the manuscript to devote slightly more text to describing the kinds of scenarios where RCSP (and similar ideas) would be most appropriately applied - perhaps a well-powered, phenotype annotated Perturb-seq dataset performed in a disease relevant primary cell.
  
  We now clarify that Perturb-seq can only identify root causal genes among the perturbed set of genes in the Discussion:
  
  “Modern genome-wide Perturb-seq datasets also adequately perturb and measure only a few thousand, rather than all, gene expression levels. RCSP can only identify root causal genes within this perturbed and measured subset.”
  
  We now also describe the scenario where RCSP can identify root causal genes well in the Introduction:
  
  “Experiments demonstrate marked improvements in performance, when investigators have access to a large bulk RNA-seq dataset and a genome-wide Perturb-seq dataset from a cell line of a disease-relevant tissue.”
  
  Reviewer 2:
  
  (1) The process from health-to-disease is not linear most of the time with many checks along the way that aim to prevent the disease phenotype. This leads to a non-deterministic nature of the path from health-to-disease. In other words, with the same root gene perturbations, and depending on other factors outside of gene expression, someone may develop a phenotype in a year, another in 10 years and someone else never. Claiming that this information is included in the error terms might not be sufficient to address this issue. The authors should discuss this limitation.
  
  The proposed approach accommodates the above non-deterministic nature. The error terms model factors that are outside of gene expression. We model the relation from gene expression to Y as probabilistic rather than deterministic because , where E<sub>Y</sub> introduces stochasticity. Thus, two individuals with the same instantiations of the root causes may develop disease differently. We now clarify this in Methods:
  
  “The error terms model root causes that are outside of gene expression, such as genetic variation or environmental factors. Moreover, the relation from gene expression to Y is stochastic because , where E<sub>Y</sub> introduces the stochasticity. Two individuals may therefore have the exact same error term values over but different instantiations of Y.”
  
  (2) The paper assumes that the network connectivity will remain the same after perturbation. This is not always true due to backup mechanisms in the cells. For example, suppose that a cell wants to create product P and it can do it through two alternative paths: Path #1: A → B → P, Path #2: A → C → P. Now suppose that path #1 is more efficient, so when B can be produced, path #2 is inactive. Once the perturbation blocks element B from being produced, the graph connectivity changes by activation of path #2. I did not see the authors taking this into consideration, which seems to be a major limitation in using Perturb-seq results to infer connectivities.
  
  We agree that backup mechanisms can exist and therefore now include additional experimental results, where we gradually increased the incongruence between the DAG modeling the Perturb-seq and the DAG modeling the bulk RNA-seq using a mixture of graphs. The performance of RCSP degraded gradually, rather than abruptly, with increasing incongruence. We therefore conclude that RCSP is robust to differences between the causal graphs representing Perturb-seq and bulk RNA-seq:
  
  “We next assessed the performance of RCSP when the DAG underlying the Perturb-seq data differs from the DAG underlying the bulk RNA-seq data. We considered a mixture of two random DAGs in bulk RNA-seq, where one of the DAGs coincided with the Perturb-seq DAG and the second, alternate DAG did not. We generated 0%, 25%, 50%, 75%, and 100% of the bulk RNA-seq samples from the alternate DAG, and the rest from the Perturb-seq DAG. We ideally would like to see the performance of RCSP degrade gracefully, as opposed to abruptly, as the percent of samples derived from the alternate DAG increases.
  
  We summarize results in Supplementary Figure 4. As expected, RCSP performed the best when we drew all samples from the same underlying DAG for Perturb-seq and bulk RNA-seq. However, the performance of RCSP also degraded slowly as the percent of samples increased from the alternate DAG. We conclude that RCSP can accommodate some differences between the underlying DAGs in Perturb-seq and bulk RNA-seq with only a mild degradation in performance.”
  
  (3) There is substantial system heterogeneity that may cause the same phenotype. This goes beyond the authors claim that although the initial gene causes of a disease may differ from person to person, at some point they will all converge to changes in the same set of “root genes.” This is not true for many diseases, which are defined based on symptoms and lab tests at the patient level. You may have two completely different molecular pathologies that lead to the development of the same symptoms and test results. Breast cancer with its subtypes is a prime example of that. In theory, this issue could be addressed if there is infinite sample size. However, this assumption is largely violated in all existing biological datasets.
  
  The proposed method accommodates the above heterogeneity. We do not assume that the root causes affect the same set of root causal genes. Instead the root causes and root causal genes may vary from person to person. We write in the Introduction:
  
  “The problem is further complicated by the existence of complex disease, where a patient may have multiple root causal genes that differ from other patients even within the same diagnostic category... We thus also seek to identify patient-specific root causal genes in order to classify patients into meaningful biological subgroups each hopefully dictated by only a small group of genes.”
  
  The root causal genes may further affect different downstream genes at the patient-specific level. However root causal genes tend to have many downstream effects so that virtually every gene expression level becomes correlated with Y. We now clarify this by describing the omnigenic root causal model in the Introduction as follows:
  
  “Finally, application of the algorithm to two complex diseases with disparate pathogeneses recovers an omnigenic root causal model, where a small set of root causal genes drive pathogenesis but impact many downstream genes within each patient. As a result, nearly all gene expression levels are correlated with the diagnosis at the population level.”
  
  (4) Were the values of the synthetic variables Z-scored?
  
  Yes, all variables were z-scored. We now clarify this in Methods:
  
  “We also standardized all variables before running the regressions to prevent gaming of the marginal variances in causal discovery (Reisach et al., 2021; Ng et al., 2024).”
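
  Standardization of a single variable can be sketched as follows. This is a minimal example, assuming the population standard deviation is used; the response does not say whether the population or the sample form was applied, and the function name is illustrative.

```python
import statistics


def z_score(values):
    """Return values rescaled to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(z_score(expression))
```

  After this step every variable has the same marginal variance, so an algorithm cannot recover the causal order simply by sorting variables by variance.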
  
  (5) The algorithm seems to require both RNA-seq and Perturb-seq data (Algorithm 1, page 14). Can it function with RNA-seq data only? What will be different in this case?
  
  The algorithm cannot function with observational bulk RNA-seq data only. We included Perturb-seq because causal discovery with observational RNA-seq data alone tends to be inaccurate and unstable, as highlighted by the results of CausalCell. We further emphasize that we do not rely on d-separation faithfulness in Methods, which is typically required for causal discovery from observational data alone:
  
  “We can also claim the backward direction under d-separation faithfulness. We however avoid making this additional assumption because real biological data may not arise from distributions obeying d-separation faithfulness in practice.”
  
  (6) Synthetic data generation: how many different graphs (SEMs) did they start from? (30?) How many samples per graph? Did they test different sample sizes?
  
  We now clarify that we generate 30 random SEMs, each associated with a DAG. We used 200 samples for the bulk RNA-seq to mimic a relatively large but common sample size. We also drew 200 samples for each perturbation or control in the Perturb-seq data. We did not consider multiple sample sizes due to the time required to complete each run. Instead, we focused on a typical scenario where investigators would apply RCSP. We now write the following in the Methods:
  
  “We drew 200 samples for the bulk RNA-seq data to mimic a large but common dataset size. We introduced knockdown perturbations in Perturb-seq by subtracting an offset of two in the softplus function: . We finally drew 200 samples for the control and each perturbation condition to generate the Perturb-seq data. We repeated the above procedure 30 times.”

  We also include the following in Results:
  
  “We obtained 200 cell samples from each perturbation, and another 200 controls without perturbations. We therefore generated a total of 2501 × 200 = 500,200 single cell samples for each Perturb-seq dataset. We simulated 200 bulk RNA-seq samples.”
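
  The sample counts and the knockdown offset can be illustrated with a short sketch. The exact expression inside the softplus is not recoverable from the text here, so the form `softplus(x - 2)` below is an assumption, as are the function names; only the offset of two, the 200 cells per condition, and the 2501 conditions come from the quoted passages.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def knockdown(pre_activation, offset=2.0):
    """Assumed knockdown model: shift the softplus input down by the offset."""
    return softplus(pre_activation - offset)


# Counts taken from the quoted text: 2501 conditions with 200 cells each.
n_conditions = 2501
cells_per_condition = 200
total_cells = n_conditions * cells_per_condition

if __name__ == "__main__":
    print("total single-cell samples:", total_cells)
    print("unperturbed vs knocked down:", softplus(1.0), knockdown(1.0))
```

  Because softplus is strictly increasing, subtracting a positive offset always lowers the simulated expression level while keeping it positive.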
  
  (7) The presentation of comparative results (Supplementary Figures 4 and 7) is not clear. No details are given on how these results were generated. (what does it mean “The first column denotes the standard deviation of the outputs for each algorithm?”) Why all other methods have higher SD differences than RCSP? Is it a matter of scaling? Shouldn’t they have at least some values near zero since the authors “added the minimum value so that all histograms begin at zero?”
  
  Each of these supplementary figures contains a 6 by 3 table of figures. By the first column, we mean column one (with rows 1 through 6) of each figure. The D-RCS and D-SD scores represent standard deviations of the RCS and SD scores from zero of each gene, respectively. We can similarly compute the standard deviation of the outputs of the algorithms. We now clarify this in the Supplementary Materials:
  
  “The figure contains 6 rows and 3 columns. Similar to the D-RCS, we can compute the standard deviation of the output of each algorithm from zero for each gene. The first column in Supplementary Figure 7 denotes the histograms of these standard deviations across the genes.”
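
  One natural reading of "standard deviation from zero" is the root mean square of a gene's scores across samples, that is, a standard deviation computed around zero instead of around the mean. A minimal sketch under that assumption, with an illustrative function name:

```python
import math


def sd_from_zero(scores):
    """Root-mean-square deviation of the scores from zero."""
    if not scores:
        raise ValueError("need at least one score")
    return math.sqrt(sum(s * s for s in scores) / len(scores))


if __name__ == "__main__":
    # Hypothetical per-patient scores for two genes
    mostly_zero = [0.0, 0.01, -0.02, 0.0, 0.9]
    spread_out = [0.5, -0.6, 0.7, -0.4, 0.5]
    print(sd_from_zero(mostly_zero), sd_from_zero(spread_out))
```

  Applying the function gene by gene and plotting the resulting values gives one histogram per algorithm.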
  
  Many histograms do not appear to start at zero because the bars are too small to be visible. We now clarify this in the Supplementary Materials as well:
  
  “Note that the bars at zero are not visible for many algorithms, since only a few genes attained standard deviations near the minimum.”
  
  (8) Why RCSP results are more like a negative binomial distribution and every other is kind of normal?
  
  All other methods have higher standard deviations than RCSP because they fail to compute an accurate measure of the root causal effect. Recall that, just like a machine has a few root causal problems, only a few root causal genes have large root causal effects under the omnigenic root causal model. The results of RCSP look more like a negative binomial distribution because most RCS scores are concentrated around zero and only a few RCS scores are large – consistent with the omnigenic root causal model. The other algorithms fail to properly control for the upstream genes and thus attain large standard deviations for nearly all genes. We now clarify these points in the Supplementary Materials as follows:
  
  “If an algorithm accurately identifies root causal genes, then it should only identify a few genes with large conditional root causal effects under the omnigenic root causal model. The RCSP algorithm had a histogram with large probability mass centered around zero with a long tail to the right. The standard deviations of the outputs of the other algorithms attained large values for nearly all genes. Incorporating feature selection and causal discovery with CausalCell introduced more outliers in the histogram of ANM. We conclude that only RCSP detected an omnigenic root causal model.”
  
  (9) What is the significance of genes changing expression “from left to right” in a UMAP plot? (e.g., Fig. 3h and 3g)
  
  The first UMAP dimension captured the variability of the RCS scores for most root causal genes. As a result, we could focus our analysis on the black cluster in Figure 3 (g) with large RCS scores in the subsequent pathway enrichment analysis summarized in Figure 3 (j). If two dimensions were involved, then we would need to analyze at least two clusters (e.g., black and pink), but this was not the case. We now clarify this in Results:
  
  “The RCS scores of most of the top genes exhibited a clear gradation increasing only from the left to the right hand side of the UMAP embedding; we plot an example in Figure 3 (h). We found three exceptions to this rule among the top 30 genes (example in Figure 3 (i) and see Supplementary Materials). RCSP thus detected genes with large RCS scores primarily in the black cluster of Figure 3 (g). Pathway enrichment analysis within this cluster alone yielded supra-significant results on the same pathway detected in the global analysis...”
  
  (10) The authors somewhat overstate the novelty of their algorithm. Representation of GRNs as causal graphs dates back to 2000 with the work of Nir Friedman in yeast. Other methods were developed more recently that look at regulatory network changes at the single sample level, of which the authors do not seem to be aware (e.g., Ellington et al, NeurIPS 2023 workshop GenBio and Bushur et al, 2019, Bioinformatics are two such examples). The methods they mention are for single cell data and they are not designed to connect single sample-level changes to a person’s phenotype. The RCS method needs to be put in the right background context in order to bring up what is really novel about it.
  
  We agree that many methods already exist for uncovering associational, predictive (Markov, neighborhood) and causal gene regulatory networks. We now cite the above papers. However, the novelty in our manuscript is not causal graph discovery, but rather estimation of root causal effects, detection of root causal genes, and the proposal of the omnigenic root causal model. We now clarify this in the Introduction:
  
  “Many algorithms focus on discovering associational or predictive relations, sometimes visually represented as gene regulatory networks (Costa et al., 2017; Ellington et al., 2023). Other methods even identify causal relations (Friedman et al., 2000; Wang et al., 2023; Wen et al., 2000; Buschur et al., 2000), but none pinpoint the first gene expression levels that ultimately generate the vast majority of pathogenesis. Simply learning a causal graph does not resolve the issue because causal graphs do not summarize the effects of unobserved root causes, such as unmeasured environmental changes or variants, that are needed to identify all root causal genes. We therefore define the Root Causal Strength (RCS) score...”
  
  Reviewer 3:
  
  (1) Several assumptions of the method are problematic. The most concerning is that the observational expression changes are all causally upstream of disease. There is work using Mendelian randomization (MR) showing that the opposite is more likely to be true: most differential expression in disease cohorts is a consequence rather than a cause of disease (Porcu et al., 2021). Indeed, the oxidative stress of AMD has known cellular responses including the upregulation of p53. The authors need to think carefully about how this impacts their framework. Can the theory say anything in this light? Simulations could also be designed to address robustness.
  
  Strictly speaking, we believe that differential expression in disease most likely has a cyclic causal structure: gene expression causes a diagnosis or symptom severity, and a diagnosis or symptom severity leads to treatments and other behavioral changes that perturb gene expression. For example, revTWMR in Porcu et al. (2021) uses trans-variants that are less likely to directly cause gene expression and instead directly cause a phenotype. However, TWMR as proposed in Porcu et al. (2019) instead uses cis-eQTLs and finds many putative causal relations from gene expression to phenotype. Thus, both causal directions likely hold.
  
  RCSP uses disease-relevant tissue believed to harbor gene expression levels that cause disease. However, RCSP theoretically cannot handle the scenario where Y is a non-sink vertex and is a parent of a gene expression level because modern Perturb-seq datasets usually do not perturb or measure Y. We therefore empirically investigated the degree of error by running experiments, where we set Y to a non-sink vertex, so that it can cause gene expression. We find that the performance of RCSP degrades considerably for gene expression levels that contain Y as a parent. Thus RCSP is sensitive to violations of the sink target assumption:
  
  “We finally considered the scenario where Y is a non-sink (or non-terminal) vertex. If Y is a parent of a gene expression level, then we cannot properly condition on the parents because modern Perturb-seq datasets usually do not intervene on Y or measure Y. We therefore empirically investigated the degradation in performance resulting from a non-sink target Y, in particular for gene expression levels where Y is a parent. We again simulated 200 samples from bulk RNA-seq and each condition of Perturb-seq with a DAG over 1000 vertices, an expected neighborhood size of 2 and a non-sink target Y. We then removed the outgoing edges from Y and resampled the DAG with a sink target. We compare the results of RCSP for both DAGs in gene expression levels where Y is a parent. We plot the results in Supplementary Figure 5. As expected, we observe a degradation in performance when Y is not terminal, where the mean RMSE increased from 0.045 to 0.342. We conclude that RCSP is sensitive to violations of the sink target assumption.”
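
  The graph manipulation and the error metric in the quoted passage can be sketched as follows. This is an illustration only: the adjacency convention (row i, column j set to 1 means an edge from i to j) and all function names are assumptions, and the sketch covers the edge removal and the RMSE, not the RCSP estimation itself.

```python
import math


def children_of(adj, y):
    """Indices of the vertices that have y as a parent."""
    return [j for j, edge in enumerate(adj[y]) if edge and j != y]


def make_sink(adj, y):
    """Return a copy of adj in which y has no outgoing edges, so y is a sink."""
    out = [row[:] for row in adj]
    out[y] = [0] * len(adj)
    return out


def rmse(estimates, truth):
    """Root-mean-square error between estimated and true values."""
    if len(estimates) != len(truth):
        raise ValueError("length mismatch")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))


if __name__ == "__main__":
    # Toy graph: gene 0 -> Y (index 1) -> gene 2, so Y is not a sink
    adj = [[0, 1, 0],
           [0, 0, 1],
           [0, 0, 0]]
    print("genes with Y as a parent:", children_of(adj, 1))
    print("sink version:", make_sink(adj, 1))
```

  The comparison in the quoted passage is restricted to the vertices returned by the first helper, since those are the gene expression levels whose parent set cannot be fully conditioned on.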
  
  (2) A closely related issue is the DAG assumption of no cycles. This assumption is brought to bear because it is required for much classical causal machinery, but is unrealistic in biology where feedback is pervasive. How robust is RCSP to (mild) violations of this assumption? Simulations would be a straightforward way to address this.
  
  We agree that real causal graphs in biology often contain cycles. We now include additional experimental results with cyclic directed graphs in the Supplementary Materials. RCSP outperformed the other algorithms even in this setting, but we caution the reader that the theoretical interpretation of the RCS score may not coincide with a root causal effect when cycles exist:
  
  “We also evaluated the algorithms on directed graphs with cycles. We generated a linear SEM over p + 1 = 1000 variables in . We sampled the coefficient matrix β from a Bernoulli (1/(p − 1)) distribution but did not restrict the non-zero coefficients to the upper triangular portion of the matrix. We then proceeded to permute the variable ordering and weight each entry as in the Methods for the DAG. We repeated this procedure 30 times and report the results in Supplementary Figure 3.
  
  RCSP again outperformed all other algorithms even in the cyclic case. The results suggest that conditioning on the surrogate ancestors also estimates the RCS well even in the cyclic case. However, we caution that an error term E<sub>i</sub> can affect the ancestors of , when cycles exist. As a result, the RCS may not isolate the causal effect of the error term and thus not truly coincide with the notion of a root causal effect in cyclic causal graphs.”
  
  (3) The authors spend considerable effort arguing that technical sampling noise in X can effectively be ignored (at least in bulk). While the mathematical arguments here are reasonable, they miss the bigger picture point that the measured gene expression X can only ever be a noisy/biased proxy for the expression changes that caused disease: 1) Those events happened before the disease manifested, possibly early in development for some conditions like neurodevelopmental disorders. 2) bulk RNA-seq gives only an average across cell-types, whereas specific cell-types are likely “causal.” 3) only a small sample, at a single time point, is typically available. Expression in other parts of the tissue and at different times will be variable.
  
  We agree that many other sources of error exist. The causal model of RNA-expression in Methods corresponds to a single snapshot in time for each sample. We now clarify this in the Methods as follows:
  
  “We represent a snapshot of a biological causal process using an SEM over obeying Equation (3).”
  
  We thus only detect the root causal genes in a single snapshot in time for each sample in bulk RNA-seq. If we cannot detect the root causal effect in a gene due to the signal washing out over time as in (1), or if the root causal effects in different cell types cancel each other out to exactly zero in bulk as in (2), then we cannot detect those root causal genes even with an infinite sample size.
  
  (4) While there are connections to the omnigenic model, the latter is somewhat misrepresented. The authors refer to the “core genes” of the omnigenic model as being at the end (longitudinal) of pathogenesis. The omnigenic model makes no statements about temporal ordering: in causal inference terminology the core genes are simply the direct causes of disease.
  
  We now clarify that we use the word pathogenesis to mean the causal cascade from root causes to the diagnosis. In this case, the direct causes of the diagnosis correspond to the end of pathogenesis, while the root causes correspond to the beginning. For example, if , with Y a diagnosis, then X<sub>1</sub> is a root causal gene while X<sub>2</sub> is a core (direct causal) gene. We now clarify this in the Introduction:
  
  “Root causes of disease correspond to the most upstream causes of a diagnosis with strong causal effects on the diagnosis. Pathogenesis refers to the causal cascade from root causes to the diagnosis. Genetic and non-genetic factors may act as root causes and affect gene expression as an intermediate step during pathogenesis. We introduce root causal gene expression levels – or root causal genes for short – that correspond to the initial changes to gene expression induced by genetic and non-genetic root causes that have large causal effects on a downstream diagnosis (Figure 1 (a)). Root causal genes differ from core genes that directly cause the diagnosis and thus lie at the end, rather than at the beginning, of pathogenesis (Boyle et al., 2017).”
  
  (5) A key observation underlying the omnigenic model is that genetic heritability is spread throughout the genome (and somewhat concentrated near genes expressed in disease relevant cell types). This implies that (almost) all expressed genes, or their associated (e)SNPs, are “root causes”.
  
  We now clarify that genetic heritability can be spread throughout the genome in the omnigenic root causal model as well in the Discussion:
  
  “Further, each causal genetic variant tends to have only a small effect on disease risk in complex disease because the variant can directly cause Y or directly cause any causal gene including those with small root causal effects on Y ; thus, all error terms that cause Y can model genetic effects on Y. However, the root causal model further elaborates that genetic and non-genetic factors often combine to produce a few root causal genes with large root causal effects, where non-genetic factors typically account for the majority of the large effects in complex disease. Many variants may therefore cause many genes in diseases with only a few root causal genes.”
  
  We finally add Figure 5 into the Discussion as a concrete example illustrating the omnigenic root causal model.
  
  (6) The claim that root causal genes would be good therapeutic targets feels unfounded. If these are highly variable across individuals then the choice of treatment becomes challenging. By contrast the causal effects may converge on core genes before impacting disease, so that intervening on the core genes might be preferable. The jury is still out on these questions, so the claim should at least be made hypothetical.
  
  We clarify that we do not claim that root causal genes are better treatment targets than core genes in terms of magnitudes of causal effects on the phenotype. For example, in the common cold with a virus as the root cause, giving a patient an antiviral will eliminate fever and congestion, but so will giving a decongestant and an antipyretic. We only claim that treating root causal genes can eliminate disease near its pathogenic onset, just like giving an antiviral can eliminate the viral load and stop pathogenesis. We write the following in the Introduction:
  
  “Treating root causal genes can modify disease pathogenesis in its entirety, whereas targeting other causes may only provide symptomatic relief... Identifying root causal genes is therefore critical for developing treatments that eliminate disease near its pathogenic onset.”
  
  We also further clarify in the Discussion that root causal genes account for deleterious causal effects not captured by the diagnosis Y:
  
  “We finally emphasize that the root causal model accounts for all deleterious effects of the root causal genes, whereas the core gene model only captures the deleterious effects captured by the diagnosis Y. For example, the disease of diabetes causes retinopathy, but retinopathy is not a part of the diagnostic criteria of diabetes. As a result, the gene expression levels that cause retinopathy but not the diagnosis of diabetes are not core genes, even though they are affected by the root causal genes.”
  
  We do agree that root causal genes may differ substantially between patients, although it is unclear if the heterogeneity is too great to develop treatments.
  
  (7) The closest thing to a gold standard I believe we have for “root causal genes” is integration of molecular QTLs and GWAS, specifically coloc/MR. Here the “E” of RCSP are explicitly represented as SNPs. I don’t know if there is good data for AMD but there certainly is for MS. The authors should assess the overlap with their results. Another orthogonal avenue would be to check whether the root causal genes change early in disease progression.
  
  Colocalization and Mendelian randomization unfortunately cannot identify root causal effects because they all attempt, either heuristically (colocalization) or rigorously (MR), to identify variants that cause each gene expression level rather than variants that directly cause each gene expression level and thus make up the error terms. We therefore need new methods that can identify direct causal variants in order to assess overlap.
  
  We checked whether root causal genes change early in disease progression using knowledge of pathogenesis. In particular, oxidative stress induces pathogenesis in AMD, and RCSP identified root causal genes involved in oxidative stress in AMD:
  
  “The pathogenesis of AMD involves the loss of RPE cells. The RPE absorbs light in the back of the retina, but the combination of light and oxygen induces oxidative stress, and then a cascade of events such as immune cell activation, cellular senescence, drusen accumulation, neovascularization and ultimately fibrosis (Barouch et al., 2007). We therefore expect the root causal genes of AMD to include genes involved in oxidative stress during early pathogenesis. The gene MIPEP with the highest D-RCS score in Figure 3 (d) indeed promotes the maturation of oxidative phosphorylation-related proteins (Shi et al., 2011). The second gene SLC7A5 is a solute carrier that activates mTORC1 whose hyperactivation increases oxidative stress via lipid peroxidation (Nachef et al., 2021; Go et al., 2020). The gene HEATR1 is involved in ribosome biogenesis that is downregulated by oxidative stress (Turi et al., 2018). The top genes discovered by RCSP thus identify pathways known to be involved in oxidative stress.”
  
  Similarly, T cell infiltration across the blood brain barrier initiates pathogenesis in MS, and RCSP identified root causal genes involved in this infiltration:
  
  “Genes with the highest D-RCS scores included MNT, CERCAM and HERPUD2 (Figure 4 (d)). MNT is a MYC antagonist that modulates the proliferative and pro-survival signals of T cells after engagement of the T cell receptor (Gnanaprakasam et al., 2017). Similarly, CERCAM is an adhesion molecule expressed at high levels in microvessels of the brain that increases leukocyte transmigration across the blood brain barrier (Starzyk et al., 2000). HERPUD2 is involved in the endoplasmic-reticulum associated degradation of unfolded proteins (Kokame et al., 2000). Genes with the highest D-RCS scores thus serve key roles in known pathogenic pathways of MS.”
  
  (8) The available Perturb-seq datasets have limitations beyond the control of the authors. 1) The set of genes that are perturbed. The authors address this by simply sub-setting their analysis to the intersection of genes represented in the perturbation and observational data. However, this may mean that a true ancestor of X is not modeled/perturbed, limiting the formal claims that can be made. Additionally, some proportion of genes that are nominally perturbed show little to no actual perturbation effect (for example, due to poor guide RNA choice) which will also lead to missing ancestors.
  
  We now clarify that Perturb-seq can only identify root causal genes among the adequately perturbed set of genes in the Discussion:
  
  “Modern genome-wide Perturb-seq datasets also only adequately perturb and measure a few thousand, rather than all, gene expression levels. RCSP can only identify root causal genes within this perturbed and measured subset.”
  
  (9) The authors provide no mechanism for statistical inference/significance for their results at either the individual or aggregated level. While I am a proponent of using effect sizes more than p-values, there is still value in understanding how much signal is present relative to a reasonable null.
  
  We now explain that RCSP does not perform statistical inference in Methods because it is not clear how to define the appropriate cut-off for the RCS score under the null distribution:
  
  “We focus on statistical estimation rather than statistical inference because Φ<sub>i</sub> > 0 when E<sub>i</sub> causes Y under mild conditions, so we reject the null hypothesis that Φ<sub>i</sub> = 0 for many genes if many gene expression levels cause Y. However, just like a machine typically breaks down due to only one or a few root causal problems, we hypothesize that only a few genes have large RCS scores Φ<sub>i</sub> ≫ 0 even in complex disease.”
  
  (10) I agree with the authors that age coming out as a “root cause” is potentially encouraging. However, it is also quite different in nature to expression, including being “measured” exactly. Will RCSP be biased towards variables that have lower measurement error?
  
  We tested the above hypothesis by plotting sequencing depth against the D-RCS scores of each gene. We observed a small negative correlation between sequencing depth and D-RCS scores, indicating the D-RCS scores are slightly biased upwards with low sequencing depth. However, genes with the largest D-RCS scores exhibited a wide variety of sequencing depths in both MS and AMD, suggesting that sequencing depth has minimal effect on the largest D-RCS scores. We now explain these results for AMD in the Supplementary Materials:
  
  “Theorem 1 states that RCS scores may exhibit bias with insufficient sequencing depth. The genes with large D-RCS scores may therefore simply have low sequencing depths. To test this hypothesis, we plotted sequencing depth against D-RCS scores. Consistent with Theorem 1, we observed a small negative correlation between D-RCS and sequencing depth (ρ = −0.16, p = 2.04E-13), and D-RCS scores exhibited greater variability at the lowest sequencing depths (Supplementary Figure 8). However, genes with the largest D-RCS scores had mean sequencing depths interspersed between 20 and 3000. We conclude that genes with the largest D-RCS scores had a variety of sequencing depths ranging from low to high.”
  
  We also report the results for MS:
  
  “We plot sequencing depth against the D-RCS scores of each gene similar to the AMD dataset. We again observed a small negative correlation (ρ = −0.136, p < 2.2E-16), indicating that genes with low sequencing depths had slightly higher D-RCS scores on average (Supplementary Figure 12). However, genes with the largest D-RCS scores again had a variety of sequencing depths. We conclude that sequencing depth has minimal correlation with the largest D-RCS scores.”
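
The depth-versus-score check described above can be sketched as a rank correlation. The response reports ρ without naming the estimator, so the use of Spearman's coefficient here is an assumption, as are the function names and the toy numbers; none of this reproduces the authors' data or code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values for illustration only, not taken from the study.
    depth = [20, 150, 400, 900, 3000]
    d_rcs = [0.9, 0.2, 0.5, 0.1, 0.3]
    print("rank correlation:", spearman(depth, d_rcs))
```

A weak negative coefficient on the full gene set would match the pattern the authors describe. Whether the top-scoring genes span a wide range of depths has to be inspected separately, as they do in the supplementary figures.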
  
  (11) Finally, it’s a stretch to call K562 cells “lymphoblasts.” They are more myeloid than lymphoid.
  
  We now clarify that K562 cells are undifferentiated blast cells that can be induced to differentiate into lymphoblasts in Results:
  
  “We next ran RCSP on 137 samples collected from CD4+ T cells of multiple sclerosis (MS; GSE137143) as well as Perturb-seq data of 1,989,578 undifferentiated blast cells that can be induced to differentiate into lymphoblasts, or the precursors of T cells and other lymphocytes.”
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.01.13.574491v5
www.biorxiv.org www.biorxiv.org

Homeostatic regulation of REM sleep by the preoptic area of the hypothalamus

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.
  
  We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.
  
  Points that could be addressed or discussed:
  
  (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.
  
  We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance for entering REMs. The time course for how long it takes till the POA excitability resettles to its baseline consequently sets a permissive time window for increased amounts of REMs to recover its lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.
  
  (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?
  
  We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2 →TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or brain regions may also contribute to the REMs rebound, we have revised the Results section.
  
  We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.
  
  We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, and progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build up of REM pressure (Fig. S2F).
  
  (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep-inhibiting histamine neurons (Chung et al). And some of these POA cells will also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?
  
  Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019). Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2→TMN neurons in our manuscript.
  
  Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022) indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs → REMs transitions indicating these neurons also are active during NREMs.
  
  Using our REMs restriction protocol, we selectively restricted REMs, leading to the subsequent rebound of REMs without affecting NREMs. Consequently, we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999). However, during total sleep deprivation when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.
  
  (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?
  
  POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.
  
  Author response image 1.
  
  Location of virus expression (A) and optic fiber placement (B) within subregions of POA.
  
  (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?
  
  Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2→TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin, these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while these molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2→TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.
  
  Reviewer #2 (Public Review):
  
  Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell-type studied becomes active shortly (≈40sec) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.
  
  We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed your concerns in the revised manuscript, and our point by point response is provided below.
  
  The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.
  
  We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.
  
  Author response image 2.
  
  Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.
  
  REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation holds at other circadian times of the day.
  
  Author response image 3.
  
  Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.
  
  REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.
  
  The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).
  
  The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?
  
  The effect size for REM sleep deprivation is now added in the text.
  
  It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.
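
The distinction between the two analysis windows can be made concrete with a small Python sketch. It assumes a hypnogram stored as one state label per fixed-length epoch, with time measured in hours from the start of restriction; the function name `percent_in_state`, the 30-minute epoch length, and the toy labels are all hypothetical and are not the authors' scoring pipeline or data.

```python
def percent_in_state(states, epoch_s, start_h, end_h, target="REM"):
    """Percentage of epochs labelled `target` between start_h and end_h.

    states  : list of state labels, one per epoch
    epoch_s : epoch length in seconds
    start_h, end_h : window bounds in hours from the start of the recording
    """
    first = int(round(start_h * 3600 / epoch_s))
    last = int(round(end_h * 3600 / epoch_s))
    window = states[first:last]
    if not window:
        raise ValueError("empty analysis window")
    return 100.0 * sum(s == target for s in window) / len(window)


if __name__ == "__main__":
    # Toy 6 h hypnogram with 30 min epochs; REM occurs only in the second half.
    toy = ["NREM"] * 12
    toy[8] = "REM"
    toy[10] = "REM"
    print("whole 6 h:", percent_in_state(toy, 1800, 0, 6))
    print("first 3 h:", percent_in_state(toy, 1800, 0, 3))
    print("last 3 h:", percent_in_state(toy, 1800, 3, 6))
```

With REM concentrated late in the restriction period, the same recording gives a lower percentage over the full interval than over its last 3 hr, which is the pattern the response describes for Fig. S4D versus Fig. 4.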
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.
  
  We apologize for omitting this important paper. In the revised manuscript, we added this citation.
  
  (2) Materials and Methods.
  
  Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.
  
  GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).
  
  The construction of the HDC-ires-CRE line is described in Zecharia AY et al 2012 J Neurosci (PMID: 22993424).
  
  We have now added these important citations in the revised manuscript.
  
  (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.
  
  We have now added these citations in the revised manuscript.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The authors rely heavily on their conclusions by using an optogenetic tool that inhibits the activity of GAD2+ neurons, however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a likewise construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre depedendent expression combined with electrophysiological recordings).
  
  This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and on our prior results, which recapitulate findings obtained with inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.
  
  Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.
  
  We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A.
  
  Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.
  
  Author response image 4.
  
  ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice. In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
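
  The distinction between recording-level and mouse-level statistics, and the Holm-Bonferroni step-down rule named in the legend, can be illustrated with a minimal sketch. It is not the authors' analysis code and it is not a mixed model: it simply collapses recordings to one mean per mouse and then applies the Holm-Bonferroni procedure to a list of p-values. The function names, mouse identifiers, and all numbers are hypothetical.

```python
from statistics import mean


def average_per_mouse(recordings):
    """Collapse recording-level values to one mean value per mouse.

    `recordings` is a list of (mouse_id, value) pairs (hypothetical layout).
    """
    by_mouse = {}
    for mouse_id, value in recordings:
        by_mouse.setdefault(mouse_id, []).append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}


def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # step-down rule: stop at the first non-rejection
    return rejected


# Made-up values, for illustration only.
recordings = [("m1", 0.2), ("m1", 0.4), ("m2", 0.1), ("m2", 0.3), ("m2", 0.5)]
per_mouse = average_per_mouse(recordings)
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

  Averaging per mouse makes the number of mice, rather than the number of recordings, the sample size. A mixed model with mouse as a random factor, as the reviewer suggests, would instead keep the recording-level data.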
  
  There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.
  
  To properly control for laser and virus effects, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without the optogenetic construct, SwiChR++), and the data are provided in Fig. 2C.
  
  Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment on the text about what is happening here?
  
  We quantified the amount of wakefulness during the first hour of REMs rebound and found that indeed there is no significant difference in wakefulness between REM restriction and baseline control conditions (Fig. S4H). Therefore, while the representative image in Fig. 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.
  
  Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).
  
  Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.
  
  REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.
  
  Thank you for this suggestion. We have added the description in the main text.
  
  Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).
  
  We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.
  
  References
  
  Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.
  
  Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.
  
  Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.
  
  Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.
  
  Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.
  
  Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.
  
  Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.
  
  Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.
  
  Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.
  
  Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.
  
  Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.
  
  Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.
  
  Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.
  
  Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.
  
  Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.
  
  Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.
  
  McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.
  
  Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.
  
  Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.
  
  Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.
  
  Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.
  
  Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.
  
  Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.
  
  Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB , S0960-9822(23)01585–3. doi: 10.1016/j.cub.2023.11.035.
  
  Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.
  
  Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.
  
  Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.
  
  Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.
  
  Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). RESPONSES TO AUDITORY STIMULATION, SLEEP LOSS AND THE EEG STAGES OF SLEEP. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.
  
  Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.
  
  Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.22.554341v2
www.biorxiv.org

New submission 20/10/2023, 09:31:13

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”. We also thank them for a careful reading and useful comments to improve the manuscript. We have built on these comments to provide an improved version of the manuscript, and address them point by point below.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This paper makes important contributions to the structural analysis of the DNA replication-linked nucleosome assembly machine termed Chromatin Assembly Factor-1 (CAF-1). The authors focus on the interplay of domains that bind DNA, histones, and replication clamp protein PCNA.
  
  Strengths:
  
  The authors analyze soluble complexes containing full-length versions of all three fission yeast CAF-1 subunits, an important accomplishment given that many previous structural and biophysical studies have focused on truncated complexes. New data here supports previous experiments indicating that the KER domain is a long alpha helix that binds DNA. Via NMR, the authors discover structural changes at the histone binding site, defined here with high resolution. Most strikingly, the experiments here show that for the S. pombe CAF-1 complex, the WHD domain at the C-terminus of the large subunit lacks DNA binding activity observed in the human and budding yeast homologs, indicating a surprising divergence in the evolution of this complex. Together, these are important contributions to the understanding of how the CAF-1 complex works.
  
  Weaknesses:
  
  There are some aspects of the experimentation that are incompletely described: In the SEC data (Fig. S1C) it appears that Pcf1 in the absence of other proteins forms three major peaks. Two are labeled as "1a" (eluting at ~8 mL) and "1b" (~10-11 mL). It appears that Pcf1 alone or in complex with either or both of the other two subunits forms two different high molecular weight complexes (e.g. 4a/4b, 5a/5b, 6a/6b). There is also a third peak in the analysis of Pcf1 alone, which isn't named here, eluting at ~14 mL, overlapping the peaks labeled 2a, 4c, and 5c. The text describing these different macromolecular complexes seems incomplete (p. 3, lines 32-33): "When isolated, both Pcf2 and Pcf3 are monomeric while Pcf1 forms large soluble oligomers". Which of the three Pcf1-alone peaks are oligomers, and how do we know? What is the third peak? The gel analysis across these chromatograms should be shown.
  
  We thank the reviewer for his/her careful reading of the manuscript. Indeed, we plotted two curves in Figure S1C in a color that does not match the legend, leading to confusion. Curve 1, Pcf1 alone, depicted in red, should appear in pink as indicated in the legend and in the SDS-PAGE analysis below. Curve 1 exhibits two peaks, labeled as 1a and 1b. With an elution volume of 8.5mL close to the dead volume of the column, peak 1a corresponds to soluble oligomers, while peak 1b (10.4mL) likely corresponds to monomeric Pcf1. Curve 5 (Pcf1 + Pcf2 mixture) was in pink instead of purple as indicated in the legend. This curve consists of three distinct peaks (5a, 5b, and 5c). The SDS-PAGE analysis revealed the presence of oligomers of Pcf1-Pcf2 (5a, 8.3mL), the Pcf1-Pcf2 complex (5b, 9.8mL), and Pcf2 alone (5c, 13.6 mL).
  
  The color has now been corrected in the revised manuscript.
  
  More importantly, was a particular SEC peak of the three-subunit CAF-1 complex (i.e. 4a or 4b) characterized in the further experimentation, or were the data obtained from the input material prior to the separation of the different peaks? If the latter, how might this have affected the results? Do the forms inter-convert spontaneously?
  
  We conducted all structural analyses and DNA/PCNA interaction experiments (Figures 1-4, S1-S4) with freshly SEC-purified samples corresponding to the 4b peak (9.7 mL). Aliquots were flash-frozen with 50% glycerol for in vitro histone assembly assays (Figure 5).
  
  Given the strong structural prediction about the roles of residues L359 and F380 (Fig. 2f), these should be mutated to determine effects on histone binding.
  
  We are pleased that our structural predictions are considered strong. We agree that investigating the role of the L359 and F380 residues will be critical to further refine the binding interface between histone H3-H4 and CAF-1. An in vitro and in vivo analysis of such mutated forms, alongside the current Pcf1-ED mutant characterized in this article and additional potential mutated forms, has the potential to provide a better understanding of the dynamics of histone deposition by CAF-1. However, these additional approaches would require a further stage of work toward deciphering this enigmatic dynamic.
  
  Could it be that the apparent lack of histone deposition by the delta-WHD mutant complex occurs because this mutant complex is unstable when added to the Xenopus extract?
  
  We cannot formally exclude this possibility, and this could potentially apply to all mutated forms tested. However, in the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. Nevertheless, we feel reassured by the fact that the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, which reflects defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe and was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002). This further supports the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.
  
  Reviewer #1 (Recommendations For The Authors):
  
  p. 4: "An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS), consistent with a 1:1:1 stoichiometry (Figure S1e). These data are in agreement with a globular complex with a significant flexibility (Figure S1f)." There needs to be more description of the precision of the molecular weight measurement, and what aspects of these data indicate the flexibility.
  
  The molecular weight was estimated using the correlation volume (Vc) defined by (Rambo & Tainer, Nature 2013, 496, 477-481). The estimated error with this method is around 10%. We added this information together with supporting arguments for the existence of flexibility: “An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS). Assuming an accuracy of around 10% with this method (Rambo and Tainer 2013), this value is consistent with a 1:1:1 stoichiometry for the CAF-1 complex (calculated MW 167kDa) (Figure S1e). In addition, the position of the maximum for the dimensionless Kratky plot was slightly shifted to higher values in the y and x axis compared to the position of the expected maximum of the curve for a fully globular protein (Figure S1f).
  
  This shows that the complex was globular with a significant flexibility.”
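
  The consistency argument in the quoted passage is a short piece of arithmetic, sketched below under stated assumptions. The 179 kDa, 167 kDa, and roughly 10% figures come from the response above. The 334 kDa value for a hypothetical 2:2:2 assembly is an assumption (twice the calculated 1:1:1 mass), included only for contrast.

```python
def within_accuracy(measured, candidate, relative_accuracy):
    """True if `candidate` lies within the measurement's relative accuracy."""
    return abs(measured - candidate) <= relative_accuracy * measured


measured_mw_kda = 179.0     # SAXS estimate quoted in the response
accuracy = 0.10             # approximate accuracy of the Vc-based estimate
mw_1_1_1_kda = 167.0        # calculated mass for a 1:1:1 complex
mw_2_2_2_kda = 2 * 167.0    # hypothetical 2:2:2 assembly, for contrast only

relative_gap = abs(measured_mw_kda - mw_1_1_1_kda) / measured_mw_kda
fits_1_1_1 = within_accuracy(measured_mw_kda, mw_1_1_1_kda, accuracy)
fits_2_2_2 = within_accuracy(measured_mw_kda, mw_2_2_2_kda, accuracy)
```

  Under these assumptions the gap between measured and calculated mass is about 7% of the measured value, inside the stated accuracy, whereas a doubled assembly would fall far outside it.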
  
  p. 6, lines 21-22: "In contrast, a large part of signals (338-396) did not vanish anymore upon addition of a histone complex preformed with two other histone chaperones known to compete with CAF-1 for histone binding..." Given the contrast made later with the 338-351 region which is insensitive to Asf1/Mcm2, it would be clearer for the reader to describe the Asf1/Mcm2-competed regions as residues 325-338 plus 352-396. Note that the numerical scale of residues doesn't line up perfectly with the data points in Figure 2d, and this should be fixed as well.
  
  We thank this reviewer for spotting this typographical error; we intended to write “In contrast, a large part of signals (348-396) did not vanish anymore…”. We modified the paragraph as suggested by the reviewer because we agree it is clearer for the reader: “In contrast, only a shorter fragment (338-347) vanished upon addition of Asf1-H3-H4-Mcm2(69-138), a histone complex preformed with two other histone chaperones, Asf1 and Mcm2, known to compete with CAF-1 for histone binding (Sauer et al. 2017) and whose histone binding modes are well established (Figure 2e) (Huang et al. 2015, Richet et al. 2015). This finding underscores a direct competition between residues (325-338) and (349-396) within the ED domain and Asf1/Mcm2 for histone binding.”
  
  The slight shift in the numerical scale Figure 2d was also corrected.
  
  p. 8. Lines 22-24: "EMSAs with a double-stranded 40bp DNA fragment confirmed the homogeneity of the bound complex. When increasing the SpCAF-1 concentration, additional mobility shifts suggest, a cooperative DNA binding (Figure 3a)." I agree that the migration of the population is further retarded upon the addition of more protein. However, doesn't this negate the first sentence? That is, if multiple CAF-1 complexes can bind each dsDNA molecule, can these complexes be described as homogeneous?
  
  We fully agree with the reviewer's comment and have removed the notion of homogeneity from the first sentence. “EMSAs with a double-stranded 40bp DNA fragment showed the formation of a bound complex.”
  
  Figure S2b Legend: "1H-15N HSQC spectra of Pcf1_ED (425-496)." The residue numbers should read 325-396.
  
  The typo has been corrected.
  
  Is the title for Figure 5 correct?: "Figure 5: Rescue using Y340 and W348 in the ED domain, the intact KER DNA binding domain and the C-terminal WHD of Pcf1 in SpCAF-1 mediated nucleosome assembly." I don't see that any point mutation rescue experiments are done here.
  
  The title of Figure 5 has been changed to “Efficient nucleosome assembly by SpCAF-1 in vitro requires interactions with H3-H4, DNA and PCNA, and the C-terminal WHD domain”.
  
  Figure S6C. I assume the top strain lacks the Pcf2-GFP but this should be stated explicitly.
  
  The following sentence has been added to the figure legend: “The top strain corresponds to a strain expressing wild-type and untagged Pcf2 as a negative control of GFP fluorescence”. Figure S6C has been modified accordingly to mention “Pcf2 (untagged)” and state this more explicitly.
  
  Regarding point #3 in the public review, a simple initial test of this idea would be to determine if similar amounts of wt and mutant complexes can be immunoprecipitated at the endpoint of the assembly reactions.
  
  In the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. However, the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, reflecting defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe, as it was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), further supporting the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.
  
  Foundational findings that should be cited: The role of PCNA in CAF-1 activity was first recognized by pioneering studies in the Stillman laboratory (PMID: 10052459, 11089978). The earliest recombinant studies of CAF-1 showed that the large subunit is the binding platform for the other two, showed that the KER and ED domains were required for histone deposition activity, and roughly mapped the p60-binding site on the large subunit (PMID: 7600578). Another early study roughly mapped the binding site for the third subunit and showed that biological effects of impairing the PCNA binding synergized with defects in the HIR pathway (PMID: 11756556), a genetic synergy first demonstrated in budding yeast (PMID: 9671489).
  
  We thank the reviewer for providing these important references that are now cited in the manuscript. PMID: 10052459 and 11089978 are cited page 2 line 18 and 19, PMID: 7600578 page 19 line 5 and PMID: 11756556 and 9671489 page 18 line 2.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors describe the structure-functional relationship of domains in S. pombe CAF-1, which promotes DNA replication-coupled deposition of histone H3-H4 dimer. The authors nicely showed that the ED domain with an intrinsically disordered structure binds to histone H3-H4, that the KER domain binds to DNA, and that, in addition to a PIP box, the KER domain also contributes to the PCNA binding. The ED and KER domains as well as the WHD domain are essential for nucleosome assembly in vitro. The ED, KER domains, and the PIP box are important for the maintenance of heterochromatin.
  
  Strengths:
  
  The combination of structural analysis using NMR and Alphafold2 modeling with biophysical and biochemical analysis provided strong evidence on the role of the different domain structures of the large subunit of SpCAF-1, spPCF-1 in the binding to histone H3-H4, DNA as well as PCNA. The conclusion was further supported by genetic analysis of the various pcf1 mutants. The large amounts of data provided in the paper support the authors' conclusion very well.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The paper by Ochsenbein describes the structural and functional analysis of the S. pombe CAF-1 complex, which is critical for DNA replication-coupled histone H3/H4 deposition. By using structural, biophysical, and biochemical analyses combined with genetic methods, the authors nicely showed that a large subunit of SpCAF1, SpPCF-1, consists of 5 structured domains with four connecting IDR domains. The ED domain with IDR nature binds to histone H3-H4 dimer with the conformational change of the other domain(s). SpCAF-1 binds to dsDNA by using the KER domain, but not the WHD domain. The experiments have been done with great care and a large amount of the data are highly reliable. Moreover, the results are clearly presented and convincingly written. The conclusion in the paper is very solid and will be useful for researchers who work in the field of chromosome biology.
  
  Major points:
  
  DNA binding of the KER mutant shown in Figures S3h and S3i, which was measured by the EMSA, looks similar to that of wild-type control in Figure S3f, which is different from the data in Figures 3b and 3e measured by the MST. The authors need a more precise description of the EMSA result of the KER mutant shown in Figures 3 and S3. The quantification of the EMSA result would resolve the point (should be provided).
  
  As proposed by this reviewer, we performed quantification of all EMSAs presented in Figure 3 and Figure S3. We quantified the signal of the free DNA band to calculate a percentage of bound DNA in each condition. All EMSA experiments were conducted in duplicate, allowing us to calculate an average value and standard deviation for each interaction. Representative curves and fitted values are reported below in the figure provided for the reviewer (panel a: data for the Pcf1_KER domain with two fitting models; panel b: the entire CAF-1 complexes and mutants; panel c: the isolated Pcf1_KER domains; panel d: all fitted values). Importantly, as illustrated in panel a, the complete model for a single interaction (complete KD model, dashed line curve) does not adequately fit the data. In contrast, a function incorporating cooperativity (Hill model) better accounts for the measured data (solid line curve). Consistently, we also used the Hill model to fit the binding curves measured with the MST technique. As now also specified in the text, the Hill model allows us to determine an EC50 value (concentration of protein resulting in the disappearance of half of the free DNA band intensity) and a Hill coefficient value (representing cooperativity during the interaction) for each curve.
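
  The quantification described above (percentage of bound DNA from the free DNA band, then mean and standard deviation over duplicates) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the data layout, the function name, and every band intensity are hypothetical, and normalizing each gel to its own no-protein lane is an assumed convention.

```python
from statistics import mean, stdev


def percent_bound(free_intensity, free_intensity_no_protein):
    """Percentage of DNA bound, inferred from the loss of the free DNA band."""
    return 100.0 * (1.0 - free_intensity / free_intensity_no_protein)


# Hypothetical free-band intensities (arbitrary units) for two replicate gels,
# keyed by protein concentration in micromolar; key 0.0 is the no-protein lane.
replicates = [
    {0.0: 1000.0, 1.0: 900.0, 3.0: 550.0, 10.0: 50.0},
    {0.0: 800.0, 1.0: 760.0, 3.0: 400.0, 10.0: 80.0},
]

summary = {}
for conc in (1.0, 3.0, 10.0):
    # Normalize each gel to its own no-protein lane before averaging.
    values = [percent_bound(gel[conc], gel[0.0]) for gel in replicates]
    summary[conc] = (mean(values), stdev(values))
```

  With only two replicates, the standard deviation is a coarse indicator of reproducibility rather than a precise error estimate.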
  
  We measure a value of 3.4 ± 0.4 μM for the EC50 of SpCAF-1 WT, which is higher than the value measured by MST (0.7 ± 0.1 μM). Higher values were also calculated for all mutants and isolated Pcf1_KER domains compared to MST. These discrepancies could arise from the fact that the DNA concentrations used in the two techniques were very different (20 nM for MST experiments and 1 μM for EMSA). Unlike the complete KD model, which includes the DNA concentration (considered here as the "receptor") in the calculation, the Hill model is fitted independently of this value. This model assumes that the “receptor” concentration is low compared to the KD. Here we calculate EC50 values on the same order of magnitude as the DNA concentration (low micromolar). The quantification obtained by EMSA is thus challenging to interpret. In contrast, values fitted from the MST measurements are more reliable since the assumption of a low “receptor” concentration holds.
  
  Therefore, although measurements of EC50 and Hill coefficient from EMSA are reproducible, they may be misleading when EC50 is used to quantify apparent affinity. Nevertheless, this quantitative analysis of EMSA, requested by the reviewer, has highlighted an interesting characteristic of the KER mutant that is consistent across both methods: even though the EMSAs pointed out by the reviewer (Figures S3h and S3i compared to the wild-type control in Figure 3d and Figure S3f) show similar EC50 values, the binding cooperativity is different. Binding of the KER mutants is no longer cooperative (Hill coefficient ~1), and this is observed for all KER curves (isolated Pcf1_KER domain and the entire SpCAF-1 complex) with both methods, EMSA and MST. We thus decided to emphasize this characteristic of the KER mutant in the text (page 9, lines 30-32): “Importantly, this mutant also shows a lower binding cooperativity for DNA binding, as estimated by the Hill coefficient value close to 1, compared to values around 3 for the WT and other mutants.”
  
  Since EMSA quantifications did not show a loss of “affinity” (as measured by the EC50 value) for the KER* mutants compared to the WT, contrary to MST measurements, and because the DNA concentration was close to the measured EC50, we consider that EC50 values calculated by EMSA do not represent a KD value. If we added this quantification, we would need to discuss this point in detail. Thus, for the sake of clarity, we prefer to present the EMSA measurements in the manuscript as illustrations and qualitative validations of the interaction, without including the quantification.
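
  Why the “receptor” (DNA) concentration matters can be illustrated with the textbook solution for simple 1:1 binding, which accounts for depletion of free protein. This is a sketch under stated assumptions and not the authors' fitting code. The authors report that a single-site model does not fit their cooperative data, so the example only shows the direction of the effect. The KD of 0.7 μM is a hypothetical value chosen for illustration; the two DNA concentrations are those quoted in the response.

```python
import math


def fraction_dna_bound(protein_total, dna_total, kd):
    """Fraction of DNA bound for 1:1 binding, including ligand depletion.

    Solves C**2 - (P + D + KD) * C + P * D = 0 for the complex concentration C
    and keeps the physically meaningful (smaller) root.
    """
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def apparent_half_saturation(dna_total, kd):
    """Total protein at which half of the DNA is bound: KD + D/2 for 1:1 binding."""
    return kd + dna_total / 2.0


kd_um = 0.7                   # hypothetical dissociation constant (micromolar)
dna_mst_um = 0.02             # 20 nM DNA, as in the MST experiments
dna_emsa_um = 1.0             # 1 micromolar DNA, as in the EMSA experiments

half_mst = apparent_half_saturation(dna_mst_um, kd_um)
half_emsa = apparent_half_saturation(dna_emsa_um, kd_um)
check_mst = fraction_dna_bound(half_mst, dna_mst_um, kd_um)
check_emsa = fraction_dna_bound(half_emsa, dna_emsa_um, kd_um)
```

  In this simplified picture the half-saturation point stays close to KD when the DNA is dilute, but shifts upward by half the DNA concentration when DNA and KD are of similar magnitude. This is the regime in which an EC50 should not be read as a KD.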
  
  Author response image 1.
  
  Quantitative analysis of interaction with DNA by EMSA. a: quantification of the amount of bound DNA for the Pcf1_KER domain (blue points with error bars). The fit with a KD model is shown as a dashed line, and the fit with a Hill model with a solid line. b: Examples of quantifications and fits (Hill model) for reconstituted SpCAF-1 WT and mutants. c: Examples of quantifications and fits (Hill model) for Pcf1_KER domains WT and mutant. d: EC50 values and Hill coefficients obtained for all EMSA experiments presented in Figure 3 and S3.
  
  As with the cooperative DNA binding of CAF-1, it is very important to show the stoichiometry of CAF-1 to the DNA or the site size. Given a long alpha-helix of the KER domain with biased charges, it is also interesting to show a model of how the dsDNA binds to the long helix with a cooperative binding property (this is not essential but would be helpful if the authors discuss it).
  
  We agree that having a molecular model for the binding of the KER helix to DNA would be especially interesting, but at this point, considering the accuracy of the tools currently at our disposal for predicting DNA-protein interactions, such a model would remain highly speculative.
  
  Figure 5 shows nucleosome assembly by SpCAF-1. SpCAF-1-PIP* mutant produced a product with faster mobility than the control at 2 h incubation. How much amounts of SpCAF-1 was added in the reaction seems to be critical. At least a few different concentrations of proteins should be tested.
  
  The slightly faster migration of the SpCAF-1-PIP* is not systematically reproduced, and we observed in several experiments that the band corresponding to supercoiled DNA migrated slightly above or below the one for the complementation by the SpCAF-1-WT (see Author response image 2 below). This indicates that after 2 hours of incubation the supercoiling achieved with the SpCAF-1-PIP* mutant is comparable to that achieved with the SpCAF-1-WT. To further document whether the WT and the PIP* mutant are similar or not, we monitored differences in their nucleosome assembly efficiency by testing their ability to produce supercoiled DNA over a shorter time, after 45 minutes of incubation. Under these conditions, we reproducibly detected supercoiled forms at earlier times with SpCAF-1-WT when compared to the SpCAF-1-PIP* (see figure 5 and Author response image 2). These observations indicate that mutation in the PIP motif of Pcf1 affects the rate of supercoiling in a distinct manner when compared to the other mutations that dramatically impair SpCAF-1 capacity to promote supercoiling.
  
  Author response image 2.
  
  Minor points:
  
  Page 8, line 26 or Table 1 legend: Please explain what "EC50" is.
  
  The definition of EC50, together with a reference paper for the Hill model, has been added in the text page 8 lines 23-26, “The curves were fitted with a Hill model (Tso et al. 2018) with an EC50 value of 0.7 ± 0.1 µM (effective concentration at which a 50% signal is observed) and a cooperativity (Hill coefficient, h) of 2.7 ± 0.2, in line with a cooperative DNA binding of SpCAF-1.”, in the Table 1 figure legend and in the method section (page 26).
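
  As an illustration of the Hill model quoted above, the following minimal sketch evaluates the standard Hill equation, assuming a response normalised between 0 and 1 with no baseline or amplitude terms. The function names are hypothetical and this is not the authors' fitting code; only the EC50 (0.7 µM) and Hill coefficient (2.7) are taken from the response.

```python
def hill_fraction_bound(conc, ec50, h):
    """Standard Hill equation: fraction of signal at concentration `conc`.

    Assumes a response normalised to the range 0..1 (no baseline or
    amplitude parameters); conc and ec50 must share the same unit.
    """
    return conc**h / (ec50**h + conc**h)


def transition_width(h):
    """Ratio of the concentrations giving 90% and 10% signal.

    For the Hill equation this ratio is 81 ** (1 / h), so a larger
    Hill coefficient means a sharper (more cooperative) transition.
    """
    return 81 ** (1.0 / h)


# Values quoted in the response (EC50 in µM, Hill coefficient h).
EC50_UM = 0.7
HILL_H = 2.7

for conc_um in (0.35, 0.7, 1.4):
    bound = hill_fraction_bound(conc_um, EC50_UM, HILL_H)
    print(f"{conc_um:.2f} µM -> fraction bound {bound:.3f}")
```

  By construction the signal is 50% at the EC50 whatever the value of h; the Hill coefficient only changes how steeply the curve rises around that point, which is why h above 1 is read as cooperative binding.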
  
  Page 13, lines 9, 11: "Xenopus" should be italicized.
  
  This has been corrected.
  
  Page 14, second half: In S. pombe, the pcf1 deletion mutant is not lethal. It is helpful to mention the phenotype of the deletion mutant a bit more when the authors described the genetic analysis of various pcf1 mutants.
  
  This point has been added on page 15, line 1.
  
  Figure 1d and Figure S2a: Captions and labels on the X and Y axes are overlapped or misplaced.
  
  This has been corrected.
  
  Figure 5: Please add a schematic figure of the assay to explain how one can check the nucleosome assembly by looking at the form I, supercoiled DNAs.
  
  A new panel has been added to Figure 5. This scheme depicts the supercoiling assay where supercoiled DNA (form I) is used as an indication of efficient nucleosome assembly. The figure legend has also been modified accordingly.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The study conducted by Ouasti et al. is an elegant investigation of fission yeast CAF-1, employing a diverse array of technologies to dissect its functions and their interdependence. These functions play a critical role in specifying interactions vital for DNA replication, heterochromatin maintenance, and DNA damage repair, and their dynamics involve multiple interactions. The authors have extensively utilized various in vitro and in vivo tools to validate their model and emphasize the dynamic nature of this complex.
  
  Strengths:
  
  Their work is supported by robust experimental data from multiple techniques, including NMR and SAXS, which validate their molecular model. They conducted in vitro interactions using EMSA and isothermal microcalorimetry, in vitro histone deposition using Xenopus high-speed egg extract, and systematically generated and tested various genetic mutants for functionality in in vivo assays. They successfully delineated domain-specific functions using in vitro assays and could validate their roles to a large extent using genetic mutants. One significant revelation from this study is the unfolded nature of the acidic domain, observed to fold when binding to histones. Additionally, the authors also elucidated the role of the long KER helix in mediating DNA binding and enhancing the association of CAF-1 with PCNA. The paper effectively addresses its primary objective and is strong.
  
  Weaknesses:
  
  A few relatively minor unresolved aspects persist, which, if clarified or experimentally addressed by the authors, could further bolster the study.
  
  The precise function of the WHD domain remains elusive. Its deletion does not result in DNA damage accumulation or defects in heterochromatin maintenance. This raises questions about the biological significance of this domain and whether it is dispensable. While in vitro assays revealed defects in chromatin assembly using this mutant (Figure 5), confirming these phenotypes through in vivo assays would provide additional assurance that the lack of function is not simply due to the in vitro system lacking PTMs or other regulatory factors.
  
  Our work demonstrates that the WHD domain is important for CAF-1 function during DNA replication. Indeed, the deletion of this domain leads to synthetic lethality when combined with mutation of the HIRA complex, as observed for a null pcf1 mutant, indicating a severe loss of function in the absence of the WHD domain. We propose that these genetic interactions, previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), are indicative of a defective histone deposition by CAF-1. Moreover, our work establishes that this domain is dispensable to prevent DNA damage accumulation and to maintain silencing at centromeric heterochromatin, indicating that the WHD domain specifies CAF-1 functions. Our work further demonstrates that, in contrast to the S. cerevisiae and human WHD domain, the S. pombe counterpart exhibits no DNA binding activity. We thus agree that the WHD domain may contribute to nucleosome assembly in vivo via PTMs or interactions with regulatory factors that may be lacking in in vitro systems. However, addressing these aspects deserves further investigation beyond the scope of this article.
  
  The observation of increased Pcf2-gfp foci in pcf1-ED cells, particularly in mono-nucleated (G2 phase) and bi-nucleated cells with septum marks (S-phase), might suggest the presence of replication stress. This could imply incomplete replication in specific regions, leading to the persistence of Caf1-ED-PCNA factories throughout the cell cycle. To further confirm this, detecting accumulated single-stranded DNA (ssDNA) regions outside of S-phase using RPA as an ssDNA marker could be informative.
  
  We cannot formally exclude that cells expressing the Pcf1-ED mutated form exhibit incomplete replication in specific regions, an aspect that would require careful investigations. However, the microscopy analysis (Fig. 6c and S6c) of this mutant showed no alteration in the cell morphology, including the absence of elongated cells compared to wild type, a hallmark of checkpoint activation caused by ssDNA (Enoch et al. Gene & Dev 1992). Therefore, investigating the consequences of the interplay between the binding of CAF-1 to PCNA and histones on the dynamic of DNA replication, is of particular interest but out of the scope of the current manuscript.
  
  Moreover, considering the authors' strong assertion of histone binding defects in ED through in vitro assays (Figure 2d and S2a), these claims could be further substantiated, especially considering that some degree of histone deposition might still persist in vivo in the ED mutant (Figure 7d, viable though growth defective double ED*+hip1D mutants). For example, the approach, akin to the one employed in Fig. 6a (FLAG-IPs of various Pcf1-FLAG-tagged mutants), could also enable a comparison of the association of different mutants with histones and PCNA, providing a more thorough validation of their findings.
  
  We have provided in the current manuscript data establishing how Pcf1 mutated forms interacted with PCNA (Fig. 6a, 6b). Regarding the interactions with histone H3-H4, the approach based on immunoprecipitation using various Pcf1-FLAG tagged mutants has been unsuccessful in our hands. Indeed, we were unable to obtain robust and reproducible interactions between Pcf1 or its various mutated forms and H3-H4. This is likely because Co-IP approaches do not probe for direct interactions. Indirect interactions between Pcf1 and H3-H4 are potentially bridged by additional factors, including the two other subunits of CAF-1, Pcf2 and Pcf3, or Asf1. Therefore, we are not in a position to address in vivo the direct interactions between Pcf1 and histone H3-H4.
  
  It would be valuable for the authors to speculate on the necessity of having disordered regions in CAF1. Specifically, exploring the overall distribution of these domains within disordered/unfolded structures could provide insightful perspectives. Additionally, it's intriguing to note that the significant disparities observed among mutants (ED*, PIP*, and KER*) in in vitro assays seem to become more generic in vivo, except for the indispensability of the WHD-domain. Could these disordered regions potentially play a crucial role in the phase separation of replication factories? Considering these questions could offer valuable insights into the underlying mechanisms at play.
  
  We agree that the potential mechanistic role of partial disorder in CAF-1 is particularly interesting. Disordered regions of human CAF-1 have been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. EMBO J. 2021). As suggested, this raises the question of how disordered domains of Pcf1 could promote phase separation for replication factories, if such a phenomenon happens in vivo. Moreover, numerous factors of the replisome also harbor disordered regions (Bedina, A. et al, 2013. Intrinsically Disordered Proteins in Replication Process. InTech. doi: 10.5772/51673), adding complexity in disentangling such questions experimentally. We have added these elements at the end of the discussion in the revised manuscript (page 20, lines 23-29). “Such plasticity and cross-talks provided by structurally disordered domains might be key for the multivalent CAF-1 functions. Human CAF-1 has been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. 2021). This raises the question of a potential role of the disordered domains of Pcf1, together with other replisome factors harbouring such disordered regions (Bedina 2013), in promoting phase separation of replication factories, if such a phenomenon happens in vivo. Further studies will be needed to tackle these questions.”
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.02.543505v4
www.biorxiv.org www.biorxiv.org

New submission 01/02/2024, 09:13:57

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Yamanaka et al.'s research investigates the impact of volatile organic compounds (VOCs), particularly diacetyl, on gene expression changes. By inhibiting histone deacetylase (HDAC) enzymes, the authors were able to observe changes in the transcriptome of various models, including cell lines, flies, and mice. The study reveals that HDAC inhibitors not only reduce cancer cell proliferation but also provide relief from neurodegeneration in fly Huntington's disease models. Although the findings are intriguing, the research falls short in providing a thorough analysis of the underlying mechanisms.
  
  HDAC inhibitors have been previously shown to induce gene expression changes as well as control cell division and demonstrated to work on disease models. The authors demonstrate diacetyl as a prominent HDAC inhibitor. Though the demonstration of diacetyl is novel, several similar molecules have been used before.
  
  In this manuscript we are not trying to understand the mechanisms by which HDAC inhibitors affect Huntington’s disease or cancer, since these have been studied in detail before and are outside the scope of this manuscript. Our focus is to demonstrate that volatile odorants commonly found in the environment can inhibit HDACs, alter gene expression, and have downstream physiological effects. To the best of our knowledge this unusual effect of odorants has not been systematically described before.
  
  Reviewer #2 (Public Review):
  
  The study by Sachiko et al. presents strong evidence that implicates environmental volatile odorants, particularly diacetyl, in an alternate role as inhibitors of HDAC proteins and gene expression. HDACs are histone deacetylases that generally have a repressive role in gene expression. In this paper the authors test the hypothesis that diacetyl, which is a compound emitted by rotting food sources, can diffuse through the blood-brain barrier and cell membranes to directly modulate HDAC activity to alter gene expression in a neural activity independent manner. This work is significant because the authors also link modulation of HDAC activity by diacetyl exposure to transcriptional and cellular responses to present it as a potential therapeutic agent for neurological diseases, such as inhibition of neuroblastoma and neurodegeneration.
  
  The authors first demonstrate that exposure to diacetyl, and some other odorants, inhibits deacetylation activity of specific HDAC proteins using in vitro assays, and increases acetylation of specific histones in cultured cells. Consistent with a role for diacetyl in HDAC inhibition, the authors find dose dependent alterations in gene expression in different fly and mouse tissues in response to diacetyl exposure. In flies they first identify a decrease in the expression of chemosensory receptors in olfactory neurons after exposure to diacetyl. Subsequently, they also observe large gene expression changes in the lungs, brain, and airways in mice. In flies, some of the gene expression changes in response to diacetyl are partially reversible and show an overlap with genes that alter expression in response to treatment with other HDAC inhibitors. Given the use of HDAC inhibitors as chemotherapy agents and treatment methods for cancers and neurodegenerative diseases, the authors hypothesize that diacetyl as an HDAC inhibitor can also serve similar functions. Indeed, they find that exposure of mice to diacetyl leads to a decrease in the brain expression of many genes normally upregulated in neuroblastomas, and selectively inhibited proliferation of cell lines which are derived from neuroblastomas. To test the potential for diacetyl in treatment of neurodegenerative diseases, the authors use the fly Huntington's disease model, utilizing the overexpression of Huntingtin protein with expanded poly-Q repeats in the photoreceptor rhabdomeres which leads to their degeneration. Exposing these flies to diacetyl significantly decreases the loss of rhabdomeres, suggesting a potential for diacetyl as a therapeutic agent for neurodegeneration.
  
  The findings are very intriguing and highlight environmental chemicals as potent agents which can alter gene expression independent of their action through chemosensory receptors.
  
  We thank the reviewer for the encouraging comments.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  1) The results section for figure 1 seems poorly written with errors in figure citations. Please rewrite this section.
  
  We thank the reviewer for pointing it out and have now rewritten the results section as well as made concomitant changes in the introduction to address this comment.
  
  2) Discussion could be more focused and could speculate mechanistic details of HDAC inhibitors in rescue of neurodegeneration.
  
  We have added information about the mechanistic role of HDAC inhibition in the rescue of neurodegeneration. “Exposure to diacetyl volatiles in the fly model of Huntington’s disease reduces cell degeneration, as has been previously observed with orally administered HDAC inhibitors like sodium butyrate and SAHA in this genetic model (27). Previous studies indicate that the inhibition of HDACs counters the acetyltransferase inhibitory activity of the polyglutamine domain of the human Htt protein which binds to p300, P/CAF and CBP (27).”
  
  A few minor comments are:
  
  1) Figure 1 is not properly cited in the text (Eg: line 137- It is not relevant to Fig 1B; it refers to 1C)
  
  We thank the referee for pointing out our error and have now corrected it.
  
  2) Some abbreviations were not expanded at first use, which made the statements difficult to understand (Eg: Line 51- VOC, 111- Or)
  
  We have now defined abbreviations the first time they appear in the manuscript.
  
  3) Line 98- What was the unit when you mention 0.01%?
  
  We have added (v/v) in the text to represent the standard volume / total volume. We have also described it in the method section.
  
  4) Line 138- there is no comparative study done with b-HB, but the authors have claimed it was comparable. If it’s from a previous study, a relative comparative statement could be given.
  
  We apologize for the confusion. We have added the IC50 values previously reported for b-hydroxy butyrate “IC50 for HDAC1: 5.3 mM and HDAC3 2.4 mM” which was shown in the reference #21.
  
  5) In lines 146-150, more details of what are the compounds and how similar they are to diacetyl could be added
  
  We have added representative structures and names for the chemicals tested in Figure 1C.
  
  6) In line 160, Why specifically they increase H3K14 acetylation?
  
  This observed increase in H3K9 (not H3K14) acetylation levels is identical to what has previously been reported for b-hydroxybutyrate. We have added a sentence pointing out this similarity “preferential acetylation of H3K9 was also observed in HEK293 cells with b-hydroxybutyrate (reference #21)”.
  
  7) In line 317, How HDAC inhibitors reverse the PolyQ disorder? What is its mechanism? Can at least discuss in the discussion section.
  
  Our assay is based on a previous publication using the Drosophila model (Ref #27), which evaluated the mechanisms in detail. We have now added a section in the Discussion describing the past findings. “Exposure to diacetyl volatiles in the fly model of Huntington’s disease reduces cell degeneration, as has been previously observed with orally administered HDAC inhibitors like sodium butyrate and SAHA in this genetic model (27). Previous studies indicate that the inhibition of HDACs counters the acetyltransferase inhibitory activity of the polyglutamine-domain of the Htt protein which binds to p300, P/CAF and CBP (27).”
  
  8) In figures, 1C and 1D, proper labeling of drug molecules is missing. Check 1D- Could have included Diacetyl for comparison, Where is the uninhibited control (negative)?
  
  We have added the name of the chemical compounds to Figure 1C and 1D. Each compound tested has a separate blank control, which forms the basis for calculation of the percentage inhibition. The negative control is therefore part of each column.
  
  Reviewer #2 (Recommendations For The Authors):
  
  As specific feedback for the authors, I have a few questions/recommendations about the main point of the paper:
  
  a. Throughout the manuscript, the authors demonstrate gene expression differences in different tissues in flies and mice in response to exposure to diacetyl using both transgenic reporter expression and RNAseq. The authors mention they were able to show that these gene expression changes are independent of neural activity, yet I am not sure which experiment specifically demonstrates this. How do the authors know that these changes in gene expression are due to diacetyl reaching the brain after passing the blood brain barrier but not due to changes in gene expression with olfactory circuit activity? I acknowledge that proving that the gene expression differences are independent of neural activity is difficult, but one question is whether inhibiting neural activity results in changes in the expression of overlapping genes in the same direction. Or for example, if one inhibits neural activity in Gr21a neurons, do they reversibly shut down expression of the receptor after a few days? Is this true for other ORs or specific to Gr21a and Gr63a?
  
  While it is difficult to completely rule out contributions of the olfactory effects in the brain, we also report differential gene expression in the lungs of mice where we do not expect olfactory circuit activity (Fig 3D-G). The overlap in DEGs is highly statistically significant between the organs suggesting at least some commonality in mechanism (Fig 5D). We recently evaluated a Drosophila tissue that does not express odorant receptors or connections, the ovaries, and also found substantial evidence of diacetyl-exposed modulation of genes. While the data are intended for a different publication, we found up to 123 up and 61 downregulated DEGs (FDR cutoff <0.05 and log2 fold change cutoff of 1 and -1). These data should also be viewed together with the in vitro HDAC inhibition data and the increased histone acetylation seen in cell lines.
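
  The DEG thresholds quoted above (FDR < 0.05, log2 fold change cutoffs of 1 and -1) can be sketched as a simple classification rule. This is a hypothetical illustration, not the authors' pipeline: the function name is invented, and whether the fold-change cutoffs are inclusive is an assumption.

```python
def classify_gene(log2_fc, fdr, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Label a gene as 'up', 'down', or 'ns' (not significant).

    Assumptions (not stated in the response): FDR must be strictly
    below the cutoff, and fold-change cutoffs are inclusive.
    """
    if fdr >= fdr_cutoff:
        return "ns"
    if log2_fc >= lfc_cutoff:
        return "up"
    if log2_fc <= -lfc_cutoff:
        return "down"
    return "ns"


# Made-up (log2 fold change, FDR) pairs, purely for illustration.
examples = [(2.3, 0.001), (-1.5, 0.02), (0.4, 0.001), (3.0, 0.2)]
for log2_fc, fdr in examples:
    print(log2_fc, fdr, classify_gene(log2_fc, fdr))
```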
  
  b. Is diacetyl detected by any chemosensory receptors in flies or mice? RNA profiles from these receptor mutants can be used to distinguish whether the gene expression changes are occurring due to neural activity or direct ability of diacetyl to alter HDAC activity. One relatively simple experiment would be to test whether differentially expressed genes in the orco mutant antennae overlap at all with antennal RNA profiles from diacetyl exposed flies.
  
  Diacetyl can be detected by multiple chemosensory receptors in flies and mice. In flies the Gr21a+Gr63a complex expressing neurons are inhibited by diacetyl as indicated, and Or9a, Or43b, Or59b, Or67a, and Or85b are activated receptors (Hallem, Cell, 2006). It would be an extremely resource- and time-consuming process to create and evaluate single mutants or combinations of mutants as suggested. In response to the previous point, we noted examples of tissues without olfactory receptors or olfactory circuits showing DEGs upon diacetyl exposure.
  
  As suggested by the referee, we compared DEGs from RNASeq data of Orco mutant antenna (N=2 replicates) generated for another project. There is very little overlap between antennal DEGs from Orco and the diacetyl-exposed flies (labelled in the chart as d4on_up and d4on_down). These data suggest that large-scale silencing of antennal neurons in Orco mutants does not alter expression of the same genes as altered by exposure to diacetyl.
  
  Author response image 1.
  
  c. The comparison of DEGs from individuals exposed to diacetyl versus the other two HDAC inhibitors shows some overlap. The overlap is greater for DEGs shared between the two HDAC inhibitors. Yet, there is still a substantial number of genes that are unique to diacetyl exposure. For example, if you compare SB to VA exposure, each condition has about 150-200 genes uniquely misexpressed for each condition with about 55 genes shared. However, the number of uniquely misexpressed genes is over 600 for diacetyl exposed individuals, with only 30 and 100 genes shared with either SB and VA respectively. I would have expected a higher overlap in DEGs if these compounds all inhibit similar HDACs. Do they inhibit different HDACs? Can this explain the significant number of uniquely misexpressed genes in each condition?
  
  It is difficult to judge the significance of overlap in DEG sets from evaluating numbers alone (the genome has around 13,000 genes) without the statistical analysis which we noted in the text. “A pairwise analysis using the Fisher’s exact test of each gene set revealed a statistically significant overlap of diacetyl-induced genes with SB-induced genes (p = 6 × 10⁻¹¹) and with VA-induced genes (p = 2 × 10⁻⁶⁵) (Figure 4F).”
  
  We have also further clarified in the text “This highly significant overlap among upregulated genes lends further support to our model that diacetyl vapors act as an HDAC inhibitor in vivo. As expected, each of the 3 treatments also modulated a substantial number of unique genes (Figure 4G,H), suggesting that differences in delivery format (oral vs vapor delivery), molecular structure and inhibition profile across the repertoire of HDACs may contribute to differences in gene regulation.”
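
  To make the overlap argument concrete, the sketch below computes a one-sided Fisher's exact test for the enrichment of shared genes between two DEG sets, which is equivalent to the upper tail of the hypergeometric distribution. The function name and the example set sizes are hypothetical; only the approximate genome size of 13,000 genes comes from the response, and the authors' exact test settings (sidedness, gene universe) are not stated.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """One-sided Fisher's exact test for gene-set overlap.

    Returns the probability of seeing at least `overlap` shared genes
    when a set of `size_b` genes is drawn at random from `universe`
    genes, of which `size_a` belong to the other set.
    """
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical set sizes, for illustration only; 13,000 approximates
# the number of genes in the genome as quoted in the response.
GENOME_SIZE = 13_000
p = overlap_p_value(GENOME_SIZE, size_a=700, size_b=200, overlap=30)
print(f"P(overlap >= 30 by chance) = {p:.3g}")
```

  With sets of a few hundred genes in a universe of about 13,000, only around ten shared genes are expected by chance, so an overlap that looks modest in a Venn diagram can still be highly significant.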
  
  d. The authors show changes in RNA profiles in response to diacetyl exposure in different tissues and suggest that these are due to changes in histone acetylation without direct comparison of genes that show up or down regulation with acetylation patterns. They do show in the beginning that diacetyl inhibits HDAC function in vitro and in cell culture. Yet it is critical that they also show a general increase in acetylation levels within tissues profiled for RNA. Additional experiments profiling chromatin and histone acetylation patterns in the tissues where RNA is profiled from would strengthen the argument of the paper.
  
  We agree with the referee’s suggestion and appreciate it. However, given the heterogeneity of the cell types and therefore histone marks in chromatin within the tissues that we analyzed, we estimate that it will require substantial effort to purify or enrich specific cell populations before performing ChIP-Seq. Such studies will examine correlations between up- and down-regulated genes and histone acetylation patterns in cells in future studies. This effort will require significant resources and time which we feel are outside the scope of this manuscript.
  
  e. The rhabdomere experiments might benefit from a negative control. Can the authors expose the flies to another volatile and show neurodegeneration is not affected?
  
  We exposed the negative control group to headspace odorants of paraffin oil which is a mixture of hydrocarbons.
  
  f. The same is true for the initial HDAC activity profiles from Figure 1. Can the authors show an HDAC activity that is not affected by diacetyl exposure?
  
  We exposed the negative control group to headspace odorants of paraffin oil which is a mixture of hydrocarbons. Diacetyl shows very little inhibition (Average inhibition = 7.69%; N=2) in purified human HDAC4 when tested at the 15mM concentration.
  
  g. One point that might require some explanation in the discussion is why diacetyl exposure only increases acetylation of certain histones but not others in Figure 2, especially given that many HDACs are inhibited by diacetyl in Figure 1.
  
  Please see response to comment #6, Reviewer 1.
  
  h. Figure S1C is missing descriptions of what different histogram colors signify.
  
  We apologize for the oversight and have now indicated it in the Figure legend.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.02.21.529339v2
www.biorxiv.org www.biorxiv.org

Multimodal mismatch responses in mouse auditory cortex

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are:
  
  (1) We have added additional analysis of running onsets in closed and open loop conditions for audiomotor (Figure 2H) and visuomotor (Figure 3H) coupling.
  
  (2) We have also added analysis of running speed and pupil dilation upon mismatch presentation (Figures S2A and S2B, S4A and S4B, and S5A and S5B).
  
  (3) We have expanded on the discussion of the nature of differences between audiomotor and visuomotor mismatches.
  
  Reviewer #1:
  
  The manuscript presents a short report investigating mismatch responses in the auditory cortex, following previous studies focused on the visual cortex. By correlating the mouse locomotion speed with acoustic feedback levels, the authors demonstrate excitatory responses in a subset of neurons to halts in expected acoustic feedback. They show a lack of responses to mismatch in the visual modality. A subset of neurons show enhanced mismatch responses when both auditory and visual modalities are coupled to the animal's locomotion.
  
  While the study is well-designed and addresses a timely question, several concerns exist regarding the quantification of animal behavior, potential alternative explanations for recorded signals, correlation between excitatory responses and animal velocity, discrepancies in reported values, and clarity regarding the identity of certain neurons.
  
  Strengths:
  
  (1) Well-designed study addressing a timely question in the field.
  
  (2) Successful transition from previous work focused on the visual cortex to the auditory cortex, demonstrating generic principles in mismatch responses.
  
  (3) The correlation between mouse locomotion speed and acoustic feedback levels provides evidence for a prediction signal in the auditory cortex.
  
  (4) Coupling of visual and auditory feedback shows putative multimodal integration in the auditory cortex.
  
  Weaknesses:
  
  (1) Lack of quantification of animal behavior upon mismatches, potentially leading to alternative interpretations of recorded signals.
  
  (2) Unclear correlation between excitatory responses and animal velocity during halts, particularly in closed-loop versus playback conditions.
  
  (3) Discrepancies in reported values in a few figure panels raise questions about data consistency and interpretation.
  
  (4) Ambiguity regarding the identity of the [AM+VM] MM neurons.
  
  The manuscript is a short report following up on a series of papers focusing on mismatch responses between sensory inputs and predicted signals. While previous studies focused on the visual modality, here the authors moved to the auditory modality. By pairing mouse locomotion speed to the sound level of the acoustic feedback, they show that a subpopulation of neurons displays excitatory responses to halts in the (expected) acoustic feedback. These responses were lower in the open-loop state, when the feedback was uncorrelated to the animal locomotion.
  
  Overall it is a well-designed study, with a timely and well-posed question. I have several concerns regarding the nature of the MM responses and their interpretations.
  
  - One lacks quantification of the animal behavior upon mismatches. Behavioral responses may trigger responses in the mouse auditory cortex, and this would be an alternative explanation to the recorded signals.
  
  What is the animal speed following closed-loop halts (we only have these data for the playback condition)?
  
  We have quantified the running speed of the mouse following audiomotor and visuomotor mismatches. We found no evidence of a change in running speed. We have added this to Figures S2A and S4A, respectively.
  
  Is there any pupillometry to quantify possible changes in internal states upon halts (both closed-loop and playback)?
  
  The term 'internal state' may be somewhat ambiguous in this context. We assume the reviewer is asking whether we have any evidence for possible neuromodulatory changes. We know that there are noradrenergic responses in visual cortex to visuomotor mismatches (Jordan and Keller, 2023), but no cholinergic responses (Yogesh and Keller, 2023). Pupillometry, however, is likely not always sensitive enough to pick up these responses. With very strong neuromodulatory responses (e.g. to air puffs, or other startling stimuli), pupil dilation is of course detected, but this effect is likely at best threshold linear. Looking at changes in pupil size following audiomotor and visuomotor mismatch responses, we found no evidence of a change. We have added this to Figures S2B and S4B, respectively. Note, we suspect this is also strongly experience-dependent. The first audio- or visuomotor mismatch the mouse encounters is likely a more salient stimulus (to the rest of the brain, not necessarily to auditory or visual cortex), than the following ones.
  
  These quantifications must be provided for the auditory mismatches but also for the VM or [AM+VM] mismatches.
  
  During the presentation of multimodal mismatches [AM + VM], mice did not exhibit significant changes in running speed or pupil diameter. These data have now been added to Figures S5A and S5B.
  
  - AM MM neurons supposedly receive a (excitatory) locomotion-driven prediction signal. Therefore the magnitude of the excitation should depend on the actual animal velocity. Does the halt-evoked response in a closed loop correlate with the animal speed during the halt? Is the correlation less in the playback condition?
  
  This is indeed what one would expect. We fear, however, that we don’t have sufficient data to address this question properly. Moreover, there is an important experimental caveat that makes the interpretation of the results difficult. In addition to the sound we experimentally couple to the locomotion speed of the mouse, the mouse self-generates sound by running (the treadmill rotating, changes to the airflow of the air-supported treadmill, footsteps, etc.). These sources of sound all also correlate in intensity with running speed. Thus, it is not entirely clear how our increase in sound amplitude with increasing running speed relates to the increase in self-generated sounds on the treadmill. This is one of the key reasons we usually do this type of experiment in the visual system where experimental control of visual flow feedback (in a given retinotopic location) is straightforward.
  
  Having said that, if we look at how mismatch responses change as a function of locomotion speed across the entire population of neurons, there appears to be no systematic change with running speed (and the effects are highly dependent on the speed bins we choose). However, just looking at the most audiomotor mismatch responsive neurons, we find a trend for increased responses with increasing running speed (Author response image 1). We analyzed the top 5% of cells that showed the strongest response to mismatch (MM) and divided the MM trials into three groups based on running speed: slow (10-20 cm/s), middle (20-30 cm/s), and fast (>30 cm/s). Given the fact that we have on average 14 mismatch events in total per neuron, we don’t have sufficient data to analyze this.
  
  Author response image 1.
  
  The average response of strongest AM MM responders to AM mismatches as a function of running speed (data are from 51 cells, 11 fields of view, 6 mice).
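  
  As an illustration only (this is not the authors' analysis code), the binning described above could be sketched as follows. The array layout, the function name `bin_top_responders_by_speed`, and the handling of bin edges are assumptions made for this sketch; only the 5% cutoff and the three speed ranges come from the response text.

```python
import numpy as np

# Speed ranges (cm/s) quoted in the response. Treating each bin as
# [lower, upper) is an assumption of this sketch.
SPEED_BINS = {"slow": (10.0, 20.0), "middle": (20.0, 30.0), "fast": (30.0, np.inf)}


def bin_top_responders_by_speed(responses, speeds, top_fraction=0.05):
    """Average mismatch response of the strongest responders per speed bin.

    responses: (n_cells, n_trials) response per mismatch trial (hypothetical layout).
    speeds:    (n_trials,) running speed in cm/s at each mismatch.
    """
    responses = np.asarray(responses, dtype=float)
    speeds = np.asarray(speeds, dtype=float)

    # Keep the cells with the largest trial-averaged mismatch response.
    n_top = max(1, int(np.ceil(top_fraction * responses.shape[0])))
    top_cells = np.argsort(responses.mean(axis=1))[-n_top:]
    top = responses[top_cells]

    binned = {}
    for name, (low, high) in SPEED_BINS.items():
        in_bin = (speeds >= low) & (speeds < high)
        # NaN marks a bin that contains no trials.
        binned[name] = float(top[:, in_bin].mean()) if in_bin.any() else float("nan")
    return binned
```

  With roughly 14 mismatch events per neuron, each bin would hold only a handful of trials, which is consistent with the authors' caution about interpreting the trend.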
  
  Values in Figure 2H are way higher than what can be observed in Figures 2C, and D. Could you explain the mismatch in values? Same for 3H and 4F.
  
  In Figure 2H (now Figure S2F), we display responses from 4 755 individual neurons. Since most recorded neurons did not exhibit significant responses to mismatch presentations, their responses cluster around zero, significantly contributing to the final average shown in panel D. To clarify how individual neurons contribute to the overall population activity, we have added a histogram showing the distribution of neurons responding to audiomotor mismatch and sound playback halts. We hope this addition clarifies how individual neuron responses affect the final population activity.
  
  Furthermore, neurons exhibiting suppression upon closed-loop halts (Figure 2C) show changes in deltaF/F of the same order of magnitude as the AM MM neurons (with excitatory responses). I cannot picture where these neurons are found in the scatter plot of Figure 2H.
  
  This is caused by a ceiling effect. While we could adjust the scale of the heat map to capture neurons with very high responses (e.g. [-50 50], Author response image 2), doing so would obscure the response dynamics of most neurons. Note that the number of neurons on the y-axis far exceeds the resolution of this figure and thus there are also aliasing issues that mask the strong responses.
  
  Author response image 2.
  
  Responses of all L2/3 ACx neurons to audiomotor mismatches. Same as Figure 2C with different color scale [-50 50] which does not capture most of the neural activity.
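  
  The ceiling effect mentioned above can be made concrete with a small sketch. The narrow limit of 5 used in the usage note is a hypothetical value (the original color limits are not stated in the response); the wide limit of 50 matches the [-50 50] scale of Author response image 2. Function names are illustrative.

```python
import numpy as np


def to_color_index(dff, limit, n_colors=256):
    """Map dF/F values to color indices on a symmetric scale [-limit, limit]."""
    clipped = np.clip(dff, -limit, limit)  # values beyond the limit saturate
    return np.round((clipped + limit) / (2 * limit) * (n_colors - 1)).astype(int)


def saturated_fraction(dff, limit):
    """Fraction of values at or beyond the color limit."""
    dff = np.asarray(dff, dtype=float)
    return float(np.mean(np.abs(dff) >= limit))
```

  On a narrow scale, responses of 20 and 45 map to the same saturated color, so strong responders cannot be told apart. On the wide scale they separate, but small responses are compressed into a few neighboring color levels.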
  
  - Are [AM+VM] MM neurons AM neurons?
  
  Many [AM + VM] and [AM] neurons overlap, but the two populations are not identical. This is partially visible in Figure 4F. There is a subset of neurons (13.7%; red dots, Figure 4F) that selectively responded to the concurrent [AM+VM] mismatch, while a different subset of neurons (11.2%; yellow dots, Figure 4F) selectively responded to the mismatches presented in isolation. The [VM] response contributes only little to the sum of the two responses [AM] + [VM].
  
  Please do not use orange in Figure 4F, it is perceptually too similar to red.
  
  We have now changed it to yellow.
  
  Reviewer #2 (Public Review):
  
  In this study, Solyga and Keller use multimodal closed-loop paradigms in conjunction with multiphoton imaging of cortical responses to assess whether and how sensorimotor prediction errors in one modality influence the computation of prediction errors in another modality. Their work addresses an important open question pertaining to the relevance of non-hierarchical (lateral cortico-cortical) interactions in predictive processing within the neocortex.
  
  Specifically, they monitor GCaMP6f responses of layer 2/3 neurons in the auditory cortex of head-fixed mice engaged in VR paradigms where running is coupled to auditory, visual, or audio-visual sensory feedback. The authors find strong auditory and motor responses in the auditory cortex, as well as weak responses to visual stimuli. Further, in agreement with previous work, they find that the auditory cortex responds to audiomotor mismatches in a manner similar to that observed in visual cortex for visuomotor mismatches. Most importantly, while visuomotor mismatches by themselves do not trigger significant responses in the auditory cortex, simultaneous coupling of audio-visual inputs to movement non-linearly enhances mismatch responses in the auditory cortex.
  
  Their results thus suggest that prediction errors within a given sensory modality are non-trivially influenced by prediction errors from another modality. These findings are novel, interesting, and important, especially in the context of understanding the role of lateral cortico-cortical interactions and in outlining predictive processing as a general theory of cortical function.
  
  In its current form, the manuscript lacks sufficient description of methodological details pertaining to the closed-loop training and the overall experimental design. In several scenarios, while the results per se are convincing and interesting, their exact interpretation is challenging given the uncertainty about the actual experimental protocols (more on this below). Second, the authors are laser-focused on sensorimotor errors (mismatch responses) and focus almost exclusively on what happens when stimuli deviate from the animal's expectations.
  
  While the authors consistently report strong running-onset responses (during open-loop) in the auditory cortex in both auditory and visual versions of the task, they do not discuss their interpretation in the different task settings (see below), nor do they analyze how these responses change during closed-loop i.e. when predictions align with sensory evidence.
  
  However, I believe all my concerns can be easily addressed by additional analyses and incorporation of methodological details in the text.
  
  Major concerns:
  
  (1) Insufficient analysis of audiomotor mismatches in the auditory cortex:
  
  Lack of analysis of the dependence of audiomotor mismatches on the running speed: it would be helpful if the authors could clarify whether the observed audiomotor mismatch responses are just binary or scale with the degree of mismatch (i.e. running speed). Along the same lines, how should one interpret the lack of dependence of the playback halt responses on the running speed? Shouldn't we expect that during playback, the responses of mismatch neurons scale with the running speed?
  
  Regarding the scaling of AM mismatch responses with running speed, please see our response to reviewer 1 above to the same question.
  
  Regarding the playback halt response and dependence on running speed, we would not expect there to be a dependence. The playback halt response (by design) measures the strength of the sensory response to a cessation of a stimulus (think OFF response). These typically are less strong in cortex than the corresponding ON responses but need to be controlled for (else a mismatch response might just be an OFF response – the prediction error is quantified as the difference between AM mismatch response and playback halt response). Given that sound onset responses only have a small dependence on running state, we would similarly expect sound offset (playback halt) responses to exhibit only minimal dependence on running state.
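  
  The subtraction described in the parenthesis above (prediction error as the AM mismatch response minus the playback halt response) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the trace format, the sampling rate `fs`, and the 1 s baseline and response windows are assumptions.

```python
import numpy as np


def mean_response(trace, onset, fs, baseline_s=1.0, window_s=1.0):
    """Baseline-subtracted mean of a dF/F trace after an event.

    trace: 1D dF/F trace; onset: sample index of the event; fs: samples per second.
    Window lengths are hypothetical defaults.
    """
    trace = np.asarray(trace, dtype=float)
    n_base = int(round(baseline_s * fs))
    n_win = int(round(window_s * fs))
    baseline = trace[onset - n_base:onset].mean()
    return float(trace[onset:onset + n_win].mean() - baseline)


def prediction_error(mismatch_trace, halt_trace, onset, fs):
    """Mismatch response minus the sensory OFF response to a playback halt."""
    return mean_response(mismatch_trace, onset, fs) - mean_response(halt_trace, onset, fs)
```

  A value near zero would mean the mismatch response is no larger than the sensory OFF response measured during playback.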
  
  Slow temporal dynamics of audiomotor mismatches: despite the transient nature of the mismatches (1s), auditory mismatch responses last for several seconds. They appear significantly slower than previous reports for analogous visuomotor mismatches in V1 (by the same group, using the same methods) and even in comparison to the multimodal mismatches within this study (Figure 4C). What might explain this sustained activity? Is it due to a sustained change in the animal's running in response to the auditory mismatch?
  
  This is correct, neither AM nor [AM+VM] mismatch responses return to baseline in the 3 seconds following onset. VM mismatch responses in visual cortex also do not return to baseline in that time window (see e.g. Figure 1E in (Attinger et al., 2017), or Figure 1F in (Zmarz and Keller, 2016)). What the origin or computational significance of this sustained calcium response is we do not know. In intracellular signals, we do not see this sustained response (Jordan and Keller, 2020). Also peculiar is indeed the fact that in the case of AM mismatch the sustained response is similar in strength to the initial response. But also here, why this would be the case, we do not know. It is conceivable that the initial and the sustained calcium responses have different origins. If the sustained response amplitude is all or nothing, the fact that the AM mismatch response is the smallest of the three could explain why sustained and initial responses are closer than for [AM+VM] or VM (in visual cortex) mismatch responses. All sustained responses appear to be roughly 1% dF/F. There are no apparent changes in running speed or pupil dilation that would correlate with the sustained activity (new panel A in Figure S2).
  
  (2) Insufficient analysis and discussion of running onset responses during audiomotor sessions: The authors report strong running-onset responses during open-loop in identified mismatch neurons. They also highlight that these responses are in agreement with their model of subtractive prediction error, which relies on subtracting the bottom-up sensory evidence from top-down motor-related predictions. I agree, and, thus, assume that running-onset responses during the open loop in identified 'mismatch' neurons reflect the motor-related predictions of sensory input that the animal has learned to expect. If this is true, one would expect that such running-onset responses should dampen during closed-loop, when sensory evidence matches expectations and therefore cancels out this prediction. It would be nice if the authors test this explicitly by analyzing the running-related activity of the same neurons during closed-loop sessions.
  
  Thank you for the suggestion. We now show running onset responses in both closed and open loop conditions for audiomotor and visuomotor coupling (new Figures 2H and 3H). In closed loop, we observe only a transient running onset response. In the open loop condition, running onset responses are sustained. For the visuomotor coupling, running onset responses are sustained in both closed and open loop conditions. This would be consistent with a slightly delayed cancellation of sound and motor related inputs in the audiomotor closed loop condition but not otherwise.
  
  (3) Ambiguity in the interpretation of responses in visuomotor sessions.
  
  Unlike for auditory stimuli, the authors show that there are no obvious responses to visuomotor mismatches or playback halts in the auditory cortex. However, the interpretation of these results is somewhat complicated by the uncertainty related to the training history of these mice. Were these mice exclusively trained on the visuomotor version of the task or also on the auditory version? I could not find this info in the Methods. From the legend for Figure 4D, it appears that the same mice were trained on all versions of the task. Is this the case? If yes, what was the training sequence? Were the mice first trained on the auditory and then the visual version?
  
  The training history of the animals is important to outline the nature of the predictions and mismatch responses that one should expect to observe in the auditory cortex during visuomotor sessions.
  
  Depending on whether the mice in Figure 3 were trained on visual only or both visual and auditory tasks, the open-loop running onset responses may have different interpretations.
  
  a) If the mice were trained only on the visual task, how should one interpret the strong running onset responses in the auditory cortex? Are these sensorimotor predictions (presumably of visual stimuli) that are conveyed to the auditory cortex? If so, what may be their role?
  
  b) If the mice were also trained on the auditory version, then a potential explanation of the running-onset responses is that they are audiomotor predictions lingering from the previously learned sensorimotor coupling. In this case, one should expect that in the visual version of the task, these audiomotor predictions (within the auditory cortex) would not get canceled out even during the closed-loop periods. In other words, mismatch neurons should constantly be in an error state (more active) in the closed-loop visuomotor task. Is this the case?
  
  If so, how should one then interpret the lack of a 'visuomotor mismatch' aligned to the visual halts, over and above this background of continuous errors?
  
  As such, the manuscript would benefit from clearly stating in the main text the experimental conditions such as training history, and from discussing the relevant possible interpretations of the responses.
  
  Mice were not trained on either audiomotor or visuomotor coupling and were reared normally. Prior to the recording day, the mice were habituated to running on the air-supported treadmill without any coupling for up to 5 days. On the first recording day, the mice experienced all three types of sessions (audiomotor, visuomotor, or combined coupling) in a random order for the first time. We have clarified this in the methods.
  
  Regarding the question of how one should interpret the strong running onset responses in the auditory cortex, this is complicated by the fact that – unless mice are raised visually or auditorily deprived – they always have life-long experience with visuomotor or audiomotor coupling. The visuomotor coupling they experience in VR is geometrically matched to what they would experience by moving in the real world. For the audiomotor coupling the exact relationship is less clear, but there is a diverse set of sound sources that scale in loudness with increasing running speed. Hence running onset responses reflect either such learned associations (as the reviewer also speculates), or spurious input. Rearing mice without coupling between movement and visual feedback does not abolish movement related responses in visual cortex (Attinger et al., 2017); on the contrary, it enhances them considerably. We suspect this reflects visual cortex being recruited for other functions in the absence of visual input. But given the data we have we cannot distinguish the different possible sources of running related responses. It is very likely that any “training” related effect we could achieve in a few hours pales in comparison to the life-long experience the mouse has in the world.
  
  Regarding the lack of a 'visuomotor mismatch' aligned to the visual halts, we are not sure we understand. Our interpretation is that there are no (or only a very small - we speculate that any nonzero VM mismatch response is just inherited from visual cortex) VM mismatch responses in auditory cortex above chance. Our data are consistent with the interpretation that there is no opposition of bottom up visual and top down motor related input in auditory cortex, hence no VM mismatch responses (independent of how strong the top-down motor related input is). This is of course not surprising – this is more of a sanity check and becomes relevant in the context of interpreting AM+VM responses.
  
  (4) Ambiguity in the interpretation of responses in multimodal versus unimodal sessions.
  
  The authors show that multimodal (auditory + visual) mismatches trigger stronger responses than unimodal mismatches presented in isolation (auditory only or visual only). Further, they find that even though visual mismatches by themselves do not evoke a significant response, co-presentation of visual and auditory stimuli non-linearly augments the mismatch responses suggesting the presence of non-hierarchical interactions between various predictive processing streams.
  
  In my opinion, this is an important result, but its interpretation is nuanced given insufficient details about the experimental design. It appears that responses to unimodal mismatches are obtained from sessions in which only one stimulus is presented (unimodal closed-loop sessions). Is this actually the case? An alternative and perhaps cleaner experimental design would be to create unimodal mismatches within a multimodal closed-loop session while keeping the other stimulus still coupled to the movement.
  
  This is correct, unimodal mismatches were acquired in unimodal coupling. Testing unimodal mismatch responses in multimodally coupled VR is an interesting idea we had initially even pursued. However, halting visual flow in a condition of coupling of both visual flow and sound amplitude to running speed has an additional complication. Introducing an audiomotor mismatch in this coupling inherently also creates an audiovisual (AV) mismatch, and the same applies to visuomotor mismatches, which cause a concurrent visuoaudio (VA) mismatch (Author response image 3). This assumes that there are cross modal predictions from visual cortex to auditory cortex as there are from auditory cortex to visual cortex (Garner and Keller, 2022). There are interesting differences between the different types of mismatches, but with all the necessary passive controls this quickly exceeded the amount of data we could reasonably acquire for this paper. This remains an interesting question for future research.
  
  Author response image 3.
  
  Rationale of unimodal mismatches introduced within multimodal paradigm.
  
  Given the current experiment design (if my assumption is correct), it is unclear if the multimodal potentiation of mismatch responses is a consequence of nonlinear interactions between prediction/error signals exchanged across visual and auditory modalities. Alternatively, could this result from providing visual stimuli (coupled or uncoupled to movement) on top of the auditory stimuli? If it is the latter, would the observed results still be evidence of non-hierarchical interactions between various predictive processing streams?
  
  Mice are not in complete darkness during the AM mismatch experiments (the VR is off, but there is low ambient light in the experimental rooms primarily from computer screens), so we can rule out the possibility that the difference comes from having “no” visual input during AM mismatch responses. Addressing the question of whether it is this particular stimulus that causes the increase would require an experiment in which we couple sound amplitude but keep visual flow open loop. We did not do this, but also think this is highly unlikely. However, as described above, we did do an experiment in which we coupled both sound amplitude and visual flow to running, and then either halted visual flow, or sound amplitude, or both. Comparing the [AM+VM] and [AM+AV] mismatch responses, we find that [AM+VM] responses are larger than [AM+AV] responses as one would expect from an interaction between [AM] and [VM] responses (Author response image 4). Finally, the conclusion that there are non-hierarchical interactions of prediction error computations holds either way – if any visual stimulus (either visuomotor mismatch, or visual flow responses) influences audiomotor mismatch responses, this is evidence of non-hierarchical interactions.
  
  Author response image 4.
  
  Average population response of all L2/3 neurons to concurrent [AM + VM] or [AM+AV] mismatch. Gray shading indicates the duration of the stimulus.
  
  Along the same lines, it would be interesting to analyze how the coupling of visual as well as auditory stimuli to movement influences responses in the auditory cortex in closed-loop in comparison to auditory-only sessions. Also, do running onset responses change in open-loop in multimodal vs. unimodal playback sessions?
  
  We agree, and this is why we started out doing the experiments described above. We stopped with this, however, because it quickly became a combinatorial nightmare. We will leave addressing the question of how different types of coupling influence responses in auditory cortex to brave future neuroscientists.
  
  Regarding the question of running onset responses, in both the multimodal and auditory only paradigms, running onset responses are transient; bottom-up sensory evidence is quickly subtracted from top-down motor-related prediction (Author response image 5). While there appears to be a small difference in the dynamics of running onset responses between these two paradigms, it was not significant. Note, we also have much less data than we would like here for this type of analysis.
  
  Author response image 5.
  
  Running onset responses recorded in unimodal and multimodal closed loop sessions (1903 neurons, 16 fields of view, 8 mice)
  
  We also compared running onsets in open loop sessions and did not find any significant differences between unimodal and multimodal sessions (Author response image 6). We found only six sessions in which animals performed at least two running onsets in each session type, therefore, we do not have enough data to include it in the manuscript.
  
  Author response image 6.
  
  Running onset responses recorded within unimodal and multimodal open loop sessions (659 cells, 6 fields of view, 5 mice).
  
  Minor concerns and comments:
  
  (1) Rapid learning of audiomotor mismatches: It is interesting that auditory mismatches are present even on day 1 and do not appear to get stronger with learning (same on day 2). The authors comment that this could be because the coupling is learned rapidly (line 110). How does this compare to the rate at which visuomotor coupling is learned? Is this rapid learning also observable in the animal's behavior i.e. is there a change in running speed in response to the mismatch?
  
  In the visual system this is a bit more complicated. If you look at visuomotor mismatch responses in a normally reared mouse, responses are present from the first mismatch (as far as we can tell given the inherently small dataset with just one response per mouse). However, this is of course confounded by the fact that a normally reared mouse has visuomotor coupling throughout life from eye-opening. Raising mice in complete darkness, we have shown that approximately 20 min of coupling are sufficient to establish visuomotor mismatch responses (Attinger et al., 2017).
  
  Regarding the behavioral changes that correlate with learning, we are not sure what the reviewer would expect. We cannot detect a change in mismatch responses and hence would also not expect to see a change in behavior.
  
  (2) The authors should clarify whether the sound and running onset responses of the auditory mismatch neurons in Figure 2E were acquired during open-loop. This is most likely the case, but explicitly stating it would be helpful.
  
  Both responses were measured in isolation (i.e. VR off, just sound and just running onset), not in an open-loop session. We have clarified in the figure legend that these are the same data as in Figure 1H and N.
  
  (3) In lines 87-88, the authors state 'Visual responses also appeared overall similar but with a small increase in strength during running ...'. This statement would benefit from clarification. From Figure S1 it appears that when the animal is sitting there are no visual responses in the auditory cortex. But when the animal is moving, small positive responses are present. Are these actually 'visual' responses - perhaps a visual prediction sent from the visual cortex to the auditory cortex that is gated by movement? If so, are they modulated by features of visual stimuli eg. contrast, intensity? Or, do these responses simply reflect motor-related activity (running)? Would they be present to the same extent in the same neurons even in the dark?
  
  This was wrong indeed - we have rephrased the statement as suggested. Regarding the source of visual responses, we use the term “visual response” operationally here agnostic to what pathway might be driving it (i.e. it could be a prediction triggered by visual input).
  
  We did not test if recorded visual responses are modulated by contrast or intensity. However, testing whether they are would not help us distinguish whether the responses are ‘visual’ or ‘visual predictions’. Finally, regarding the question about whether they are motor-related responses, this might be a misunderstanding. These are responses to visual stimuli while the mouse is already running (i.e. there is no running onset), hence we cannot test whether these responses are present in the dark (this would be the equivalent of looking at random triggers in the dark while the mouse is running).
  
  (4) The authors comment in the text (lines 106-107) about cessation of sound amplitude during audiomotor mismatches as being analogous to halting of visual flow in visuomotor mismatches. However, sound amplitude versus visual flow are quite different in nature. In the visuomotor paradigm, the amount of visual stimulation (photons per unit time) does not necessarily change systematically with running speed. Whereas, in the audiomotor paradigm, the SNR of the stimulus itself changes with running speed which may impact the accuracy of predictions. On a broader note, under natural settings, while the visual flow is coupled to movement, sound amplitude may vary more idiosyncratically with movement.
  
  This is a question of coding space. The coding space of visual cortex of the mouse is probably visual flow (or change in image) not number of photons. This already starts in the retina. The demonstration of this is quite impressive. A completely static image on the retina will fade to zero response (even though the number of photons remains constant). This is also why most visual physiologists use dynamic stimuli – e.g. drifting gratings, not static gratings – to map visual responses in visual cortex. If responses were linear in number of photons, this would make less of a difference. The correspondence we make is between visual flow (which we assume is the main coding space of mouse V1 – this is not established fact, but probably implicitly the general consensus of the field) and sound amplitude. Responses in auditory cortex are probably more linear in sound amplitude than visual cortex responses are linear in number of photons, but whether that is the correct coding space is still unclear, and as far as we can tell there is no clear consensus in the field. We did consider coupling running speed to frequency, which may work as well, but given the possible equivalence (as argued above) and the fact that we could see similar responses with sound amplitude coupling we did not explore frequency coupling.
  
  If visual speed is the coding space of V1, SNR should behave equivalently in both cases.
  
  Perhaps such differences might explain why unlike in the case of visual cortex experiments, running speed does not affect the strength of playback responses in the auditory cortex.
  
  Possible, but the more straightforward framing of this point is that sensory responses are enhanced by running in visual cortex while they are not in auditory cortex. A playback halt response (by design) is just a sensory response. Why running does not generally increase sensory responses in auditory cortex (L2/3 neurons), but does so in visual cortex, would be the more general version of the same question.
  
  We fear we have no intelligent answer to this question.
  
  Reviewer #3 (Public Review):
  
  This study explores sensory prediction errors in the sensory cortex. It focuses on the question of how these signals are shaped by non-hierarchical interactions, specifically multimodal signals arising from same-level cortical areas. The authors used 2-photon imaging of mouse auditory cortex in head-fixed mice that were presented with sounds and/or visual stimuli while moving on a ball. First, responses to pure tones, visual stimuli, and movement onset were characterized. Then, the authors made the running speed of the mouse predictive of sound intensity and/or visual flow. Mismatches were created through the interruption of sound and/or visual flow for 1 second while the animal moved, disrupting the expected sensory signal given the speed of movement. As a control, the same sensory stimuli triggered by the animal's movement were presented to the animal decoupled from its movement. The authors suggest that auditory responses to the unpredicted silence reflect mismatch responses. That these mismatch responses were enhanced when the visual flow was congruently interrupted, indicates the cross-modal influence of prediction error signals.
  
  This study's strengths are the relevance of the question and the design of the experiment. The authors are experts in the techniques used. The analysis explores neither the full power of the experimental design nor the population activity recorded with 2-photon, leaving open the question of to what extent what the authors call mismatch responses are not sensory responses to sound interruption. The auditory system is sensitive to transitions and indeed responses to the interruption of the sound are similar in quality, if not quantity, in the predictive and the control situation. The pattern they observe is different from the visuomotor mismatch responses the authors found in V1 (Keller et al., 2012), where the interruption of visual flow did not activate neuronal activity in the decoupled condition.
  
  Just to add brief context to this. The reviewer is correct here, the (Keller et al., 2012) paper reports finding no responses to playback halt. However, this was likely a consequence of indicator sensitivity (these experiments were done with what now seems like a pre-historic version of GCaMP). Experiments performed with more modern indicators do find playback halt responses in visual cortex (see e.g. (Zmarz and Keller, 2016)).
  
  The auditory system is sensitive to transitions, also those to silence. See the work of the Linden or the Barkat labs on on-off responses, and also that of the Mesgarani lab (Khalighinejad et al., 2019) on responses to transitions 'to clean' (Figure 1c) in the human auditory cortex. Since the responses described in the current work are modulated by movement and the relationship between movement and sound is more consistent during the coupled sessions, this could explain the difference in response size between coupled and uncoupled sessions. There is also the question of learning. Prediction signals develop over a period of several days and are frequency-specific (Schneider et al., 2018). From a different angle, in Keller et al. 2012, mismatch responses decrease over time as one might expect from repetition.
  
  Also for brief context, this might be a misconception. We don’t find a decrease of mismatch responses in the (Keller et al., 2012) paper – we assume what the reviewer is referring to is the fact that mismatch responses decrease in open-loop conditions (they normally do not in closed-loop conditions). This is the behavior one would expect if the mouse learns that movement no longer predicts visual feedback.
  
  It would help to see the responses to varying sound intensity as a function of previous intensity, and to plot the interruption response as a function of both transition and movement in both conditions.
  
  Given the large populations of neurons recorded and the diversity of the responses, from clearly negative to clearly positive, it would be interesting to understand better whether the diversity reflects the diversity of sounds used or a diversity of cell types, or both.
  
  Comments and questions:
  
  Does movement generate a sound and does this change with the speed of movement? It would be useful to have this in the methods.
  
  There are three ways to interpret the question – below the answers to all three:
  
  (1) Running speed is experimentally coupled to sound amplitude of a tone played through a loudspeaker. Tone amplitude is scaled with running speed of the mouse in a closed loop fashion. We assume this is not what the reviewer meant, as this is described in the methods (and the results section).
  
  (2) Movements of the mouse naturally generate sounds (footsteps, legs moving against fur, etc.). Most of these sounds trivially scale with the frequency of leg movements – we assume this is also not what the reviewer meant.
  
  (3) Finally, there are experimental sounds related to the rotation speed of the air supported treadmill that increase with running speed of the mouse. We have added this to the methods as suggested.
  
  Figures 1a and 2a. The mouse is very hard to see. Focus on mouse, objective, and sensory stimuli? The figures are generally very clear though.
  
  We have enlarged the mouse as suggested.
  
  1A-K was the animal running while these responses were measured?
  
  We did not restrict this analysis to running or sitting and pooled responses over both conditions. We have made this more explicit in the results section.
  
  Data in Figure 1: Since the modulation of sensory responses by movement is relevant for the mismatch responses, I would move this analysis from S1 to Figure 1 and analyze the responses more finely in terms of running speed relative to sound and gratings. I would include here a more thorough analysis of the responses to 8kHz at varying intensities, for example in the decoupled sessions. Does the response adapt? Does it follow the intensity?
  
  We agree that these are interesting questions, but they do not directly pertain to our conclusions here. The key point Figure S1 addresses is whether auditory responses are generally enhanced by running (as they are e.g. in visual cortex) – the answer, on average, is no. We have tried emphasizing this more, but it changes the flow of the paper away from our main message, hence we have left the panels in the supplements.
  
  Regarding the 8 kHz modulation, there is a general increase of the suppression of activity with increasing sound amplitude (Author response image 7 and Author response image 8). However, because the stimulus amplitude varies continuously, we do not have sufficient data (or do not know how to use the data we have) to address questions of adaptation. We assume there is some form of adaptation. Either way, we don’t see how this would change our conclusions.
  
  Author response image 7.
  
  Neural activity as a function of sound level in an AM open loop session.
  
  Author response image 8.
  
  The average sound evoked population response of all ACx layer 2/3 neurons to 60 dB or 75 dB 8 kHz pure tones. Stimulus duration was 1 s (gray shading).
  
  2C-D why not talk of motor modulation? Paralleling what happens in response to auditory and visual stimuli?
  
  This is correct, a mismatch response (we use mismatch here to operationally describe the stimulus – not the interpretation) can be described either as a prediction error (this is the interpretation) or a stimulus specific motor modulation. Note, the key here is “stimulus specific”. It is stimulus specific as there is an approximately 3x change between mismatch and playback halt (the same sensory stimulus with and without locomotion), but basically no change for sound onsets (Figure S1). Having said that, one explanation (prediction error) has predictive power (and hence is testable – see e.g. (Vasilevskaya et al., 2023) for an extensive discussion on exactly this argument for mismatch responses in visual cortex), while the other does not (a “stimulus specific” motor modulation has no predictive value or computational theory behind it and is simply a description). Thus, we choose to interpret it as a prediction error. Note, this finding does not stand in isolation and many of the testable predictions of the predictive processing interpretation have turned out to be correct (see e.g. (Keller and Mrsic-Flogel, 2018) for a review).
  
  Note, we try to only use the interpretation of “prediction error” when motivating why we do the experiments, and in the discussion, but not directly in the description of the results (e.g. in Figure 2).
  
  How does the mismatch affect the behavior of the mouse? Does it stop running? This could also influence the size of the response.
  
  We quantified animal behavior during audiomotor mismatches and did not find any significant acceleration or slowing down upon mismatch events. Thus, neural responses recorded during AM mismatches are unlikely to be explained by changes in animal behavior. These data have been added in Figure S2A and Figure S4A.
  
  Figure 3. What about neurons that were positively modulated by both grating and movement? How do these neurons respond to the mismatch?
  
  Neurons positively modulated by both grating and movement were slightly more responsive to MM than the rest of the population, though this difference was not significant (Author response image 9). This is also visible in Figure 3G – the high VM mismatch responsive neurons are randomly distributed in regard to correlation with running speed and visual flow speed.
  
  Author response image 9.
  
  Responses to visuomotor mismatches of neurons positively modulated by grating and movement, and of the remainder of the population.
  
  Line 176. The authors say 'Thus, in the case of a [AM + VM] mismatch both the halted visual flow and the halted sound amplitude are predicted by running speed' but the mismatch (halted flow and amplitude) is not predicted by the speed, correct? Please rephrase.
  
  Thank you for pointing this out – this was indeed phrased incorrectly. We have corrected this.
  
  How was the sound and/or visual flow interruption triggered? Did the animal have to run at a minimum speed in order for it to happen?
  
  Sound and visual flow interruptions were triggered randomly, independent of the animal's running speed. However, for the analysis, only MM presentations during which animals were running at a speed of at least 0.3 cm/s were included. The 0.3 cm/s was simply the (arbitrary) threshold we used to determine if the mouse was running. In a completely stationary mouse a mismatch event will not have any effect (sound amplitude/visual flow speed are already at 0). This is described in the methods section.
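
  The inclusion rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the event representation (one running speed per mismatch onset), the function name, and the example values are assumptions. Only the 0.3 cm/s threshold comes from the text.

```python
RUNNING_THRESHOLD_CM_S = 0.3  # threshold stated in the response


def select_running_mismatches(speeds_at_onset, threshold=RUNNING_THRESHOLD_CM_S):
    """Return indices of mismatch events during which the animal was running.

    speeds_at_onset: running speed (cm/s) at each mismatch onset
    (hypothetical representation, one value per event).
    """
    return [i for i, speed in enumerate(speeds_at_onset) if speed >= threshold]


# Hypothetical speeds for five randomly triggered mismatch events.
example_speeds = [0.0, 0.25, 0.3, 4.2, 12.5]
included = select_running_mismatches(example_speeds)
```

  Events triggered while the animal is stationary are dropped because, as noted above, a halt cannot change a stimulus that is already at zero.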
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.10.29.564593v3
www.biorxiv.org www.biorxiv.org

Chromosome Structure I: Loop extrusion or boundary:boundary pairing?

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors addressed how long-range interactions between boundary elements are established and influence their function in enhancer specificity. Briefly, the authors placed two different reporters separated by a boundary element. They inserted this construct ectopically ~140 kb away from an endogenous locus that contains the same boundary element. The authors used expression patterns driven by nearby enhancers as an output to determine which enhancers the reporters interact with. They complemented this analysis with 3D DNA contact mapping. The authors found that the orientation of the boundary element determined which enhancers each reporter interacted with. They proposed that the 3D interaction topology, whether being circular or stem configuration, distinguished whether the interaction was cohesin mediated or through an independent mechanism termed pairing.
  
  Strengths:
  
  The transgene expression assays are built upon prior knowledge of the enhancer activities. The 3D DNA contacts confirm that transgene expression correlates with the contacts. Using 4 different orientations covers all combinations of the reporter genes and the boundary placement.
  
  Weaknesses:
  
  The interpretation of the data as a refutation of loop extrusion playing a role in TAD formation is not warranted, as the authors did not deplete the loop extruders to show that what they measure is independent.
  
  (1.1) To begin with, our findings do not exclude the possibility that cohesin loop extrusion has some sort of role in the formation or maintenance of TADs in flies or other aspects of chromosome structure. On the other hand, it clearly is not determinative in defining the end-points of TADs or in generating the resulting topology (stem-loop or circle-loop). Our main point, which we feel we have established unequivocally, is that it can’t explain many essential features of TADs or chromosome loops (see below) in Drosophila. This reviewer agrees with this point in their next paragraph (below). We also think that the loop extrusion model’s general acceptance as THE driving force behind TAD formation in mammals is unwarranted and not fully consistent with the available data, as explained below.
  
  As to the reviewer’s specific point regarding depletion of loop extruders, we first note that completely eliminating factors encoding cohesin subunits in fly embryos isn’t readily feasible. As cohesin is essential starting at the beginning of embryonic development, and is maternally deposited, knockdowns/depletions would likely be incomplete and there would always be some remaining activity. As long as there is some residual activity—and no disruption in TAD formation is observed—this experimental test would be a failure. In addition, any defects that are observed might arise not from a failure in TAD formation via loop extrusion but rather because the rapid mitotic cycles would be disrupted. A far better approach would be to deplete/knockdown cohesin subunits in tissue culture cells, as there is no requirement for the cells to undergo embryonic development. Moreover, since cell division is relatively slow, the depletion would likely eliminate much if not all of the activity before a checkpoint is reached.
  
  While a drastic depletion of cohesin is not feasible in our model organism, we would draw the reviewer’s attention to an experiment of this type which has already been done in mammalian tissue culture cells by Goel et al. (Goel et al. 2023). Unlike most Hi-C studies in mammals, the authors used region capture MicroC (RCMC). In contrast to published genome-wide mammalian MicroC experiments (c.f., (Hsieh et al. 2020; Krietenstein et al. 2020)) which require large bin sizes to visualize mammalian “TADs,” the resolution of the experiments in Goel et al. (Goel et al. 2023) is similar to the resolution in our MicroC experiments (200-400 bp). A MicroC contact map from Goel et al. shows the Ppm1g locus on chromosome 5 before and after Rad21 depletion. The contact map visualizes a 250 kb DNA segment, which is only slightly larger than the ~230 kb DNA segment in Fig. 2C in our paper.
  
  In this experiment, there was a 97% reduction in the amount of Rad21. However, as can be seen by comparing the contact profiles above and below the diagonal, there is little or no difference in TAD organization after cohesin depletion when individual TADs are visualized with a bin size of 250 bp. These results would indicate that mammalian TADs do not require cohesin.
  
  Note also that the weak 45° stripes connecting different TADs (c.f. blue/green arrowheads) are still present after Rad21 depletion. In the most popular version of the loop extrusion model, cohesin loads at a site(s) somewhere in the TAD-to-be, and then extrudes both strands until it bumps into CTCF roadblocks. As illustrated in Figure Sup 2, this mechanism generates a vertical stripe originating at the cohesin loading site and extending until cohesin bumps into the left or right roadblock, at which point the stripe transitions into a 45° stripe that ends when cohesin bumps into the other roadblock. While 45° stripes are visible, there is no hint of a vertical stripe. This suggests that the mechanism for generating stripes, if it is an active mechanism (rather than passive diffusion), may be quite different. The 45° stripes must be generated by a factor(s) that is anchored to one (blue arrowhead) or both (green arrowhead) boundaries. In addition, this factor, whatever it is, is not cohesin. The reason for this is that the 45° stripes are present both before and after Rad21 depletion. Moreover, if one were to imagine that the stripes represent a process involved in TAD formation, this process does not require cohesin (see Goel et al 2023).
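
  The stripe geometry predicted by this version of the model can be made concrete with a toy simulation. The sketch below is only an illustration under simplifying assumptions (a single cohesin, both legs moving one bin per step, impenetrable roadblocks); the function name and bin coordinates are hypothetical. Each step records the pair of bins held together by the two legs. While both legs move, the contact travels perpendicular to the diagonal, which appears as a vertical stripe when the map is drawn with the diagonal horizontal. Once one leg is blocked, only one coordinate changes, which appears as a 45° stripe.

```python
def extrusion_contacts(load_bin, left_block, right_block):
    """Toy two-sided loop extrusion between two roadblocks.

    Returns the (left_leg, right_leg) bin pairs visited, starting at the
    loading site and ending when both legs have reached their roadblocks.
    """
    if not left_block <= load_bin <= right_block:
        raise ValueError("loading site must lie between the roadblocks")
    left, right = load_bin, load_bin
    contacts = [(left, right)]
    while left > left_block or right < right_block:
        if left > left_block:
            left -= 1   # left leg extrudes until it reaches its roadblock
        if right < right_block:
            right += 1  # right leg extrudes until it reaches its roadblock
        contacts.append((left, right))
    return contacts


path = extrusion_contacts(load_bin=5, left_block=2, right_block=12)

# Label each step: "both" legs moved (vertical stripe) or only "one" (45-degree stripe).
phases = [
    "both" if a[0] != b[0] and a[1] != b[1] else "one"
    for a, b in zip(path, path[1:])
]
```

  In this toy run the loading site is closer to the left roadblock, so the trajectory has a short "both" phase followed by a longer "one" phase. A map showing only 45° stripes would therefore lack the first phase.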
  
  It is worth noting another observation that is inconsistent with the cohesin loop extrusion/CTCF roadblock model for TAD formation/maintenance. CTCF is not found at all of the TAD boundaries in this 250 kb DNA region. This would suggest that there are other DNA binding proteins that have chromosomal architectural functions besides CTCF. In flies, many of the chromosomal architectural proteins are, like CTCF, polydactyl zinc finger (PZF) proteins (Bonchuk et al. 2021; Bonchuk et al. 2022; Fedotova et al. 2017). These include Su(Hw), CTCF, Pita, Zipic and CLAMP. The PZF family in flies is quite large. There are ~250 different PZF genes, and since only a handful of these have been characterized, it seems likely that additional members of this family will have architectural functions. Thus far, only one boundary protein, CTCF, has received attention in studies on mammalian chromosome architecture. As the mammalian genome is much larger and more complicated than the fly genome, it is difficult to believe that CTCF is the sole chromosomal architectural protein in mammals. In this respect, it is worth noting that there are ~800 members of the PZF family in mammalian genomes (Fedotova et al. 2017).
  
  Goel et al. (Goel et al. 2023) did observe alterations in the contact profiles after Rad21 depletion when they visualized the Ppm1g region at much lower resolution (bin sizes of 5 kb and 1 kb). The 5 kb bin size visualizes a region of ~1.2 Mb, while the 1 kb bin size visualizes a region that spans ~800 kb. These large triangular units do not correspond to the individual TADs seen when Goel et al. visualized the Ppm1g locus at 250 bp resolution.
  
  Nor do they correspond to TADs in Fig. 2 of our paper. Instead they represent TAD neighborhoods which likely consist of 20-30 or more individual TADs. Consequently the alterations in contact patterns seen after Rad21 depletion are occurring at the level of TAD neighborhoods. This can be seen by comparing pixel density inside the blue lines before (above the diagonal) and after Rad21 depletion (below the diagonal) (Goel et al 2023). The more distant contacts between individual TADs within this neighborhood are preferentially reduced by Rad21 depletion (the region below and to the left of the double arrowhead). By contrast, the TADs themselves are unaffected, as are contacts between individual TADs and their immediate neighbors (see purple and light green asterisks). The other interesting feature is the loss of contacts between what appear to be partially overlapping neighborhoods. This loss of neighborhood-to-neighborhood contacts can be seen in the region located between the green and blue lines. The neighborhood that appears to partially overlap the Ppm1g neighborhood is outlined in purple.
  
  It is worth noting that, with the exception of the high resolution experiments in Goel et al., all of the other studies on cohesin (and CTCF) have examined the effects on contact maps within (and between) large neighborhoods (bin sizes >1 kb). In most cases, these large neighborhoods are likely to be composed of many individual TADs like those seen in Goel et al. and in Fig. 2 of our paper. We also observe larger neighborhoods in the fly genome, though they do not appear to be as large as those in mammals. Our experiments do not address what role cohesin might have in facilitating contacts between more distant TADs located within the same neighborhoods, or between TADs in different neighborhoods, or whether loop extrusion is involved.
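
  The effect of bin size on what can be resolved can be illustrated with a small rebinning sketch. It is not taken from any of the cited analyses: the function, the toy matrix, and the domain layout are assumptions chosen for illustration. Summing a fine-resolution contact matrix into larger bins merges adjacent small domains into a single block, which is one way several individual TADs can appear as one neighborhood at large bin sizes.

```python
def coarsen(matrix, factor):
    """Sum a square contact matrix into bins that are `factor` times larger."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of factor")
    m = n // factor
    return [
        [
            sum(
                matrix[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            )
            for j in range(m)
        ]
        for i in range(m)
    ]


# Toy map with four 2-bin domains: contacts occur only within each domain.
fine = [[1 if i // 2 == j // 2 else 0 for j in range(8)] for i in range(8)]
coarse = coarsen(fine, 4)  # each coarse bin now spans two whole domains
```

  In `coarse`, the boundary between the first two domains falls inside a single bin and can no longer be seen, although it is sharp in `fine`.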
  
  We would also note that the Drosophila DNA segment in Fig. 2C contains 35 different genes, while the mammalian DNA segment shown in Fig. 1 has only 9. Thus, in this part of the fly genome, Pol II genes are more densely packed than in the mammalian DNA segment. Much of the fly genome is also densely packed, and the size of individual TADs will likely be smaller, on average, than in mammals. Nevertheless, the MicroC profiles are not all that different. As is also common in flies, each TAD in the Ppm1g region only encompasses one or two genes. Note also that there are no volcano triangles with plumes as would be predicted for TADs that have a stem-loop topology.
  
  In fact, as shown in Author response image 1, the high-resolution contact profile for the Ppm1g region shows a strong resemblance to that observed for the fly Abd-B regulatory domains. These regulatory domains are part of a larger neighborhood that encompasses the abd-A and Abd-B genes and their regulatory domains.
  
  Author response image 1.
  
  Abd-B regulatory domains
  
  As the authors show, the single long DNA loop mediated by cohesin loop extrusion connecting the ectopic and endogenous boundary is clearly inconsistent with the results; therefore the main conclusion of the paper, that the 3D topology of the boundary elements is a consequence of pairing, is strong. However, loop extrusion and pairing are not mutually exclusive models for the formation of TADs. Loop-extruding cohesin complexes need not make a 140 kb loop; multiple smaller loops could bring together the two boundary elements, which are then held together by pairing proteins that can make circular topologies.
  
  (1.2) In the pairing model, distant boundaries bump into each other (by random walks or partially constrained walks), and if they are “compatible” they pair with each other, typically in an orientation-dependent manner. As an alternative, the reviewer argues that cohesin need not make one large 140 kb loop. Instead it could generate a series of smaller loops (presumably corresponding to the intervening TADs). These smaller loops would bring homie in the transgene in close proximity to the eve locus so that it could interact with the endogenous homie and nhomie elements in the appropriate orientation, and in this way only one of the reporters would be ultimately activated.
  
  There are two problems with the idea that cohesin-dependent loop extrusion brings transgene homie into contact with homie/nhomie in the eve locus by generating a series of small loops (TADs). The first is the very large distances over which specific boundary:boundary pairing interactions can occur. The second is that boundary:boundary pairing interactions can take place not only in cis, but also in trans.
  
  We illustrate these points with several examples.
  
  Fujioka et al. 2016, Fig. 7 shows an experiment in which attP sites located ~2 Mb apart were used to insert two different transgenes, one containing a lacZ reporter and the other containing the eve anal plate enhancer (AP) (Fujioka et al. 2016). If the lacZ reporter and the AP transgenes also contain homie, the AP enhancer can activate lacZ expression (panel A). On the other hand, if one of the transgenes has lambda DNA instead of homie, no regulatory interactions are observed (panel A). In addition, as is the case in our experiments using the -142 kb platform, orientation matters. In the combination on the top left, the homie boundary is pointing away from both the lacZ reporter and the AP enhancer. Since homie pairs with itself head-to-head, pairing brings the AP enhancer into contact with the lacZ reporter. A different result is obtained for the transgene pair in panel A on the top right. In this combination, homie is pointing away from the lacZ reporter, while it is pointing towards the AP enhancer. As a consequence, the reporter and enhancer are located on opposite sides of the paired homie boundaries, and in this configuration they are unable to interact with each other.
  
  On the top left of panel B, the homie element in the AP enhancer transgene was replaced by a nhomie boundary oriented so that it is pointing towards the enhancer. Pairing of homie and nhomie head-to-tail brings the AP enhancer in the nhomie transgene into contact with the lacZ reporter in the homie transgene, and it activates reporter expression. Finally, like homie, nhomie pairs with itself head-to-head, and when the nhomie boundaries are pointing towards both the AP enhancer and the lacZ reporter, reporter expression is turned on.
  
  Long distance boundary-dependent pairing interactions by the bithorax complex Mcp boundary have also been reported in several papers. Fig. 6 from Muller et al. (Muller et al. 1999) shows the pattern of regulatory interactions (in this case PRE-dependent “pairing-sensitive silencing”) between transgenes that have a mini-white reporter, the Mcp and scs’ boundaries and a PRE that is located close to Mcp. In this experiment flies carrying transgenes inserted at the indicated sites on the left and right arms of the 3rd chromosome were mated in pairwise combinations, and their trans-heterozygous progeny examined for pairing-sensitive silencing of the mini-white reporter.
  
  Two examples of long-distance pairing-sensitive silencing mediated by Mcp/scs’ are shown in Fig. 5b from Muller et al. 1999. The transgene inserts in panel A are w#12.43 and ff#10.5. w#12.43 is inserted close to the telomere of 3R at 99B. ff#10.5 is inserted closer to the middle of 3R at 91A. The estimated distance between them is 11.3 Mb. The transgene inserts in panel B are ff#10.5 and ff#11.102. ff#11.102 is inserted at 84D, and the distance between them is 11 Mb. Normally, the eye color phenotype of the mini-white reporter is additive: homozygous inserts have twice as dark an eye color as hemizygous inserts, while in trans-heterozygous flies the eye color would be the sum of the two different transgenes. However, when a PRE is present and the transgenes can pair, silencing is observed. In panel A, the trans-heterozygous combination has a lighter eye color than either of the parents. In panel B, the trans-heterozygous combination is darker than one of the parents (ff#10.5) but much lighter than the other (ff#11.102).
  
  All ten of the transgenes tested were able to engage in long distance (>Mbs) trans regulatory interactions; however, likely because of how the chromosome folds on the Mb scale (e.g., the location of meta-loops: see #2.1 and Author response image 3), not all of the possible pairwise silencing interactions are observed. The silencing interactions shown in Muller et al. are between transgenes inserted on different homologs. Mcp/scs'-dependent silencing interactions can also occur in cis. Moreover, just like the homie and nhomie experiments described above, Muller et al. (Muller et al. 1999) found that Mcp could mediate long-distance activation of mini-white and yellow by their respective enhancers.
  
  The pairing-sensitive activity of the PRE associated with the Mcp boundary is further enhanced when the mini-white transgene has the scs boundary in addition to Mcp and scs’. In the experiment shown in Fig. 8 from Muller et al. 1999, the pairing-sensitive silencing interactions of the Mcp/scs’/scs transgene are between transgenes inserted on different chromosomes. Panel A shows pairing-sensitive silencing between w#15.60, which is on the X chromosome, and w#15.102, which is on the 2nd chromosome. Panel B shows pairing-sensitive silencing between the 2nd chromosome insert w#15.60 and a transgene, w#15.48, which is inserted on the 3rd chromosome.
  
  The long-distance trans and cis interactions described here are not unique to homie, nhomie, Mcp, scs’, or scs. Precisely analogous results have been reported by Sigrist and Pirrotta (Sigrist and Pirrotta 1997) for the gypsy boundary when the bxd PRE was included in the mini-white transgene. Also like the Mcp-containing transgenes in Muller et al. (Muller et al. 1999), Sigrist and Pirrotta observed pairing-sensitive silencing between gypsy bxd PRE mini-white transgenes inserted on different chromosomes. Similar long-distance (Mb) interactions have been reported for Fab-7 (Bantignies et al. 2003; Li et al. 2011). In addition, there are examples of “naturally occurring” long-distance regulatory and/or physical interactions. One would be the regulatory/physical interactions between the p53 enhancer upstream of reaper and Xrp1 which was described by Link et al. (Link et al. 2013). Another would be the nearly 60 meta-loops identified by Mohana et al. (Mohana et al. 2023).
  
  Like homie at -142 kb, the regulatory interactions (pairing-sensitive silencing and enhancer activation of reporters) reported in Muller et al. (Muller et al. 1999) involve direct physical interactions between the transgenes. Vazquez et al. (Vazquez et al. 2006) used the lacI/lacO system to visualize contacts between distant scs/Mcp/scs’-containing transgenes in imaginal discs. As indicated in Vazquez et al. 2006, Table 3 lines #4-7, when both transgenes had Mcp and were inserted on the same chromosome, they colocalized in trans-heterozygotes (single dot) in 94% to 97% of the disc nuclei in the four pairwise combinations they tested. When the transgenes both lacked Mcp (Vazquez et al. 2006, Table 3 #1), co-localization was observed in 4% of the nuclei. When scs/Mcp/scs’-containing transgenes on the 2nd and 3rd chromosome were combined (Vazquez et al. 2006, Table 3 #8), colocalization was observed in 96% of the nuclei. They also showed that four different scs/Mcp/scs’ transgenes (two at the same insertion site but on different homologs, and two at different sites on different homologs) co-localized in 94% of the eye imaginal disc nuclei (Vazquez et al. 2006, Table 3 #9). These pairing interactions were also found to be stable over several hours. Similar co-localization experiments together with 3C were reported by Li et al. (Li et al. 2011).
  
  The de novo establishment of trans interactions between compatible boundary elements has been studied by Lim et al. (Lim et al. 2018). These authors visualized transvection (enhancer activation of a MS2 loop reporter in trans) mediated by the gypsy insulator, homie and Fab-8 in NC14 embryos. When both transgenes shared the same boundary element, transvection/physical pairing was observed in a small subset of embryos. The interactions took place after a delay and increased in frequency as the embryo progressed into NC14. As expected, transvection was specific: it was not observed when the transgenes had different boundaries. For homie it was also orientation-dependent. It was observed when homie was orientated in the same direction in both transgenes, but not when homie was orientated in opposite directions in the two transgenes.
  
  While one could imagine that loop extrusion-dependent compaction of the chromatin located between eve and the transgene at -142 kb into a series of small loops (the intervening TADs) might be able to bring homie in the transgene close to homie/nhomie in the eve locus, there is no cohesin-based loop extrusion scenario that would bring transgenes inserted at sites 6 Mb, 11 Mb, on different sides of the centromere, or at opposite ends of the 3rd chromosome together so that the distant boundaries recognize their partners and physically pair with each other. Nor is there a plausible cohesin-based loop extrusion mechanism that could account for the fact that most of the documented long-distance interactions involve transgenes inserted on different homologs. This is not to mention the fact that long-distance interactions are also observed between boundary-containing transgenes inserted on different chromosomes.
  
  In fact, given these results, one would logically come to precisely the opposite conclusion. If boundary elements inserted Mbs apart, on different homologs and on different chromosomes can find each other and physically pair, it would be reasonable to think that the same mechanism (likely random collisions) is entirely sufficient when they are only 142 kb apart.
  
  Yet another reason to doubt the involvement or need for cohesin-dependent loop extrusion in bringing the transgene homie in contact with the eve locus comes from the studies of Goel et al. (Goel et al. 2023). They show that cohesin has no role in the formation of TADs in mammalian tissue culture cells. So if TADs in mammals aren’t dependent on cohesin, there would not be a good reason to think at this point that the loops (TADs) that are located between eve and the transgene are generated by, or even strongly dependent on, cohesin-dependent loop extrusion.
  
  It is also important to note that even if loop extrusion were to contribute to chromatin compaction in this context and make the looping interactions that lead to orientation-specific pairing more efficient, the role of loop extrusion in this model is not determinative of the outcome; it is merely a general compaction mechanism. This is a far cry from the popular concept of loop extrusion as being THE driving force determining chromosome topology at the TAD level.
  
  Reviewer #2 (Public Review):
  
  In Bing et al, the authors analyze micro-C data from NC14 fly embryos, focusing on the eve locus, to assess different models of chromatin looping. They conclude that fly TADs are less consistent with conventional cohesin-based loop extrusion models and instead rely more heavily on boundary-boundary pairings in an orientation-dependent manner.
  
  Overall, I found the manuscript to be interesting and thought-provoking. However, this paper reads much more like a perspective than a research article. Considering eLife is aimed at the general audience, I strongly suggest the authors spend some time editing their introduction to the most salient points as well as organizing their results section in a more conventional way with conclusion-based titles. It was very difficult to follow the authors' logic throughout the manuscript as written. It was also not clear as written which experiments were performed as part of this study and which were reanalyzed but published elsewhere. This should be made clearer throughout.
  
  It has been shown several times that Drosophila Hi-C maps do not contain all of the features (frequent corner peaks, stripes, etc.) observed when compared to mammalian cells. Considering these features are thought to be products of extrusion events, it is not an entirely new concept that Drosophila domains form via mechanisms other than extrusion.
  
  (2.1) While there are differences between the Hi-C contact profiles in flies and mammals, these differences likely reflect in large part the bin sizes used to visualize contact profiles. With the exception of Goel et al. (Goel et al. 2023), most of the mammalian Hi-C studies have been low resolution restriction enzyme-based experiments, and required bin sizes of 1 kb or greater to visualize what are labeled as “TADs.” In fact, as shown by experiments in Goel et al., these are not actually TADs, but rather a conglomeration of multiple TADs into a series of TAD neighborhoods. The same is true for the MicroC experiments of Krietenstein et al. and Hsieh et al. on human and mouse tissue culture cells (Hsieh et al. 2020; Krietenstein et al. 2020). This is shown in Author response image 2. In this image, we have compared the MicroC profiles generated from human and mouse tissue culture cells with fly MicroC profiles at different levels of resolution.
  
  For panels A-D, the genomic DNA segments shown are approximately 2.8 Mb, 760 kb, 340 kb, and 190 kb. For panels E-H, the genomic DNA segments shown are approximately 4.7 Mb, 870 kb, 340 kb and 225 kb. For panels I-L, the genomic DNA segments shown are approximately 3 Mb, 550 kb, 290 kb and 175 kb.
  
  As reported for restriction enzyme-based Hi-C experiments, a series of stripes and dots are evident in mammalian MicroC profiles. In the data from Krietenstein et al., two large TAD “neighborhoods” are evident with a bin size of 5 kb, and these are bracketed by 45° stripes (A: black arrows). At 1 kb (panel B), the 45° stripe bordering the neighborhood on the left no longer defines the edge of the neighborhood (blue arrow: panel B), and both stripes become discontinuous (fuzzy dots). At 500 bp (panel C) and 200 bp (panel D) bin sizes, the stripes largely disappear (black arrows) even though they were the most prominent feature in the TAD landscape with large bin sizes. At 200 bp, the actual TADs (as opposed to the forest) are visible, but weakly populated. There are no stripes, and only one of the TADs has an obvious “dot” (green asterisk: panel C).
  
  Author response image 2.
  
  Mammalian MicroC profiles at different bin sizes.
  
  Large TAD neighborhoods bordered by stripes are also evident in the Hsieh et al. data set in Author response image 2 panels E and F (black arrows in E and F and green arrow in F). At 400 bp resolution (panel G), the narrow stripe in panel F (black arrows) becomes much broader, indicating that it is likely generated by interactions across one or two small TADs that can be discerned at 200 bp resolution. The same is true for the broad stripe indicated by the green arrows in panels F, G and H. This stripe arises from contacts between the TADs indicated by the red bar in panels G and H and the TADs to the other side of the volcano triangle with a plume (blue arrow in panel H). As in flies, we would expect that this volcano triangle topped by a plume corresponds to a stem-loop. However, the resolution is poor at 200 bp, and the profiles of the neighboring TADs are not very distinct.
  
  For the fly data set, stripes can be discerned when analyzed at 800 bp resolution (see arrows in Author response image 3); however, these stripes are flanked by regions of lower contact, and represent TAD-TAD interactions. At 400 bp, smaller neighborhoods can be discerned, and these neighborhoods exhibit a complex pattern of interaction with adjacent neighborhoods. With bin sizes of 200 bp, individual TADs are observed, as are TAD-TAD interactions like those seen near eve. Some of the TADs have dots at their apex, while others do not—much like what is seen in the mammalian MicroC studies.
  
  Author response image 3.
  
  Mammalian MicroC profiles at different bin sizes.
  
  Stripes: As illustrated in Author response image 2 A-D and E-H, the continuous stripes seen in low resolution mammalian studies (>1 kb bins) would appear to arise from binning artefacts. At high resolution where single TADs are visible, the stripes seem to be generated by TAD-TAD interactions, and not by some type of “extrusion” mechanism. This is most clearly seen for the volcano with plume TAD in Author response image 2 G and H. While stripes in Author response image 2 disappear at high resolution, this is not always true. There are stripes that appear to be “real” in Goel et al. 2023 for the TADs in the Ppm1g region, and in Author response image 1 for the Abd-B regulatory domain TADs. Since the stripes in the Ppm1g region are unaffected by Rad21 depletion, some other mechanism must be involved (c.f. (Shidlovskii et al. 2021)).
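
  The binning argument can be illustrated with a toy calculation. The sketch below is not the analysis pipeline used for the author response images; it is a minimal Python illustration, with invented counts and a hypothetical `rebin` helper, of how three well-separated point contacts in a fine-binned map merge into an unbroken run of filled bins once the bin size is increased fivefold (for example from 200 bp to 1 kb).

```python
# Toy illustration only: counts and bin positions are invented, not MicroC data.

def rebin(matrix, factor):
    """Sum a square contact matrix into bins that are `factor` times larger."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of factor")
    m = n // factor
    coarse = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            coarse[i // factor][j // factor] += matrix[i][j]
    return coarse


def toy_map(n_bins=20, anchor=2, partners=(7, 12, 17), count=10):
    """Symmetric map in which one anchor bin contacts a few distant bins."""
    m = [[0] * n_bins for _ in range(n_bins)]
    for p in partners:
        m[anchor][p] = count
        m[p][anchor] = count
    return m


fine = toy_map()          # think of each bin as 200 bp
coarse = rebin(fine, 5)   # think of each bin as 1 kb

# Columns with signal in the anchor row, before and after rebinning.
fine_row = [j for j, v in enumerate(fine[2]) if v > 0]
coarse_row = [j for j, v in enumerate(coarse[0]) if v > 0]

print(fine_row)    # [7, 12, 17]  -> isolated contacts separated by empty bins
print(coarse_row)  # [1, 2, 3]    -> contiguous bins that read as a stripe
```

  In the fine map the anchor row has contacts only at bins 7, 12 and 17, with empty bins in between; after rebinning, the corresponding row is filled at every coarse bin from 1 to 3, which is how discrete TAD-TAD contacts could come to look like a continuous stripe at large bin sizes.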
  
  Dots: The high resolution images of mammalian MicroC experiments in Author response image 2D and H show that, like Drosophila (Author response image 3L), mammalian TADs don’t always have a “dot” at the apex of the triangle. This is not surprising. In the MicroC procedure, fixed chromatin is digested to mononucleosomes with MNase. Since most TAD boundaries in flies, and presumably also in mammals, are relatively large (150-400 bp) nuclease hypersensitive regions, extensive MNase digestion will typically reduce the boundary element sequences to oligonucleotides.
  
  In flies, the only known sequences (at least to date) that end up giving dots (like those seen in Author response image 1) are bound by a large (>1,000 kd) GAF-containing multiprotein complex called LBC. In the Abd-B region of BX-C, LBC binds to two ~180 bp sequences in Fab-7 (dHS1 and HS3; Kyrchanova et al. 2018; Wolle et al. 2015), and to the centromere proximal (CP) side of Fab-8. The LBC elements in Fab-7 (dHS1) and Fab-8 (CP) have both blocking and boundary bypass activity (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018). Elsewhere, LBC binds to the bx and bxd PREs in the Ubx regulatory domains, to two PREs upstream of engrailed, to the hsp70 promoter, the histone H3-H4 promoters, and the eve promoter (unpublished data). Based on ChIP signatures, it likely binds to most PREs/tethering elements in the fly genome (Batut et al. 2022; Li et al. 2023). Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that LBC protects an ~150-180 bp DNA segment from MNase digestion, which would explain why LBC-bound sequences are able to generate dots in MicroC experiments. Also unlike typical boundary elements, the pairing interactions of the LBC elements we’ve tested appear to be orientation-independent (unpublished data).
  
  The difference in MNase sensitivity between typical TAD boundaries and LBC-bound elements is illustrated in the MicroC of the Leukocyte-antigen-related-like (Lar) meta-loop in Author response image 4 panels A and B. Direct physical pairing of two TAD boundaries (blue and purple) brings two TADs encompassing the 125 kb Lar gene into contact with two TADs in a gene poor region 620 kb away. This interaction generates two regions of greatly enhanced contact: the two boxes on either side of the paired boundaries (panel A). Note that like transgene homie pairing with the eve boundaries, the boundary pairing interaction that forms the Lar meta-loop is orientation-dependent. In this case the TAD boundary in the Lar locus pairs with the TAD boundary in the gene poor region head-to-head (arrow tip to arrow tip), generating a circle-loop. This circle-loop configuration brings the TAD upstream of the blue boundary into contact with the TAD upstream of the purple boundary. Likewise, the TAD downstream of the blue boundary is brought into contact with the TAD downstream of the purple boundary.
  
  In the MicroC procedure, the sequences that correspond to the paired boundaries are not recovered (red arrow in Author response image 4 panel B). This is why there are vertical and horizontal blank stripes (red arrowheads) emanating from the missing point of contact. Using a different HiC procedure (dHS-C) that allows us to recover sequences from typical boundary elements (Author response image 4 panels C and D), there is a strong “dot” at the point of contact which corresponds to the pairing of the blue and purple boundaries.
  
  There is a second dot (green arrow) within the box that represents physical contacts between sequences in the TADs downstream of the blue and purple boundaries. This dot is resistant to MNase digestion and is visible both in the MicroC and dHS-C profiles. Based on the ChIP signature of the corresponding elements in the two TADs downstream of the blue and purple boundaries, this dot represents paired LBC elements.
  
  Author response image 4.
  
  Lar meta-loop. Panels A & B: MicroC. Panels C & D: dHS-C.
  
  That being said, the authors' analyses do not distinguish between the formation and the maintenance of domains. It is not clear to this reviewer why a single mechanism should explain the formation of the complex structures observed in static Hi-C heatmaps from a population of cells at a single developmental time point. For example, how can the authors rule out that extrusion initially provides the necessary proximity and possibly the cis preference of contacts required for boundary-boundary pairing whereas the latter may more reflect the structures observed at maintenance?
  
  (2.2) The MicroC profiles shown in Fig. 2 of our paper were generated from nuclear cycle (NC) 14 embryos. NC14 is the last nuclear cycle before cellularization (Foe 1989). After the nuclei exit mitosis, S-phase begins, and because satellite sequences are late replicating in this nuclear cycle, S phase lasts 50 min instead of only 4-6 min during earlier cycles (Shermoen et al. 2010). So unlike MicroC studies in mammals, our analysis of chromatin architecture in NC14 embryos likely offers the best opportunity to detect any intermediates that are generated during TAD formation. In particular, we should be able to observe evidence of cohesin linking the sequences from the two extruding strands together (the stripes) as it generates TADs de novo. However, there are no vertical stripes in the eve TAD as would be expected if cohesin entered at a few specific sites somewhere within the TAD and extruded loops in opposite directions synchronously, nor are there stripes at 45° as would be expected if it started at nhomie or homie (see Figure Supplemental 1). We also do not detect cohesin-generated stripes in any of the TADs in between eve and the attP site at -142 kb. Note that in some models, cohesin is thought to be continuously extruding loops. After hitting the CTCF roadblocks, cohesin either falls off after a short period and starts again or it breaks through one or more TAD boundaries generating the LDC domains. In this dynamic model, stripes of crosslinked DNA generated by the passing cohesin complex should be observed throughout the cell cycle. They are not.
  
  As for formation versus maintenance, and the possible involvement of cohesin loop extrusion in the former, but not the latter: This question was indirectly addressed in point #1.2 above. In this point we described multiple examples of specific boundary:boundary pairing interactions that take place over Mbs, in cis and in trans and even between different chromosomes. These long-distance interactions don’t preexist; instead they must be established de novo and then maintained. This process was actually visualized in the studies of Lim et al. (Lim et al. 2018) on the establishment of trans boundary pairing interactions in NC14 embryos. There is no conceivable mechanism by which cohesin-based loop extrusion could establish the long or short distance trans interactions that have been documented in many studies on fly boundary elements. Also as noted above, it seems unlikely that it is necessary for long-range interactions in cis.
  
  A more plausible scenario is that cohesin entrapment helps to stabilize these long-distance interactions after they are formed. If this were true, then one could argue that cohesin might also function to maintain TADs after boundaries have physically paired with their neighbors in cis. However, the Rad21 depletion experiments of Goel et al. (Goel et al. 2023) would rule out an essential role for cohesin in maintaining TADs after boundary:boundary pairing. In short, while we cannot formally rule out that loop extrusion might help bring sequences closer together to increase their chance of pairing, neither the specificity of that pairing, nor its orientation can be explained by loop extrusion. Furthermore, since pairing in trans cannot be facilitated by loop extrusion, invoking it as potentially important for boundary-boundary pairing in cis can only be described as a potential mechanism in search of a function, without clear evidence in its favor.
  
  On the other hand, the apparent loss of contacts between TADs within large multi-TAD neighborhoods (Goel et al. 2023) would suggest that there is some sort of decompaction of neighborhoods after Rad21 depletion. It is possible that this might stress interactions that span multiple TADs as is the case for homie at -142, or for the other examples described in #1.2 above. This kind of involvement of cohesin might or might not be associated with a loop extrusion mechanism.
  
  Future work aimed at analyzing micro-C data in cohesin-depleted cells might shed additional light on this.
  
  (2.3) This experiment has been done by Goel et al. (Goel et al. 2023) in mammalian tissue culture cells. They found that TADs, as well as local TAD neighborhoods, are not disrupted/altered by Rad21 depletion (see Goel et al. 2023 and our response to point #1.1 of reviewer #1).
  
  Additional mechanisms at play include compartment-level interactions driven by chromatin states. Indeed, in mammalian cells, these interactions often manifest as a "plume" on Hi-C maps similar to what the authors attribute to boundary interactions in this manuscript. How do the chromatin states in the neighboring domains of the eve locus impact the model if at all?
  
  (2.4) Chromatin states have been implicated in driving compartment level interactions.
  
  Compartments as initially described were large, often Mb sized, chromosomal segments that “share” similar chromatin marks/states, and are thought to merge via co-polymer segregation. They were visualized using large multi-kb bin sizes. In the studies reported here, we use bin sizes of 200 bp to examine a DNA segment of less than 200 kb which is subdivided into a dozen or so small TADs. Several of the TADs contain more than one transcription unit, and they are expressed in quite different patterns, and thus might be expected to have different “chromatin states” at different points in development and in different cells in the organism. However, as can be seen by comparing the MicroC patterns in our paper that are shown in Fig. 2 with Fig. 7, Figure Supplemental 5 and Figure Supplemental 6, the TAD organization in NC14 and 12-16 hr embryos is for the most part quite similar. There is no indication that these small TADs are participating in liquid phase compartmentalization that depends upon shared chromatin/transcriptional states in NC14 and then again in 12-16 hr embryos.
  
  In NC14 embryos, eve is expressed in 7 stripes, while it is potentially active throughout much of the embryo. In fact, the initial pattern in early cycles is quite broad and is then refined during NC14. In 12-16 hr embryos, the eve gene is silenced by the PcG system in all but a few cells in the embryo. However, here again the basic structure of the TAD, including the volcano plume, looks quite similar at these different developmental stages.
  
  As for the suggestion that the plume topping the eve volcano triangle is generated because the TADs flanking the eve TAD share chromatin states and coalesce via some sort of phase separation:
  
  This model has been tested directly in Ke et al. (Ke et al. 2024). In Ke et al., we deleted the nhomie boundary and replaced it with either nhomie in the reverse orientation or homie in the forward orientation. According to the compartment model, changing the orientation of the boundaries so that the topology of the eve TAD changes from a stem-loop to a circle-loop should have absolutely no effect on the plume topping the eve volcano triangle. The TADs flanking the eve TAD would still be expected to share the same chromatin states and would still be able to coalesce via phase transition. However, this is not what is observed. The plume disappears and is replaced by “clouds” on both sides of the eve TAD. The clouds arise because the eve TAD bumps into the neighboring TADs when the topology is a circle-loop.
  
  We would also note that “compartment-level” interactions would not explain the findings presented in Muller et al. 1999, in Table 1 or in Author response image 4. It is clear that the long-distance (Mb) interactions observed for Mcp, gypsy, Fab-7, homie, nhomie and the blue and purple boundaries in Author response image 4 arise by the physical pairing of TAD boundary elements. This fact is demonstrated directly by the MicroC experiments in Fig. 7 and Fig Supplemental 4 and 5, and by the MicroC and dHS-C experiments in Author response image 4. There is no evidence for any type of “compartment/phase separation” driving these specific boundary pairing interactions.
  
  In fact, given the involvement of TAD boundaries in meta-loop formation, one might begin to wonder whether some of the “compartment level interactions” are generated by the specific pairing of TAD boundary elements rather than by “shared chromatin” states. For example, the head-to-head pairing of the blue and purple boundaries generates a Lar meta-loop that has a circle-loop topology. As a consequence, sequences upstream of the blue and purple boundary come into contact, generating the small dark rectangular box on the upper left side of the contact map. Sequences downstream of the blue and purple boundary also come into contact, and this generates the larger rectangular box in the lower right side of the contact map. A new figure, Fig. 9, shows that the interaction pattern flips (lower left and top right) when the meta-loop has a stem-loop topology. If these meta-loops are visualized using larger bin sizes, the classic “compartment” patchwork pattern of interactions emerges. Would the precise patchwork pattern of “compartmental” interactions involving the four distant TADs that are linked in the two meta-loops shown in Fig. 9 persist as is if we deleted one of the TAD boundaries that forms the meta-loop? Would the precise patchwork pattern persist if we inverted one of the meta-loop boundaries so that we converted the topology of the loop from a circle-loop to a stem-loop or vice versa? We haven’t used MicroC to compare the compartment organization after deleting or inverting a meta-loop TAD boundary; however, a comparison of the MicroC pattern in WT in Fig. 1C with that for the homie transgenes in Fig. 7 and Figs. Supplemental 5, 6 and 7 indicates a) that novel patterns of TAD:TAD interactions are generated by this homie dependent mini-meta-loop and b) that the patterns of TAD:TAD interactions depend upon loop topology.
  Were these novel TAD:TAD interactions generated instead by compartment level interactions/shared chromatin states, they should be evident in WT as well (Fig. 1). They are not.
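
  The relationship between pairing orientation, loop topology and the flanking regions that end up in contact can be written as a small lookup. The Python sketch below is illustrative only: the function names are hypothetical, the head-to-head case follows the Lar meta-loop description above, and the head-to-tail (stem-loop) case assumes that the "flipped" pattern corresponds to upstream sequences of one boundary contacting downstream sequences of the other.

```python
# Illustrative lookup; names and the stem-loop contact assignment are assumptions.

def loop_topology(orientation):
    """Map the pairing orientation of two boundaries to a loop topology."""
    if orientation == "head-to-head":
        return "circle-loop"
    if orientation == "head-to-tail":
        return "stem-loop"
    raise ValueError(f"unknown orientation: {orientation!r}")


def contacting_flanks(topology):
    """Return (flank of boundary 1, flank of boundary 2) pairs brought together."""
    if topology == "circle-loop":
        # Upstream meets upstream, downstream meets downstream.
        return [("upstream", "upstream"), ("downstream", "downstream")]
    if topology == "stem-loop":
        # Assumed flipped pattern: upstream meets downstream and vice versa.
        return [("upstream", "downstream"), ("downstream", "upstream")]
    raise ValueError(f"unknown topology: {topology!r}")


for orientation in ("head-to-head", "head-to-tail"):
    topology = loop_topology(orientation)
    print(orientation, "->", topology, contacting_flanks(topology))
```

  Under these assumptions, inverting one boundary switches the orientation, and with it both the topology and the set of flanking TADs that are juxtaposed, which is the dependence on loop topology that a shared-chromatin-state model would not predict.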
  
  How does intrachromosomal homolog pairing impact the models proposed in this manuscript (Abed et al. 2019; Erceg et al., 2019)? Several papers recently have shown that somatic homolog pairing is not uniform and shows significant variation across the genome with evidence for both tight pairing regions and loose pairing regions. Might loose pairing interactions have the capacity to alter the cis configuration of the eve locus?
  
  (2.5) At this point it is not entirely clear how homolog pairing impacts the cis configuration/MicroC contact maps. We expect that homolog pairing is incomplete in the NC14 embryos we analyzed; however, since replication of eve and the local neighborhood is likely complete, sister chromosomes should be paired. So we are likely visualizing the 3D organization of paired TADs.
  
  In summary, the transgenic experiments are extensive and elegant and fully support the authors' models. However, in my opinion, they do not completely rule out additional models at play, including extrusion-based mechanisms. Indeed, my major issue is the limited conceptual advance in this manuscript. The authors essentially repeat many of their previous work and analyses.
  
  (2.6) In our view, the current paper makes a number of significant contributions that go well beyond those described in our 2016 publication. These are summarized below.
  
  A) While our 2016 paper used transgenes inserted in the -142 kb attP site to study pairing interactions of homie and nhomie, we neither considered nor discussed how our findings might bear on the loop extrusion model. However, since the loop extrusion model is currently accepted as established fact by many labs working on chromosome structure, it is critically important to devise experimental approaches which test the predictions of this particular model. One approach would be to deplete cohesin components; however, as discussed in #1.1, our experimental system is not ideal for this type of approach. On the other hand, there are other ways to test the extrusion model. Given the mechanism proposed for TAD formation (extruding a loop until cohesin bumps into CTCF/boundary road blocks), it follows that only two types of loop topologies are possible: stem-loop and unanchored loop. The loop extrusion model, as currently conceived, can’t account for the two cases in this study in which the reporter on the wrong side of the homie boundary from the eve locus is activated by the eve enhancers. In contrast, our findings are completely consistent with orientation-specific boundary:boundary pairing.
  
  B) In the loop extrusion model, cohesin embraces both of the extruded chromatin fibers, transiently bringing them into close proximity. As far as we know, there have been no (high resolution) experiments that have actually detected these extruding cohesin complexes during TAD formation. In order to have a chance of observing the expected signatures of extruding cohesin complexes, one would need a system in which TADs are being formed. As described in the text, this is why we used MicroC to analyze TADs in NC14 embryos. We do not detect the signature stripes that would be predicted (see Figure Supp 2) by the current version of the loop extrusion model.
  
  C) Reporter expression in the different -142 kb transgenes provides only an indirect test of the loop extrusion and boundary:boundary pairing models for TAD formation. The reporter expression results need to be confirmed by directly analyzing the pattern of physical interactions in each instance. While we were able to detect contacts between the transgenes and eve in our 2016 paper, the 3C experiments provided no information beyond that. By contrast, the MicroC experiments in the current paper give high resolution maps of the physical contacts between the transgene and the eve TAD. The physical contacts track completely with reporter activity. Moreover, just as is the case for reporter activity, the observed physical interactions are inconsistent with the loop extrusion model.
  
  D) Genetic studies in Muller et al. (Muller et al. 1999) and imaging in Vazquez et al. (Vazquez et al. 2006) suggested that more than two boundaries can participate in pairing interactions. Consistent with these earlier observations, viewpoint analysis indicates the transgene homie interacts with both eve boundaries. While this could be explained by transgene homie alternating between nhomie and homie in the eve locus, this would require the remodeling of the eve TAD each time the pairing interaction switched between the three boundary elements. Moreover, two out of the three possible pairing combinations would disrupt the eve TAD, generating an unanchored loop (c.f., the lambda DNA TAD in Ke et al., (Ke et al. 2024)). However, the MicroC profile of the eve TAD is unaffected by transgenes carrying the homie boundary. This would suggest that like Mcp, the pairing interactions of homie and nhomie might not be exclusively pairwise. In this context it is interesting to compare the contact profiles of the Lar meta-loop shown in Author response image 4 with the different -142 kb homie inserts. Unlike the homie element at -142 kb, there is clearly only a single point of contact between the blue and purple boundaries.
  
  E) Chen et al. (Chen et al. 2018) used live imaging to link physical interactions between a homie containing transgene inserted at -142 kb and the eve locus to reporter activation by the eve enhancers. They found that the reporter was activated by the eve enhancers only when it was in “close proximity” to the eve gene. “Close proximity” in this case was 331 nm. This distance is equivalent to ~1.1 kb of linear duplex B form DNA, or ~30 nucleosome core particles lined up in a row. It would not be possible to ligate two DNAs wrapped around nucleosome core particles that are located 330 nm apart in a fixed matrix. Since our MicroC experiments were done on embryos in which the gene is silent in the vast majority of cells, it is possible that the homie transgene only comes into close enough proximity for transgene nucleosome: eve nucleosome ligation events when the eve gene is off. Alternatively, and clearly more likely, distance measurements using imaging procedures that require dozens of fluorescent probes may artificially inflate the distance between sequences that are actually close enough for enzymatic ligation.
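
  The scale of this distance can be checked with a short back-of-the-envelope calculation. The Python sketch below assumes textbook values that are not taken from the paper: a rise of about 0.34 nm per base pair for B-form DNA and a nucleosome core particle diameter of about 11 nm. With these assumed constants the result is on the order of 1 kb of linear DNA and about 30 core particles; the exact DNA length depends on the rise per base pair that is assumed.

```python
# Assumed textbook constants (not values reported in the paper).
RISE_NM_PER_BP = 0.34          # axial rise per base pair of B-form DNA
NUCLEOSOME_DIAMETER_NM = 11.0  # approximate core particle diameter


def nm_to_bp(distance_nm):
    """Length of straight B-form DNA (in bp) spanning `distance_nm`."""
    return distance_nm / RISE_NM_PER_BP


def nm_to_nucleosomes(distance_nm):
    """Number of core particles lined up in a row spanning `distance_nm`."""
    return distance_nm / NUCLEOSOME_DIAMETER_NM


distance_nm = 331.0  # "close proximity" reported by Chen et al. 2018
print(f"{nm_to_bp(distance_nm) / 1000:.2f} kb of linear duplex DNA")
print(f"{nm_to_nucleosomes(distance_nm):.0f} nucleosome core particles in a row")
```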
  
  F) The findings reported in Goel et al. (Goel et al. 2023) indicate that mammalian TADs don’t require cohesin activity; however, the authors do not provide an alternative mechanism for TAD formation/stability. Here we have suggested a plausible mechanism.
  
  The authors make no attempt to dissect the mechanism of this process by modifying extrusion components directly.
  
  (2.7) See point #1.1
  
  Some discussion of Rollins et al. on the discovery of Nipped-B and its role in enhancer-promoter communication should also be made to reconcile their conclusions in the proposed absence of extrusion events.
  
  (2.8) The reason why reducing Nipped-B activity enhances the phenotypic effects of gypsy-induced mutations is not known at this point; however, the findings reported in Rollins et al. (Rollins et al. 1999) would appear to argue against an extrusion mechanism for TAD formation.
  
  Given what we know about enhancer blocking and TADs, there are two plausible mechanisms for how the Su(Hw) element in the gypsy transposon blocks enhancer-promoter interactions in the gypsy-induced mutants studied by Rollins et al. First, the Su(Hw) element could generate two new TADs through pairing interactions with boundaries in the immediate neighborhood. This would place the enhancers in one TAD and the target gene in another TAD. Alternatively, the studies of Sigrist and Pirrotta (Sigrist and Pirrotta 1997) as well as several publications from Victor Corces’ lab raise the possibility that the Su(Hw) element in gypsy-induced mutations is pairing with gypsy transposons inserted elsewhere in the genome. This would also isolate enhancers from their target genes. In either case, the loss of Nipped-B activity increases the mutagenic effects of the Su(Hw) element presumably by strengthening its boundary function. If this is due to a failure to load cohesin on to chromatin, this would suggest that cohesin normally functions to weaken the boundary activity of the Su(Hw) element, i.e., disrupting the ability of Su(Hw) elements to interact with either other boundaries in the neighborhood or with themselves. Were this a general activity of cohesin (to weaken boundary activity), one would imagine that cohesin normally functions to disrupt TADs rather than generate/stabilize TADs.
  
  An alternative model is that Nipped-B (and thus cohesin) functions to stabilize enhancer-promoter interactions within TADs. In this case, loss of Nipped-B would result in a destabilization of the weak enhancer:promoter interactions that can still be formed when gypsy is located between the enhancer and promoter. In this model the loss of these weak interactions in Nipped-B mutants would appear to increase the “blocking” activity of the gypsy element. However, this alternative model would also provide no support for the notion that Nipped-B and cohesin function to promote TAD formation.
  
  Reviewer #3 (Public Review):
  
  Bing et al. attempt to address fundamental mechanisms of TAD formation in Drosophila by analyzing gene expression and 3D conformation within the vicinity of the eve TAD after insertion of a transgene harboring a Homie insulator sequence 142 kb away in different orientations. These transgenes along with spatial gene expression analysis were previously published in Fujioka et al. 2016, and the underlying interpretations regarding resulting DNA configuration in this genomic region were also previously published. This manuscript repeats the expression analysis using smFISH probes in order to achieve more quantitative analysis, but the main results are the same as previously published. The only new data are the Micro-C and an additional modeling/analysis of what they refer to as the 'Z3' orientation of the transgenes. The rest of the manuscript merely synthesizes further interpretation with the goal of addressing whether loop extrusion may be occurring or if boundary:boundary pairing without loop extrusion is responsible for TAD formation. The authors conclude that their results are more consistent with boundary:boundary pairing and not loop extrusion; however, most of this imaging data seems to support both loop extrusion and the boundary:boundary models. This manuscript lacks support, especially new data, for its conclusions.
  
  (3.1) The new results/contributions of our paper are described in #2.6 above.
  
  Although there are (two) homie transgene configurations that give expression patterns that would be consistent with the loop extrusion model, that is not quite the same as strong evidence supporting loop extrusion. On the contrary, key aspects of the expression data are entirely inconsistent with loop extrusion, and they thus rule out the possibility that loop extrusion is sufficient to explain the results. Moreover, the conclusions drawn from the expression patterns of the four transgenes are backed up by the MicroC contact profiles, which are also not consistent with the loop extrusion model. Further, as documented above, loop extrusion is not only unable to explain the findings reported in this manuscript, but also the results from a large collection of published studies on fly boundaries. Since all of these boundaries function in TAD formation, there is little reason to think that loop extrusion makes a significant contribution at the TAD level in flies. Given the results reported by Goel et al. (Goel et al. 2023), one might also have doubts about the role of loop extrusion in the formation/maintenance of mammalian TADs.
  
  To further document these points, we’ve included a new figure (Fig. 9) that shows two meta-loops. Like the loops seen for homie-containing transgenes inserted at -142 kb, meta-loops are formed by the pairing of distant fly boundaries. As only two boundaries are involved, the resulting loop topologies are simpler than those generated when transgene homie pairs with nhomie and homie in the eve locus. The meta-loop in panel B is a stem-loop. While a loop with this topology could be formed by loop extrusion, cohesin would have to break through dozens of intervening TAD boundaries and then somehow know to come to a halt at the blue boundary on the left and the purple boundary on the right. However, none of the mechanistic studies on either cohesin or the mammalian CTCF roadblocks have uncovered activities of either the cohesin complex or the CTCF roadblocks that could explain how cohesin would be able to extrude hundreds of kb and ignore dozens of intervening roadblocks, and then stop only when it encounters the two boundaries that form the beat-IV meta-loop. The meta-loop in panel A is even more problematic in that it is a circle-loop, a topology that can’t be generated by cohesin extruding a loop until it comes into contact with CTCF roadblocks on the extruded strands.
  
  Furthermore, there are many parts of the manuscript that are difficult to follow. There are some minor errors in the labelling of the figures that if fixed would help elevate understanding. Lastly, there are several major points that if elaborated on, would potentially be helpful for the clarity of the manuscript.
  
  Major Points:
  
  (1) The authors suggest and attempt to visualize in the supplemental figures, that loop extrusion mechanisms would appear during crosslinking and show as vertical stripes in the micro-C data. In order to see stripes, a majority of the nuclei would need to undergo loop extrusion at the same rate, starting from exactly the same spots, and the loops would also have to be released and restarted at the same rate. If these patterns truly result from loop extrusion, the authors should provide experimental evidence from another organism undergoing loop extrusion.
  
  (3.2) We don’t know of any reports that actually document cohesin extrusion events that are forming TADs (TADs as defined in our paper, in the RCMC experiments of Goel et al. (Goel et al. 2023), in response #1.1, or in the high-resolution images from the MicroC data of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020)). However, an extruding cohesin complex would be expected to generate stripes because it transiently brings together the two chromatin strands as illustrated by the broken zipper in Figure Supplemental 2 of our paper. While stripes generated by cohesin forming a TAD have not to our knowledge ever been observed, Fig. 4 in Goel et al. (Goel et al. 2023) shows 45° stripes outlining TADs and connecting neighboring TADs. These stripes are visible with or without Rad21.
  
  In some versions of the loop extrusion model, cohesin extrudes a loop until it comes to a halt at both boundaries, where it then remains holding the loop together. In this model, the extrusion event would occur only once per cell cycle. This is the reason we selected NC14 embryos, as this point in development should provide by far the best opportunity to visualize cohesin-dependent TAD formation. However, the expected stripes generated by cohesin embrace of both strands of the extruding loop were not evident. Other newer versions of the loop extrusion model are much more dynamic: cohesin extrudes the loop, coming to a halt at the two boundaries, but either doesn’t remain stably bound or breaks through one or both boundaries. In the former case, the TAD needs to be reestablished by another extrusion event, while in the latter case LDC domains are generated. In this dynamic model, we should also be able to observe vertical and 45° stripes (or stripes leaning to one side or another of the loading site if the extrusion rates aren’t equal on both fibers) in NC14 embryos corresponding to the formation of TADs and LDC domains. However, we don’t.
  
  (2) On lines 311-314, the authors discuss that stem-loops generated by cohesin extrusion would possibly be expected to have more next-next-door neighbor contacts than next-door neighbor contacts and cite their models in Figure 1. Based on the boundary:boundary pairing models in the same figure, would the stem-loops created by head-to-tail pairing also have the same phenotype, making enrichment of next-next-door neighbor contacts possible in both situations? The concepts in the text are not clear, and the diagrams are not well-labeled relative to the two models.
  
  (3.3) Yes, we expect that stem-loops formed by cohesin extrusion or head-to-tail pairing would behave in a similar manner. They could be stem-loops separated by unanchored loops as shown in Fig. 1B and E. Alternatively, adjacent loops could be anchored to each other (by cohesin/CTCF road blocks or by pairing interactions) as indicated in Fig. 1C and F. In stem-loops generated either by cohesin extrusion or by head-to-tail pairing, next-next door neighbors should interact with each other, generating a plume above the volcano triangle. In the case of circle-loops, the volcano triangle should be flanked by clouds that are generated when the TAD bumps into both next-door neighbors. In the accompanying paper, we test this idea by deleting the nhomie boundary and then a) inserting nhomie back in the reverse orientation, or b) by inserting homie in the forward orientation. The MicroC patterns fit with the predictions that were made in this paper.
  
  (3) The authors appear to cite Chen et al., 2018 as a reference for the location of these transgenes being 700 nm away in a majority of the nuclei. However, the exact transgenes in this manuscript do not appear to have been measured for distance. The authors could do this experiment and include expression measurements.
  
  (3.4) The transgenes used in Chen et al. are modified versions of a transgene used in Fujioka et al. (2016) inserted into the same attP site. When we visualize reporter transcription in NC14 embryos driven by the eve enhancers using smFISH, HCR-FISH or DIG, only a subset of the nuclei at this stage are active. The number of active nuclei we detect is similar to that observed in the live imaging experiments of Chen et al. The reason we cited Chen et al. (Chen et al. 2018) was that they found that proximity was a critical factor in determining whether the reporter was activated or not in a given nucleus. The actual distance they measured wasn’t important. Moreover, as we discussed in response #2.6 above, there are good reasons to think that the “precise” distances measured in live imaging experiments like those used in Chen et al. are incorrect. However, their statements are certainly correct if one considers that a distance of ~700 nm or so is “more distant” relative to a distance of ~300 nm or so, which is “closer.”
  
  (4) The authors discuss the possible importance of CTCF orientation in forming the roadblock to cohesin extrusion and discuss that Homie orientation in the transgene may impact Homie function as an effective roadblock. However, the Homie region inserted in the transgene does not contain the CTCF motif. Can the authors elaborate on why they feel the orientation of Homie is important in its ability to function as a roadblock if the CTCF motif is not present? Trans-acting factors responsible for Homie function have not been identified and this point is not discussed in the manuscript.
  
  We discussed the “importance” of CTCF orientation in forming roadblocks because one popular version of the cohesin loop extrusion/CTCF roadblock model postulates that CTCF must be oriented so that the N-terminus of the protein is facing towards the oncoming cohesin complex, otherwise it won’t be able to halt extrusion on that strand. When homie in the transgene is pointing towards the eve locus, the reporter on the other side (farther from eve) is activated by the eve enhancers. One possible way to explain this finding (if one believes the loop extrusion model) is that when homie is inverted, it can’t stop the oncoming cohesin complex, and it runs past the homie boundary until it comes to a stop at a properly oriented boundary farther away. In this case, the newly formed loop would extend from the boundary that stopped cohesin to the homie boundary in the eve locus, and would include not only the distal reporter, but also the proximal reporter. If both reporters are in the same loop with the eve enhancers (which they would have to be given the mechanism of TAD formation by loop extrusion), both reporters should be activated. They are not.
  
  For the boundary pairing model, the reporter that will be activated will depend upon the orientation of the pairing interaction, which can be either head-to-head or head-to-tail (or both: see discussion of LBC elements in #2.1). For an easy visualization of how the orientation of pairing interactions is connected to the patterns of interactions between sequences neighboring the boundary, please look at Fig. 9. This figure shows two different meta-loops. In panel A, head-to-head pairing of the blue and purple boundaries brings together, on the one hand, sequences upstream of the blue and purple boundaries, and on the other hand, sequences downstream of the blue and purple boundaries. In the circle-loop configuration, the resulting rectangular boxes of enhanced contact are located in the upper left and lower right of the contact map. In panel B, the head-to-tail pairing of the blue and purple boundaries changes how sequences upstream and downstream of the blue and purple boundaries interact with each other. Sequences upstream of the blue boundary interact with sequences downstream of the purple boundary, and this gives the rectangular box of enhanced interactions on the top right. Sequences downstream of the blue boundary interact with sequences upstream of the purple boundary, and this gives the rectangular box of enhanced contact on the lower left.
  
  CTCF: Our analysis of the homie boundary suggests that CTCF contributes little to its activity. It has an Su(Hw) recognition sequence and a CP190 “associated” sequence. Mutations in both compromise boundary activity (blocking and -142 kb pairing). Gel shift experiments and ChIP data indicate there are half a dozen or more additional proteins that associate with the 300 bp homie fragment used in our experiments.
  
  Orientation of CTCF or other protein binding sites: The available evidence suggests that orientation of the individual binding sites is not important (Kyrchanova et al. 2016; Lim et al. 2018). Instead, it is likely that the order of binding sites affects function.
  
  (5) The imaging results seem to be consistent with both boundary:boundary interaction and loop extrusion stem looping.
  
  It is not clear whether the reviewer is referring to the different patterns of reporter expression— which clearly don’t fit with the loop extrusion model in the key cases that distinguish the two models—or the live imaging experiments in Chen et al. (Chen et al. 2018).
  
  (6) The authors suggest that the eveMa TAD could only be formed by extrusion after the breakthrough of Nhomie and several other roadblocks. Additionally, the overall long-range interactions with Nhomie appear to be less than the interactions with endogenous Homie (Figures 7, 8, and supplemental 5). Is it possible that in some cases boundary:boundary pairing is occurring between only the transgenic Homie and endogenous Homie and not including Nhomie?
  
  Yes, it is possible. On the other hand, the data that are currently available support the idea that transgene homie usually interacts with endogenous homie and nhomie at the same time. This is discussed in #2.6D above. The viewpoints indicate that crosslinking occurs more frequently to homie than to nhomie. This could indicate that when there are only pairwise interactions, these tend to be between homie and homie. Alternatively, this could also be explained by a difference in relative crosslinking efficiency.
  
  (7) In Figure 4E, the GFP hebe expression shown in the LhomieG Z5 transgenic embryo does not appear in the same locations as the LlambdaG Z5 control. Is this actually hebe expression or just a background signal?
  
  The late-stage embryos shown in E are oriented differently. For GlambdaL, the embryo is oriented so that hebe-like reporter expression on the ventral midline is readily evident. However, this orientation is not suitable for visualizing eve enhancer-dependent expression of the reporters in muscle progenitor cells. For this reason, the 12-16 hr GeimohL embryo in E is turned so that the ventral midline isn’t readily visible in most of the embryo. As is the case in NC14 embryos, the eve enhancers drive lacZ but not gfp expression in the muscle progenitor cells.
  
  (8) Figure 6- The LhomieG Z3 (LeimohG) late-stage embryo appears to be showing the ventral orientation of the embryo rather than the lateral side of the embryo as was shown in the previous figure. Is this for a reason? Additionally, there are no statistics shown for the Z3 transgenic images.
  
  Were these images analyzed in the same way as the Z5 line images?
  
  The LeimohG embryo was turned so that the hebe enhancer-dependent expression of lacZ is visible. While the eve enhancer-dependent expression of lacZ in the muscle progenitor cells isn’t visible with this orientation, eve enhancer-dependent expression in the anal plate is.
  
  (9) Do the Micro-C data align with the developmental time points used in the smFISH probe assays?
  
  The MicroC data aligns with the smFISH images of older embryos: 12-14 hour embryos or stages 14-16.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  This was a difficult paper to review. It took me several hours to understand the terminology and back and forth between different figures to put it together. It might be useful to put the loop models next to the MicroC results and have a cartoon way of incorporating which enhancers are turning on which reporters.
  
  I also found the supercoiled TAD models in Figure 1 not useful. These plectoneme-type of structures likely do not exist, based on the single-cell chromosome tracing studies, and the HiC structures not showing perpendicular to diagonal interactions between the arms of the plectonemes.
  
  We wanted to represent the TAD as a coiled 30 nm fiber, as TADs are not likely to resemble the large loops like those shown in Fig. 1 A, D, and G.
  
  There are no stripes emerging from homies, which is consistent with the pairing model, but there seem to be stripes from the eve promoter. I think these structures may be a result of both the underlying loop extruders + pairing elements.
  
  There are internal structures in the eve TAD that link the upstream region of the eve promoter to the eve PRE and sequences in nhomie. All three of these sequences are bound by LBC. Each of the regulatory domains in BX-C also has LBC elements and, as shown in Author response image 1, you can see stripes connecting some of these LBC elements to each other. Since the stripes that Goel et al. (Goel et al. 2023) observed in their RCMC analysis of Ppm1g didn’t require cohesin, how these stripes are generated (active: e.g., a chromatin remodeler; or passive: e.g., the LBC complex has non-specific DNA binding activity that can be readily crosslinked as the chromatin fiber slides past) isn’t clear.
  
  The authors say there are no TADs that have "volcano plumes" but the leftmost TAD TA appears to have one. What are the criteria for calling the plumes? I am also not clear why there is a stripe off the eve volcano. It looks like homie is making a "stripe" loop extrusion type of interaction with the next TAD up. Is this maybe cohesin sliding off the left boundary?
  
  The reviewer is correct, the left-most TAD TA appears to have a plume. We mentioned TA seems to have a plume in the original text, but it was inadvertently edited out.
  
  Two different types of TAD↔TAD interactions are observed. In the case of eve, the TADs to either side of eve interact more frequently with each other than they do with eve. This generates a “plume” above the eve volcano triangle. The TADs that comprise the Abd-B regulatory domains (see Author response image 1) are surrounded by clouds of diminishing intensity. Clouds at the first level represent interactions with both next-door neighbors; clouds at the second level represent interactions with both next-next-door neighbors; clouds at the third level represent interactions with next-next-next-door neighbors. The Abd-B TADs are close to the same size, so that interactions with neighbors are relatively simple. However, this is not always the case. When there are smaller TADs near larger TADs the pattern of interaction can be quite complicated. An example is indicated by the red bar in Author response image 2.
  
  The authors state "In the loop-extrusion model, a cohesin complex initiating loop extrusion in the eve TAD must break through the nhomie roadblock at the upstream end of the eve TAD. It must then make its way past the boundaries that separate eve from the attP site in the hebe gene, and come to a halt at the homie boundary associated with the lacZ reporter." Having multiple loops formed by cohesin would also bring in the 142kb apart reporter and homie. Does cohesin make 140 kb long loops in flies?
  
  A mechanism in which cohesin brings the reporter close to the eve TAD by generating many smaller loops (which would be the intervening TADs) was discussed in #1.2.
  
  Figure 5 title mistakes the transgene used?
  
  Fixed.
  
  In figure 6, the orientation of the embryos does not look the same for the late-stage panels. So it was difficult to tell if the eve enhancer was turning the reporter on.
  
  Here we were focusing mainly on the AP enhancer activation of the reporter, as this is most easily visualized. It should be clear from the images that the appropriate reporter is activated by the AP enhancer for each of the transgene inserts.
  
  It is not clear to me why the GFP makes upstream interactions (from the 4C viewpoint) in GhomieLZ5 but not in LhomieGZ5? Corresponding interactions for Fig Supp 5 & 6 are not the same. That is, LacZ in the same place and with the same homie orientation does not show a similar upstream enrichment as the GFP reporter does.
  
  We are uncertain as to whether we understand this question/comment. In GhomieLZ5 (now GhomieL), the lacZ reporter is on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary. Since homie is pointing away from gfp, pairing interactions with homie and nhomie in the eve locus bring the eve enhancers in close proximity with the gfp reporter. This is what is seen in the lower trace of Fig. 7 panel D. In LhomieGZ5 (now GeimohL), the lacZ reporter is again on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary. However, in this case homie is inverted so that it points away from lacZ (towards gfp). In this orientation, pairing brings the lacZ reporter into contact with the eve enhancers. This is what is seen in the upper trace in Fig. 7 panel D.
  
  The orientation of the transgene is switched in Fig. Supp 5 and 6. For these “Z3” transgenes (now called LeimohG and LhomieG), the gfp reporter is on the eve side of homie while the lacZ reporter is on the hebe enhancer side of homie. The interactions between the reporters and eve are determined by the orientation of homie in the transgene. When homie is pointing away from gfp (as in LeimohG), gfp is activated and that is reflected in the trace in Supp Fig. 5. When homie is pointing away from lacZ, lacZ is activated and this is reflected (though not as cleanly as in other cases) in the trace in Supp Fig. 6.
  
  I did not see a data availability statement. Is the data publicly available? The authors also should consider providing the sequences of the insertions, or provide the edited genomes, in case other researchers would like to analyze the data.
  
  Data have been deposited.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Minor Points:
  
  (1) There is an inconsistency in the way that some of the citations are formatted. Some citations have 'et al' italicized while others do not. It seems to be the same ones throughout the manuscript. Some examples: Chetverina et al 2017, Chetverina et al 2014, Cavalheiro et al 2021, Kyrchanova et al 2008a, Muravyova et al 2001.
  
  Fixed
  
  (2) Pita is listed twice in line 48.
  
  Fixed
  
  (3) Line 49, mod(mdg4)67.2 is written just as mod(mdg4). The isoform should be indicated.
  
  This refers to all Mod isoforms.
  
  (4) Homie and Nhomie are italicized throughout the manuscript and do not need to be.
  
  This is the convention used previously.
  
  (5) The supplemental figure captions 1 and 2 in the main document are ordered differently than in the supplemental figures file. This caused it to look like the figures are being incorrectly cited in lines 212-214 and 231-232.
  
  Fixed
  
  (6) Is the correct figure being cited in line 388-389? The line cites Figure 6E when mentioning LlambdaG Z5; however, LlambdaG Z5 is not shown in Figure 6.
  
  Fixed
  
  (7) Section heading 'LhomieG Z5 and GhomieL Z5' could be renamed for clarity. GhomieL Z5 results are not mentioned until the next section, named 'GhomieL Z5'.
  
  Fixed
  
  (8) Can the authors provide better labeling for control hebe expression? This would help to determine what is hebe expression and what is background noise in some of the embryos in Figures 4-6.
  
  Author response image 5 shows expression of the lacZ reporter in GeimohL and GlambdaL. For the GlambdaL transgene, the hebe enhancers drive lacZ expression in 12-16 hr embryos. Note that lacZ expression is restricted to a small set of quite distinctive cells along the ventral midline. lacZ is also expressed on the ventral side of the GeimohL embryo (top panel). However, the locations of these cells are quite different from those of the lacZ-positive cells in the GlambdaL transgene embryo. These cells are displaced from the midline, and are arranged as pairs of cells in each hemisegment, locations that correspond to eve-expressing cells in the ventral nerve cord. The eve enhancers also drive lacZ expression elsewhere in the GeimohL embryo, including the anal plate and dorsal muscle progenitor cells (seen most clearly in the lower left panel).
  
  Author response image 5.
  
  lacZ expression in GeimohL and GlambdaL embryos
  
  (9) The Figure 5 title is labeled with the wrong transgene.
  
  Fixed
  
  (10) Heat map scales are missing for Figures 7, supplemental 5, and supplemental 6.
  
  Fixed
  
  (11) Did the authors check if there was a significant difference in the expression of GFP and lacZ from lambda control lines to the Homie transgenic lines?
  
  Yes. Statistical analysis added in Table Supplemental #1
  
  (12) The Figure 7 title references that these are Z3 orientations, however, it is Z5 orientations being shown.
  
  Fixed
  
  (13) The virtual 4C data should include an axis along the bottom of the graphs for better clarity. An axis is missing in all 4C figures.
  
  References:
  
  Bantignies F, Grimaud C, Lavrov S, Gabut M, Cavalli G. 2003. Inheritance of polycomb-dependent chromosomal interactions in drosophila. Genes Dev. 17(19):2406-2420.
  
  Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.
  
  Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc-finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.
  
  Bonchuk AN, Boyko KM, Nikolaeva AY, Burtseva AD, Popov VO, Georgiev PG. 2022. Structural insights into highly similar spatial organization of zinc-finger associated domains with a very low sequence similarity. Structure. 30(7):1004-1015.e1004.
  
  Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.
  
  Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.
  
  Foe VE. 1989. Mitotic domains reveal early commitment of cells in drosophila embryos. Development. 107(1):1-22.
  
  Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.
  
  Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis-regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.
  
  Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.
  
  Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.
  
  Ke W, Fujioka M, Schedl P, Jaynes JB. 2024. Chromosome structure ii: Stem-loops and circle-loops. eLife.
  
  Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.
  
  Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.
  
  Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.
  
  Kyrchanova O, Mogila V, Wolle D, Deshpande G, Parshikov A, Cleard F, Karch F, Schedl P, Georgiev P. 2016. Functional dissection of the blocking and bypass activities of the fab-8 boundary in the drosophila bithorax complex. PLoS Genet. 12(7):e1006188.
  
  Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences.201907190.
  
  Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.
  
  Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.
  
  Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.
  
  Lim B, Heist T, Levine M, Fukaya T. 2018. Visualization of transvection in living drosophila embryos. Mol Cell. 70(2):287-296.e286.
  
  Link N, Kurtz P, O'Neal M, Garcia-Hughes G, Abrams JM. 2013. A p53 enhancer region regulates target genes through chromatin conformations in cis and in trans. Genes Dev. 27(22):2433-2438.
  
  Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.
  
  Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.
  
  Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.
  
  Rollins RA, Morcillo P, Dorsett D. 1999. Nipped-b, a drosophila homologue of chromosomal adherins, participates in activation by remote enhancers in the cut and ultrabithorax genes. Genetics. 152(2):577-593.
  
  Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of d. Melanogaster. Cell. 23(2):401-409.
  
  Shermoen AW, McCleland ML, O'Farrell PH. 2010. Developmental control of late replication and s phase length. Curr Biol. 20(23):2067-2077.
  
  Shidlovskii YV, Bylino OV, Shaposhnikov AV, Kachaev ZM, Lebedeva LA, Kolesnik VV, Amendola D, De Simone G, Formicola N, Schedl P et al. 2021. Subunits of the pbap chromatin remodeler are capable of mediating enhancer-driven transcription in drosophila. Int J Mol Sci. 22(6).
  
  Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.
  
  Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.
  
  Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.
  
  Wolle D, Cleard F, Aoki T, Deshpande G, Schedl P, Karch F. 2015. Functional requirements for fab-7 boundary activity in the bithorax complex. Mol Cell Biol. 35(21):3739-3752.
  
URL

biorxiv.org/content/10.1101/2023.11.17.567501v2
www.biorxiv.org www.biorxiv.org

Sensory-memory interactions via modular structure explain errors in visual working memory

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  In this important paper, the authors propose a computational model for understanding how the dynamics of neural representations may lead to specific patterns of errors as observed in working memory tasks. The paper provides solid evidence showing how a two-area model of sensory-memory interactions can account for the error patterns reported in orientation estimation tasks with delays. By integrating ideas from efficient coding and attractor networks, the resulting theoretical framework is appealing, and nicely captures some basic patterns of behavior data and the distributed nature of memory representation as reported in prior neurophysiological studies. The paper can be strengthened if (i) further analyses are conducted to deepen our understanding of the circuit mechanisms underlying the behavior effects; (ii) the necessity of the two-area network model is better justified; (iii) the nuanced aspects of the behavior that are not captured by the current model are discussed in more detail.
  
  We thank the Editors and Reviewers for their constructive comments. In response to the suggestions provided, we have implemented the following revisions:
  
  - Clarified the origin of the specific pattern of diffusion: We showed that variance patterns remain consistent across different noise types or levels in new Figure 5 – Figure supplement 2 and Figure 9 – Figure supplement 1 (uniform Gaussian noise with varying strengths). This is connected to the representation geometry induced by heterogeneous connections (Eq. 21).
  
  - Provided an intuitive explanation of the two-module network’s advantages: Additional simulations demonstrated that heterogeneity degree of sensory connections and intermodal connection strengths affect drift and diffusion terms differently (new Figure 6). This endows an extra degree of freedom in controlling heterogeneity in drift and diffusion terms in the two-module network (new Figure 9).
  
  - Addressed a limitation and future directions in the Discussion: Our study is limited to the dynamic evolution of memory representation for a single orientation stimulus and its associated error patterns. We acknowledge the need for further investigation to capture nuanced error patterns in broader experimental settings, such as changes in error patterns for varying stimulus presentation durations in perception tasks. We have discussed potential extensions, such as incorporating more biologically plausible baseline activities, external noise, or variations of loss functions.
  
  Additionally, we showed consistent error patterns when decoded from activities of the sensory module (Figure 4 – Figure supplement 1), and incorrect error patterns with autapses in the sensory module (Figure 7 – Figure supplement 2). Below, we have reorganized each Reviewer’s comments and separately addressed them. All changes were shown in red in the manuscript submitted as Related Manuscript File.
  
  Reviewer #1:
  
  Summary:
  
  Working memory is imperfect - memories accrue errors over time and are biased towards certain identities. For example, previous work has shown memory for orientation is more accurate near the cardinal directions (i.e., variance in responses is smaller for horizontal and vertical stimuli) while being biased towards diagonal orientations (i.e., there is a repulsive bias away from horizontal and vertical stimuli). The magnitude of errors and biases increase the longer an item is held in working memory and when more items are held in working memory (i.e., working memory load is higher). Previous work has argued that biases and errors could be explained by increased perceptual acuity at cardinal directions. However, these models are constrained to sensory perception and do not explain how biases and errors increase over time in memory. The current manuscript builds on this work to show how a two-layer neural network could integrate errors and biases over a memory delay. In brief, the model includes a 'sensory' layer with heterogenous connections that lead to the repulsive bias and decreased error in the cardinal directions. This layer is then reciprocally connected with a classic ring attractor layer. Through their reciprocal interactions, the biases in the sensory layer are constantly integrated into the representation in memory. In this way, the model captures the distribution of biases and errors for different orientations that have been seen in behavior and their increasing magnitude with time. The authors compare the two-layer network to a simpler one-network model, showing that the one-model network is harder to tune and shows an attractive bias for memories that have lower error (which is incompatible with empirical results).
  
  Strengths:
  
  The manuscript provides a nice review of the dynamics of items in working memory, showing how errors and biases differ across stimulus space. The two-layer neural network model is able to capture the behavioral effects as well as relate to neurophysiological observations that memory representations are distributed across the sensory cortex and prefrontal cortex.
  
  The authors use multiple approaches to understand how the network produces the observed results. For example, analyzing the dynamics of memories in the low-dimensional representational space of the networks provides the reader with an intuition for the observed effects.
  
  As a point of comparison with the two-layer network, the authors construct a heterogenous one-layer network (analogous to a single memory network with embedded biases). They argue that such a network is incapable of capturing the observed behavioral effects but could potentially explain biases and noise levels in other sensory domains where attractive biases have lower errors (e.g., color).
  
  The authors show how changes in the strength of Hebbian learning of excitatory and inhibitory synapses can change network behavior. This argues for relatively stronger learning in inhibitory synapses, an interesting prediction.
  
  The manuscript is well-written. In particular, the figures are well done and nicely schematize the model and the results.
  
  Overall:
  
  Overall, the manuscript was successful in building a model that captured the biases and noise observed in working memory. This work complements previous studies that have viewed these effects through the lens of optimal coding, extending these models to explain the effects of time in memory. In addition, the two-layer network architecture extends previous work with similar architectures, adding further support to the distributed nature of working memory representations.
  
  We appreciate the reviewer’s comments that the work successfully explains error patterns of working memory, extends previous models of optimal coding to include temporal effects, and supports the distributed nature of working memory representations. Below, we address the specific concerns of the reviewer.
  
  Weaknesses:
  
  Despite its strengths, the manuscript does have some weaknesses.
  
  Major Point 1: First, as far as we can tell, behavioral data is only presented in schematic form. This means some of the nuances of the effects are lost. It also means that the model is not directly capturing behavioral effects. Therefore, while providing insight into the general phenomenon, the current manuscript may be missing some important aspects of the data.
  
  Relatedly, the models are not directly fit to behavioral data. This makes it hard for the authors to exclude the possibility that there is a single network model that could capture the behavioral effects. In other words, it is hard to support the authors' conclusion that "....these evolving errors...require network interaction between two distinct modules." (from the abstract, but similar comments are made throughout the manuscript). Such a strong claim needs stronger evidence than what is presented. Fitting to behavioral data could allow the authors to explore the full parameter space for both the one-layer and two-layer network architectures.
  
  In addition, directly comparing the ability of different model architectures to fit behavioral data would allow for quantitative comparison between models. Such quantitative comparisons are currently missing from the manuscript.
  
  We agree with the reviewer that incorporating quantitative comparisons to the data will strengthen our results. However, we note the limitations in fitting network models to behavioral data. Previous studies employed drift-diffusion models to fit error patterns observed in visual working memory tasks (Panichello, DePasquale et al. 2019, Gu, Lee et al. 2023). In contrast to these phenomenological models, network models have more parameters, which can cause overfitting. Consequently, we focused on comparing the qualitative differences between one-module and two-module networks, examining whether each network can generate the correct shape of bias and variance patterns. In response to the reviewers’ suggestions, we have revised the manuscript to reinforce our claim by providing an intuitive explanation of the qualitative differences between these two models (see response to your Major Point 3) and conducting additional simulations to support our claim that error patterns are consistent under different noise types or levels (see responses to Major Point 2 of Reviewer 2 and Minor Point 1 of Reviewer 3).
  
  Major Point 2: To help broaden the impact of the paper, it would be helpful if the authors provided insight into how the observed behavioral biases and/or network structures influence cognition. For example, previous work has argued that biases may counteract noise, leading to decreased variance at certain locations. Is there a similar normative explanation for why the brain would have repulsive biases away from commonly occurring stimuli? Are they simply a consequence of improved memory accuracy? Why isn't this seen for all stimulus domains?
  
  Previous work has found both diffusive noise and biases increase with the number of items in working memory. It isn't clear how the current model would capture these effects. The authors do note this limitation in the Discussion, but it remains unclear how the current model can be generalized to a multi-item case.
  
  As pointed out by the reviewer, attractors counteract noise and lead to reduced variance around the attracting locations. However, most attractor models reporting such effects did not consider the interaction of attractor dynamics with the sensory network. For the repulsive biases considered here, previous studies on the sensory stage have theoretically demonstrated that they could lower the discrimination threshold around cardinal orientations (e.g., see Wei and Stocker, 2017). In Wei and Stocker (2017), the authors showed that this relationship between bias and discrimination threshold was observed across many stimulus modalities. In the present study, we demonstrated that the bias and variability patterns naturally emerged from the underlying neural dynamics. Nonetheless, we also noted that color working memory shows attractive biases, which necessitates further study of the underlying neural mechanisms of color perception. A plausible explanation is that the categorical effect dominates color perception and memory processes, as suggested by existing modelling work (Tajima et al., 2016).
  
  However, we do note the limitation of our current work that does not capture nuanced error patterns in broader experimental settings, such as variation of perception tasks or memory of multiple items. For instance, while shorter stimulus presentations with no explicit delay lead to larger biases experimentally, our current model, which starts activities from a flat baseline, shows an increase in bias throughout the stimulus presentation. Additionally, the error variance during stimulus presentation is almost negligible compared to that during the delay period, as the external input overwhelms the internal noise. These mismatches during stimulus presentation have minimal impact on activities during the delay period when the internal dynamics dominate. Nonetheless, the model needs further refinement to accurately reproduce activities during stimulus presentation, possibly by incorporating more biologically plausible baseline activities. Also, a recent Bayesian perception model suggested different types of noise like external noise or variations in loss functions that adjust tolerance to small errors may help explain various error patterns observed across different modalities (Hahn and Wei, 2024). Even for memories involving multiple items, noise can be critical in determining error patterns, as encoding more items might be equivalent to higher noise for each individual item (Chunharas, Rademaker et al. 2022).
  
  To make this limitation clear, we included the above response in a new paragraph on limitations and future directions in the Discussion (2nd paragraph in p. 11). Also, we modified the text that previously described that our model can “explain error patterns in both perception and working memory tasks” in p. 3 and p. 5 as
  
  “explain error patterns in working memory tasks that are similar to those observed in perception tasks.”
  
  And we added the bias and variance pattern right after the stimulus offset in Figure 4C,D with the following note in p. 6:
  
  “Note that the variance of errors is nearly zero during stimulus presentation because the external input overwhelms internal noise, which does not fully account for the variability observed during perception tasks (see Discussion).”
  
  Major Point 3: The role of the ring attractor memory network isn't completely clear. There is noise added in this stage, but how is this different from the noise added at the sensory stage? Shouldn't these be additive? Is the noise necessary?
  
  Similarly, it isn't clear whether the memory network is necessary - can it be replaced by autapses (self-connections) in the sensory network to stabilize its representation? In short, it would be helpful for the authors to provide an intuition for why the addition of the memory network facilitates the repulsive bias.
  
  Internal noise in the circuits is necessary to replicate the variability of the readout in estimating the stimulus because our model did not incorporate external noise (i.e., noise associated with the stimulus). We note the distinct noise implementations in the extension of the previous Bayesian model (Fig. 2) and in the network models (Fig. 3 and beyond). In Fig. 2, we followed previous studies by employing static tuning curves for the sensory module and Poisson noise to account for variability in the perception stage. In the memory stage, sensory output undergoes the addition of constant Gaussian noise, replicating the diffusion process along the memory manifolds as shown in traditional memory network models. In the network models, we do consider the same noise in both sensory and memory modules, subjecting all units to Poisson noise to simulate neuronal spiking variability. In the network models, the two modules dynamically interact, which warps the energy landscape and generates uneven noise coefficients along the memory manifold, reminiscent of the conditions shown in Fig. 1.
  
  From the bias and variance patterns, we can infer two requirements for the network to fulfill: one is efficient coding, suggested by the sensory perception stage, and the other is memory maintenance. The former is achieved by realizing the previous Bayesian models in the sensory networks with specific heterogeneous connections. In our work, the latter is achieved by strong recurrent connections to sustain persistent activity during the delay period. On the other hand, as the reviewer noted, memory can be maintained through autapses in the sensory network, which is equivalent to elongating intrinsic time constants of individual units (Seung, Lee et al. 2000). We simulated such a sensory network and showed the results in Figure 7 – Figure Supplement 2. As shown in the figure, a larger time constant also slows down the increase in bias significantly, which can be deduced from Eq. 20.
  
  When memory is maintained through strong recurrent connections, there are two possible scenarios: a one-module network combining both efficient coding and memory maintenance (Fig. 8), or a two-module network satisfying each condition in a different module (Fig. 7). In both networks, heterogeneous connections achieving efficient coding shape drift and diffusion dynamics similarly, as illustrated in Figure 9 (previous Figure 7 – Supplement 1). Discrete attractors are formed near oblique orientations, inducing an increase of repulsive bias during the delay period. Also, the noise coefficient is lowest at cardinal orientations. However, there is a difference in the asymmetry degrees of the drift and diffusion at cardinal and oblique orientations: the one-module network shows larger asymmetry in potential energy, while the two-module network shows larger asymmetry in the noise coefficient. These varying degrees of heterogeneity in drift and diffusion lead to qualitative differences in bias and variance patterns in estimation. Shallower potential differences with more asymmetrical noise coefficients result in correct bias and variance patterns in the two-module network, while the opposite leads to flipped variance patterns in the one-module network.
  
  An intuitive explanation of how connectivity heterogeneity differentially affects the asymmetry degrees of drift and diffusion in one-module and two-module networks is detailed in our response to Major Point 3 of Reviewer 2. In summary, separating the memory module from the sensory module provides an additional degree of freedom, allowing for more flexible control over drift and diffusion, and thereby over bias and variance patterns. To clarify this, we have added simulations in Figure 6 and Figure 9 and provided an intuitive explanation in the accompanying texts in pp. 6-7 and p. 9.
  
  Minor Point 1: The code is stated to be available on GitHub, but I could not access it.
  
  Thank you for pointing it out. The repository is now publicly available.
  
  Minor Point 2: The legend for late/mid/early is in an odd place in Figure 1, as it is in panel E where you can't see the difference between the lines. We would suggest moving this to another panel where the different time points are clear. In general, we would suggest adding more text (legends and titles) to the figure to help the reader understand the figures without having to refer to the details in the text and/or figure legends.
  
  We have now moved the legend to panel B where late/mid/early is first introduced. Also, we added more text to the figure legend (Figure 3,4,5,8).
  
  Minor Point 3: The last line of the first paragraph of the Introduction ends awkwardly. I assume it's referring to indirect evidence for dynamics in memory?
  
  Thank you. We have modified the sentence as follows:
  
  “For instance, biases of errors, the systematic deviation from the original stimuli, observed in estimation tasks have been used as indirect evidence to infer changes in internal representations of stimuli.”
  
  Minor Point 4: Similarly, the first line of the second paragraph of the Introduction was also awkward. Specifically, the clause "..., such as nonuniform stimulus distribution in nature." Seems to be missing a 'the' before 'nonuniform'.
  
  We have modified the sentence as follows:
  
  “One important source of biases is adaptation to environmental statistics, such as the nonuniform stimulus distribution found in nature or the limited range in specific settings.”
  
  Reviewer #2:
  
  In this manuscript, Yang et al. present a modeling framework to understand the pattern of response biases and variance observed in delayed-response orientation estimation tasks. They combine a series of modeling approaches to show that coupled sensory-memory networks are in a better position than single-area models to support experimentally observed delay-dependent response bias and variance in cardinal compared to oblique orientations. These errors can emerge from a population-code approach that implements efficient coding and Bayesian inference principles and is coupled to a memory module that introduces random maintenance errors. A biological implementation of such operation is found when coupling two neural network modules, a sensory module with connectivity inhomogeneities that reflect environment priors, and a memory module with strong homogeneous connectivity that sustains continuous ring attractor function. Comparison with single-network solutions that combine both connectivity inhomogeneities and memory attractors shows that two-area models can more easily reproduce the patterns of errors observed experimentally. This, the authors take as evidence that a sensory-memory network is necessary, but I am not convinced about the evidence in support of this "necessity" condition. A more in-depth understanding of the mechanisms operating in these models would be necessary to make this point clear.
  
  Strengths:
  
  The model provides an integration of two modeling approaches to the computational bases of behavioral biases: one based on Bayesian and efficient coding principles, and one based on attractor dynamics. These two perspectives are not usually integrated consistently in existing studies, which this manuscript beautifully achieves. This is a conceptual advancement, especially because it brings together the perceptual and memory components of common laboratory tasks.
  
  The proposed two-area model provides a biologically plausible implementation of efficient coding and Bayesian inference principles, which interact seamlessly with a memory buffer to produce a complex pattern of delay-dependent response errors. No previous model had achieved this.
  
  We appreciate the reviewer’s comments that the work is a conceptual advancement, combining Bayesian perception models and attractor memory models, and produces error patterns that were not achieved by previous models. Below, we address the specific concerns of the reviewer.
  
  Major Point 1: The correspondence between the various computational models is not fully disclosed. It is not easy to see this correspondence because the network function is illustrated with different representations for different models and the correspondence between components of the various models is not specified. For instance, Figure 1 shows that a specific pattern of noise is required in the low-dimensional attractor model, but in the next model in Figure 2, the memory noise is uniform for all stimuli. How do these two models integrate? What element in the population-code model of Figure 2 plays the role of the inhomogeneous noise of Figure 1? Also, the Bayesian model of Figure 2 is illustrated with population responses for different stimuli and delays, while the attractor models of Figures 3 and 4 are illustrated with neuronal tuning curves but not population activity. In addition, error variance in the Bayesian model appears to be already higher for oblique orientations in the first iteration whereas it is only first shown one second into the delay for the attractor model in Figure 4. It is thus unclear whether variance inhomogeneities appear already at the perceptual stage in the attractor model, as it does in the population-code model. Of course, correspondences do not need to be perfect, but the reader does not know right now how far the correspondence between these models goes.
  
  Thank you for pointing out the lack of clarity in the correspondence between different models. We note the distinct noise implementations in the extension of the previous Bayesian model (Fig. 2) and in the network models (Fig. 3 and beyond). In Fig. 2, we followed previous studies by employing static tuning curves for the sensory module and Poisson noise to account for variability in the perception stage. In the memory stage, sensory output undergoes the addition of constant Gaussian noise, replicating the diffusion process along the memory manifolds as shown in traditional memory network models. In the network models in Fig. 3 and beyond, we do consider the same noise in both sensory and memory modules, subjecting all units to Poisson noise to simulate neuronal spiking variability. In the network models, the two modules dynamically interact, which warps the energy landscape and generates uneven noise coefficients along the memory manifold, reminiscent of the conditions shown in Fig. 1.
  
  However, we do note the limitation of the current study which cannot fully replicate behavior patterns observed in variation of perception tasks. For instance, while shorter stimulus presentations with no explicit delay lead to larger biases experimentally, our current model, which starts activities from a flat baseline, shows an increase in bias throughout the stimulus presentation. Additionally, the error variance during stimulus presentation is almost negligible compared to that during the delay period, as the external input overwhelms the internal noise. These mismatches during stimulus presentation have minimal impact on activities during the delay period when the internal dynamics dominate. Nonetheless, the model needs further refinement to accurately reproduce activities during stimulus presentation, possibly by incorporating more biologically plausible baseline activities. To make this limitation clear, we included the above response in a new paragraph on limitations and future directions in the Discussion (2nd paragraph in p. 11). Also, we modified the text that previously described that our model can “explain error patterns in both perception and working memory tasks” in p. 3 and p. 5 as “explain error patterns in working memory tasks that are similar to those observed in perception tasks.”
  
  And we added the bias and variance pattern right after the stimulus offset in Figure 4C,D with the following note in p. 6:
  
  “Note that the variance of errors is nearly zero during stimulus presentation because the external input overwhelms internal noise, which does not fully account for the variability observed during perception tasks (see Discussion).”
  
  Major Point 2: The manuscript does not identify the mechanistic origin in the model of Figure 4 of the specific noise pattern that is required for appropriate network function (with higher noise variance at oblique orientations). This mechanism appears critical, so it would be important to know what it is and how it can be regulated. In particular, it would be interesting to know if the specific choice of Poisson noise in Equation (3) is important. Tuning curves in Figure 4 indicate that population activity for oblique stimuli will have higher rates than for cardinal stimuli and thus induce a larger variance of injected noise in oblique orientations, based on this Poisson-noise assumption. If this explanation holds, one wonders if network inhomogeneities could be included (for instance in neural excitability) to induce higher firing rates in the cardinal/oblique orientations so as to change noise inhomogeneities independently of the bias and thus control more closely the specific pattern of errors observed, possibly within a single memory network.
  
  The specific pattern of noise coefficient, lower variability at cardinal orientations in the network models, inherited that of the previous Bayesian perception models (Wei and Stocker, 2017). Either in one-module or two-module networks, the specific pattern of heterogeneous connections induces more neurons tuned to cardinal orientations with narrower tuning widths. Such sparser representation near cardinal stimuli generates lower noise variability even with constant Gaussian noise. This is verified in Eq. 21 in Methods, showing the derivation of noise coefficients – with constant Gaussian noise, Eq. 21 is modified as
  
  because . Thus, 𝒟(𝜃) is inversely proportional to , which reflects the length travelled on the stable trajectory 𝒔̄(𝜃) when θ increases by one unit. For sparser representation, becomes larger and 𝒟(𝜃) is reduced. Intuitively, with more neurons tuned to cardinal stimuli, noise is averaged and reduced. In sum, the heterogeneous connection induces the specific noise coefficient, and the choice of Poisson-like noise is not essential, although it facilitates the correct variance pattern. To clarify this point, we have added the results of using uniform Gaussian noise in new Figure 5 – Figure Supplement 2 and Figure 9 – Figure Supplement 1.
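
  The averaging argument above can be checked numerically with a minimal sketch. It is not the derivation in Eq. 21: it assumes a linearised readout in which independent Gaussian noise of equal strength on every unit is projected onto the tangent of the stable trajectory, and the tangent vectors, `sigma`, and the function name are hypothetical values chosen for illustration. Under that assumption, a longer tangent (activity changing faster with θ) yields a smaller variance of the shift in θ.

```python
import math
import random

def projected_noise_variance(tangent, sigma, n_samples=20000, seed=0):
    """Monte Carlo variance of the shift in theta caused by isotropic noise.

    tangent: derivative of the population activity with respect to theta
             (hypothetical values, one entry per unit).
    sigma:   standard deviation of the independent Gaussian noise per unit.
    """
    rng = random.Random(seed)
    norm_sq = sum(t * t for t in tangent)
    shifts = []
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, sigma) for _ in tangent]
        # Linearised readout: project the noise onto the tangent direction
        # and convert the displacement into a change of theta.
        shifts.append(sum(t * n for t, n in zip(tangent, noise)) / norm_sq)
    mean = sum(shifts) / n_samples
    return sum((s - mean) ** 2 for s in shifts) / n_samples

sigma = 0.5
shallow = [0.2, -0.1, 0.3, 0.1]      # short tangent: activity changes slowly with theta
steep = [2.0 * t for t in shallow]   # same direction, tangent twice as long

var_shallow = projected_noise_variance(shallow, sigma)
var_steep = projected_noise_variance(steep, sigma)

# Under the stated assumptions the variance is sigma^2 / |tangent|^2.
analytic_shallow = sigma ** 2 / sum(t * t for t in shallow)
print(var_shallow, analytic_shallow, var_steep)
```

  In this toy setting, doubling the tangent length reduces the projected variance by a factor of four, which is consistent with the qualitative statement that a longer path per unit of θ lowers the noise coefficient.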
  
  Major point 3: The main conclusion of the manuscript, that the observed patterns of errors "require network interaction between two distinct modules" is not convincingly shown. The analyses show that there is a quantitative but not a qualitative difference between the dynamics of the single memory area compared to the sensory-memory two-area network, for specific implementations of these models (Figure 7 - Figure Supplement 1). There is no principled reasoning that demonstrates that the required patterns of response errors cannot be obtained from a different memory model on its own. Also, since the necessity of the two-area configuration is highlighted as the main conclusion of the manuscript, it is inconvenient that the figure that carefully compares these conditions is in the Supplementary Material.
  
  Following the suggestion by the reviewer, we moved Figure 7 – Figure supplement 1 to the main text as new Figure 9. As noted by the reviewer, drift dynamics and diffusion projected onto the low-dimensional memory manifold have similar shapes in both one-module and two-module networks, with the lowest potential and highest noise coefficient observed at the oblique orientations. However, there is a difference in the asymmetry degrees of the drift and diffusion at cardinal and oblique orientations: the one-module network shows larger asymmetry in potential energy, while the two-module network shows larger asymmetry in the noise coefficient. These varying degrees of heterogeneity in drift and diffusion lead to qualitative differences in bias and variance patterns in estimation. Shallower potential differences with more asymmetrical noise coefficients result in correct bias and variance patterns in the two-module network, while the opposite leads to flipped variance patterns in the one-module network.
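
  The interplay between drift and diffusion described above can be illustrated with a one-dimensional simulation. This is a sketch under stated assumptions, not the network model of the manuscript: the cosine shapes of the potential and the noise coefficient, the parameter names `depth`, `d0`, and `asym`, and their values are all hypothetical, and the stochastic update uses a plain Euler-Maruyama step in the Itô sense.

```python
import math
import random

def simulate_delay(theta0, delay=1.0, dt=0.01, n_trials=2000,
                   depth=0.05, d0=0.002, asym=0.5, seed=0):
    """Return (bias, variance) of the remembered orientation after the delay.

    Angles are in radians, with cardinal orientations at 0 and pi/2 and
    oblique orientations at pi/4 and 3*pi/4.
    """
    rng = random.Random(seed)
    n_steps = int(round(delay / dt))
    errors = []
    for _ in range(n_trials):
        theta = theta0
        for _ in range(n_steps):
            # Assumed potential U = depth*cos(4*theta): lowest at obliques,
            # so the drift -dU/dtheta pushes memories away from cardinals.
            drift = 4.0 * depth * math.sin(4.0 * theta)
            # Assumed noise coefficient: smallest at cardinals, largest at obliques.
            diff = d0 * (1.0 - asym * math.cos(4.0 * theta))
            theta += drift * dt + math.sqrt(2.0 * diff * dt) * rng.gauss(0.0, 1.0)
        errors.append(theta - theta0)
    bias = sum(errors) / n_trials
    variance = sum((e - bias) ** 2 for e in errors) / n_trials
    return bias, variance

# A stimulus slightly clockwise of a cardinal orientation drifts toward the oblique.
bias_near_cardinal, var_near_cardinal = simulate_delay(math.pi / 16)
print(bias_near_cardinal, var_near_cardinal)
```

  Varying `depth` (drift asymmetry) and `asym` (diffusion asymmetry) independently in such a sketch is one way to explore how their relative sizes shape the variance pattern across orientations.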
  
  To intuitively understand how connectivity heterogeneity differentially affects the asymmetry degrees of drift and diffusion in one-module and two-module networks, consider a simple case where only the excitatory connection is heterogeneous, denoted as α. The asymmetry of diffusion reflects the degree of heterogeneity in either the sensory or memory modules. The noise coefficient derived from the low-dimensional projection is mainly determined by the heterogeneity of . While the one-module network, with a much lower α, shows almost flat , the two-module network shows more prominent asymmetry in with a larger α in the sensory module.
  
  On the other hand, the asymmetry in the potential energy is influenced differently by the connectivity heterogeneity of the sensory module and that of the memory module. For memory maintenance, overall recurrent connections need to be strong enough to overcome intrinsic decay, simplifying to w = 1. In the one-module network, α in the memory module creates potential differences at cardinal and oblique orientations as 1 ± α. On the other hand, in the two-module network, with w = 1 fulfilled by the memory module, α in the sensory module acts as a perturbation. The effect of α is modulated by the connectivity strengths between the sensory and memory modules, denoted by γ. Potential differences at cardinal and oblique orientations can be represented as 1 ± γα. While both α and γ determine the energy level, the noise coefficient depends less on γ (see response to your Major Point 4). Thus, even for a relatively larger α in the sensory module, leading to more asymmetrical noise coefficients, the potential difference could be shallower in the two-module network with small γ < 1.
  
  In sum, in the two-module network, there is an additional degree of freedom, connectivity strengths between sensory and memory modules, which provides the flexibility to control drift and diffusion separately, unlike in the one-module network. To clarify this, we have added simulations in Figure 6 and Figure 9 and provided an intuitive explanation in the accompanying texts in pp. 6-7 and p. 9.
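
  The 1 ± α versus 1 ± γα comparison can be made concrete with a short worked example. The numerical values of α, Jf, and Jb below are hypothetical and chosen only to show the scaling; the function name is likewise an illustration rather than anything from the model code.

```python
def potential_asymmetry(alpha, gamma=1.0):
    """Gap between the 1 + gamma*alpha and 1 - gamma*alpha levels."""
    high = 1.0 + gamma * alpha
    low = 1.0 - gamma * alpha
    return high - low

# One-module network: heterogeneity alpha acts directly on the memory dynamics.
one_module = potential_asymmetry(alpha=0.02)

# Two-module network: a larger alpha in the sensory module is scaled by the
# intermodal coupling gamma, taken here as the product of feedforward and
# feedback strengths (hypothetical values).
j_forward, j_backward = 0.5, 0.2
gamma = j_forward * j_backward
two_module = potential_asymmetry(alpha=0.10, gamma=gamma)

print(one_module, two_module)
```

  With these illustrative numbers, the two-module network has a five-fold larger α yet a shallower potential difference, because the weak coupling γ attenuates its effect on the drift.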
  
  Major Point 4: The proposed model has stronger feedback than feedforward connections between the sensory and memory modules. This is not a common assumption when thinking about hierarchical processing in the brain, and it is not discussed in the manuscript.
  
  As noted in the previous response, the connectivity strengths between the sensory and memory modules, denoted as γ, are important parameters determining the qualitative features of bias and variance patterns. γ corresponds to the product of Jf and Jb, feedforward and feedback strengths, and our additional simulation shows that the bias and variance patterns remain similar for a fixed γ. Note that further simulation revealed that the heterogeneity degree, α, and the intermodal connectivity strengths, γ, influence the drift and diffusion terms differently. As this result highlights the advantage of the two-module network, we moved the dependence of error patterns on intermodal connectivity strengths to the main figure (previous Figure 5 – Figure supplement 2), which now includes more simulations showing bias and variance patterns for different Jf and Jb and for different α and Jb (new Figure 6).
  
  Minor Point 1: page 11: "circular standard deviation of sigma_theta = 1.3º at cardinal orientations" but in Figure 2 we see sigma_theta = 2º at cardinal orientations.
  
  The circular standard deviation of 𝜎_𝜃 = 1.3º refers to the standard deviation of the sensory module output in iteration 1, that is, before feeding into the memory module to complete this iteration. In Figure 2, the standard deviation plotted is that of the output of the memory module, which has a Gaussian memory noise with standard deviation 1.3º added on top of the sensory output. Hence we see a standard deviation of √(1.3² + 1.3²) ≈ 1.84º, which is close to the 2º shown in the figure. We added a sentence in this paragraph of Methods (p. 13) to avoid confusion.
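
  The arithmetic can be reproduced in a few lines. The sketch assumes that the sensory and memory noise are independent and that the angles are small enough for the circular standard deviation to combine like an ordinary one; the helper name `combined_sd` is hypothetical.

```python
import math

def combined_sd(*sds):
    """Standard deviation of a sum of independent noise sources."""
    return math.sqrt(sum(sd ** 2 for sd in sds))

sensory_sd = 1.3  # degrees, sensory module output in iteration 1
memory_sd = 1.3   # degrees, Gaussian memory noise added afterwards
total_sd = combined_sd(sensory_sd, memory_sd)
print(round(total_sd, 2))  # 1.84
```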
  
  Minor Point 2: equation (19): What does the prime of ||s'(theta)|| mean?
  
  The prime represents taking the derivative with respect to θ:
  
  reflects the length travelled on the stable trajectory when θ increases by one unit. As we plotted in Figure 9 and Figure 5 – Figure supplement 2, we clarified it in the legend.
  
  Minor Point 3: page 15: "The Fisher information (F) is estimated by assuming that the likelihood function p(r|theta) is Gaussian", but the whole point of Wei and Stocker (2015) and your Figure 2 is that likelihoods are skewed in these networks. This could be clarified.
  
  Thank you for pointing out the lack of clarity. In Wei and Stocker (2015) and our Figure 2, the likelihood is skewed with respect to θ (note the horizontal axes). However, in the Methods section, we assumed that the distribution function p(r|θ) is Gaussian with respect to r when θ is considered fixed.
  
  The distribution function is skewed with respect to θ because the tuning curves are skewed with respect to θ (see Figure 4B). We have clarified our assumption on p. 16 to avoid confusion.
  
  Reviewer #3:
  
  Summary:
  
  The present study proposes a neural circuit model consisting of coupled sensory and memory networks to explain the circuit mechanism of the cardinal effect in orientation perception which is characterized by the bias towards the oblique orientation and the largest variance at the oblique orientation.
  
  Strengths:
  
  The authors have done numerical simulations and preliminary analysis of the neural circuit model to show the model successfully reproduces the cardinal effect. And the paper is well written overall. As far as I know, most of the studies on the cardinal effect are at the level of statistical models, and the current study provides one possibility of how neural circuit models reproduce such an effect.
  
  We appreciate the reviewer’s comments that the work successfully reproduces error patterns through circuit models, advancing beyond previous statistical models. Below, we address the specific concerns of the reviewer.
  
  Weaknesses:
  
  There are no major weaknesses and flaws in the present study, although I suggest the author conduct further analysis to deepen our understanding of the circuit mechanism of the cardinal effects. Please find my recommendations for concrete comments.
  
  Minor Point 1: Likely, the interplay of the potential function (Figure 5D) and the noise amplitude (Figure 5C) in the memory network is the key to reproducing the cardinal effect. For me, it is obvious to understand the spatial profile of the potential function as what it currently looks like (Figure 5D), while I haven't had an intuitive understanding of how the spatial profile of noise structure emerges from the circuit model. Therefore I suggest the authors provide a more comprehensive analysis, including theory and simulation, to demonstrate how the noise structure depends on the network parameters. I am concerned about whether the memory network can still reproduce the minimal variance at the cardinal orientation if we reduce the Fano factor of single neuron variabilities. In this case, the shape of the potential function will be dominant in determining the variance over orientation (Figure 5F) and the result might be reverted.
  
  Thank you for the suggestion. In both one-module and two-module networks, the specific pattern of heterogeneous connections induces more neurons tuned to cardinal orientations with narrower tuning widths. This sparser representation near cardinal stimuli generates lower noise variability even with constant Gaussian noise, as now shown in Figure 5 – Figure Supplement 2. We also showed that the distinctive error patterns in one-module and two-module networks are maintained under Gaussian noise with varying amplitude in Figure 9 – Figure supplement 1.
  
  Minor Point 2: In addition, it is interesting to show how the representation of the sensory module looks like, e.g., plotting the figures similar to Figures B-F but from the sensory module. I feel the sensory module doesn't have a result similar to Figure 5F. Is it?
  
  Yes, decoded error patterns obtained from the sensory module are similar to the results obtained from the memory module. We have added Figure 4 – Figure supplement 1 to show that our conclusions remain valid when decoding from the sensory module.
  
  Minor point 3: Last but not least, I have a conceptual question about the presentation mechanism in the proposed circuit model. The present study refers to Wei, et al., 2015 and 2017 about the statistical model mechanism of the cardinal effect. If I remember correctly, Wei's papers considered joint encoding and decoding processes to render the cardinal effect. Can the authors regard the processes in the proposed circuit model with the stages in the statistical model? Or at least the authors should discuss this link in the Discussions.
  
  We have now included a mention of using a population vector decoder that mimics Bayesian optimal readout in the Results section (p. 6), in addition to the Discussion and Methods. However, we acknowledge that this decoder is only optimal under a specific loss function. A recent Bayesian perception model suggested that different types of noise, such as external noise, or variations in loss functions that adjust tolerance to small errors may help explain the various error patterns observed across different modalities (Hahn and Wei, 2024). We have now added this limitation to the Discussion, along with the inconsistency of the current model with experimental observations during perception tasks and future directions (p. 11).
  
URL

biorxiv.org/content/10.1101/2023.11.09.566396v3
www.biorxiv.org www.biorxiv.org

LRMP inhibits cAMP potentiation of HCN4 channels by disrupting intramolecular signal transduction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This is a useful study examining the determinants and mechanisms of LRMP inhibition of cAMP regulation of HCN4 channel gating. The evidence provided to support the main conclusions is unfortunately incomplete, with discrepancies in the work that reduce the strength of mechanistic insights.
  
  Thank you for the reviews of our manuscript. We have made a number of changes to clarify our hypotheses in the manuscript and addressed all of the potential discrepancies by revising some of our interpretation. In addition, we have provided additional experimental evidence to support our conclusions. Please see below for a detailed response to each reviewer comment.
  
  Public Reviews
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.
  
  Strengths:
  
  The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.
  
  Thank you.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  HCN-4 isoform is found primarily in the sino-atrial node where it contributes to the pacemaking activity. LRMP is an accessory subunit that prevents cAMP-dependent potentiation of HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology, FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of N- and C-termini along with specific residues in C-linker and S5 of HCN4 are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7 which recapitulates LRMP modulation on HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.
  
  Weaknesses:
  
  (1) Figure 5A- I am a bit confused with this figure and perhaps it needs better labeling. When it states Citrine, does it mean just free Citrine, and "LRMP 1-230" means LRMP fused to Citrine which is an "LF" construct? Why not simply call it "LF"? If there is no Citrine fused to "LRMP 1-230", this figure would not make sense to me.
  
  We have clarified the labelling of this figure and specifically defined all abbreviations used for HCN4 and LRMP fragments in the results section on page 14.
  
  (2) Related to the above point- Why is there very little FRET between NF and LRMP 1-230? The FRET distance range is 2-8 nm which is quite large. To observe baseline FRET for this construct more explanation is required. Even if one assumes that about 100 amino acids are completely disordered (not extended) polymers, I think you would still expect significant FRET.
  
  FRET is extremely sensitive to distance: transfer efficiency falls off with the 6th power of the distance between fluorophores. The difference in contour length (the maximum length of a peptide if fully extended) between our ~260 aa fragment and our ~130 aa fragments is on the order of 450 Å (45 nm). So, even if the fragments are not extended, it is not hard to imagine that the larger fragments show a weaker FRET signal. In fact, we do see slightly larger FRET than in the control (not significant), which is consistent with the idea that the larger fragments simply do not produce a large FRET signal.
  
  Moreover, this hybridization assay is sensitive to a number of other factors including the affinity between the two fragments, the expression of each fragment, and the orientation of the fluorophores. Any of these factors could also result in reduced FRET.
  
  We have added a section on the limitations of the FRET 2-hybrid assay in the discussion section on page 20. Our goal with the FRET assay was to provide complementary evidence that shows some of the regions that are important for direct association, and we have edited the text to make sure we are not over-interpreting our results.
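
  The distance argument in this response can be illustrated with a short sketch of the standard Förster relation, E = 1 / (1 + (r/R0)^6). The Förster radius of 5 nm and the 0.35 nm per residue used for the contour length are assumptions chosen for illustration (the latter is roughly what the 450 Å estimate above implies for ~130 residues); neither value is taken from the manuscript.

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """Forster transfer efficiency at donor-acceptor distance r_nm.

    r0_nm is the Forster radius; 5.0 nm is an assumed, illustrative value.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

def contour_length_nm(n_residues, nm_per_residue=0.35):
    """Fully extended peptide length; 0.35 nm per residue is an assumption."""
    return n_residues * nm_per_residue

# Extra contour length of a ~260 aa fragment relative to a ~130 aa fragment.
extra_length = contour_length_nm(260) - contour_length_nm(130)
print(f"extra contour length: {extra_length:.1f} nm")

# Efficiency drops steeply once the separation exceeds the Forster radius.
for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

  Under these assumptions, doubling the separation from R0 to 2·R0 reduces the efficiency from 0.5 to below 0.02, so even a modest increase in the average fluorophore separation can bring the signal close to baseline.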
  
  (3) Unless I missed this, have all the Cerulean and Citrine constructs been tested for functional activity?
  
  All citrine-tagged LRMP constructs (or close derivatives) were tested functionally by co-expression with HCN (see Table 1 and pages 10-11). Cerulean-tagged HCN4 fragments are, of course, intrinsically non-functional as they do not include the ion-conducting pore.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-terminus of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (elbow C-linker, S4/S5-linker, HCND) and account for the LRMP effect.
  
  Strengths:
  
  Based on these data, the authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the elbow C-linker, the S4-S5 linker, and the HCND.
  
  Weaknesses:
  
  Although the work is interesting, there are some discrepancies between data that need to be addressed.
  
  (1) I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.
  
  Thank you, Δ shift values have been added to Tables 1 and 2 as suggested.
  
  (2) Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from -85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not).
  
  With a large N, this is an expected distribution. In 5 previous reports from 4 different groups of HCN4 with cAMP in HEK293 cells (Fenske et al., 2020; Liao et al., 2012; Peters et al., 2020; Saponaro et al., 2021; Schweizer et al., 2010), the average expected range of the data is 26.6 mV and 39.9 mV for 95% (mean ± 2SD) and 99% (mean ± 3SD) of the data, respectively. As the reviewer mentions, the expected range from these papers is slightly larger in the absence of cAMP. The average SDs of HCN4 (with/without cAMP) in these papers are 9.9 mV (Schweizer et al., 2010), 4.4 mV (Saponaro et al., 2021), 7.6 mV (Fenske et al., 2020), 10.0 mV (Liao et al., 2012), and 5.9 mV (Peters et al., 2020). Our SD in this paper is roughly in the middle, at 7.6 mV. This is likely because we used an inclusive approach to data so as not to bias our results (see the statistics section of the revised manuscript on page 9). We have removed 2 data points that meet the statistical classification of outliers; no measures of statistical significance were altered by this.
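
  As a rough check on how wide a spread a given SD implies, the sketch below computes the width of the mean ± k·SD interval and the fraction of a normal distribution that it covers. It assumes normally distributed midpoints, which is an idealisation, and uses only the 7.6 mV SD quoted above for this paper; the function names are illustrative.

```python
from statistics import NormalDist

def expected_range_mv(sd_mv, k):
    """Width of the interval mean +/- k*SD, in mV."""
    return 2 * k * sd_mv

def normal_coverage(k):
    """Fraction of a normal distribution lying within mean +/- k*SD."""
    unit_normal = NormalDist()
    return unit_normal.cdf(k) - unit_normal.cdf(-k)

sd_this_paper = 7.6  # mV, SD quoted in the response for this paper

for k in (2, 3):
    width = expected_range_mv(sd_this_paper, k)
    print(f"mean +/- {k} SD: width {width:.1f} mV, coverage {normal_coverage(k):.4f}")
```

  With an SD of 7.6 mV, the ± 2SD interval is about 30 mV wide, which is the same order as the spread the reviewer points out in Figure 1.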
  
  This problem is spread throughout the manuscript, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings?
  
  The exact P-values are NOT typically at the limit of statistical significance; about two-thirds would meet the stringent P < 0.0001 cut-off. We have clarified in the statistics section (page 10) that any comparison meeting our significance threshold (P < 0.05) or a stricter criterion is treated equally in the figure labelling. Exact P-values are provided in Tables 1-3.
  
  There are several other problems with Figure 1 and in all figures of the manuscript: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Figure 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1.
  
  Y-axis values for mean plots were picked such that all data points are included and are consistent across all figures. They have been expanded slightly (-75 to -145 mV for all HCN4 channels and -65 to -135 mV for all HCN2 channels). The size of the mean value marker has been reduced slightly. Exact midpoints for all data are also found in Tables 1-3.
  
  The GV curves in Figure 1B (previously Fig. 1A) are averages with the ±SEM error bars smaller than the symbols in many cases owing to relatively high n’s for these datasets. These curves match the midpoints in panel 1C (previously 1B). For example, the midpoint of the average curve for HCN4 control in panel A is -117.9 mV, essentially the same as the -117.8 mV average for the individual fits in panel B.
  
  We made an error in the text, carried over from a previous manuscript version, about the ordering of the tables. This has now been fixed, so these values should now be aligned.
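
  The comparison between the midpoint of an averaged curve and the average of individual midpoints can be sketched as follows. This is an illustration only: it assumes that each cell's conductance-voltage relation follows a Boltzmann function, which the response does not state, and the per-cell midpoints and the slope factor are hypothetical values rather than data from the paper.

```python
import math

def boltzmann(v_mv, v_half_mv, slope_mv):
    """Fractional activation of a hyperpolarization-activated channel."""
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))

def average_curve(v_mv, v_halfs, slope_mv):
    """Mean fractional activation across cells at one voltage."""
    return sum(boltzmann(v_mv, vh, slope_mv) for vh in v_halfs) / len(v_halfs)

def midpoint_of_average(v_halfs, slope_mv, lo=-200.0, hi=0.0):
    """Bisection for the voltage where the averaged curve crosses 0.5."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if average_curve(mid, v_halfs, slope_mv) > 0.5:
            lo = mid  # more than half activated: move to less negative voltages
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical per-cell midpoints (mV) and slope factor (mV), for illustration.
cell_v_halfs = [-127.9, -122.9, -117.9, -112.9, -107.9]
slope = 10.0

mean_of_fits = sum(cell_v_halfs) / len(cell_v_halfs)
mid_avg = midpoint_of_average(cell_v_halfs, slope)
print(f"mean of individual midpoints: {mean_of_fits:.1f} mV")
print(f"midpoint of averaged curve:   {mid_avg:.1f} mV")
```

  Because the hypothetical midpoints are symmetric about their mean, the two numbers coincide here. With real, slightly asymmetric data a small difference of the kind quoted above (-117.9 mV versus -117.8 mV) would be expected.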
  
  On this ground, it is difficult to judge the conclusions and it would also greatly help if exemplary current traces would be also shown.
  
  Exemplary current traces have been added to all figures in the revised manuscript.
  
  (3) "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."
  
  Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (fig 6A) are crucial for LRMP effect, but these two residues are not involved in the cAMP transduction centre (interface between HCND, S4-S5 linker, and C-linker elbow), at least for the data accumulated till now in the literature. Indeed, the hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4 given the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.
  
  Moreover, I suggest analysing the putative role of P545 and T547 in light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will be also interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.
  
  We agree that testing this hypothesis would be very interesting. However, it is challenging. Any mutation we make that is involved in cAMP transduction makes measuring the LRMP effect on cAMP shifts difficult or impossible.
  
  Our simple idea, now clarified in the discussion, is that if you look at the regions involved in cAMP transduction (HCND, C-linker, S4-S5), there are very few residues that differ between HCN4 and HCN2. When we mutate the 5 non-conserved residues in the S5 segment and the C-linker, along with the NT, we are able to render HCN2 sensitive to LRMP. Therefore, something about the small sequence differences in this region confers isoform specificity to LRMP. We speculate that this happens because of small structural differences that result from those 5 mutations. If you compare the solved structures of HCN1 and HCN4 (there is no HCN2 structure available), you can see small differences in the distances between key interacting residues in the transduction centre. Also, there is a kink at the bottom of the S4 helix in HCN4 but not HCN1. This points a putatively important residue for cAMP dependence in a different direction in HCN4. We hypothesize in the discussion that this may be how LRMP is isoform specific.
  
  Moreover, previous work has shown that the HCN4 C-linker is uniquely sensitive to cyclic dinucleotides and magnesium ions. We are hypothesizing that it is the subtle change in structure that makes this region more prone to regulation in HCN4.
  
  Reviewing Editor (recommendations for the Authors):
  
  (1) Exemplar recordings need to be shown and some explanation for the wide variability in the V-half of activation.
  
  Exemplar currents are now shown for each channel. See the response to Reviewer 3’s public comment 2.
  
  (2) The rationale for cut sites in LRMP for the investigation of which parts of the protein are important for blocking the effect of cAMP is not logically presented in light of the modular schematics of domains in the protein (N-term, CCD, post-CCD, etc).
  
  There is limited structural data on LRMP and the HCN4 N-terminus. The cut sites in this paper were determined empirically. We made fragments that were small enough to work for our FRET hybridization approach and that expressed well in our HEK cell system. The residue numbering of the LRMP modules is based on updated structural predictions using AlphaFold, which was released after our fragments were designed. This has been clarified in the methods section on pages 5-6 and the Figure 2 legend of the revised manuscript.
  
  (3) Role of the HCN4 C-terminus. Truncation of the unstructured HCN4 C-terminus distal to the CNBD (Fig. 4 A, B) partially reverses the impact of LRMP (i.e. there is now a significant increase in cAMP effect compared to full-length HCN4). The manuscript is written in a manner that minimizes the potential role of the C-terminus and it is, therefore, eliminated from consideration in subsequent experiments (e.g. FRET) and the discussion. The model is incomplete without considering the impact of the C-terminus.
  
  We thank the reviewer for this comment as it was a result that we too readily dismissed. We have added discussion around this point and revised our model to suggest that not only can we not eliminate a role for the distal C-terminus, our data are consistent with it having a modest role. Our HCN4-2 chimera and HCN4-S719X data both suggest the possibility that the distal C-terminus might be having some effect on LRMP regulation. We have clarified this in the results (pages 12-13) and discussion (page 19).
  
  (4) For FRET experiments, it is not clear why LF should show an interaction with N2 (residues 125-160) but not NF (residues 1-160). N2 is contained within NF, and given that Citrine and Cerulean are present on the C-terminus of LF and N2/NF, respectively, residues 1-124 in NF should not impact the detection of FRET because of greater separation between the fluorophores as suggested by the authors.
  
  This is a fair point but FRET is somewhat more complicated. We do not know the structure of these fragments and it’s hard to speculate where the fluorophores are oriented in this type of assay. Moreover, this hybridization assay is sensitive to affinity and expression as well. There are a number of reasons why the larger 1-260 fragment might show reduced FRET compared to 125-260. As mentioned in our response to reviewer 2’s public comment 2, we have added a limitation section that outlines the various caveats of FRET that could explain this.
  
  (5) For FRET experiments, the choice of using pieces of the channel that do not correlate with the truncations studied in functional electrophysiological experiments limits the holistic interpretation of the data. Also, no explanation or discussion is provided for why LRMP fragments that are capable of binding to the HCN4 N-terminus as determined by FRET (e.g. residues 1-108 and 110-230, respectively) do not have a functional impact on the channel.
  
  As mentioned in the response to comment 2, the exact fragment design is a function of which fragments expressed well in HEK cells. Importantly, because FRET experiments do not provide atomic resolution, for the reasons listed in the revised limitations section on pages 20-21, small differences in the cut sites do not change the interpretation of these results. For example, the N-terminal 1-125 construct is analogous to experiments with the Δ1-130 HCN4 channel.
  
  We suspect that residues in both fragments are required and that the interaction involves multiple parts. This is stated in the results “Thus, the first 227 residues of LRMP are sufficient to regulate HCN4, with residues in both halves of the LRMP N-terminus necessary for the regulation” (page 11). We have also added discussion on this on page 21.
  
  (6) A striking result was that mutating two residues in the C-linker of HCN4 to amino acids found in HCN channels not affected by LRMP (P545A, T547F), completely eliminated the impact of LRMP on preventing cAMP regulation of channel activation. However, a chimeric channel, (HCN4-2) in which the C-linker, the CNBD, and the C-terminus of HCN4 were replaced by that of HCN2 was found to be partially responsive to LRMP. These two results appear inconsistent and not reconciled in the model proposed by the authors for how LRMP may be working.
  
  As stated in our answer to your question #3, we have revised our interpretation of these data. If the more distal C-terminus plays some role in the orientation of the C-linker and the transduction centre as a whole, these data can still be viewed as consistent with our model. We have added some discussion of this idea in our discussion section.
  
  (7) Replacing the HCN2 N-terminus with that from HCN4, along with mutations in the S5 (MCS/VVG) and C-linker (AF/PT) recapitulated LRMP regulation on the HCN2 background. The functional importance of the S5 mutations is not clear as no other experiments are shown to indicate whether they are necessary for the observed effect.
  
  We have added our experiments on a midpoint HCN2 clone that includes the S5 mutants and the C-linker mutants in the absence of the HCN4 N-terminus (i.e., HCN2 MCSAF/VVGPT) (Fig. 7). We have also discussed our rationale for the S5 mutations, as we believe they may be responsible for the different orientations of the S4-S5 linker in HCN1 and HCN4 structures that are known to impact cAMP regulation.
  
  Reviewer #1 (Recommendations For The Authors):
  
  A) Comments:
  
  (1) Figure 1: Please show some representative current traces.
  
  Exemplar currents are now shown for each channel in the manuscript.
  
  (2) Figure 1: There appears to be a huge number of recordings for HCN4 +/- cAMP as compared to those with LRMP 1-479Cit. How was the number of recordings needed for sufficient statistical power decided? This is particularly important because the observed slowing of deactivation by cAMP in Fig. 1C seems like it may be fairly subtle. Perhaps a swarm plot would make the shift more apparent? Also, LRMP 1-479Cit distributions in Fig. 1B-C look like they are more uniform than normal, so please double-check the appropriateness of the statistical test employed.
  
  We have revised the methods section (page 7) to discuss this. Briefly, we performed regular control experiments throughout this project to ensure that a normal cAMP response was occurring. Our minimum target for sufficient power was 8-10 recordings. We have expanded the statistics section (page 9) to discuss tests of normality and the use of a log scale for deactivation time constants, which is why the shifts in Fig. 1D (revised) are less apparent.
  
  (3) It would be helpful if the authors could better introduce their logic for the M338V/C341V/S345G mutations in the HCN4-2 VVGPT mutant.
  
  See response to the reviewing editor’s comment 7.
  
  B) Minor Comments:
  
  (1) pg. 9: "We found that LRMP 1-479Cit inhibited HCN4 to an even greater degree than the full-length LRMP, likely because expression of this tagged construct was improved compared to the untagged full-length LRMP, which was detected by co-transfection with GFP." Co-transfection with GFP seems like an extremely poor and a risky measure for LRMP expression.
  
  We agree that the exact efficiency of co-transfection is contentious although some papers and manufacturer protocols indicate high co-transfection efficiency (Xie et al., 2011). In this paper we used both co-transfection and tagged proteins with similar results.
  
  (2) pg 9: "LRMP 1-227 construct contains the N-terminus of LRMP with a cut-site near the N-terminus of the predicted coiled-coil sequence". In Figure 2 the graphic shows the coiled-coil domain starting at 191. What was the logic for splitting at 227 which appears to be the middle of the coiled-coil?
  
  See response to the reviewing editor’s comment 2.
  
  (3) Figure 5C: Please align the various schematics for HCN4 as was done for LRMP. It makes it much easier to decipher what is what.
  
  Fig. 5 has been revised as suggested.
  
  (4) pg 12: I assume that the HCN2 fragment chosen aligns with the HCN4 N2 fragment which shows binding, but this logic should be stated if that is the case. If not, then how was the HCN2 fragment chosen?
  
  This is correct. This has been explicitly stated in the revised manuscript (page 14).
  
  (5) Figure 7: Add legend indicating black/gray = HCN4 and blue = HCN2.
  
  This has been stated in the revised figure legend.
  
  (6) pg 17: Conservation of P545 and T547 across mammalian species is not shown or cited.
  
  This sentence is not included in the revised manuscript, however, for the interest of the reviewer we have provided an alignment of this region across species here.
  
  Author response image 1.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) It is not clear whether in the absence of cAMP, LRMP also modestly shifts the voltage-dependent activity of the channels. Please clarify.
  
  We have clarified on page 10 that, in the absence of cAMP, LRMP does not significantly shift the voltage-dependence of activation in any of the channels we have tested in this paper (or in our prior 2020 paper).
  
  (2) Resolution of Fig. 8b is low.
  
  We ultimately decided that the cartoon did not provide any important information for understanding our model and it was removed.
  
  (3) Please add a supplementary figure showing the amino acid sequence of LRMP to show where the demarcations are made for each fragment as well as where the truncations were made as noted in Fig 3 and Fig 4.
  
  A new supplementary figure showing the LRMP sequence has been added and cited in the methods section (page 5). Truncation sites have been added to the schematic in Fig. 2A.
  
  (4) In the cartoon schematic illustration for Fig. 3 and Fig.4, the legend should include that the thick bold lines in the C-Terminal domain represent the CNBD, while the thick bold lines in the N-Terminal domain represent the HCN domain. This was mentioned in Liao 2012, as you referenced when you defined the construct S719X, but it would be nice for the reader to know that the thick bold lines you have drawn in your cartoon indicate that it also highlights the CNBD or the HCN domain.
  
  This has been added to figure legends for the relevant figures in the revised manuscript.
  
  (5) On page 12, missing a space between "residues" and "1" in the parenthesis "...LRMP L1 (residues1-108)...".
  
  Fixed. Thank you.
  
  (6) Which isoform of LRMP was used? What is the NCBI accession number? Is it the same one from Peters 2020 ("MC228229")?
  
  This information has been added to the methods (page 5). It is the same as Peters 2020.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) "Truncation of residues 1-62 led to a partial LRMP effect where cAMP caused a significant depolarizing shift in the presence of LRMP, but the activation in the presence of LRMP and cAMP was hyperpolarized compared to cAMP alone (Fig. 3B, C and 3E; Table 1). In the HCN4Δ1-130 construct, cAMP caused a significant depolarizing shift in the presence of LRMP; however, the midpoint of activation in the presence of LRMP and cAMP showed a non-significant trend towards hyperpolarization compared to cAMP alone (Fig. 3C and 3E; Table 1)".
  
  This means that sequence 62-185 is necessary and sufficient for the LRMP effect. I suggest a competition assay with this peptide (synthetic, or co-expressed with HCN4 full-length and LRMP to see whether the peptide inhibits the LRMP effect).
  
  We respectfully disagree with the reviewer’s interpretation. Our results strongly suggest that other regions such as residues 25-65 (Fig. 3C) and C-terminal residues (Fig. 6) are also necessary. The use of a peptide could be an interesting future experiment; however, it would be very difficult to control relative expression of a co-expressed peptide. We think that our results in Fig. 7E-F, where this fragment is added to HCN2, are a better controlled way of validating the importance of this region.
  
  (2) "Truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation. In the presence of both LRMP and cAMP the activation of HCN4-S719X was still significantly hyperpolarized compared to the presence of cAMP alone (Figs. 4A and 4B; Table 1). And the cAMP-induced shift in HCN4-S719X in the presence of LRMP (~7mV) was less than half the shift in the absence of LRMP (~18 mV)."
  
  On the basis of the partial effects reported for the truncations of the N-terminus of HCN4 Δ1-62 and Δ1-130 (Fig 3B and C), I do not think it is possible to conclude that "truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation". Indeed, cAMP-induced shift in HCN4 Δ1-62 and Δ1-130 in the presence of LRMP were 10.9 and 10.5 mV, respectively, way more than the ~7mV measured for the HCN4-S719X mutant.
  
  As you rightly stated at the end of the paragraph: "Together, these results show significant LRMP regulation of HCN4 even when the distal C-terminus is truncated, consistent with a minimal role for the C-terminus in the regulatory pathway". I would better discuss this minimal role of the C-terminus. It is true that deletion of the first 185 aa of HCN4 N-terminus abolishes the LRMP effect, but it is also true that removal of the very C-terminus of HCN4 does affect LRMP. This unstructured C-terminal region of HCN4 contains isotype-specific sequences. Maybe they also play a role in recognizing LRMP. Thus, I would suggest further investigation via truncations, even internal deletions of HCN4-specific sequences.
  
  Please see the response to the reviewing editor’s comment 3.
  
  (3) Figure 5: The N-terminus of LRMP FRETs with the N-terminus of HCN4.
  
  Why didn't you test the same truncations used in Fig. 3? Indeed, based on Fig 3, sequences 1-25 can be removed. I would have considered peptides 26-62 and 63-130 and 131-185 and a fourth (26-185). This set of peptides will help you connect binding with the functional effects of the truncations tested in Fig 3.
  
  Please see the response to the reviewing editor’s comments 2 and 5.
  
  Why didn't you test the C-terminus (from 719 to the end) of HCN4? This could help explain why truncation of the HCN4 C-terminus does affect LRMP regulation, though only partially (Fig. 4A).
  
  Please see the response to the reviewing editor’s comment 3.
  
  (4) "We found that a previously described HCN4-2 chimera containing the HCN4 N-terminus and transmembrane domains (residues 1-518) with the HCN2 C-terminus (442-863) (Liao et al., 2012) was partially regulated by LRMP (Fig. 7A and 7B)".
  
  I do not understand this partial LRMP effect on the HCN4-2 chimera. In Fig. 6 you have shown that the "HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP". How can this be reconciled with the HCN4-2 chimera? HCN4-2, "containing" the P545A/T547F mutations, should not respond to LRMP.
  
  Please see the response to the reviewing editor’s comment 6.
  
  (5) "we next made a targeted chimera of HCN2 that contains the distal HCN4 N-terminus (residues 1-212) and the HCN2 transmembrane and C-terminal domains with 5 point mutants in non-conserved residues of the S5 segment and C-linker elbow (M338V/C341V/S345G/A467P/F469T)......Importantly, the HCN4-2 VVGPT channel is insensitive to cAMP in the presence of LRMP (Fig. 7C and 7D), indicating that the HCN4 N-terminus and cAMP-transduction centre residues are sufficient to confer LRMP regulation to HCN2".
  
  Why did you also insert the 3 mutations in S5? Are these mutations somehow involved in the cAMP transduction mechanism?
  
  You have already shown that in HCN4 only P545 and T547 (C-linker) are necessary for the LRMP effect. I suggest trying, at least, the HCN2 chimera with only A467P/F469T. It should work without the 3 mutations in S5.
  
  Please see the response to the reviewing editor’s comment 7.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.29.555242v2
www.biorxiv.org www.biorxiv.org

Chronic hyperactivation of midbrain dopamine neurons causes preferential dopamine neuron degeneration

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.
  
  Strengths:
  
  This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.
  
  We thank the reviewer for these positive comments.
  
  Weaknesses:
  
  Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.
  
  We thank the reviewer for this review. We do believe that the manuscript has a substantial mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have executed several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We believe that these additions significantly bolster the conclusions of the paper.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete, as the authors did not run a CNO-control condition in these experiments.
  
  Strengths:
  
  This is an exciting and important paper.
  
  The paper compares mouse transcriptomics with human patient data.
  
  It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.
  
  We thank the reviewer for these comments.
  
  Weaknesses:
  
  Major concerns:
  
  (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.
  
  We thank the reviewer for this important recommendation. Although the initial version showed that CNO does not produce degeneration of DA neuron terminals, it did not exclude a contribution to the behavioral changes. To address this, we now include a cohort of DREADD-free, non-injected mice treated with either vehicle or CNO (Figure S1C). We found that, on its own, CNO did not significantly impact either light cycle or dark cycle running. Together with the lack of degeneration observed with CNO treatment in non-DREADD mice (Figure 2D), these results support the conclusion that our behavioral and histological findings are caused by dopamine neuron activation.
  
  (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.
  
  We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and have completed these experiments in the revision (Figure 1, Figure S2). We now show that in vivo treatment with CNO causes some of the same physiological changes in VTA dopamine neurons as we found in SNc dopamine neurons, including an increased spontaneous firing rate, and a similar decrease in responsiveness to CNO in the slice recordings. Together these observations support the conclusion that SNc axons are intrinsically more vulnerable to increased activity than VTA dopamine axons.
  
  (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.
  
  We have clarified which mice had access to a running wheel in the methods of our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps prevented mice from having access to a running wheel in their home cage. Mice used for non-responder and non-hM3Dq (CNO alone) experiments also had access to a running wheel during their treatment. Mice used for the isradipine experiment did not have access to a running wheel, as the number of mice was too large and, while unilateral hM3Dq expression allows for within-animal controls, it does not lend itself to clear interpretation of running wheel data.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.
  
  We thank the reviewer for the careful and thoughtful review of our manuscript.
  
  While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not detect an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than VTA DA neurons, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel. In addition, we are not aware of prior studies that have chronically activated DREADDs over several weeks to produce neurodegeneration.
  
  In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.
  
  Thank you for this comment. As discussed in greater detail under “Comments on the results section” below, our data suggest that this is not a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and have expanded on the discussion of this possibility in the revised manuscript.
  
  The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.
  
  We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking, and the little data that exist are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). In addition to the human and rodent data already discussed in the manuscript, additional support for increased activity in PD models includes:
  
  • Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)
  
  • Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)
  
  • Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)
  
  We have included citation of these important examples in our revision. In our model, we have found that chronic hyperactivity causes a substantial loss of nigral DA terminals while mesolimbic terminals are relatively spared (Figure 2), and that striatal DA levels are markedly decreased (Figure S6), phenomena that are hallmarks of Parkinson’s disease.
  
  There are additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models a form of increased intrinsic activity, and interpretation of our results will be facilitated as we learn more about how the activity of DA neurons changes in humans in PD. Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity.
  
  Comments on the introduction:
  
  The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.
  
  We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. Importantly, while significant changes to burst firing were not seen until almost complete loss of dopamine neurons, these recordings were made in anesthetized rats, which may not be representative of neural activity in awake animals. We adjusted the text so that this is no longer referred to as ‘partial’ loss. At the same time, we point out that the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we do not know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).
  
  Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).
  
  It would be good if the introduction referred to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this, and referring to it would help frame the work and its novelty in a broader context.
  
  We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. Accordingly, we have expanded on our citation of this literature in both the introduction and discussion sections. However, we believe that the novelty of our study lies in: 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.
  
  Comments on the results section:
  
  The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.
  
  The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.
  
  We do report the input resistance in Figure S1C (now Figure S2A, S2B), which was unchanged in CNO-treated animals compared to controls. We did not previously report the resting membrane potential because many of the DA neurons were spontaneously firing. In the revision, we now report the initial membrane potential on first breaking into the cell for the whole cell recordings, which did not vary between groups (Figure S2). This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialyzing the neuron with the internal solution, which might alter the intracellular concentrations of ions. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D), thus at least under these conditions these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S2). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S4B). This finding is also consistent with increased activity of the DA neurons. We have added discussion of these important considerations in the revision.
  
  It is great that the authors quantified not only TH levels but also the levels of mCherry, coexpressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could, for example, explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.
  
  We thank the reviewer for this comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Our control animals give us an indicator of injection variability, which is likely substantial and prevents us from detecting more subtle changes. Nonetheless, we believe that it conveys useful complementary data. We discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.
  
  Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine neuron number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. An "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.
  
  We agree that the stereology experiments were performed on relatively small numbers of animals, such that only robust effects would be detected. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. Given this small effect size, we would indeed need much larger groups to better discern these changes. Stereology is an intensive technique, and we have therefore elected to focus on terminal loss. We have also replaced panel 2G with a more representative CNO image.
  
  In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.
  
  We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We have noted this caveat in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic-corrected traces for the same recording. In panel E, the traces follow time in ascending order. We have also included frequency and amplitude data for these recordings (Figure S4A), along with discussion of the significance of these findings.
  
  Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity, like cfos or bdnf, elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?
  
  While levels of DARPP32 mRNA were unchanged, our additional HPLC data show strong decreases in striatal dopamine in hyperactivated mice. We do not see strong changes in classic activity-related genes (data not shown); however, these genes may behave differently in the context of chronic hyperactivity and ongoing degeneration. Instead, we employed NEUROeSTIMator (Bahl et al., Nature Comm. 2024, PMID: 38278804), a deep learning method to predict neural activation based on transcriptomic data. We found that predicted activity scores were significantly higher in GqCNO dopaminergic regions compared to controls (Figure X). Indeed, some of the genes used within the model to predict activity are immediate early genes, e.g. c-fos.
  
  The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?
  
  Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing our mouse model to early PD samples when there is more limited SNc DA neuron loss (see the proportion of DA neurons within the areas of human tissues we selected for sampling in Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration to those in patients where degeneration is ongoing.
  
  Author response image 1.
  
  Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV). Control and early PD subjects.
  
  Comments on the discussion:
  
  In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.
  
  Our model utilizes hM3Dq-DREADDs that function by activating Gq pathways, which are classically expected to increase intracellular calcium and thereby increase neuronal excitability. Indeed, in slices from mice that were not treated with CNO, acute CNO application caused depolarizations (Figure 1E) that can both result from and cause increases in intracellular calcium. Additionally, our results show increased calcium by fiber photometry and changes to calcium-related genes, suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point. Indeed, a small preliminary experiment with chronic isradipine failed to show protection, although it lacked power to detect a partial effect. We have acknowledged this in the text, and also briefly consider other mechanisms, such as increased dopamine levels, that could also mediate the toxicity.
  
  In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.
  
  As discussed, we sampled SN DA neurons in early PD (see Author response image 1), and in our view there is great value for such comparisons.
  
  A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.
  
  As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis; we have therefore included additional electrophysiology experiments and have added discussion of this important consideration.
  
  Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.
  
  As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150), while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We have amended our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.
  
  There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.
  
  While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature which has focused on shorter timescales. Thus, while we have revised the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the data that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that as with all preclinical models of PD, we cannot draw definitive conclusions about PD with this data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The temporal design of the experiments is quite confusing. For instance, Figures 1 and 3 illustrate the daily changes of the mice and suggest some critical time points within 2 weeks of CNO administration, whereas Figure 2 presents data at 2 and 4 weeks, which are much later than the proposed critical time points. Furthermore, Figure 4 includes only 1 week data, and lacks subsequent data from 2 and 4 weeks, at which significant changes such as calcium levels and neuronal/axonal degeneration are observed.
  
  While interesting behavior and calcium phenotypes were detected within 2 and 4 weeks of CNO administration (Figures 1 and 3), we only collected tissues for histology at the 2 and 4 week time points (Figure 2). Observing degeneration of DA neuron axons but not cell bodies at 2 weeks served as a rationale to extend to the 4 week time point to determine whether degeneration was progressive. At the same time, our primary focus is on identifying early changes that may drive or contribute to the degeneration. As such, we recorded calcium changes over a 2-week treatment period, capturing the period during which almost all of the dopamine axons are lost. Similarly, we had the capacity to perform spatial transcriptomics at only one time point, and the 1 week time point was selected to capture transcriptomic changes that precede and potentially contribute to the mild and severe degeneration that occurs at 2 and 4 weeks, respectively. We have added text clarifying the rationale for the time points chosen.
  
  (2) The authors showed the changes in neuronal firing in dopamine neurons by the administration of CNO. However, one of the most important features of dopaminergic neuronal activity is dopamine release at its axon terminals in the striatum. Thus, the claims raised in this paper would be better supported if the authors further show any alterations in dopamine release (by FSCV or fluorescent dopamine sensors) at some critical time points during or after CNO application.
  
  While we are confident that DA release is altered due to the significant changes in behavior when hM3Dq DREADDs are activated specifically in DA neurons, the current manuscript does not quantify this, or distinguish between axonal and somatodendritic DA release. Interestingly, we did find significantly decreased striatal dopamine by HPLC after chronic activation (Figure S6). We believe that resolving these questions is beyond the scope of this manuscript, but have added text indicating the importance of these experiments.
  
  (3) The authors used 2% sucrose as a vehicle via drinking water. Please explain the rationale behind this choice.
  
  We used 2% sucrose as the vehicle because it is also added to the CNO water to counteract the bitterness of CNO (Kumar et al., J Neurotrauma 2024, PMID: 37905504). We have clarified this in the manuscript.
  
  (4) As we know, mRNA levels of some genes do not always predict their protein levels; there is sometimes a huge discrepancy between mRNA and protein abundance. In this paper, the mechanistic interpretation of the results by the authors heavily relies on the spatial transcriptomics of the midbrain and striatum. Thus, the authors need to provide additional data proving that the gene expression of some genes in the CNO group is also changed at the level of protein.
  
  We agree that validating hits at the protein level is valuable, but we were limited in our ability to assess these changes for the revision. We have, however, done additional transcriptomics with the high resolution Xenium platform to increase confidence in a subset of hits of interest for follow up in future work, and we included data on genes related to DA metabolism and markers of DA neurons.
  
  (5) The authors provided spatial transcriptomics data only for mice with one week of chronic activation. However, other data also indicate significant differences when the activation period extends beyond 10 to 12 days (Figure 1C, Figure 3D-F). While a 7-day chronic activation time point might be crucial, additional transcriptomics data from later time points would be beneficial to confirm the persistence of these changes in gene expression. Furthermore, differential gene expression (DEG) analysis at these later time points could identify novel pathways or genes influenced by the chronic activation of dopamine neurons.
  
  This is an interesting point, and such data would be valuable for understanding how chronic activity influences gene expression; however, additional transcriptomics at later time points is beyond the scope of this paper. In future studies we will assess the changes observed in this manuscript at other time points.
  
  (6) Figure 1D, Figure S1C:
  
  The authors should present the sample recording traces to demonstrate that the electrophysiological recordings were appropriately made.
  
  These data have been provided in Figure S2.
  
  (7) Figure S1C:
  
  AP thresholds in SNc dopamine neurons from both groups look quite high. In addition, considering the data from the previous reports, AP peak amplitudes in SNc dopamine neurons from both groups seem to be very low. Are these values correct?
  
  The thresholds and peaks are correct, including the AP amplitude (threshold to peak), which is typical in our (Dr. Margolis’s) experience. AP thresholds are measured from an average of at least 10 APs, as the voltage at which the derivative of the trace first exceeds 10 V/s. As mentioned in the methods section, junction potentials were not corrected, which can result in values that are a bit depolarized from ground truth. This junction potential would be consistent across all recordings and thus would not impede detection of a difference in AP thresholds between groups of animals.
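
  The threshold criterion described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the 10 kHz sampling rate is taken from the response to point (8), and two choices are assumptions made here, namely that the threshold is read at the sample just before the derivative crosses 10 V/s, and that "an average of at least 10 APs" means averaging per-spike thresholds rather than averaging waveforms first.

```python
def ap_threshold(voltage_mv, sample_rate_hz=10_000, dvdt_criterion_v_per_s=10.0):
    """Return the voltage (mV) where dV/dt first exceeds the criterion, or None.

    Assumption: the threshold is taken at the sample that starts the first
    interval whose slope exceeds the criterion.
    """
    dt_s = 1.0 / sample_rate_hz
    for i in range(1, len(voltage_mv)):
        # Convert the per-sample difference from mV to V, then divide by dt to get V/s.
        dvdt_v_per_s = (voltage_mv[i] - voltage_mv[i - 1]) / 1000.0 / dt_s
        if dvdt_v_per_s > dvdt_criterion_v_per_s:
            return voltage_mv[i - 1]
    return None


def mean_ap_threshold(spike_waveforms_mv, min_spikes=10, **kwargs):
    """Average per-spike thresholds (assumed reading of 'average of at least 10 APs')."""
    thresholds = [ap_threshold(w, **kwargs) for w in spike_waveforms_mv]
    thresholds = [t for t in thresholds if t is not None]
    if len(thresholds) < min_spikes:
        raise ValueError(f"need at least {min_spikes} detected APs, got {len(thresholds)}")
    return sum(thresholds) / len(thresholds)


# Synthetic rising phase of a spike, in mV, sampled at 10 kHz (illustrative values only).
waveform = [-50.0, -49.8, -49.5, -49.0, -47.5, -40.0, 0.0]
threshold = ap_threshold(waveform)
```

  At 10 kHz, a slope of 10 V/s corresponds to a change of 1 mV between consecutive samples, which is why an uncorrected junction potential shifts the reported threshold but not the detection criterion itself.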
  
  (8) Figure 1E:
  
  It would be better if the statistical significance is depicted in the graph.
  
  We don’t perform repeated measures statistics across data like these, as the data are continuous, collected at 10 kHz. For ease of displaying the data, the data for each neuron is binned and then these traces are averaged together. We display SEM to give a sense of the variance across neurons. We have provided sample traces of individual neurons to better demonstrate the variability and significance of this data (Figure S2).
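
  The display procedure described above (bin each neuron's trace, average across neurons, show SEM) might look like the sketch below. The bin size is not stated in the response, so `bin_size` is a free parameter here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bin_trace(trace, bin_size):
    """Average consecutive samples in non-overlapping bins; a trailing partial bin is dropped."""
    n_bins = len(trace) // bin_size
    return [mean(trace[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]


def mean_and_sem(binned_traces):
    """Return per-bin mean and SEM across neurons (one binned trace per neuron)."""
    n_neurons = len(binned_traces)
    if n_neurons < 2:
        raise ValueError("SEM needs at least two neurons")
    n_bins = min(len(t) for t in binned_traces)
    means, sems = [], []
    for b in range(n_bins):
        values = [t[b] for t in binned_traces]
        means.append(mean(values))
        # SEM = sample standard deviation across neurons / sqrt(number of neurons)
        sems.append(stdev(values) / sqrt(n_neurons))
    return means, sems


# Two toy "neurons" with four samples each, binned in pairs (illustrative values only).
neuron_a = bin_trace([1.0, 1.0, 3.0, 3.0], bin_size=2)
neuron_b = bin_trace([3.0, 3.0, 5.0, 5.0], bin_size=2)
avg, sem = mean_and_sem([neuron_a, neuron_b])
```

  In this sketch the SEM reflects variability across neurons only; within-neuron sample-to-sample variability is absorbed by the binning step.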
  
  (9) Figure 2C:
  
  The representative staining images appear to be taken from coronal slices at anatomically different positions along the rostral-to-caudal axis. Although the total numbers of TH+ cells are comparable between vehicle and CNO groups in the graph, the sample images do not reflect this result. The authors should replace the current images with the better ones.
  
  We have replaced this image in the manuscript.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor concerns:
  
  (1) The authors claim that their transcriptomics experiments are conducted 'before any degeneration has occurred'. And they do not see significant differences in the TH expression in the striatum. However, the n for these mice at 1 week is lower than the n used at 2 weeks (n=5 vs n=8-9) and the images used to show 'no degeneration' really look like there is some degeneration going on. Also, throughout the paper, there is a stronger effect when degeneration is measured with mCherry compared to when it is measured with TH. The 'no change' claim is made only with the TH comparison. It seems possible (and almost likely) that there would be significant axonal degeneration at one week with either a higher sample size or using the mCherry comparison. The authors should simply claim that their transcriptomics data is collected before any 'somatic' degeneration occurs.
  
  Thank you, we have included data that shows partial terminal loss after one week of activation (Figure S3B, Figure S5A) and have corrected this language in the manuscript to reflect transcriptomics occurring before somatic degeneration.
  
  (2) While selective degeneration is one of the most interesting findings in the paper, that finding is not emphasized and why it would be interesting to compare the VTA vs SNc is not discussed in the introduction.
  
  Emphasis for comparing the VTA vs the SNc has been added to the introduction, along with additional electrophysiology data in VTA dopamine neurons in Figure 1 and Figure S2.
  
  (3) In a similar direction, the vulnerability of dopaminergic neurons has been shown to be differential even within the SNc, with the ventral tier neurons degenerating more severely and the dorsal tier neurons remaining resilient. Is there any evidence for a ventral-dorsal degeneration gradient in the SNc in these experiments?
  
  This is a really interesting point and changes to dopamine neuron subtypes along the ventral-dorsal axis may be occurring in this model, particularly as there is more selective loss of SNc neurons. However, the cell type involved would be difficult to determine at this stage, since single cell transcriptomic resolution is necessary across the entire SNc to identify cell subtypes. Transcriptomic identification is further complicated given that transcriptome change has recently been shown with genetic manipulation (Gaertner et al., bioRxiv 2024, PMID: 38895448), and we would think could similarly change with increased activity. Assessing these issues is beyond the scope of this paper.
  
  (4) The running data is very interesting and the circadian rhythm alterations are compelling. However, it is unclear whether the CNO mice run more total compared with the vehicle mice. The authors should show the combined total running data to evaluate this.
  
  We now show total running data in Figure 1C.
  
  (5) The finding that acute CNO has no effect on the membrane potential of SNc neurons after chronic CNO exposure is very peculiar! Especially because the fiber photometry data suggests that CNO continues to have an effect in vivo. Is there any explanation for this?
  
  While there is no acute electrophysiological response to CNO detected in this group, there may be intracellular pathways activated by the DREADD that do not acutely impact membrane potential in current clamp (I = 0 pA) mode.
  
  (6) The terminology of chronic CNO is sometimes confusing as it refers to both 2-week and 4-week administration. Using additional terminology such as 'early' and 'late' might help with clarity.
  
  We have decreased usage of ‘chronic,’ and increased usage of more specific treatment times in order to increase clarity throughout the manuscript.
  
  (7) In Figure 2C, the SNc image looks binarized.
  
  This image has been updated.
  
  (8) Also in Figure 2, why are TH and mCherry measured for the 4-week time point, but only TH measured for the 2-week time point?
  
  mCherry quantification was performed to further support the finding of DA neuron death, and was therefore not assessed at 2 weeks given that there was no change in the TH stereology.
  
  (9) Additional scale bars and labeling are needed in Figure 3. In addition, there is such a strong reduction in noise after chronic CNO in the fiber photometry recordings, and the noise does not return upon CNO washout. What is the explanation for this?
  
  Additional scale bars were added to Figure 3. Traces are not getting less noisy with chronic CNO treatment, rather, there is less bursting activity in the dopamine cells. Our interpretation is that the baseline activity is rescued during washout but this bursting activity is not.
  
  (10) While not necessary to support the claims in this paper, it would be very interesting to see if chronic inhibition of dopaminergic neurons had a similar or different effect, as too little dopaminergic activity may also cause degeneration in some cases.
  
  We agree that assessing chronic inhibition is valuable, and this is an important area for future research.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Not all the mice used in the study are listed in the methods section. For example, the GCaMP6f floxed mice discussed in the results section are not listed in the methods. Also, the breeding scheme used for the different mouse lines needs to be described. For example, did the DAT-Cre mice carry one or two alleles?
  
  Both the DAT<sup>IRES</sup>Cre and GCaMP6f floxed (Ai148) Jax mouse line numbers and RRIDs are included in the methods. DAT<sup>IRES</sup>Cre mice carried two alleles.
  
  In the methods section, the amount of virus injected needs to be mentioned.
  
  This information has been added to the methods section.
  
  In all result graphs, please include the individual data points so that the readers can see the distribution of the data and quickly see the sample size.
  
  Graphs have been updated to include all individual data points. For line graphs, the distribution is communicated by the error bars, while the n is in the legends.
  
  The authors provide running wheel data in supplementary figure 1A to validate that chemogenetic activation of dopamine neurons leads to increased locomotor activity. The results shown in the figure appear to be qualitative as no average data is presented. The authors should provide average data from all mice tested.
  
  Average IP response data for all mice assessed for running wheel activity has been included in Figure S1.
  
URL

biorxiv.org/content/10.1101/2024.04.05.588321v3
www.biorxiv.org www.biorxiv.org

Brain areas for reversible symbolic reference, a potential singularity of the human brain

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  fMRI was used to address an important aspect of human cognition - the capacity for structured representations and symbolic processing - in a cross-species comparison with non-human primates (macaques); the experimental design probed implicit symbolic processing through reversal of learned stimulus pairs. The authors present solid evidence in humans that helps elucidate the role of brain networks in symbolic processing, however the evidence from macaques was incomplete (e.g., sample size constraints, potential and hard-to-quantify differences in attention allocation, motivation, and lived experience between species).
  
  Thank you very much for your assessment. We would like to address the potential issues that you raise point-by-point below.
  
  We agree that for macaque monkey physiology, sample size is always a constraint, due to both financial and ethical reasons. We addressed this concern by combining the results from two different labs, which allowed us to test 4 animals in total, which is twice as many as is common practice in the field of primate physiology. (We discuss this now on lines 473-478.)
  
  Interspecies differences in motivation, attention allocation, task strategies etc. could also be limiting factors. Note that we did address the potential lack of attention allocation directly in Experiment 2 using implicit reward association, which was successful as evidenced by the activation of attentional control areas in the prefrontal cortex. We cannot guarantee that the strategies that the two species deploy are identical, but we tentatively suggest that this might be a less important factor in the present study than in other interspecies comparisons that use explicit behavioral reports. In the current study, we directly measured surprise responses in the brain in the absence of any explicit instructions in either species, which allowed us to measure the spontaneous reversal of learned associations, which is a very basic element of symbolic representation. Our reasoning is that such spontaneous responses should be less dependent on attention allocation and task strategies. (We discuss this now in more detail on lines 478-485.)
  
  Finally, lived experience could be a major factor. Indeed, obvious differences include a lifetime of open-field experiences and education in our human adult subjects, which was not available to the monkey subjects, and includes a strong bias towards explicit learning of symbolic systems (e.g. words, letters, digits, etc). However, we have previously shown using EEG that 5-month-old human infants spontaneously generalize learning to the reversed pairs after a short learning session in the lab (Kabdebon et al., PNAS, 2019). This indicates that, even with very limited experience, humans spontaneously reverse learned associations. (We discuss this now in more detail on lines 478-485.) It could be very interesting to investigate whether spontaneous reversal could be present in infant macaque monkeys, as there might be a critical period for this effect. Although neurophysiology in awake infant monkeys is highly challenging, it would be very relevant for future work. (We discuss this in more detail on lines 493-498.)
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Kerkoerle and colleagues present a very interesting comparative fMRI study in humans and monkeys, assessing neural responses to surprise reactions at the reversal of a previously learned association. The implicit nature of this task, assessing how this information is represented without requiring explicit decision-making, is an elegant design. The paper reports that both humans and monkeys show neural responses across a range of areas when presented with incongruous stimulus pairs. Monkeys also show a surprise response when the stimuli are presented in a reversed direction. However, humans show no such surprise response based on this reversal, suggesting that they encode the relationship reversibly and bidirectionally, unlike the monkeys. This has been suggested as a hallmark of symbolic representation, that might be absent in nonhuman animals.
  
  I find this experiment and the results quite compelling, and the data do support the hypothesis that humans are somewhat unique in their tendency to form reversible, symbolic associations. I think that an important strength of the results is that the critical finding is the presence of an interaction between congruity and canonicity in macaques, which does not appear in humans. These results go a long way to allay concerns I have about the comparison of many human participants to a very small number of macaques.
  
  We thank the reviewer for the positive assessment. We also very much appreciate the point about the interaction effect in macaque monkeys – indeed, we do not report just a negative finding.
  
  I understand the impossibility of testing 30+ macaques in an fMRI experiment. However, I think it is important to note that differences necessarily arise in the analysis of such datasets. The authors report that they use '...identical training, stimuli, and whole-brain fMRI measures'. However, the monkeys (in experiment 1) actually required 10 times more training.
  
  We agree that this description was imprecise. We have changed it to “identical training stimuli” (line 151); indeed, the movies used for training were strictly identical. Furthermore, please note that we do report the fMRI results after the same training duration. In experiment 1, after 3 days of training, the monkeys did not show any significant results, even in the canonical direction. However, in experiment 2, with increased attention and motivation, a significant effect was observed on the first day of scanning after training, as was found in human subjects (see Figure 4 and Table 3).
  
  More importantly, while the fMRI measures are the same, group analysis over 30+ individuals is inherently different from comparing only 2 macaques (including smoothing and averaging away individual differences that might be more present in the monkeys, due to the much smaller sample size).
  
  Thank you for understanding that a limited sample size is intrinsic to macaque monkey physiology. We also agree that data analysis in humans and monkeys is necessarily different. As suggested by the reviewer, we added an analysis to address this, see the corresponding reply to the ‘Recommendations for the authors’ section below.
  
  Despite this, the results do appear to show that macaques show the predicted interaction effect (even despite the sample size), while humans do not. I think this is quite convincing, although had the results turned out differently (for example an effect in humans that was absent in macaques), I think this difference in sample size would be considerably more concerning.
  
  Thank you for noting this. Indeed, the interaction effect is crucial, and the task design was explicitly made to test this precise prediction, described in our manuscript as the “reversibility hypothesis”. The congruity effect in the learned direction served as a control for learning, while the corresponding congruity effect in the reversed direction tested for spontaneous reversal. The reversibility hypothesis stipulates that in humans there should not be a difference between the learned and the reversed direction, while there should be for monkeys. We already wrote about that in the result section of the original manuscript and now also describe this more explicitly in the introduction and beginning of the result section.
  
  I would also note that while I agree with the authors' conclusions, it is notable to me that the congruity effect observed in humans (red vs blue lines in Fig. 2B) appears to be far more pronounced than any effect observed in the macaques (Fig. 3C-3). Again, this does not challenge the core finding of this paper but does suggest methodological or possibly motivational/attentional differences between the humans and the monkeys (or, for example, that the monkeys had learned the associations less strongly and clearly than the humans).
  
  As also explained in response to the eLife assessment above, we expanded the “limitations” section of the discussion, with a deeper description of the possible methodological differences between the two species (see lines 478-485).
  
  With the same worry in mind, we did increase the attention and motivation of monkeys in experiment 2, and indeed obtained a greater activation to the canonical pairs and their violation, notably in the prefrontal cortex, but crucially still without reversibility.
  
  In the end, we believe that the striking interspecies difference in size and extent of the violation effect, even for purely canonical stimuli, is an important part of our findings and points to a more efficient species-specific learning system, that our experiment tentatively relates to a symbolic competence.
  
  This is a strong paper with elegant methods and makes a worthwhile contribution to our understanding of the neural systems supporting symbolic representations in humans, as opposed to other animals.
  
  We again thank the reviewer for the positive review.
  
  Reviewer #2 (Public Review):
  
  In their article titled "Brain mechanisms of reversible symbolic reference: a potential singularity of the human brain", van Kerkoerle et al address the timely question of whether non-human primates (rhesus macaques) possess the ability for reverse symbolic inference as observed in humans. Through an fMRI experiment in both humans and monkeys, they analyzed the BOLD signal in both species while observing audio-visual and visual-visual stimuli pairs that had been previously learned in a particular direction. Remarkably, the findings pertaining to humans revealed that a broad brain network exhibited increased activity in response to surprises occurring in both the learned and reverse directions. Conversely, in monkeys, the study uncovered that the brain activity within sensory areas only responded to the learned direction but failed to exhibit any discernible response to the reverse direction. These compelling results indicate that the capacity for reversible symbolic inference may be unique to humans.
  
  In general, the manuscript is skillfully crafted and highly accessible to readers. The experimental design exhibits originality, and the analyses are tailored to effectively address the central question at hand.
  
  Although the first experiment raised a number of methodological inquiries, the subsequent second experiment thoroughly addresses these concerns and effectively replicates the initial findings, thereby significantly strengthening the overall study. Overall, this article is already of high quality and brings new insight into human cognition.
  
  We sincerely thank the reviewer for the positive comments.
  
  I identified three weaknesses in the manuscript:
  
  - One major issue in the study is the absence of significant results in monkeys. Indeed, authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison).
  
  First, we disagree with the statement about “absence of significant results in monkeys”. We do report a significant interaction which, as noted by the referee, is a crucial positive finding.
  
  Second, we performed the suggested analysis for experiment 2, using the bilateral ROIs of the putative monkey MDN from previous literature (Mitchell, et al. 2016), which are based on the human study by Fedorenko et al. (PNAS, 2013).
  
  Author response table 1.
  
  Congruity effect for monkeys in Experiment 2 within the ROIs of the MDN (n=3). Significance was assessed with one-sided one-sample t-tests.
  
  As can be seen, none of the regions within the monkey MDN showed an FDR-corrected significant difference or interaction. Although the absence of a canonical congruity effect makes it difficult to draw strong conclusions, it did approach significance at an uncorrected level in the lateral frontal posterior region, similar to the large prefrontal effect we report in Figures 4 and 5. Furthermore, for the reversed congruity effect there was never even a trend at the uncorrected level, and the crucial interaction of canonicity and congruity again approached significance in the lateral prefrontal cortex.
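
  For readers unfamiliar with the distinction between uncorrected and FDR-corrected significance used above, the sketch below shows one common correction, the Benjamini-Hochberg step-up procedure. The response does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the p-values are invented for illustration rather than taken from Author response table 1.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives FDR control at level alpha."""
    m = len(p_values)
    # Indices of the tests sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-ROI p-values (not from the study).
roi_p_values = [0.04, 0.30, 0.45, 0.60, 0.75, 0.20, 0.90]
survives_fdr = benjamini_hochberg(roi_p_values)
```

  With these made-up values, the smallest p-value (0.04) is below 0.05 uncorrected but does not survive correction across seven ROIs, which is the pattern described as "approaching significance at an uncorrected level".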
  
  We also performed an ANOVA in the human participants of the VV experiment on the average betas across the 7 fronto-parietal ROIs used by Mitchell et al. to define their equivalents in the monkey brain (Fig 1a, right in Mitchell et al. 2016), with congruity, canonicity and hemisphere (except for the anterior cingulate, which is a bilateral ROI) as within-subject factors. We confirmed the results presented in the manuscript (Figure 4C), with notably no significant interaction between congruity and canonicity in any of these ROIs (all F-values (except insula) <1). A significant main effect of congruity was observed in the posterior middle frontal gyrus (MFG) and inferior precentral sulcus at the FDR-corrected level. Analyses restricted to the canonical trials found a congruity effect in these two regions plus the anterior insula and anterior cingulate/presupplementary motor area, whereas no ROIs were significant at an FDR-corrected level for reversed trials. There was a trend in the middle MFG and inferior precentral region for reversed trials. Crucially, there was not even a trend for the interaction between congruity and canonicity at the uncorrected level. The difference in effect size between the canonical and reversed directions can therefore be explained by the larger statistical power due to the larger number of congruent trials (70%, versus 10% for the other trial conditions), rather than by a significant difference between the canonical and reversed directions.
  
  Author response table 2.
  
  Congruity effect for humans in Experiment 2 within the ROIs of the MDN (n=23).
  
  These results support our contention that the type of learning of the stimulus pairs was very different in the two species. We thank the reviewer for suggesting these relevant additional analyses.
  
  - While the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants.
  
  We agree that this is an interesting question, although it is also very open-ended. For instance, we could report each subject’s individual whole-brain results, but this would take too much space (and the interested reader will be able to do so from the data that we make available as part of this publication). As a step in this direction, we provide below a figure showing the individual congruity effects, separately for each experiment and for each ROI of Table 5, for each of the 52 participants for whom an fMRI localizer was available:
  
  Author response image 1.
  
  Difference in mean betas between congruent and incongruent conditions in a-priori linguistic and mathematical ROIs (see definition and analyses in Table 5) in both experiments (experiment 1 = AV, left panel; experiment 2 = VV, right panel). Dots correspond to participants (red: canonical trials; green: reversed trials). The boxplot notch is located at the median and the lower and upper box hinges at the 25th and 75th centiles. Whiskers extend to 1.5 inter-quartile ranges on either side of the hinges. ROIs are ranked by the median of the Incongruent-Congruent difference across canonical and reversed order, within a given experiment. For purposes of comparison between the two experiments, we have underlined with colors the top-five common ROIs between the two experiments. N.s.: non-significant congruity effect (p>0.05).
  
  Several regions show a rather consistent difference across subjects (see, for instance, the posterior STS in experiment 1, left panel). Overall, only 3 of the 52 participants did not show any beta greater than 2 in canonical or reversed trials in any ROI. The consistency is quite striking, given the limited number of test trials (in total only 16 incongruent trials per direction per participant), and the fact that these ROIs were selected for their responses to spoken or written sentences, as part of a subsidiary task quite different from the main task.
  
  - Some details are missing in the methods.
  
  Thank you for these comments, we reply to them point-by-point below.
  
  Reviewer #3 (Public Review):
  
  This study investigates the hypothesis that humans (but not non-human primates) spontaneously learn reversible temporal associations (i.e., learning a B-A association after only being exposed to A-B sequences), which the authors consider to be a foundational property of symbolic cognition. To do so, they expose humans and macaques to 2-item sequences (in a visual-auditory experiment, pairs of images and spoken nonwords, and in a visual-visual experiment, pairs of images and abstract geometric shapes) in a fixed temporal order, then measure the brain response during a test phase to congruent vs. incongruent pairs (relative to the trained associations) in canonical vs. reversed order (relative to the presentation order used in training). The advantage of neuroimaging for this question is that it removes the need for a behavioral test, which non-human primates can fail for reasons unrelated to the cognitive construct being investigated. In humans, the researchers find statistically indistinguishable incongruity effects in both directions (supporting a spontaneous reversible association), whereas in monkeys they only find incongruity effects in the canonical direction (supporting an association but a lack of spontaneous reversal). Although the precise pattern of activation varies by experiment type (visual-auditory vs. visual-visual) in both species, the authors point out that some of the regions involved are also those that are most anatomically different between humans and other primates. The authors interpret their finding to support the hypothesis that reversible associations, and by extension symbolic cognition, is uniquely human.
  
  This study is a valuable complement to prior behavioral work on this question. However, I have some concerns about methods and framing.
  
  We thank the reviewer for the careful summary of the manuscript, and the positive comments.
  
  Methods - Design issues:
  
  The authors originally planned to use the same training/testing protocol for both species but the monkeys did not learn anything, so they dramatically increased the amount of training and evaluation. By my calculation from the methods section, humans were trained on 96 trials and tested on 176, whereas the monkeys got an additional 3,840 training trials and 1,408 testing trials. The authors are explicit that they continued training the monkeys until they got a congruity effect. On the one hand, it is commendable that they are honest about this in their write-up, given that this detail could easily be framed as deliberate after the fact. On the other hand, it is still a form of p-hacking, given that it's critical for their result that the monkeys learn the canonical association (otherwise, the critical comparison to the non-canonical association is meaningless).
  
  Thank you for this comment.
  
  Indeed, for experiment 1, the amount of training and testing was not equal for the humans and monkeys, as also mentioned by reviewer 2. We now describe in more detail how many training and imaging days we used for each experiment and each species, as well as the number of blocks per day and the number of trials per block (see lines 572-577). We also added the information on the amount of training received to all of the legends of the Tables.
  
  We are sorry for giving the impression that we trained until the monkeys learned this. This was not the case. Based on previous literature, we actually anticipated that the short training would not be sufficient, and therefore planned additional training in advance. Specifically, Meyer & Olson (2011) had observed pair learning in the inferior temporal cortex of macaque monkeys after 816 exposures per pair. This is similar to the additional training we gave, about 80 blocks with 12 trials per pair per block. This is now explained in more detail (lines 577-580).
  
  Furthermore, we strongly disagree with the pejorative term p-hacking. The aim of the experiment was not to show a congruency effect in the canonical direction in monkeys, but to track and compare their behavior in the same paradigm as that of humans for the reverse direction. It would have been unwise to stop after human-identical training and only show that humans learn better, which is a given. Instead, we looked at brain activations at both times, at the end of human-identical training and when the monkeys had learned the pairs in the canonical direction.
  
  Finally, in experiment 2, monkeys were tested after the same 3 days of training as humans. We wrote: “Using this design, we obtained significant canonical congruity effects in monkeys on the first imaging day after the initial training (24 trials per pair), indicating that the animals had learned the associations” (lines 252-253).
  
  (2) Between-species comparisons are challenging. In addition to having differences in their DNA, human participants have spent many years living in a very different culture than that of NHPs, including years of formal education. As a result, attributing the observed differences to biology is challenging. One approach that has been adopted in some past studies is to examine either young children or adults from cultures that don't have formal educational structures. This is not the approach the authors take. This major confound needs to minimally be explicitly acknowledged up front.
  
  Thank you for raising this important point. We already had a section on “limitations” in the manuscript, which we have now extended (lines 478-485). Indeed, this study follows a previous study in 5-month-old infants using EEG, in which we already showed that after learning associations between labels and categories, infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al., PNAS, 2019). We also cited preliminary results of the same paradigm as used in the current study but using EEG in 4-month-old infants (Ekramnia and Dehaene-Lambertz, 2019), where we replicated the results obtained by Kabdebon et al. 2019 showing that preverbal infants spontaneously generalize learning to the reversed pairs.
  
  Functional MRI in awake infants remains a challenge at this age (but see our own work, Dehaene-Lambertz et al., Science, 2002), especially because the experimental design means only a few trials in the conditions of interest (10%) and thus a long experimental duration that exceeds infants’ quietness and attentional capacities in the noisy MRI environment. (We discuss this on lines 493-496.)
  
  (3) Humans have big advantages in processing and discriminating spoken stimuli and associating them with visual stimuli (after all, this is what words are in spoken human languages). Experiment 2 ameliorates these concerns to some degree, but still, it is difficult to attribute the failure of NHPs to show reversible associations in Experiment 1 to cognitive differences rather than the relative importance of sound string to meaning associations in the human vs. NHP experiences.
  
  As the reviewer wrote, we deliberately performed Experiment 2 with visual shapes to control for various factors that might have explained the monkeys' failure in Experiment 1.
  
  (4) More minor: The localizer task (math sentences vs. other sentences) makes sense for math but seems to make less sense for language: why would a language region respond more to sentences that don't describe math vs. ones that do?
  
  The referee is correct: our use of the word “reciprocally” was improper (although see Amalric and Dehaene, 2016 for significant differences in both directions when non-mathematical sentences concern specific knowledge). We changed the formulation to clarify this as follows: “In these ROIs, we recovered the subject-specific coordinates of each participant’s 10% best voxels in the following comparisons: sentences vs rest for the 6 language ROIs; reading vs listening for the VWFA; and numerical vs non-numerical sentences for the 8 mathematical ROIs.” (lines 678-680).
  
  Methods - Analysis issues:
  
  (5) The analyses appear to "double dip" by using the same data to define the clusters and to statistically test the average cluster activation (Kriegeskorte et al., 2009). The resulting effect sizes are therefore likely inflated, and the p-values are anticonservative.
  
  It is not clear to us which result the reviewer is referring to. In Tables 1-4, we report the values that we found significant in the whole-brain analysis; we do not report additional statistical tests for these data. For Table 5, the subject-specific voxels were identified through a separate localizer experiment, which was designed to pinpoint the precise activation areas for each subject in the domains of oral and written language processing and math. Subsequently, we compared the activation at these voxel locations across different conditions of the main experiment. Thus, the two datasets were distinct, and there was no double dipping. In both interpretations of the comment, we therefore disagree with the reviewer.
  
  Framing:
  
  (6) The framing ("Brain mechanisms of reversible symbolic reference: A potential singularity of the human brain") is bigger than the finding (monkeys don't spontaneously reverse a temporal association but humans do). The title and discussion are full of buzzy terms ("brain mechanisms", "symbolic", and "singularity") that are only connected to the experiments by a debatable chain of assumptions.
  
  First, this study shows relatively little about brain "mechanisms" of reversible symbolic associations, which implies insights into how these associations are learned, recognized, and represented. But we're only given standard fMRI analyses that are quite inconsistent across similar experimental paradigms, with purely suggestive connections between these spatial patterns and prior work on comparative brain anatomy.
  
  We agree with the referee that the term “mechanism” is ambiguous and, for systems neuroscientists, may suggest more than we are able to do here with functional MRI. We changed the title to “Brain areas for reversible symbolic reference, a potential singularity of the human brain”. This title better describes our specific contribution: mapping out the areas involved in reversibility in humans, and showing that they do not seem to respond similarly in macaque monkeys.
  
  Second, it's not clear what the relationship is between symbolic cognition and a propensity to spontaneously reverse a temporal association. Certainly, if there are inter-species differences in learning preferences this is important to know about, but why is this construed as a difference in the presence or absence of symbols? Because the associations aren't used in any downstream computation, there is not even any way for participants to know which is the sign and which is the signified: these are merely labels imposed by the researchers on a sequential task.
  
  As explained in the introduction, the reversibility test addressed a very minimal core property of symbolic reference. There cannot be a symbol if its attachment doesn’t operate in both directions. Thus, this property is necessary – but we agree that it is not sufficient. Indeed, more tests are needed to establish whether and how the learned symbols are used in further downstream compositional tasks (as discussed in our recent TICS papers, Dehaene et al. 2022). We added a sentence in the introduction to acknowledge this fact:
  
  “Such reversibility is a core and necessary property of symbols, although we readily acknowledge that it is not sufficient, since genuine symbols present additional referential and compositional properties that will not be tested in the present work.” (lines 89-92).
  
  Third, the word "singularity" is both problematically ambiguous and not well supported by the results. "Singularity" is a highly loaded word that the authors are simply using to mean "that which is uniquely human". Rather than picking a term with diverse technical meanings across fields and then trying to restrict the definition, it would be better to use a different term. Furthermore, even under the stated definition, this study performed a single pairwise comparison between humans and one other species (macaques), so it is a stretch to then conclude (or insinuate) that the "singularity" has been found (see also pt. 2 above).
  
  We have published an extensive review including a description of our use of the term “singularity” (Dehaene et al., TICS 2022). Here is a short excerpt: “Humans are different even in domains such as drawing and geometry that do not involve communicative language. We refer to this observation using the term “human cognitive singularity”, the word singularity being used here in its standard meaning (the condition of being singular) as well as its mathematical sense (a point of sudden change). Hominization was certainly a singularity in biological evolution, so much so that it opened up a new geological age (the Anthropocene). Even if evolution works by small continuous change (and sometimes it doesn’t [4]), it led to a drastic cognitive change in humans.”
  
  We find the referee’s use of the pejorative term “insinuate” quite inappropriate. From the title on, we are quite nuanced and refer only to a “potential singularity”. Furthermore, as noted above, we explicitly mention in the discussion the limitations of our study, and in particular the fact that only a single non-human species was tested (see lines 486-493). We are working hard to get chimpanzee data, but this is remarkably difficult for us, and we hope that our paper will incite other groups to collect more evidence on this point.
  
  (7) Related to pt. 6, there is circularity in the framing whereby the authors say they are setting out to find out what is uniquely human, hypothesizing that the uniquely human thing is symbols, and then selecting a defining trait of symbols (spontaneous reversible association) *because* it seems to be uniquely human (see e.g., "Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009).", line 335). They can't have it both ways. Either "symbol" is an independently motivated construct whose presence can be independently tested in humans and other species, or it is by fiat synonymous with the "singularity". This circularity can be broken by a more modest framing that focuses on the core research question (e.g., "What is uniquely human? One possibility is spontaneous reversal of temporal associations.") and then connects (speculatively) to the bigger conceptual landscape in the discussion ("Spontaneous reversal of temporal associations may be a core ability underlying the acquisition of mental symbols").
  
  We fail to understand the putative circularity that the referee sees in our introduction. We urge him/her to re-read it, and hope that, with the changes that we introduced, it does boil down to his/her summary, i.e. “What is uniquely human? One possibility is spontaneous reversal of temporal associations."
  
  Reviewer #1 (Recommendations For The Authors):
  
  In general, the manuscript was very clear, easy to read, and compelling. I would recommend the authors carefully check the text for consistency and minor typos. For example:
  
  The sample size for the monkeys kept changing throughout the paper. E.g., Experiment 1: n = 2 (line 149); n = 3 (line 205).
  
  Thank you for catching this error, we corrected it. The number of animals was indeed 2 for experiment 1, and 3 for experiment 2. (Animals JD and YS participated in experiment 1 and JD, JC and DN in experiment 2. So only JD participated in both experiments.)
  
  Similarly, the number of stimulus pairs is reported inconsistently (4 on line 149, 5 pairs later in the paper).
  
  We’re sorry that this was unclear. We used 5 sets of 4 audio-visual pairs each. We now clarify this, on line 157 and on lines 514-516.
  
  At least one case of p>0.0001, rather than p < 0.0001 (I assume).
  
  Thank you once again, we now corrected this.
  
  Reviewer #2 (Recommendations For The Authors):
  
  One major issue in the study is the absence of significant results in monkeys. Indeed, the authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). In other words: what are the statistics for the MDN regarding congruity, canonicity, and interaction in both species? Since the authors have already performed this type of analysis for language and Math ROIs (table 5), it should be relatively easy for them to extend it to the MDN. Demonstrating that results in monkeys are far from significant could further convince the reader.
  
  Furthermore, while the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. Specifically, it would be valuable to describe the proportion of human participants in which the effects of congruency, canonicity, and their interaction are significant. Additionally, stating the variability of the F-values for each effect would provide reassurance to the reader regarding the distinctiveness of humans in comparison to monkeys. Low variability in the results would serve to mitigate concerns that the observed disparity is merely a consequence of testing a unique subset of monkeys, which may differ from the general population. Indeed, this would be a greater support to the notion that the dissimilarity stems from a genuine distinction between the two species.
  
  We responded to both of these points above.
  
  In terms of methods, details are missing:
  
  - How many trials of each condition are there exactly? (10% of 44 trials is 4.4) :
  
  We wrote: “In both humans and monkeys, each block started with 4 trials in the learned direction (congruent canonical trials), one trial for each of the 4 pairs (2 O-L and 2 L-O pairs). The rest of the block consisted of 40 trials in which 70% of trials were identical to the training; 10% were incongruent pairs but the direction (O-L or L-O) was correct (incongruent canonical trials), thus testing whether the association was learned; 10% were congruent pairs but the direction within the pairs was reversed relative to the learned pairs (congruent reversed trials) and 10% were incongruent pairs in reverse (incongruent reversed trials).”(See lines 596-600.)
  
  Thus, each block comprised 4 initial trials, 28 canonical congruent trials, 4 canonical incongruent, 4 reverse congruent and 4 reverse incongruent trials, i.e. 4 + (28 + 3x4) = 4 + 40 = 44 trials.
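  
  The trial counts above can be checked with a short sketch. This is an illustration only, not the authors' stimulus-presentation code: the condition labels, the function names and the shuffling of the 40 test trials are assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical labels for the four trial types; percentages are taken from
# the quoted methods text (70/10/10/10 of the 40 test trials).
TEST_PERCENTAGES = {
    "congruent_canonical": 70,
    "incongruent_canonical": 10,
    "congruent_reversed": 10,
    "incongruent_reversed": 10,
}


def build_block(n_pairs=4, n_test_trials=40, seed=None):
    """Return the list of trial types for one block (illustrative only)."""
    rng = random.Random(seed)
    # Each block opens with one congruent canonical trial per learned pair.
    initial = ["congruent_canonical"] * n_pairs
    test = []
    for name, pct in TEST_PERCENTAGES.items():
        test.extend([name] * (n_test_trials * pct // 100))
    # Assumption: the order of the test trials is randomized within a block.
    rng.shuffle(test)
    return initial + test


def count_conditions(block):
    """Count how many trials of each type a block contains."""
    return Counter(block)


if __name__ == "__main__":
    block = build_block(seed=0)
    print(len(block), dict(count_conditions(block)))
```

  With the default arguments, a block contains 4 opening trials plus 40 test trials, so each of the three rare conditions contributes exactly 4 trials per block.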
  
  - How long is one trial?
  
  As written in the method section: “In each trial, the first stimulus (label or object) was presented during 700ms, followed by an inter-stimulus-interval of 100ms then the second stimulus during 700ms. The pairs were separated by a variable inter-trial-interval of 3-5 seconds” i.e. 700+100+700=1500 ms, plus 3 to 4.75 seconds of blank between the trials (see lines 531-533).
  
  - How are the stimulus presentations jittered?
  
  See : “The pairs were separated by a variable inter-trial-interval randomly chosen among eight different durations between 3 and 4.75 seconds (step=250 ms). The series of 8 intervals was randomized again each time it was completed.”(lines 533-535).
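  
  The jitter scheme quoted above (eight durations, reshuffled each time the series is exhausted) can be sketched as follows. The function and constant names are hypothetical, and the sketch is only one possible reading of the description, not the authors' presentation script.

```python
import random

# Timing values taken from the quoted methods text (milliseconds).
STIMULUS_MS = 700
ISI_MS = 100
PAIR_DURATION_MS = STIMULUS_MS + ISI_MS + STIMULUS_MS  # duration of one pair


def iti_generator(start_ms=3000, stop_ms=4750, step_ms=250, seed=None):
    """Yield inter-trial intervals indefinitely (illustrative sketch).

    The eight durations from 3000 to 4750 ms are shuffled, used once each,
    then reshuffled before the next series of eight starts.
    """
    rng = random.Random(seed)
    durations = list(range(start_ms, stop_ms + step_ms, step_ms))
    while True:
        rng.shuffle(durations)
        # Iterate over a copy so the series being emitted is never mutated.
        for duration in list(durations):
            yield duration


if __name__ == "__main__":
    itis = iti_generator(seed=1)
    print(PAIR_DURATION_MS, [next(itis) for _ in range(16)])
```

  Sampling without replacement within each series of eight guarantees that every duration occurs equally often, which a purely independent random draw per trial would not.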
  
  - What is the statistical power achieved for humans? And for monkeys?
  
  We know of no standard way to define power for fMRI experiments. Power will depend on so many parameters, including the fMRI signal-to-noise ratio, the attention of the subject, the areas being considered, the type of analysis (whole-brain versus ROIs), etc.
  
  - Videos are mentioned in the methods, is it the image and sound? It is not clear.
  
  We’re sorry that it was unclear. Videos were only used for the training of the human subjects. We now corrected this in the method section (lines 552-554).
  
  Reviewer #3 (Recommendations For The Authors):
  
  The main recommendations are to adjust the framing (making it less bold and more connected to the empirical evidence) and to ensure independence in the statistical analyses of the fMRI data.
  
  See our replies to the reviewer’s comments on “Framing” above. In particular, we changed the title of the paper from “Brain mechanisms of reversible symbolic reference” to “Brain areas for reversible symbolic reference”.
  
  References cited in this response
  
  Dehaene, S., Al Roumi, F., Lakretz, Y., Planton, S., & Sablé-Meyer, M. (2022). Symbols and mental programs : A hypothesis about human singularity. Trends in Cognitive Sciences, 26(9), 751‑766. https://doi.org/10.1016/j.tics.2022.06.010.
  
  Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science, 298(5600), 2013‑2015. https://doi.org/10.1126/science.1077066
  
  Ekramnia M, Dehaene-Lambertz G. 2019. Investigating bidirectionality of associations in young infants as an approach to the symbolic system. Presented at the CogSci. p. 3449.
  
  Fedorenko E, Duncan J, Kanwisher N (2013) Broad domain generality in focal regions of frontal and parietal cortex. Proc Natl Acad Sci U S A 110:16616-16621.
  
  Kabdebon, C., & Dehaene-Lambertz, G. (2019). Symbolic labeling in 5-month-old human infants. Proceedings of the National Academy of Sciences, 116(12), 5805‑5810. https://doi.org/10.1073/pnas.1809144116
  
  Mitchell, D. J., Bell, A. H., Buckley, M. J., Mitchell, A. S., Sallet, J., & Duncan, J. (2016). A Putative Multiple-Demand System in the Macaque Brain. Journal of Neuroscience, 36(33), 8574‑8585. https://doi.org/10.1523/JNEUROSCI.0810-16.2016
  
URL

biorxiv.org/content/10.1101/2023.03.04.531109v2
www.biorxiv.org www.biorxiv.org

I-Spin live: An open-source software based on blind-source separation for real-time decoding of motor unit activity in humans

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This manuscript compiles existing algorithms into an open-source software package that enables real-time motor unit decomposition from muscle activity collected via grids of surface electrodes and indwelling electrode arrays. The software package is valuable given that many motor neuroscience labs are using such algorithms and that there exist a host of potential real-time applications for such data. Validation of the software package is generally solid but incomplete in some important areas: the primary data is narrow in scope and only from male participants, and there is a lack of ground truth tests on synthetic data. The impact of the software package could be strengthened by making it less tied to specific electrode hardware and by expanding it to easily permit offline analysis.
  
  We thank the reviewers and editors for their comments and suggestions after reading the initial version of our manuscript. In this second iteration, we have performed a validation of the algorithm using synthetic EMG signals. We have also added experimental data collected in female participants. Finally, the new version of I-Spin is compatible with the Open Ephys GUI that can interface with devices such as the Open Ephys and Intan acquisition boards. Another version has been developed for interfacing with the devices provided by the TMSi company (https://info.tmsi.com/blog/ispin-saga-real-timemotor-unit-decomposition-tool). We believe that such changes will make I-Spin more accessible for a broad range of experimental setups and research teams. Please find below the specific answers to the reviewers’ comments.
  
  Reviewer #1 (Public Review):
  
  Many labs worldwide now use the blind source deconvolution technique to identify the firing patterns of multiple motor units simultaneously in human subjects. This technique has had a truly transformative effect on our understanding of the structure of motor output in both normal subjects and, increasingly, in persons with neurological disorders. The key advance presented here is that the software provides real-time identification of these firing patterns. The main strengths are the clarity of the presentation and the great potential that real-time decoding will provide. Figures are especially effective and statistical analyses are excellent.
  
  We thank the reviewer for this positive appreciation of our work.
  
  The main limitation of the work is that only male subjects were included in the validation of the software. The reason given - that yield of number of motor units identified is generally larger in males than females - is reasonable in the sense that this is the first systematic test of this real-time approach. At a minimum, however, the authors should clearly commit to future work with female subjects and emphasize the importance of considering sex differences.
  
  As emphasised by the reviewer, the number of identified motor units is typically higher in males than females when using surface EMG (Taylor et al., 2022), which is currently the main limitation to implementing offline EMG decomposition techniques in a broad and representative sample of research participants. These sex differences are less pronounced when using intramuscular EMG, as the signals are less affected by the filtering effect of the volume conductor separating the motor units from the recording electrodes. Besides the different yields expected between males and females, we do not expect differences in terms of the accuracy of the motor unit identification algorithm, which is the main outcome of this paper.
  
  Nevertheless, we acknowledge the importance of understanding the reasons for this difference, and the imperative to refine algorithms and/or surface electrode design to mitigate this major limitation of surface EMG.
  
  To support this point, the discussion has been updated (P20; L480):
  
  ‘An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of identified motor units from surface EMG is twice as low in females than males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent developments of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such dataset could help identifying the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.’
  
  Finally, we have completed new experiments including males and females in this new iteration (P.12; L.295):
  
  ‘Application of motor unit filters in experimental data
  
  We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).
  
  When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units. When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real-time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
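  The three metrics reported above can be illustrated with a minimal sketch. It is not the I-Spin live implementation: the function names, the greedy matching rule and the tolerance (in samples) are assumptions made for illustration. Precision follows the definition given in the response (true discharge times over all identified discharge times); the rate of agreement is computed here with the commonly used form TP / (TP + FP + FN), which is also an assumption.

```python
def count_matches(identified, reference, tolerance):
    """Greedily pair each reference spike with the closest unused identified
    spike lying within `tolerance` samples (a simple, non-optimal matching)."""
    used = set()
    matched = 0
    for ref in sorted(reference):
        candidates = [
            (abs(t - ref), i)
            for i, t in enumerate(identified)
            if i not in used and abs(t - ref) <= tolerance
        ]
        if candidates:
            used.add(min(candidates)[1])  # closest unused candidate
            matched += 1
    return matched


def _percent(numerator, denominator):
    return 100.0 * numerator / denominator if denominator else float("nan")


def agreement_metrics(identified, reference, tolerance=1):
    """Return rate of agreement, sensitivity and precision in percent."""
    tp = count_matches(identified, reference, tolerance)
    fp = len(identified) - tp  # identified spikes with no reference partner
    fn = len(reference) - tp   # reference spikes that were missed
    return {
        "rate_of_agreement": _percent(tp, tp + fp + fn),
        "sensitivity": _percent(tp, tp + fn),
        "precision": _percent(tp, tp + fp),
    }


# Hypothetical discharge times, in samples.
reference = [100, 300, 500, 700, 900]      # e.g. manually edited spike train
identified = [101, 300, 700, 905, 1200]    # e.g. real-time output
print(agreement_metrics(identified, reference, tolerance=2))
```

  With this formulation, missed discharge times lower the sensitivity while leaving the precision untouched, which is the pattern described when filters estimated at 20% MVC were applied at 10% MVC.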
  
  A second weakness is that the Introduction does a poor job of establishing the potential importance of the real-time approach.
  
  The introduction has been modified to highlight the importance of identifying the spiking activity of motor units in real time. Specifically, the first paragraph has been rewritten to read (P3; L67):
  
  ‘The activity of motor neurons – in the form of spike trains – represents the neural code of movement to muscles. Decoding this firing activity in real-time during various behaviours can thus substantially enhance our understanding of movement control (2-5). Real-time decoding is also essential for interfacing with external devices (6) or virtual limbs (7) when activity is present at the periphery of the nervous system. For example, individuals with a spinal cord injury can control a virtual hand with the residual firing activity of the motor units in their forearm (7). Furthermore, sampling the activity of motor units receiving a substantial portion of independent synaptic inputs may pave the way for movement augmentation – specifically, extending a person’s movement repertoire through the increase of controllable degrees of freedom (8). In this way, Formento et al. (3) showed that individuals can intuitively learn to independently control motor units within the same muscle using visual cues. Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’
  
  Reviewer #2 (Public Review):
  
  Rossato et al present I-spin live, a software package to perform real-time blind-source separation-based sorting of motor unit activity. The core contribution of this manuscript is the development and validation of a software package to perform motor unit sorting, apply the resulting motor unit filters in real-time during muscle contractions, and provide real-time visual feedback of the motor unit activity. I have a few concerns with the work as presented:
  
  I found it challenging to specifically understand the technical contributions of this manuscript. The authors do not appear to be claiming anything novel algorithmically (with respect to spike sorting) or methodologically (with respect to manual editing of spikes before the use of the algorithms in real-time). My takeaway is that the key contributions are C1) development of an open-source implementation of the Negro algorithm, C2) validating it for real-time application (evaluating its sorting efficacy, and closed-loop performance, etc), and developing a software package to run in closed-loop with visual feedback. I will comment on each of these items separately below. It would be great if the authors could more explicitly lay out the key contributions of this manuscript in the text.
  
  The main objective of this work was to provide an open-source implementation of the real-time identification of motor units together with a user interface that allows researchers to easily process the data and display the firing activity of motor units through several forms of visual feedback. We have explicitly laid out these key contributions in the introduction: ‘Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’
  
  Related to the above, much of the validation of the algorithms in this manuscript has a "trust me" feel. The authors note that the Negro et al. algorithm has already been validated, so very few details or presentations of primary data showing the algorithm's performance are shown. Similarly, the efficacy of the decomposition approach is evaluated using manual editing of the sorting output as a reference, which is a subjective process, and users would greatly benefit from explicit guidance. There are very few details of manual editing shown in this manuscript (I believe the authors reference the Hug et al. 2021 paper for these details), and little discussion of the core challenges and variability of that process, even though it seems to be a critical step in the proposed workflow. So this is very hard to evaluate and would be challenging for readers to replicate.
  
  To address the reviewer’s comment, we added a validation step using synthetic EMG data (P.10; L.235).
  
  ‘Validation of the algorithm
  
  We first validated the accuracy of the algorithm using synthetic EMG signals generated with an anatomical model entailing a cylindrical muscle volume with parallel fibres [see Farina et al. (29), Konstantin et al. (36) for a full description of the model)]. In this model, subcutaneous and skin layers separate the muscle from a grid of 65 surface electrodes (5 columns, 13 rows), while an intramuscular array of electrodes is directly inserted in the muscle under the grid with an angle of 30 degrees. 150 motor units were distributed within the cross section of the muscle. Recruitment thresholds, firing rate/excitatory drive relations, and twitch parameters were assigned to each motor unit using the same procedure as Fuglevand et al. (37). During each simulation, a proportional-integral-derivative controller adjusted the level of excitatory drive to minimise the error between a predefined target of force and the force generated by the active motor units.
  
  Figure 3A displays the raster plots of the active motor units during simulated trapezoidal isometric contractions with plateaus of force set at 10%, 20%, and 30% MVC. A sinusoidal isometric contraction ranging between 15 and 25% MVC at a frequency of 0.5 Hz was also simulated. We identified on average 10 ± 1 and 12 ± 2 motor units with surface and intramuscular arrays, respectively (Figure 3A). During the offline decomposition, the rate of agreement between the identified discharge times and the ground truth, that is, the simulated discharge times, reached 100.0 ± 0.0% for intramuscular EMG signals and 99.2 ± 1.8% for surface EMG signals (Figure 3B). The offline estimation of motor unit filters was therefore highly accurate, independently of the level of force or the pattern of the isometric contraction.
  
  Motor unit filters estimated during a baseline contraction at 20% MVC were then applied in real-time on signals simulated during a contraction with a different pattern (sinusoidal; Figure 3C). The rates of agreement between the online decomposition and the ground truth reached 96.3 ± 4.6% and 98.4 ± 2.3% for surface and intramuscular EMG signals, respectively. Finally, we tested whether the accuracy of the online decomposition changed when the level of force decreased or increased by 10% MVC when compared to the calibration performed at 20% MVC (Figure 3D). The rate of agreement remained high when applying the motor unit filters on signals recorded at 10% MVC: 99.8 ± 0.2% (surface EMG) and 99.5 ± 0.3% (intramuscular EMG). It is worth noting that only 3 out of 10 motor units identified from surface EMG at 20% MVC were active at 10% MVC, while 8 out of 12 motor units identified from intramuscular EMG were active at 10% MVC. This shows how the decomposition of EMG signals tends to identify the last recruited motor units, which often innervate a larger number of fibres than the early recruited motor units (38). On the contrary, the application of motor unit filters on signals simulated at 30% MVC led to a decrease in the rate of agreement, with values of 88.6 ± 14.0% (surface EMG) and 80.3 ± 19.2% (intramuscular EMG). This decrease in accuracy did not impact all the motor units, with 5 motor units keeping a rate of agreement above 95% in both signals. For the other motor units, we observed a decrease in precision, which estimates the ratio of true discharge times over the total number of identified discharge times. This was caused by the recruitment of two motor units sharing a similar space within the muscle, which resulted in a merge in the same pulse train (Figure 3D).’
  
  In addition, we added a new paragraph in the Method section to describe the manual editing process (P.26; L.658).
  
  ‘There is a consensus among experts that automatic decomposition should be followed by visual inspection and manual editing (55). Manual editing involves the following steps: i) removing spikes that result in erroneous firing rates (outliers), ii) adding discharge times that are clearly distinguishable from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the EMG signals (either a selected window or the entire signal), and v) repeating this procedure until no outliers are present and all clearly distinguishable spikes have been selected. Importantly, the manual editing of potentially missed or falsely identified discharge times should not be accepted before the application of the updated motor unit separation vector, thereby generating a new pulse train. Manual edits should be accepted only if the silhouette value improves following this operation or remains well above the preestablished threshold. A more extensive description of the manual editing of motor unit pulse trains can be found in (32). Even though some of the aforementioned steps involve subjective decision-making, evidence suggests that manual editing after EMG decomposition with blind source separation approaches remains highly reliable across operators (33). Specifically, the median rate of agreement calculated for 126 motor units over eight operators with varying experience in manual editing was 99.6%. All raw and processed data have been made available on a public data repository so that they can be used for training new operators (10.6084/m9.figshare.13695937).’
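  Step i) of the editing procedure, flagging spikes that produce erroneous firing rates, can be sketched as follows. This is an illustration only and not the rule used by the software: the function name and the criterion (instantaneous discharge rate above the mean plus three standard deviations) are assumptions, and any flagged spike would still need visual confirmation before removal.

```python
import statistics


def flag_rate_outliers(discharge_times_s, n_sd=3.0):
    """Return indices of spikes that close an abnormally short inter-spike interval.

    discharge_times_s: strictly increasing discharge times in seconds.
    n_sd: hypothetical threshold, in standard deviations above the mean rate.
    """
    intervals = [b - a for a, b in zip(discharge_times_s, discharge_times_s[1:])]
    if any(isi <= 0 for isi in intervals):
        raise ValueError("discharge times must be strictly increasing")
    if len(intervals) < 2:
        return []
    rates = [1.0 / isi for isi in intervals]  # instantaneous discharge rates (Hz)
    threshold = statistics.mean(rates) + n_sd * statistics.pstdev(rates)
    # Interval k separates spikes k and k + 1; flag the later spike.
    return [k + 1 for k, rate in enumerate(rates) if rate > threshold]


# Hypothetical train firing at 10 Hz with one spurious spike 5 ms after a true one.
times = sorted([0.1 * k for k in range(20)] + [0.505])
print(flag_rate_outliers(times))
```

  After removing or adding spikes, the separation vector would be recalculated and reapplied (steps iii and iv), and the edit kept only if the silhouette value improves or stays above the chosen threshold.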
  
  I found the User Guide in the Github package to be easy to follow. Importantly, it seems heavily tied to the specific hardware (Quattrocento). I understand it may be difficult to make the full software package work with different hardware, but it seems important to at least make an offline analysis of recorded data possible for this package to be useful more broadly.
  
  The software was updated to perform real-time decomposition with signals recorded from the Quattrocento and the Open Ephys GUI, which is compatible with Intan and Open Ephys acquisition boards. I-Spin has also been adapted by TMSi to perform real-time decomposition with their devices (https://info.tmsi.com/blog/ispin-saga-real-time-motor-unit-decomposition-tool).
  
  Moreover, the manual editing panel of the software can now import any files from these devices and allows users to reformat data in mat files to perform offline analyses.
  
  While this may be a powerful platform, it is also very possible that without more details and careful guidance for users on potential pitfalls, many non-experts in sorting could use this as a platform for somewhat sloppy science.
  
  We fully agree with the reviewer that real-time EMG decomposition - with a different approach here than spike sorting - may yield unreliable results if not applied properly. As outlined in the introduction of our initial manuscript, assessing the accuracy and limitations of real-time decomposition was a primary motivation for this study. Specifically, we compared accuracy between contraction intensities, muscles, and electrode types (see Results section).
  
  We also demonstrated that manual editing of the decomposition outputs should be done after the training phase to improve the motor unit filters, thereby improving the accuracy of real-time decomposition. We also outlined the importance of never blindly accepting the result of the decomposition without visual inspection and manual editing (P8; L214):
  
  ‘These results show how manual editing can improve the accuracy of spike detection from the motor unit pulse trains. Moreover, a SIL value around 0.9 can be used as a threshold to automatically remove, a priori, the motor unit pulse trains of poor quality. Thus, these two steps were performed in all the subsequent analyses. Importantly, the motor unit pulse train must always be visually inspected after the session to check for errors in the automatic identification of discharge times.’
  
  We have also included more detailed information about the manual editing process (see above).
  
  The authors mention that data is included with the Github software package. I could not find any included data, or instructions on how to run the software offline on example data.
  
  A link to the data on figshare was added to the GitHub repository.
  
  Given the centrality of the real-time visual feedback to their system, the authors should show some examples of the actual display etc. so readers can understand what the system in action actually looks like (I believe there is no presentation of the actual system in the manuscript, just in the User Guide). Similarly, it would be helpful to have a schematic figure outlining the full workflow that a user goes through when using this system.
  
  A figure of the workflow is present in the user manual. Additionally, we now display traces of visual feedback in figure 5, and we added videos of the software for each type of visual feedback in the supplemental materials.
  
  The authors note all data was collected with male subjects because more motor units can be decomposed from male subjects relative to females. But what is the long-term outlook for the field if studies avoid female subjects because their motor units may be harder to decompose? This should at least be discussed - it is an important challenge for the field to solve, and it is unacceptable if new methods just avoid this problem and are only tested on male subjects.
  
  This point was rightly raised by each of the three reviewers. To solve this, we added data collected on four females, and discussed future developments to make the decomposition of surface EMG equally performant for everyone (P.20; L.480).
  
  ‘An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of identified motor units from surface EMG is half as high in females as in males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, recent developments in simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help to understand the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help to identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.’
  
  Specific comments on the core contributions of this paper:
  
  C1. Development of an open-source implementation of the Negro algorithm
  
  This seems an important contribution and useful for the community. There are very few figures showing any primary data, the efficacy of sorting, raw traces showing the waveforms that are identified, cluster shapes, etc. I realize the high-level algorithm has been outlined elsewhere, but the implementation in this package, and its efficacy, is a core component of the system and the claims being made in this paper. Much more presentation of data is needed to evaluate this.
  
  It is worth noting that the approach used here is based on blind source separation, which is different than spike-sorting algorithms as it relies on the statistical properties of the spike trains (their sparseness) rather than the profiles of the action potentials. In short, we optimise separation vectors that are applied onto the whitened signal to generate a sparse motor unit pulse train. The discharge times are then directly estimated from the high peaks of this pulse train (Section 1 of the results; overview of the approach).
  
  We are thus displaying motor unit pulse trains in three figures with the automatically detected discharge times, with cases of successful separation in figure 1 and merged motor units in the same pulse train in figures 3 and 4.
  
  We also validated the algorithm with synthetic EMG to provide objective data on the accuracy of the algorithm. These results are shown in the section ‘Validation of the algorithm’ and displayed in figure 3.
  
  Similarly, more information on the offline manual editing process (e.g. showing before/after examples with primary data) would be important to gain confidence in the method. The current paper shows application to both surface EMG and intramuscular EMG, but I could not find IM EMG examples in the Hug paper (apologies if I missed them). Surface and IM data are very, very different, so one would imagine the considerations when working with them should also be different.
  
  In response to another comment from the reviewer, we have included more detailed information about the manual editing process (see above). As stated above, the decomposition approach used in our software differs from a spike sorting approach. Therefore, even though intramuscular and surface EMG signals are different, the decomposition and manual editing process is the same.
  
  All descriptions of math/algorithms are presented in text, without any actual math, variable definitions, etc. This presentation makes it difficult to understand what is done. I would strongly recommend writing out equations and defining variables where possible.
  
  More details on how the level of sparseness is controlled during optimization would be helpful.
  
  And how this sparseness penalty is weighed against other optimization costs.
  
  A mathematical description of the model has been added in the methods (P25; L620)
  
  ‘Mathematical modelling of the recorded spike trains.
  
  The spike train of a motor neuron recorded over time $t \in [0, T]$ can be described as the result of a convolution between a delta function ($\delta$) representing the firing times ($j$), and finite impulse responses ($h$) representing action potentials of duration $L$: $x(t) = \sum_{l=0}^{L-1} h(l) \sum_{j} \delta(t - l - j)$. In practice, the nature of $h$ and the duration $L$ depend on the type of recordings. For electrophysiological measurements, $h$ characterises the local electrical field generated by the spike and conducted through the surrounding tissues.
  
  As the recorded volume of tissue comprises many active neurons, each recording can be considered as a convolutive mixture of multiple sources, and the previous equation can be expressed in the form of a matrix to also consider all the electrodes of an array: $\mathbf{x}(t) = \sum_{l=0}^{L-1} \mathbf{H}(l)\,\mathbf{s}(t - l)$, where $\mathbf{x}(t)$ is a matrix of $m$ electrophysiological signals, $\mathbf{s}(t)$ is a matrix of $n$ motor neurons’ spike trains, and $\mathbf{H}(l)$ is an $m$ by $n$ matrix containing the $l$th sample of action potentials from $n$ neurons and $m$ signals. In this situation, we can reformulate the model as an instantaneous mixture of an extended set of sources, that is, the motor neurons’ spike trains and their delayed versions. This allows us to simply write the previous equation as a multiplication of matrices, in which each source is delayed $L$ times, $L$ being the duration of the impulse response $h$. This model can be inverted for neural decoding with source-separation approaches.’
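  The equivalence between the convolutive mixture and the instantaneous mixture of extended sources can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the mixing filters are random numbers rather than realistic action potentials, signals are taken as zero before the first sample, and L is assumed not to exceed the number of samples.

```python
import random


def convolutive_mixture(H, s):
    """x[i][t] = sum over l and j of H[l][i][j] * s[j][t - l], with s = 0 before t = 0."""
    L, m, n, T = len(H), len(H[0]), len(s), len(s[0])
    x = [[0.0] * T for _ in range(m)]
    for t in range(T):
        for l in range(min(L, t + 1)):
            for i in range(m):
                for j in range(n):
                    x[i][t] += H[l][i][j] * s[j][t - l]
    return x


def extend_sources(s, L):
    """Stack each source with its L - 1 delayed copies (n * L rows in total)."""
    T = len(s[0])
    return [[0] * l + row[: T - l] for row in s for l in range(L)]


def extended_mixing_matrix(H):
    """Arrange H(l) as one m by (n * L) matrix, columns ordered like extend_sources."""
    L, m, n = len(H), len(H[0]), len(H[0][0])
    return [[H[l][i][j] for j in range(n) for l in range(L)] for i in range(m)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


random.seed(0)
n, m, L, T = 2, 3, 4, 30  # sources, channels, filter length, samples (toy values)
s = [[1 if random.random() < 0.1 else 0 for _ in range(T)] for _ in range(n)]
H = [[[random.gauss(0, 1) for _ in range(n)] for _ in range(m)] for _ in range(L)]

x_conv = convolutive_mixture(H, s)
x_inst = matmul(extended_mixing_matrix(H), extend_sources(s, L))
max_error = max(abs(a - b) for ra, rb in zip(x_conv, x_inst) for a, b in zip(ra, rb))
print(max_error < 1e-9)
```

  Writing the model as a single matrix product is what allows it to be inverted with instantaneous source-separation methods.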
  
  The rest of the decomposition approach was rewritten to make it clearer for the reader:
  
  ‘The monopolar EMG signals collected during the baseline contractions were extended with an extension factor of 1000/m (21), where m is the number of channels free of any noise or artifact. The signals were then demeaned and whitened. A contrast function was iteratively applied to estimate a separation vector that maximised the level of sparseness of the motor unit pulse train (Figure 1B). This loop stopped when the variation of the separation vector between two successive iterations reached a predefined lower bound. After the application of a peak detection algorithm, the motor unit pulse train contained high peaks (i.e., the spikes from the identified motor unit) and low peaks from other motor units and noise. High peaks were separated from low peaks and noise using K-means classification with two classes (Figure 1B). The peaks from the class with the highest centroid were considered as spikes of the identified motor unit. A second algorithm refined the estimation of the discharge times by iteratively recalculating the separation vector and repeating the steps with peak detection and K-means classification until the coefficient of variation of the inter-spike intervals was minimised. The accuracy of each estimated spike train was assessed by computing the silhouette (SIL) value between the two classes of peaks identified with K-means classification (24). When the SIL exceeded a predetermined threshold, the motor unit filter was saved for the real-time decomposition, together with the centroids of the ‘spikes’ and ‘noise’ classes (Figure 2A).’
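  The peak classification and quality check described above can be sketched in a few lines. This is a simplified illustration, not the code of the software: the function names are hypothetical, the peak detector only looks for local maxima, and the SIL is computed with one common convention (between-class minus within-class sum of squared point-to-centroid distances for the spike class, normalised by the larger of the two), which is an assumption here.

```python
def detect_peaks(pulse_train):
    """Indices of local maxima of the pulse train."""
    return [
        i for i in range(1, len(pulse_train) - 1)
        if pulse_train[i - 1] < pulse_train[i] >= pulse_train[i + 1]
    ]


def two_class_kmeans(values, max_iterations=100):
    """One-dimensional k-means with two classes; labels are True for the high class."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("need at least two distinct peak heights")
    for _ in range(max_iterations):
        labels = [abs(v - high) < abs(v - low) for v in values]
        high_values = [v for v, is_high in zip(values, labels) if is_high]
        low_values = [v for v, is_high in zip(values, labels) if not is_high]
        new_high = sum(high_values) / len(high_values)
        new_low = sum(low_values) / len(low_values)
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels, low, high


def silhouette(values, labels, low, high):
    """Separation of the spike class from the noise centroid (assumed convention)."""
    within = sum((v - high) ** 2 for v, is_high in zip(values, labels) if is_high)
    between = sum((v - low) ** 2 for v, is_high in zip(values, labels) if is_high)
    largest = max(within, between)
    return (between - within) / largest if largest else 0.0


def classify_pulse_train(pulse_train, sil_threshold=0.9):
    """Return discharge indices, SIL, centroids, and whether the unit passes the threshold."""
    peaks = detect_peaks(pulse_train)
    heights = [pulse_train[i] for i in peaks]
    labels, low, high = two_class_kmeans(heights)
    spikes = [i for i, is_high in zip(peaks, labels) if is_high]
    sil = silhouette(heights, labels, low, high)
    return spikes, sil, (low, high), sil > sil_threshold


# Hypothetical pulse train: three high peaks (spikes) among low peaks (noise).
pulse = [0, 0.10, 0, 0.90, 0, 0.12, 0, 1.00, 0, 0.08, 0, 0.95, 0, 0.15, 0]
print(classify_pulse_train(pulse))
```

  Saving the two centroids alongside the separation vector means that, in real time, each new peak can be assigned to the nearest centroid without rerunning the classification.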
  
  Overall the paper is not very rigorous about the accuracy of motor unit identification. For example, the authors note that SIL of 0.9 is generally used for offline evaluation (why is this acceptable?), but it was lowered to 0.8 for particular muscles in this study. But overall, it is unclear how sorting accuracy/inaccuracy affects performance in the target applications of this work.
  
  In the section mentioned by the reviewer, we aimed to show how this metric can help to automatically select motor units that are likely to have a higher accuracy of spike detections as the peaks of their pulse train are easily separable from the noise.
  
  We reformulated the conclusion of this section to make it clearer (P8; L214):
  
  ‘These results show how manual editing can improve the accuracy of spike detection from the motor unit pulse trains. Moreover, a SIL value around 0.9 can be used as a threshold to automatically remove, a priori, the motor unit pulse trains of poor quality. Thus, these two steps were performed in all the subsequent analyses. Importantly, the motor unit pulse train must always be visually inspected after the session to check for errors in the automatic identification of discharge times.’
  
  C2. For real-time experiments, variability/jitter is important to characterize. Fig. 4 seems to be presenting mean computational times, etc, but no presentation of variability is shown. It would be helpful to depict data distributions somehow, rather than just mean values.
  
  The variability in computational time was added to this section (P.28; L.730):
  
  ‘The standard deviation of computational times across windows reached 5.4 ± 4.0 ms (raster plot), 4.0 ± 3.2 ms (smoothed firing rate), and 2.8 ± 2.5 ms (quadrant)’
  
  The computational time minimally varied between the successive windows, except when the labels of the x-axis were updated in real-time with scrolling feedback. It was overall always well below the duration of the window.
  
  Author response image 1.
  
  Computational time for each iteration of the algorithm in one participant. The top panels display the continuous computation time through the recording, while the bottom panels display the distribution of computational times. The dashed line represents the duration of a window of EMG signals.
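  The kind of timing summary shown in this image can be produced with a short script. The sketch below is a generic illustration: the function names, the placeholder processing step and the 125 ms window duration are assumptions and are not taken from the software.

```python
import statistics
import time


def time_windows(process, windows):
    """Measure the time (in ms) spent processing each window of signal."""
    durations_ms = []
    for window in windows:
        start = time.perf_counter()
        process(window)
        durations_ms.append(1000.0 * (time.perf_counter() - start))
    return durations_ms


def summarise_timing(durations_ms, window_ms):
    """Mean, standard deviation and worst case, plus windows exceeding the budget."""
    return {
        "mean_ms": statistics.mean(durations_ms),
        "sd_ms": statistics.stdev(durations_ms),
        "max_ms": max(durations_ms),
        "overruns": sum(d > window_ms for d in durations_ms),
    }


# Placeholder processing step standing in for filtering, decomposition and display.
def dummy_process(window):
    return sum(v * v for v in window)


windows = [[0.001 * k for k in range(2000)] for _ in range(50)]
durations = time_windows(dummy_process, windows)
print(summarise_timing(durations, window_ms=125.0))  # 125 ms is a hypothetical window
```

  Reporting the maximum and the number of overruns alongside the standard deviation makes occasional slow iterations visible, such as those caused by redrawing axis labels.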
  
  There is some description about the difference between units identified during baseline contractions, and how they might be misidentified during online contractions ("Accuracy of the real-time identification..."). This should be described in more detail.
  
  We added a section in the results to clarify the concept of motor unit filters, and the reapplication of motor unit filters on signals in real-time. We highlighted how each motor unit must have a unique spatio-temporal signature to be accurately identified by our algorithms, in contrast to merged motor units sharing the same spatio-temporal features. This section shows how motor units accurately identified during baseline contractions can be misidentified during online contractions (P12; L295).
  
  ‘Application of motor unit filters in experimental data
  
  We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).
  
  When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units.
  
  When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real-time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
  
  Fig. 6: Given that a key challenge in sorting should be that collisions occur during large contractions, much more primary data should be presented/visualized to show how the accuracy of sorting changes during larger contractions in online experiments.
  
  As indicated above, the decomposition approach implemented in our software is not based on spike sorting, so it does not require separating overlapping action potential profiles (see Methods).
  
  Fig.7: In presenting the accuracy of biofeedback, it is very hard to gain any intuition for performance by just looking at RMSE values. Showing the online decoded and edited trajectories would help readers understand the magnitude of errors.
  
  We updated the figure to display examples of visual feedback before and after manual editing.
  
  Reviewer #3 (Public Review):
  
  In this manuscript, Rossato and colleagues present a method for real-time decoding of EMG into putative single motor units. Their manuscript details a variety of decision points in their code and data collection pipeline that led to a final result of recording on the order of ~10 putative motor units per muscle in human males. Overall, the manuscript is highly restricted in its potential utility but may be of interest to aficionados. For those outside the field of human or nonhuman primate EMG, these methods will be of limited interest.
  
  We thank the reviewer for their thorough evaluation of our manuscript. We recognise that this tool/resource will immediately benefit groups working with humans or nonhuman primate models. However, the recent development of intramuscular thin films with various designs adapted to rodents and smaller animals could expand the range of future users (Chung et al., 2023, Elife). Moreover, decoding motor units in humans could be useful for many fields, e.g. in the domains of movement restoration and augmentation. The following paragraph has been added in the introduction section to highlight the importance of real-time decoding of motor unit activity (P3; L67):
  
  ‘The activity of motor neurons – in the form of spike trains – represents the neural code of movement to muscles. Decoding this firing activity in real-time during various behaviours can thus substantially enhance our understanding of movement control (2-5). Real-time decoding is also essential for interfacing with external devices (6) or virtual limbs (7) when activity is present at the periphery of the nervous system. For example, individuals with a spinal cord injury can control a virtual hand with the residual firing activity of the motor units in their forearm (7). Furthermore, sampling the activity of motor units receiving a substantial portion of independent synaptic inputs may pave the way for movement augmentation – specifically, extending a person’s movement repertoire through the increase of controllable degrees of freedom (8). In this way, Formento et al. (3) showed that individuals can intuitively learn to independently control motor units within the same muscle using visual cues. Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’
  
  Notes
  
  (1) Artificial data should be used with this method to provide ground truth performance evaluations. Without it, the study assumptions are unchallenged and could be seriously flawed.
  
  A new section on the validation of the algorithm has been added. We verified the accuracy of the algorithm by comparing the series of identified discharge times with the ground truth, i.e., the simulated discharge times. (P10; L235)
  
  ‘Validation of the algorithm
  
  We first validated the accuracy of the algorithm using synthetic EMG signals generated with an anatomical model entailing a cylindrical muscle volume with parallel fibres [see Farina et al. (29) and Konstantin et al. (36) for a full description of the model]. In this model, subcutaneous and skin layers separate the muscle from a grid of 65 surface electrodes (5 columns, 13 rows), while an intramuscular array of electrodes is directly inserted in the muscle under the grid with an angle of 30 degrees. 150 motor units were distributed within the cross section of the muscle. Recruitment thresholds, firing rate/excitatory drive relations, and twitch parameters were assigned to each motor unit using the same procedure as Fuglevand et al. (37). During each simulation, a proportional-integral-derivative controller adjusted the level of excitatory drive to minimise the error between a predefined target of force and the force generated by the active motor units.
  
  Figure 3A displays the raster plots of the active motor units during simulated trapezoidal isometric contractions with plateaus of force set at 10%, 20%, and 30% MVC. A sinusoidal isometric contraction ranging between 15 and 25% MVC at a frequency of 0.5 Hz was also simulated. We identified on average 10 ± 1 and 12 ± 2 motor units with surface and intramuscular arrays, respectively (Figure 3A). During the offline decomposition, the rate of agreement between the identified discharge times and the ground truth, that is, the simulated discharge times, reached 100.0 ± 0.0% for intramuscular EMG signals and 99.2 ± 1.8% for surface EMG signals (Figure 3B). The offline estimation of motor unit filters was therefore highly accurate, independently of the level of force or the pattern of the isometric contraction.
  
  Motor unit filters estimated during a baseline contraction at 20% MVC were then applied in real-time on signals simulated during a contraction with a different pattern (sinusoidal; Figure 3C). The rates of agreement between the online decomposition and the ground truth reached 96.3 ± 4.6% and 98.4 ± 2.3% for surface and intramuscular EMG signals, respectively. Finally, we tested whether the accuracy of the online decomposition changed when the level of force decreased or increased by 10% MVC when compared to the calibration performed at 20% MVC (Figure 3D). The rate of agreement remained high when applying the motor unit filters on signals recorded at 10% MVC: 99.8 ± 0.2% (surface EMG) and 99.5 ± 0.3% (intramuscular EMG). It is worth noting that only 3 out of 10 motor units identified from surface EMG at 20% MVC were active at 10% MVC, while 8 out of 12 motor units identified from intramuscular EMG were active at 10% MVC. This shows how the decomposition of EMG signals tends to identify the last recruited motor units, which often innervate a larger number of fibres than the early recruited motor units (38). On the contrary, the application of motor unit filters on signals simulated at 30% MVC led to a decrease in the rate of agreement, with values of 88.6 ± 14.0% (surface EMG) and 80.3 ± 19.2% (intramuscular EMG). This decrease in accuracy did not impact all the motor units, with 5 motor units keeping a rate of agreement above 95% in both signals. For the other motor units, we observed a decrease in precision, which estimates the ratio of true discharge times over the total number of identified discharge times. This was caused by the recruitment of two motor units sharing a similar space within the muscle, which were merged into the same pulse train (Figure 3D).’
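  
  The accuracy metrics reported above can be illustrated with a minimal sketch. It is not taken from the I-Spin live code base: the function names, the matching tolerance, and the definitions of sensitivity and rate of agreement are assumptions based on common usage in the EMG decomposition literature (only the definition of precision is stated in the response itself).

```python
def match_discharges(identified, reference, tolerance=1):
    """Pair identified and reference discharge times (sample indices).

    Hypothetical helper: a discharge counts as a true positive when it
    lies within `tolerance` samples of an unmatched reference discharge.
    Returns (true_positives, false_positives, false_negatives).
    """
    identified = sorted(identified)
    reference = sorted(reference)
    i = j = tp = 0
    while i < len(identified) and j < len(reference):
        diff = identified[i] - reference[j]
        if abs(diff) <= tolerance:
            tp += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            i += 1           # identified discharge without a reference partner
        else:
            j += 1           # reference discharge that was missed
    fp = len(identified) - tp
    fn = len(reference) - tp
    return tp, fp, fn


def agreement_metrics(identified, reference, tolerance=1):
    """Return the three metrics in percent (assumed definitions)."""
    tp, fp, fn = match_discharges(identified, reference, tolerance)
    return {
        # Assumed: TP / (TP + FP + FN)
        "rate_of_agreement": 100.0 * tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        # Assumed: TP / (TP + FN), i.e. share of true discharges that were found
        "sensitivity": 100.0 * tp / (tp + fn) if tp + fn else 0.0,
        # As stated in the text: true discharges over all identified discharges
        "precision": 100.0 * tp / (tp + fp) if tp + fp else 0.0,
    }
```

  With identified discharges at samples [100, 200, 305, 400, 450] and reference discharges at [100, 201, 300, 400], a tolerance of one sample gives 3 true positives, 2 false positives and 1 false negative, that is, a rate of agreement of 50%, a sensitivity of 75% and a precision of 60%. Merged motor units, as described above, would mainly add false positives and therefore lower the precision.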
  
  (2) From the point of view of a motor control neuroscientist studying movement in animals other than humans or non-human primates, the title was misleadingly hopeful. The use case presented in this study requires human participants to perform isometric contractions, facilitating spatially redundant recordings across the muscle for the algorithm to work. It is unclear whether these methods will be of utility to use cases under more physiological conditions (i.e., dynamic movement).
  
  We modified the title to read: “I-Spin live: An open-source software based on blind-source separation for real-time decoding of motor unit activity in humans”.
  
  (3) The text states that "EMG signals recorded with an array of electrodes can be considered an instantaneous mixture of the original motor unit spike trains and their delayed versions." While this may be a true statement, it is not a complete statement, since motor units at distal sites may be shared, not shared, or novel. It was not clear to me whether the diversity of these scenarios would affect the performance of the software or introduce artifacts. In other words, if at site 1 you can pick up the bulk signal of units 1,2,3,4; at site two you pick up the signals of units 2,3,4,5 and site three you pick up the signal of units 3,4,5,6, what does the algorithm assume is happening and what does it report and why?
  
  This section has been rewritten to clarify this point. The EMG signal indeed represents the sum of the action potentials of the active motor units within the recorded muscle volume. In other words, deep motor units, or motor units whose innervated fibres lie far from the grid, may fall outside this recorded muscle volume and thus be non-identifiable. Another necessary condition for a motor unit to be identifiable is that it has a unique spatio-temporal signature within the signal. This means that two motor units close to each other within the muscle volume will be merged by the model. This point was clarified in the results during the validation and the application of filters on experimental data.
  
  (P5; L115)
  
  ‘An EMG signal represents the sum of trains of action potentials from all the active motor units within the recorded muscle volume (Figure 1A). During stationary conditions, e.g., isometric contractions, the train of motor unit action potentials can be modelled as the convolution of a series of discrete delta functions, representing the discharge times, and motor unit action potentials that have a consistent shape across time. When EMG signals are recorded with an array of electrodes, the shape of the recorded potential of each motor unit differs across electrodes. This is due to 1) the varying conduction velocity of action potentials among the muscle fibres, and 2) the location/depth of the muscle fibres that belong to each motor unit relative to the electrodes, which impacts the low-pass filtering effect of the tissue on the recorded potential. Increasing the number and density of recording electrodes increases the likelihood that each motor unit will have a unique motor unit action potential profile (shape), i.e., a temporal and spatial profile that differs from all the other active motor units within the recorded volume (16, 29). The uniqueness of motor unit action potential profiles is necessary for the blind source separation to accurately estimate the motor unit discharge times. Conversely, the spike trains of two motor units with similar action potential profiles will be merged by the model.
  
  Our software uses a fast independent component analysis (fastICA) to retrieve motor unit spike trains from the EMG signals. For this, it iteratively optimises a separation vector (i.e., the motor unit filter) for each motor unit [Figure 1B; (24-26)]. The projection of the EMG signals on this separation vector generates a sparse motor unit pulse train, with most of its samples close to zero and a smaller number of samples significantly greater than zero (Figure 1B). The discharge times are estimated from this motor unit pulse train using a peak detection function and a k-means classification with two classes to separate the high peaks (spikes) from the low peaks (noise and other motor units). During the decomposition in real-time, short segments of EMG signals are projected on the saved separation vectors, and the peaks are classified as discharge times if they are closer to the centroid of the class ‘spikes’ than to the centroid of the class ‘noise’ (Figure 1C). The algorithm used to identify motor unit discharge activity is based on that proposed by Negro et al. (24) and Barsakcioglu et al. (26).’
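  
  The calibration and real-time steps described in the quoted passage can be sketched as follows. This is a simplified illustration, not the authors' implementation: all function names are hypothetical, the signal extension and whitening that normally precede the projection are omitted, and the peak detector is a bare local-maximum search.

```python
def project(signals, separation_vector):
    """Project multichannel signals (one list of samples per channel)
    onto a separation vector, yielding one pulse-train value per sample."""
    n_samples = len(signals[0])
    return [
        sum(w * channel[t] for w, channel in zip(separation_vector, signals))
        for t in range(n_samples)
    ]


def find_peaks(pulse_train):
    """Return the indices of local maxima (simplified peak detection)."""
    return [
        i for i in range(1, len(pulse_train) - 1)
        if pulse_train[i - 1] < pulse_train[i] >= pulse_train[i + 1]
    ]


def two_class_kmeans(values, iterations=50):
    """One-dimensional k-means with two classes.

    Returns (noise_centroid, spike_centroid), i.e. the centroids of the
    low and high peaks, initialised at the smallest and largest value.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:
            break  # all peaks fall in one class; nothing to separate
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high)
        if (new_low, new_high) == (low, high):
            break  # assignments are stable
        low, high = new_low, new_high
    return low, high


def classify_peaks(segment, noise_centroid, spike_centroid):
    """Real-time step: keep the peaks of a short pulse-train segment that
    are closer to the 'spikes' centroid than to the 'noise' centroid."""
    return [
        i for i in find_peaks(segment)
        if abs(segment[i] - spike_centroid) < abs(segment[i] - noise_centroid)
    ]
```

  In this sketch the centroids would be estimated once from the peak heights of the baseline pulse train and saved together with the separation vector, so that each incoming segment only requires a projection and a distance comparison.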
  
  (4) I could not fully appreciate the performance gap solved by the current methods. What was not achievable before that is now achievable? The 125 ms speed of deconvolution? What was achievable before? Intro text around ln 85 states that 'most of the current implementations of this approach rely on offline processing, which restricts its ability to be used..." but no reference is provided here about what the non 'most' of can achieve.
  
  (8) The authors might try to add text to be more circumspect about the contributions of this method. I would recommend emphasizing the conceptual advances over the specifics of the performance of the algorithm since processor speed and implementation of the ideas in a faster environment (Matlab can be slow) will change those outcomes in a trivial way. Yet, much of the results section is very focused on these metrics.
  
  The main contribution of this work, submitted to the ‘Tools and Resources’ section of eLife, is to provide a user interface that enables researchers to decompose EMG signals recorded with multichannel systems into motor unit activities, to perform this process in real-time, and to translate it into visual feedback. The user interface is fully open source and does not require coding experience. If necessary, the users can inspect the commented code and even modify it for their own experimental setup. The toolbox is now compatible with various acquisition boards, which can expand its use to novel surface and intramuscular arrays of electrodes.
  
  (5) Relatedly, it would have been nice to see a proof of concept using real-time feedback for some kind of biofeedback signal. If that is the objective here, why not show us this? I found the actual readout metrics of performance rather esoteric. They may be of interest to very close experts so I will defer to them for input.
  
  We agree with the reviewer. Videos were added to the supplemental materials to show the different forms of feedback, together with a case scenario where the participant tries to separate the activity of two motor units from the same muscle.
  
  (6) I was disappointed to see that only male participants are used because of some vague statement that 'it is widely known in the field' that more motor units can be resolved in males, without thorough referencing. It seems that the objective of the algorithm is the speed of analysis, not the number of units, which makes the elimination of female participants not justified.
  
  The reviewer is right and that was corrected in the new version of the manuscript. We first performed additional experiments in both males and females focused on the accuracy of the approach, and further discussed the differences in yield between men and women in the discussion together with research perspectives to solve this issue.
  
  Results (P12; L296):
  
  ‘We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).
  
  When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units. When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real-time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
  
  Discussion (P20; L480):
  
  “An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of motor units identified from surface EMG in females is about half that in males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent developments of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help in understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.”
  
  (7) Human curation is often used in spike sorting, but the description of criteria used in this step or how the human curation choices are documented is missing.
  
  To address the reviewer’s comment, we added a new paragraph in the Method section to describe the manual editing process: (P26; L657)
  
  “There is a consensus among experts that automatic decomposition should be followed by visual inspection and manual editing (55). Manual editing involves the following steps: i) removing spikes that result in erroneous firing rates (outliers), ii) adding discharge times that are clearly distinguishable from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the EMG signals (either a selected window or the entire signal), and v) repeating this procedure until no outliers are present and all clearly distinguishable spikes have been selected. Importantly, the manual editing of potentially missed or falsely identified discharge times should not be accepted before the application of the updated motor unit separation vector, thereby generating a new pulse train. Manual edits should be accepted only if the silhouette value improves following this operation or remains well above the preestablished threshold. A more extensive description of the manual editing of motor unit pulse trains can be found in (32). Even though some of the aforementioned steps involve subjective decision-making, evidence suggests that manual editing after EMG decomposition with blind source separation approaches remains highly reliable across operators (33). Specifically, the median rate of agreement calculated for 126 motor units across eight operators with varying experience in manual editing was 99.6%. All raw and processed data have been made available on a public data repository so that they can be used for training new operators (10.6084/m9.figshare.13695937).”
  
  Minor
  
  Ln 115, "inversing" is not a word. "inverse" is not a verb
  
  Changed as suggested
  
  Ln 186, typo, bioadhesive
  
  Changed as suggested
  
  MVC should be defined on first use. It is currently defined on 3rd use or so.
  
  The term rate is used in a variety of places without units. Eg line 465 but not limited to that
  
  Changed as suggested
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Two minor comments: Para 125: it is not clear what is meant by "spatial distribution" of recording electrodes.
  
  ‘Density’ was used instead of ‘spatial distribution’ to now read:
  
  ‘Increasing the number and density of recording electrodes increases the likelihood that each motor unit will have a unique motor unit action potential profile (shape), i.e., a temporal and spatial profile that differs from all the other active motor units within the recorded volume (16, 29).’
  
  Para 545: perhaps a bit more explanation about why low spatial overlap is better would be appropriate.
  
  We added a section in the results showing how motor units with similar spatial signatures are merged by our model, leading to a lower precision. We therefore changed this sentence to now read:
  
  ‘Therefore, the likelihood of having spatially overlapping motor unit action potentials - and thus merged motor units - is lower, which explains why the rate of agreement of motor units identified from intramuscular arrays of electrodes is much higher than that of motor units identified from grids of surface electrodes (12, 13).’
  
  Reviewer #2 (Recommendations For The Authors):
  
  The authors mention that data is included with the Github software package. I could not find any included data, or instructions on how to run the software offline on example data. (Apologies if I missed this - it would be helpful to make it more prominent)
  
  The link to the data on figshare was added to the GitHub repository, as well as data samples to run the algorithm offline and test manual editing.
  
  Minor comments:
  
  Not sure what is meant by "boundary capabilities of online decomposition"
  
  This was removed to only discuss the accuracy of online decomposition.
  
  CoV for ISIs is not formally defined or justified.
  
  This was added to the caption of figure 2:
  
  ‘The CoV of ISI estimates the regularity of spiking for each motor unit, an expected behaviour during isometric contractions at consistent levels of force.’
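  
  As a formal definition, the coefficient of variation (CoV) of the interspike intervals (ISI) is the standard deviation of the ISIs divided by their mean, usually expressed in percent. A minimal sketch is given below; the function name is hypothetical and the use of the sample standard deviation is an assumption, since the response does not specify the estimator.

```python
from statistics import mean, stdev


def cov_isi(discharge_times_s):
    """Coefficient of variation (%) of the interspike intervals.

    `discharge_times_s` holds the discharge times of one motor unit in
    seconds. Uses the sample standard deviation (an assumption).
    """
    times = sorted(discharge_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(isis) < 2:
        raise ValueError("at least three discharge times are needed")
    return 100.0 * stdev(isis) / mean(isis)
```

  A motor unit discharging every 100 ms yields a CoV close to 0%, whereas ISIs alternating between 100 and 200 ms yield a CoV of about 38.5%, so lower values indicate more regular spiking.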
  
  Fig. 4: slope units should be ms/motor unit, perhaps?
  
  Changed as suggested.
  
  In some places, the manuscript uses "edition" to describe the editing process. I am not familiar with this usage, "editing" may be more common.
  
  ‘Editing’ is now used throughout the manuscript.
  
  Reviewer #3 (Recommendations For The Authors):
  
  I would recommend that the authors revise their manuscript to conform to eLife formatting guidelines, including moving the methods to the end of the manuscript. This change may entail substantial editing since many ideas are presented in order from the beginning of the methods. While this suggestion may seem superficial, the success of the new publishing model might benefit from general uniformity in manuscript style.
  
  We changed and edited the draft to follow the classic format of eLife papers.
  
URL

biorxiv.org/content/10.1101/2023.04.14.536933v2
www.biorxiv.org

Emergence of Dip2-mediated Specific DAG-based PKC Signalling Axis in Eukaryotes

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase specific DAG species C36:0 and 36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding 36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions such as lro1, dga1, and pah1 deletion impacts DAG and stress signaling. Overall this is an interesting study that adds new data to how different DAG pools influence cellular signaling.
  
  Strengths:
  
  The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.
  
  We thank the reviewer for finding this study of interest and appreciating our multi-pronged approach to prove our hypothesis that a distinct pool of DAGs regulated by Dip2 activates PKC signalling.
  
  Weaknesses:
  
  One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction), or with Lro1 or Dga1 deletion, would broaden the scope of the study.
  
  We thank the reviewer for the suggestion to trace the localization of Dip2 in the absence of various DAG-acting enzymes. To address this, we generated Dip2-GFP knock-in (KI) in Δpah1, Δlro1 and Δdga1 strains, confirming successful integration by western blotting using an anti-GFP antibody. We then performed microscopy to examine the localization of Dip2. Since Dip2 is a mitochondria-vacuole contact site protein that predominantly localizes to mitochondria (approximately 60% puncta of Dip2 localize to mitochondria) (Mondal et al. 2022), we co-stained the cells with MitoTracker red to visualize mitochondria.
  
  Consistent with our previous findings, Dip2 colocalizes with MitoTracker red in WT (Figure 3-figure supplement 2A). As suggested by the reviewer, we deleted PAH1, which converts phosphatidic acid to DAGs and is also known to work at the nucleus-vacuole junction. On examining whether the absence of PAH1 influences the localization of Dip2, we found that there is no change in Dip2’s spatial organization. This could also be due to the lack of an observable change in the DAG species on deleting PAH1, as noted in our lipidomic studies (Figure 4-figure supplement 2A). These observations suggest that in a homeostatic condition, Pah1 does not affect the DAG pool acted upon by Dip2 and therefore has no influence on Dip2’s subcellular localization. This data has been incorporated in the revised manuscript (line no. 286-289) and Figure 4-figure supplement 2D-E.
  
  Similarly, we probed for the localization of Dip2 in LRO1 and DGA1 knock out strains. These enzymes are responsible for converting bulk DAGs to TAGs. We have previously shown that Dip2 is selective for only C36:0 and C36:1 and does not act on the bulk DAGs (Mondal et al. 2022). Both Lro1 and Dga1 are endoplasmic reticulum (ER) resident proteins and the bulk DAG accumulation in their knockouts is shown to be in the ER (Li et al. 2020), not influencing the mitochondrial DAG pool. On tracing Dip2’s localization in these knockouts, we found that Dip2 remains in the mitochondria (Figure 3-figure supplement 2; Figure 4-figure supplement 2D-E). These results suggest that Dip2 localization is not influenced by bulk DAG accumulation, reinforcing its specificity toward selective DAGs, which are likely to be present at mitochondria and mitochondria-vacuole contact sites. We have added this data in the revised manuscript (line no. 240-246) with Figure 3-figure supplement 2.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that the Psi1-Plc1-Dip2 axis functions to maintain a normal level of specific DAG species to modulate PKC signalling.
  
  Strengths:
  
  Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.
  
  We would like to thank the reviewer for the positive comments on our work and finding the study novel and interesting.
  
  Weaknesses:
  
  More evidence is needed to support the central hypothesis. The authors may consider the following:
  
  (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.
  
  We thank the reviewer for the insightful comments. We were unable to include C36:1 DAG in our in vitro DAG binding assays because it is not commercially available. We have now explicitly mentioned it in the revised manuscript (Line no. 186).
  
  We agree with the reviewer that the activation of PKC by C36:0 and C36:1 DAGs is a critical conclusion of our work. While we understand that there is no obvious structural explanation as to how the DAG-binding C1 domain of PKC attains the acyl chain specificity for DAGs, our conclusion that yeast Pkc1 is selective for C36:0 and C36:1 DAGs is supported by a combination of robust in vitro and in vivo data:
  
  (1) In Vitro Evidence: The liposome binding assays demonstrate that the Pkc1 C1 domain binds only to the selective DAG and does not interact with bulk DAGs.
  
  (2) In Vivo Evidence: Lipidomic analyses of wild-type cells subjected to cell wall stress reveal increased levels of C36:0 and C36:1 DAGs, while levels of bulk DAGs remain unaffected.
  
  These findings collectively indicate that Pkc1 neither binds nor is activated by bulk DAGs, reinforcing its specificity for C36:0 and C36:1 DAGs.
  
  Moreover, establishing the structural basis of this selectivity would require a DAG-bound structure of the Pkc1 C1 domain, which is difficult to obtain owing to the flexibility of the longer acyl chains present in C36:0 and C36:1 DAGs. In addition, capturing the full-length Pkc1 structure that might provide deeper insights has been challenging for several other groups. Also, we hypothesize that the DAG selectivity by Pkc1 is more of a membrane phenomenon wherein these DAGs might create a specific microdomain or form a particular curvature that is sensed by Pkc1. Investigating this would require extensive structural and biophysical studies that are beyond the scope of the current work but are planned for future research.
  
  (2) Does Dip2 colocalize with Plc1 or Pkc1?
  
  As shown in our previous study (Mondal et al. 2022) and in the above section (Figure 3-figure supplement 2A-B), Dip2 predominantly localizes to the mitochondria. Pkc1, on the other hand, is known to be found in the cytosol, plasma membrane and bud site (Andrews and Stark 2000). We also checked the localization of Pkc1 in cells co-stained with MitoTracker red and observed no significant overlap between the two, confirming that Pkc1 does not colocalize with Dip2 (Author response image 1).
  
  Author response image 1.
  
  Live cell microscopy for tracing Pkc1 localization. (A) Microscopy image panel showing the DIC image (left), fluorescence for Pkc1 tagged with GFP, MitoTracker red for staining mitochondria and the merged image for both the fluorophores (right). Scale bar represents 5 µm. (B) Line scan plotted for the fluorescence intensity of Pkc1-GFP along with MitoTracker red across the line shown in the merged panel.
  
  Moreover, as suggested by the reviewer, we also checked the localization of Plc1 and found that Plc1 is present in the cytosol and shows a partial colocalization with the mitochondria (Figure 4-figure supplement 3A-B). As some puncta of Dip2 also colocalize with the vacuoles, we checked whether Plc1 also follows such a localization pattern. We co-stained Plc1-GFP with FM4-64, a vacuolar membrane dye, and observed that Plc1 partially localizes to vacuoles as well (Figure 4-figure supplement 3C-D). This is consistent with a previous study where Plc1 was found in a subcellular fractionation of isolated yeast vacuoles and total cell lysate (Jun, Fratti, and Wickner 2004). We also checked whether Plc1, similar to Dip2, localizes to the mitochondria-vacuole contact site, using tri-colour imaging with FM4-64 for vacuole, DAPI for mitochondria and GFP-tagged Plc1. We were not able to trace Dip2 and Plc1 simultaneously as we could not generate a strain endogenously tagged with two different colours even after several attempts. However, from our observations, we can conclude that Plc1 partially localizes to mitochondria and vacuole and might be locally producing the selective DAGs to be acted upon by Dip2. We have incorporated this data in the revised manuscript (line no. 301-304) with Figure 4-figure supplement 3.
  
  For probing the localization of Dip2 upon Plc1 activation, we used cell wall stress, a condition that induces Plc1 activation for selective DAG production (this study). Under this condition, we probed the localization of Dip2 by fluorescence microscopy and found that Dip2 does not move to the plasma membrane but remains localized to mitochondria (Figure 1-figure supplement 3). This result has been added in the revised manuscript (line no. 153-160) with Figure 1-figure supplement 3.
  
  This raises intriguing questions regarding the spatial regulation of Pkc1 by Dip2. Since Dip2’s localization remains unaffected, whether the selective DAGs, presumably at the mitochondria, move to the plasma membrane for Pkc1 activation or Pkc1 translocates to the mitochondria needs further exploration. Addressing these possibilities will require a combination of genetic approaches, organellar lipidomics, and advanced microscopy, which we aim to explore in future studies.
  
  References:
  
  Andrews, P. D., and M. J. Stark. 2000. “Dynamic, Rho1p-Dependent Localization of Pkc1p to Sites of Polarized Growth.” Journal of Cell Science 113(Pt 15): 2685–93. doi:10.1242/jcs.113.15.2685.
  
  Jun, Youngsoo, Rutilio A. Fratti, and William Wickner. 2004. “Diacylglycerol and Its Formation by Phospholipase C Regulate Rab- and SNARE-Dependent Yeast Vacuole Fusion.” Journal of Biological Chemistry 279(51): 53186–95. doi:10.1074/jbc.M411363200.
  
  Li, Dan, Shu-Gao Yang, Cheng-Wen He, Zheng-Tan Zhang, Yongheng Liang, Hui Li, Jing Zhu, et al. 2020. “Excess Diacylglycerol at the Endoplasmic Reticulum Disrupts Endomembrane Homeostasis and Autophagy.” BMC Biology 18(1): 107. doi:10.1186/s12915-020-00837-w.
  
  Mondal, Sudipta, Priyadarshan Kinatukara, Shubham Singh, Sakshi Shambhavi, Gajanan S Patil, Noopur Dubey, Salam Herojeet Singh, et al. 2022. “DIP2 Is a Unique Regulator of Diacylglycerol Lipid Homeostasis in Eukaryotes.” eLife 11: e77665. doi:10.7554/eLife.77665.
  
URL

biorxiv.org/content/10.1101/2024.10.15.618531v2
www.biorxiv.org

New submission 06/12/2023, 09:50:33

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The biogenesis of outer membrane proteins (OMPs) into the outer membranes of Gram-negative bacteria is still not fully understood, particularly substrate recognition and insertion by the beta-assembly machinery (BAM). In this study, the authors present evidence that, in addition to recognition of the last strand of an OMP, sometimes referred to as the beta-signal, an additional signal upstream of the last strand is also important for OMP biogenesis.
  
  Strengths:
  
  Overall the manuscript is well organized and written, and addresses an important question in the field. The idea that BAM recognizes multiple signals on OMPs has been presented previously, however, it was not fully tested.
  
  The authors here re-address this idea and propose that it is a more general mechanism used by BAM for OMP biogenesis.
  
  The notion that additional signals assist in biogenesis is an important concept that indeed needs to be fully tested in OMP biogenesis.
  
  A significant study was performed with extensive experiments reported in an attempt to address this important question in the field.
  
  The identification of important crosslinks and regions of substrates and Bam proteins that interact during biogenesis is an important contribution that gives clues to the path substrates take en route to the membrane.
  
  Weaknesses:
  
  Major critiques (in no particular order):
  
  The title indicates 'simultaneous recognition', however no experiments were presented that test the order of interactions during OMP biogenesis.
  
  We have replaced the word “Simultaneous” with “Dual” so as not to reflect on the timing of the recognition events for the distinct C-terminal signal and -5 signal.
  
  Aspects of the study focus on the peptides that appear to inhibit OmpC assembly, but should also include an analysis of the peptides that do not, to determine whether the motif(s) are still present or not.
  
  We thank the reviewer for this comment. Our study focuses on the peptides which exhibited an inhibitory effect in order to elucidate further interactions between the BAM complex and substrate proteins, especially in the early stage of the assembly process. In the case of peptide 9, which contains all of our proposed elements but did not have an inhibitory effect, an arginine residue occupies the polar position next to the hydrophobic residue in position 0 (0 Φ). As seen in Fig S5, S6, and S7, there are no positively charged amino acids in the polar residue positions in the -5 or last strands. This might be the reason why peptide 9 did not display an inhibitory effect, and likewise peptide 24, the β-signal derived from the mitochondrial OMP Tom40, which contains a lysine at the polar position. Incorporating the reviewer's suggestions might elucidate conditions that should be excluded from the elements, but this is not the focus of this paper and was not discussed to avoid complicating the paper.
  
  The β-signal is known to form a β-strand, therefore it is unclear why the authors did not choose to chop OmpC up according to its strands, rather than by a fixed peptide size. What was the rationale for how the peptide lengths were chosen since many of them partially overlap known strands, and only partially (2 residues) overlap each other? It may not be too surprising that most of the inhibitory peptides consist of full strands (#4, 10, 21, 23).
  
  A simple scan of known β-strands would have been an alternative approach, however this comes with the bias of limiting the experiments to predicted substrate (strand) sequences, and it presupposes that the secondary structure element would be formed by this tightly truncated peptide.
  
  Instead, we allowed for the possibility that OMPs meet the BAM complex in an unfolded or partially folded state, and that the secondary structure (β-strand) might only form via β-augmentation after the substrate is placed in the context of the lateral gate. We therefore used peptides that mapped right across the entirety of OmpC, with a two amino acid overlap.
  
  To clarify this important point regarding the unbiased nature of our screen, we have revised the text:
  
  (Lines 147-151) "We used peptides that mapped the entirety of OmpC, with a two amino acid overlap. This we considered preferable to peptides that were restricted by structural features, such as β-strands, in consideration that β-strand formation may or may not have occurred in early-stage interactions at the BAM complex."
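  
  The tiling scheme described in this passage can be illustrated with a minimal sketch. It assumes fixed-length peptides; the peptide length used in the screen is not stated in this response, so `length` is a free parameter here, and the function name `tile_peptides` and the toy sequence are hypothetical placeholders rather than anything taken from the manuscript.

```python
def tile_peptides(sequence: str, length: int, overlap: int = 2) -> list[str]:
    """Split `sequence` into consecutive peptides of `length` residues,
    each sharing `overlap` residues with the peptide before it."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap  # how far each new peptide start advances
    peptides = []
    for start in range(0, len(sequence), step):
        peptides.append(sequence[start:start + length])
        if start + length >= len(sequence):
            break  # the end of the sequence is covered; stop tiling
    return peptides


# Toy placeholder sequence (NOT OmpC); length=10 is an arbitrary assumption.
toy = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
peptides = tile_peptides(toy, length=10, overlap=2)
for p in peptides:
    print(p)
```

  Because the windows are defined by position alone, this kind of tiling makes no assumption about where β-strands begin or end, which is the property the authors emphasise.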
  
  It would be good to have an idea of the propensity of the chosen peptides to form β-strands and participate in β-augmentation. We know from previous studies with darobactin and other peptides that they can inhibit OMP assembly by competing with substrates.
  
  We appreciate the reviewer's suggestion. However, we have not conducted biophysical characterizations of the peptides to calculate the propensity of each peptide to form β-strands and participate in β-augmentation. The sort of detailed biophysical analysis done for darobactin (by the Maier and Hiller groups, "The antibiotic darobactin mimics a β-strand to inhibit outer membrane insertase", Nature 593:125-129) was a Nature publication based on this single peptide. A further biophysical analysis of all of the peptides presented here goes well beyond the scope of our study.
  
  The recognition motifs that the authors present span up to 9 residues which would suggest a relatively large binding surface, however, the structures of these regions are not large enough to accommodate these large peptides.
  
  The β-signal motif (ζxGxx[Ω/Φ]x[Ω/Φ]) is an 8-residue consensus, some of the inhibitory peptides include additional residues before and after the defined motif of 8 residues, and the lateral gate of BamA has been shown to interact with a 7-residue span (e.g. Doyle et al., 2022). Cross-linking presented in our study showed BamD residues R49 and G65 cross-linked to the positions 0 and 6 of the internal signal in OmpC (Fig. 6D).
  
  We appreciate this point of clarification and have modified the text to acknowledge that in the final registering of the peptide with its binding protein, some parts of the peptide might sit beyond the bounds of the BamD receptor’s binding pocket and the BamA lateral gate:
  
  (Lines 458-471) "The β-signal motif (ζxGxx[Ω/Φ]x[Ω/Φ]) is an eight-residue consensus, and the internal signal motif is composed of a nine-residue consensus. Recent structures have shown the lateral gate of BamA interacts with a 7-residue span of substrate OMPs. Interestingly, inhibitory compounds, such as darobactin, mimic only three residues of the C-terminal side of the β-signal motif. Cross-linking presented here in our study showed that BamD residues R49 and G65 cross-linked to the positions 0 and 6 of the internal signal in OmpC (Fig. 6D). Both signals are larger than the assembly machinery's signal binding pocket, implying that the signal might sit beyond the bounds of the signal binding pocket in BamD and the lateral gate in BamA. These findings are consistent with similar observations in other signal sequence recognition events, such as the mitochondrial targeting presequence signal that is longer than the receptor groove formed by Tom20, a subunit of the translocator of outer membrane (TOM) complex (Yamamoto et al., 2011). The presequence has been shown to bind to Tom20 in several different conformations within the receptor groove (Nyirenda et al., 2013)."
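  
  As an illustration of how an eight-position consensus such as ζxGxx[Ω/Φ]x[Ω/Φ] can be scanned for in a sequence, the sketch below translates it into a regular expression. The residue classes are assumptions made for this example only (ζ as polar, Ω as aromatic, Φ as hydrophobic, with the one-letter sets listed in the code); the manuscript's exact class definitions are not given in this response, and the toy sequence is invented.

```python
import re

# ASSUMED residue classes, for illustration only.
POLAR = "DEHKNQRST"      # zeta: polar residues
AROMATIC = "FWY"         # omega: aromatic residues
HYDROPHOBIC = "AILMV"    # phi: hydrophobic residues
OMEGA_OR_PHI = AROMATIC + HYDROPHOBIC

# zeta x G x x [omega/phi] x [omega/phi]; the lookahead lets overlapping
# occurrences be reported instead of only non-overlapping ones.
BETA_SIGNAL = re.compile(
    f"(?=([{POLAR}].G..[{OMEGA_OR_PHI}].[{OMEGA_OR_PHI}]))"
)


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched 8-mer) for every motif hit."""
    return [(m.start(), m.group(1)) for m in BETA_SIGNAL.finditer(sequence)]


toy = "AAQTGAAFAVAA"  # invented sequence, not from any OMP
hits = find_motif(toy)
print(hits)
```

  A pattern match of this kind only tests sequence conformity; it says nothing about whether a matching peptide actually binds BamD or BamA, which is what the inhibition and cross-linking assays address.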
  
  Moreover, the distance between the amino acids of BamD which cross-linked to the internal signal, R49 and Y62, is approximately 25 Å (PDB ID used: 7TT3). The maximum length of the internal signal of OmpC, from F280 to Y288, is approximately 22 Å (PDB ID used: 2J1N). This would allow the signal to fit within the confines of the TPR motif of BamD.
  
  Author response image 1.
  
  The authors highlight that the sequence motifs are common among the inhibiting peptides, but do not test if this is a necessary motif to mediate the interactions. It would have been good to see if a library of non-OMP related peptides that match this motif could also inhibit or not.
  
  With respect, this additional work would not address any biological question relevant to the function of BamD. To randomize sequences and then classify those that do or don’t fit the motif would help in refining the parameters of the β-signal motif, but that was not our intent.
  
  We have identified the peptides from within the total sequence of an OMP, shown which peptides inhibit in an assembly assay, and then observed that the inhibitory peptides conform to a previously published (β-signal) motif.
  
  In the studies that disrupt the motifs by mutagenesis, an effect was observed and attributed to disruption of the interaction of the 'internal signal'. However, the literature is filled with point mutations in OMPs that disrupt biogenesis, particularly those within the membrane region. F280, Y286, V359, and Y365 are all residues that are in the membrane region that point into the membrane. Therefore, more work is needed to confirm that these mutations are in parts of a recognition motif rather than on the residues that are disrupting stability/assembly into the membrane.
  
  As the reviewer pointed out, the side chains of the amino acids constituting the signal elements we determined all face the lipid side; of these, Y286 and Y365 were important for folding as well as for recognition. However, F280A and V359A had no effect on folding, but only on assembly through the BAM complex. The fact that position 0 functions as a signal has been demonstrated by peptidomimetics (Fig. 1) and point mutant analysis (Fig. 2). We appreciate this clarification and have modified the text to acknowledge that all of the signal elements face the lipid side, which ultimately contributes to their stability in the membrane, and that before this the BAM complex actively recognizes them and determines their orientation:
  
  (Lines 519-526) "After OMP assembly, all elements of the internal signal are positioned such that they face into the lipid-phase of the membrane. This observation may be a coincidence, or may be utilized by the BAM complex to register and orientate the lipid facing amino acids in the assembling OMP away from the formative lumen of the OMP. Amino acids at position 6, such as Y286 in OmpC, are not only a component of the internal signal for binding by the BAM complex, but also act in a structural capacity to register the aromatic girdle for optimal stability of the OMP in the membrane."
  
  The title of Figure 3 indicates that disrupting the internal signal motif disrupts OMP assembly, however, the point mutations did not seem to have any effect. Only when both 280 and 286 were mutated was an effect observed. And even then, the trimer appeared to form just fine, albeit at reduced levels, indicating assembly is just fine, rather the rate of biogenesis is being affected.
  
  We appreciate this point and have revised the title of Figure 3 to be:
  
  (Lines 1070-1071) "Modifications in the putative internal signal slow the rate of OMP assembly in vivo."
  
  In Figure 4, the authors attempt to quantify their blots. However, this seems to be a difficult task given the lack of quality of the blots and the spread of the intended signals, particularly of the 'int' bands. However, the more disturbing trend is the obvious reduction in signal from the post-urea treatment, even for the WT samples. The authors are using urea washes to indicate removal of only stalled substrates. However a reduction of signal is also observed for the WT. The authors should quantify this blot as well, but it is clear visually that both WT and the mutant have obvious reductions in the observable signals. Further, this data seems to conflict with Fig 3D where no noticeable difference in OmpC assembly was observed between WT and Y286A, why is this the case?
  
  We have addressed this point by adding a statistical analysis on Fig. 4A. As the reviewer points out, BN-PAGE band quantification is a difficult task given the broad spread of the bands on these gels. Statistical analysis showed that the increase in intermediates (int) was statistically significant for Y286A at all times until 80 min, when the intermediate form signals decrease.
  
  (Lines 1093-1096) "Statistical significance was indicated by the following: N.S. (not significant); *, p<0.05; **, p<0.005. Exact p values of intermediate formed by Wt vs Y286A at each timepoint were as follows; 20 minutes: p = 0.03077, 40 minutes: p = 0.02402, 60 minutes: p = 0.00181, 80 minutes: p = 0.0545."
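  
  The thresholds quoted in this legend can be applied to the reported p values with a short sketch. The cut-offs (p<0.05 and p<0.005) and the four p values come from the legend text above; the pairing of one asterisk with the first cut-off and two asterisks with the second is an assumption, since the symbols did not survive in the quoted legend, and the function name is hypothetical.

```python
def significance_label(p: float) -> str:
    """Map a p value to a legend symbol (symbol assignment is ASSUMED)."""
    if p < 0.005:
        return "**"
    if p < 0.05:
        return "*"
    return "N.S."


# Exact p values quoted for intermediate formation, WT vs Y286A, by minute.
p_values = {20: 0.03077, 40: 0.02402, 60: 0.00181, 80: 0.0545}
labels = {minutes: significance_label(p) for minutes, p in p_values.items()}
for minutes, label in labels.items():
    print(f"{minutes} min: p = {p_values[minutes]} -> {label}")
```

  Under these thresholds the 20-, 40- and 60-minute comparisons are significant and the 80-minute comparison is not, which matches the statement that the increase in intermediates was significant until 80 min.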
  
  Further, regarding the Int. band, we have corrected the statement as follows.
  
  (Lines 253-254) "Consistent with this, the assembly intermediate which was prominently observed for OmpC(Y286A) can be extracted from the membranes with urea;"
  
  OMP assembly in vivo has additional periplasmic chaperones and factors present in order to support the assembly process. Therefore, it is likely that some proteins were assembled properly in vivo compared to their in vitro counterparts. Such a decrease has been observed not only in E. coli but also in mitochondrial OMP import (Yamano et al., 2010).
  
  The pull-down assays with BamA and BamD should include a no protein control at the least to confirm there is no non-specific binding to the resin. Also, no detergent was mentioned as part of the pull downs that contained BamA or OmpC, nor was it detailed if OmpC was urea solubilized.
  
  We have performed pull-down experiments with a no-protein (Ni-NTA only) control as noted (Author response image 2). The results showed that the amount of OmpC carrying through on beads only was significantly lower than the amount of OmpC bound in the presence of BamD or BamA. The added OmpC was not treated with urea, but was synthesized by in vitro translation; the in vitro translated OmpC is the standard substrate in the EMM assembly assay (Supp Fig. S1) where it is recognized by the BAM complex. Thus, we used it for pull-down as well and, to make this clearer, we have revised as follows:
  
  Author response image 2.
  
  Pull-down assay of radio-labelled OmpC with the indicated protein or Ni-NTA alone (Ni-NTA). T, total; FT, flow-through; W, wash; E, elute.
  
  (Lines 252-265) "Three subunits of the BAM complex have been previously shown to interact with the substrates: BamA, BamB, and BamD (Hagan et al., 2013; Harrison, 1996; Ieva et al., 2011). In vitro pull-down assay showed that while BamA and BamD can independently bind to the in vitro translated OmpC polypeptide (Fig. S9A), BamB did not (Fig. S9B)."
  
  11.
  
  • The neutron reflectometry experiments are not convincing, primarily due to the lack of controls to confirm that a consistent uniform bilayer is being formed and, even if so, uniform orientations of the BamA molecules across the surface.
  
  • Further, no controls were performed with BamD alone, or with OmpC alone, and it is hard to understand how the method can discriminate between an actual BamA/BamD complex versus BamA and BamD individually being located at the membrane surface without forming an actual complex.
  
  • Previous studies have reported difficulty in preparing a complex with BamA and BamD from purified components.
  
  • Additionally, little signal differences were observed for the addition of OmpC. However, an elongated unfolded polypeptide that is nearly 400 residues long would be expected to produce a large distinct signal given that only the C-terminal portion is supposedly anchored to BAM, while the rest would be extended out above the surface.
  
  • The depiction in Figure 5D is quite misleading when viewing the full structures on the same scales with one another.
  
  We have addressed these five points individually as follows.
  
  i. The uniform orientation of BamA on the surface is guaranteed by the fixation through a His-tag engineered into extracellular loop 6 of BamA and has been validated in previous studies as cited in the text. Moreover, to test this, we constructed an alternative theoretical model in which BamA is not well oriented in the system, shown below. The fitted curves (solid lines) for this model did not align well with the experimental data. We therefore concluded that BamA is well oriented in the membrane bilayer.
  
  Author response image 3.
  
  Experimental (symbols) and fitted (curves) NR profiles of BamA not oriented well in the POPC bilayer in D2O (black), GMW (blue) and H2O (red) buffer.
  
  ii. There would be no means by which to do a control with OmpC alone or BamD alone, as neither protein binds to the lipid layer chip. OmpC is diluted from urea and then the unbound OmpC is washed from the chip before NR measurements. BamD does not have an acyl group to anchor it to the lipid layer; without BamA to anchor to, it too is washed from the chip before NR measurements. We constructed an alternative theoretical model in which both BamA and BamD are embedded in the membrane bilayer, and the fits are shown below. These fits did not align well with the experimental data, which argues against BamA and BamD being located individually at the membrane surface without forming an actual complex.
  
  Author response image 4.
  
  Experimental (symbols) and fitted (curves) NR profiles of BamA+D embedding together in the POPC bilayer in D2O (black), GMW (blue) and H2O (red) buffer.
  
  iii. The previous studies that reported difficulty in preparing a complex with BamA and BamD from purified components were assays done in aqueous solution including detergent solubilized BamA, or with BamA POTRA domains only. Our assay is superior in that it reports the binding of BamD to a purified BamA that has been reconstituted in a lipid bilayer.
  
  iv. The relatively small signal differences observed for the addition of OmpC are expected, since OmpC is an elongated, unfolded polypeptide nearly 400 residues long which, in the context of this assay, can occupy a huge variety of positions, with only the C-terminal portion anchored to BAM and the rest moving randomly about and extended from the surface.
  
  v. We appreciate the point raised and have now added a note in the Figure legend that these are depictions of the results and not a scale drawing of the structures.
  
  In the crosslinking studies, the authors show 17 crosslinking sites (43% of all tested) on BamD crosslinked with OmpC. Given that the authors are presenting specific interactions between the two proteins, this is worrisome as the crosslinks were found across the entire surface of BamD. How do the authors explain this? Are all these specific or non-specific?
  
  The crosslinking experiment using purified BamD was an effective assay for comprehensive analysis of the interaction sites between BamD and the substrate. However, as the reviewer pointed out, cross-linking was observed even at the sites that, in the context of the BAM complex, interact with BamC as a protein-protein interaction and would not be available for substrate protein-protein interactions. To complement this analysis and to address this issue, we also performed the experiment in Fig. 6C.
  
  In Fig. 6C, the interaction of BamD with the substrate is examined in vivo, and the results demonstrate that if BPA is introduced into the site we designated as the substrate recognition site, it is cross-linked to the substrate. On the other hand, position 114 was found to cross-link with the substrate in vitro, but not in vivo. It should be noted that position 114 has also been confirmed to form cross-link products with BamC; we therefore believe that the in vivo experiment investigates BamD-substrate interactions in the native state. To explain the above, we have added the following description to the Results section.
  
  (Lines 319-321) "Structurally, these amino acids locate to both the lumen side of the funnel-like structure (e.g. 49 or 62) and the outside of the funnel-like structure, such as the BamC binding site (e.g. 114) (fig. S12C)."
  
  (Lines 350-357) "Positions 49, 53, 65, and 196 of BamD face the interior of the funnel-like structure of the periplasmic domain of the BAM complex, while position 114 is located outside of the funnel-like structure (Bakelar et al., 2016; Gu et al., 2016; Iadanza et al., 2016). We note that while position 114 was cross-linked with OmpC in vitro using purified BamD, this was not seen with in vivo cross-linking. Instead, in the context of the BAM complex, position 114 of BamD binds to the BamC subunit and would not be available for substrate binding in vivo (Bakelar et al., 2016; Gu et al., 2016; Iadanza et al., 2016)."
  
  The study in Figure 6 focuses on defined regions within the OmpC sequence, but a more broad range is necessary to demonstrate specificity to these regions vs binding to other regions of the sequence as well. If the authors wish to demonstrate a specific interaction to this motif, they need to show no binding to other regions.
  
  The region of affinity for the BAM complex was determined by peptidomimetic analysis, and the signal region was further identified by mutational analysis of OmpC. Subsequently, the subunit that recognizes the signal region was identified as BamD. In other words, in the process leading up to Fig. 6, we were able to analyze in detail that other regions were not the target of the study. We have revised the text to make clear that we focus on the signal region including the internal signal, and have not also analyzed other parts of the signal region:
  
  (Lines 329-332) "As our peptidomimetic screen identified conserved features in the internal signal, and cross-linking highlighted the N-terminal and C-terminal TPR motifs of BamD as regions of interaction with OmpC, we focused on amino acids specifically within the β-signals of OmpC and regions of BamD which interact with β-signal."
  
  The levels of the crosslinks are barely detectable via western blot analysis. If the interactions between the two surfaces are required, why are the levels for most of the blots so low?
  
  These are western blots of cross-linked products – the efficiency of cross-linking is far less than 100% of the interacting protein species present in a binding assay and this explains why the levels for the blots are ‘so low’. We have added a sentence to the revised manuscript to make this clear for readers who are not molecular biologists:
  
  (Lines 345-348) "These western blots reveal cross-linked products representing the interacting protein species. Photo cross-linking of unnatural amino acid is not a 100% efficient process, so the level of cross-linked products is only a small proportion of the molecules interacting in the assays."
  
  15.
  
  • Figure 7 indicates that two regions of BamD promote OMP orientation and assembly, however, none of the experiments appears to measure OMP orientation?
  
  • Also, one common observation from panel F was that not only was the trimer reduced, but also the monomer. But even then, still a percentage of the trimer is formed, not a complete loss.
  
  (i) We appreciate this point and have revised the title of Figure 7 to be:
  
  (Lines 1137-1138) "Key residues in two structurally distinct regions of BamD promote β-strand formation and OMP assembly."
  
  (ii) In our description of Fig. 7F (Lines 356-360) we do not distinguish between the amount of monomer and trimer forms, since both are reflective of the overall assembly rate i.e. assembly efficiency. Rather, we state that:
  
  "The EMM assembly assay showed that the internal signal binding site was as important as the β-signal binding site to the overall assembly rates observed for OmpC (Fig. 7F), OmpF (fig. S15D), and LamB (fig. S15E). These results suggest that recognition of both the C-terminal β-signal and the internal signal by BamD is important for efficient protein assembly."
  
  16.
  
  • The experiment in Fig 7B would be more conclusive if it was repeated with both the Y62A and R197A mutants and a double mutant. These controls would also help resolve any effect from crowding that may also promote the crosslinks.
  
  • Further, the mutation of R197 is an odd choice given that this residue has been studied previously and was found to mediate a salt bridge with BamA. How was this resolved by the authors in choosing this site since it was not one of the original crosslinking sites?
  
  As stated in the text, the purpose of the experiment in Figure 7B is to measure the impact of pre-forming a β-strand in the substrate (OmpC) before providing it to the receptor (BamD). We thank the reviewer for the comment on the R197 position of BamD. The C-terminal domain of BamD has been suggested to mediate the BamA-BamD interface, specifically BamD R197 amino acid creates a salt-bridge with BamA E373 (Ricci et al., 2012). It had been postulated that the formation of this salt-bridge is not strictly structural, with R197 highlighted as a key amino acid in BamD activity and this salt-bridge acts as a “check-point” in BAM complex activity (Ricci et al., 2012, Storek et al., 2023). Our results agree with this, showing that the C-terminus of BamD acts in substrate recognition and alignment of the β-signal (Fig. 6, Fig S12). We show that amino acids in the vicinity of R197 (N196, G200, D204) cross-linked well to substrate and mutations to the β-signal prevent this interaction (Fig S12B, D). For mutational analysis of BamD, we looked then at the conservation of the C-terminus of BamD and determined R197 was the most highly conserved amino acid (Fig 6C). In order to account for this, we have adjusted the manuscript:
  
  (Lines 376-377) "R197 has previously been isolated as a suppressor mutation of a BamA temperature sensitive strain (Ricci et al., 2012)."
  
  (Lines 495-496) "This adds an additional role of the C-terminus of BamD beyond a complex stability role (Ricci et al., 2012; Storek et al., 2023)."
  
  As demonstrated by the authors in Fig 8, the mutations in BamD lead to reduction in OMP levels for more than just OmpC and issues with the membrane are clearly observable with Y62A, although not with R197A in the presence of VCN. The authors should also test with rifampicin which is smaller and would monitor even more subtle issues with the membrane. Oddly, no growth was observed for the Vec control in the lower concentration of VCN, but was near WT levels for 3 times VCN, how is this explained?
  
  While it would be interesting to correlate the extent of differences to the molecular size of different antibiotics such as rifampicin, such correlations are not the intended aim of our study. Vancomycin (VCN) is a standard measure of outer membrane integrity in our field, hence its use in our tests for membrane integrity.
  
  We apologize to the reviewer as Figure 8D-G may have been misleading. Figure 8D, E use bamD shut-down cells expressing plasmid-borne BamD mutants, whereas Figure 8F, G use the same strain as in Figure 3. We have adjusted the figure as well as the figure legend:
  
  (Lines 1165-1169) "D, E, E. coli bamD depletion cells expressing mutations at residues Y62A and R197A, in the β-signal recognition regions of BamD, were grown in the presence of VCN. F, G, E. coli cells expressing mutations to the OmpC internal signal, as shown in Fig 3, grown in the presence of VCN. Mutations to two key residues of the internal signal were sensitive to the presence of VCN."
  
  While Fig 8I indeed shows diminished levels for FY as stated, little difference was observed for the trimer for the other mutants compared to WT, although differences were observed for the dimer. Interestingly, the VY mutant has nearly WT levels of dimer. What do the authors postulate is going on here with the dimer to trimer transition? How do the levels of monomer compare, which is not shown?
  
  The BN-PAGE gel system cannot resolve protein species that migrate below ~50 kDa, and the monomer species of the OMPs is below this size. We can’t comment on effects on the monomer because it is not visualized. The non-cropped gel image is shown here. Recently, Hussain et al. showed that, in an in vitro proteoliposome system, OmpC assembly progresses through a “short-lived dimeric” form before the final process of trimerization (Hussain et al., 2021). However, their findings suggest that LPS plays the final role in stimulating the dimer-to-trimer transition, a step well past the recognition step of the β-signals. Mutations to the internal signal of OmpC result in the formation of an intermediate, the substrate stalled on the BAM complex. This stalling presumably hinders the BAM complex, resulting in reduced trimer and loss of the dimeric OmpF signal in the EMM of cells expressing the OmpC double mutant, FY. We have noted this in the revised text:
  
  Author response image 5.
  
  Non-cropped gel of Fig. 8I. The asterisk indicates a band observed in the sample loading wells at the top of the gel.
  
  (Lines 417-418) "The dimeric form of endogenous OmpF was prominently observed in both the OmpC(WT) as well as the OmpC(VY) double mutant cells."
  
  In the discussion, the authors indicate they have '...defined an internal signal for OMP assembly', however, their study is limited and only investigates a specific region of OmpC. More is needed to definitively say this for even OmpC, and even more so to indicate this is a general feature for all OMPs.
  
  We acknowledge the reviewer's comment on this point and have expanded the statement to make sure that the conclusion is justified with the specific evidence that is shown in the paper and the supplementary data. We now state:
  
  (Lines 444-447) "This internal signal corresponds to the -5 strand in OmpC and is recognized by BamD. Sequence analysis shows that similar sequence signatures are present in other OMPs (Figs. S5, S6 and S7). These sequences were investigated in two further OMPs: OmpF and LamB (Fig. 2C and D)."
  
  Note, we did not state that this is a general feature for all OMPs. That would not be a reasonable proposition.
  
  20.
  
  • In the proposed model in Fig 9, it is hard to conceive how 5 strands will form along BamD given the limited surface area and tight space beneath BAM.
  
  • More concerning is that the two proposed interaction sites on BamD, Y62 and R197, are on opposite sides of the BamD structure, not along the same interface, which makes this model even more unlikely.
  
  • As evidence against this model, in Figure 9E, the two indicated sites of BamD are not even in close proximity to the modeled substrate strands.
  
  We can address the reviewer’s three concerns here:
  
  i. The first point is that the region (formed by BamD engaged with POTRA domains 1-2 and 5 of BamA) is not sufficient to accommodate five β-strands. Structural analysis reveals that the interface between the N-terminal side of BamD and POTRA1-2 changes conformation substantially upon substrate binding, and that this surface is greatly extended. This surface does have enough space to accommodate five β-strands, as now documented in Fig. 9D, 9E using the latest structures (7TT5 and 7TT2) as illustrations of this. The text now reads:
  
  (Lines 506-515) "Spatially, this indicates the BamD can serve to organize two distinct parts of the nascent OMP substrate at the periplasmic face of the BAM complex, either prior to or in concert with, engagement to the lateral gate of BamA. Assessing this structurally showed the N-terminal region of BamD (interacting with the POTRA1-2 region of BamA) and the C-terminal region of BamD (interacting with POTRA5 proximal to the lateral gate of BamA) (Bakelar et al., 2016; Gu et al., 2016; Tomasek et al., 2020) has the N-terminal region of BamD changing conformation depending on the folding states of the last four β-strands of the substrate OMP, EspP (Doyle et al., 2022). The overall effect of this is a change in the dimensions of this cavity, a change which is dependent on the folded state of the substrate engaged in it (Fig 9 B-E)."
  
  ii. The second point concerns the orientation of the substrate recognition residues of BamD. Both Y62 and R197 are located on the lumen side of the funnel in the EspP-BAM transport intermediate structure (PDB ID: 7TTC). Y62 sits close to the edge of BamD but, given that POTRA1-2 undergoes a conformational change that opens this region, as described above, both residues are positioned where they could bind substrates. This is explained in the following text in the Results section of the revised manuscript.
  
  (Lines 377-379) "Each residue was located on the lumen side of the funnel-like structure in the EspP-BAM assembly intermediate structure (PDBID; 7TTC) (Doyle et al., 2022)."
  
  Reviewer #2 (Public Review):
  
  Previously, in a bioinformatics study, the authors identified potential sequence motifs that are common to a large subset of beta-barrel outer membrane proteins in Gram-negative bacteria. Interestingly, in that study, some of those motifs are located in the internal strands of barrels (not near the termini), in addition to the well-known "beta-signal" motif in the C-terminal region.
  
  Here, the authors carried out rigorous biochemical, biophysical, and genetic studies to prove that the newly identified internal motifs are critical to the assembly of outer membrane proteins and the interaction with the BAM complex. The authors' approaches are rigorous and comprehensive, and the results support the conclusions reasonably well. While overall enthusiastic, I have some scientific concerns with the rationale of the neutron reflectometry study, and the distinction between "the intrinsic impairment of the barrel" vs "the impairment of interaction with BAM" that the internal signal may play a role in. I hope that the authors will be able to address this.
  
  Strengths:
  
  It is impressive that the authors took multi-faceted approaches using the assays on reconstituted, cell-based, and population-level (growth) systems.
  
  Assessing the role of the internal motifs in the assembly of model OMPs in the absence and presence of BAM machinery was a nice approach for a precise definition of the role.
  
  Weaknesses:
  
  The result section employing neutron reflectometry (NR) needs to be clarified and strengthened in the main text (from line 226). In its current form, the NR result does not seem convincing.
  
  What is the rationale of the approach using NR?
  
  We have now modified the text to make clear that:
  
  (Lines 276-280) "The rationale to these experiments is that NR provides: (i) information on the distance of specified subunits of a protein complex away from the atomically flat gold surface to which the complex is attached, and (ii) allows the addition of samples between measurements, so that multi-step changes can be made to, for example, detect changes in domain conformation in response to the addition of a substrate."
  
  What is the molecular event (readout) that the method detects?
  
  We have now modified the text to make clear that:
  
  (Lines 270-274) "While the biochemical assay demonstrated that the OmpC(Y286A) mutant forms a stalled intermediate with the BAM complex, in a state in which membrane insertion was not completed, biochemical assays such as this cannot elucidate where on BamA-BamD this OmpC(Y286A) substrate is stalled."
  
  What are "R"-y axis and "Q"-x axis and their physical meanings (Fig. 5b)?
  
  Neutron reflectivity, R, is the ratio of the reflected to the incident neutron beam intensity, and it is measured as a function of the momentum transfer Q, defined as Q = 4π sinθ/λ, where θ is the angle of incidence and λ is the neutron wavelength. R(Q) is approximately given by R(Q) ≈ (16π²/Q²)|ρ(Q)|², where ρ(Q) is the one-dimensional Fourier transform of ρ(z), the scattering length density (SLD) distribution normal to the surface. SLD is the sum of the coherent neutron scattering lengths of all atoms in the sample layer divided by the volume of the layer. Therefore, the intensity of the reflected beam depends strongly on the thickness, density and interface roughness of the sample. This is explained in the following text in the Methods section of the revised manuscript.
  
  (Lines 669-678) "Neutron reflectivity, denoted as R, is the ratio of the reflected to the incoming neutron beam intensity. It is measured as a function of the momentum transfer Q, which is defined by the formula Q = 4π sinθ/λ, where θ represents the angle of incidence and λ stands for the neutron wavelength. The approximate value of R(Q) can be expressed as R(Q) ≈ (16π²/Q²)|ρ(Q)|², where ρ(Q) is the one-dimensional Fourier transform of ρ(z), which is the scattering length density (SLD) distribution perpendicular to the surface. SLD is calculated by dividing the sum of the coherent neutron scattering lengths of all atoms in a sample layer by the volume of that layer. Consequently, factors such as thickness, volume fraction, and interface roughness of the samples significantly influence the intensity of the reflected beams."
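
  The two relations quoted above can be checked numerically. The following is a minimal sketch, not the authors' fitting procedure: it evaluates Q = 4π sinθ/λ and the approximate R(Q) for a single uniform slab. The function names, the slab thickness, the SLD value and the sampling step are all hypothetical choices made for illustration.

```python
import cmath
import math


def momentum_transfer(theta_deg, wavelength):
    """Q = 4*pi*sin(theta)/lambda, in inverse units of the wavelength."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg)) / wavelength


def sld_fourier_transform(q, sld_profile, dz):
    """Midpoint-rule estimate of rho(Q) = integral of rho(z)*exp(i*Q*z) dz."""
    total = 0j
    for k, rho in enumerate(sld_profile):
        z_mid = (k + 0.5) * dz
        total += rho * cmath.exp(1j * q * z_mid) * dz
    return total


def approximate_reflectivity(q, sld_profile, dz):
    """R(Q) ~ (16*pi^2 / Q^2) * |rho(Q)|^2, as stated in the text."""
    rho_q = sld_fourier_transform(q, sld_profile, dz)
    return 16.0 * math.pi ** 2 / q ** 2 * abs(rho_q) ** 2


# Hypothetical slab: 50 Å thick, SLD 2e-6 Å^-2, sampled every 0.05 Å.
slab = [2.0e-6] * 1000
q = momentum_transfer(theta_deg=1.0, wavelength=5.0)  # Å^-1
r = approximate_reflectivity(q, slab, dz=0.05)
```

  For a uniform slab of thickness d the integral has a closed form, so the numerical result can be compared against 64π²ρ² sin²(Qd/2)/Q⁴. A real analysis of a multi-layer sample would also need interface roughness and would normally use a dedicated fitting package rather than this approximation.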
  
  How are the "layers" defined from the plot (Fig. 5b)?
  
  The “layers” in the plot (Fig. 5b) represent different regions of the sample being studied. In this study, we used a seven-layer model to fit the experimental data (chromium - gold - NTA - His8 - β-barrel - P3-5 - P1-2). This is explained in the following text in the figure legend of the revised manuscript. (Lines 1115-1116) "The experimental data was fitted using a seven-layer model: chromium - gold - NTA - His8 - β-barrel - P3-5 - P1-2."
  
  What are the meanings of "thickness" and "roughness" (Fig. 5c)?
  
  We used neutron reflectometry to determine the relative positions of BAM subunits in a membrane environment. The binding of certain subunits induced conformational changes in other parts of the complex. When a substrate membrane protein is added, the periplasmic POTRA domain of BamA extends further away from the membrane surface. This could result in an increase in thickness as observed in neutron reflectometry measurements.
  
  As for roughness, it is related to the interface properties of the sample. In neutron reflectometry, the intensity of the reflected beams is highly dependent on the thickness, densities, and interface roughness of the samples. An increase in roughness could suggest changes in these properties, possibly due to protein-membrane interactions or structural changes within the membrane.
  
  (Lines 1116-1120) "The table summarizes the thickness, roughness and volume fraction data of each layer from the NR analysis. The thickness refers to the depth of layered structures being studied as measured in Å. The roughness refers to the irregularities in the surface of the layered structures being studied as measured in Å."
  
  What does "SLD" stand for?
  
  We apologize for not defining the abbreviation SLD at its first occurrence. It is now defined in the revised manuscript (Line 298).
  
  In the result section, "The internal signal is necessary for insertion step of assembly into OM" This section presents an important result that the internal beta-signal is critical to the intrinsic propensity of barrel formation, distinct from the recognition by BAM complex. However, this point is not elaborated in this section. For example, what is the role of these critical residues in the barrel structure formation? That is, are they involved in any special tertiary contacts in the structure or in membrane anchoring of the nascent polypeptide chains?
  
  We appreciate the reviewer's comment on this point. Both position 0 and position 6 appear to be important amino acids for recognition by the BAM complex, since mutations introduced at these positions in peptide 18 prevent competitive inhibition activity.
  
  In terms of the tertiary structure of OmpC, position 6 is an amino acid that contributes to the aromatic girdle, and since Y286A and Y365A affected OMP folding as measured in folding experiments, it is perhaps their position in the aromatic girdle that contributes to the efficiency of β-barrel folding in addition to its function as a recognition signal. We have added a sentence in the revised manuscript:
  
  (Lines 233-236) "Position 6 is an amino acid that contributes to the aromatic girdle. Since Y286A and Y365A affected OMP folding as measured in folding experiments, their positioning into the aromatic girdle may contribute to the efficiency of β-barrel folding, in addition to contributing to the internal signal."
  
  The mutations made at position 0 had no effect on folding, so this residue may function solely in the signal. Given the register of each β-strand in the final barrel, the position 0 residues have side-chains that face out into the lipid environment. From examination of the OmpC crystal structure, the residue at position 0 makes no special tertiary contacts with other, neighbouring residues.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Minor critiques (in no particular order):
  
  Peptide 18 was identified based on its strong inhibition for EspP assembly but another peptide, peptide 23, also shows inhibition and has no particular consensus.
  
  We would like to correct this point. Peptide 23 has a strong consensus to the canonical β-signal. We had explained the sequence consensus of the β-signal in the Results section of the text. In the third paragraph, we have added a sentence indicating the relationship between peptide 18 and peptide 23.
  
  (Lines 152-168) "Six peptides (4, 10, 17, 18, 21, and 23) were found to inhibit EspP assembly (Fig. 1A). Of these, peptide 23 corresponds to the canonical β-signal of OMPs: it is the final β-strand of OmpC and it contains the consensus motif of the β-signal (ζxGxx[Ω/Φ]x[Ω/Φ]). The inhibition seen with peptide 23 indicated that our peptidomimetics screening system using EspP can detect signals recognized by the BAM complex. In addition to inhibiting EspP assembly, five of the most potent peptides (4, 17, 18, 21, and 23) inhibited additional model OMPs; the porins OmpC and OmpF, the peptidoglycan-binding OmpA, and the maltoporin LamB (fig. S3). Comparing the sequences of these inhibitory peptides suggested the presence of a sub-motif from within the β-signal, namely [Ω/Φ]x[Ω/Φ] (Fig. 1B). The sequence codes refer to conserved residues such that: ζ, is any polar residue; G is a glycine residue; Ω is any aromatic residue; Φ is any hydrophobic residue and x is any residue (Hagan et al., 2015; Kutik et al., 2008). The non-inhibitory peptide 9 contained some elements of the β-signal but did not show inhibition of EspP assembly (Fig. 1A).
  
  Peptide 18 also showed a strong sequence similarity to the consensus motif of the β-signal (Fig. 1B) and, like peptide 23, had a strong inhibitory action on EspP assembly (Fig. 1A). Variant peptides based on the peptide 18 sequence were constructed and tested in the EMM assembly assay (Fig. 1C)."
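
  The consensus notation quoted above maps naturally onto a regular expression. The sketch below is illustrative only: the residue sets chosen for ζ (polar), Ω (aromatic) and Φ (hydrophobic) are assumptions, since the response does not list them, and the example sequences are made up rather than taken from OmpC.

```python
import re

# Assumed residue classes (hypothetical; the source text does not enumerate them).
AROMATIC = "FWY"        # Ω: any aromatic residue
HYDROPHOBIC = "AVLIMC"  # Φ: any hydrophobic residue
POLAR = "STNQDEKRH"     # ζ: any polar residue

OMEGA_OR_PHI = f"[{AROMATIC}{HYDROPHOBIC}]"

# ζxGxx[Ω/Φ]x[Ω/Φ]: the canonical β-signal, with "." standing for x.
BETA_SIGNAL = f"[{POLAR}].G..{OMEGA_OR_PHI}.{OMEGA_OR_PHI}"

# [Ω/Φ]x[Ω/Φ]: the sub-motif shared by the inhibitory peptides.
SUB_MOTIF = f"{OMEGA_OR_PHI}.{OMEGA_OR_PHI}"


def find_motif(sequence, pattern):
    """Return (start index, matched text) for every hit, including overlaps."""
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Made-up sequences, for demonstration only.
signal_hits = find_motif("MKQAGLNYRFAA", BETA_SIGNAL)
sub_hits = find_motif("GYRF", SUB_MOTIF)
```

  The lookahead wrapper is used so that overlapping occurrences are reported, which matters for a motif as short as the three-residue sub-motif.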
  
  It is unclear why the authors immediately focused on BamD rather than BamB, given that both were mentioned to mediate interaction with substrate. Was BamB also tested?
  
  We thank the reviewer for this comment. Following the reviewer's suggestion, we have now performed a pull-down experiment on BamB and added it to Fig. S9. We also modified the text of the results as follows.
  
  (Lines 262-265) "Three subunits of the BAM complex have been previously shown to interact with the substrates: BamA, BamB, and BamD (Hagan et al., 2013; Harrison, 1996; Ieva et al., 2011). In vitro pull-down assay showed that while BamA and BamD can independently bind to the in vitro translated OmpC polypeptide (Fig. S9A), BamB did not (Fig. S9B)."
  
  For the in vitro folding assays of the OmpC substrates, labeled and unlabeled, no mention of adding SurA or any other chaperone which is known to be important for mediating OMP biogenesis in vitro.
  
  We appreciate the reviewer’s concerns on this point. However, chaperones such as SurA are non-essential factors in the OMP assembly reaction mediated by the BAM complex: the surA gene is not essential, and the assembly of OMPs can be measured in the absence of exogenously added SurA. It remains possible that addition of SurA to some of these assays could be useful in detailing aspects of chaperone function in the context of the BAM complex, but that was not the intent of this study.
  
  For the supplementary document, it would be much easier for the reader to have the legends grouped with the figures.
  
  Following the reviewer's suggestion, we have placed the legends of Supplemental Figures together with each Figure.
  
  Some of the figures and their captions are not grouped properly and are separated which makes it hard to interpret the figures efficiently.
  
  We thank the reviewer for this comment. We have revised the manuscript and figures so that each figure and its caption appear together on a single page.
  
  The authors begin their 'Discussion' with a question (line 454), however, they don't appear to answer or even attempt to address it; suggest removing rhetorical questions.
  
  As per the reviewers’ suggestion, we removed this question.
  
  Line 464, 'unbiased' should be removed. This would imply that if not stated, experiments are 'negatively' biased.
  
  We removed this word and revised the sentence as follows:
  
  (Lines 431-433) "In our experimental approach to assess for inhibitory peptides, specific segments of the major porin substrate OmpC were shown to interact with the BAM complex as peptidomimetic inhibitors."
  
  Lines 466-467; '...go well beyond expected outcomes.' What does this statement mean?
  
  Our peptidomimetics led to unexpected results in elucidating the additional essential signal elements. The manuscript was revised as follows:
  
  (Lines 433-435) "Results for this experimental approach went beyond expected outcomes by identifying the essential elements of the signal Φxxxxxx[Ω/Φ]x[Ω/Φ] in β-strands other than the C-terminal strand."
  
  Line 478; '...rich information that must be oversimplified...'?
  
  We appreciate the reviewer pointing this out. For clarity, the manuscript was revised as follows:
  
  (Lines 450-453) "The abundance of information which arises from modeling approaches and from the multitude of candidate OMPs, is generally oversimplified when written as a primary structure description typical of the β-signal for bacterial OMPs (i.e. ζxGxx[Ω/Φ]x[Ω/Φ]) (Kutik et al., 2008)."
  
  There are typos in the supplementary figures.
  
  We have revised and corrected the Supplemental Figure legends.
  
  Reviewer #2 (Recommendations For The Authors):
  
  In Supplementary Information, I recommend adding the figure legends directly to the corresponding figures. Currently, it is very inconvenient to go back and forth between legends and figures.
  
  Following the reviewer's suggestion, we have placed the legends of Supplemental Figures together with each Figure.
  
  Line 94 (p.3): "later"
  
  Lateral?
  
  Yes. We have corrected this.
  
  Line 113 (p.3): The result section, "Peptidomimetics derived from E. coli OmpC inhibit OMP assembly" Rationale of the peptide inhibition assay is not clear. How can a peptide sequence that effectively inhibits the assembly be interpreted as the b-assembly signal? By competitive binding to BAM or by something else? What is the authors' hypothesis in doing this assay?
  
  In the revision, we have added the following sentences to explain the aim and design of the peptidomimetics:
  
  (Lines 140-145) "The addition of peptides with BAM complex affinity, such as the OMP β-signal, are capable of exerting an inhibitory effect by competing for binding of substrate OMPs to the BAM complex (Hagan et al., 2015). Thus, the addition of peptides derived from the entirety of OMPs to the EMM assembly assay, which can evaluate assembly efficiency with high accuracy, expects to identify novel regions that have affinity for the BAM complex."
  
  Line 113- (p.3) and Fig. S1: The result section, "Peptidomimetics derived from E. coli OmpC inhibit OMP assembly"
  
  Some explanation seems to be needed why b-barrel domain of EspP appears even without ProK?
  
  We appreciate the reviewer pointing this out. We added the following sentences to explain:
  
  (Lines 128-137) "EspP, a model OMP substrate, belongs to autotransporter family of proteins. Autotransporters have two domains; (1) a β-barrel domain, assembled into the outer membrane via the BAM complex, and (2) a passenger domain, which traverses the outer membrane via the lumen of the β-barrel domain itself and is subsequently cleaved by the correctly assembled β-barrel domain (Celik et al., 2012). When EspP is correctly assembled into outer membrane, a visible decrease in the molecular mass of the protein is observed due to the self-proteolysis. Once the barrel domain is assembled into the membrane it becomes protease-resistant, with residual unassembled and passenger domains degraded (Leyton et al., 2014; Roman-Hernandez et al., 2014)."
  
  Line 186 (p.6): "Y285"
  
  Y285A?
  
  We have corrected the error; it should read Y285A.
  
  Lines 245- (p. 7)/ Lines 330- (p. 10)
  
  It needs to be clarified that the results described in these paragraphs were obtained from the assays with EMM.
  
  We appreciate the reviewer’s concerns on these points. For the first half, the following text was added at the beginning of the applicable paragraph to indicate that all of Fig. 4 is the result of the EMM assembly assay.
  
  (Line 241) "We further analyzed the role of internal β-signal by the EMM assembly assay."
  
  For the second half, we used purified BamD rather than EMM. We have stated this clearly with the following sentence.
  
  (Lines 316-318) "We purified 40 different BPA variants of BamD, and then irradiated UV after incubating with 35S-labelled OmpC."
  
URL

biorxiv.org/content/10.1101/2021.10.29.466387v4
www.biorxiv.org www.biorxiv.org

Spatial and temporal pattern of structure-function coupling of human brain connectome with development

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Lines 40-42: The sentence "The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies as well as individual differences in cognitive function, and is regulated by genes" is a misstatement. Regional variations of structure-function coupling do not really reflect differences in cognitive function among individuals, but inter-subject variations do.
  
  Thank you for your comment. We have made revisions to the sentence to correct its misstatement. Please see lines 40-43: “The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies[1, 6-9] and is regulated by genes[6, 8], as well as its individual differences relates to cognitive function[8, 9].”
  
  (2) In Figure 1, the graph showing the relation between intensity and cortical depth needs explanation.
  
  Thank you for your comment. We have added the necessary explanation; please see lines 133-134: “The MPC was used to map similarity networks of intracortical microstructure (voxel intensity sampled in different cortical depth) for each cortical node.”
  
  (3) Line 167: Change "increased" to "increase".
  
  We have corrected it, please see lines 173-174: “…networks significantly increased with age and exhibited greater increase.”
  
  (4) Line 195: Remove "were".
  
  We have corrected it, please see line 204: “…default mode networks significantly contributed to the prediction…”
  
  (5) Lines 233-240, Reproducibility analyses: Comparisons of parcellation templates were not made with respect to gene weights. Is there any particular reason?
  
  Thank you for your comment. We have quantified the gene weights based on HCPMMP using the same procedures. We identified a correlation (r = 0.25, p < 0.001) between the gene weights in HCPMMP and BNA. Given that this is a relatively weak correlation, we need to clarify the following points.
  
  Based on HCPMMP, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions[1]. The exclusion of 4 cortical regions that had an insufficient number of assigned samples may contribute to the relatively weak correlation of gene weights between the two templates. Moreover, the effect of template resolution on human connectome-transcriptome associations is still unclear.
  
  In brain connectome analysis, the choice of parcellation template can indeed influence the subsequent findings to some extent. A methodological study[2] reported between-template correlations of about 0.4~0.6 for white matter connectivity and 0.2~0.4 for white matter nodal properties (refer to Figures 4 and 5 in [2]). Therefore, the age-related coupling changes, a downstream analysis that is calculated from the multimodal connectome and then correlated with gene expression profiles, may also be influenced by the choice of template.
  
  We have added the gene weight results obtained from HCPMMP to make the dependency on the parcellation template explicit.
  
  Please see lines 251-252: “The gene weights of HCPMMP was consistent with that of BNA (r = 0.25, p < 0.001).”
  
  Author response image 1.
  
  The consistency of gene weights between HCPMMP and BNA.
  
  Please see lines 601-604: “Finally, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions based on HCPMMP and obtained the gene weights by PLS analysis. We performed Pearson's correlation analyses to assess the consistency of gene weights between HCPMMP and BNA.”
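
  The consistency check described above reduces to a Pearson correlation between two vectors of gene weights, one per parcellation. A minimal sketch follows; the function name and the toy weight vectors are hypothetical, and the PLS step that produces the weights is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy gene weights for the same four genes under two templates (made up).
weights_template_a = [1.0, 2.0, 3.0, 4.0]
weights_template_b = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(weights_template_a, weights_template_b)
```

  The two vectors must list the same genes in the same order, so in practice the gene sets from the two parcellations would be intersected and aligned before the correlation is computed.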
  
  Reviewer #2 (Recommendations For The Authors):
  
  Your paper is interesting to read and I found your efforts to evaluate the robustness of the results of different parcellation strategies and tractography methods very valuable. The work is globally easy to navigate and well written with informative good-quality figures, although I think some additional clarifications will be useful to improve readability. My suggestions and questions are detailed below (I aimed to group them by topic which did not always succeed so apologies if the comments are difficult to navigate, but I hope they will be useful for reflection and to incorporate in your work).
  
  * L34: 'developmental disorder'
  
  ** As far as I understand, the subjects in HCP-D are mostly healthy (L87). Thus, while your study provides interesting insights into typical brain development, I wonder if references to 'disorder' might be premature. In the future, it would be interesting to extend your approach to the atypical populations. In any case, it would be extremely helpful and appreciated if you included a figure visualising the distribution of behavioural scores within your population and in relationship to age at scan for your subjects (and to include a more detailed description of the assessment in the methods section) given that large part of your paper focuses on their prediction using coupling inputs (especially given a large drop of predictive performance after age correction). Such figures would allow the reader to better understand the cognitive variability within your data, but also potential age relationships, and generally give a better overview of your cohort.
  
  We agree with your comment that references to 'disorder' are premature. We have made revisions in the abstract and conclusion.
  
  Please see lines 33-34: “This study offers insight into the maturational principles of SC-FC coupling in typical development.”
  
  Please see lines 395-396: “Further investigations are needed to fully explore the clinical implications of SC-FC coupling for a range of developmental disorders.”
  
  In addition, we have included a more detailed description of the cognitive scores in the methods section and provided a figure to visualize the distributions of cognitive scores and their relationship to age. Please see lines 407-413: “Cognitive scores. We included 11 cognitive scores which were assessed with the National Institutes of Health (NIH) Toolbox Cognition Battery (https://www.healthmeasures.net/explore-measurement-systems/nih-toolbox), including episodic memory, executive function/cognitive flexibility, executive function/inhibition, language/reading decoding, processing speed, language/vocabulary comprehension, working memory, fluid intelligence composite score, crystal intelligence composite score, early child intelligence composite score and total intelligence composite score. Distributions of these cognitive scores and their relationship with age are illustrated in Figure S12.”
  
  Author response image 2.
  
  Cognitive scores and age distributions of scans.
  
  * SC-FC coupling
  
  ** L162: 'Regarding functional subnetworks, SC-FC coupling increased disproportionately with age (Figure 3C)'.
  
  *** As far as I understand, in Figure 3C, the points are the correlation with age for a given ROI within the subnetwork. Is this correct? If yes, I am not sure how this shows a disproportionate increase in coupling. It seems that there is great variability of SC-FC correlation with age across regions within subnetworks, more so than the differences between networks. This would suggest that the coupling with age is regionally dependent rather than network-dependent? Maybe you could clarify?
  
  In Figure 3C, each point is the correlation with age for a given ROI within the subnetwork. We have revised the description; please see lines 168-174: “Age correlation coefficients distributed within functional subnetworks were shown in Figure 3C. Regarding mean SC-FC coupling within functional subnetworks, the somatomotor (β_age=2.39E-03, F=4.73, p=3.10E-06, r=0.25, p=1.67E-07, Figure 3E), dorsal attention (β_age=1.40E-03, F=4.63, p=4.86E-06, r=0.24, p=2.91E-07, Figure 3F), frontoparietal (β_age=2.11E-03, F=6.46, p=2.80E-10, r=0.33, p=1.64E-12, Figure 3I) and default mode (β_age=9.71E-04, F=2.90, p=3.94E-03, r=0.15, p=1.19E-03, Figure 3J) networks significantly increased with age and exhibited greater increase.” In addition, we agree with your comment that the coupling with age is more likely region-dependent than network-dependent. We have added the description; please see lines 329-332: “We also found the SC-FC coupling with age across regions within subnetworks has more variability than the differences between networks, suggesting that the coupling with age is more likely region-dependent than network-dependent.” This is why our subsequent analysis focused on regional coupling.
  
  *** Additionally, we see from Figure 3C that regions within networks have very different changes with age. Given this variability (especially in the subnetworks where you show both positive and negative correlations with age for specific ROIs (i.e. all of them)), does it make sense then to show mean coupling over regions within the subnetworks which erases the differences in coupling with age relationships across regions (Figures 3D-J)?
  
  For the interpretation of SC-FC coupling, we consider it necessary to also show how mean coupling at the subnetwork scale correlates with age, although this removes variability at the regional scale. The results at both scales confirm that, in this age group, coupling predominantly increases with age.
  
  *** Also, I think it would be interesting to show correlation coefficients across all regions, not only the significant ones (3B). Is there a spatially related tendency of increases/decreases (rather than a 'network' relationship)? Would it be interesting to show a similar figure to Figure S7 instead of only the significant regions?
  
  Following your comment, we have added a graph showing the correlation coefficients across all regions to Figure 3B. We have made the same addition to the other figures (Figures S3-S6).
  
  Author response image 3.
  
  Age-related changes in SC-FC coupling. (A) Increases in whole-brain coupling with age. (B) Correlation of age with SC-FC coupling across all regions and significant regions (p<0.05, FDR corrected). (C) Comparisons of age-related changes in SC-FC coupling among functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict 1.5× IQR from the first or third quartile. (D-J) Correlation of age with SC-FC coupling across the VIS, SM, DA, VA, LIM, FP and DM. VIS, visual network; SM, somatomotor network; DA, dorsal attention network; VA, ventral attention network; LIM, limbic network; FP, frontoparietal network; DM, default mode network.
  
  *** For the quantification of MPC.
  
  **** L421: you reconstructed 14 cortical surfaces from the wm to pial surface. If we take the max thickness of the cortex to be 4.5mm (Fischl & Dale, 2000), the sampling is above the resolution of your anatomical images (0.8mm). Could you expand on what the interest is in sampling such a higher number of surfaces given that the resolution is not enough to provide additional information?
  
  The surface reconstruction was based on state-of-the-art equivolumetric surface construction techniques[3], which provide a simplified recapitulation of cellular changes across the putative laminar structure of the cortex. By referencing a 100-μm resolution Merker-stained 3D histological reconstruction of an entire post mortem human brain (BigBrain: https://bigbrain.loris.ca/main.php), a methodological study[4] systematically evaluated MPC stability with four to 30 intracortical surfaces when the resolution of the anatomical image was 0.7 mm, and selected 14 surfaces as the most stable solution. Importantly, that study showed that the in vivo approach can serve as a lower resolution yet biologically meaningful extension of the histological work[4].
  
  **** L424: did you aggregate intensities over regions using mean/median or other statistics?
  
  It might be useful to specify.
  
  Thank you for your careful comment. We have revised the description in lines 446-447: “We averaged the intensity profiles of vertices over 210 cortical regions according to the BNA”.
  
  **** L426: personal curiosity, why did you decide to remove the negative correlation of the intensity profiles from the MPC? Although this is a common practice in functional analyses (where the interpretation of negatives is debated), within the context of cortical correlations, the negative values might be interesting and informative on the level of microstructural relationships across regions (if you want to remove negative signs it might be worth taking their absolute values instead).
  
  We agree with your comment that the interpretation of negative correlations in MPC is debated. Considering that MPC is a nascent approach to network modeling, we adopted a more conservative strategy and removed negative correlations, following the study[4] that proposed the approach. As you note, the negative correlations might be informative. We will continue to explore what negative correlations reveal about microstructural relationships.
  
  **** L465: could you please expand on the notion of self-connections, it is not completely evident what this refers to.
  
  We have revised the description in lines 493-494: “𝑁𝑐 is the number of connection (𝑁𝑐 = 245 for BNA)”.
  
  **** Paragraph starting on L467: did you evaluate the multicollinearities between communication models? It is possibly rather high (especially for the same models with similar parameters (listed on L440-444)). Such dependence between variables might affect the estimates of feature importance (given the predictive models only care to minimize error, highly correlated features can be selected as a strong predictor while the impact of other features with similarly strong relationships with the target is minimized thus impacting the identification of reliable 'predictors').
  
  We agree with your comment. The covariance structure (multicollinearity) among the communication models is likely to produce unreliable predictor weights. In our study, we applied Haufe's inversion transform[5], which addresses this issue by computing the covariance between the predicted FC and each communication model in the training set. For more details on Haufe's inversion transform, please see [5]. We have further clarified this in the manuscript; please see lines 497-499: “And covariance structure among the predictors may lead to unreliable predictor weights. Thus, we applied Haufe's inversion transform[38] to address these issues and identify reliable communication mechanisms.”
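
  The transform described above can be illustrated with a short sketch under stated assumptions: an ordinary least-squares model stands in for the actual predictive model, the data are synthetic, and the name `haufe_transform` is hypothetical. The only step taken from the response is the covariance between each predictor and the predicted target in the training set.

```python
import numpy as np

def haufe_transform(X, y_hat):
    """Covariance between each predictor column of X and the predicted target y_hat."""
    Xc = X - X.mean(axis=0)
    yc = y_hat - y_hat.mean()
    return Xc.T @ yc / (X.shape[0] - 1)

# Synthetic training set (hypothetical data): predictors 0 and 1 are nearly
# collinear copies of the same signal, predictor 2 is unrelated noise.
rng = np.random.default_rng(0)
n = 500
signal = rng.standard_normal(n)
X = np.column_stack([
    signal + 0.05 * rng.standard_normal(n),
    signal + 0.05 * rng.standard_normal(n),
    rng.standard_normal(n),
])
y = signal + 0.1 * rng.standard_normal(n)

# Fit a linear model with an intercept (OLS is an assumption for this sketch).
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
weights = beta[1:]            # raw regression weights, sensitive to collinearity
y_hat = design @ beta         # predicted target on the training set

# Transformed weights: one covariance value per predictor.
activation = haufe_transform(X, y_hat)
```

  The raw `weights` of the two collinear predictors depend on how the fit happens to split their shared variance, whereas their `activation` values are nearly equal because both covary with the prediction to the same degree.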
  
  **** L474: I am not completely familiar with spin tests but to my understanding, this is a spatial permutation test. I am not sure how this applies to the evaluation of the robustness of feature weight estimates per region (if this was performed per region), it would be useful to provide a bit more detail to make it clearer.
  
  Following your comment, we have added more detail; please see lines 503-507: “Next, we generated 1,000 FC permutations through a spin test[86] for each nodal prediction in each subject and obtained random distributions of model weights. These weights were averaged over the group and were investigated the enrichment of the highest weights per region to assess whether the number of highest weights across communication models was significantly larger than that in a random discovery.”
  
  **** L477: 'significant communication models were used to represent WMC...', but in L103 you mention you select 3 models: communicability, mean first passage, and flow graphs. Do you want to say that only 3 models were 'significant' and these were exactly the same across all regions (and data splits/ parcellation strategies/ tractography methods)? In the methods, you describe a lot of analysis and testing but it is not completely clear how you come to the selection of the final 3, it would be beneficial to clarify. Also, the final 3 were selected on the whole dataset first and then the pipeline of SC-FC coupling/age assessment/behaviour predictions was run for every (WD, S1, S2) for both parcellations schemes and tractography methods or did you end up with different sets each time? It would be good to make the pipeline and design choices, including the validation bit clearer (a figure detailing all the steps which extend Figure 1 would be very useful to understand the design/choices and how they relate to different runs of the validation).
  
  Thank you for your comment. In all reproducibility analyses, we used the same three models, which were selected in the main pipeline (probabilistic tractography and BNA parcellation). Following your comment, we produced a figure showing the model-selection pipeline as an extension of Figure 1. For the description, please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).”
  
  Author response image 4.
  
  Pipeline of model selection and reproducibility analyses.
  
  **** Might the imbalance of features between structural connectivity and MPC affect the revealed SC-FC relationships (3 vs 1)? Why did you decide on this ratio rather than for example best WM structural descriptor + MPC?
  
  We understand your concern. The WMC communication models represent diverse geometric, topological, or dynamic factors. To describe the properties of WMC as fully as possible, we selected, from the 27 models and after controlling for covariance structure, the three communication models that significantly predicted FC. Compared with the single MPC feature, this does present a potential feature imbalance. However, the results still support the conclusion that coupling models incorporating microarchitectural properties yield more accurate predictions of FC from SC[6, 7]. The relevant experiments are shown in Figure S2 below. Using only the best WM structural descriptor might lose some communication properties of WMC.
  
  **** L515: were intracranial volume and in-scanner head motion related to behavioural measures? These variables likely impact the inputs, do you expect them to influence the outcome assessments? Or is there a mistake on L518 and you actually corrected the input features rather than the behaviour measures?
  
  In-scanner head motion and intracranial volume are related to some age-adjusted behavioural measures, as shown in the following table. The regression of covariates from the cognitive measures followed two cognitive prediction studies [8, 9]. Please see lines 549-554: “Prior to applying the nested fivefold cross-validation framework to each behaviour measure, we regressed out covariates including sex, intracranial volume, and in-scanner head motion from the behaviour measure[59, 69]. Specifically, we estimated the regression coefficients of the covariates using the training set and applied them to the testing set. This regression procedure was repeated for each fold.”
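
  The fold-wise covariate regression quoted above can be sketched as follows. This is an illustrative reconstruction, not the study's code: it assumes a linear model with an intercept, uses synthetic data, and the name `residualize_fold` is hypothetical. The key point from the text is that coefficients are estimated on the training set only and then applied unchanged to the testing set.

```python
import numpy as np

def residualize_fold(y_train, cov_train, y_test, cov_test):
    """Regress covariates out of a behaviour score without test-set leakage.

    Coefficients (including an intercept) are estimated on the training data
    and then applied to both the training and the testing data.
    """
    a_train = np.column_stack([np.ones(len(y_train)), cov_train])
    a_test = np.column_stack([np.ones(len(y_test)), cov_test])
    beta, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return y_train - a_train @ beta, y_test - a_test @ beta

# Synthetic example (hypothetical data): three columns stand in for sex,
# intracranial volume, and in-scanner head motion.
rng = np.random.default_rng(1)
covariates = rng.standard_normal((100, 3))
score = 3.0 + covariates @ np.array([2.0, -1.0, 0.5]) + 0.3 * rng.standard_normal(100)

# One outer fold of a fivefold split; the same call would be repeated per fold.
folds = np.array_split(rng.permutation(100), 5)
test_idx = folds[0]
train_idx = np.concatenate(folds[1:])
res_train, res_test = residualize_fold(
    score[train_idx], covariates[train_idx],
    score[test_idx], covariates[test_idx],
)
```

  Estimating the coefficients inside each fold keeps information from the held-out participants out of the training data, which is the reason the procedure is repeated per fold rather than applied once to the whole sample.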
  
  Author response table 1.
  
  ** Additionally, in the paper, you propose that the incorporation of cortical microstructural (myelin-related) descriptors with white-matter connectivity to explain FC provides for 'a more comprehensive perspective for characterizing the development of SC-FC coupling' (L60). This combination of cortical and white-matter structure is indeed interesting, however the benefits of incorporating different descriptors could be studied further. For example, comparing results of using only the white matter connectivity (assessed through selected communication models) ~ FC vs (white matter + MPC) ~ FC vs MPC ~ FC. Which descriptors better explain FC? Are the 'coupling trends' similar (or the same)? If yes, what is the additional benefit of using the more complex combination? This would also add strength to your statement at L317: 'These discrepancies likely arise from differences in coupling methods, highlighting the complementarity of our methods with existing findings'. Yes, discrepancies might be explained by the use of different SC inputs. However, it is difficult to see how discrepancies highlight complementarity - does MPC (and combination with wm) provide additional information to using wm structural alone?
  
  Following your comment, we have added analyses that use only the myelin-related predictor or only WM connectivity to predict FC, and we compared the results across models. Please see lines 519-521: “In addition, we have constructed the models using only MPC or SCs to predict FC, respectively. Spearman’s correlation was used to assess the consistency between spatial patterns based on different models.”
  
  Please see lines 128-130: “In addition, the coupling pattern based on other models (using only MPC or only SCs to predict FC) and the comparison between the models were shown in Figure S2A-C.” Please see lines 178-179: “The age-related patterns of SC-FC coupling based other coupling models were shown in Figure S2D-F.”
  
  Although the coupling patterns were spatially consistent across models, incorporating MPC together with SC connectivity improved the prediction of FC relative to models based on MPC or SC alone. For age-related changes in coupling, the differences between the models were further amplified. We agree that the complementarity cannot be explicitly quantified, and we have revised the description; please see line 329: “These discrepancies likely arise from differences in coupling methods.”
  
  Author response image 5.
  
  Comparison results between different models. Spatial pattern of mean SC-FC coupling based on MPC ~ FC (A), SCs ~ FC (B), and MPC + SCs ~ FC (C). Correlation of age with SC-FC coupling across cortex based on MPC ~ FC (D), SCs ~ FC (E), and MPC + SCs ~ FC (F).
  
  ** For the interpretation of results: L31 'SC-FC coupling is positively associated with genes in oligodendrocyte-related pathways and negatively associated with astrocyte-related gene'; L124: positive myelin content with SC-FC coupling...and similarly on L81, L219, L299, L342, and L490:
  
  ***You use a T1/T2 ratio which is (in large part) a measure of myelin to estimate the coupling between SC and FC. Evaluation with SC-FC coupling with myelin described in Figure 2E is possibly biased by the choice of this feature. Similarly, it is possible that reported positive associations with oligodendrocyte-related pathways and SC-FC coupling in your work could in part result from a bias introduced by the 'myelin descriptor' (conversely, picking up the oligodendrocyte-related genes is a nice corroboration for the T1/T2 ratio being a myelin descriptor, so that's nice). However, it is possible that if you used a different descriptor of the cortical microstructure, you might find different expression patterns associated with the SC-FC coupling (for example using neurite density index might pick up neuronal-related genes?). As mentioned in my previous suggestions, I think it would be of interest to first use only the white matter structural connectivity feature to assess coupling to FC and assess the gene expression in the cortical regions to see if the same genes are related, and subsequently incorporate MPC to dissociate potential bias of using a myelin measure from genetic findings.
  
  Thank you for your insightful comments. In this paper, however, the core method of measuring coupling is to predict functional connections using multimodal structural connections, which may yield more information than a single modality. We agree that examining the genes associated with SCs and MPC separately could lead to interesting discoveries, and we will continue to explore this in the future.
  
  ** Generally, I find it difficult to understand the interpretation of SC-FC coupling measures and would be interested to hear your thinking about this. As you mention on L290-294, how well SC predicts FC depends on which input features are used for the coupling assessment (more complex communication models, incorporating additional microstructural information etc 'yield more accurate predictions of FC' L291) - thus, calculated coupling can be interpreted as a measure of how well a particular set of input features explain FC (different sets will explain FC more or less well) ~ coupling is related to a measure of 'missing' information on the SC-FC relationship which is not contained within the particular set of structural descriptors - with this approach, the goal might be to determine the set that best, i.e. completely, explains FC to understand the link between structure and function. When you use the coupling measures for comparisons with age, cognition prediction etc, the 'status' of the SC-FC changes, it is no longer the amount of FC explained by the given SC descriptor set, but it's considered a descriptor in itself (rather than an effect of feature selection / SC-FC information overlap) - how do you interpret/argue for this shift of use?
  
  Thank you for your comment. In this paper, we obtain a reasonable estimate of SC-FC coupling by determining the optimal set of structural features to explain function. The coupling essentially measures the direct correspondence between structure and function. Studying the relationship of coupling with age and cognition therefore amounts to studying how this direct structure-function correspondence varies with age and cognition.
  
  ** In a similar vein to the above comment, I am interested to hear what you think: on L305 you mention that 'perfect SC-FC coupling may be unlikely'. Would this reasoning suggest that functional activity takes place through other means than (and is therefore somehow independent of) biological (structural) substrates? For now, I think one can only say that we have imperfect descriptors of the structure so there is always information missing to explain function, this however does not mean the SC and FC are not perfectly coupled (only that we look at insufficient structural descriptors - limitations of what imaging can assess, what we measure etc). This is in line with L305 where you mention that 'Moreover, our results suggested that regional preferential contributions across different SCs lead to variations in the underlying communication process'. This suggests that locally different areas might use different communication models which are not reflected in the measures of SC-FC coupling that was employed, not that the 'coupling' is lower or higher (or coupling is not perfect). This is also a change in approach to L293: 'This configuration effectively releases the association cortex from strong structural constraints' - the 'release' might only be in light of the particular structural descriptors you use - is it conceivable that a different communication model would be more appropriate (and show high coupling) in these areas.
  
  Thank you for your insightful comments. We have changed the description; please see lines 315-317: “SC-FC coupling is dynamic and changes throughout the lifespan[7], particularly during adolescence[6,9], suggesting that perfect SC-FC coupling may require sufficient structural descriptors.”
  
  *Cognitive predictions:
  
  ** From a practical stand-point, do you think SC-FC coupling is a better (more accurate) indicator of cognitive outcomes (for example for future prediction studies) than each modality alone (which is practically easier to obtain and process)? It would be useful to check the behavioural outcome predictions for each modality separately (as suggested above for coupling estimates). In case SC-FC coupling does not outperform each modality separately, what is the benefit of using their coupling? Similarly, it would be useful to compare to using only cortical myelin for the prediction (which you showed to increase in importance for the coupling). In the case of myelin->coupling-> intelligence, if you are able to predict outcomes with the same performance from myelin without the need for coupling measures, what is the benefit of coupling?
  
  From the standpoint of predictive performance, we do not claim that SC-FC coupling is a better indicator than a single modality (voxel-wise, network-level, or other measures). Our aim was to assess whether SC-FC coupling is related to individual differences in cognitive performance, not to demonstrate that it predicts better than other measures. As you suggest, separating the modalities and comparing their power to predict cognition is a very interesting perspective. We will continue to explore this issue in future studies.
  
  ** The statement on L187 'suggesting that increased SC-FC coupling during development is associated with higher intelligence' might not be completely appropriate before age corrections (especially given the large drop in performance that suggests confounding effects of age).
  
  According to your comment, we have removed the statement.
  
  ** L188: it might be useful to report the range of R across the outer cross-validation folds as from Figure 4A it is not completely clear that the predictive performance is above the random (0) threshold. (For the sake of clarity, on L180 it might be useful for the reader if you directly report that other outcomes were not above the random threshold).
  
  According to your comment, we have added the range of R and revised the description, please see lines 195-198: “Furthermore, even after controlling for age, SC-FC coupling remained a significant predictor of general intelligence better than at chance (Pearson’s r=0.11±0.04, p=0.01, FDR corrected, Figure 4A). For fluid intelligence and crystal intelligence, the predictive performances of SC-FC coupling were not better than at chance (Figure 4A).”
  
  In a similar vein, in the text, you report Pearson's R for the predictive results but Figure 4A shows predictive accuracy - accuracy is a different (categorical) metric. It would be good to homogenise to clarify predictive results.
  
  We have made the corresponding changes in Figure 4.
  
  Author response image 6.
  
  Encoding individual differences in intelligence using regional SC-FC coupling. (A) Predictive accuracy of fluid, crystallized, and general intelligence composite scores. (B) Regional distribution of predictive weight. (C) Predictive contribution of functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict the 1.5× IQR from the first or third quartile.
  
  *Methods and QC:
  
  -Parcellations
  
  ** It would be useful to mention briefly how the BNA was applied to the data and if any quality checks were performed for the resulting parcellations, especially for the youngest subjects which might be most dissimilar to the population used to derive the atlas (healthy adults HCP subjects) ~ question of parcellation quality.
  
  We have added the description, please see lines 434-436: “The BNA[31] was projected on native space according to the official scripts (http://www.brainnetome.org/resource/) and the native BNA was checked by visual inspection.”
  
  ** Additionally, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate. It might be useful to mention the above as limitations (which apply to most studies with similar focus).
  
  We have added your comment to the methodological issues, please see lines 378-379: “Third, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate.”
  
  - Tractography
  
  ** L432: it might be useful to name the method you used (probtrackx).
  
  We have added this name to the description, please see lines 455-456: “probabilistic tractography (probtrackx)[78, 79] was implemented in the FDT toolbox …”
  
  ** L434: 'dividing the total fibres number in source region' - dividing by what?
  
  We have revised the description, please see line 458: “dividing by the total fibres number in source region.”
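
  As an illustration of the normalization described in this exchange, the sketch below divides each streamline count by the total for its source region. It rests on assumptions that the response does not confirm: the counts are held in a square source-by-target matrix, and the "total fibres number in source region" is taken to be the row sum of that matrix. The name `connectivity_density` is hypothetical.

```python
import numpy as np

def connectivity_density(counts):
    """Divide each source-to-target streamline count by the source region's total.

    counts : (n_regions, n_regions) array; row i holds streamlines seeded in region i.
    Rows whose total is zero are returned as zeros to avoid division by zero.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Toy count matrix (hypothetical values); region 2 has no outgoing streamlines.
demo_counts = np.array([
    [0, 30, 10],
    [5, 0, 15],
    [0, 0, 0],
])
demo_density = connectivity_density(demo_counts)
```

  Because each row is scaled by its own total, the resulting matrix is generally asymmetric, so a symmetrization step may be needed before it is used as an undirected network.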
  
  ** L436: 'connections in subcortical areas were removed' - why did you trace connections to subcortical areas in the first place if you then removed them (to match with cortical MPC areas I suspect)? Or do you mean there were spurious streamlines through subcortical regions that you filtered?
  
  There were two reasons. First, we needed to match the cortical regions covered by MPC. Second, as stated in the methodological issues, accurately resolving the connections of small structures within subcortical regions using whole-brain diffusion imaging and tractography techniques remains challenging[10, 11].
  
  ** Following on the above, did you use any exclusion masks during the tracing? In general, more information about quality checks for the tractography would be useful. For example, L437: did you do any quality evaluations based on the removed spurious streamlines? For example, were there any trends between spurious streamlines and the age of the subject? Distance between regions/size of the regions?
  
  We did not use any exclusion masks. We performed visual inspection for the tractography quality and did not assess the relationship between spurious streamlines and age or distance between regions/size of the regions.
  
  ** L439: 'weighted probabilistic network' - this was weighted by the filtered connectivity densities or something else?
  
  The probabilistic network is weighted by the filtered connectivity densities.
  
  ** I appreciate the short description of the communication models in Text S1, it is very useful.
  
  Thank you for your comment.
  
  ** In addition to limitations mentioned in L368 - during reconstruction, have you noticed problems resolving short inter-hemispheric connections?
  
  We had not considered this issue and have now added it to the limitations; please see lines 383-384: “In addition, the reconstruction of short connections between hemispheres is a notable challenge.”
  
  - Functional analysis:
  
  ** There is a difference in acquisition times between participants below and above 8 years (21 vs 26 min), does the different length of acquisition affect the quality of the processed data?
  
  We applied relatively strict quality control to ensure the quality of the processed data.
  
  ** L446 'regressed out nuisance variables' - it would be informative to describe in more detail what you used to perform this.
  
  We have provided more detail about the regression of nuisance variables, please see lines 476-477: “The nuisance variables were removed from time series based on general linear model.”
  
  ** L450-452: it would be useful to add the number of excluded participants to get an intuition for the overall quality of the functional data. Have you checked if the quality is associated with the age of the participant (which might be related to motion etc). Adding a distribution of remaining frames across participants (vs age) would be useful to see in the supplementary methods to better understand the data you are using.
  
  We have added information on the exclusion of subjects during data processing, as well as the distributions of motion and remaining frames and their correlations with age. Please see lines 481-485: “Quality control. The exclusion of participants in the whole multimodal data processing pipeline was depicted in Figure S13. In the context of fMRI data, we computed Pearson’s correlation between motion and age, as well as between the number of remaining frames and age, for the included participants aged 5 to 22 years and 8 to 22 years, respectively. These correlations were presented in Figure S14.”
  
  Author response image 7.
  
  Exclusion of participants in the whole multimodal data processing pipeline.
  
  Author response image 8.
  
  Figure S14. Correlations between motion and age and number of remaining frames and age.
  
  ** L454: 'Pearson's correlation's... ' In contrast to MPC you did not remove negative correlations in the functional matrices. Why this choice?
  
  Whether to remove negatively correlated connections from functional signals has long been controversial. Previous studies of SC-FC coupling[12-14] have widely retained negative correlations, and we chose the same strategy in order to retain more information. For MPC, which is a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study [4] that proposed the approach.
  
  - Gene expression:
  
  ** L635, you focus on the left cortex, is this common? Do you expect the gene expression to be fully symmetric (given reported functional hemispheric asymmetries)? It might be good to expand on the reasoning.
  
  An important consideration regarding sample assignment arises from the fact that only two out of six brains were sampled from both hemispheres and four brains have samples collected only in the left. This sparse sampling should be carefully considered when combining data across donors[1]. We have supplemented the description, please see lines 569-571: “Restricting analyses to the left hemisphere will minimize variability across regions (and hemispheres) in terms of the number of samples available[40].”
  
  ** Paragraph of L537: you use evolution of coupling with age (correlation) and compare to gene expression with adults (cohort of Allen Human Brain Atlas - no temporal evolution to the gene expressions) and on L369 you mention that 'relative spatial patterns of gene expressions remain stable after birth'. Of course this is not a place to question previous studies, but would you really expect the gene expression associated with the temporary processes to remain stable throughout the development? For example, myelination would follow different spatiotemporal gradient across brain regions, is it reasonable to expect that the expression patterns remain the same? How do you then interpret a changing measure of coupling (correlation with age) with a gene expression assessed statically?
  
  We agree that spatial expression patterns are expected to vary across developmental periods. We have revised the previous description; please see lines 383-386: “Fifth, it is important to acknowledge that changes in gene expression levels during development may introduce bias in the results.”
  
  - Reproducibility analyses:
  
  ** Paragraph L576: are we to understand that you performed the entire pipeline 3 times (WD, S1, S2) for both parcellations schemes and tractography methods (~12 times) including the selection of communication models and you always got the same best three communication models and gene expression etc? Or did you make some design choices (i.e. selection of communication models) only on a specific set-up and transfer to other settings?
  
  The choice of communication models was fixed at the outset, which we have clarified in the article; please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).” For the reproducibility analyses (parcellation, tractography, and split-half validation), we held all other settings fixed and assessed the impact of a single factor at a time.
  
  ** Paragraph of L241: I really appreciate you evaluated the robustness of your results to different tractography strategies. It is reassuring to see the similarity in results for the two approaches. Did you notice any age-related effects on tractography quality for the two methods given the wide age range (did you check?)
  
  In our study, tractography quality was checked by visual inspection. Using quantitative tools to assess tractography quality in future studies could answer this question objectively.
  
  ** Additionally, I wonder how much of that overlap is driven by the changes in MPC which is the same between the two methods... especially given its high weight in the SC-FC coupling you reported earlier in the paper. It might be informative to directly compare the connectivity matrices derived from the two tracto methods directly. Generally, as mentioned in the previous comments, I think it would be interesting to assess coupling using different input settings (with WM structural and MPC separate and then combined).
  
  Following your previous comment, we have examined the coupling patterns, coupling differences, age correlations of coupling, and spatial correlations between the patterns based on different models, as shown in Figure S2. Please see our response to the previous comment for details.
  
  ** L251 - I also wonder if the random splitting is best adapted to validation in your case given you study relationships with age. Would it make more sense to make stratified splits to ensure a 'similar age coverage' across splits?
  
  In our study, we adopted a random splitting process, repeated 1,000 times, to minimize bias due to data partitioning. The stratification you mention is a reasonable method, and keeping the age distribution even across splits would likely yield higher similarity than our validation method. However, the similarity obtained with our method is sufficient to support the generalizability of our findings.
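
  For comparison with the random splitting used in the study, the age-stratified alternative raised by the reviewer could be sketched as follows. This is not the procedure the authors ran; the binning scheme, the number of bins, and the name `age_stratified_split` are assumptions made for illustration.

```python
import random

def age_stratified_split(ages, n_bins=4, seed=0):
    """Split participants into two halves with similar age coverage.

    Participants are sorted by age, cut into n_bins consecutive blocks, and
    each block is shuffled and divided between the two halves.
    Returns two sorted lists of participant indices.
    """
    rng = random.Random(seed)
    order = sorted(range(len(ages)), key=lambda i: ages[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    half_a, half_b = [], []
    for start in range(0, len(order), bin_size):
        block = order[start:start + bin_size]
        rng.shuffle(block)
        cut = len(block) // 2
        half_a.extend(block[:cut])
        half_b.extend(block[cut:])
    return sorted(half_a), sorted(half_b)

# Hypothetical cohort: 40 participants spread evenly between 5 and 22 years.
demo_ages = [5 + i * 17 / 39 for i in range(40)]
s1, s2 = age_stratified_split(demo_ages, n_bins=4, seed=0)
mean_s1 = sum(demo_ages[i] for i in s1) / len(s1)
mean_s2 = sum(demo_ages[i] for i in s2) / len(s2)
```

  Changing `seed` gives a different stratified partition, so the split can still be repeated many times, as in the random procedure, while constraining each repetition to a comparable age distribution.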
  
  Minor comments
  
  L42: 'is regulated by genes'
  
  ** Coupling (if having a functional role and being regulated at all) is possibly resulting from a complex interplay of different factors in addition to genes, for example, learning/environment, it might be more cautious to use 'regulated in part by genes' or similar.
  
  We have corrected it, please see line 42.
  
  L43 (and also L377): 'development of SC-FC coupling'
  
  ** I know this is very nitpicky and depends on your opinion about the nature of SC-FC coupling, but 'development of SC-FC coupling' gives an impression of something maturing that has a role 'in itself' (for example development of eye from neuroepithelium to mature organ etc.). For now, I am not sure it is fully certain that SC-FC coupling is more than a byproduct of the comparison between SC and FC, using 'changes in SC-FC coupling with development' might be more apt.
  
  We have corrected it, please see lines 43-44.
  
  L261 'SC-FC coupling was stronger ... [] ... and followed fundamental properties of cortical organization.' vs L168 'No significant correlations were found between developmental changes in SC-FC coupling and the fundamental properties of cortical organization'.
  
  **Which one is it? I think in the first you refer to mean coupling over all infants and in the second about correlation with age. How do you interpret the difference?
  
  Between the ages of 5 and 22 years, the mean SC-FC coupling pattern is already similar to that of adults and is consistent with the fundamental properties of cortical organization. However, the developmental changes in SC-FC coupling are heterogeneous and sequential, and their magnitude does not follow the mean coupling pattern.
  
  L277: 'temporal and spatial complexity'
  
  ** Additionally, communication models have different assumptions about the flow within the structural network and will have different biological plausibility (they will be more or less 'realistic').
  
  Here, temporal and spatial complexity refers to complexity from a computational point of view.
  
  L283: 'We excluded a centralized model (shortest paths), which was not biologically plausible' ** But in Text S1 and Table S1 you specify the shortest paths models. Does this mean you computed them but did not incorporate them in the final coupling computations even if they were predictive?
  
  ** Generally, I find the selection of the final 3 communication models confusing. It would be very useful if you could clarify this further, for example in the methods section.
  
  We used all twenty-seven communication models (including shortest paths) to predict FC at the node level for each participant. We then identified the three communication models that significantly predicted FC. The shortest-path model was excluded because it did not meet the significance criteria. We have added further methodological details to this section; please see lines 503-507.
  
  L332 'As we observed increasing coupling in these [frontoparietal network and default mode network] networks, this may have contributed to the improvements in general intelligence, highlighting the flexible and integrated role of these networks' vs L293 'SC-FC coupling in association areas, which have lower structural connectivity, was lower than that in sensory areas. This configuration effectively releases the association cortex from strong structural constraints imposed by early activity cascades, promoting higher cognitive functions that transcend simple sensori-motor exchanges'
  
  ** I am not sure I follow the reasoning. Could you expand on why it would be the decoupling promoting the cognitive function in one case (association areas generally), but on the reverse the increased coupling in frontoparietal promoting the cognition in the other (specifically frontoparietal)?
  
  We have tried to clarify this point. For general intelligence, increased coupling in the frontoparietal network could allow more effective information integration and enable efficient collaboration between different cognitive processes.
  
  * Formatting errors etc.
  
  L52: maybe rephrase?
  
  We have rephrased, please see lines 51-53: “The T1- to T2-weighted (T1w/T2w) ratio of MRI has been proposed as a means of quantifying microstructure profile covariance (MPC), which reflects a simplified recapitulation in cellular changes across intracortical laminar structure[6, 12-15].”
  
  L68: specialization1,[20].
  
  We have corrected it.
  
  L167: 'networks significantly increased with age and exhibited greater increased' - needs rephrasing.
  
  We have corrected it.
  
  L194: 'networks were significantly predicted the general intelligence' - needs rephrasing.
  
  We have corrected it, please see lines 204-205: “we found that the weights of frontoparietal and default mode networks significantly contributed to the prediction of the general intelligence.”
  
  L447: 'and temporal bandpass filtering' - there is a verb missing.
  
  We have corrected it, please see line 471: “executed temporal bandpass filtering.”
  
  L448: 'greater than 0.15' - unit missing.
  
  We have corrected it, please see line 472: “greater than 0.15 mm”.
  
  L452: 'After censoring, regression of nuisance variables, and temporal bandpass filtering,' - no need to repeat the steps as you mentioned them 3 sentences earlier.
  
  We have removed it.
  
  L458-459: sorry I find this description slightly confusing. What do you mean by 'modal'? Connectional -> connectivity profile. The whole thing could be simplified, if I understand correctly your vector of independent variables is a set of wm and microstructural 'connectivity' of the given node... if this is not the case, please make it clearer.
  
  We have corrected it, please see line 488: “where 𝒔𝑖 is the 𝑖th SC profiles, 𝑛 is the number of SC profiles”.
  
  L479: 'values and system-specific of 480 coupling'.
  
  We have corrected it.
  
  L500: 'regular' - regularisation.
  
  We have changed it to “regularization”.
  
  L567: Do you mean that in contrast to probabilistic with FSL you use deterministic methods within Camino? For L570, you introduce communication models through 'such as': did you fit all models like before? If not, it might be clearer to just list the ones you estimated rather than introduce through 'such as'.
  
  We have changed the description to avoid ambiguity, please see lines 608-609: “We then calculated the communication properties of the WMC including communicability, mean first passage times of random walkers, and flow graphs (timescales=1).”
  
  Citation [12], it is unusual to include competing interests in the citation, moreover, Dr. Bullmore mentioned is not in the authors' list - this is most likely an error with citation import, it would be good to double-check.
  
  We have corrected it.
  
  L590: Python scripts used to perform PLS regression can 591 be found at https://scikitlearn.org/. The link leads to general documentation for sklearn.
  
  We have corrected it, please see lines 627-630: “Python scripts used to perform PLS regression can be found at https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.”
  
  P26 and 27 - there are two related sections: Data and code availability and Code availability - it might be worth merging into one section if possible.
  
  We have corrected it, please see lines 623-633.
  
  References
  
  (1) Arnatkeviciute A, Fulcher BD, Fornito A. A practical guide to linking brain-wide gene expression and neuroimaging data. Neuroimage. 2019;189:353-67. Epub 2019/01/17. doi: 10.1016/j.neuroimage.2019.01.011. PubMed PMID: 30648605.
  
  (2) Zhong S, He Y, Gong G. Convergence and divergence across construction methods for human brain white matter networks: an assessment based on individual differences. Hum Brain Mapp. 2015;36(5):1995-2013. Epub 2015/02/03. doi: 10.1002/hbm.22751. PubMed PMID: 25641208; PubMed Central PMCID: PMCPMC6869604.
  
  (3) Waehnert MD, Dinse J, Weiss M, Streicher MN, Waehnert P, Geyer S, et al. Anatomically motivated modeling of cortical laminae. Neuroimage. 2014;93 Pt 2:210-20. Epub 2013/04/23. doi: 10.1016/j.neuroimage.2013.03.078. PubMed PMID: 23603284.
  
  (4) Paquola C, Vos De Wael R, Wagstyl K, Bethlehem RAI, Hong SJ, Seidlitz J, et al. Microstructural and functional gradients are increasingly dissociated in transmodal cortices. PLoS Biol. 2019;17(5):e3000284. Epub 2019/05/21. doi: 10.1371/journal.pbio.3000284. PubMed PMID: 31107870.
  
  (5) Haufe S, Meinecke F, Gorgen K, Dahne S, Haynes JD, Blankertz B, et al. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage. 2014;87:96-110. Epub 2013/11/19. doi: 10.1016/j.neuroimage.2013.10.067. PubMed PMID: 24239590.
  
  (6) Demirtas M, Burt JB, Helmer M, Ji JL, Adkinson BD, Glasser MF, et al. Hierarchical Heterogeneity across Human Cortex Shapes Large-Scale Neural Dynamics. Neuron. 2019;101(6):1181-94 e13. Epub 2019/02/13. doi: 10.1016/j.neuron.2019.01.017. PubMed PMID: 30744986; PubMed Central PMCID: PMCPMC6447428.
  
  (7) Deco G, Kringelbach ML, Arnatkeviciute A, Oldham S, Sabaroedin K, Rogasch NC, et al. Dynamical consequences of regional heterogeneity in the brain's transcriptional landscape. Sci Adv. 2021;7(29). Epub 2021/07/16. doi: 10.1126/sciadv.abf4752. PubMed PMID: 34261652; PubMed Central PMCID: PMCPMC8279501.
  
  (8) Chen J, Tam A, Kebets V, Orban C, Ooi LQR, Asplund CL, et al. Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study. Nat Commun. 2022;13(1):2217. Epub 2022/04/27. doi: 10.1038/s41467-022-29766-8. PubMed PMID: 35468875; PubMed Central PMCID: PMCPMC9038754.
  
  (9) Li J, Bzdok D, Chen J, Tam A, Ooi LQR, Holmes AJ, et al. Cross-ethnicity/race generalization failure of behavioral prediction from resting-state functional connectivity. Sci Adv. 2022;8(11):eabj1812. Epub 2022/03/17. doi: 10.1126/sciadv.abj1812. PubMed PMID: 35294251; PubMed Central PMCID: PMCPMC8926333.
  
  (10) Thomas C, Ye FQ, Irfanoglu MO, Modi P, Saleem KS, Leopold DA, et al. Anatomical accuracy of brain connections derived from diffusion MRI tractography is inherently limited. Proc Natl Acad Sci U S A. 2014;111(46):16574-9. Epub 2014/11/05. doi: 10.1073/pnas.1405672111. PubMed PMID: 25368179; PubMed Central PMCID: PMCPMC4246325.
  
  (11) Reveley C, Seth AK, Pierpaoli C, Silva AC, Yu D, Saunders RC, et al. Superficial white matter fiber systems impede detection of long-range cortical connections in diffusion MR tractography. Proc Natl Acad Sci U S A. 2015;112(21):E2820-8. Epub 2015/05/13. doi: 10.1073/pnas.1418198112. PubMed PMID: 25964365; PubMed Central PMCID: PMCPMC4450402.
  
  (12) Gu Z, Jamison KW, Sabuncu MR, Kuceyeski A. Heritability and interindividual variability of regional structure-function coupling. Nat Commun. 2021;12(1):4894. Epub 2021/08/14. doi: 10.1038/s41467-021-25184-4. PubMed PMID: 34385454; PubMed Central PMCID: PMCPMC8361191.
  
  (13) Liu ZQ, Vazquez-Rodriguez B, Spreng RN, Bernhardt BC, Betzel RF, Misic B. Time-resolved structure-function coupling in brain networks. Commun Biol. 2022;5(1):532. Epub 2022/06/03. doi: 10.1038/s42003-022-03466-x. PubMed PMID: 35654886; PubMed Central PMCID: PMCPMC9163085.
  
  (14) Zamani Esfahlani F, Faskowitz J, Slack J, Misic B, Betzel RF. Local structure-function relationships in human brain networks across the lifespan. Nat Commun. 2022;13(1):2053. Epub 2022/04/21. doi: 10.1038/s41467-022-29770-y. PubMed PMID: 35440659; PubMed Central PMCID: PMCPMC9018911.
  
URL

biorxiv.org/content/10.1101/2023.09.11.557107v4

PKR activation-induced mitochondrial dysfunction in HIV-transgenic mice with nephropathy

1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This study presents valuable new insights into HIV-associated nephropathy (HIVAN) kidney phenotype in the Tg26 transgenic mouse model and delineates the kidney cell types that express HIV genes and are injured in these HIV-transgenic mice. A series of compelling experiments demonstrated that PKR inhibition can ameliorate HIVAN with reversal of mitochondrial dysfunction (mainly confined to endothelial cells), a prominent feature shared in other kidney diseases. Although there are concerns regarding the specificity of C16 to PKR inhibition, as well as with the in situ hybridization studies, the data suggests that inhibition of PKR and mitochondrial dysfunction has potential clinical significance for HIVAN.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  HIV-associated nephropathy (HIVAN) is a rapidly progressing form of kidney disease that manifests secondary to untreated HIV infection, and is predominantly seen in individuals of African descent. Tg26 mice carrying an HIV transgene lacking gag and pol exhibit high levels of albuminuria and rapid decline in renal function that recapitulates many features of HIVAN in humans. HIVAN is seen predominantly in individuals carrying two copies of missense variants in the APOL1 gene, and the authors have previously shown that APOL1 risk variant mRNA induces activity of the double-strand RNA sensor kinase PKR. Because of the tight association between the APOL1 risk genotype and HIVAN, the authors hypothesized that PKR activation may mediate renal injury in Tg26 mice and tested this hypothesis by treating mice with a commonly used PKR inhibitory compound called C16. Treatment with C16 substantially attenuated renal damage in the Tg26 model as measured by urinary albumin/creatinine ratio, urinary NGAL/creatinine ratio, and improvement in histology. The authors then performed bulk and single-nucleus RNAseq on kidneys from mice from different treatment groups to identify pathways and patterns of cell injury associated with HIV transgene expression as well as to determine the mechanistic basis for the effect of C16 treatment. They show that proximal tubule nuclei from Tg26 mice appear to have more mitochondrial transcripts which was reversed by C16 treatment and suggest that this may provide evidence of mitochondrial dysfunction in this model. They explore this hypothesis by showing there is a decrease in the expression of nuclear-encoded genes and proteins involved in oxidative phosphorylation as well as a decrease in respiratory capacity via functional assessment of respiration in tubule and glomerular preparations from these mouse kidneys. All of these changes were reversed by C16 treatment. 
The authors propose the existence of a novel injured proximal tubule cell-type characterized by the leak of mitochondrial transcripts into the nucleus (PT-Mito). Analysis of HIV transgene expression showed high level expression in podocytes, consistent with the pronounced albuminuria that characterizes this model and HIVAN, but transcripts were also detected in tubular and endothelial cells. Because of the absence of mitochondrial transcripts in the podocytes, the authors speculate that glomerular mitochondrial dysfunction in this model is driven by damage to glomerular endothelial cells.
  
  Strengths:
  
  The strengths of this study include the comprehensive transcriptional analysis of the Tg26 model, including an evaluation of HIV transgene expression, which has not been previously reported. This data highlights that HIV transcripts are expressed in a subset of podocytes, consistent with the highly proteinuric disease seen in mice and humans. However, transcripts were also seen in other tubular cells, notably intercalated cells, principal cells and injured proximal tubule cells. Though the podocyte expression makes sense, the relevance of the tubular expression to human disease is still an open question.
  
  The data in support of mitochondrial dysfunction are also robust and rely on combined evidence from downregulation of transcripts involved in oxidative phosphorylation, decreases in complex I and II as determined by immunoblot, and assessments of respiratory capacity in tubular and glomerular preparations. These data are largely consistent with other preclinical renal injury models reported in the literature as well as previous, less thorough assessments in the Tg26 model.
  
  Weaknesses:
  
  The key weakness of the study lies in the use of a PKR inhibitor with questionable specificity. C16 has been reported to inhibit numerous other kinases including cyclin CDKs and GSK3α and -β, and this means that the conclusions of this study with respect to the role of PKR are highly questionable. The rationale for the dose used was not provided (and is lower than used in other publications with C16), and in the absence of drug exposure data and assessment of target engagement, it is difficult to ascertain whether substantial inhibition of PKR was achieved.
  
  A second key weakness lies in the identification of the PT-Mito cell cluster. Though the authors provide some rationale for the identification of this specific cell type, it seems equally plausible the cells merely reflect a high background capture of mitochondria in a subset of droplets. The IHC analysis that was provided is not convincing enough to support the claim, and more careful high-resolution imaging and in situ hybridization (with appropriate quantitation) will be needed to provide substantive support for the presence of a proximal tubule cell type with mitochondrial transcripts that are trafficked to the nucleus.
  
  We appreciate the reviewer’s thoughtful summary.
  
  With regard to the non-specificity of C16, we have added to the Discussion a description of this limitation with supporting references, as suggested by the reviewer. Of note, the C16 doses that we used were also used previously (Okamoto, CommBiol, 2018). Importantly, newly added immunofluorescence images using a phospho-PKR specific antibody showed PKR inhibition (Supplemental Figure 1).
  
  Identification of the PT-Mito cluster in tissues was challenging, mainly due to the absence of known marker genes for the newly identified cluster. Finally, we added in situ hybridization images, with a negative control probe, to show the specificity of the target probes.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Numerous studies by the authors and other groups have demonstrated an important role for HIV gene expression in kidney cells in promoting progressive chronic kidney disease, especially HIV-associated nephropathy. The authors had previously demonstrated a role for protein kinase R (PKR) in a non-HIV transgenic model of kidney disease (Okamoto, Commun Bio, 2021). In this study, the authors used innovative techniques including bulk and single nuclear RNAseq to demonstrate that mice expressing a replication-incompetent HIV transgene have prominent dysregulation of mitochondrial gene expression and activation of PKR and that treatment of these mice with a small molecule PKR inhibitor ameliorated the kidney disease phenotype in HIV-transgenic mice. They also identified STAT3 as a key upstream regulator of kidney injury in this model, which is consistent with previously published studies. Other important advances include identifying the kidney cell types that express the HIV transgene and have dysregulation of cellular pathways.
  
  Strengths:
  
  Major strengths of the study include the use of a wide variety of state-of-the-art molecular techniques to generate important new data on the pathogenesis of kidney injury in this commonly used model of kidney disease and the identification of PKR as a potential druggable target for the treatment of HIV-induced kidney disease. The authors also identify a potential novel cell type within the kidney characterized by high expression of mitochondrial genes.
  
  Weaknesses:
  
  Though the HIV-transgenic model used in these studies results in a phenotype that is very similar to HIV-associated nephropathy in humans, the model has several limitations that may prevent direct translation to human disease, including the fact that mice lack several genetic factors that are important contributors to HIV and kidney pathogenesis in humans. Additional studies are therefore needed to confirm these findings in human kidney disease.
  
  We appreciate the succinct summary of the present work. We agree that the findings from the HIV Tg26 mouse model warrant additional investigation in human kidney disease samples. Further studies will be needed to confirm whether the mechanisms presented here are operative in human HIVAN or other RNA virus-associated kidney diseases.
  
  Reviewer #1 (Recommendations For The Authors)
  
  The specificity of the C16 tool has been called into question in 3 publications - Chen et al, 2008, PMID: 19046382; Lopez-Grancha et al, 2021, PMID: 34531308; and Cusak et al, 2023, PMID: 36400288. Lopez-Grancha et al have reported a novel, more selective PKR inhibitor with good pharmacological properties that might enable a more robust test of the PKR hypothesis. Regardless, compound exposures and target engagement (i.e. by monitoring phosphorylation of PKR targets such as eIF2α) should accompany these studies. Alternatively, it may be easier to probe the role of PKR in Tg26 pathogenicity by crossing the Tg26 line to a PKR knockout mouse.
  
  In response, we have added a description of, and references for, the possible non-specificity of C16 as a limitation in the Discussion, as suggested (page 21).
  
  “Third, we acknowledge possibility of a non-specific effect of C16 as an inhibitor of PKR.66-68”
  
  Further, we added immunohistochemistry images of pPKR on kidney tissue as shown in Supplemental Figure 1A-D. Images showed PKR activation in Tg26 tubular cells, which was inhibited by C16 treatment.
  
  Author response image 1.
  
  Immunofluorescent images showing pPKR. (A-D) Immunofluorescent images showed PKR activation by detecting pPKR in Tg26 mouse kidney. pPKR was inhibited by C16 treatments.
  
  The suggested PKR knockout mouse experiment is an excellent idea for future work, but we believe it is outside the scope of the current manuscript.
  
  To enhance the evidentiary base for the PT-Mito cell type, it would be interesting to know whether these cells can also be found in human datasets like KPMP, though this might require reprocessing the original snRNAseq data. Further in situ hybridization in both mouse and human samples using fluorescent rather than colorimetric approaches should yield a more compelling dataset to provide evidence for this cell type. These approaches would also allow for more precise quantification of the PT-Mito cells compared to the population of proximal tubule cells. Again, the default assumption here should be that the mitochondrial transcripts represent a contamination, and the purpose of these additional experiments is to definitively rule out that explanation.
  
  Authors: First, as suggested, we carried out additional analyses. We examined a publicly available human kidney snRNA-seq dataset (GSE131882) and found in it the same PT-Mito cluster, as shown in Supplemental Figure 6. The PT-Mito cluster was located in close proximity to the PT cluster in a UMAP plot. We added this finding to the Results as follows (page 12):
  
  “We also confirmed the existence of similar PT-Mito cluster in published human kidney single-nuclear RNA-seq data47 by the re-analysis of the original data. (Supplemental Figure 6A-C).”
  
  Author response image 2.
  
  PT-Mito cluster detection in publicly available human kidney single-nuclear RNA-seq data (GSE131882). (A) UMAP plot of human kidney single-nuclear RNA-seq data shows 16 clusters. Clusters 1 and 4 are proximal tubule (PT) clusters, and cluster 7 is the PT-Mito cluster. (B) Dot plot shows expression of PT marker genes and PT-Mito marker genes obtained from the current manuscript data. PT-Mito markers including MT-CO1 and MT-CO2 had high expression in cluster 7. (C) UMAP plot shows that all six samples contribute to all cell clusters.
  
  Second, as suggested, we also included negative control data from in situ hybridization studies (Supplementary Figure 5A, 5B), which shows that the signals in Figure 4B, 4C are true signals.
  
  Author response image 3.
  
  Additional in situ hybridization images. (A) In situ hybridization images probing dapB (negative control probe) showed no signals. (B) In situ hybridization images probing Ppib (positive control probe) showed strong signals.
  
  Reviewer #2 (Recommendations For The Authors)
  
  (1) The supplementary data file seems to have been uploaded twice but the supplementary methods were not available which would have been helpful when assessing some methods such as using PodoCount to count podocytes.
  
  We acknowledge that we inadvertently failed to upload the Supplementary Methods section; thank you for pointing this out. The supplementary methods are now provided in the revised submission, including detailed methods for PodoCount. The corresponding description is as follows:
  
  “Estimation of glomerular podocyte count
  
  PodoCount5, a computational tool for whole slide podocyte estimation from digitized histologic sections, was used to detect, enumerate, and characterize podocyte nuclear profiles in the glomeruli of immunohistochemically labeled (IHC-labeled) murine kidney sections. Formalin-fixed, paraffin-embedded tissues (2 µm thickness) were IHC-labeled for p57kip2, a marker of podocyte terminal differentiation (ab75974, Abcam, Cambridge, UK), and detected with horseradish peroxidase (RU-HRP1000, Diagnostic BioSystems, Pleasanton, CA) and diaminobenzidine chromogen substrate (BSB0018A, Bio SB, Santa Barbara, CA). A periodic acid-Schiff post-stain was applied without hematoxylin counterstain. The tool uses a combination of stain deconvolution, digital image processing, and feature engineering to compute histologic podometrics6 with correction for section thickness7. In this study, PodoCount was used to assess mean glomerular podocyte count per mouse.”
  
  (2) In the abstract, the authors give the impression that they know definitively the sequence of HIV gene expression, cytoskeletal dysregulation, dedifferentiation, then loss from glomeruli. Since they could only examine cells that were present in glomeruli, they can't definitively say much about the cells that were lost from glomeruli.
  
  As suggested, we deleted the following text: “and were lost from glomeruli tuft”.
  
  (3) The authors state that 56,976 cells were used for snRNAseq studies. Was the number of cells similar for each of the 8 mice (from 4 different groups)?
  
  In response, we have created a new table summarizing the number of nuclei from each sample (i.e., each mouse) and added it as Supplemental Figure 2D, as follows:
  
  Author response table 1.
  
  Pre-processing of single-nuclear RNA-seq data. The breakdown of nuclei numbers from each sample showed comparable numbers of nuclei analyzed.
  
  (4) Please provide information on the assay that was used to measure creatinine since some methods can be unreliable in mice
  
  This is now provided in the revised submission, including creatinine measurement methods (LC-MS/MS) on page 3 of Supplementary Material:
  
  “Mouse chemistry measurements
  
  Plasma creatinine was measured by isotope dilution LC-MS/MS at The University of Alabama at Birmingham O’Brien Center Core C (Birmingham, AL).”
  
  (5) The authors state that PKR (Eif2ak2) was expressed in all nephron segments. However, it appears on visual inspection of the UMAP in Fig S2B that the percentage of cells expressing Eif2ak2 was low. What percent of cells expressed Eif2ak2 and, if it was a low percentage, what is the authors' hypothesis for how expression in a small percentage of cells led to the kidney phenotype?
  
  Supplemental Figure 2B (now 3B) does show modest expression of Eif2ak2, in approximately 10% of cells. The technique may lack the sensitivity to detect low gene expression, and even low gene expression may be sufficient to cause phenotypic change.
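
  The percentage discussed in this exchange can be read as the share of nuclei in which at least one transcript of the gene was detected. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `percent_expressing`, the `min_count` threshold, and the example counts are all hypothetical and chosen only for illustration.

```python
def percent_expressing(counts, min_count=1):
    """Return the percentage of nuclei with at least `min_count` detected transcripts.

    `counts` is a list of per-nucleus transcript counts for a single gene.
    The threshold of 1 is an assumption; other analyses may use a different cutoff.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    expressing = sum(1 for c in counts if c >= min_count)
    return 100.0 * expressing / len(counts)


# Made-up counts for ten nuclei (not data from the study): two have >= 1 count.
example_counts = [0, 0, 1, 0, 0, 0, 0, 3, 0, 0]
print(percent_expressing(example_counts))  # 20.0
```

  Because the result depends on the detection threshold and on sequencing depth, a low percentage from snRNA-seq is best treated as a lower bound on the share of cells that express the gene.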
  
  (6a) In figure 4B and C, it is not clear what genotype/treatment group is shown.
  
  The legend for Figure 4B, 4C has been modified to state that the group shown is wild-type mice:
  
  (B, C) In situ hybridization of mt-Co1 and mt-Atp6 genes showed signals inside nuclei of WT mice
  
  (6b) Also, if these ISH images are from Tg26 mice, it would be helpful to do ISH in mice with/without C16 treatment.
  
  These images of ISH for these two genes are from wild-type mice, as now stated in the revised legend. Our purpose was to show that these mitochondrial-encoded gene transcripts (mt-Co1 and mt-Atp6) are transported to nuclei from the cytoplasm. We believe it is not necessary to do ISH in Tg26 mice because these genes are not disease-specific.
  
  (6c) Also, only 3-6% of cells express these "PT-mito" markers by snRNAseq, but it appears that far more are expressed by ISH, raising concerns for nonspecific binding of the ISH probe.
  
  (6d) Also, nonsense controls should be included to demonstrate the specificity of the ISH data.
  
  First (comment 6c), the PT-mito cluster does not have specific markers, to our knowledge. Second (comment 6d), to address the concern about non-specific binding of the ISH probes, we have now added additional ISH images, together with a negative control probe (C. elegans gene dapB) and a positive control probe (mouse Ppib), as shown in Supplementary Figure 5A and 5B, respectively.
  
  Author response image 4.
  
  Additional in situ hybridization images. (A) In situ hybridization images probing dapB (negative control probe) showed no signals. (B) In situ hybridization images probing Ppib (positive control probe) showed strong signals.
  
  (7) The authors state that "mitochondrial dysfunction was most pronounced in the PT-Mito cluster" but in Figure 4D, the oxidative phosphorylation activation Z score was most down in the PT-inj (injured PT cells) and the PT-Mito cells were the 4-most downregulated cell type.
  
  We appreciate the careful reading and agree with the reviewer’s comment. In the revision, we have deleted “most” from this description.
  
  (8) In Fig 4F, please state what "Cp expression" means.
  
  We have spelled out ceruloplasmin (Cp).
  
  (9) It is not clear in immunohistochemistry images in Fig 5F where the p-stat3 was detected due to the hematoxylin counterstain which may have obscured subtle nuclear staining. Also, some of the strongest staining appears to be in peritubular capillaries, instead of tubular and glomerular epithelial cells.
  
  We have added arrows to help readers see where p-Stat3 was detected: as faintly brown, distinct cytoplasmic granules in injured tubular cells in Tg26 mice (panel F), as opposed to diffuse tubular cytoplasmic color in wild-type mice (panel E).
  
  Author response image 5.
  
  (10) For the studies of mitochondrial oxygen consumption (Fig 6), it would be helpful to also provide data on the effect of C16 in wild-type kidneys, in case C16 somehow causes a primary increase in mitochondrial oxygen consumption rather than preventing HIV-induced loss in kidney cells from HIV-transgenic mice.
  
  We did not include Seahorse data regarding oxygen consumption from WT mice treated with C16, as C16 did not affect either renal function or transcriptomes in WT mice, in contrast to the Tg26 mice (Figure 1A-G).
  
  (11) The authors emphasize that podocytes had the highest expression of HIV genes (Fig 7). However, it appears that <2% of podocytes expressed HIV genes. How do the authors explain the severe renal phenotype given the relatively small number of cells expressing the HIV transgene? Also, did the same cells express all/most of the HIV transcripts, or did some cells express some HIV transcripts? For instance, since the authors state that vpr and nef have the most important role in kidney injury, were the same cells that expressed nef also expressing Vpr?
  
  We know that snRNA-seq cannot detect the whole transcriptome in each cell, due to the well-known drop-out effect characteristic of the method. Several factors may contribute to this drop-out effect, including stochastic patterns of gene expression, low RNA amounts and inefficient mRNA capture (Qiu, Nature Comm, 2020; Ran, Bioinformatics, 2020).
  
  Our interpretation is that HIV gene-expressing podocytes had higher expression of HIV genes, but this does not mean that other kidney cells entirely lack HIV gene expression. With regard to co-expression of HIV transcripts, nef and vpr were more often co-expressed, as shown in Figure 7J. Vpr was expressed in nef-positive podocytes and not detected in nef-negative podocytes.
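
  The drop-out argument above can be made concrete with a simple illustrative model. Assume, purely for illustration, that each transcript copy in a nucleus is captured independently with the same probability; the gene is then detected if at least one copy is captured. The function name `detection_probability` and the capture rate of 0.05 are hypothetical, and this sketch is not the authors' analysis.

```python
def detection_probability(true_copies, capture_rate):
    """Probability that at least one of `true_copies` transcripts is captured.

    Assumes each copy is captured independently with probability
    `capture_rate` (a simplifying assumption, not a measured value).
    """
    if true_copies < 0:
        raise ValueError("true_copies must be non-negative")
    if not 0.0 <= capture_rate <= 1.0:
        raise ValueError("capture_rate must lie between 0 and 1")
    return 1.0 - (1.0 - capture_rate) ** true_copies


# With an arbitrary 5% capture rate, lowly expressed genes are usually missed,
# so the fraction of nuclei scored as "expressing" understates true expression.
for copies in (1, 5, 20):
    print(copies, round(detection_probability(copies, 0.05), 3))
```

  Under this assumption, a gene present at a few copies per nucleus would be detected in only a minority of the nuclei that actually express it, which is consistent with the authors' caution about reading a low detection percentage as absence of expression.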
  
  (12) In figure 8, the authors emphasize the dysregulation of genes involved in cell-cell interaction, particularly PDGF-D. They show some data for the effect of C16 in this system in Fig 8 but it would be helpful if they can state the effect in the text of the Results section.
  
  We have added text to the Results describing activating interactions in Tg26 mice that were reduced by C16 exposure, as follows (page 18):
  
  “For example, platelet derived growth factor D (PDGF-D) was upregulated in PT-Inj in Tg26 mice and was downregulated by C16 treatment (Figure 8D). Further, PDGF-D may interact with PDGFR-B in fibroblasts.”
  
URL

biorxiv.org/content/10.1101/2022.10.03.510678v4

Interplay of YEATS2 and GCDH regulates histone crotonylation and drives EMT in head and neck cancer

1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We sincerely appreciate the editors for overseeing an efficient review process and for upholding the high standards of the journal. We have made extensive revisions to the manuscript after carefully reviewing the reviewers’ comments. We have addressed all the comments in our response and have incorporated the changes suggested by the reviewers to the best of our abilities. Notably, we have made the following major changes to the manuscript:
  
  (1) We have increased the patient cohort size from 10 to 23 for evaluating the levels of YEATS2 and H3K27cr.
  
  (2) To further strengthen the clinical relevance of our study, we have checked the expression of major genes involved in the YEATS2-mediated histone crotonylation axis (YEATS2, GCDH, ECHS1, Twist1 along with H3K27cr levels) in head and neck cancer tissues using immunohistochemistry.
  
  (3) We have performed extensive experiments to look into the role of p300 in assisting YEATS2 in regulating promoter histone crotonylation.
  
  The changes made to the manuscript figures have been highlighted in our response. We have also updated the Results section in accordance with the updated figures. Tables 1-4 and Supplementary files 1-3 have been moved to one single Excel workbook named ‘Supplementary Tables 1-8’. Additional revisions have been made to improve the overall quality of the manuscript and enhance data visualization. These additional changes are highlighted in the tracked changes version of the manuscript.
  
  Our response to the Public Reviews and ‘Recommendations to the Authors’ can be found below.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript investigates a mechanism between the histone reader protein YEATS2 and the metabolic enzyme GCDH, particularly in regulating epithelial-to-mesenchymal transition (EMT) in head and neck cancer (HNC).
  
  Strengths:
  
  Great detailing of the mechanistic aspect of the above axis is the primary strength of the manuscript.
  
  Weaknesses:
  
  Several critical points require clarification, including the rationale behind EMT marker selection, the inclusion of metastasis data, the role of key metabolic enzymes like ECHS1, and the molecular mechanisms governing p300 and YEATS2 interactions.
  
  We would like to sincerely thank the reviewer for the detailed, in-depth, and positive response. We have implemented constructive revisions to the manuscript to address the reviewer’s concerns effectively.
  
  Major Comments:
  
  (1) The title, "Interplay of YEATS2 and GCDH mediates histone crotonylation and drives EMT in head and neck cancer," appears somewhat misleading, as it implies that YEATS2 directly drives histone crotonylation. However, YEATS2 functions as a reader of histone crotonylation rather than a writer or mediator of this modification. It cannot itself mediate the addition of crotonyl groups onto histones. Instead, the enzyme GCDH is the one responsible for generating crotonyl-CoA, which enables histone crotonylation. Therefore, while YEATS2 plays a role in recognizing crotonylation marks and may regulate gene expression through this mechanism, it does not directly catalyse or promote the crotonylation process.
  
  We thank the reviewer for their insightful comment regarding the precision of our title. We agree that the initial wording 'mediates' could imply a direct enzymatic role for YEATS2 in histone crotonylation, which is indeed not the case. As the reviewer correctly points out, YEATS2 functions as a 'reader' of histone crotonylation marks.
  
  However, our research demonstrates that YEATS2 plays a crucial indirect regulatory role in the establishment of these crotonylation marks. Specifically, our data indicates that YEATS2 facilitates the recruitment of the histone crotonyltransferase p300 to specific gene promoters, such as that of SPARC. This recruitment mechanism directly impacts the localized deposition of crotonyl marks on nearby histone residues. Therefore, while YEATS2 does not directly catalyze the addition of crotonyl groups, its presence and interaction with p300 are essential for the regulation and establishment of histone crotonylation at these critical sites.
  
  To accurately reflect this nuanced, yet significant, regulatory mechanism, we have revised the title. We are replacing 'mediates' with 'regulates' to precisely convey that YEATS2 influences the histone crotonylation process, albeit indirectly, through its role in recruiting the enzymatic machinery. The updated title will now read: 'Interplay of YEATS2 and GCDH regulates histone crotonylation and drives EMT in head and neck cancer.' We believe this change maintains the core message of our findings while enhancing the scientific accuracy of the title.
  
  (2) The study suggests a link between YEATS2 and metastasis due to its role in EMT, but the lack of clinical or pre-clinical evidence of metastasis is concerning. Only primary tumor (PT) data is shown, but if the hypothesis is that YEATS2 promotes metastasis via EMT, then evidence from metastatic samples or in vivo models should be included to solidify this claim.
  
  We thank the reviewer for their valuable suggestion regarding the need for clinical or pre-clinical evidence of metastasis. We fully agree that direct evidence linking YEATS2 to metastasis would significantly strengthen our claims, especially given its demonstrated role in EMT.
  
  Our primary objective in this study was to meticulously dissect the molecular mechanisms by which YEATS2 regulates histone crotonylation and drives EMT in head and neck cancer. We have provided comprehensive upstream and downstream molecular insights into this process, culminating in a clear demonstration of YEATS2's functional importance in promoting EMT through multiple in vitro phenotypic assays (e.g., Matrigel invasion, wound healing, 3D invasion assays). As the reviewer notes, EMT is a widely recognized prerequisite for cancer metastasis[1]. Therefore, establishing YEATS2 as a driver of EMT directly implicates its potential role in metastatic progression.
  
  To further address the reviewer's concern and bridge the gap between EMT and metastasis, we have performed additional analyses that will be incorporated into the revised manuscript:
  
  Clinical Correlation with Tumor Grade: We analyzed publicly available head and neck cancer patient datasets. Our analysis revealed a significant positive correlation between YEATS2 expression and increasing tumor grade. Specifically, we observed significantly higher YEATS2 expression in Grade 2-4 tumors compared to Grade 1 tumors. Given that higher tumor grades are frequently associated with increased metastatic potential and poorer prognosis in HNC[2], this finding provides compelling clinical correlative evidence linking elevated YEATS2 expression to more aggressive disease.
  
  Gene Set Enrichment Analysis (GSEA) for Metastasis Pathways: To further explore the biological processes associated with YEATS2 in a clinical context, we performed GSEA on TCGA HNC patient samples stratified by high versus low YEATS2 expression. This analysis robustly demonstrated a positive enrichment of metastasis-related gene sets in the high YEATS2 expression group, compared to the low YEATS2 group. This strengthens the mechanistic link by showing that pathways associated with metastasis are co-ordinately upregulated when YEATS2 is highly expressed.
  
  These new clinical data provide strong correlative evidence supporting a direct association of YEATS2 with metastasis, building upon our detailed mechanistic dissection of its role in EMT.
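  
  The high-versus-low stratification that precedes a GSEA run of this kind can be sketched as follows. This is a minimal illustration only: the response does not state which cutoff was used, so the median split, the function name `stratify_by_expression`, and the toy expression values are all assumptions rather than the authors' actual pipeline.

```python
from statistics import median


def stratify_by_expression(expression, gene="YEATS2"):
    """Split sample IDs into high/low groups around the median of `gene`.

    `expression` maps sample ID -> {gene symbol: expression value}.
    The median cutoff is an assumption; the source does not specify one.
    """
    values = {sample: genes[gene] for sample, genes in expression.items()}
    cutoff = median(values.values())
    # Samples strictly above the cutoff are "high"; the rest are "low".
    high = sorted(s for s, v in values.items() if v > cutoff)
    low = sorted(s for s, v in values.items() if v <= cutoff)
    return high, low, cutoff


# Hypothetical toy values for illustration, not patient data.
toy = {
    "S1": {"YEATS2": 2.0},
    "S2": {"YEATS2": 8.0},
    "S3": {"YEATS2": 5.0},
    "S4": {"YEATS2": 11.0},
}
high, low, cutoff = stratify_by_expression(toy)
```

  The two sample lists would then serve as the phenotype labels supplied to an enrichment tool; other cutoffs (for example, top versus bottom quartile) are equally plausible choices.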
  
  (3) There seems to be some discrepancy in the invasion data with BICR10 control cells (Figure 2C). BICR10 control cells with mock plasmids, specifically shControl and pEGFP-C3 show an unclear distinction between invasion capacities. Normally, we would expect the control cells to invade somewhat similarly, in terms of area covered, within the same time interval (24 hours here). But we clearly see more control cells invading when the invasion is done with KD and fewer control cells invading when the invasion is done with OE. Are these just plasmid-specific significant effects on normal cell invasion? This needs to be addressed.
  
  We thank the reviewer for their careful examination of Figure 2C and their insightful observation regarding the appearance of the control cells in relation to the knockdown (Figure 2B) and overexpression (Figure 2C) experiments. We understand how, at first glance, the control invasion levels across these panels might seem disparate.
  
  We wish to clarify that Figure 2B (YEATS2 knockdown) and Figure 2C (YEATS2 overexpression) represent two entirely independent experiments, conducted with distinct experimental conditions and methodologies, as detailed in our Methods section.
  
  Specifically:
  
  Figure 2B (Knockdown): Utilizes lentivirus-mediated transduction for stable shRNA delivery (shControl as control).
  
  Figure 2C (Overexpression): Utilizes transfection with plasmid DNA (pEGFP-C3 as control) via a standard transfection reagent.
  
  These fundamental differences in genetic manipulation methods (transduction vs. transfection), along with potential batch-to-batch variations in reagents or cell passage number at the time of each independent experiment, can indeed lead to variations in absolute basal invasion rates of control cells[3].
  
  Therefore, the invasion capacity of BICR10 control cells in Figure 2B (shControl) should only be compared to the YEATS2 knockdown conditions within that same panel. Similarly, the invasion capacity of control cells in Figure 2C (pEGFP-C3) should only be compared to the YEATS2 overexpression conditions within that specific panel. The crucial finding in each panel lies in the relative change in invasion caused by YEATS2 manipulation (knockdown or overexpression) compared to its respective, concurrently run control.
  
  We have ensured that all statistical analyses (as indicated in the figure legends and methods) were performed by comparing the experimental groups directly to their matched internal controls within each independent experiment. The significant increase in invasion upon YEATS2 overexpression and the significant decrease upon YEATS2 knockdown, relative to their respective controls, are robust and reproducible findings.
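  
  The point about comparing each condition only to its own concurrently run control can be illustrated with a short sketch. The invasion values and the helper name `fold_change_vs_control` below are hypothetical and chosen only to show the arithmetic; they are not measurements from Figure 2.

```python
def fold_change_vs_control(experiment, control):
    """Express every condition relative to the control of the same experiment.

    `experiment` maps condition name -> raw invasion readout (e.g. area covered).
    """
    baseline = experiment[control]
    if baseline == 0:
        raise ValueError("control readout must be non-zero")
    return {condition: value / baseline for condition, value in experiment.items()}


# Hypothetical raw readouts from two independent experiments.
# The absolute control values differ, as they may between
# a lentiviral transduction and a plasmid transfection.
knockdown = {"shControl": 200.0, "shYEATS2": 80.0}
overexpression = {"pEGFP-C3": 100.0, "pEGFP-C3-YEATS2": 250.0}

kd_relative = fold_change_vs_control(knockdown, "shControl")
oe_relative = fold_change_vs_control(overexpression, "pEGFP-C3")
```

  After normalization both controls equal 1.0 by construction, so the differing absolute baselines drop out and only the within-experiment effect of YEATS2 manipulation remains.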
  
  (4) In Figure 3G, the Western blot shows an unclear band for YEATS2 in shSP1 cells with YEATS2 overexpression condition. The authors need to clearly identify which band corresponds to YEATS2 in this case.
  
  We thank the reviewer for pointing out the ambiguity in the YEATS2 Western blot for the shSP1 + pEGFP-C3-YEATS2 condition in Figure 3G. We apologize for this lack of clarity. The two bands seen in the shSP1+pEGFP-C3-YEATS2 condition correspond to the endogenous YEATS2 band (lower band) and YEATS2-GFP band (upper band, corresponding to overexpressed YEATS2-GFP fusion protein, which has a higher molecular weight). To avoid confusion, the endogenous band is now highlighted (marked by *) in the lane representing the shSP1+pEGFP-C3-YEATS2 condition. We have also updated the figure legend accordingly.
  
  (5) In ChIP assays with SP1, YEATS2 and p300 which promoter regions were selected for the respective genes? Please provide data for all the different promoter regions that must have been analysed, highlighting the region where enrichment/depletion was observed. Including data from negative control regions would improve the validity of the results.
  
  Throughout our study, we performed ChIP-qPCR assays to check the binding of SP1 on the YEATS2 and GCDH promoters, and to check YEATS2 and p300 binding on the SPARC promoter. Using transcription factor binding prediction tools and luciferase assays, we selected multiple sites on the YEATS2 and GCDH promoters to check for SP1 binding. The results corresponding to the site that showed significant enrichment were provided in the manuscript. The region of the SPARC promoter used in the YEATS2 and p300 ChIP assays was selected on the basis of the YEATS2 enrichment found in the YEATS2 ChIP-seq data. The ChIP-qPCR data for all the promoter regions investigated (including negative controls) can be found below (Authors’ response image 1).
  
  Authors’ response image 1.
  
  (A) SP1 ChIP-qPCR results indicating SP1 occupancy on different regions of YEATS2 promoter. YEATS2 promoter region showing SP1 binding sites (indicated by red boxes) is shown above. SP1 showed significant enrichment at F1R1 region. The results corresponding to F1R1 region were included in Figure 3D. (B) SP1 ChIP-qPCR results indicating SP1 occupancy on different regions of GCDH promoter. GCDH promoter region showing SP1 binding sites (indicated by red boxes) is shown above. SP1 showed significant enrichment at F2R2 region. The results corresponding to F2R2 region were included in Figure 7E. (C) YEATS2 ChIP-qPCR results in shControl vs. shYEATS2 BICR10 cells indicating YEATS2 occupancy on different regions of SPARC promoter. SPARC promoter region showing YEATS2 ChIP-seq and H3K27cr ChIP-seq signals is shown above. YEATS2 showed significant enrichment at F1R1 region. The results corresponding to F1R1 region were included in Figure 5C. (D) p300 ChIP-qPCR results in shControl vs. shYEATS2 BICR10 cells indicating p300 occupancy on different regions of SPARC promoter. p300 showed significant enrichment at F1R1 region. The results corresponding to F1R1 region were included in Figure 5F.
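  
  For readers unfamiliar with how ChIP-qPCR enrichment at regions such as F1R1 is typically quantified, one common approach is the percent-input method, sketched below. The response does not state which quantification method was used, so the method itself, the function names, the 1% input fraction, and the Ct values in the example are assumptions for illustration only.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent-input value for one ChIP-qPCR amplicon.

    The input Ct is first adjusted to represent 100% of the chromatin
    (subtracting log2 of the dilution factor), then compared with the IP Ct.
    `input_fraction=0.01` assumes a 1% input sample was saved.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(ct_ip, ct_igg, ct_input, input_fraction=0.01):
    """Enrichment of the specific antibody relative to an IgG control."""
    specific = percent_input(ct_ip, ct_input, input_fraction)
    background = percent_input(ct_igg, ct_input, input_fraction)
    return specific / background


# Hypothetical Ct values: the specific IP amplifies three cycles before IgG.
example_fold = fold_over_igg(ct_ip=26.0, ct_igg=29.0, ct_input=25.0)
```

  Because each qPCR cycle corresponds to a doubling, a three-cycle difference between the specific IP and the IgG control corresponds to an eightfold enrichment, independent of the input fraction chosen.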
  
  (6) The authors establish a link between H3K27Cr marks and GCDH expression, and this is an already well-known pathway. A critical missing piece is the level of ECHS1 in patient samples. This will clearly delineate if the balance shifted towards crotonylation.
  
  We greatly appreciate the reviewer's insightful comment regarding the importance of assessing ECHS1 levels in patient samples to clearly delineate the metabolic balance shifting towards crotonylation. We fully agree that this is a critical piece of evidence.
  
  To directly address this point and substantiate our claim regarding the altered metabolic balance in HNC, we had previously analyzed the expression of both GCDH and ECHS1 in TCGA HNC RNA-seq data (as presented in Figure 4—figure supplement 1A and B). This analysis revealed a consistent increase in GCDH expression and a concomitant decrease in ECHS1 expression in tumor samples compared to normal tissues. Based on these findings, we hypothesized that this altered expression profile would indeed lead to an accumulation of crotonyl-CoA and, consequently, an overall increase in histone crotonylation in HNC.
  
  To further validate and extend these findings at the protein level, we have now performed immunohistochemistry (IHC) analysis for both ECHS1 and GCDH in a cohort of HNC normal vs. tumor tissues. Our IHC results strikingly corroborate the RNA-seq data: GCDH consistently showed increased protein expression in tumor samples, whereas ECHS1 exhibited significantly reduced protein expression in tumors compared to their adjacent normal counterpart tissues (Figure 4E and Authors’ response figure 5).
  
  These new data, combined with the existing TCGA HNC RNA-seq analysis, strongly support our proposed mechanism, in which altered GCDH and ECHS1 expression contributes to increased histone crotonylation in head and neck cancer.
  
  (7) The p300 ChIP data on the SPARC promoter is confusing. The authors report reduced p300 occupancy in YEATS2-silenced cells, on SPARC promoter. However, this is paradoxical, as p300 is a writer, a histone acetyltransferase (HAT). The absence of a reader (YEATS2) shouldn't affect the writer (p300) unless a complex relationship between p300 and YEATS2 is present. The role of p300 should be further clarified in this case. Additionally, transcriptional regulation of SPARC expression in YEATS2 silenced cells could be analysed via downstream events, like Pol-II recruitment. Assays such as Pol-II ChIP-qPCR could help explain this.
  
  We greatly appreciate the reviewer's insightful observation regarding the apparently paradoxical reduction of p300 occupancy on the SPARC promoter upon YEATS2 silencing (Figure 5F), and their call for further clarification of p300's role and the potential complex relationship with YEATS2. We agree that this point required further mechanistic investigation.
  
  As we have shown through RNA-seq and ChIP-seq analyses, YEATS2 broadly influences histone crotonylation levels at gene promoters, thereby impacting gene expression. While p300 is indeed a known histone acetyltransferase (HAT) with promiscuous acyltransferase activity, including crotonyltransferase activity[4], the precise mechanism by which its occupancy is affected by a 'reader' protein like YEATS2 was unclear. Our initial data suggested a dependency of p300 recruitment on YEATS2.
  
  To directly address the reviewer's concern and thoroughly delineate the molecular mechanism of cooperativity between YEATS2 and p300 in regulating histone crotonylation, we have now performed a series of targeted experiments, which have been incorporated into the revised manuscript:
  
  (a) Validation of p300's role in SPARC expression: We performed p300 knockdown in BICR10 cells, followed by immunoblotting to assess SPARC protein levels. As expected, a significant decrease in SPARC protein levels was observed upon p300 knockdown (Figure 5G). This confirms p300's direct involvement in SPARC gene expression.
  
  (b) Direct interaction between YEATS2 and p300: To investigate a potential physical association, we performed co-immunoprecipitation assays to check for an interaction between endogenous YEATS2 and p300. Our results clearly demonstrate the presence of YEATS2 in the p300-immunoprecipitate sample, indicating that YEATS2 and p300 physically interact and likely function together as a complex to drive the expression of target genes like SPARC (Figure 5H). This direct interaction provides the mechanistic basis for how YEATS2 influences p300 occupancy.
  
  (c) Impact on transcriptional activity (Pol II recruitment): As suggested, we performed RNA Polymerase II (Pol II) ChIP-qPCR on the SPARC promoter in YEATS2 knockdown cells. We observed a significant decrease in Pol II occupancy on the SPARC promoter after YEATS2 knockdown in BICR10 cells (Figure 6C). This confirms that YEATS2 silencing leads to reduced transcriptional initiation/elongation at this promoter.
  
  (d) p300's direct role in H3K27cr on SPARC promoter: To confirm p300's specific role in crotonylation at this locus, we performed H3K27cr ChIP-qPCR after p300 knockdown. As anticipated, a significant decrease in H3K27cr enrichment was observed on the SPARC promoter upon p300 knockdown (Figure 6J), directly demonstrating p300's crotonyltransferase activity at this site.
  
  (e) Rescue of p300 occupancy and H3K27cr by YEATS2 overexpression in SP1-deficient cells: To further establish the YEATS2-p300 axis, we performed SP1 knockdown (which reduces YEATS2 expression) followed by ectopic YEATS2 overexpression, and then assessed p300 occupancy and H3K27cr levels on the SPARC promoter. While SP1 knockdown led to a decrease in both p300 and H3K27cr enrichment, we observed a significant rescue of both p300 occupancy and H3K27cr enrichment upon YEATS2 overexpression in the shSP1 cells (Figure 6K and L). This provides strong evidence that YEATS2 acts downstream of SP1 to regulate p300 recruitment and H3K27cr levels.
  
  Collectively, these comprehensive new results clearly establish that YEATS2 directly interacts with and assists in the recruitment of p300 to the SPARC promoter. This recruitment is crucial for p300's localized crotonyltransferase activity, leading to increased H3K27cr marks and subsequent activation of SPARC transcription. This clarifies the previously observed 'paradox' and defines a novel cooperative mechanism between a histone reader (YEATS2) and a writer (p300) in regulating histone crotonylation and gene expression.
  
  (8) The role of GCDH in producing crotonyl-CoA is already well-established in the literature. The authors' hypothesis that GCDH is essential for crotonyl-CoA production has been proven, and it's unclear why this is presented as a novel finding. It has been shown that YEATS2 KD leads to reduced H3K27cr, however, it remains unclear how the reader is affecting crotonylation levels. Are GCDH levels also reduced in the YEATS2 KD condition? Are YEATS2 levels regulating GCDH expression? One possible mechanism is YEATS2 occupancy on GCDH promoter and therefore reduced GCDH levels upon YEATS2 KD. This aspect is crucial to the study's proposed mechanism but is not addressed thoroughly.
  
  We appreciate the reviewer's valuable comment questioning the novelty of GCDH's role in crotonyl-CoA production and seeking further clarification on how YEATS2 influences crotonylation levels beyond its reader function.
  
  We agree that GCDH's general role in producing crotonyl-CoA is well-established[5,6]. Our study, however, aims to delineate a novel epigenetic-metabolic crosstalk in head and neck cancer, specifically investigating how the interplay between the histone crotonylation reader YEATS2 and the metabolic enzyme GCDH contributes to increased histone crotonylation and drives EMT in this context.
  
  Our initial investigations using GSEA on publicly available TCGA RNA-seq data revealed that HNC patients with high YEATS2 expression also exhibit elevated expression of genes involved in the lysine degradation pathway, prominently including GCDH. Recognizing the known roles of YEATS2 in preferentially binding H3K27cr[7] and GCDH in producing crotonyl-CoA, we hypothesized that the elevated H3K27cr levels observed in HNC are a consequence of the combined action of both YEATS2 and GCDH. We have provided evidence that increased nuclear GCDH correlates with higher H3K27cr abundance, likely due to an increased nuclear pool of crotonyl-CoA, and that YEATS2 contributes through its preferential maintenance of crotonylation marks by recruiting p300 (as detailed in Figure 5F-H and Figure 6J-L of the manuscript and elaborated in our response to point 7). Thus, our work highlights that both YEATS2 and GCDH are crucial for the regulation of histone crotonylation-mediated gene expression in HNC.
  
  To directly address the reviewer's query regarding YEATS2's influence on GCDH levels and nuclear histone crotonylation:
  
  • YEATS2 does not transcriptionally regulate GCDH: We did not find any evidence of YEATS2 directly regulating the expression levels of GCDH at the transcriptional level in HNC cells.
  
  • Novel finding: YEATS2 regulates GCDH nuclear localization: Crucially, we discovered that YEATS2 downregulation significantly reduces the nuclear pool of GCDH in head and neck cancer cells (Figure 7G). This is a novel mechanism suggesting that YEATS2 influences histone crotonylation not only by affecting promoter H3K27cr levels via p300 recruitment, but also by regulating the availability of the crotonyl-CoA producing enzyme, GCDH, within the nucleus.
  
  • Common upstream regulation by SP1: Interestingly, we found that both YEATS2 and GCDH expression are commonly regulated by the transcription factor SP1 in HNC. Our data demonstrate that SP1 binds to the promoters of both genes, and its downregulation leads to a decrease in their respective expressions (Figure 3 and Figure 7). This provides an important upstream regulatory link between these two key players.
  
  • Functional validation of GCDH in EMT: We further assessed the functional importance of GCDH in maintaining the EMT phenotype in HNC cells. Matrigel invasion assays after GCDH knockdown and overexpression in BICR10 cells revealed that the invasiveness of HNC cells was significantly reduced upon GCDH knockdown and significantly increased upon GCDH overexpression (results provided in revised manuscript Figure 7F and Figure 7—figure supplement 1F).
  
  These findings collectively demonstrate a multifaceted role for YEATS2 in regulating histone crotonylation by both direct recruitment of the writer p300 and by influencing the nuclear availability of the crotonyl-CoA producing enzyme GCDH. We acknowledge that the precise molecular mechanism governing YEATS2's effect on GCDH nuclear localization remains an exciting open question for future investigation, but our current data establishes a novel regulatory axis.
  
  (9) The authors should provide IHC analysis of YEATS2, SPARC alongside H3K27cr and GCDH staining in normal vs. tumor tissues from HNC patients.
  
  We thank the reviewer for their suggestion. We have performed IHC analysis for YEATS2, H3K27cr and GCDH in normal and tumor samples obtained from HNC patients.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript emphasises the increased invasive potential of histone reader YEATS2 in an SP1-dependent manner. They report that YEATS2 maintains high H3K27cr levels at the promoter of EMT-promoting gene SPARC. These findings assigned a novel functional implication of histone acylation, crotonylation.
  
  We thank the reviewer for the constructive comments. We are committed to making beneficial changes to the manuscript in order to alleviate the reviewer’s concerns.
  
  Concerns:
  
  (1) The patient cohort is very small with just 10 patients. To establish a significant result the cohort size should be increased.
  
  We thank the reviewer for this suggestion. We have increased the number of patient samples to assess the levels of YEATS2 (n=23 samples) and the results have been included in Figure 1G and Figure 1—figure supplement 1F.
  
  (2) Figure 4D compares H3K27Cr levels in tumor and normal tissue samples. Figure 1G shows overexpression of YEATS2 in a tumor as compared to normal samples. The loading control is missing in both. Loading control is essential to eliminate any disparity in protein concentration that is loaded.
  
  To address the reviewer’s concern, we have repeated the experiment and used H3 as a loading control as nuclear protein lysates from patient samples were used to check YEATS2 and H3K27cr levels.
  
  (3) Figure 4D only mentions 5 patient samples checked for the increased levels of crotonylation and hence forms the basis of their hypothesis (increased crotonylation in a tumor as compared to normal). The sample size should be more and patient details should be mentioned.
  
  As part of the revision, we have now checked the H3K27cr levels in a total of 23 patient samples and the results have been included in Figure 4D and Figure 4—figure supplement 1D. Patient details are provided in Supplementary Table 6.
  
  (4) YEATS2 maintains H3K27Cr levels at the SPARC promoter. The p300 is reported to be hyper-activated (hyperautoacetylated) in oral cancer. Probably, the activated p300 causes hyper-crotonylation, and other protein factors cause the functional translation of this modification. The authors need to clarify this with a suitable experiment.
  
  We thank the reviewer for this insightful comment regarding the functional relationship between YEATS2 and p300 in the context of H3K27cr, especially considering reports of p300 hyper-activation in oral cancer. We agree that a precise clarification of p300's role and its cooperativity with YEATS2 is crucial to fully understand the functional translation of this modification.
  
  As we have shown through global RNA-seq and ChIP-seq analyses, YEATS2 broadly affects gene expression by regulating histone crotonylation levels at gene promoters. We also recognize that the histone writer p300 is a promiscuous acyltransferase, known to add various non-acetyl marks, including crotonylation[4]. Our initial data, showing decreased p300 occupancy on the SPARC promoter upon YEATS2 downregulation (Figure 5F), suggested a strong dependency of p300 on YEATS2 for its recruitment. To fully delineate the molecular mechanism of this cooperativity and clarify how YEATS2 influences p300-mediated histone crotonylation and its functional outcomes, we have performed the following series of experiments, which have been integrated into the revised manuscript:
  
  (a) Validation of p300's role in SPARC expression: We performed p300 knockdown in BICR10 cells, followed by immunoblotting to assess SPARC protein levels. As expected, a significant decrease in SPARC protein levels was observed upon p300 knockdown (Figure 5G). This confirms p300's direct involvement in SPARC gene expression.
  
  (b) Direct interaction between YEATS2 and p300: To investigate a potential physical association, we performed co-immunoprecipitation assays to check for an interaction between endogenous YEATS2 and p300. Our results clearly demonstrate the presence of YEATS2 in the p300-immunoprecipitate sample, indicating that YEATS2 and p300 physically interact and likely function together as a complex to drive the expression of target genes like SPARC (Figure 5H). This direct interaction provides the mechanistic basis for how YEATS2 influences p300 occupancy.
  
  (c) Impact on transcriptional activity (Pol II recruitment): As suggested, we performed RNA Polymerase II (Pol II) ChIP-qPCR on the SPARC promoter in YEATS2 knockdown cells. We observed a significant decrease in Pol II occupancy on the SPARC promoter after YEATS2 knockdown in BICR10 cells (Figure 6C). This confirms that YEATS2 silencing leads to reduced transcriptional initiation/elongation at this promoter.
  
  (d) p300's direct role in H3K27cr on SPARC promoter: To confirm p300's specific role in crotonylation at this locus, we performed H3K27cr ChIP-qPCR after p300 knockdown. As anticipated, a significant decrease in H3K27cr enrichment was observed on the SPARC promoter upon p300 knockdown (Figure 6J), directly demonstrating p300's crotonyltransferase activity at this site.
  
  (e) Rescue of p300 occupancy and H3K27cr by YEATS2 overexpression in SP1-deficient cells: To further establish the YEATS2-p300 axis, we performed SP1 knockdown (which reduces YEATS2 expression) followed by ectopic YEATS2 overexpression, and then assessed p300 occupancy and H3K27cr levels on the SPARC promoter. While SP1 knockdown led to a decrease in both p300 and H3K27cr enrichment, we observed a significant rescue of both p300 occupancy and H3K27cr enrichment upon YEATS2 overexpression in the shSP1 cells (Figure 6K and L). This provides strong evidence that YEATS2 acts downstream of SP1 to regulate p300 recruitment and H3K27cr levels.
  
  Collectively, these comprehensive new results clearly establish that YEATS2 directly interacts with and assists in the recruitment of p300 to the SPARC promoter. This recruitment is crucial for p300's localized crotonyltransferase activity, leading to increased H3K27cr marks and subsequent activation of SPARC transcription. This clarifies the previously observed 'paradox' and defines a novel cooperative mechanism between a histone reader (YEATS2) and a writer (p300) in regulating histone crotonylation and gene expression.
  
  (5) I do not entirely agree with using GAPDH as a control in the western blot experiment since GAPDH has been reported to be overexpressed in oral cancer.
  
  We would like to clarify that GAPDH was not used as a loading control for protein expression comparisons between normal and tumor samples. GAPDH was used as a loading control only in experiments using head and neck cancer cell lines where shRNA-mediated knockdown or overexpression was employed. These manipulations specifically target the genes of interest and are not expected to alter GAPDH expression, making it a suitable loading control in these instances.
  
  (6) The expression of EMT markers has been checked in shControl and shYEATS2 transfected cell lines (Figure 2A). However, their expression should first be checked directly in the patients' normal vs. tumor samples.
  
  We thank the reviewer for the suggestion. We have now checked the expression of EMT marker Twist1 alongside YEATS2 expression in normal vs. tumor tissue samples using IHC (Figure 4E).
  
  (7) In Figure 3G, knockdown of SP1 led to the reduced expression of YEATS2 controlled gene Twist1. Ectopic expression of YEATS2 was able to rescue Twist1 partially. In order to establish that SP1 directly regulates YEATS2, SP1 should also be re-introduced upon the knockdown background along with YEATS2 for complete rescue of Twist1 expression.
  
  To address the reviewer’s concern regarding the partial rescue of Twist1 in SP1 depleted-YEATS2 overexpressed cells, we performed the experiment as suggested by the reviewer. We overexpressed both SP1 and YEATS2 in SP1-depleted cells and found that Twist1 depletion was almost completely rescued.
  
  Authors’ response image 2.
  
  Immunoblot depicting the decreased Twist1 levels on SP1 knockdown and its subsequent rescue of expression upon YEATS2 and SP1 overexpression in BICR10 (endogenous YEATS2 band indicated by *).
  
  (8) In Figure 7G, the expression of EMT genes should also be checked upon rescue of SPARC expression.
  
  We thank the reviewer for the suggestion. We have examined the expression of the EMT marker Twist1 on YEATS2/GCDH rescue. On overexpressing both YEATS2 and GCDH in shSP1 cells, we found that the depleted expression of Twist1 was rescued.
  
  Authors’ response image 3.
  
  Immunoblot depicting the decreased Twist1 levels on SP1 knockdown and its subsequent rescue of expression upon dual overexpression of YEATS2 and GCDH in BICR10 (* indicates GFP-tagged YEATS2 probed using GFP antibody).
  
  Reviewer #1 (Recommendations for the authors):
  
  While the study offers insights into the specific role of this axis in regulating epithelial-to-mesenchymal transition (EMT) in HNC, its broader mechanistic novelty is limited by prior discoveries in other cancer types (https://doi.org/10.1038/s41586-023-06061-0). The manuscript would benefit from the inclusion of metastasis data, the role of key metabolic enzymes like ECHS1, the molecular mechanisms governing p300 and YEATS2 interactions, additional IHC data, negative control data in ChIP, and an explanation of discrepancies in certain figures.
  
  We thank the reviewer for their constructive suggestions. We have made extensive revisions to our manuscript to substantiate our findings. We have looked into the expression of ECHS1/GCDH in HNC tumor tissues using IHC, performed extensive experiments to validate the role of p300 in YEATS2-mediated histone crotonylation, and provided additional data supporting our findings wherever required. The revised figures have been provided in the updated version of the manuscript and also in the Authors’ response.
  
  Minor Comments:
  
  (1) The study begins with a few EMT markers, such as Vimentin, Twist, and N-Cadherin to validate the role of YEATS2 in promoting EMT. Including a broader panel of EMT markers would strengthen the conclusions about the effects of YEATS2 on EMT and invasion. Additionally, the rationale for selecting these EMT markers is not fully elaborated. Why were other well-known EMT players not included in the analysis?
  
  On performing RNA-seq with shControl and shYEATS2 samples, we discovered that TWIST1 showed decreased expression on YEATS2 downregulation. Twist1 was therefore investigated as a potential target of YEATS2 in HNC cells. N-Cadherin was chosen because it is known to be upregulated directly by Twist1[8]. Further, Vimentin was chosen as it is a well-known marker of the mesenchymal phenotype and is frequently used to indicate EMT in cancer cells[9].
  
  Authors’ response image 4.
  
  IGV plot showing the decrease in Twist1 expression in shControl vs. shYEATS2 RNA-seq data.
  
  Other than the EMT markers used in our study, the following markers were amongst those that showed a significant change in gene expression on YEATS2 downregulation.
  
  Authors’ response table 1.
  
  List of EMT-related genes that showed significant change in expression on YEATS2 knockdown in RNA-seq analysis.
  
  As depicted in the table above, the majority of the genes that showed downregulation on YEATS2 knockdown were mesenchymal markers, while epithelial-specific genes such as E-cadherin and Claudin-1 showed upregulation. These data underline the essential role of YEATS2 in driving EMT in head and neck cancer.
  
  (2) The authors use Ponceau staining, but the rationale behind this choice is unclear. Ponceau is typically used for transfer validation. For the same patient, western blot loading controls like Actin/GAPDH should be shown. Also, at various places throughout the manuscript, Ponceau staining has been used. These should also be replaced with Actin/GAPDH blots.
  
  Ponceau S staining is frequently used as an alternative to housekeeping proteins like GAPDH as a control for protein loading[10]. However, to address this issue, we have repeated the western blot and used H3 as a loading control, as nuclear protein lysates from patient samples were used to check YEATS2 and H3K27cr levels.
  
  For experiments (Figures 5E, 6F, 6I, and 7H) where we assessed SPARC levels in conditioned media obtained from BICR10 cells (secretory fraction), Ponceau S staining was deliberately used as the loading control. In such extracellular protein analyses, traditional intracellular housekeeping proteins (like Actin or GAPDH) are not applicable. Ponceau S has been used as a control for showing SPARC expression in the secretory fraction of mammalian cell lines in previous studies as well[11].
  
  (3) The manuscript briefly mentions that p300 was identified as the only protein with increased expression in tumours compared to normal tissue in the TCGA dataset. What other writers were checked for? Did the authors check for their levels in HNC patients?
  
  We thank the reviewer for this observation. As stated by previous studies [12,13], p300 and GCN5 are the histone writers that can act as crotonyltransferases at the H3K27 position. Although the crotonyltransferase activity of GCN5 has been demonstrated in yeast, it has not been confirmed in humans, whereas the histone crotonyltransferase activity of p300 has been validated in human cells using in vitro HCT assays[4,14]. Therefore, we chose to focus on p300 for further validation of its role in YEATS2-mediated regulation of histone crotonylation. We did not check the levels of p300 in HNC patient tissues. However, p300 showed higher expression in tumor as compared to normal in publicly available HNC TCGA RNA-seq data (Figure 5—figure supplement 1G).
  
  We acknowledge that the original statement in the manuscript, 'For this we looked at expression of the known writers of H3K27Cr mark in TCGA dataset, and discovered that p300 was the only protein that had increased expression in tumor vs. normal HNC dataset…', was indeed slightly misleading. Our intention was to convey that p300 is considered the major and most validated histone crotonyltransferase capable of influencing crotonylation at the H3K27 position in humans, and that its expression was notably increased in the HNC TCGA tumor dataset. We have now reframed this sentence in the revised manuscript to accurately reflect our findings and focus, as follows:
  
  'For this, we checked the expression of p300, a known writer of H3K27cr mark in humans, in the TCGA dataset. We found that p300 had increased expression in tumor vs. normal HNC dataset…'
  
  This revised wording more accurately reflects our specific focus on p300's established role and its observed upregulation in HNC.
  
  (4) Figure 6E, blot should be replaced. The results aren't clearly visible.
  
  We thank the reviewer for this observation. We have repeated the western blot and the Figure 6E (Figure 6F in the revised version of manuscript) has now been replaced with a cleaner blot.
  
  (5) Reference 9 and 19 are the same. Please rectify.
  
  We apologize for this inadvertent error. We have rectified this error in the updated version of the manuscript.
  
  References
  
  (1) Brabletz, T.; Kalluri, R.; Nieto, M. A.; Weinberg, R. A. EMT in Cancer. Nat Rev Cancer 2018, 18(2), 128–134. https://doi.org/10.1038/nrc.2017.118.
  
  (2) Pisani, P.; Airoldi, M.; Allais, A.; Aluffi Valletti, P.; Battista, M.; Benazzo, M.; Briatore, R.; Cacciola, S.; Cocuzza, S.; Colombo, A.; Conti, B.; Costanzo, A.; Della Vecchia, L.; Denaro, N.; Fantozzi, C.; Galizia, D.; Garzaro, M.; Genta, I.; Iasi, G. A.; Krengli, M.; Landolfo, V.; Lanza, G. V.; Magnano, M.; Mancuso, M.; Maroldi, R.; Masini, L.; Merlano, M. C.; Piemonte, M.; Pisani, S.; Prina-Mello, A.; Prioglio, L.; Rugiu, M. G.; Scasso, F.; Serra, A.; Valente, G.; Zannetti, M.; Zigliani, A. Metastatic Disease in Head & Neck Oncology. Acta Otorhinolaryngol Ital 2020, 40 (SUPPL. 1), S1–S86. https://doi.org/10.14639/0392-100X-suppl.1-40-2020.
  
  (3) Lin, J.; Zhang, P.; Liu, W.; Liu, G.; Zhang, J.; Yan, M.; Duan, Y.; Yang, N. A Positive Feedback Loop between ZEB2 and ACSL4 Regulates Lipid Metabolism to Promote Breast Cancer Metastasis. Elife 2023, 12, RP87510. https://doi.org/10.7554/eLife.87510.
  
  (4) Liu, X.; Wei, W.; Liu, Y.; Yang, X.; Wu, J.; Zhang, Y.; Zhang, Q.; Shi, T.; Du, J. X.; Zhao, Y.; Lei, M.; Zhou, J.-Q.; Li, J.; Wong, J. MOF as an Evolutionarily Conserved Histone Crotonyltransferase and Transcriptional Activation by Histone Acetyltransferase-Deficient and Crotonyltransferase-Competent CBP/P300. Cell Discov 2017, 3 (1), 17016. https://doi.org/10.1038/celldisc.2017.16.
  
  (5) Jiang, G.; Li, C.; Lu, M.; Lu, K.; Li, H. Protein Lysine Crotonylation: Past, Present, Perspective. Cell Death Dis 2021, 12 (7), 703. https://doi.org/10.1038/s41419-021-03987-z.
  
  (6) Yuan, H.; Wu, X.; Wu, Q.; Chatoff, A.; Megill, E.; Gao, J.; Huang, T.; Duan, T.; Yang, K.; Jin, C.; Yuan, F.; Wang, S.; Zhao, L.; Zinn, P. O.; Abdullah, K. G.; Zhao, Y.; Snyder, N. W.; Rich, J. N. Lysine Catabolism Reprograms Tumour Immunity through Histone Crotonylation. Nature 2023, 617 (7962), 818–826. https://doi.org/10.1038/s41586-023-06061-0.
  
  (7) Zhao, D.; Guan, H.; Zhao, S.; Mi, W.; Wen, H.; Li, Y.; Zhao, Y.; Allis, C. D.; Shi, X.; Li, H. YEATS2 Is a Selective Histone Crotonylation Reader. Cell Res 2016, 26 (5), 629–632. https://doi.org/10.1038/cr.2016.49.
  
  (8) Alexander, N. R.; Tran, N. L.; Rekapally, H.; Summers, C. E.; Glackin, C.; Heimark, R. L. N-Cadherin Gene Expression in Prostate Carcinoma Is Modulated by Integrin-Dependent Nuclear Translocation of Twist1. Cancer Res 2006, 66 (7), 3365–3369. https://doi.org/10.1158/0008-5472.CAN-05-3401.
  
  (9) Satelli, A.; Li, S. Vimentin in Cancer and Its Potential as a Molecular Target for Cancer Therapy. Cellular and Molecular Life Sciences 2011, 68 (18), 3033–3046. https://doi.org/10.1007/s00018-011-0735-1.
  
  (10) Romero-Calvo, I.; Ocón, B.; Martínez-Moya, P.; Suárez, M. D.; Zarzuelo, A.; Martínez-Augustin, O.; de Medina, F. S. Reversible Ponceau Staining as a Loading Control Alternative to Actin in Western Blots. Anal Biochem 2010, 401 (2), 318–320. https://doi.org/10.1016/j.ab.2010.02.036.
  
  (11) Ling, H.; Li, Y.; Peng, C.; Yang, S.; Seto, E. HDAC10 Inhibition Represses Melanoma Cell Growth and BRAF Inhibitor Resistance via Upregulating SPARC Expression. NAR Cancer 2024, 6 (2), zcae018. https://doi.org/10.1093/narcan/zcae018.
  
  (12) Gao, D.; Li, C.; Liu, S.-Y.; Xu, T.-T.; Lin, X.-T.; Tan, Y.-P.; Gao, F.-M.; Yi, L.-T.; Zhang, J. V.; Ma, J.-Y.; Meng, T.-G.; Yeung, W. S. B.; Liu, K.; Ou, X.-H.; Su, R.-B.; Sun, Q.-Y. P300 Regulates Histone Crotonylation and Preimplantation Embryo Development. Nat Commun 2024, 15 (1), 6418. https://doi.org/10.1038/s41467-024-50731-0.
  
  (13) Li, K.; Wang, Z. Histone Crotonylation-Centric Gene Regulation. Epigenetics Chromatin 2021, 14 (1), 10. https://doi.org/10.1186/s13072-021-00385-9.
  
  (14) Sabari, B. R.; Tang, Z.; Huang, H.; Yong-Gonzalez, V.; Molina, H.; Kong, H. E.; Dai, L.; Shimada, M.; Cross, J. R.; Zhao, Y.; Roeder, R. G.; Allis, C. D. Intracellular Crotonyl-CoA Stimulates Transcription through P300-Catalyzed Histone Crotonylation. Mol Cell 2015, 58 (2), 203–215. https://doi.org/10.1016/j.molcel.2015.02.029.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.24.614679v3
www.medrxiv.org www.medrxiv.org

Statistical learning shapes pain perception and prediction independently of external cues

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.
  
  Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.
  
  Response to the reviewers
  
  Reviewer 1:
  
  Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.
  
  Initial reply: Thank you for this point. Our models were selected a priori, following the modelling strategy from Jepma et al. (2018), and hence we considered the same set of core models to allow a clear extension of the analysis to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation-weighted vs non-expectation-weighted models.
  
  Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.
  
  Revised reply: We clarified our modelling choices in the "Modelling strategy" subsection of the Results section.
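
For readers less familiar with the two model families named in this exchange, the following is a minimal sketch of a model-free delta-rule update and a scalar Kalman filter update. It is an illustration under stated assumptions, not the eRL or eKF models fitted in the manuscript: the function names, the parameter names (`alpha`, `process_var`, `obs_var`), the reading of volatility as process variance and stochasticity as observation variance, and the example values are all hypothetical.

```python
def rl_update(expectation, observation, alpha):
    """Delta rule: move the expectation toward the observation
    by a fixed learning rate alpha (0 <= alpha <= 1)."""
    return expectation + alpha * (observation - expectation)


def kf_update(mean, var, observation, process_var, obs_var):
    """One step of a scalar Kalman filter for a random-walk latent state.

    Assumption for illustration: process_var stands in for volatility
    (drift of the latent state) and obs_var for stochasticity
    (observation noise).
    """
    prior_var = var + process_var             # uncertainty grows between trials
    gain = prior_var / (prior_var + obs_var)  # trial-specific learning rate
    new_mean = mean + gain * (observation - mean)
    new_var = (1.0 - gain) * prior_var
    return new_mean, new_var, gain


if __name__ == "__main__":
    # Hypothetical intensities on a 0-100 scale.
    observations = [40.0, 55.0, 60.0, 45.0]
    rl_expectation = 50.0
    kf_mean, kf_var = 50.0, 100.0
    for obs in observations:
        rl_expectation = rl_update(rl_expectation, obs, alpha=0.3)
        kf_mean, kf_var, gain = kf_update(
            kf_mean, kf_var, obs, process_var=4.0, obs_var=25.0
        )
        print(f"obs={obs:5.1f} RL={rl_expectation:6.2f} "
              f"KF={kf_mean:6.2f} gain={gain:.2f}")
```

The practical difference the sketch exposes is that the delta rule uses one fixed learning rate, while the Kalman gain changes from trial to trial with the tracked uncertainty.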
  
  Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?
  
  Initial reply: We are sorry that the reviewer finds "the depth of analysis concerning the effects of these variables on model fit" shallow. We are not sure which analysis the reviewer has in mind when suggesting a "more granular analysis of volatility and stochasticity effects" to "reveal fine-grained model fit patterns". Therefore, we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:
  
  •    Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)
  
  •    Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)
  
  •    Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).
  
  •    Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)
  
  Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, "How much pain DID you just feel?" and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their *subjective* feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, "the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli."
  
  Initial reply: Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query "participants about perceived stimulus intensity levels", as described at lines 49-55, 296-303, and depicted in figure 1.
  
  The confidence refers to the level of confidence that participants have regarding their rating - how sure they are. This is done in addition to their perceived stimulus intensity and it has been used in a large body of previous studies in any sensory modality.
  
  Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.
  
  Initial reply: The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially born as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains top priority once back to work.
  
  Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1-2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.
  
  Initial reply: Thank you for these suggestions. We will consider restructuring the paper in the revised version.
  
  Revised reply: We restructured the introduction, results and parts of the methods. We followed the reviewer’s suggestion regarding enhancing clarity through graphical diagrams. We have visualised the experimental design in Figure 1D. Furthermore, we have visualised the two main computational models (eRL and eKF) in Figure 2, following Jepma et al. (2018). As a result, we have updated the notation in Section 4.4 to be clearer and consistent with the graphical representation (renaming the variable referring to observed thermal input from O_t to N_t).
  
  Reviewer Comment 1.6 — In lines 99-100, the statement "following the work by [23]" would be more helpful if it included a concise summary of the main concepts from the referenced work.
  
  - It would be helpful to have descriptions of the conditions that Figure 1C is elaborating on.
  
  - In line 364, the "N_t" in the sentence "The observation on trial t, N_t" should be O_t.
  
  Initial reply: Thank you for spotting these and for providing the suggestions. We will include the correction in the revised version.
  
  Revised reply: We have added the following regarding the lines 99-100:
  
  "We build on the work by [23], who show that pain perception is strongly influenced by expectations as defined by a cue that predicts high or low pain. In contrast to the cue-paradigm from [23], the primary aim of our experiment was to determine whether the expectations participants hold about the sequence itself inform their perceptual beliefs about the intensity of the stimuli."
  
  See comment in the previous reply, regarding the notation change from O_t to N_t.
  
  Reviewer 2:
  
  Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.
  
  Initial reply: Thank you very much for these positive comments.
  
  Reviewer 3:
  
  Summary:
  
  I am pleased to have had the opportunity to review this manuscript, which investigated the role of statistical learning in the modulation of pain perception. In short, the study showed that statistical aspects of temperature sequences, with respect to specific manipulations of stochasticity (i.e., randomness of a sequence) and volatility (i.e., speed at which a sequence unfolded) influenced pain perception. Computational modelling of perceptual variables (i.e., multi-dimensional ratings of perceived or predicted stimuli) indicated that models of perception weighted by expectations were the best explanation for the data. My comments below are not intended to undermine or question the quality of this research. Rather, they are offered with the intention of enhancing what is already a significant contribution to the pain neuroscience field. Below, I highlight the strengths and weaknesses of the manuscript and offer suggestions for incorporating additional methodological details.
  
  Strengths:
  
  - The manuscript is articulate, coherent, and skilfully written, making it accessible and engaging.
  
  - The innovative stimulation paradigm enables the exploration of expectancy effects on perception without depending on external cues, lending a unique angle to the research.
  
  - By including participants’ ratings of both perceptual aspects and their confidence in what they perceived or predicted, the study provides an additional layer of information to the understanding of perceptual decision-making. This information was thoughtfully incorporated into the modelling, enabling the investigation of how confidence influences learning.
  
  - The computational modelling techniques utilised here are methodologically robust. I commend the authors for their attention to model and parameter recovery, a facet often neglected in previous computational neuroscience studies.
  
  - The well-chosen citations not only reflect a clear grasp of the current research landscape but also contribute thoughtfully to ongoing discussions within the field of pain neuroscience.
  
  Initial reply: We are really grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We are also thankful to the reviewer for highlighting the strengths of our manuscript. Below we respond to each weakness mentioned in the reviewer’s report.
  
  Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term "arbitrary units" used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.
  
  Initial reply: Indeed, as detailed in the Methods section 4.3, the stimulation intensity was originally transformed from the 1-13 scale to 0-100 scale to match the scales in the participant response screens.
  
  Following the method used to establish the pain threshold, we set the stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several of the participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having indicated, during the pain threshold procedure, the moment at which a stimulus becomes painful. As a result, for these participants the re-scaled intensity values, as well as the perception ratings, both on the same 0-100 scale of arbitrary units, never go above the pain threshold. Please see all participant ratings and inputs in the Figure below. We see that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale on an additional right-hand-side y-axis. We will add this in the revised version, as well as highlight the fact above.
  
  Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
  
  Revised reply: We re-plotted Figure 1E-F with a different exemplary participant, whose ratings go above the pain threshold. We also included all participant pain perception and prediction ratings, noxious input sequences and confidence ratings in the supplement in Figures S1-S3.
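
The rescaling discussed in this reply can be sketched as a linear map. This is an assumption for illustration: the reply states only that intensities were transformed from the 1-13 scale to the 0-100 scale and that level 7 corresponds to the threshold at 50, which a linear map reproduces; the exact transformation is the one given in the manuscript's Methods section 4.3. The function name `rescale_intensity` is hypothetical.

```python
def rescale_intensity(level, old_min=1.0, old_max=13.0,
                      new_min=0.0, new_max=100.0):
    """Linearly map a stimulus level from [old_min, old_max]
    onto [new_min, new_max]."""
    if not old_min <= level <= old_max:
        raise ValueError(f"level {level} is outside [{old_min}, {old_max}]")
    fraction = (level - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


if __name__ == "__main__":
    # Endpoints and the threshold level of the original 1-13 scale.
    for level in (1, 7, 13):
        print(level, rescale_intensity(level))
```

Under this map the midpoint of the original scale (level 7) lands exactly on 50, so ratings and inputs share one axis and can be compared against the same threshold line.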
  
  Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.
  
  Initial reply: We agree with the reviewer that there are multiple sources of uncertainty involved in the process of rating the intensity of thermal stimuli, including perceptual uncertainty. In order to include an account of inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, it turns out that perception is corrupted as a result of the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.
  
  Author response image 1.
  
  Stimulus intensity transformation
  
  Revised reply: We clarified our modelling choices in the "2.2 Modelling strategy" subsection.
  
  Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.
  
  Initial reply: While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation weighting arguably capture the data better.
  
  Revised reply: We elaborated on the significance statements in the "Modelling Results" subsection:
  
  • We considered at least a 2 sigma effect as indication of a significant difference. In each condition, the expectation weighted models (eKF and eRL) provided better fit than models without this element (KF and RL; approx. 2-4 sigma difference, as reported in Figure 5A-D). This suggests that regardless of the levels of volatility and stochasticity, participants still weigh perception of the stimuli with their expectation.
  
  and in the first paragraph of the Discussion:
  
  • When varying different levels of inherent uncertainty in the sequences of stimuli (stochasticity and volatility), the expectation and confidence weighted models fitted the data better than models weighted for confidence but not for expectations (Figure 5A-D). The expectation-weighted Bayesian (KF) model offered a better fit than the expectation-weighted, model-free RL model, although in conditions of high stochasticity this difference was short of significance. Overall, this suggests that participants’ expectations play a significant role in the perception of sequences of noxious stimuli.
  
  We are aware of the limitations and lack of clear guidance regarding using sigma effects to establish significance (as per the reviewer’s suggestion: https://discourse.mc-stan.org/t/loo-comparison-in-reference-to-standard-error/4009). Here we decided to use the above-mentioned threshold of 2 sigma as an indication of significance, but note the potential limitations of the inferences, especially when distinguishing between the eRL and eKF models.
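
The "sigma effect" referred to here is the ratio of the difference in expected log predictive density (ELPD) between two models to the standard error of that difference. A minimal sketch of the computation is given below, assuming pointwise log predictive densities are already available for both models on the same observations. The standard error follows the commonly used paired form (square root of n times the sample variance of the pointwise differences); the function names and the example numbers are hypothetical and are not taken from the manuscript.

```python
import math


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) for model A minus model B.

    Both inputs hold per-observation log predictive densities,
    evaluated on the same observations in the same order.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("models must be scored on the same observations")
    n = len(pointwise_a)
    if n < 2:
        raise ValueError("need at least two observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    total = sum(diffs)
    mean = total / n
    sample_var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return total, math.sqrt(n * sample_var)


def sigma_effect(pointwise_a, pointwise_b):
    """ELPD difference expressed in units of its standard error."""
    diff, se = elpd_difference(pointwise_a, pointwise_b)
    if se == 0.0:
        raise ValueError("standard error is zero; ratio is undefined")
    return diff / se


if __name__ == "__main__":
    # Hypothetical pointwise values for two models.
    model_a = [-1.0, -2.0, -1.5, -1.0]
    model_b = [-2.0, -2.5, -2.5, -1.5]
    print(elpd_difference(model_a, model_b))
    print(sigma_effect(model_a, model_b))
```

A ratio of about 2 or more would meet the threshold adopted in this reply; as the reply itself notes, there is no agreed-upon cutoff, so the ratio is better read as a graded indication than as a test.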
  
  Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?
  
  Initial reply: We agree that the distinction between eKF and eRL in the model recovery is not that clean-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.
  
  Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.
  
  Initial reply: It could be a possibility, but it would require further investigation and comparison of fits for different bins/ranges of parameters to see if there is any consistent advantage of one model over another in each bin. We will consider adding this analysis, and provide an additional 50 simulations to paint a more convincing picture.
  
  Revised reply: We increased the number of simulations per model pair to ≈ 100 (after rejecting fits based on diagnostics criteria - E-BFMI and divergent transitions) and updated the confusion matrix (Table S4). Although the confusion between eRL and eKF remains, the model recovery shows good distinction between expectation weighted vs non-expectation weighted (and Random) models, which supports our main conclusion in the paper.
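
A model recovery confusion matrix of the kind mentioned in this reply can be tabulated as follows. This is a sketch under assumptions: each simulation is summarised as a pair (model that generated the data, model that fitted best), the function name `recovery_confusion` is hypothetical, and the example pairs are invented for illustration and do not reproduce Table S4.

```python
from collections import Counter


def recovery_confusion(pairs, models):
    """Row-normalised confusion matrix for model recovery.

    pairs  : iterable of (simulated_model, best_fitting_model)
    models : ordered list of model names
    Returns a nested dict: matrix[simulated][fitted] = proportion.
    """
    counts = Counter(pairs)
    matrix = {}
    for sim in models:
        row_total = sum(counts[(sim, fit)] for fit in models)
        matrix[sim] = {
            fit: (counts[(sim, fit)] / row_total if row_total else 0.0)
            for fit in models
        }
    return matrix


if __name__ == "__main__":
    # Invented outcomes: (generating model, winning model).
    outcomes = [
        ("eKF", "eKF"), ("eKF", "eRL"),
        ("eRL", "eRL"), ("eRL", "eRL"),
        ("RL", "RL"), ("RL", "RL"),
    ]
    names = ["eKF", "eRL", "RL"]
    for sim, row in recovery_confusion(outcomes, names).items():
        print(sim, row)
```

Reading along a row shows how often data generated by one model are attributed to each candidate; mass off the diagonal between eKF and eRL corresponds to the confusion the reply acknowledges.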
  
  Reviewer Comment 3.6 — Regarding model comparison, the authors reported that "the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model." This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider "several SEs," with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.
  
  Initial reply: Indeed, we considered a 2-sigma effect as a threshold; however, we recognise that there is no agreed-upon threshold, and in the revision we shall make this, and our interpretation of the trend in the data, clearer.
  
  Revised reply: We clarify this further, as per our revised response to Comment 3.3 above. We have also added the following statement in section 4.5.1 (Methods, Model comparison): "There’s no agreed-upon threshold of SEs that determines significance, but the higher the sigma difference, the more robust is the effect."
  
  Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information.
  
  Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.
  
  Initial reply: Thanks for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.
  
  Revised reply: We included parameter recovery scatter plots for each model and parameter in the Supplement Figures S7-S11.
  
  Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.
  
  Initial reply: Thank you very much for this suggestion, we will aim to include these measures in the revised version.
  
  Revised reply: We have considered the suggested diagnostics and include bulk and tail ESS values for each condition, model, and parameter in Supplement Tables S6-S9. We also report the number of chains with low E-BFMI (0), the number of divergent transitions (0), and the E-BFMI values per chain in Table S10.
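
As a rough illustration of the E-BFMI diagnostic mentioned above, the sketch below computes one common estimator from a chain's sampled energies: the sum of squared successive energy changes divided by the sum of squared deviations from the mean energy. The function names are hypothetical, and the 0.3 cut-off is an assumption; the threshold is a convention that differs between tools and is not stated in the response.

```python
def ebfmi(energy):
    """Estimate E-BFMI for one chain from its per-iteration energies."""
    n = len(energy)
    if n < 2:
        raise ValueError("need at least two energy values")
    mean = sum(energy) / n
    # Numerator: squared changes in energy between successive iterations.
    numerator = sum((energy[i] - energy[i - 1]) ** 2 for i in range(1, n))
    # Denominator: squared deviations of the energy from its chain mean.
    denominator = sum((e - mean) ** 2 for e in energy)
    if denominator == 0:
        raise ValueError("energy is constant; E-BFMI is undefined")
    return numerator / denominator


def count_low_ebfmi_chains(chains, threshold=0.3):
    """Count chains whose E-BFMI is below `threshold` (assumed cut-off)."""
    return sum(1 for energy in chains if ebfmi(energy) < threshold)
```

A chain whose energy drifts slowly gives a small value, because successive changes are small relative to the overall spread of the energy.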
  
  Reviewer Comment 3.9 — The authors write: ”Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying ”endogenous pain regulation.”
  
  Initial reply: This is a valid point; indeed, we do not compare paradigms in our study, and we will remove this statement in the future version.
  
  Revised reply: We have removed this statement from the revised version.
  
  Reviewer Comment 3.10 — In relation to the comment on model comparison in my public review, I believe the following link may provide further insight and clarify the basis for my observation. It discusses the use of standard error in model comparison and may be useful for the authors in addressing this particular point: https://discourse.mc-stan.org/t/loo-comparison-in-referenceto-standard-error/4009
  
  Initial reply: Thank you for this suggestion, we will consider the forum discussion in our manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2023.03.23.23287656v3
www.biorxiv.org www.biorxiv.org

A whole-organism landscape of X-inactivation in humans

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer 1 (Public review):
  
  (1) The authors state that they have reclassified the allelic expression status of 32 genes (shown in Table S5, Supplementary Figure 3). The concern is the source of the tissue or cell line which was originally used to make the classification of XCI status, and whether the comparisons are equivalent. For example, if cell lines (and not tissues) were used to define the XCI status for EGFL6, TSPAN6, and CXorf38, then how can the authors be sure that the escape status in whole tissues would be the same? Also, along these lines, the authors should consider whether escape status in previous studies using immortalized/cancer cell lines (such as the meta-analyses done in Balaton publication) would be different compared to healthy tissues (seems like it should be). Therefore, making comparisons between healthy whole tissues and cancer cell lines doesn't make sense.
  
  Indeed, many previous classifications were based on clonal cell lines, which could result in atypical patterns of escape due to the profound and varied effects of adaptation to culture. However, one of the primary goals of our study was to directly determine allele-specific expression from the X-chromosome in healthy primary tissues, in part to exclude the potential confounding effects of cell culture.
  
  Whereas we do perform comparisons with cell culture-based classifications, we also provide detailed comparisons with the previous classification of Tukiainen et al, which also uses primary human tissues. In addition, whereas the comparison with Balaton et al is not optimal, we hold that it is valuable as it reveals which genes may exhibit aberrant escape patterns in culture. Finally, despite the above reservations, our comparison revealed overwhelming agreement with previous research, which suggests that, in the vast majority of cases, escape appears to be correctly maintained in culture.
  
  (2) The authors note that skewed XCI is prevalent in the human population, and cite some publications (references 8, 10-12). If RNAseq data is available from these female individuals with skewed XCI (such as ref 12), the authors should consider using their allelic expression pipeline to identify XCI status of more X-linked genes.
  
  Indeed, we completely agree and are in the process of obtaining these data, which has proven complex and time-consuming in the current regulatory environment.
  
  (3) It has been well established that the human inactive X has more XCI escape genes compared to the mouse inactive X. In light of the author's observations across human tissues, how does the XCI status compare with the same tissues in mice?
  
  This is a very interesting point, and a comparison we are currently working on. However, this is a major undertaking and one that is outside the scope of this study. We do appreciate the differences between mice and humans at the X-chromosome level and can only speculate that the overlap is relatively small, as the number of escapees in mice has been shown to be far lower than in humans.
  
  Reviewer 2 (Public review):
  
  In my view there are only minor weaknesses in this work, that tend to come about due to the requirement to study individuals with highly skewed X inactivation. I wonder whether the cause of the highly skewed X inactivation may somehow influence the likelihood of observing tissue-specific escape from X inactivation. In this light, it would be interesting to further understand the genetic cause for the highly skewed X inactivation in each of these three cases in the whole exome sequencing data. Future additional studies may validate these findings using single-cell approaches in unrelated individuals across tissues, where there is normal X inactivation.
  
  We thank the reviewer for their positive assessment of our work. This is a point we have and continue to grapple with. We cannot rule out that the genetic cause of complete skewing may influence tissue-specific XCI. Moreover, the genetic cause for the non-mosaic XCI is currently unclear and is likely to vary between individuals, which could also result in inter-individual variation in tissue-specific escape. We are currently performing large prospective studies in the tissues of healthy females to specifically address this point.
  
  Reviewer 3 (Public review):
  
  There are very few, except that this escape catalogue is limited to 3 donors, based on a single (representative) tissue screen in 285 female donors, mostly using muscle samples. However, if only pituitary samples had been screened, nmXCI-1 would have been missed. Additional donors in the 285 representative samples cross a lower threshold of AE = 0.4. It would be worthwhile to query all tissues of the 285 donors to discover more nmXCI cases, as currently fewer than half of X-linked genes received a call using this very worthwhile approach.
  
  We thank the reviewer for their positive assessment of our work. Of course, we agree that a tissue-wide screen in all individuals would have been optimal and is a line of research we are currently pursuing. However, the analysis of allele-specific expression in all 5,000 RNA-seq samples is a massive undertaking and was simply not practicable within the time-scale of this study.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  Thanks to the authors for an interesting manuscript! I enjoyed reading it and the care that has gone into explaining the analyses and the findings. There are a few recommendations that I have for strengthening the work.
  
  We thank the reviewer for the nice feedback. Much appreciated.
  
  (1) I would like to see a genetic analysis of the three individuals, to try and identify the genetic causes of the skewed X inactivation beyond just considering the XIC or translocations. The cause of the highly skewed X inactivation would be of interest to many.
  
  This is certainly a very interesting avenue of research and one that we are currently focusing on. However, in the current study we simply had too few skewed XCI females to assess this in an exhaustive manner. To tackle this issue, we have begun a prospective study of healthy females to identify additional non-mosaic females.
  
  (2) I wonder whether the cause of the skewed XCI may somehow influence the assessment of tissue-specific escape? If there is a problem with X inactivation itself, perhaps escape would also be different, making it appear more constitutive than tissue-specific?
  
  This is a point we have and continue to grapple with. We cannot rule out that the genetic cause of complete skewing may influence tissue-specific XCI. Moreover, the genetic cause for the non-mosaic XCI is currently unclear and is likely to vary between individuals, which could result in inter-individual variation in tissue-specific escape.
  
  (3) Presentation/wording suggestions:
  
  I think the abstract is likely a bit inaccessible to those outside the field. I am in the X inactivation field, but don't use the term non-mosaic X inactivation, but rather would call it highly skewed, or non-random X inactivation. In my view, it would be simpler for the abstract to call non-mosaic XCI highly skewed XCI instead, or to use more words to ensure it is clear for the reader.
  
  We agree that the terminology of completely skewed/non-mosaic XCI could be more clearly defined in the abstract and have clarified this. “Using females that are non-mosaic (completely skewed) for X-inactivation (nmXCI) has proven a powerful and natural genetic system for profiling X-inactivation in humans.”
  
  I would consider calling the always escape genes constitutive escapees, while the variable may be facultative.
  
  This is something we have also considered and have received differing feedback on. However, we will definitely keep this in mind for future publications.
  
  Line 132, it would be useful to explain median >0.475 as less than 2.5% of reads coming from the inactive allele here, not just in the methods. Can you also explain why this cutoff was chosen?
  
  We thank the reviewer for this clarification. A clarification has been added to the main text as suggested.
  
  The cutoff was applied to account for potential variations in skewing, given that we screened only a single tissue sample per individual. Although nmXCI females are theoretically expected to have 0% of reads originating from the 'inactive' allele, this is not always observed due to (a) technical errors such as PCR or sequencing inaccuracies, or (b) differences in skewing between tissue types.
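
The relationship between the 0.475 cut-off and the 2.5% figure can be made concrete with a small sketch. It assumes that allelic expression is defined as the absolute deviation of the reference-allele read fraction from 0.5, which is consistent with the statement above that a value above 0.475 corresponds to fewer than 2.5% of reads from the inactive allele. The function names and the input layout are hypothetical.

```python
from statistics import median


def allelic_expression(ref_reads, alt_reads):
    """Absolute deviation of the reference-allele fraction from 0.5.

    0.0 means balanced biallelic expression; 0.5 means fully
    monoallelic expression (assumed definition).
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return abs(0.5 - ref_reads / total)


def passes_nmxci_screen(sites, cutoff=0.475):
    """True if the median allelic expression over heterozygous sites exceeds `cutoff`.

    `sites` is an iterable of (ref_reads, alt_reads) pairs.
    """
    return median(allelic_expression(r, a) for r, a in sites) > cutoff
```

For example, a site with 975 reference reads and 25 alternate reads sits exactly at 0.475, that is, 2.5% of reads from the minor allele.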
  
  Lines 156-160 describe how the heterozygous SNPs were identified in relation to Figure 2. I read these in the methods so that I could understand Figure 1, so I suggest moving this section up.
  
  We have moved the section as suggested by the reviewer.
  
  Line 156, consider adding in a sentence to describe what is shown in Figures 2A and B i.e, the overlap of SNPs and spread along the X.
  
  We have added a sentence describing what is shown in Figures 2A and 2B as suggested by the reviewer.
  
  Line 217, it would be useful to give the % of genes that show tissue-specific escape, to quantify rare.
  
  We have added a sentence quantifying ‘rare’ at the suggested line.
  
  (4) Typos:
  
  Line 119, missing 'the most' before extensive (and remove an).
  
  We thank the reviewer for pointing this out. This error has been corrected.
  
  Reviewer #3 (Recommendations for the authors):
  
  Some results in the supplementary figures were quite striking. What is going on with DDX3X and ZRSR2? How come total read counts are so different between individuals?
  
  Indeed, this is a very intriguing observation and one that we have simply failed to understand thus far. We are currently performing a large prospective study to obtain a greater number of non-mosaic females and tissue samples. Hopefully, additional observations across females will allow us to gain further insights into the inter-individual behaviour of DDX3X and ZRSR2.
  
  One item I would like to see added is some analysis to address the cause of these extremely skewed XCI individuals. The copy number analysis suggests there are some segmental deletions on the X in all three nmXCI cases. Where are these deletions, and do any fall in the region of the X-inactivation centre? Have the authors performed any analysis of potentially deleterious X-linked variants in the WGS or WES data? Why are these donors so skewed? It's interesting that UPIC was still more skewed than the other two.
  
  The features the reviewer points out are not segmental deletions: the same variation in coverage is found in all females we have looked at, including females with mosaic XCI (see Author response image 1 below, where the same pattern of slightly lower read counts is observed at the same sites in all female samples). No deletions were identified in the XIC region. No analysis was performed of deleterious X-linked variants. Why the donors are so skewed is unknown and intriguing. Indeed, identifying the origin of extreme skewing (including the females in this study) is now the main focus of the group. Whereas UPIC had trisomy 17, which has likely resulted in the observed skewing, we have not yet found a genetic variant that could explain the skewing observed in 13PLJ or ZZPU.
  
  Author response image 1.
  
  Copy number as log2 ratio using 500kb bins across the X-chromosome for 3 mosaic XCI females (1QPFJ, OXRO, and RU1J) and 3 nmXCI females, UPIC, nmXCI-1 and nmXCI-2.
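
For readers unfamiliar with this kind of plot, the following is a generic sketch of a binned log2 coverage ratio. The response does not describe how the ratio was computed, so every detail here is an assumption: read positions are counted in 500 kb bins, each bin is normalised by total depth, and the sample is compared with a reference sample, with a small pseudocount to avoid division by zero.

```python
import math
from collections import Counter

BIN_SIZE = 500_000  # 500 kb bins, as in the caption


def bin_counts(positions, bin_size=BIN_SIZE):
    """Count read start positions per fixed-width bin."""
    return Counter(pos // bin_size for pos in positions)


def log2_ratio_per_bin(sample_positions, reference_positions,
                       bin_size=BIN_SIZE, pseudocount=0.5):
    """Depth-normalised log2(sample / reference) coverage for each bin."""
    sample = bin_counts(sample_positions, bin_size)
    reference = bin_counts(reference_positions, bin_size)
    sample_total = sum(sample.values())
    reference_total = sum(reference.values())
    if sample_total == 0 or reference_total == 0:
        raise ValueError("both inputs must contain at least one read")

    ratios = {}
    for b in sorted(set(sample) | set(reference)):
        sample_frac = (sample[b] + pseudocount) / sample_total
        reference_frac = (reference[b] + pseudocount) / reference_total
        ratios[b] = math.log2(sample_frac / reference_frac)
    return ratios
```

A dip that appears at the same bins in every sample, as described above, points to a property of the region or the assay rather than a deletion in any one individual.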
  
  This is not necessary to address with new analyses, but as alluded to above, the authors could screen more than a single representative tissue. And to apply this analysis to larger databases (UK biobank), which the authors may be planning to do already.
  
  This is an avenue of research we are currently investigating.
  
  The code is well-documented and accessible. Additional information on the manual reclassification (to deal with inflated binomial P-values) would be helpful. Why not require a minimal threshold for escape (10% of active X allele) in addition to a significant binomial P (inactive X exp. > 2.5% of active)?
  
  We thank the reviewer for this positive assessment of the code.
  
  Indeed, how to define ‘escape’ is a vexed issue, and one we feel has been given undue weight within the field. In reality, studies of escape are often dealing with sparse data (e.g. read depth), few observations (genes and individuals) and substantial amounts of missing data. Thus, it is unlikely that a standard statistical approach will be sensitive and specific across different studies and data types. Similarly, cut-offs, though useful, would also need to be adjusted to the data type and quality in any given study.
  
  Whereas we initially used a significant binomial P-value as our sole test (often quoted as ‘best practice’), this resulted in widespread inflation of P-values. Thus, we switched to manually curating the allelic expression status of all 380 genes using the empirical guideline of allelic ratio >0.4 (also a commonly used cut-off) as indicating mono-allelic expression. We considered combining the binomial P-value with the cut-off but felt that this would result in an overly complex definition of escape and would unnecessarily exclude many genes from classification, due to the opposing effects of low/high read depth on the binomial and cut-off approaches respectively.
  
  Indeed, it is because of the difficulty of classifying escape both accurately and objectively that we placed an emphasis on clearly displaying all data for each gene in each individual, allowing readers to see the data on which each classification was based.
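
The opposing effects of read depth on the two criteria can be shown with a small sketch. It is illustrative only and is not the authors' pipeline: it assumes a one-sided binomial test of the inactive-allele read count against a null fraction of 2.5%, a significance level of 0.05, and the allelic-ratio cut-off of 0.4 quoted above. All function and parameter names are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def classify_escape(inactive_reads, total_reads,
                    null_fraction=0.025, alpha=0.05, ratio_cutoff=0.4):
    """Apply both criteria to one gene; thresholds are assumptions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p_value = binomial_upper_tail(inactive_reads, total_reads, null_fraction)
    allelic_ratio = abs(0.5 - inactive_reads / total_reads)
    return {
        "p_value": p_value,
        "allelic_ratio": allelic_ratio,
        # Binomial criterion: more inactive-allele reads than the null allows.
        "escape_by_binomial": p_value < alpha,
        # Cut-off criterion: a ratio above the cut-off is read as mono-allelic.
        "escape_by_cutoff": allelic_ratio <= ratio_cutoff,
    }
```

With 400 inactive-allele reads out of 10,000, the binomial test is highly significant although the allelic ratio (0.46) remains above the cut-off. With 1 read out of 8, the allelic ratio (0.375) falls below the cut-off although the binomial test is not significant. The two criteria therefore disagree in opposite directions at high and low depth.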
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.26.546519v3
www.biorxiv.org www.biorxiv.org

Structural features of heteromeric channels composed of CALHM2 and CALHM4 paralogs

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank both reviewers for their supportive comments. Reviewer 1 has suggested a different data processing strategy to better resolve subunits at the CALHM4/CALHM2 interface:
  
  I recommend an alternative data processing strategy. First, refine particles with 2-4 CALHM4 subunits with symmetry imposed. This is followed by symmetry expansion, signal subtraction of two adjacent subunits, and subsequent classification and refinement of the subtracted particles. This approach, while not guaranteed, can potentially provide a clearer definition of CALHM2 and CALHM4 interfaces and show whether CALHM2 subunits adopt different conformations based on their proximity to CALHM4 subunits.
  
  We have followed the recommended strategy in an attempt to improve the resolution and better resolve the structural heterogeneity in CALHM2/4 channels. To this end, we have combined symmetry expansion and partial signal subtraction, as suggested by the reviewer. Initially, a symmetrized (C11) 3.4 Å consensus map of undecameric CALHM2/4 channels bound to sybodies SbC2 and SbC4 was used. The particles of this reconstruction were subjected to symmetry expansion (C11) followed by signal subtraction of nine adjacent subunits. Next, we performed focused, alignment-free 3D classification of the remaining two subunits followed by refinement of these classes, leading to the classification of CALHM subunit pairs. The majority of the classes feature well-resolved CALHM2 pairs, consistent with the original approach (Author response image 1A). A minority of the classes contain CALHM4 subunits, revealing heterogeneity similar to regions of CALHM4 subunits observed in the non-symmetrized channel reconstruction (Author response image 1B). Unfortunately, this approach thus did not improve resolution or facilitate a more accurate subunit assignment. Consequently, we decided not to include these attempts in our manuscript. The resubmitted version thus contains only small corrections compared to the previous version.
  
  Author response image 1.
  
  Classification of subunit pairs of undecameric CALHM2/4 channels bound to sybodies SbC2 and SbC4 after the processing combining symmetry expansion and partial signal subtraction. (A) Classes showing CALHM2 subunit pairs. (B) Classes showing subunits at interfaces to CALHM4.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.01.18.576238v2
www.biorxiv.org www.biorxiv.org

An atypical basement membrane forms a midline barrier during left-right asymmetric gut development in the chicken embryo

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  Summary:
  
  Left-right asymmetry in the developing embryo is important for establishing correct lateralisation of the internal organs, including the gut. It has been shown previously that the dorsal mesentery (DM), which supports looping of the endodermal gut tube during development, is asymmetric with sharp delineation of left and right domains prior to gut looping. The authors set out to investigate the nature of the midline barrier that separates the left and right sides of the DM. They identify a transient basement membrane-like structure which is organised into two layers between the notochord and descending endoderm. In the time window when this basement membrane structure exists, there is no diffusion or cell mixing between the left and right sides of the DM, but once this structure starts breaking down, mixing and diffusion occur. This suggests it acts as a barrier, both physical and chemical, between left and right at the onset of gut lateralisation.
  
  Strengths:
  
  The authors identify a new midline structure that likely acts as a barrier to facilitate left and right separation during early organogenesis. This is an interesting addition to the field of laterality, with relevance to laterality-related disorders including heterotaxia, and may represent a gut-specific mechanism for establishing and maintaining early left-right asymmetry. The structure of this midline barrier appears to be an atypical basement membrane, comprising two adjacent basement membranes. The complexities of basement membrane assembly, maintenance, and function are of importance in almost all organismal contexts. Double basement membranes have been previously reported (for example in the kidney glomeruli as the authors note), and increasing evidence suggests that atypical basement membrane organisation or consideration is likely to be more prevalent than previously appreciated. Thus this work is both novel and broadly interesting.
  
  The data presented are well executed, using a variety of well-established methods. The characterisation of the midline barrier at the stages examined is extensive, and the data around the correlation between the presence of the midline barrier and molecular diffusion or cell mixing across the midline are convincing.
  
  Weaknesses:
  
  The study is rather descriptive, and the authors' hypotheses around the origins of the midline barrier are speculative and not experimentally demonstrated. While several potential origins of the midline are excluded raising interesting questions about the timing and cell-type-specific origin of the midline basement membrane, these remain unanswered which limits the scope of the paper.
  
  We extend our appreciation to Reviewer #1 for their thoughtful and comprehensive evaluation of our work, recognizing the considerable time and effort they dedicated to our work. We agree that functional data would significantly strengthen our understanding of the midline barrier and its exact role during LR asymmetric gut development. However, we would like to note that repeated and diligent attempts to perturb this barrier were made using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation) but we observed no significant effect or stable disruption of the midline. We acknowledge and accept this limitation and hope that our discovery will invite future investigations and perturbation of this novel midline structure.
  
  For example, it is unclear whether the two basement membranes originally appear to be part of a single circular/spherical structure (which looks possible from the images) that simply becomes elongated, or whether it is indeed initially two separate basement membranes that extend.
  
  We favor the hypothesis that the elongation of the preexisting small circular structure to an extended double membrane of relatively increased length would be unlikely without continued contribution of new basement membrane components. However, our attempts to label and trace the basement membrane of the endoderm using tagged laminins (LAMB1-GFP, LAMB1-His, and LAMC1-His), and more recently tagged nidogen constructs (NID1-GFP and NID1-mNG) have met with export issues (despite extensive collaboration with experts, Drs. Dave Sherwood and Peter Yurchenco). As such, it remains difficult to differentiate between the two possibilities suggested. We also believe this is an important question and will continue to investigate methods to trace it.
  
  There is a substantial gap between the BMs at earlier stages before the endoderm has descended - is this a lumen, or is it filled with interstitial matrix?
  
  Our preliminary studies indicate that the gap enclosed by the basement membranes in the early midline structure does have extracellular matrix present, such as fibrillin-2 (see Author response image 1). Also, the electron microscopy shown in Fig. 2 C’’ supports that the space between the notochord and endoderm has fibrillar matrix.
  
  Author response image 1.
  
  The authors show where this basement membrane does not originate from, but only speculate on its origin. Part of this reasoning is due to the lack of Lama1-expressing cells either in the early midline barrier before it extends, or in the DM cells adjacent to it. However, the Laminin observed in the midline could be comprised of a different alpha subtype for example, that wasn't assessed (it has been suggested that the Laminin antibody used in this study is not specific to the alpha-1 subunit, see e.g. Lunde et al, Brain Struct Funct, 2015).
  
  We appreciate this comment and have tried other laminin RNA probes that showed a similar lack of midline expression (Lama1, Lama3, Lama5). Importantly, the laminin alpha 1 subunit is a component of the laminin 111 heterotrimer, which along with laminin 511 is the first laminin to be expressed and assemble in embryonic basement membranes, as reviewed in Yurchenco 2011. Laminin 111 is particularly associated with embryonic development while laminins 511/521 become the most widespread in the adult (reviewed in Aumailley 2013). It is likely that the midline contains laminin 111 based on our antibody staining and the accepted importance and prevalence of laminin 111 in embryonic development. However, it is indeed worth noting that most laminin heterotrimers contain beta 1, gamma 1, or both subunits, and due to this immunological relation laminin antibody cross-reactivity is certainly known (Aumailley 2013). As such, while laminin 511 remains a possibility as a component of the midline BM, our Lama5 in situs have shown no differential expression at the midline of the dorsal mesentery (see Author response image 2), and we are therefore confident that our finding of no local laminin transcription is accurate. Additionally, we will note that the study referenced by the Reviewer observed cross-reactivity between the alpha 1 and alpha 2 subunits. Laminin 211/221 is an unlikely candidate based on the embryonic context, and because they are primarily associated with muscle basement membranes (Aumailley 2013). In further support, we recently conducted a preliminary transcriptional profile analysis of midline cells isolated through laser capture microdissection (LCM), which revealed no differential expression of any laminin subunit at the midline. Please note that these data will be included as part of a follow-up story and fall beyond the scope of our initial characterization.
  
  Author response image 2.
  
  Similarly, the authors show that the midline barrier breaks down, and speculate that this is due to the activity of e.g. matrix metalloproteinases, but don't assess MMP expression in that region.
  
  This is an important point, as the breakdown of the midline is unusually rapid. Our RNA in situ hybridization for MMP2 at HH21, and for ADAMTS1 (and TS9) at HH19-21, indicates no differential activity at the midline (see Author response images 3 and 4). Our future focus will be on identifying a potential protease that exhibits differential activity at the midline of the DM.
  
  Author response image 3.
  
  Author response image 4.
  
  The authors suggest the (plausible) hypothesis that the descent of the endoderm pulls or stretches the midline barrier out from its position adjacent to the notochord. This is an interesting possibility, but there is no experimental evidence to directly support this. Similarly, while the data supporting the barrier function of this midline is good, there is no analysis of the impact of midline/basement membrane disruption demonstrating that it is required for asymmetric gut morphogenesis. A more functional approach to investigating the origins and role of this novel midline barrier would strengthen the study.
  
  Yes, we fully agree that incorporating functional data would immensely advance our understanding of the midline barrier and its crucial role in left-right gut asymmetry. However, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. We acknowledge this limitation and remain committed to developing methods to disrupt the midline in our current investigations. We again thank Reviewer #1 for the detailed feedback on our manuscript, guidance, and the time taken to provide these comments.
  
  Recommendations For The Authors:
  
  Using Laminin subunit-specific antibodies, or exploring the mRNA expression of more laminin subunits may support the argument that the midline does not derive from the notochord, endoderm, or DM.
  
  As mentioned above, RNA in situ hybridization for candidate genes and a preliminary RNA-seq analysis of cells isolated from the dorsal mesentery midline revealed no differential expression of any laminin subunits.
  
  Similarly, expression analysis of Laminin-degrading MMPs, and/or application of an MMP inhibitor and assessment of midline integrity could strengthen the authors' hypothesis that the BM is actively and specifically broken down.
  
  Our RNA in situ hybridization for MMP2 at HH21, and for ADAMTS1 at HH19-21, shows no differential expression pattern at the midline of the DM (see Author response image 3). We have not included these data in the revision, but future work on this topic will aim at identifying a protease that is differentially active at the midline of the DM.
  
  Functionally testing the role of barrier formation in regulating left-right asymmetry or the role of endoderm descent in elongating the midline barrier would be beneficial. Regarding the former, the authors show that Netrin4 overexpression is insufficient to disrupt the midline, but perhaps overexpression of e.g. MMP9 prior to descent of the endoderm would facilitate early degradation of the midline, and the impact of this on gut rotation could be assessed.
  
  Unfortunately, MMP9 electroporation has produced little appreciable effect. We acknowledge that the lack of direct evidence for the midline’s role in regulating left-right asymmetry is a shortcoming, but current work on this subject aims to define the midline’s function in LR asymmetric morphogenesis.
  
  Reviewer #2:
  
  When the left-right asymmetry of an animal body is established, the barrier that prevents the mixing of signals or cells across the midline is essential. The midline barrier that prevents the mixing of asymmetric signals during the patterning step has been identified. However, a midline barrier that separates both sides during asymmetric organogenesis is unknown. In this study, the authors discovered the cellular structure that seems to correspond to the midline in the developing midgut. This midline structure is transient, present at the stage when the barrier would be required, and composed of Laminin-positive membrane. Stage-dependent diffusion of dextran across the midline (Figure 6) coincides with the presence or absence of the structure (Figures 2, 3). These lines of indirect evidence suggest that this structure most likely functions as the midline barrier in the developing gut.
  
  We extend our gratitude to Reviewer #2 for their thoughtful assessment of our research and for taking the time to provide these constructive comments. We are excited to report that we have now included additional new data on midline diffusion, using a BODIPY-based assay and quantification method, to further support our findings on the midline's barrier function. While our data on dextran and now BODIPY both indirectly suggest barrier function, we aspire to perturb the midline directly to assess its role in the dorsal mesentery more conclusively. However, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. Moving forward, our focus is on identifying an effective means of perturbation that can offer direct evidence of barrier function.
  
  Recommendations For The Authors:
  
  (1) It would be much nicer if the requirement of this structure for asymmetric morphogenesis was directly tested. However, experimental manipulations such as ectopic expression of Netrin4 or transplantation of the notochord were not able to influence the formation of this structure (these results, however, suggested the mechanism of the midline formation in the gut dorsal mesentery). Therefore, it seems not feasible to directly test the function of the structure, and this should be the next issue.
  
  We fully agree that the midline will need to be perturbed to fully elucidate its role in asymmetric gut morphogenesis. As noted, multiple attempts were ineffective at perturbing this structure. Extensive current work on this topic is dedicated to finding an effective perturbation method.
  
  (2) Whereas Laminin protein was present in the double basement membrane at the midline, Laminin mRNA was not expressed in the corresponding region (Fig. 4A-C). It is necessary to discuss (with experimental evidence if available) the origin of Laminin protein.
  
  As we have noted, the source of laminin and basement membrane components for the midline remains unclear. The absence of local transcription and the insufficiency of the notochord to produce a midline point to the endoderm as a likely source of laminin, as we have proposed in our zippering endoderm model. We note that Fig. 4A-C indicates that laminin is in fact actively transcribed in the endoderm. Currently, attempts to trace the endodermal basement membrane using tagged laminins (LAMB1-GFP, LAMB1-His, and LAMC1-His), and more recently tagged nidogen constructs (NID1-GFP and NID1-mNG) have met with export issues (despite extensive collaboration with experts, Drs. Dave Sherwood and Peter Yurchenco). Confirmation of our proposed endodermal origin model is a goal of our ongoing work.
  
  (3) Figure 4 (cell polarity from GM130 staining): addition of representative GM130 staining images for each Rose graph (Figure 4E) would help. They can be shown in Supplementary Figures. Also, a graph for the right coelomic epithelium in Fig. 4E would be informative.
  
  We have added the requested GM130 images in our Supplemental Figures (please refer to Fig. S4ABB’) and modified the main Fig. 4E to include a rose graph for the polarity of the right coelomic epithelium.
  
  (4) Histological image of HH19 DM shown in Fig. 2J looks somehow different from that shown in Fig. 3F. Does Fig. 2J represent a slightly earlier stage than Fig. 3F?
  
  Figure 2J and Figure 3F depict a similar stage, although the slight variation in the length of the dorsal mesentery is attributed to the pseudo time phenomenon illustrated in Figure 3J-J’’’. This implies that the sections in Figure 2J and Figure 3F might originate from slightly different positions along the anteroposterior axis. Nonetheless, these distinctions are minimal, and based on the dorsal mesentery's length in Figure 2J, the midline is likely extremely robust regardless of this minor pseudo time difference.
  
  Reviewer #3:
  
  Summary:
  
  The authors report the presence of a previously unidentified atypical double basement membrane (BM) at the midline of the dorsal mesentery (DM) during the establishment of left-right (LR) asymmetry. The authors suggest that this BM functions as a physical barrier between the left and the right sides of the DM preventing cell mixing and ligand diffusion, thereby establishing LR asymmetry.
  
  Strengths:
  
  The observation of the various components in the BM at the DM midline is clear and convincing. The pieces of evidence ruling out the roles of DM and the notochord in the origin of this BM are also convincing. The representation of the figures and the writing is clear.
  
  Weaknesses:
  
  The paper's main and most important weakness is that it lacks direct evidence for the midline BM's barrier and DM LR asymmetry functions.
  
  We thank Reviewer #3 for their thoughtful and comprehensive evaluation of our work, recognizing the considerable time and effort they dedicated to assessing our study. We fully agree that incorporating functional data would immensely advance our understanding of the midline barrier and its crucial role in left-right gut asymmetry. However, several distinct attempts at perturbing this barrier have encountered technical obstacles. While our laboratory routinely perturbs the left and right compartments of the DM via DNA electroporation and other techniques, directly perturbing the midline using these methods is far more challenging. We have made diligent attempts to address this using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). However, we have not yet been able to identify a means of producing consistent and interpretable perturbation of the midline. We acknowledge this limitation and remain committed to developing methods to disrupt the midline in our current investigations.
  
  Recommendations For The Authors:
  
  Major:
  
  (1) We suggest the authors test their hypotheses i.e., physical barrier and proper LR asymmetry establishment by the midline BM, by disrupting it using techniques such as physical ablation, over-expression of MMPs, or treatment with commercially available enzymes that digest the BM.
  
  As above, efforts involving physical ablation and MMP overexpression have not yielded significant effects on the midline thus far. Moving forward, investigating the midline's role in asymmetric morphogenesis will necessitate finding a method to perturb it effectively. In pursuit of progress on this critical question, we recently conducted laser capture microdissection (LCM) and RNA-sequencing of the midline to unravel the mechanisms underlying its formation and potential disruption. This work shows promise but it is still in its early stages; validating it will require significant time and effort, and it falls outside the scope of the current manuscript.
  
  (2) Lefty1's role in the midline BM was ruled out by correlating lack of expression of the gene at the midline during HH19 when BM proteins expression was observed. Lefty1 may still indirectly or directly trigger the expression of these BM proteins at earlier stages. The only way to test this is by inhibiting lefty1 expression and examining the effect on BM protein localization.
  
  We have added a section to discuss the potential of Lefty1 inhibition as a future direction. However, similar to perturbing global Nodal expression, interpreting the results of Lefty1 inhibition could be challenging. This is because it may not specifically target the midline but could affect vertebrate laterality as a whole. Despite this complexity, we acknowledge the value of such an experiment and consider it worth pursuing in the future.
  
  (3) Using a small dextran-based assay, the authors conclude that diffusible ligands such as cxcl12 and bmp4 do not diffuse across the midline (Figure 6). However, dextran injection in this system seems to label the cells, not the extracellular space. The authors measure diffusion, or the lack thereof, by counting the proportion of dextran-labeled cells rather than dextran intensity itself. Therefore, this result shows a lack of cell mixing across the midline (already shown in Figure 2) rather than a lack of diffusion.
  
  We should emphasize that the dextran-injected embryos shown in Fig. 6 D-F were isolated two hours post-injection, a timeframe insufficient for cell migration to occur across the DM (Mahadevan et al., 2014). We also collected additional post-midline stage embryos ten minutes after dextran injections - too short a timeframe for significant cellular migration (Mahadevan et al., 2014). Importantly, the fluorescent signal in those embryos was comparable to that observed in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM when the barrier starts to fragment (HH20-HH23) is unlikely to represent cell migration. More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated substantial cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Collectively, our experiments suggest that the dextran signal we observed at HH20 and HH23 is likely not driven by cell mixing.
  
  To further strengthen this argument, we have added new data on midline diffusion, using a BODIPY-based assay and quantification method, to support our findings on the midline's function against diffusion (please refer to New Fig. 6H-M). Briefly, we utilized a BODIPY-tagged version of AMD3100 (Poty et al., 2015) delivered via soaked resin beads surgically inserted into the left coelomic cavity (precursor to the DM). The ratio of average AMD3100-BODIPY intensity in the right DM versus the left DM was below 0.5 when the midline is intact (HH19), indicating little diffusion across the DM (Fig. 6J). At HH21 when no midline remains, this ratio significantly rises to near one, indicating diffusion of the drug is not impeded when the midline basement membrane structure is absent. Collectively, these data suggest that the basement membrane structure at the midline forms a transient functional barrier against diffusion.
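  The ratio metric described above can be sketched in a few lines. This is only an illustration, assuming that pixel intensities have already been extracted from a left-DM and a right-DM region of interest; the function name `right_left_ratio` and all numbers are hypothetical and are not the authors' analysis pipeline or measured data.

```python
from statistics import mean


def right_left_ratio(left_pixels, right_pixels):
    """Return mean right-side intensity divided by mean left-side intensity.

    Hypothetical helper: inputs are plain sequences of pixel intensities
    taken from regions of interest in the left and right DM.
    """
    left_mean = mean(left_pixels)
    if left_mean == 0:
        raise ValueError("left-side mean intensity is zero; ratio is undefined")
    return mean(right_pixels) / left_mean


# Made-up intensities for illustration only (not measured data).
# Bead placed on the left, so the left side is the source of signal.
intact_midline = right_left_ratio([100, 120, 110], [30, 40, 35])
absent_midline = right_left_ratio([100, 120, 110], [105, 115, 100])

print(f"intact midline ratio: {intact_midline:.2f}")
print(f"absent midline ratio: {absent_midline:.2f}")
```

  Under this reading, a ratio well below one means the signal stays largely on the bead side, while a ratio near one means the signal is similar on both sides.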
  
  (4) Moreover, in a previous study (Mahadevan et al., Dev Cell., 2014), cxcl12 and bmp4 expression was observed on both the left and right side before gut closure (HH17, when midline BM is observed). Then their expression patterns were restricted on the left or right side of DM at around HH19-20 (when midline BM is dissociated). The authors must explain how the midline BM can act as a barrier against diffusible signals at HH17 to HH19, where diffusible signals (cxcl12 and bmp4) were localized on both sides.
  
  We appreciate the Reviewer's invitation to clarify this crucial point. Early in dorsal mesentery (DM) formation, genes like Cxcl12 (Mahadevan et al., Dev Cell 2014) and Bmp4 (Sanketi et al., Science 2021) exhibit symmetry before Pitx2 expression initiates on the left (around HH18, Sanketi et al., 2021). Pitx2 then inhibits BMP4 (transcription) and maintains Cxcl12 (mRNA) expression on the left side. The loss of Cxcl12 mRNA on the right is due to the extracellular matrix (ECM), particularly hyaluronan (Sivakumar et al., Dev Cell 2018). Our hypothesis is that during these critical stages of initial DM asymmetry establishment, the midline serves as a physical barrier against protein diffusion to protect this asymmetry during a critical period of symmetry breaking. Although some genes, such as Pitx2 and Cxcl12, continue to display asymmetric transcription after midline dissolution (Cxcl12 becomes very dynamic later on – see Mahadevan), it's crucial to note that the midline's primary role is preventing protein diffusion across it, akin to an insurance policy. Thus, the absence of the midline barrier at HH21 does not result in the loss of asymmetric mRNA expression. We think its primary function is to block diffusible factors from crossing the midline at a critical period of symmetry breaking. We acknowledge that confirming this hypothesis will necessitate experimental disruption of the midline and observing the consequent effects on asymmetry in the DM. This remains central to our ongoing research on this subject.
  
  (5) On page 11, lines 15-17, the authors mention that "We know that experimentally mixing left and right signals is detrimental to gut tilting and vascular patterning-for example, ectopic expression of pro-angiogenic Cxcl12 on the right-side results in an aberrant vessel forming on the right (Mahadevan et al., Dev Cell., 2014)". In this previous report from the authors' laboratory, the authors suggested that ectopic expression of cxcl12 on the right side induced aberrant formation of the vessel on the right side, which was formed from stage HH17, and the authors also suggested that the vessel originated from left-sided endothelial cells. If the midline BM acts as a barrier against the diffusible signal, how can the left-sided endothelial cells contribute to vessel formation at HH17 (before midline BM dissociation)?
  
  To address this point, we suggest directing the Reviewer to previously published supplemental movies of time-lapse imaging, which clearly illustrate the migration path of endothelial cells from left to right DM (Mahadevan et al., Dev Cell 2014). While the Reviewer correctly notes that ectopic induction of Cxcl12 on the right induces left-to-right migration, it's crucial to highlight that these cells never cross the midline. Instead, they migrate immediately adjacent to the tip of the endoderm (please also refer to published Movies S2 and S3). We observe this migration pattern even in wild-type scenarios during the loss of the endogenous right-sided endothelial cords, where some endothelial cells from the right begin slipping over to the left around HH19-20 (over the endoderm), as the midline is beginning to fragment, but never traverse the midline. We attribute this migration pattern to a dorsal-to-ventral gradient of left-sided Cxcl12 expression, as disrupting this pattern perturbs the migration trajectory (Mahadevan).
  
  (6) It is unclear how continuous the midline BM is along the anterior-posterior axis across the relevant stages. Relatedly, it is unclear how LR-segregated the cells are along the anterior-posterior axis across the relevant stages.
  
  We refer the reviewer to Fig. 3J-K, in which the linear elongation of the midline basement membrane structure is shown and measured at HH19 in three embryos from the posterior of the embryo to the anterior point at which the midline is fragmented and ceases to be continuous. Similarly, Fig. S2 shows the same phenomenon in serial sections along the length of the anterior-posterior (AP) axis at HH17, also showing the continuity of the midline. All our past work at all observed sections of the AP axis has shown that cells do not move across the midline as indicated by electroporation of DNA encoding fluorescent reporters (Davis et al. 2008, Kurpios et al. 2008, Welsh et al. 2013, Mahadevan et al. 2014, Sivakumar et al. 2018, Sanketi et al. 2022), and this is shown again in Fig. 2 E-H. As noted previously, very few endothelial cells cross the midline at a point just above the endoderm (image above) when the right endothelial cord remodels (Mahadevan et al. 2014), but this phenomenon is limited to endothelial cells, and cells of the left and right DM are fully segregated as previously established.
  
  Minor comments:
  
  (1) The authors found that left and right-side cells were not mixed with each other even after the dissociation of the DM midline at HH21 (Fig2 H). And the authors also previously mentioned that N-cadherin contributes to cell sorting for left-right DM segregation (Kurpios et al., Proc Natl Acad Sci USA., 2008). It could be a part of the discussion about the difference in tissue segregation systems before or after the dissociation of DM midline.
  
  We appreciate this thoughtful suggestion. N-cadherin mediated cell sorting is key to the LR asymmetry of the DM and gut tilting, and we believe it underlies the observed lack of cell mixing from left and right DM compartments after the midline fragments. We have added a brief section to the discussion concerning the asymmetries in N-cadherin expression that develop after the midline fragments.
  
  (2) Please add the time point on the images (Fig3 C, D, Fig 6A and B)
  
  We have updated these figures to provide the requested stage information.
  
  (3) The authors suggested that the endoderm might be responsible for making the DM BM midline because the endoderm links to DM midlines and have the same resistance to NTN4. The authors mentioned that the midline and endoderm might have basement membranes of the same "flavor." However, perlecan expression was strongly expressed in the midline BM compared with the endodermal BM. It could be a part of the discussion about the difference in the properties of the BM between the endoderm and DM midline.
  
  Perlecan does indeed localize strongly to the endoderm as well as the midline. The HH18 image included in prior Fig. S3 B’, B’’ appears to show atypically low antibody staining in the endoderm for all membrane components. Perlecan is an important component for general basement membrane assembly, and the bulk of our HH18 and HH19 images indicate strong staining for perlecan in both midline and endoderm. Perlecan staining at the very earliest stages of midline formation also indicates perlecan in the endoderm, supporting the endoderm as a potential source for the midline basement membrane. We have updated Fig. S3 to include these images in our revision.
  
  (4) The authors investigated whether the midline BM originates from the notochord or endoderm, but did not examine a role for endothelial cells and pericytes surrounding the dorsal aorta (DA). In Fig S1, Fig S2, and FigS3, the authors showed that DA is very close to the DM midline basement membrane, so it is worth checking their roles.
  
  We fully agree that the dorsal aorta and the endothelial cords that originate from the dorsal aorta may interact with the midline in important ways. However, accessing the dorsal aorta for electroporation or other perturbation is extremely difficult. Additionally, the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. Vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in DiRusso et al., 2017). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Additionally, no fibronectin is found in the midline basement membrane, while it is enriched in the dorsal aorta (see Supplemental Figure 3CC’C’’). We will briefly note that our preliminary data in quail tissue indicates that QH1+ cord cells (i.e. endothelial cells) sometimes exhibit striking contact with the midline along the dorso-ventral length of the DM, suggesting not an origin but an important interaction.
  
  Reviewer #4 (Recommendations For The Authors):
  
  Major comments:
  
  (1) The descending endoderm zippering model for the formation of the midline lacks evidence.
  
  We have attempted to address this issue by introducing several tagged laminin constructs (LAMB1-GFP, LAMB1-His, LAMC1-His), and more recently tagged nidogen plasmids (NID1-GFP and NID1-mNG) to the endoderm via DNA electroporation to try to label the source of the basement membrane. Production of the tagged components occurred but no export was observed in any case (despite extensive collaboration with experts in this area, Drs. Dave Sherwood and Peter Yurchenco). This experiment was further complicated by the necessary large size of these constructs at 10-11kb due to the size of laminin subunit genes, resulting in low electroporation efficiency. We also believe this is an important question and are continuing to investigate methods to trace it.
  
  The midline may be Ntn4 resistant until it is injected in the source cells.
  
  Ntn4 has been shown to disrupt both assembling and existing basement membranes (Reuten et al. 2016). Thus, we feel that the midline and endodermal basement membranes’ resistance to degradation is not determined by stage of assembly or location of secretion.
  
  Have you considered an alternative origin from the bilateral dorsal aorta or the paraxial mesoderm, which would explain the double layer as a meeting of two lateral tissues? The left and right paraxial mesoderm seem to abut in Fig. S1B-C and S2E, and is laminin-positive in Fig 4A'. What are the cells present at the midline (Fig.4D-E)? Are they negative for the coelomic tracing, paraxial or aortic markers?
  
  We fully agree that alternate origins of the midline basement membrane cannot be ruled out from our existing data, and we have considered the dorsal aorta and even the endothelial cords that originate from the dorsal aorta. However, accessing the dorsal aorta for electroporation or other perturbation is extremely difficult. Importantly, the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. Vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in Hallmann et al. 2005). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Note in Fig. 3 E-H that our laminin alpha 1 antibody staining does not label the aortae. Additionally, no fibronectin is found in the midline basement membrane, while it is enriched in the dorsal aorta (see Supplemental Figure 3CC’C’’). We will briefly note that our preliminary data in quail tissue indicate that QH1+ cord cells (i.e. endothelial cells) sometimes exhibit striking contact with the midline along the dorso-ventral length of the DM, suggesting not an origin but an important interaction. Moreover, at the earliest stages of midline basement membrane emergence, the dorsal aortae are distant from the nascent basement membrane, as are the somites, which have not yet undergone any epithelial to mesenchymal transition. Fig. S2G provides an example of an extremely early midline basement membrane without dorsal aorta or somite contact. S2G is from a section that is fairly posterior in the embryo; it is thus less developed in pseudo-time and gives a window on midline formation in very early embryos.
  
  (2) The importance of the midline is inferred from previously published data and stage correlations but will require more direct evidence. Can the midline be manipulated with Hh signaling or MMPs?
  
  We agree that direct evidence in the form of midline perturbation will be critically required. As previously noted, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. Targeting Hh signaling between the endoderm and notochord is a good idea and we will continue these efforts. Thanks very much.
  
  Minor comments:
  
  - Please add the species in the title.
  
  We have altered the title as follows: “An atypical basement membrane forms a midline barrier during left-right asymmetric gut development in the chicken embryo.”
  
  - The number of observations in Fig2, Fig3A-B, 4A-C, G-H, S1, S3 is lacking.
  
  We have added the requested n numbers of biological replicates to the legends of the specified figures.
  
  - Please annotate Fig 3J to show what is measured in K.
  
  We have modified Fig. 3J to include a dashed bar indicating the length measurements in Fig. 3K.
  
  - Please provide illustrations of Fig 4E.
  
  We have added a representative image of GM130 staining to the supplement.
  
  - If laminin gamma is the target of Ntn4, its staining would help interpret the results of Ntn4 manipulation. Is laminin gamma present in different proportions in the different types of basement membranes, underlying variations in sensitivity?
  
  Laminin is exported as a heterotrimer consisting of an alpha, beta, and gamma subunit. Laminin gamma is therefore present in equal proportions to other laminins in all basement membranes with a laminin network. Several gamma isoforms do exist, but only laminin gamma 1 will bind to laminin alpha 1, which we use throughout this paper to mark the midline as well as nearby basement membranes that are sensitive to Ntn4 disruption. Thus, gamma laminin proportions or isoforms are unlikely to underlie the resistance of the midline and endodermal basement membranes to Ntn4 (reviewed in Yurchenco 2011).
  
  - Please comment: what is the red outline abutting the electroporated DM on the left of Fig5B?
  
  The noted structure is the basement membrane of the nephric duct – we added this information to Fig. 5B image and legend.
  
  - The stage in Fig 6A-B is lacking.
  
  We have added the requested stage information to Fig. 6.
  
  - Please comment on whether there is or is not some cell mixing Fig 2H, at HH21 after the midline disappearance. Is it consistent with Fig. 6E-F which labels cells?
  
  More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated dorsal mesentery cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Cell mixing does not occur even after midline disappearance, most likely due to asymmetric N-cadherin expression on the left side of the DM (Kurpios et al., 2008). The sparse, green-labeled cells observed on the right side in Fig. 2H are likely a result of DNA electroporation - the accuracy of this process relies on the precise injection of the left (or right) coelomic cavity (precursor to the gut mesenchyme including the DM) and subsequent correct placement of the platinum electrodes.
  
  Based on these data, we strongly feel that cellular migration is not responsible for the pattern of dextran observed in Fig. 6E-F, especially in light of the N-cadherin mediated segregation of left and right. We will also note that there is no significant difference between dextran diffusion at HH19 and HH20, only a trend towards significance. Additionally, we would like to note that the dextran-injected embryos were isolated two hours post-injection, which we do not believe is sufficient time for any cell migration to occur across the DM. We also collected additional post-midline stage embryos ten minutes after dextran injections (data not shown), too short a timeframe for significant cellular migration, and the fluorescent signal in those embryos was comparable to that represented in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM observed when the barrier starts to fragment at HH20 and HH23 is unlikely to represent movement of cells.
  
  To further strengthen this argument, we have added new data on midline diffusion, using a BODIPY-based assay and quantification method, to support our findings on the midline's function against diffusion (please refer to New Fig. 6H-M). Briefly, we utilized a BODIPY-tagged version of AMD3100 (Poty et al., 2015) delivered via soaked resin beads surgically inserted into the left coelomic cavity (precursor to the DM). The ratio of average AMD3100-BODIPY intensity in the right DM versus the left DM was below 0.5 when the midline is intact (HH19), indicating little diffusion across the DM (Fig. 6J). At HH21 when no midline remains, this ratio significantly rises to near one, indicating diffusion of the drug is not impeded when the midline basement membrane structure is absent. Collectively, these data suggest that the basement membrane structure at the midline forms a transient functional barrier against diffusion.
  
  - 'independent of Lefty1': rephrase or show the midline phenotype after lefty1 inactivation.
  
  We agree with this comment and have rephrased this section to indicate the midline is present “at a stage when Lefty1 is no longer expressed at the midline.”
  
  We again would like to extend our sincere gratitude to our reviewers and the editors at eLife for their dedicated time and thorough evaluation of our paper. Their meticulous attention to detail and valuable insights have strengthened our data and provided further support for our findings.
  
biorxiv.org/content/10.1101/2023.08.15.553395v2
www.biorxiv.org www.biorxiv.org

Supralinear dendritic integration in murine dendrite-targeting interneurons

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The manuscript by Griesius et al. addresses the dendritic integration of synaptic input in cortical GABAergic interneurons (INs). Dendritic properties, passive and active, of principal cells have been extensively characterized, but much less is known about the dendrites of INs. The limited information is particularly relevant in view of the high morphological and physiological diversity of IN types. The few studies that investigated IN dendrites focused on parvalbumin-expressing INs. In fact, in a previous study, the authors examined dendritic properties of PV INs, and found supralinear dendritic integration in basal, but not in apical dendrites (Cornford et al., 2019 eLife).
  
  In the present study, complementary to the prior work, the authors investigate whether dendrite-targeting IN types, NDNF-expressing neurogliaform cells, and somatostatin(SOM)-expressing O-LM neurons, display similar active integrative properties by combining clustered glutamate-uncaging and pharmacological manipulations with electrophysiological recording and calcium imaging from genetically identified IN types in mouse acute hippocampal slices.
  
  The main findings are that NDNF IN dendrites show strong supralinear summation of spatially- and temporally-clustered EPSPs, which is changed into sublinear behavior by bath application of NMDA receptor antagonists, but not by Na+-channel blockers. L-type calcium channel blockers abolished the calcium transients associated with the supralinear behavior but had no or only a weak effect on EPSP summation. SOM IN dendrites showed similar, albeit weaker NMDA-dependent supralinear summation, but no supralinear calcium transients were detected in these INs. In summary, the study demonstrates that different IN types are endowed with active dendritic integrative mechanisms, but show qualitative and quantitative divergence in these mechanisms.
  
  While the research is conceptually not novel, it constitutes an important incremental gain in our understanding of the functional diversity of GABAergic INs. In view of the central roles of IN types in network dynamics and information processing in the cortex, the results and conclusions are of interest to the broader neuroscience community.
  
  The experiments are well designed, and closely follow the approach from the previous publication in parts, enabling direct comparison of the results obtained from the different IN types. The data is convincing and the conclusions are well-supported, and the manuscript is very well-written.
  
  I see only a few open questions and some inconsistencies in the presentation of the data in the figures (see details below).
  
  We thank the reviewer for the evaluation and address the detailed points below.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.
  
  Strengths:
  
  (1) Determined the dendritic integration properties of two fundamental types of inhibitory interneurons.
  
  (2) Convincing demonstration that supra-threshold integration in both cell types depends on NMDA receptors but not on Na+ channels.
  
  Weaknesses:
  
  It is unknown whether highly clustered synaptic input, as used in this study (and several previous studies), occurs physiologically.
  
  We are grateful to the reviewer for the critique. Indeed, the degree to which clustered inputs belonging to a functional neuronal assembly occur on interneuron dendrites is an open question. However, Chen et al (2013, Nature 499:295-300) reported that dendritic domains of PV-positive interneurons in visual cortex, unlike their somata, exhibit calcium transients in vivo which are highly tuned to stimulus orientation. This suggests that clustered inputs to dendritic segments may well belong to functional assemblies, much as in principal cells (e.g. Wilson et al, 2016, Nature Neuroscience 19:1003–1009; Iacaruso et al, 2017, Nature 547:449–452). In our earlier work reporting NMDAR-dependent supralinear summation of glutamate uncaging-evoked responses at a subset of dendrites on PV-positive interneurons, we demonstrated how this arrangement in an oscillating feedback circuit could be exploited to stabilise neuronal assemblies.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors study the temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. There are some descriptive data presented, indicating non-linear summation, which seems to be larger in dendrites of NDNF expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.
  
  Strengths:
  
  Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments.
  
  Weaknesses:
  
  (1) NMDA-receptor signaling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. However, this non-linearity varies massively from cell to cell (or dendrite to dendrite) from 0% up to 400% (Figure S2). The reason for this variability is totally unclear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors claim that the non-linearity is not dependent on EPSP amplitude (Figure S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data of NMDA currents similar to the previous study. This would help to see whether NMDA-receptor contribution varies from synapse to synapse to generate the observed variability? Furthermore, the NMDA- and AMPA-currents would help to compare NDNF with the previously characterized PV cells and would help to contribute to our understanding of interneuron function.
  
  We thank the reviewer for the helpful comments.
  
  We did not actually claim that EPSP amplitude has no role in determining the magnitude of non-linearity: “Among possible sources of variability for voltage supralinearity, we did not observe a systematic dependence on the average amplitude of individual uEPSPs […] (Fig. S2)”. Whilst we fully agree that, at first sight, a positive dependence of supralinearity on uEPSP amplitude might be expected simply from the voltage-dependent kinetics of NMDARs, there are two main reasons why this could have been obscured. First, the expected relationship is non-monotonic, because with large local depolarizations the driving force collapses, as seen in the overall sigmoid shape of the average relationship between the scaled observed response and arithmetic sum (e.g. Figs 2a & c; 4c & e). Therefore, we would arguably expect a parabolic relationship rather than a simple positive slope relating the degree of supralinearity to the average amplitude of individual uEPSPs. Second, given that the uncaging distance varied substantially, the average amplitudes of the individual uEPSPs recorded at the soma would have undergone different degrees of electrotonic attenuation and further distortion by active conductances before they were measured. Ultimately, the plots in Fig. S2 show too much scatter to be able to exclude a positive or parabolic relationship of nonlinearity to uEPSP amplitude. To avoid misunderstanding, we have changed the sentence in the Results that refers to Fig. S2 to: “Among possible sources of variability for voltage supralinearity, we did not observe a significant monotonic dependence on the average amplitude of individual uEPSPs, distance from the uncaging location along the dendrite to the soma, [or] the dendrite order (Fig. S2)”.
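
The non-monotonic argument above can be illustrated with a toy calculation. This is a minimal sketch and not an analysis from the manuscript: it assumes a commonly used sigmoidal magnesium-block expression, 1 mM extracellular Mg2+, a reversal potential of 0 mV, and an arbitrary conductance; the function names and all constants are illustrative assumptions.

```python
import math

def mg_unblock(v_mv, mg_mm=1.0):
    """Fraction of NMDAR conductance free of Mg2+ block at v_mv (mV).

    Assumed sigmoidal form with illustrative constants (0.062 per mV, 3.57 mM).
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))

def nmdar_current(v_mv, g_max=1.0, e_rev_mv=0.0):
    """Toy NMDAR current in arbitrary units; negative values are inward."""
    return g_max * mg_unblock(v_mv) * (v_mv - e_rev_mv)

# Inward current magnitude over a range of local dendritic potentials.
voltages = list(range(-80, 1, 5))
inward = [-nmdar_current(v) for v in voltages]
peak_v = voltages[inward.index(max(inward))]

for v, i in zip(voltages, inward):
    print(f"{v:4d} mV  inward current {i:6.2f} (a.u.)")
print("largest inward current at", peak_v, "mV")
```

Under these assumptions the inward current first grows with depolarization as the block is relieved, then shrinks as the driving force collapses towards the reversal potential, which is the qualitative shape the response appeals to.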
  
  As for the relative contributions of NMDARs and AMPARs, voltage clamp recordings from both neurogliaform and OLM interneurons have already been reported, with the conclusion that neurogliaform cells exhibit relatively larger NMDAR-mediated currents (e.g. Chittajallu et al. 2017; Booker et al. 2021; Mercier et al. 2022), entirely in keeping with the conclusions of our study. Repeating these measurements would add little to the study. Furthermore, because the mean baseline uEPSP amplitude was <0.5 mV (Fig S2), it would be difficult to obtain reliable measurements of isolated NMDAR-mediated uEPSCs.
  
  Turning to the high variability of supralinearity, indeed, the 95% confidence interval for the data in Fig. 2d is 73%, 213%. This degree of variability is consistent with the wide range of NMDAR/AMPAR ratios reported by Chittajallu et al. 2017 (their Fig. 1g), compounded by the expected non-monotonic relationship alluded to above.
  
  (2) Sublinear summation in NDNF-INs. In the presence of AP5, the temporal summation of caged EPSPs is sublinear. That is potentially interesting. The authors claim that this might be dependent on the diameter of dendrites. Many voltage-gated channels can mediate such things as well. To conclude the contribution of dendritic diameter, it would be helpful to at least plot the extent of sublinearity in single NDNF dendrites versus the dendritic diameter. Otherwise, this statement should be deleted.
  
  We have plotted the degree of nonlinearity against dendritic diameter for neurogliaform cells (under baseline conditions and in D-AP5) in Fig S2h-k. We did not observe any significant linear correlations, other than between amplitude nonlinearity and dendrite diameter post D-AP5. This does not negate the possibility that the significant difference in average dendritic diameters between neurogliaform and OLM cells contributes to differences in impedance (which we have rephrased as “Among possible explanations is that the local dendritic impedance is greater in neurogliaform cells, lowering the threshold for recruitment of regenerative currents”).
  
  (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The authors claim that this is due to the larger dendritic diameter in OLM cells. However, there is no analysis. The minimum would be to correlate non-linearity with dendritic diameter in OLM-cells. Very likely there is an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, the authors should perform a set of experiments in more distal dendrites of OLM cells with diameters similar to the diameters of the NDNF cells. Even better would be if the authors would quantify synapse density by counting spines and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.
  
  The difference in average dendritic diameters between OLM and neurogliaform cells is highly significant (Fig. 8q, P<0.001). We do not claim that dendritic diameter (and by implication local impedance) is the only determinant of the degree of non-linearity. The suggestion that a gradient of glutamate receptor density contributes is interesting. However, the results of uncaging experiments targeting more distal OLM dendrites of similar diameter as neurogliaform dendrites would be subject to numerous confounds, not least the very different electrotonic attenuation, likely differences in various active conductances, and the presence of spines in OLM dendrites (which are generally sparse and were not reliably imaged in our experiments). Moreover, the cell would have to remain patched for longer in order for the fluorescent dyes to invade the distal dendrites. This alone could potentially result in systematic biases among groups. We now cite Guirado et al (2014) and Gramuntell et al (2021) to highlight that factors other than dendritic diameter per se, such as inhomogeneity in spine and NMDA receptor density, may also contribute to the heterogeneity of nonlinear summation in OLM cells.
  
  (4) NMDA in OLM. Similar to the NDNF cells, the authors claim the involvement of NMDA receptors in OLM cells. Again there seems to be no dependence on EPSP amplitude, which is not understandable at this point (Figure S3). Even more remarkable is the fact that the authors claim that there is no dendritic calcium increase after activation of NMDA receptors. Similar to NDNF-cell analysis there are no NMDA currents in OLMs. Unfortunately, even no calcium imaging experiments were shown. Why? Are there calcium-impermeable NMDA receptors in OLM cells? To understand this phenomenon the minimum is to show some physiological signature of NMDA-receptors, for example, voltage-clamp currents. Furthermore, it would be helpful to systematically vary stimulus intensity to see some calcium signals with larger stimulation. In case there is still no calcium signal, it would be helpful to measure reversal potentials with different ion compositions to characterize the potentially 'Ca2+ impermeable' voltage-dependent NMDA receptors in OLM cells.
  
  The same response to point 1) above applies to OLM cells. As with neurogliaform cells, mean OLM baseline asynchronous (separate response) amplitudes were <0.5 mV, making it very difficult to record an isolated NMDAR-mediated uEPSC. Having said that, NMDARs do contribute to EPSCs elicited by stimulation of multiple afferents (e.g. Booker et al, 2021). We do not claim that dendritic calcium transients cannot be elicited following activation of NMDARs in OLM cells. We simply reported that the evoked uEPSPs, designed to approximate individual synaptic signals, were sub-threshold for detectable dendritic calcium signals under conditions that were suprathreshold in neurogliaform cells. The statement has been amended to specify that there were no detectable signals under our recording conditions. There is no evidence presented in the manuscript to suggest that OLM NMDARs are calcium impermeable and indeed no such claim was made.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  There is a large variability in the observed dendritic nonlinearity, in NDNF IN dendrites e.g. the uEPSP amplitude nonlinearity measure varies from as low as 10-20% to over 200%. As only single dendrites were recorded from each IN, it is unclear if this variability is among the cells or between individual dendrites. While the authors analyzed some potential factors, such as distance along the dendrites, branch order, or response magnitude (amplitude and integral), they did not find any substantial correlation. It remains open if different dendrites of NDNF INs, located in the str. moleculare vs. those in or projecting towards str. radiatum, have divergent properties. Similarly, for SOM INs an important question is if axon-carrying dendrites show distinct properties.
  
  In this context, it would be interesting to see not only values for the mean nonlinearity but also the maximal nonlinearity and its distribution.
  
  Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in Fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify the analysis method. We did not address specifically whether dendrites projecting in different directions behaved differently. This is an interesting question beyond the scope of this study. Nor did we compare axon-carrying OLM dendrites to other dendrites.
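
To make the idea of a cumulative measure concrete, here is a minimal sketch. It assumes, purely for illustration, that the nonlinearity at each step is the percentage deviation of the near-synchronous response from the arithmetic sum of the separate responses; the exact per-step formula is the one given in the manuscript's Methods, and the function name and numbers below are hypothetical.

```python
def cumulative_nonlinearity(observed, expected):
    """Sum the per-step percentage deviations of observed from expected.

    observed[k]: near-synchronous response with k+1 uncaging locations.
    expected[k]: arithmetic sum of the first k+1 separate responses.
    The per-step percentage formula is an assumption made for this sketch.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum(100.0 * (o - e) / e for o, e in zip(observed, expected))

# Hypothetical amplitudes (mV) for three uncaging locations, not real data.
expected_mv = [1.0, 2.0, 3.0]
observed_mv = [1.0, 2.5, 4.5]
total = cumulative_nonlinearity(observed_mv, expected_mv)
print(total)
```

With this definition a single dendritic segment contributes one number, which is what each line in a slopegraph would represent.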
  
  Figures:
  
  Figure 1: The gray line in plots g and h is not explained. While it looks like an identity line, the legend in plot i ("asynchronous") interferes.
  
  In plots g and h the gray line is the line of identity. In plot i it is an estimate of the linear summation. In plot i it is not the line of identity as it does not start at the origin with a slope of 1. The figure legend has been amended to clarify.
  
  In the same panels (Figure 1g,h, and subsequent figures) consider changing the title from "soma (voltage)" to uEPSP.
  
  The titles have been amended.
  
  In panel Figure 1i note the missing "(" in the title.
  
  Title amended.
  
  In panel Figure 1h: Shouldn't the X-axis label and legend text read "Arithmetic sum of (EPSP) integrals" instead of "Integral of arithmetic sum").
  
  The wording more accurately reflects the analytical operations. The asynchronous (separate) responses were summed arithmetically first, and then the integral was taken of each cumulative sum. We have therefore left the axis title and legend unchanged.
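
The order of operations described here can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' analysis code: traces are assumed to be baseline-subtracted lists of equal length sampled at a fixed interval `dt`, integration uses the trapezoidal rule, and the function names and toy traces are hypothetical.

```python
def trapezoid(trace, dt):
    """Trapezoidal integral of an evenly sampled trace."""
    return sum(0.5 * (a + b) * dt for a, b in zip(trace, trace[1:]))

def integrals_of_cumulative_sums(traces, dt):
    """Sum separate responses point by point, then integrate each running sum."""
    running = [0.0] * len(traces[0])
    integrals = []
    for trace in traces:
        running = [r + x for r, x in zip(running, trace)]  # arithmetic sum first
        integrals.append(trapezoid(running, dt))           # then the integral
    return integrals

# Two toy "separate" responses (arbitrary units), not real recordings.
toy_traces = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]]
x_axis_values = integrals_of_cumulative_sums(toy_traces, dt=1.0)
print(x_axis_values)
```

Each returned value corresponds to one point on an "integral of arithmetic sum" axis, one per number of uncaging locations included.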
  
  Figure 2a,c: Could you please describe how the scaling was performed for the two axes?
  
  Method section amended.
  
  In the same panels (Figure 2a,c, and subsequent figures), the legend seems to be misleading: the plot is NS Amplitude/Integral vs Arithmetic sum, and the black line is the identity line (or scaled interpolation of the arithmetic sum, which is essentially the same).
  
  The scaled arithmetic sums (uEPSP amplitude, integral) represent linear summation and so overlap with the line of identity. The interpolation estimate of the asynchronous (separate) calcium transient response does not overlap with the line of identity as this estimate does not start at the origin with a slope of 1. The legends throughout the manuscript have been amended to clarify this.
  
  Figure 2b,d,f (and subsequent figures) slope plots: Please indicate that this is the average amplitude supralinearity for the individual recorded dendrites. Note here that the Results text mentions only the average amplitude supralinearity, but not the slope plots, paired mean difference, or Gardner-Altman estimation, illustrated in the figures.
  
  Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in Fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify.
  
  Fig 2e: The legend (both text and figure, also in the following figures) is confusing, as the gray line and diamonds are defined as separate 12(?) responses, but it seems to represent a linear interpolation of the scaled arithmetic sums (ultimately nothing else but an identity line).
  
  The grey line shows the linear interpolation output between the calcium transient measurements at 1 uncaging location and at 12 uncaging locations. The 12th uncaging location is indicated in the key as “separate 12”. The linear interpolation in these plots does represent linear summation but is not the line of identity as it does not begin at the origin and does not have a slope of 1.
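
A minimal sketch of such an interpolation, assuming only that the two anchor points are the calcium transient measured at 1 uncaging location and at the 12th location; the function name and the numerical values are hypothetical and chosen for illustration.

```python
def interpolate_linear_summation(first, last, n_locations=12):
    """Linearly interpolate expected values between location 1 and location n."""
    step = (last - first) / (n_locations - 1)
    return [first + step * k for k in range(n_locations)]

# Hypothetical transient sizes (arbitrary units) at 1 and 12 locations.
expected = interpolate_linear_summation(first=0.5, last=2.7)

# Extrapolating back to zero locations gives the intercept of this line.
intercept = expected[0] - (expected[1] - expected[0])
print(expected)
print("intercept at 0 locations:", intercept)
```

In this toy case the line has a non-zero intercept and a slope that is not 1, so it represents linear summation without being a line of identity.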
  
  Reviewer #2 (Recommendations for the authors):
  
  This study is well-developed and technically executed. I only have minor comments for the authors:
  
  (1) To target NDNF+ neurons, the authors use the NDNF-Cre mouse line and a Cre-dependent AAV using the mDLX promoter. Why the mDLX promoter? Would it have been sufficient to use any Cre-dependent fluorophore?
  
  Pilot experiments revealed leaky expression when a virus driving flexed ChR2 under a non-specific promoter (EF1a) was injected in the neocortex of Ndnf-Cre mice (Author response image 1). In our hands, and in line with Dimidschstein et al (2016), the use of the mDLX enhancer reduced off-target expression.
  
  Author response image 1.
  
  A. AAV2/5-EF1a-DIO-hChR2(H134R)-mCherry injected into superficial neocortex of Ndnf-Cre mice led to expression in a few pyramidal neurons in addition to layer 1 neurogliaform cells. B. Patch-clamp recording from a non-labelled pyramidal cell showed that an optogenetically evoked glutamatergic current remained after blockade of GABAA and GABAB receptors, further confirming limited specificity of expression of ChR2. (Data from M Muller, M Mercier and V Magloire, Kullmann lab.)
  
  (2) The distance of the uncaging sites from the soma plays a key role. The authors should indicate the mean distance of the cluster and its variance.
  
  Uncaging distance from soma is indicated for both NGF and OLM interneurons in the supplementary figures S2 and S3 respectively.
  
  (3) Martina et al., in Science 2000, showed high levels of Na+ channels in the dendrites of OLM cells and hinted that spikes could occur in them. The authors should discuss this possible discrepancy.
  
  Discussion amended.
  
  (4) Looking at Figure 1d, the EPSPs look exceptionally long-lasting, longer than those observed by stimulating axonal inputs. Could this indicate spill-over excitation? If so, how could this affect the outcome of this study?
  
  The asynchronous (separate responses) decay to baseline within 100 ms, similar to the neurogliaform EPSPs evoked by electrical stimulation of axons in the SLM in Mercier et al. 2022. We observed clear plateau potentials in a minority of cells (e.g. Fig. S1b). Such plateau potentials can be generated by dendritic calcium channels and we do not consider that glutamate spillover needs to be invoked to account for them.
  
  (5) In the legend of Figure 2: "n=11 dendrites in 11 cells from 9 animals". Why do the authors only study 11 dendrites from 11 cells? Isn't it possible to repeatedly stimulate clusters of synaptic inputs onto the same cells? In principle, could one test many dendrites of the same cell at different distances from the soma? It is also remarkable that there were very few cells per animal.
  
  The goal was always to record from as many dendrites as possible from the same cells whilst maintaining high standards of cell health. When cell health indicators such as blebbing, input resistance change or resting voltage change were detected, no further dendritic location could be tested with reasonable confidence. In a given 400 µm slice there would be relatively few healthy candidate cells at a suitable depth to attempt to patch-clamp.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.12.579998v4
www.biorxiv.org

The PMA Phorbol Ester Tumor Promoter Increases Canonical Wnt Signaling Via Macropinocytosis

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  In this ms, Tejeda-Muñoz and colleagues examine the roles of macropinocytosis in WNT signalling activation in development (Xenopus) and cancer (CRC sections, cell lines and xenograft experiments). Furthermore, they investigate the effect of the inflammation inducer Phorbol-12-myristate-13-acetate (PMA) in WNT signalling activation through macropinocytosis. They propose that macropinocytosis is a key driver of WNT signalling, including upon oncogenic activation, with relevance in cancer progression.
  
  I found the analyses and conclusions of the relevance of macropinocytosis in WNT signalling compelling, notably upon constitutive activation both during development and in CRC.
  
  Thank you.
  
  However, I think this manuscript only partially characterises the effects of PMA in WNT signalling, largely due to a lack of an epistatic characterisation of PMA roles in Wnt activation. For example: 1- The authors show that PMA cooperates with 1) GSK3 inhibition in Xenopus to promote WNT activation, and 2) (possibly) with APCmut in SW480 to induce b-cat and FAK accumulation. To sustain a specific functional interaction between WNT and PMA, the effects should be tested through additional epistatic experiments. For example, does PMA cooperate with Wnt8 in axis duplication analyses? Does PMA cooperate with any other WNT alteration in CRC or other cell lines? Importantly, does APC re-introduction in SW480 rescue the effect of PMA? Such analyses could be critical to determine specificity of the functional interactions between WNT and PMA. This question could be addressed by performing classical epistatic analyses in cell lines (CRC or HEK) focusing on WNT activity, and by including rescue experiments targeting the WNT pathway downstream of the effects e.g., dnTCF, APC re-introduction, etc.
  
  We agree that there was a need for additional direct evidence of functional interactions between macropinocytosis, Wnt signaling, and PMA beyond the previously provided target gene assays in Xenopus (now shown in Figure 1I) and luciferase assays in cultured cells (Figure 1J) which used LiCl and inhibition by Bafilomycin. We therefore carried out a new experiment using 3T3 cells, now shown in Figure 1K-P. Wnt3a protein increased the uptake of TMR-dextran 70 kDa, and PMA enhanced this response. The macropinocytosis inhibitor EIPA blocked induction of macropinocytosis by Wnt3a and PMA. These results were quantitated in Figure 1Q. We think this new experiment strengthens the main conclusion that the tumor promoter PMA increases macropinocytosis. Thank you.
  
  2) While the epistatic analyses of WNT and macropinocytosis are clear in frog, the causal link in CRC cells is confined to b-catenin accumulation. While it is clear that macropinocytosis reduces spheroid growth in SW480, the lack of rescue experiments with e.g., constitutive active b-catenin or any other WNT perturbation and/or APC re-introduction, limits the conclusions of this experiment.
  
  We now provide new experiments in 3T3 cells treated with LiCl, overexpression of constitutively-active β-catenin and constitutively-active Lrp6 (Figure 4, panels I through L’’); the new results indicate that Wnt signaling activation increases protein levels of the macropinocytosis activator Rac1.
  
  Minor comments:
  
  3- Different compounds targeting membrane trafficking are used to rescue modes of WNT activation (Wnt8 vs LiCl) in Xenopus.
  
  The main goal of our experiments was to test the requirement of membrane trafficking for tumor promoter activity through the Wnt pathway. We therefore used PMA, and a variety of inhibitors such as EIPA (Na+/H+ exchanger, Figure 1I and Figure 3D), Bafilomycin A (Figure 1H), DN-Rab7 (Figure 3G) and EHT1864 (a Rac1 inhibitor, Figure 4G). One could argue that using a wide variety of membrane trafficking inhibitors is a plus.
  
  4- The abstract does not state the results in CRC/xenografts
  
  We have added a sentence to the abstract.
  
  5- Labels of Figure 2E might be swapped
  
  Thank you for detecting this error, we now label the last two columns in Figure 2E correctly.
  
  6- Figure 4i,j, 6 and s4 rely on qualitative analyses instead of quantifications, which underscores their evaluation. On the other hand, the detailed quantifications in Figure S3A-D strongly support the images of Figure 5
  
  The quantifications of the previous Figure 4I-J supported the data in the initial reviewed preprint, shown in Author response image 1:
  
  Author response image 1.
  
  However, these data have now been deleted from this version to make space for new experiments showing the stabilization of Rac1 by stabilized β-catenin and CA-LRP6. Quantifications in Figure 6C-F’’ are not shown because they represent changes in subcellular localization, but a western blot is provided in Figure 6B. Quantifications for Figure 6H-I’’ are shown in panel 6G. Supplemental Figure S4 already has 24 panels so introducing quantifications would be unwieldy.
  
  Thank you for the thoughtful comments.
  
  Reviewer #2 (Public Review):
  
  Tejeda Muñoz et al. investigate the intersection of Wnt signaling, macropinocytosis, lysosomes, focal adhesions and membrane trafficking in embryogenesis and cancer. Following up on their previous papers, the authors present evidence that PMA enhances Wnt signaling and embryonic patterning through macropinocytosis. Proteins that are associated with the endo-lysosomal pathway and Wnt signaling are co-increased in colorectal cancer samples, consistent with their pro-tumorigenic action. The function of macropinocytosis is not well understood in most physiological contexts, and its role in Wnt signaling is intriguing. The authors use a wide range of models - Xenopus embryos, cancer cells in culture and in xenografts and patient samples to investigate several endolysosomal processes that appear to act upstream or downstream of Wnt. A downside of this broad approach is a lack of mechanistic depth. In particular, few experiments monitor macropinocytosis directly, and macropinocytosis manipulations have pleiotropic effects that are open to alternative interpretations. Several experiments are confirmatory of previous findings; the manuscript could be improved by focusing on the novel relationship between PMA-induced macropinocytosis and Wnt signaling, and by better supporting these conclusions with additional experiments.
  
  New additional experiments focusing on the role of PMA are now provided.
  
  The authors use a range of inhibitors that suppress macropinosome formation (EIPA, Bafilomycin A1, Rac1 inhibition). However, these are not specific macropinocytosis inhibitors (EIPA blocks an Na+/H+ exchanger, which is highly toxic and perturbs cellular pH balance; Bafilomycin blocks the V-ATPase, which has essential functions in the Golgi, endosomes and lysosomes; Rac1 signals through multiple downstream pathways). A specific macropinocytosis inhibitor does not exist, and it is thus important to support key conclusions with dextran uptake experiments.
  
  We used a wide range of inhibitors because the main idea is to show that membrane trafficking is important in Wnt and PMA activity. We would like to point out that the current experimental definition in the field of macropinocytosis, despite any caveats, is the ability to block dextran uptake with EIPA. Because inhibitors may not be entirely specific, we think using a broad approach to target membrane trafficking might be a plus. We now provide in Figure 1K-Q a new experiment showing that Wnt3a protein treatment increases dextran uptake and PMA stimulates this macropinocytosis in 3T3 cells. EIPA inhibited dextran macropinocytosis in the presence of Wnt and PMA (Figure 1N and 1Q). We also provide a time-lapse video of the rapid induction of macropinocytic vesicles by PMA in SW480 CRC cells in which the plasma membrane is tagged (Supplemental Movie S1).
  
  The title states that PMA increases Wnt signaling through macropinocytosis. However, the mechanistic relationship between PMA-induced macropinocytosis and Wnt signaling is not well supported. The authors refer to a classical paper that demonstrates macropinocytosis induction by PMA in macrophages (PMID: 2613767). Unlike most cell types, macrophages display growth factor-induced and constitutive macropinocytic pathways (PMID: 30967001). It would thus be important to demonstrate macropinocytosis induction by PMA experimentally in Xenopus embryos / cancer cells. Does treatment with EIPA / Bafilomycin / Rac1i decrease the dextran signal in embryos? In macrophages, the PKC inhibitor Calphostin C blocks macropinocytosis induction by PMA (PMID: 25688212). Does Calphostin C block macropinocytosis in embryos / cancer cells? Do the various combinations of Wnts / Wnt agonists and PMA have additive or synergistic effects on dextran uptake? If the authors want to conclude that PMA activates Wnt signaling, it would also be important to demonstrate the effect of PMA on Wnt target gene expression.
  
  We now provide a new experiment demonstrating macropinocytosis induction by PMA in cancer cells. CRC SW480 cells, despite having a mutant APC, are able to respond to PMA by further increasing TMR-dextran 70 kDa uptake over background within 1 hour (now shown in Figure S1):
  
  Investigating PKC and Calphostin C is outside the goals of this paper. With respect to the final point on the effect of PMA on Wnt target gene expression, this was shown in the context of the Xenopus embryo in Figure 1I (Siamois and Xnr3 are direct targets of Wnt).
  
  Author response image 2.
  
  The experiments concerning macropinosome formation in Xenopus embryos are not very convincing. Macropinosomes are circular vesicles whose size in mammalian cells ranges from 0.2 - 10 µm (PMID: 18612320). The TMR-dextran signal in Fig. 1A does not obviously label structures that look like macropinosomes; rather the signal is diffusely localized throughout the dorsal compartment, which could be extracellular (or perhaps cytosolic). I have similar concerns for the cell culture experiments, where dextran uptake is only shown for SW480 spheroids in Fig. S2. It would be helpful to quantify the size of the circular structures (is this consistent with macropinosomes?).
  
  In response, we have deleted the TMR experiments in Xenopus embryos; they will be reinvestigated at a later time. With respect to macropinosome sizes in cultured cells, they are indeed large at the plasma membrane level (see new Supplemental Movie S1), but rapidly decrease in size once dextran is concentrated inside the cell. This can be visualized in the new experiments showing dextran vesicles in Supplemental Figure S1J-K and Figure 1K-P.
  
  In Fig. 4I - J, the dramatic decrease in b-catenin and especially in Rac1 after overnight EIPA treatment is rather surprising. How do the authors explain these findings? Is there any evidence that macropinocytosis stabilizes Rac1? Could this be another effect of EIPA or general toxicity?
  
  We now provide new evidence that Wnt signaling stabilizes Rac1. The old data relying on overnight EIPA treatment have been replaced by new experiments in 3T3 cells showing (i) that LiCl treatment increases Rac1 and β-catenin protein levels (Figure 4I-J’’), (ii) that cells transfected with constitutively active β-catenin-GFP have higher levels of Rac1 than control untransfected cells (Figure 4K-K’’) and (iii) that Rac1 is stabilized in cells transfected with CA-Lrp6-GFP when compared to untransfected cells (Figure 4L-L’’).
  
  On a similar note, Fig. 6 K - L the FAK staining in control cells appears to localize to focal adhesions, but in PMA-treated cells is strongly localized throughout the cell. Do the authors have any thoughts on how PMA stabilizes FAK and where the kinase localizes under these conditions? Does PMA treatment increase FAK signaling activity?
  
  The previous Figure 6K-L’’ panels are now found in Supplementary Figure S4, panels C-D’’. The result is that FAK is greatly stabilized by overnight incubation with PMA. How this is achieved is unknown, perhaps as a result of increased macropinocytosis, but we do not wish to speculate in the main manuscript. We have not measured FAK activity, but the FAK inhibitor PF-00562271 strongly decreased the β-catenin signaling induced by GSK3 inhibition (Figure 6J) and has strong effects in neural development that mimic inhibition of the early Wnt signal (new experiments shown in Figure 6K-L’’’). The results suggest that FAK activity affects Wnt signaling and dorsal development; the molecular mechanism of this interaction is unknown but worthy of future studies.
  
  The tumor stainings in Figure 5 are interesting but correlative. Pak1 functions in multiple cellular processes and Pak1 levels are not a direct marker for macropinocytosis. In the discussion, the authors discuss evidence that the V-ATPase translocates to the plasma membrane in cancer to drive extracellular acidification. To what extent does the V0a3 staining reflect lysosomal V-ATPase? Do the authors have controls for antibody specificity?
  
  It is true that Pak1 has multiple functions, yet it is essential for the actin machinery that drives macropinocytosis. We have now rephrased the discussion to say “Rac1 is an upstream regulator of the Pak1 kinase required for the actin machinery that drive macropinocytosis (Redelman-Sidi et al., 2018)”. We also explain that: “V-ATPase has been associated with acidification of the extracellular milieu in tumors (Capecci and Forgac, 2013; Hinton et al., 2009; Perona and Serrano, 1988). Extracellular acidification is probably due to increased numbers of lysosomes which are exocytosed, since V0a3 was located within the cytoplasm in advanced cancer or xenografts in mice (Figures 5I and S3I)”. The antibody we used for V0a3 is highly specific and has been used widely (Ramirez et al., 2019).
  
  Reviewer #3 (Public Review):
  
  The manuscript by Tejeda-Munoz examines signaling by Wnt and macropinocytosis in Xenopus embryos and colon cancer cells. A major problem with the study is the extensive use of pleiotropic inhibitors as "specific" inhibitors of macropinocytosis in embryos. It is true that BafA and EIPA block macropinocytosis, but they do many other things as well. A major target of EIPA is the NHE1 Na+/proton transporter, which also regulates invasive structures (podosomes, invadopodia) which could have major roles in development. Similarly, BafA will disrupt lysosomes and the endocytic system, with secondary effects on mTOR signaling and growth factor receptor trafficking. The authors cannot assume that processes inhibited by these drugs demonstrate a role of macropinocytosis. While correlations in tumor samples between increased expression of PAK1 and V0a3 and decreased expression of GSK3 are consistent with a link between macropinocytosis and Wnt-driven malignancy, the cell and embryo-based experiments do not convincingly make this connection. Finally, the data on FAK and TES are not well integrated with the rest of the manuscript.
  
  The criticism that drugs are not entirely specific is a valid one. Our approach of using a variety of drugs, such as EIPA, BafA, EHT1864 and the FAK inhibitor PF-00562271, points to the main conclusion that membrane trafficking is important in signaling by Wnt and in the action of the tumor promoter PMA. The data on FAK, TES and focal adhesions have been better integrated in the manuscript and new experiments on the effect of the FAK inhibitor on embryonic dorsal development are now provided (Figure 6K-L’’’).
  
  1) The data in Fig. 1A do not convincingly demonstrate macropinocytosis - it is impossible to tell what is being labeled by the dextran.
  
  In response, we have deleted the TMR-dextran experiments in Xenopus embryos; they will be reported at a later time.
  
  2) The data in Fig. 2 do not make sense. LiCl bypasses the WNT activation pathway by inhibiting GSK3. If subsequent treatment with BafA blocks the effects of GSK3 inhibition, then BafA is doing something unrelated to Wnt activation, whose target is the inhibition/sequestration of GSK3. While BafA might block GSK3 sequestration by inhibiting MVB function, it should have no effect on the inhibition of GSK3 by LiCl.
  
  We now explain, in the Results text describing Figure 2, that the initial effect of GSK3 inhibition by LiCl is to trigger macropinocytosis (Albrecht et al., 2020). If the downstream acidification of lysosomes is inhibited, then the brief treatment with LiCl (7 min at 32-cell stage) has no effect (LiCl 1st+BafA 2nd, Figure 2H). BafA inhibits lysosomal acidification at 32-cell stage resulting in ventralization, but the effect of brief BafA treatment can be reversed by inducing membrane trafficking by LiCl (BafA 1st+LiCl 2nd, Figure 2C). The labelling of the figure panels C and H has been modified to indicate this is an order-of-addition experiment. These order-of-addition experiments strongly support the proposal that endogenous lysosomal activity is required to generate the initial endogenous Wnt signal that takes place at the 32-cell stage of development (Tejeda-Muñoz and De Robertis, 2022a).
  
  3) The effect of EHT on MP in SW480 cells is not clearly related to what is happening in the embryos. The nearly total loss of staining for Rac and β-catenin after overnight EIPA does not implicate MP in protein stability - critical controls for cell viability and overall protein turnover are absent. Inhibition of WNT signaling might be expected to enhance β-catenin turnover, but the effect on Rac1 is surprising. A more quantitative analysis by western blotting is required.
  
  The results on EIPA inhibition in SW480 cells have been replaced in Figure 4 by new experiments in 3T3 cells providing evidence that Wnt signaling stabilizes Rac1. These show (i) that LiCl treatment increases Rac1 and β-catenin protein levels (Figure 4I-J’’), (ii) that cells transfected with constitutively active β-catenin-GFP have higher levels of Rac1 than control untransfected cells (Figure 4K-K’’) and (iii) that Rac1 is stabilized in cells transfected with CA-Lrp6-GFP when compared to untransfected cells (Figure 4L-L’’). In the original EIPA experiment in SW480 cells, now deleted from this version of the manuscript, we tested cell viability using a Vi-Cell Beckman-Coulter Viability Analyzer and found that cells were 96-98% viable but that proliferation was strongly decreased after 12 h of EIPA treatment. The effect of brief Rac1 inhibition (7 min) in decreasing dorsal development in embryos at the critical 32-cell stage is robust (Figure 4A-C). In addition, coinjection of EHT is able to entirely block the effects of microinjected xWnt8 mRNA (compare Figure 4E to 4G, see also Figure 4H), suggesting that Rac1 is required for Wnt signaling. Quantitative target gene expression analysis is provided for the embryo experiments (Figure 4C and 4H); for the stabilization of Rac1 by Wnt we are not providing quantitative measurements, but we found similar results with 3 independent approaches (LiCl, CA-β-catenin and CA-Lrp6).
  
  4) The data on FAK inhibition and TES trafficking are poorly integrated with the rest of the paper.
  
  We attempted to better relate the TES trafficking to our previous paper showing that canonical Wnt signaling induces focal adhesion and Integrin-β1 endocytosis. We now write in the results: “We have previously reported a crosstalk between the Wnt and focal adhesion (FA) signaling pathways. Wnt3a treatment rapidly led to the endocytosis of Integrin β1 and of multiple focal adhesion proteins into MVBs (Tejeda-Muñoz et al., 2022). FAs link the actin cytoskeleton with the extracellular matrix (Figure 6A), and we now investigated whether FA activity is affected by Wnt signaling, PMA treatment and CRC progression”.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The reliance on pleiotropic inhibitors is a weakness and should be supplemented by genetic approaches to inhibit macropinocytosis.
  
  We agree, but that would be outside of the scope of this study.
  
URL

biorxiv.org/content/10.1101/2023.06.02.543509v2
www.biorxiv.org www.biorxiv.org

Transdifferentiation of fibroblasts into muscle cells to constitute cultured meat with tunable intramuscular fat deposition

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This solid study investigates the transdifferentiation of chicken embryonic fibroblasts into muscle and fat cells in 3D to create whole-cut meat mimics. The study is important and provides a method to control muscle, fat, and collagen content within the 3D meat mimics and thus provides a new avenue for customized cultured meat production. Limitations of this study include the use of transgene for transdifferentiation and thus the creation of GMO food.
  
  We are grateful for the substantial effort that the editors and reviewers put into assessing our manuscript and providing insightful feedback. We have tried to address, as far as possible, all comments and criticisms, and we believe the manuscript is now significantly improved. A point-by-point response follows.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors presented here a novel 3D fibroblast culture and transdifferentiation approach for potential meat production with GelMA hydrogel.
  
  Strengths:
  
  (1) Reduced serum concentration for 3D chicken fibroblast culture and transdifferentiation is optimized.
  
  (2) Efficient myogenic transdifferentiation and lipogenesis as well as controlled fat deposition are achieved in the 3D GelMA.
  
  Weaknesses:
  
  (1) While the authors stated the rationale of using fibroblasts instead of myogenic/adipogenic stem cells for meat production, the authors did not comment on the drawbacks/disadvantages of genetic engineering (e.g., forced expression of MyoD) in meat production.
  
  We thank the reviewer for raising this important issue. We have now described this drawback in the Discussion.
  
  As a proof-of-concept study, we sought to explore the potential of an integrated transgene for overexpressing a transdifferentiation factor to achieve maximum muscle production. However, it is important to acknowledge that genetically modified meat products derived from genetically engineered cultured cells will not be well suited to consumer acceptance or market viability. We are currently testing non-integrating delivery methods, such as modRNAs and chemical cocktails, to induce myogenic transdifferentiation in fibroblasts. We believe these non-integrating methods would be compatible with meat production and consumer acceptance.
  
  Please see lines 439-445.
  
  “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products.”
  
  (2) While the authors cited one paper to state the properties and applications of GelMA hydrogel in tissue engineering and food processing, concerns/examples of the food safety with GelMA hydrogel are not discussed thoroughly.
  
  Thank you for pointing out this issue. We have discussed the drawbacks of using GelMA hydrogel in meat production in the main text.
  
  GelMA-based hydrogels have shown great potential due to their biocompatibility and mechanical tunability. GelMA is widely used in 3D cell culture and tissue engineering for regenerative medicine, but is less common in food processing and agricultural applications. Its photo-crosslinking properties, biocompatibility and degradability allow it to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for cultured meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider GelMA hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety-compliant cultured meat production (Bomkamp et al., 2022).
  
  Bomkamp, C., Skaalure, S. C., Fernando, G. F., Ben‐Arye, T., Swartz, E. W., & Specht, E. A. (2022). Scaffolding biomaterials for 3D cultivated meat: prospects and challenges. Advanced Science (Weinh), 9(3), 2102908.
  
  Jeong, D., Seo, J. W., Lee, H. G., Jung, W. K., Park, Y. H., & Bae, H. (2022). Efficient Myogenic/Adipogenic Transdifferentiation of Bovine Fibroblasts in a 3D Bioprinting System for Steak-Type Cultured Meat Production. Advanced Science (Weinh), 9(31), e2202877.
  
  Li, Y., Liu, W., Li, S., Zhang, M., Yang, F., & Wang, S. (2021). Porcine skeletal muscle tissue fabrication for cultured meat production using three-dimensional bioprinting technology. Journal of Future Foods, 1(1), 88-97.
  
  Park, S., Hong, Y., Park, S., Kim, W., Gwon, Y., Jang, K.-J., & Kim, J. (2023). Designing Highly Aligned Cultured Meat with Nanopatterns-Assisted Bio-Printed Fat Scaffolds. Journal of Biosystems Engineering, 48(4), 503-511.
  
  We discussed the drawbacks of GelMA hydrogel. Please see lines 445-457.
  
  “Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”
  
  (3) In Fig. 4C, there seems no significant difference in the Vimentin expression between Fibroblast_MyoD and Myofibroblast. The conclusion of "greatly reduced in the myogenic transdifferentiated cells" is overstated.
  
  Thanks for pointing out this mistake.
  
  We have revised the wording accordingly. Vimentin expression was reduced in Fibroblast_MyoD cells compared with the original fibroblasts.
  
  Please see lines 231-233.
  
  “The fibroblast intermediate filament Vimentin (Tarbit et al., 2019) was abundantly expressed in the fibroblasts but reduced in the myogenic transdifferentiated cells (Figure 4C)”
  
  (4) The presented cell culture platform is only applied to chicken fibroblasts and should be tested in other species such as pigs and fish.
  
  Thank you for the suggestion.
  
  In this pilot cultured meat study, we utilized chicken embryonic fibroblasts. These cells were chosen for their near-immortal nature and robustness in culture, as well as their inducible myogenic capacity. In our previous experiments (Ren et al, Cell Reports, 2022, 40:111206), we tested the myogenic transdifferentiation potential of fibroblasts from mice, pigs, and chickens, and observed varying efficiencies of myogenesis. It is important to note that fibroblasts derived from different species, or even from different tissues within the same species, can exhibit significant variations in their capacities for myogenic and adipogenic transdifferentiation.
  
  In this proof-of-concept study we used only one source of fibroblasts to test cultured meat production, and confirmed that myogenic/adipogenic transdifferentiation can be manipulated as a feasible means of precisely controlling muscle, fat and collagen content. We would expect fibroblasts of different origins to display different transdifferentiation efficiencies and thus to produce different muscle/fat ratios in meat mimics. Testing this is beyond the scope of the current study.
  
  Furthermore, we are also testing myogenic/adipogenic transdifferentiation of fibroblasts from pigs using non-integrating approaches. We believe that only non-transgenic tools are viable solutions for cultured meat production in the future. We have added the species information to the Discussion.
  
  See lines 515-517.
  
  “This approach can be readily extrapolated to other species such as pigs and presents promising avenues for the large-scale production of customized and versatile meat products that may cater to varying consumer preferences.”
  
  Reviewer #2 (Public Review):
  
  The manuscript by Ma et al. tries to develop a protocol for cell-based meat production using chicken fibroblasts as three-dimensional (3D) muscle tissues with fat accumulation. The authors used genetically modified fibroblasts which can be forced to differentiate into muscle cells and formulated 3D tissues with these cells and a biphasic material (hydrogel). The degrees of muscle differentiation and lipid deposition in culture were determined by immunohistochemical, biochemical, and molecular biological evaluations. Notably, the protocol successfully achieved the process of myogenic and lipogenic stimulation in the 3D tissues.
  
  Overall, the study is reasonably designed and performed including adequate analysis. The manuscript is clearly written with well-supported figures. While it presents valuable results in the field of cultivated meat science and skeletal muscle biology, some critical concerns were identified. First, it is unclear whether some technical approaches were really the best choice for cell-based meat production. Next, more careful evaluations and justifications would be required to properly explain biological events in the results. These points include additional evaluations and considerations with regard to myocyte alignment and lipid accumulation in the differentiated 3D tissues. The present data are very suggestive in general, but further clarifications and arguments would properly support the findings and conclusions.
  
  We thank the reviewer for these comments. We have performed additional experiments and analyses to address the critical questions. We have also revised the text extensively to clarify or discuss some of the concerns, such as cell alignment and the cellular distribution of intramuscular fat. We expect that the revised data and text adequately support the conclusions of the manuscript.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) In Figure 1, the authors used 1% chicken serum. Have the authors tested other lower concentrations? It will be interesting to see the lowest chicken serum concentrations in fibroblast culture and transdifferentiation;
  
  Thank you for your suggestion.
  
  Yes, we have tested lower serum concentrations, such as 1% FBS and 0.5% chicken serum. However, the cells are not in a healthy state at these low serum levels, as shown by abnormal cell morphology and almost no cell growth. Please see the revised Supplementary Figure S1D, to which we added the 1% FBS and 0.5% chicken serum data. Hence, 1% chicken serum is optimal in our hands. We will also test specialized serum-free media in future experiments.
  
  (2) In Figure 2, the authors should quantify the fold expansion of fibroblasts cultured in 3D gel after 1, 3, 5, and 9 days since this data is important for future meat manufacturing. In addition, long-term expansion (e.g., 1 month) in 3D gel should also be shown;
  
  Thank you for the question. We have quantified cell growth in 3D by measuring PKH26-stained cells. After the cells were seeded into the gel, they propagated exponentially from day 1 to day 9. The cell proliferation data provide a good reference for future meat manufacturing (Figure 2D). We attempted long-term expansion in 3D but were unable to measure cell proliferation, because the 3D gel always collapsed between days 12 and 15 of culture for unknown reasons: either the cells grew too densely and compromised the gel structure, or the gel matrix itself is not strong enough to last long-term. We believe the cells will grow well long-term if given enough 3D attachment surface, since they grow indefinitely in 2D. We will test different 3D matrices in the future.
  
  Please see the revised Figure 2D for the quantification of cells.
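  
  For readers who want to derive expansion figures from cell counts of the kind shown in Figure 2D, the sketch below shows one way to compute fold expansion and a doubling time under an assumed exponential growth model. The function names and the counts in the usage lines are hypothetical illustrations, not values from the study.

```python
import math


def fold_expansion(counts):
    """Return each count divided by the first count (fold change vs. day of first measurement)."""
    first = counts[0]
    return [c / first for c in counts]


def doubling_time(days, counts):
    """Estimate doubling time (in days) from a least-squares fit of log2(count) against day.

    Assumes growth is roughly exponential over the measured window.
    """
    logs = [math.log2(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs)) / sum(
        (x - mean_x) ** 2 for x in days
    )
    return 1.0 / slope  # slope is doublings per day


# Hypothetical counts at days 1, 3, 5 and 9 (made up; one doubling per day).
example_days = [1, 3, 5, 9]
example_counts = [100, 400, 1600, 25600]
example_fold = fold_expansion(example_counts)
example_td = doubling_time(example_days, example_counts)
```

  A log-linear fit uses all time points rather than only the first and last, which makes the estimate less sensitive to noise in any single measurement.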
  
  (3) In Figure 3, please also show MyoD staining as it'll be interesting to see the expression of exogenous and endogenous MyoD expression after dox treatment. In Figure G, the hydrogel meat seems very small, please show/discuss the maximum size of hydrogel meat that may be achieved using this approach;
  
  Thank you for this request. We performed immunostaining with anti-MyoD and anti-Flag antibodies to show, after dox treatment, the expression of total MyoD (exogenous and endogenous) and of exogenous MyoD alone. MyoD and 3xFlag were fused in-frame in the transgene plasmid, so anti-Flag staining indicates exogenous MyoD expression, while anti-MyoD staining indicates exogenous and endogenous MyoD expression together.
  
  As shown in Figure S4, almost 100% of cells were positive for MyoD staining, of which 60% expressed Flag. These data are consistent with our previous results (Ren et al., 2022, Cell Reports).
  
  Author response image 1.
  
  As for the size of the hydrogel-based cultured meat, we have discussed the possibility of scalable production of hydrogel-based whole-cut meat mimics. Please see lines 446-449. “Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters.”
  
  (4) In Figure 5 and Supplementary Figure 6, please quantify the Oil-red O+ fat cells in the 2D and 3D lipogenic induction. Also in Fig. 6B, quantify the oil-red+MHC+ cells;
  
  Thank you for this advice. We have quantified the Oil Red O-stained images in the Results section “Stimulate the fat deposition in chicken fibroblasts in 3D” using the ImageJ analysis software, and the quantification of Oil Red O area has been added to the corresponding graphs (Figure 5C, Figure S6C and S6F).
  
  However, due to the unique structure of the 3D matrix, many MHC+ and Oil Red O+ double-positive cells overlap with each other across different Z-stack layers in 3D. This overlap makes it challenging to accurately position and quantify the double-positive cells as the different layers interfere with each other.
  
  (5) In Figure 7, please show immunostaining images of collagen and other major ECMs;
  
  Thank you for this question. We tried to stain collagen networks with Picrosirius Red but were unsuccessful. Instead, we used laminin immunostaining to confirm that the ECM content in the 3D matrix increases steadily during cell culture.
  
  Please see Figure 7C. Lines 346-348.
  
  “the laminin protein content was accumulated and increased steadily during 3D culturation (Figure 7C) “
  
  (6) In Figure 8, please show hierarchical clustering analysis of whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI. A Venn Diagram showing the overlap and distinct gene expression among these groups is also appreciated.
  
  Thank you for the suggestion.
  
  We added a hierarchical clustering analysis of the whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI, using Euclidean distance with the ward.D clustering method. The result showed that these groups formed two large clusters, in which 3D+FI clustered separately while 3D_fibroblasts, 3D_MyoD and 3D_MyoD+FI were more similar to one another. Please see Figure 8B.
  
  As the reviewer suggested, we also compared the transcriptomes of 3D_MyoD, 3D+FI, and 3D_MyoD+FI with that of the original 3D_fibroblasts to identify differentially expressed genes (DEGs), and then analyzed the overlapping and distinct DEGs. As shown in the Venn diagram in Figure 8D, the majority of DEGs from 3D_MyoD+FI (3D_MyoD+FI versus 3D_fibroblasts) overlap with those from 3D_MyoD and 3D+FI, indicating that 3D_MyoD+FI cells are compatible with both myogenic and adipogenic function.
  
  Please see the revised Figure 8.
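  
  The two analyses described in this response can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it uses SciPy's `ward` linkage on Euclidean distances, which to our understanding is closer to R's `ward.D2` than to the `ward.D` method named above, so the resulting tree may differ in detail. The expression values and gene names are made-up placeholders, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_samples(expr, n_clusters=2):
    """Cluster samples (rows of a samples x genes matrix) hierarchically.

    Uses Euclidean distance with SciPy's Ward linkage (an assumption; see lead-in).
    Returns one integer cluster label per sample.
    """
    distances = pdist(expr, metric="euclidean")
    tree = linkage(distances, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def deg_overlap_fraction(deg_sets, target):
    """Fraction of the target group's DEGs that also appear in any other group."""
    others = set().union(*(genes for name, genes in deg_sets.items() if name != target))
    shared = deg_sets[target] & others
    return len(shared) / len(deg_sets[target])


# Made-up toy expression matrix: rows are samples, columns are genes.
sample_names = ["3D_fibroblasts", "3D_MyoD", "3D+FI", "3D_MyoD+FI"]
toy_expr = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [10.0, 10.0, 10.0],
        [1.0, 1.0, 0.0],
    ]
)
toy_labels = cluster_samples(toy_expr, n_clusters=2)

# Made-up DEG sets (each versus 3D_fibroblasts); gene names are placeholders.
toy_degs = {
    "3D_MyoD": {"geneA", "geneB"},
    "3D+FI": {"geneC", "geneD"},
    "3D_MyoD+FI": {"geneA", "geneB", "geneC", "geneE"},
}
toy_fraction = deg_overlap_fraction(toy_degs, "3D_MyoD+FI")
```

  Cutting the tree into two clusters mirrors the "two large clusters" reading of a dendrogram, and the overlap fraction is the quantity a Venn diagram conveys visually.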
  
  Reviewer #2 (Recommendations For The Authors):
  
  In this study, the authors demonstrated a new approach for cultivated meat production using chicken fibroblasts. Specifically, the cells were cultured as 3D and induced muscle differentiation and lipid deposition. The manuscript contains a good set of data, which would be valuable to researchers in the fields of both cell-based meat and skeletal muscle biology. From the aspect of cultivated meat science, the rationale behind the idea is understandable, but it remains unclear whether the proposed approach was really the best choice to achieve their final goal. On the other hand, when we read this manuscript as a paper in skeletal muscle biology, the overall approach was not innovative enough and several uncertain issues remain. The authors should add more sufficient justifications, arguments, and discussions.
  
  (1) When considering their goal to produce edible meat products, the current approach has some concerns. First, there are issues with the approach used for the induction of myogenesis by MyoD transgene. This makes the end products GMO foods, which are not easily acceptable to a wide range of consumers. Next, the hydrogel was used for 3D tissue formation, but it is unclear whether this matrix type is edible, safe, and bio-comparable for cell-based meat production. The authors already discussed these points by excusing that the current work remains proof-of-concept. However, more careful considerations and justifications would be required.
  
  Thank you for the suggestion.
  
  We acknowledge that the current transgene-based myogenic induction method is not suitable for mass production of cultured meat because of GMO food concerns. We used the MyoD transgene for myogenic transdifferentiation in the first place because of its ease of genetic manipulation and maximum efficiency. We are currently testing non-integrating tools, such as chemical cocktails and modified RNAs, for myogenic transdifferentiation.
  
  When it comes to the applications of hydrogel in the food industry, certain types of hybrid hydrogels, such as those made from pectin or sodium polyacrylate, are not only edible but also safe for consumption. While GelMA hydrogel is typically utilized in tissue engineering and subsequent implantation in patients for therapeutic regenerative medicine purposes, it has not been commonly employed in food processing. In this study, we cultivated cells within GelMA hydrogel due to its durability and ease of use in cell culture. Moving forward, we plan to investigate alternative types of matrices to develop cultured meat suitable for food applications.
  
  We have now described the GMO and hydrogel drawbacks in the discussion part. Please see lines 439-457.
  
  “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products. Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”
  
  (2) From the view of skeletal muscle biology, the approaches (MyoD overexpression, hydrogel-based 3D tissue formation, and lipogenic induction) have already been tested.
  
  Thank you for the insightful comments from the perspective of skeletal muscle cell biology. We fully agree that the current approaches, including MyoD overexpression, 3D cell culture and lipogenic induction, are routine experiments in muscle cell biology. However, we want to highlight that using these classical and robust approaches, combined with the unique advantages of fibroblasts (easily accessible, immortalized, cost-effective, ...), provides a novel and practical avenue for cultured meat production. We have addressed these points in the Discussion of the revised manuscript.
  
  Please see lines 511-515.
  
  “In conclusion, we have effectively utilized immortalized chicken fibroblasts in conjunction with classical myogenic/adipogenic transdifferentiation approaches within 3D hydrogel to establish a cultured meat model. This model allows for the precise regulation of the synthesis of key components found in conventional meat, including muscle, fat, and ECM.”
  
  (3) The common emphasis in this manuscript is to use the advantages of 3D culture for tissue differentiation. As the authors described, skeletal muscle is a highly aligned tissue. In this study, some results successfully demonstrated advantages in terms of myocyte alignment, maturation, and lipid deposition. However, the current results cannot address whether the entire 3D tissues maintained these advantageous characteristics or not, because the method for 3D formation does not have any additional modifications to make the cells aligned, like micropatterning, scaffolding, or bioprinting.
  
  Thank you for the suggestion.
  
  We agree with the reviewer that skeletal muscle tissues are composed of well-organized, directional bundles of fibers, and that cell alignment greatly affects meat tenderness and sensory properties. Alignment of the cells in the cultured meat matrix is therefore a desirable attribute. However, achieving this alignment would require sophisticated biomaterial engineering, mainly involving scaffold manipulation, which is beyond the scope of this study. The hydrogel used in this study formed pores of different sizes in random directions, so we would expect the embedded cells to be entirely non-directional. Nevertheless, we found localized cell alignment in some parts of the gel matrix, confirming cell-cell interactions (please see Figure 3D). We describe this feature in the Results. In the future, we will test the application of physical or electrical stimulation to the matrix to see whether all the muscle cells in the whole matrix can be made to align together.
  
  Please see lines 186-190.
  
  “The separate XY axis views of the orthogonal projections at different depths (Figure 3D) and a multi-angle video (Supplementary Video 2) also showed the several myotubes were aligned together. Nevertheless, many myotubes were oriented in different directions, preventing the entire matrix from aligning in one direction.”
  
  (4) In the skeletal muscle, fat accumulation mainly occurs in adipocytes between myocytes. This means that "intra-" muscular fat deposition is identified. However, lipid deposition within myocytes also occurred in this preparation (Supplementary Figure 7C). This situation is not "intra-" muscular accumulation, which sounds different from what is going on in normal skeletal muscle tissues. Please explain what happened and what biological situations accounted for this. Also, the authors should clarify better how lipogenesis was induced in the 3D tissues, such as cell types (transdifferentiated myocytes, remained/un-transdifferentiated fibroblasts, or both).
  
  Thank you for the very insightful question. We have revised the corresponding text to further explain the intramuscular fat distribution in different cell types in cultured meat.
  
  We fully agree with the reviewer that intramuscular fat accumulation occurs mainly in intramuscular adipocytes. However, under some pathological and physiological conditions in humans and animals, lipid droplets are also abundantly observed inside myofibers (intramyocellular lipids within the myofiber cytoplasm). For instance, high intramyocellular lipid content has been found in insulin-resistant patients and, paradoxically, in endurance-trained athletes (doi.org/10.1016/j.tem.2012.05.009), as well as in some farm animals under intensive selective breeding (doi:10.2174/1876142910901010059). In the current study, using Oil Red O staining of lipid droplets, we identified lipid deposition in both the transdifferentiated myocytes and the remaining un-transdifferentiated fibroblasts in the cultured meat. This lipid distribution pattern is comparable to the intramuscular fat storage pattern observed in some humans and animals, in which fat accumulates both in myofibers (intramyocellular lipids) and in intramuscular adipocytes (extramyocellular lipids), which reside within the muscle tissue bundle but between myofibers. We reason that the current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and the un-transdifferentiated fibroblasts. It is difficult to compare the absolute amount of lipids between these two cell types via Oil Red O staining, and it is almost impossible to separate the two cell types from the 3D meat mimics. Thus, we can only confirm that lipid deposition occurs in both transdifferentiated myocytes and un-transdifferentiated fibroblasts, without knowing which is dominant and the major contributor to the intramuscular fat content of the cultured meat.
  
  Please see lines 486-492.
  
  “In this study, the deposition of fat in the myotubes/myofibers facilitated the storage of significant lipid quantities in transdifferentiated muscle cells, known as intramyocellular lipids. Additionally, we observed Oil Red O staining in the remaining un-transdifferentiated fibroblasts, resembling cells of intramuscular adipocytes (extramyocellular lipids) found within muscle tissue. Hence, current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and un-transdifferentiated fibroblasts.”
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.10.26.564179v2
www.biorxiv.org

Tissue-resident NK cells support survival in pancreatic cancer through promotion of cDC1-CD8T activity

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors demonstrate that the immunosuppressive environment in pancreatic ductal adenocarcinoma (PDAC) can be mitigated by a combination of ionizing radiation (IR), CCR5 inhibition, and PD1 blockade. This combination therapy increases tissue-resident natural killer (trNK) cells that facilitate CD8 T cell activity, resulting in a reduction of E-cadherin positive tumor cells. They identify a specific "hypofunctional" NK cell population in both mouse and human PDAC that supports CD8 T cell involvement. A trNK signature is found to be associated with better survival outcomes in PDAC and other solid tumors.
  
  Strengths:
  
  Overall, I think this is an interesting study that combines testing of therapeutic concepts in mice with bioinformatics analysis of single-cell transcriptome data in primary tumors and exploration of clinical outcomes using signature genes in TCGA data. The key finding is that immunoregulatory properties of tumor-infiltrating/resident CD56-bright NK cells (assumed to be non-cytotoxic) are beneficial for outcome through cross-talk with DC and recruitment of CD8 T cells. The latter is specifically induced by irradiation combined with CCR5i and PD1 blockade.
  
  "These results collectively support the notion that IR/CCR5i/αPD1 combination treatment alters immune infiltration by reducing Tregs and increasing NK and CD8 T cells, thereby resulting in greater local tumor control." I agree with this conclusion.
  
  Weaknesses:
  
  There are a few points to discuss and that the authors may want to address.
  
  (1)   "Notably, CCR5i significantly reduced Treg infiltration but had no effect on the infiltration of other immune cells, indicating the active recruitment of CCR5+ Tregs in PDAC (Figure 2B)."
  
  CCR5i treatment seems to inhibit infiltration of CD8 T cells and NK cells to a greater extent, in relative terms, compared to Treg, albeit it is not statistically significant. If this visual inspection of the graph does not reflect reality, additional experiments may be needed to verify the selective targeting of Tregs or confirm the fact that also CD8 T cells and NK cells are affected by single agent CCR5i. The reduced recruitment of Treg, NK cells, and CD8T cells was completely reversed when combined with irradiation. In the data shown in Figure 3E it seems as if CCR5i induced infiltration of Tregs along with other immune cells. However, this said, I agree with the conclusion of the authors that this combined treatment leads to an altered immune composition and ratio between Tregs and effector cells (CD8T cells and NK cells). Could this altered composition be displayed more clearly?
  
  We would like to thank the reviewer for their comments and agree that there is a trend for reduced NK and T-cell infiltration during CCR5i standalone treatment (as seen in Figure 2B), although it does not reach significance. To reflect this more clearly, we have added n.s. (non-significant) for the NK cells and CD8+ T-cells and adjusted the text to reflect a trend for decreased NK and CD8+ T-cell infiltration (see Lines 162-165). Moreover, to reflect the data accurately, we have taken the Treg data out of the original Figure 2B and present it separately as a percentage of CD45+CD3+ T-cells.
  
  (2) The definition of active and hypofunctional NK cells based solely on NKG2D expression seems like an oversimplification. I realize it is not trivial to test tumor-infiltrating NK cells from these tumors functionally but perhaps scRNAseq of the tumors would allow for characterization of cytotoxicity scores using KEGG or GO analysis or reversed gene set enrichment in responders/non-responders.
  
  We agree that scRNA-seq of tumors would add to the overall characterization of the tumor-infiltrating NK cells; however, we are unfortunately not currently in a position to carry out this experiment. We did, however, immunophenotype the tumor-infiltrating NK cell population in more depth by also looking at NKp46 and NKG2D surface expression. These newly added data demonstrate not only increased infiltration of “bona-fide” trNK cells (based on surface expression of CD103+CD49a+) under the triple treatment combination but, more importantly, that these trNK cells have reduced levels of CD69, NKp46, and NKG2D and increased TIM-3 surface expression compared to conventional NK cells, suggesting that these trNK cells could be more hypoactive than conventional NK cells. These data have been added to the manuscript as Figure 4E, F; Figure supplement 4E-G and Lines 244-260 in the revised manuscript. To clarify this difference, we have replaced the word “hypofunctional” with “hypoactive” throughout the manuscript.
  
  (3) It seems as if the abstract refers to this phenotype incorrectly since the "hyporesponsive" subset is described as NKG2C-negative.
  
  We apologize for the typographic confusion and have corrected our abstract and changed the subset to NKG2D-negative (as was intended).
  
  (4) "The NK_C1 cluster correlates best with the hypofunction NK phenotype observed in mice as similarly displayed reduced activation (reduced NKG7, NKp80, GZMA, and PRF1) with additional expression of tissue residency markers CD103, CD49a and, surprisingly, the adaptive activating receptor NKG2C (KLRC2) (Figure 5B, C)."
  
  There is no doubt that NK_C1 represents tumor-infiltrating NK cells with a CD56bright gene signature with a strong tissue resident score. However, the transcriptional expression of KLRC2 on these is not surprising! It is well established that KLRC2 transcripts (but not protein) are highly expressed on conventional CD56bright NK cells. There are several published sources where the authors can find such data for confirmation. Thus, this is not to be confused with adaptive NK cells having an entirely different transcriptional signature and expressing high levels of NKG2C at the cell surface. I strongly recommend reinterpreting the results based on the fact that KLRC2 is expressed at high levels in conventional CD56bright NK cells. If not, it would be important to verify that these tissue-resident NK cells express NKG2C and not NKG2A at the cell surface.
  
  We agree with the reviewer and have modified the text accordingly in the revised manuscript (Lines 279-283), including references to tissue-resident adaptive-like cells as described previously in literature.
  
  (5) NCAM1 transcript alone is not sufficient to deconvolute CD56bright NK cells in TCGA data (Figure 7A). As a single marker, it likely reflects NK cell infiltration without providing further evidence on the contribution of the bright/dim components. Therefore, the use of the bright Tr NK signature described in Table 1 is very important (Figure 7B). Neither Table 1 nor Supplementary Table 1 is provided. There is only one supplementary figure in the ppt attached.
  
  We agree that a high NCAM1/CD56 single gene signature could also represent NK cell infiltration. We have rephrased this in the text accordingly (Lines 354-357). We apologize for the missing tables and Supplementary figures. We have added these now to the manuscript as Supplementary table 1.
  
  Reviewer #2 (Public Review)
  
  Summary:
  
  This work elaborates on a combined therapeutic approach comprising ionizing radiation and CCR5i/αPD1 immunotherapy as a promising strategy in pancreatic cancer. Previous research has established that NK cell-derived CCL5 and XCL1 play a crucial role in recruiting cDC1 cells to the tumor microenvironment, contributing to tumor control. In this study, by using a murine pancreatic cancer model, the authors propose that the addition of radiation therapy to CCR5i and αPD1 immunotherapy could upregulate CD8+ T cells and a subgroup of NK cells within the tumor and result in better tumor control. They further analyzed human single-cell sequencing data from pancreatic cancer patients and identified one subgroup of NK cells (NK C1) with tissue-resident features. Subsequent cell-cell contact analysis reveals the NK-cDC1-CD8 cell axis in pancreatic cancer. By analyzing TCGA data, they found that high NK C1 signature levels were associated with better survival in pancreatic cancer patients. Thus, radiotherapy could benefit the outcome of patients bearing low NK C1 signatures. Importantly, the positive correlation between NK C1 score with survival extends beyond pancreatic cancer, showing potential applicability across various solid cancers.
  
  Strengths:
  
  This study could add new insight into the clinical practice by introducing such novel combined therapy and shed light on the underlying immune cell dynamics. These findings hold potential for more effective and targeted treatment in the future. Mouse experiments nicely confirmed that such combined therapy could significantly reduce tumor volume. The elegant use of single-cell sequencing analysis and human database examination enriches the narrative and strengthens the study's foundation. Additionally, the notion that NK C1 signature correlates with patient survival in various solid cancers is of high interest and relevance.
  
  Weaknesses:
  
  The role of CCR5i requires further clarification. While the authors demonstrated its capacity to reduce Treg in murine tumors, its impact on other cell populations, including NK cells and CD8+ T cells, was not observed. Nevertheless, the effect of CCR5i on tumor growth in Figure 2B should be shown. If the combination of radiotherapy and αPD1 already can achieve good outcomes as shown in Figure 3A, the necessity to include CCR5i is questioned. Overall, a more comprehensive elucidation of the roles of CCL5 and CCR5i in this context would be good.
  
  We would like to thank the reviewer for their comments and agree that standalone CCR5i also shows a trend of reduced infiltrating NK cells and CD8+ T-cells, although this does not reach significance. We have mentioned this trend in the manuscript (see Lines 162-165) and added n.s. to Figure 2B as well. Regarding the addition of CCR5i: although we observe volumetric control by radiotherapy and anti-PD1, we observe an increase in necrosis induction only in the triple combination compared to radiotherapy combined with anti-PD1, suggesting that there is an additive effect of CCR5i in our model only as a combination modality. We therefore believe that the addition of CCR5i to radiotherapy and anti-PD1 has a beneficial effect. The growth curves for CCR5i alone were already presented in Figure 3A, and we have modified our manuscript to refer to this (see Lines 165-167).
  
  (1) In line with this, spatial plots in Figure 4 did not include the group with only radiotherapy and αPD1. This inclusion would facilitate a clearer comparison and better highlight the essential role of CCR5i.
  
  We agree with the reviewer that inclusion of radiotherapy and αPD1 would facilitate a clear comparison of our data and our experiments did include single controls for radiotherapy and αPD1; however, unfortunately, the tissue slides were of bad quality and therefore not suitable for quantification. In line with this, we have added references to other studies that investigated the effect of immune checkpoint inhibitors in combination with radiotherapy (see Lines 169-172).
  
  (2) NK C1 cells should also be analyzed in the mouse model. The authors suggest that NKG2D-negative (NKG2D-ve) NK cells could be the cell population. Staining of inhibitory markers should be considered, for example, TIGIT and TIM3 as presented in Figure 5B.
  
  As per the reviewer’s suggestion, we have now included additional data on the surface expression of inhibitory markers and activating receptors on tumor-infiltrating NK cells in our model under the triple combination. These additional data demonstrate increased infiltration of trNK cells under the triple combination that seem to be more ‘hypoactive’ than conventional NK cells. These data have been added as Figure 4E in the revised manuscript.
  
  (3) While the cell-cell contact analysis generated from single-cell sequencing data is insightful, extending this analysis to the mouse model under therapy would be highly informative. NK and CD8 cells in the tumor increased upon the combined therapy. However, cDC1 was not characterized. Analysis regarding cDC1 would provide more information on the NK/cDC1/CD8 axis.
  
  We agree that looking into cDC1 would be highly interesting in our treatment model, and its characterization is currently under investigation. The importance of the interaction between cDC1 and NK cells has been described before by various groups, and we have provided additional references for that in our manuscript (see Lines 449-455).
  
  (4) Human database analysis showed a positive correlation between NK C1 score and CCL5 in pancreatic cancer. Furthermore, radiotherapy could benefit the outcome of patients bearing low NK C1 scores. It would be interesting to test if radiotherapy could also benefit patients with low CCL5 levels in this cohort.
  
  We would like to thank the reviewer for their suggestion; please see the figure below for the comparison. Patients with CCL5high are enriched for NK_C1 (Figure 7D), and CCL5high patients with NK_C1high have significantly increased overall and disease-free survival compared to NK_C1low (Figure 7E), where those with NK_C1low significantly benefit from radiotherapy (Figure 7B). Accordingly, patients with CCL5high have significantly decreased overall survival compared to CCL5low patients, again confirming CCL5 as a prognostic marker (Figure 1A, Figure R1). When we look at CCL5low patients, however, there is no additional significant benefit from radiotherapy (see insert below) in the CCL5low group (not significant; only significant p-values are shown). These data collectively support the strong correlation between CCL5 levels and NK_C1 enrichment, and imply that radiotherapy alone is insufficient to drive NK_C1 cells in the absence of high CCL5 gradients to improve overall survival. However, given the increased overall survival of CCL5low compared to CCL5high patients, it is likely that other factors are at play. Future studies will be required to further elucidate the role of CCL5 gradients on NK_C1 cells and the beneficial effect of radiotherapy.
  
  Author response image 1.
  
  Overall survival of CCL5high versus CCL5low patients stratified into groups with and without radiotherapy using TCGA-PAAD. Log-rank p-value indicates the significance level across all groups while individual significant comparisons are shown as indicated.
  
  Reviewer #3 (Public Review):
  
  Summary
  
  In the submitted manuscript by Go et al, the authors evaluated the tumor microenvironment in pancreatic ductal adenocarcinoma (PDAC) and made a number of interesting observations, including the following: 1) CCL5 expression within the tumor microenvironment negatively correlated with clinical outcomes in human patients with PDAC; 2) there were both positive and negative correlations between CCL5 expression and the expression of specific genes (e.g. those encoding CD56 and CD16, respectively) included among gene signature lists for Treg, MDSC, TAM, and NK cells; 3) CCR5 inhibition with the inhibitor, maraviroc, reduced Treg infiltration but not that of other immune cell types in an orthotopic murine model of PDAC; 4) CCR5 inhibition augmented anti-PD1 immunotherapy when combined with ionizing radiation (IR) therapy in the murine model; 5) the above therapy resulted in increased infiltration of CD8+ cytotoxic T cells as well as of a subset of NKG2D-negative, tissue-residency (tr) marker expressing NK cells (deemed Cluster 1 NK in their data sets) that inversely correlated with the number of E-cadherin+ cells (i.e. tumor cells) and showed predicted interactions with cDC1 dendritic cells (including XCL1/XCL2 expressed by the NK and XCR1 expressed by the cDC1); 6) the authors identified a number of putative signals stemming from the trNK (e.g. IL-16, TNFSF14, FASLG, CSF, MIF) as well as incoming from cDC1s to NK (e.g. BAG6-NKp30); 7) these trNK cells positively correlated with good outcomes and with CD8+ T cell infiltrations in human PDAC as well as in many other solid tumor types; and 8) importantly, the benefit of IR therapy was specific to the subset of PDAC patients (represented in the TCGA dataset) that were predicted to have low amounts of trNK cells. The authors used murine experimental models, multiplexed imaging analyses, and a number of publicly available sequencing data sets from human tumor samples to perform their investigations. Based on their findings, the authors proposed that combining IR with CCR5 inhibition and anti-PD1 immunotherapy is a promising strategy to treat solid cancers.
  
  Strengths
  
  Overall, the collective analyses and conclusions appear to be novel and could be of high and rapid impact on the field, particularly in terms of directing clinical trials to incorporate IR with CCR5 inhibition and immunotherapy. The manuscript is well written; the figures are for the most part clear; and the Discussion is very thoughtful.
  
  Weaknesses
  
  There were a number of minor typographical errors, missing references, or minor issues with the figures. In general, while many of the observations provided strong suggestive evidence of relationships, phenotypes, and functions, the authors often used language to indicate that such things were confirmed, validated, or proven. In fact, there was a paucity of such functional/confirmatory experiments. This does not necessarily detract from the overall significance, excitement for, and potential impact of the study; but the language could likely be adjusted to be more in keeping with the true nature of the findings. The main title and running title are a bit different; consider making them more similar.
  
  We apologize for the typographical errors, missing references, and issues with the figures. We have revised our manuscript, with a major focus on adjusting our language to more carefully reflect our data, and hope to have addressed all the concerns of the reviewer. The slight discrepancy between the main title and the running title is intended to convey the contents of this manuscript comprehensively.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Please make sure all files are made available. Also please check available datasets describing KLRC2 transcripts in CD56brights. This is not to be confused with an adaptive-like signature.
  
  We have added the missing table to the supplementary figures and revised the manuscript text with regard to the KLRC2 transcript in our NK_C1 cluster and its implications for an adaptive-like signature in the context of tissue residency (see Lines 279-283; 465-474).
  
  Reviewer #2 (Recommendations For The Authors):
  
  Additional experiments as mentioned in the 'weakness' section could help to further strengthen this study. Besides these points, I would recommend the following:
  
  (1) The description in the figure should be more precise and clear. Especially in Figure 3A, it seems the addition of IR into CCR5i or CCR5i/aPD1 leads to a bigger tumor volume.
  
  We have adjusted the figure descriptions to describe the figures more clearly. We apologise for the confusion in Figure 3A; this was a figure legend error and has been corrected in the revised Figures (i.e. closed symbols represent +IR conditions).
  
  (2) The definition of Tregs in figures should be described, e.g. it is not specified which population is shown in Figure S2c.
  
  We have added a definition of Tregs (i.e. Live/CD45+CD3+CD4+FOXP3+) in our revised manuscript (see Lines 162-165). To avoid confusion, we have removed the subsequent gating of CCR5 and PD-1 of Tregs in our revised Supplementary Figures.
  
  (3) Please add a bar in all histology figures, for example, Figure 2A, S2A, S3E. It seems in Figure S3D, E, the green group is missing.
  
  We have added scale bars to all the indicated figures. Unfortunately, as the reviewer correctly pointed out, the green group (i.e. IR+CCR5i) is missing: we felt that the excessive growth seen with CCR5i alone may have given a false impression of the extent of infiltration, so we did not include this group in the original analysis and do not have the data for the figure.
  
  (4) Please check through the manuscript, there are some grammar mistakes.
  
  We apologise for the grammar mistakes in our original manuscript and have carefully revised the current manuscript to correct them.
  
  (5) Figure S7B, the left cell lacks a name.
  
  We have annotated the left cell accordingly in our revised supplementary figure.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) Abbreviations (e.g. PDAC) should be spelled out the first time introduced in the manuscript.
  
  We have adjusted this in our revised manuscript.
  
  (2) Referring to the tissue-resident NK cells as "hypofunctional" may not be useful...they seem to be functional, just not in the conventional sense. The authors may want to consider another term, such as non-cytotoxic (given the low expression of cytolytic granules, etc) or immunoregulatory (as they actually refer to them on line 310).
  
  We agree with the reviewer and have revised the manuscript to refer to them as “immunoregulatory” or “hypoactive” when appropriate. The latter is supported by the additional experiments as shown in Figure 4E.
  
  (3) Barry et al 2018 Nat Med demonstrated that NK cells in melanoma could support cDC1s and promote positive clinical outcomes in the setting of immunotherapy. It would likely be beneficial to also cite this paper (e.g. on line 425).
  
  Thank you for the suggestion, which is in line with our hypothesis of crosstalk between NK_C1 and cDC1. We have looked for FLT3L in our NK_C1 cluster and did not find any enrichment for FLT3L transcript (see Figure 5E). Nevertheless, we have added the reference in the discussion of our manuscript to further support the importance of crosstalk between cDC1 and NK cells (see Lines 449-455).
  
  (4) Figure 2B: by eye, it looks like the difference between CD8+ T cells in the two conditions would be significantly different; is this not the case? Same thing for the NK cells...what are the p-values?
  
  We have added n.s. to our revised Figure 2B. The p-values for CD8+ T-cells and NK cells were 0.14 and 0.19 (two-tailed Student’s t-test), respectively.
  
  (5) The murine data strongly suggest that the combination therapy promotes trNK cell infiltration into the tumors, in turn resulting in cDC1-mediated CD8+ T cell infiltration and/or activation. It could be highly valuable/useful to functionally determine (e.g. by depleting NK cells in this model) if NK cells are required for the effects seen.
  
  We agree that depletion of NK cells could further solidify the findings, and it is part of ongoing investigations for future projects. However, it would be imperative to first characterise these NK cells in more depth, as conventional global ablation of NK cells is expected to strongly impact immunosurveillance as well. This is part of current ongoing work.
  
  (6) Figure 7B: how were "high" and "low" defined (for the NK signature)?
  
  An enrichment score of the NK_C1 gene signature (see Table supplement 1) was first calculated per patient sample in the TCGA RNA-seq dataset using the Gene Set Variation Analysis (GSVA) method. A cut-off value was then determined using the maximally selected rank statistics method (maxstat R package) to divide patients into “high” and “low” groups.
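
  The cut-off step described above can be illustrated with a simplified sketch. This is not the maxstat package's implementation or API: it assumes per-patient enrichment scores have already been computed (the GSVA step is not reproduced), it uses the standardized log-rank statistic as the rank statistic, and it omits the multiple-testing adjustment of the p-value that maximally selected rank statistics normally include. The function names `logrank_z` and `best_cutoff` and the `min_frac` parameter are hypothetical.

```python
import numpy as np

def logrank_z(time, event, in_high):
    """Standardized two-sample log-rank statistic for the 'high' group."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_high = np.asarray(in_high, dtype=bool)
    obs_minus_exp = 0.0
    var = 0.0
    for t in np.unique(time[event]):          # loop over distinct event times
        at_risk = time >= t
        n = at_risk.sum()                     # patients at risk at time t
        n1 = (at_risk & in_high).sum()        # of which in the 'high' group
        died = event & (time == t)
        d = died.sum()                        # events at time t
        d1 = (died & in_high).sum()           # of which in the 'high' group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:                             # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp / np.sqrt(var) if var > 0 else 0.0

def best_cutoff(score, time, event, min_frac=0.1):
    """Return (cutoff, |Z|) for the split 'score > cutoff' with the largest |Z|.

    Candidate cut-offs are restricted to the central score range so that
    neither group becomes too small (assumed safeguard, not from the source).
    """
    score = np.asarray(score, dtype=float)
    lo, hi = np.quantile(score, [min_frac, 1 - min_frac])
    best = None
    for c in np.unique(score[(score >= lo) & (score <= hi)]):
        high = score > c
        if high.all() or not high.any():
            continue                          # skip degenerate splits
        z = abs(logrank_z(time, event, high))
        if best is None or z > best[1]:
            best = (float(c), float(z))
    if best is None:
        raise ValueError("no valid cut-off candidate")
    return best
```

  Patients with a score above the returned cut-off would be labelled “high” and the rest “low”. Because the cut-off is chosen to maximize the statistic, the unadjusted p-value of the resulting split is optimistic, which is why the full method corrects for the search over candidates.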
  
  (7) Lines 164-165 of the Results: it would be good to include a reference supporting the statement.
  
  We have rephrased the manuscript and added corresponding references (see Lines 170-173 in revised manuscript).
  
  (8) There are many conclusions and very speculative language based only on sequencing results, and these have not been validated (e.g. in the Discussion, lines 447-453). As another example, it was concluded that a decrease in NKG2D+ NK cells implied a reduction in overall NK cell cytolytic activity and that NKG2D- NK cells were hypofunctional and did not kill well. This was not tested. Generally, it would be useful for the authors to use language that conveys that the data are primarily suggestive (rather than "confirmatory", line 447) of relationships, phenotypes, and functions at this point.
  
  We thank the reviewer for raising these concerns and have carefully adapted the manuscript text to convey the findings more cautiously.
  
  (9) On lines 246-247 the authors refer to cluster 3 NK cells, which express CD16, as "immature". The rationale for this designation is not provided, and most human NK cell development models hold that CD16+ NK cells represent the most mature subset(s).
  
  We apologize for the typographic error – later on we refer to the NK_C3 cluster as cytotoxic NK cells and we have corrected this in our revised manuscript (see Lines 273-275).
  
  (10) On line 351, the authors reference supplemental Figure 7C...but I don't see this figure in the accompanying powerpoint file.
  
  This should have been Supplementary Figure 7B, and we have corrected it in the revised manuscript (see Lines 374-377).
  
  (11) On line 417, the authors reference NKp40; this is likely a typographical error.
  
  This has been corrected in the revised manuscript to NKp46 (see Lines 439-442).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.10.14.562332v2
www.biorxiv.org

New submission 11/12/2023, 09:33:47

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the current reviews.
  
  Overall Response
  
  We thank the reviewers for reviewing our manuscript, recognizing the significance of our study, and offering valuable suggestions. Based on the reviewers’ comments and the updated eLife assessment, we would like to choose the current version of our manuscript as the Version of Record.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model which takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.
  
  Strengths:
  
  The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter.
  
  The authors control for some degree of redundancy between their training and test sets, both using sequence and structural similarity criteria. This is more careful than can be said of most works in the field of PPI prediction.
  
  As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.
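
  For illustration, the heuristic above (at least 50% precision among the top 50 predicted contacts) can be sketched as follows. This is a minimal sketch, not the authors' evaluation code: the function names `top_k_precision` and `is_acceptable` are hypothetical, and it assumes the prediction is an L1 x L2 score matrix in which every (i, j) cell counts as a distinct residue pair.

```python
import numpy as np

def top_k_precision(pred, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    pred:          (L1, L2) array of predicted contact scores (assumed layout).
    true_contacts: (L1, L2) boolean array, True where the pair is in contact.
    """
    pred = np.asarray(pred, dtype=float)
    true_contacts = np.asarray(true_contacts, dtype=bool)
    if pred.shape != true_contacts.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    k = min(k, pred.size)
    # Flat indices of the k largest scores in the L1 x L2 map.
    top = np.argsort(pred, axis=None)[::-1][:k]
    return float(true_contacts.ravel()[top].mean())

def is_acceptable(pred, true_contacts, k=50, threshold=0.5):
    """Apply the 'at least 50% precision in the top 50' heuristic."""
    return top_k_precision(pred, true_contacts, k) >= threshold

# Toy example: scores increase with the flat index, so the top 50 pairs are
# flat indices 50..99; the 40 pairs at indices 60..99 are marked as contacts.
toy_pred = np.arange(100).reshape(10, 10) / 100.0
toy_truth = np.zeros(100, dtype=bool)
toy_truth[60:] = True
toy_truth = toy_truth.reshape(10, 10)
toy_precision = top_k_precision(toy_pred, toy_truth, k=50)
```

  Note that this criterion requires the ground-truth contacts, so it describes prediction quality in a benchmark rather than a confidence estimate available at prediction time.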
  
  We thank the reviewer for recognizing the strengths of our work!
  
  Weaknesses:
  
  The authors check for performance drops when the test set is restricted to pairs of interacting proteins such that the chain pair is not similar as a pair (in sequence or structure) to a pair present in the training set. A more challenging test would be to restrict the test set to pairs of interacting proteins such that none of the chains are separately similar to monomers present in the training set. In the case of structural similarity (TM-scores), this would amount to replacing the two "min"s with "max"s in Eq. (4). In the case of sequence similarity, one would simply require that no monomer in the test set is in any MMSeqs2 cluster observed in the training set. This may be an important check to make, because a protein may interact with several partners, and/or may use the same sites for several distinct interactions, contributing to residual data leakage in the test set.
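
  The difference between the two sequence-based filters discussed above can be sketched as follows. This is only an illustration under stated assumptions: `cluster_of` is a hypothetical mapping from chain identifier to a cluster label (for example, one derived from MMseqs2 clustering), the function names are made up, and the pair-level rule shown is one possible reading of "similar as a pair", not necessarily the criterion used in the manuscript.

```python
def filter_pair_level(test_pairs, train_pairs, cluster_of):
    """Keep a test pair unless its unordered pair of clusters occurs in training."""
    seen = {frozenset((cluster_of[a], cluster_of[b])) for a, b in train_pairs}
    return [
        (a, b)
        for a, b in test_pairs
        if frozenset((cluster_of[a], cluster_of[b])) not in seen
    ]

def filter_chain_level(test_pairs, train_pairs, cluster_of):
    """Keep a test pair only if neither chain falls in a cluster seen in training."""
    seen = {cluster_of[chain] for pair in train_pairs for chain in pair}
    return [
        (a, b)
        for a, b in test_pairs
        if cluster_of[a] not in seen and cluster_of[b] not in seen
    ]

# Hypothetical toy data: chains A and C share a cluster.
cluster_of = {"A": 1, "B": 2, "C": 1, "D": 3, "E": 4, "F": 5}
train_pairs = [("A", "B")]
test_pairs = [("C", "D"), ("E", "F"), ("C", "B")]

pair_level = filter_pair_level(test_pairs, train_pairs, cluster_of)
chain_level = filter_chain_level(test_pairs, train_pairs, cluster_of)
```

  In this toy case the pair-level filter keeps ("C", "D") because chain D is new, whereas the chain-level filter removes it because chain C shares a cluster with a training chain.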
  
  We thank the reviewer for the suggestion! In the case of protein-protein interaction prediction (“0D prediction”) or protein-protein interfacial residue prediction (“1D prediction”), we agree that none of the chains in the test set should be separately similar to monomers in the training set because, as the reviewer pointed out, a protein may interact with several partners and may even use the same sites for these interactions. However, the task of this study is predicting inter-protein residue-residue contacts (“2D prediction”): even if a protein uses the same site to interact with different partners, the inter-protein contact maps will differ as long as the interacting partners differ. Therefore, we don’t think this restriction on the test set is necessary for our task.
  
  The training set of AFM with v2 weights has a global cutoff of 30 April 2018, while that of PLMGraph-Inter has a cutoff of March 7 2022. So there may be structures in the test set for PLMGraph-Inter that are not in the training set of AFM with v2 weights (released between May 2018 and March 2022). The "Benchmark 2" dataset from the AFM paper may have a few additional structures not in the training or test set for PLMGraph-Inter. I realize there may be only few structures that are in neither training set, but still think that showing the comparison between PLMGraph-Inter and AFM there would be important, even if no statistically significant conclusions can be drawn.
  
  We thank the reviewer for the suggestion! A date cutoff alone is not enough to remove the redundancy, since similar structures can be deposited in the PDB on different dates. Because AFM does not release the PDB codes of its training set, it is difficult for us to remove the redundancy completely. Therefore, we think no rigorous conclusion can be drawn by including these comparisons in the manuscript. Besides, the main point of this study is to demonstrate that integrating multiple protein language models using protein geometric graphs can dramatically improve model performance for inter-protein contact prediction, which can provide important insights for the future development of more powerful protein complex structure prediction methods beyond AFM, rather than to provide a tool that can beat AFM at this moment. We think including too much material in the comparison with AFM may distract the readers. Therefore, we chose not to include these comparisons in the manuscript.
  
  Finally, the inclusion of AFM confidence scores is very good. A user would likely trust AFM predictions when the confidence score is high, but look for alternative predictions when it is low. The authors' analysis (Figure 6, panels c and d) seems to suggest that, in the case of heterodimers, when AFM has low confidence, PLMGraph-Inter improves precision by (only) about 3% on average. By comparison, the reported gains in the "DockQ-failed" and "precision-failed" bins are based on knowledge of the ground truth final structure, and thus are not actionable in a real use-case.
  
  We agree with the reviewer that more studies are needed to provide a model that can complement or even beat AFM. The main point of this study is to demonstrate that integrating multiple protein language models using protein geometric graphs can dramatically improve model performance for inter-protein contact prediction, which can provide important insights for the future development of more powerful protein complex structure prediction methods beyond AFM.
  
  Reviewer #2 (Public Review):
  
  This work introduces PLMGraph-Inter, a new deep learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency in terms of computational cost still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for the model training.
  
  We thank the reviewer for recognizing the strengths of our work!
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  I recommend renaming the section "Further potential redundancies removal between the training and the test" to "Further potential redundancies removal between the training and the test sets"
  
  Changed.
  
  In lines 768-769, the sentence seems to end prematurely in "to use more stringent threshold in the redundancy removal"
  
  Corrected.
  
  In Eq. (4), line 789, there are many instances of dashes that look like minus signs, creating some confusion.
  
  Corrected.
  
  I think I may have mixed up figure references in my first review. When I said (Recommendations to the authors): "p. 22, line 2: from the figure, I would have guessed "greater than or equal to 0.7", not 0.8", I think I was referring to what is now lines 423-424, referring to what is now Figure 5c. The point stands there, I think.
  
  Corrected.
  
  A couple of new grammatical mishaps have been introduced in the revision. These could be rectified.
  
  We carefully rechecked our revisions, and corrected the grammatical issues we found.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Most of my concerns were resolved through the revision. I have only one suggestion for the main figure.
  
  The current scatter plots in Figure 2 are hard to understand as too many different methods are abstracted into a single plot with multiple colors. I would suggest comparing their performances using box plot or violin plot for the figure 2.
  
  We thank the reviewer for the suggestion! In the revision, we tried a violin plot, but it did not look good since too many different methods are included in the plot. Besides, we chose the scatter plot because it provides much more detail. We also provided the individual head-to-head scatter plots as supplementary figures, which we think can also help readers capture the information in the figures.
  
  The following is the authors’ response to the original reviews.
  
  Overall Response
  
  We would like to thank the reviewers for reviewing our manuscript, recognizing the significance of our study, and offering valuable suggestions. We have carefully revised the manuscript to address all the concerns and suggestions raised by the reviewers.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.
  
  Strengths:
  
  The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.
  
  We thank the reviewer for recognizing the strengths of our work!
  
  Weaknesses:
  
  My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.
  
  We thank the reviewer for the suggestion! In the revision, to emphasize the performance of PLMGraph-Inter using the predicted monomer structures, we moved the evaluation results based on the predicted monomer from the supplementary to the main text (see the new Table 1 and Figure 2 in the revised manuscript) and re-organized the two subsections “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets” and “Impact of the monomeric structure quality on contact prediction” in the main text.
  
  In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.
  
  We thank the reviewer for the suggestion! It is worth noting that AFM automatically searches monomer templates in the prediction, and when we checked our AFM runs, we found that for 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) at least 20 templates were identified (AFM employed the top 20 templates in the prediction), and 87.8% of the targets employed the native templates (line 455-462 in page 25 in the subsection of “Comparison of PLMGraph-Inter with AlphaFold-Multimer”). Therefore, we think Figure 6, not Figure S5 (the original Figure S2), shows a fairer comparison. Besides, it is also worth noting that the targets used in this study would have a large overlap with the training set of AlphaFold-Multimer, since AFM used all protein complex structures in the PDB deposited before 2018-04-30 in the model training, which would further cause an overestimation of the performance of AFM (line 450-455 in page 24-25 in the subsection of “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).
  
  To mimic the performance of AlphaFold2 in real practice and produce predicted monomeric structures with more diverse qualities, we only used the MSA searched from Uniref100 protein sequence database as the input to AlphaFold2 and set to not use the template (line 203~210 in page 12 in the subsection of “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets”). Since some of the predicted monomer structures are of bad quality, it is reasonable that the performance of PLMGraph-Inter drops when the predicted monomeric structures are used in the prediction. We provided a detailed analysis of the impact of the monomeric structure quality on the prediction performance in the subsection “Impact of the monomeric structure quality on contact prediction” in the main text.
  
  We provided the analysis of the AFM multimer confidence values (“iptm + ptm”) in the revision (Figure 6, Figure S5 and line 495-501 in page 27 in the subsection of “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).
  
  Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.
  
  We thank the reviewers for the suggestion, and we are sorry for the confusion! In the AFM runs to predict protein complex structures, we used the default setting of AFM, which automatically searches monomer templates in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions (AFM only used the top 20 templates), and 87.8% of the targets employed the native template. We further clarified this in the revision (line 455-462 in page 25 in the subsection of “Comparison of PLMGraph-Inter with AlphaFold-Multimer”). We also included the mean precisions of AFM (top-50 contact prediction) in the revision (Table S5 and line 483-484 in page 26 in the subsection of “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).
  
  It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?
  
  We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM does not release the PDB IDs of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” only includes 17 heterodimer proteins, and the number would be further decreased after removing targets redundant to our training set. We think it is difficult to draw conclusions from such a small number of targets.
  
  It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.
  
  Author response image 1.
  
  The head-to-head comparison of qualities of complex predicted by AlphaFold-Multimer (2.2.0) and AlphaFold-Multimer (2.3.2) for each target PPI.
  
  We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. During the revision, we also tested the new version of AFM on the HomoPDB and HeteroPDB datasets, but we found that the performance difference between the two versions of AFM is actually very small (see the figure above, not shown in the main text). One reason might be that some targets in HomoPDB and HeteroPDB are redundant with the training sets of both versions of AFM. Since our test sets would have more overlap with the training set of AFM V3, we kept using the AFM V2 weights in this study.
  
  Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.
  
  We thank the reviewer for the suggestion! In the revision, we explored the performance of PLMGraph-Inter when using different thresholds of fold similarity scores of interacting monomers to further remove potential redundancies between the training and test sets (i.e. redundancy in structure) (line 353-386 in page 19-21 in the subsection “Ablation study”; line 762-797 in page 41-43 in the subsection “Further potential redundancies removal between the training and the test”). We found that for heteromeric PPIs (targets in HeteroPDB), the further removal of potential redundancy in structure has little impact on the model performance (~3%, when TM-score 0.5 is used as the threshold). However, for homomeric PPIs (targets in HomoPDB), the further removal of potential redundancy in structure significantly reduces the model performance (~18%, when TM-score 0.5 is used as the threshold) (see Table 2). One possible reason for this phenomenon is that the binding mode of a homomeric PPI is largely determined by the fold of its monomer; thus, the model does not generalize well to targets whose folds were never seen during training.
  
  Whether the deep learning model can generalize well to targets with novel folds is a very interesting and important question. We thank the reviewer for pointing this out! However, to the best of our knowledge, this question has rarely been addressed by previous studies, including AFM. For example, the Benchmark 2 dataset was prepared with ClusPro TBM (bioRxiv 2021.09.07.459290; Proteins 2020, 88:1082-1090), which identifies templates using a sequence-based approach (HHsearch) rather than a structure-based one. Therefore, we don’t think this dataset is non-redundant in structure.
  
  Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.
  
  Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the construction of the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose to use the “geometry-only” model as the base model.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Some sections of the paper use technical terminology which limits accessibility to a broad audience. An obvious example is in the section "Results > Overview of PLMGraph-Inter > The residual network module": the average eLife reader is not a machine learning expert and might not be familiar with a "convolution with kernel size of 1 * 1". In general, the "Overview of PLMGraph-Inter" is a bit heavy with technical details, and I suggest moving many of these to Methods. This overview section can still be there but it should be shorter and written using less technical language.
  
  We thank the reviewer for the suggestion! We moved some technical details to the Methods section in the revision (line 184-185 in page 11; line 729-735 in page 39).
  
  List of typos and minor issues (page number according to merged PDF):
  
  p. 3. line -3: remove "to"
  
  Corrected (line 36, page 3)
  
  p. 5, line 7: "GINTER" should be "GLINTER"
  
  Corrected (line 64, page 5)
  
  p. 6, line -4: "Given structures" -> "Given the structures"
  
  Corrected (line 95, page 6)
  
  p. 6, line -2: "with which encoded"... ?
  
  We rephrased this sentence in revision. (line 97, page 6)
  
  p. 9, line 1: "principal" -> "principle"
  
  Corrected (line 142, page 9)
  
  p. 13, line 1: "has" -> "but have"
  
  Corrected (line 231, page 13)
  
  p. 14, lines 6-7: "As can be seen from the figure that the predicted" -> "As can be seen from the figure, the predicted"
  
  We rephrased this paragraph, and the sentence was deleted in the revision (line 257-259 in page 15).
  
  p. 18, line 1: the "five models" are presumably models a-e? If so, say "of models a-e"
  
  Corrected (line 310, page 17)
  
  p. 22, line 2: from the figure, I would have guessed "greater than or equal to 0.7", not 0.8
  
  Based on Figure 3C, we think 0.8 is a more appropriate cutoff, since the precision drops significantly when the DTM-score is within 0.7~0.8.
  
  p. 23, lines 2-3: "worth to making" -> "worth making"
  
  Corrected (line 443, page 24)
  
  p. 24, line -5: "predict" -> "predicted"
  
  Corrected (line 484, page 26)
  
  p 28, line -5: Please clarify what you mean by "We doubt": are you saying that you don't think these rearrangements exist in nature? If not, then reword.
  
  Corrected (line 566, page 30)
  
  Figure 2, panel c, "DCPred" in the legend should be "CDPred"
  
  Corrected
  
  Figures 3 and 5: Please improve the y-axis title in panel C. "Percent" of what?
  
  We changed the “Percent” to “% of targets” in the revision.
  
  We thank the reviewer for carefully reading our manuscript!
  
  Reviewer #2 (Public Review):
  
  This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for the model training.
  
  The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.
  
  We thank the reviewer for recognizing the significance of our work! We have carefully revised the manuscript to address the reviewer’s concerns.
  
  (1) The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.
  
  We thank the reviewer for the valuable suggestion! “40% sequence identity” is a widely used threshold to remove redundancy when evaluating deep-learning-based protein-protein interaction and protein complex structure prediction methods, thus we also chose this threshold in our study (bioRxiv 2021.10.04.463034, Cell Syst. 2021 Oct 20;12(10):969-982.e6). In the revision, we explored whether PLMGraph-Inter can keep its performance when more stringent thresholds (30%, 20%, 10%) are applied (line 353-386 in page 20-21 in the subsection of “Ablation study” and line 762-780 in page 40 in the subsection of “Further potential redundancies removal between the training and the test”). The result shows that even when using “10% sequence identity” as the threshold, the mean precision of the predicted contacts decreases by only ~3% (Table 2).
  
  (2) Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.
  
  We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision (Figure S1 and Figure S2 in the supplementary).
  
  (3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.
  
  We thank the reviewer for the suggestion! We included this comparison in the revision (Figure S7).
  
  (4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).
  
  We thank the reviewer for the suggestion! We analyzed the relationship between the prediction performance and the depth of MSA in the revision (Figure S4 and Line 253-264 in page 15 in the subsection of “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets” and line 798-806 in page 42 in the subsection of “Calculating the normalized number of the effective sequences of paired MSA”).
  
  Reviewer #2 (Recommendations For The Authors):
  
  I have the following suggestions in addition to the public review.
  
  (1) Overall, the manuscript is well-written; however, I recommend a careful review for minor grammar corrections to polish the final text.
  
  We carefully checked the manuscript and corrected all the grammar issues and typos we found in the revision.
  
  (2) It would be better to indicate that single sequence embeddings, MSA embeddings, and structure embeddings are ESM-1b, ESM-MSA & PSSM, and ESM-IF when they are first mentioned in the manuscript e.g. single sequence embeddings from ESM-1b, MSA embeddings from ESM-MSA and PSSM, and structural embeddings from ESM-IF.
  
  We revised the manuscript according to the reviewer’s suggestion (line 86-88 in page 6; line 99-101 in page 7).
  
  (3) I don't think "outer concatenation" is commonly used. Please specify whether it's outer sum, outer product, or horizontal & vertical tiling followed by concatenation.
  
  It is horizontal & vertical tiling followed by concatenation. We clarified this in the revision (line 129-130 in page 8).
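
  For readers unfamiliar with the operation, tiling followed by concatenation can be sketched with NumPy as below. This is a minimal sketch and not the authors' implementation: the function name `tile_and_concat`, the feature dimensions, and the ordering of the two blocks along the channel axis are assumptions made for illustration.

```python
import numpy as np

def tile_and_concat(feat_a, feat_b):
    """Turn two per-residue feature matrices into one pairwise feature tensor.

    feat_a: (L1, d1) features of the first protein.
    feat_b: (L2, d2) features of the second protein.
    Returns an (L1, L2, d1 + d2) tensor whose entry [i, j] is the
    concatenation of feat_a[i] and feat_b[j].
    """
    feat_a = np.asarray(feat_a)
    feat_b = np.asarray(feat_b)
    l1, d1 = feat_a.shape
    l2, d2 = feat_b.shape
    # Repeat every row of feat_a across the L2 positions of the partner.
    a_tiled = np.broadcast_to(feat_a[:, None, :], (l1, l2, d1))
    # Repeat every row of feat_b across the L1 positions of the first protein.
    b_tiled = np.broadcast_to(feat_b[None, :, :], (l1, l2, d2))
    return np.concatenate([a_tiled, b_tiled], axis=-1)

# Toy example with 3 and 5 residues and 2 and 4 feature channels.
toy_a = np.arange(6).reshape(3, 2)
toy_b = np.arange(20).reshape(5, 4)
toy_pair = tile_and_concat(toy_a, toy_b)
```

  Unlike an outer sum or outer product, this keeps both per-residue feature vectors intact in every (i, j) cell, so the channel dimension grows to d1 + d2.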
  
  (4) 10th sentence on the page where the Results section starts, please briefly mention what are the other 2D pairwise features.
  
  We clarified this in the revision (line 131-132 in page 8).
  
  (5) In the result section, it states edges are defined based on Ca distances, but in the method section, it says edges are determined based on heavy atom distances. Please correct one of them.
  
  It should be Ca distances. We are sorry for the carelessness, and we corrected this in the revision (line 646 in page 35).
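
  As an illustration of defining graph edges from Cα distances, the sketch below connects two residues when their Cα atoms lie within a cutoff. The function name `ca_edges`, the use of a simple distance threshold, and the cutoff value in the example are assumptions; the response above does not state the cutoff or the exact edge rule.

```python
import numpy as np

def ca_edges(ca_coords, cutoff):
    """Return directed edges (i, j), i != j, whose C-alpha distance is <= cutoff.

    ca_coords: (L, 3) array of C-alpha coordinates, one row per residue.
    cutoff:    distance threshold in the same unit as the coordinates.
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise Euclidean distances between all C-alpha atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude self-pairs on the diagonal.
    mask = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    return np.argwhere(mask)

# Toy example: three residues on a line, with a hypothetical cutoff of 5.0.
toy_coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
toy_edges = ca_edges(toy_coords, cutoff=5.0)
```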
  
  (6) For the sentence, "Where ESM-1b and ESM-MSA-1b are pretrained PLMs learned from large datasets of sequences and MSAs respectively without label supervision,", I'd suggest replacing "without label supervision" with "with masked language modeling tasks" for clarity.
  
  We revised the manuscript according to the reviewer’s suggestion (line 150-151 in page 9).
  
  (7) It would be better to briefly explain what the dimensional hybrid residual block is when it is first mentioned.
  
  We explained the dimensional hybrid residual block where it is first mentioned in the revision (line 107 in page 7).
  
  (8) Please include error bars for the bar plots and standard deviations for the tables.
  
  We thank the reviewer for the suggestion! Our understanding is the error bars and standard deviations are very informative for data which follow gaussian-like distributions, but our data (precisions of the predicted contacts) are obviously not this type. Most previous studies in protein contact prediction and inter-protein contact prediction also did not include these in their plots or tables. In our case, including these elements requires a dramatic change of the styles of our figures and tables, but we would like to not change our figures and tables too much in the revision.
  
  (9) Please indicate whether the chain break is considered to generate attention map features from ESM-MSA-1b. If it's considered, please specify how.
  
  The paired sequences were directly concatenated without using any letter to connect them, which means we did not consider chain break in generating the attention maps from ESM-MSA-1b.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.01.07.523121v4
www.biorxiv.org www.biorxiv.org

Prosapip1 in the dorsal hippocampus mediates synaptic protein composition, long-term potentiation, and spatial memory

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Recommendations for the authors):
  
  The biochemical fractionation and use of the term "synaptic" were my biggest issues. I would recommend using a more targeted approach to measure the PSD or compare and contrast synaptic from extrasynaptic. For instance, PMID 16797717 does a PSD purification, whereas other papers have fractionated extrasynaptic from synaptic. Moreover, a PSD95 immunoprecipitation may be of interest as one question that could arise is since you see decreases in PSD95 GluN2B, but not 2A or GluA1, could the association of PSD95 with the different proteins be altered? To evaluate this, proteomics or some other unbiased methodology could enhance an understanding of the full panoply of changes induced by Prosapip1 within the dHP.
  
  The reviewer makes valid points; however, this is a large endeavor, which we will address in future experiments.
  
  There seems to be a missed opportunity to really determine how Prosapip1 is influencing protein expression and/or phosphorylation at the PSD.
  
  There is no indication that Prosapip1 is linked to transcription or translation machinery; therefore, we don’t see the value of examining protein expression in this context. Phosphorylation is a broad term, and although this can be answered through phosphoproteomics, this is outside the scope of this study.
  
  At the very least, additional discussion within this realm would help the reader contextualize the biochemical data.
  
  Further studies are needed to determine the mechanism by which Prosapip1 controls the localization of PSD95, GluN2B, and potentially others. It is plausible that posttranslational modifications are responsible for Prosapip1 function. For example, the Prosapip1 sequence contains a potential glycosylation site (Ser622), and several potential phosphorylation sites (https://glygen.org/protein/O60299#Glycosylation, https://www.phosphosite.org/proteinAction.action?id=18395&showAllSites=true#appletMsg). These posttranslational modifications can contribute to the stabilization of the synaptic localization of GluN2B and PSD95.
  
  We added to the discussion the paragraph above as well as the caveat that proteomic studies are needed for a comprehensive study of the role of Prosapip1 in the PSD.
  
  Weaknesses:
  
  (1) Methodological Weaknesses
  
  a. The synapsin-Cre mice may more broadly express Cre-recombinase than just in neuronal tissues. Specifically, according to Jackson Laboratories, there is a concern with these mice expressing Cre-recombinase germline. As the human protein atlas suggests that Prosapip1 protein is expressed extraneuronally, validation of neuron or at least brain-specific knockout would be helpful in interpreting the data. Having said that, the data demonstrating that the brain region-specific knockout has similar behavioral impacts helps alleviate this concern somewhat; however, there are no biochemical or electrophysiological readouts from these animals, and therefore an alternative mechanism in this adult knockout cannot be excluded.
  
  This is a valuable insight from the reviewer, especially considering the information from Jackson Laboratories. As mentioned in the paper, we exclusively used female Syn1-Cre carrying breeders to avoid germline recombination. Furthermore, we consistently assessed the prevalence of the Prosapip1 flox sites alongside the presence of Syn1-Cre with our regular litter genotyping, confirming the presence of Prosapip1. Additionally, Prosapip1 protein expression was directly examined in rats in Wendholdt et al., 2006, where this group reported that Prosapip1 is a brain-specific protein, minimizing the potential consequences of a peripheral loss of Prosapip1. In addition, to confirm that Prosapip1 is a brain-specific protein in mice, we performed a western blot analysis on the dorsal hippocampus, liver, and kidney of a C57BL/6 mouse (Author response image 1), and found that Prosapip1 protein is not found in these peripheral organs, aligning with the findings in rats reported by Wendholdt et al.
  
  Author response image 1. Prosapip1 protein in the dorsal hippocampus, liver, and kidney of C57BL/6 mice.
  
  b. The use of the word synaptic and the crude fractionation make some of the data difficult to interpret/contextualize. It is unclear how a single centrifugation that eliminates the staining of a nuclear protein can be considered a "synaptic" fraction. This is highlighted by the presence of GAPDH in this fraction which is a cytosolically-enriched protein. While GAPDH may be associated with some membranes it is not a synaptic protein. There is no quantification of GAPDH against total protein to validate that it is not enriched in this fraction over control. Moreover, it should not be used as a loading control in the synaptic fraction. There are multiple different ways to enrich membranes, extrasynaptic fractions, and PSDs and a better discussion on the caveats of the biochemical fractionation is a minimum to help contextualize the changes in PSD95 and GluN2B.
  
  We apologize for the confusion. As we described in the methods section, the crude synaptosome was isolated by several centrifugations, as depicted in the figure which we are now including in the manuscript. As shown in Extended Figure 2, the P2 fraction does contain PSD-95 and synapsin, as well as GluN2B, GluN2A, and GluA1; however, it does not contain the transcription factor CREB, indicating the isolation of the crude synaptosomal fraction. As shown in the figure, a small amount of GAPDH is present in the crude synaptosomal fraction. The presence of GAPDH in the crude synaptosomal fraction has been previously reported (Ikemoto et al., 2003; Lee et al., 2016; Wang et al., 2012). As we have added to the discussion, there remains a caveat that we cannot differentiate the pre- and post-synaptic fractions, and as a result we do not know if Prosapip1 plays a role in the assembly of axonal proteins.
  
  c. Also, the word synaptosomal on page 7 is not correct. One issue is this is more than synaptosomes and another issue is synaptosomes are exclusively presynaptic terminals. The correct term to use is synaptoneurosome, which includes both pre and postsynaptic components. Moreover, as stated above, this may contain these components but is most likely not a pure or even enriched fraction.
  
  Since we cannot exclude the possibility that Prosapip1 is also expressed in glia, we do not believe that the term synaptoneurosome is accurate.
  
  d. The age at which the mice underwent injection of the Cre virus was not mentioned.
  
  We apologize for the oversight. As now noted in the methods, the mice used for experiments underwent surgery to infect neurons with the AAV-GFP or AAV-Cre viruses between 5 and 6 weeks of age to ensure full viral expression by the experimental window beginning at 8 weeks old.
  
  (2) Weaknesses of Results
  
  a. There were no measures of GluN1 or GluA2 in the biochemical assays. As GluN1 is the obligate subunit, how it is impacted by the loss of Prosapip1 may help contextualize the fact that GluN2B, but not GluN2A, is altered. Moreover, as GluA2 has different calcium permeance, alterations in it may be informative.
  
  Since we detect NMDAR current, which requires the obligatory subunit GluN1 and at least one GluN2 subunit (GluN2A, GluN2B, GluN2C, GluN2D), we did not see the rationale behind examining the level of GluN1 in the Prosapip1 knockout mice.
  
  b. While there was no difference in GluA1 expression in the "synaptic" fraction, it does not mean that AMPAR function is not impacted by the loss of Prosapip1. This is particularly important as Prosapip1 may interact with kinases or phosphatases or their targeting proteins. Therefore, measuring AMPAR function electrophysiologically or synaptic protein phosphorylation would be informative.
  
  We agree with the reviewer that the loss of Prosapip1 could potentially impact AMPAR function. To address this, we measured spontaneous excitatory postsynaptic currents (sEPSCs) in hippocampal pyramidal neurons from both Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice. Given that neurons were voltage-clamped at -70 mV and extracellular Mg<sup>2+</sup> was maintained at 1.3 mM, the sEPSCs we recorded were primarily mediated by AMPARs.
  
  We found no significant differences in either the frequency or amplitude of these AMPA-mediated sEPSCs between Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice, suggesting that AMPAR function in hippocampal pyramidal neurons is not noticeably affected by the loss of Prosapip1 (see Author response image 2 below).
  
  Author response image 2. Comparison of hippocampal sEPSCs between Prosapip1(flx/flx);Syn1-Cre(-) (Cre(-)) and Prosapip1(flx/flx);Syn1-Cre(+) (Cre(+)) mice. sEPSCs were recorded in the presence of 1.3 mM Mg<sup>2+</sup> and 0.1 mM picrotoxin, with neurons clamped at -70 mV. (A) Sample sEPSC traces from Prosapip1(flx/flx);Syn1-Cre(-) (top) and Prosapip1(flx/flx);Syn1-Cre(+) (bottom) mice. (B, C) Bar graphs showing no significant differences in sEPSC frequency (B) or amplitude (C) between Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice. Statistical analysis was performed using an unpaired t-test; p > 0.05, n.s. (not significant). Data represent 11 neurons from 3 Prosapip1(flx/flx);Syn1-Cre(-) mice (11/3) and 8 neurons from 3 Prosapip1(flx/flx);Syn1-Cre(+) mice (8/3).
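
The argument that sEPSCs recorded at -70 mV in 1.3 mM Mg<sup>2+</sup> are primarily AMPAR-mediated rests on the voltage-dependent Mg<sup>2+</sup> block of NMDARs. The following is a minimal sketch of that reasoning, assuming the widely used Jahr and Stevens (1990) approximation for the block; the constants 3.57 mM and 0.062 per mV come from that model and are not taken from the authors' response.

```python
import math


def nmdar_unblocked_fraction(v_mv, mg_mm):
    """Fraction of NMDAR conductance not blocked by Mg2+.

    Assumed model (Jahr & Stevens, 1990):
        1 / (1 + ([Mg2+] / 3.57 mM) * exp(-0.062 * V))
    with V in mV and [Mg2+] in mM.
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))


# Recording conditions stated in the response: -70 mV, 1.3 mM Mg2+.
holding = nmdar_unblocked_fraction(-70.0, 1.3)
# A depolarized potential for comparison (hypothetical, not from the source).
depolarized = nmdar_unblocked_fraction(40.0, 1.3)

print(f"unblocked fraction at -70 mV: {holding:.3f}")
print(f"unblocked fraction at +40 mV: {depolarized:.3f}")
```

Under these assumed constants only a few percent of the NMDAR conductance is available at the holding potential, which is the sense in which the recorded currents can be treated as AMPAR-dominated.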
  
  c. There is a lack of mechanistic data on what specifically and how GluN2B and PSD95 expression is altered. This is due to some of the challenges with interpreting the biochemical fractionation and a lack of results regarding changes in protein posttranslational modifications.
  
  See response above.
  
  d. The loss of social novelty measures in both the global and dHP-specific Prosapip1 knockout mice were not very robust. As they were consistently lost in both approaches and as there were other consistent memory deficits, this does not impact the conclusions, but may be important to temper discussion to match these smaller deficits within this domain.
  
  There is a clear difference between the Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice as well as the AAV-GFP and AAV-Cre mice in the loss of social novelty metric. We have emphasized that the Prosapip1(flx/flx);Syn1-Cre(+) mice and AAV-Cre mice do not recognize social novelty, which is supported by the statistics.
  
  4E: Two-way ANOVA: Effect of Social Novelty F<sub>(1,20)</sub> = 17.60, p = 0.0002; Post hoc Familiar vs. Novel (Cre(-)) p = 0.0008, Familiar vs. Novel (Cre(+)) p = 0.1451.
  
  5I: Two-way ANOVA: Effect of Social Novelty F<sub>(1,31)</sub> = 9.777, p = 0.0038; Post hoc Familiar vs. Novel (AAV-GFP) p = 0.0303, Familiar vs. Novel (AAV-Cre) p = 0.1319.
  
  e. Alterations in presynaptic paired-pulse ratio measures are intriguing and may point to a role for Prosapip1 in synapse development, as discussed in the manuscript. It would be interesting to delineate if these PPR changes also occur in the adult knockout to help detail the specific Prosapip1-induced neuroadaptations that link to the alterations in novelty-induced behaviors.
  
  This interesting question will be addressed in future studies.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The test statistics are required for each experiment for completeness. Currently, only p-values, tests used, and N are included.
  
  The entirety of the statistical information can be found in Table 1, including test statistics and degrees of freedom (see Column 7, ‘Result’).
  
  (2) The authors claim that the function of Prosapip1 is not known in vivo, yet detail a study in the NAc where they investigated its function in vivo. The wording or discussion around what is and is not known should be altered to reflect this.
  
  The reviewer is correct to point to our previous manuscript (Laguesse et al. Neuron. 2017.) in which we found that Prosapip1 is important in mechanisms underlying alcohol-associated molecular, cellular and behavioral adaptations. However, these findings are specific to alcohol-related paradigms. Since the normal physiological role of Prosapip1 has never been delineated, this study was aimed to start addressing this gap in knowledge.
  
  References
  
  Wang, M., Li, S., Zhang, H. et al. Direct interaction between GluR2 and GAPDH regulates AMPAR-mediated excitotoxicity. Mol Brain 5, 13 (2012). https://doi.org/10.1186/1756-6606-5-13
  
  Ikemoto, A., Bole, D.G., Ueda, T. Glycolysis and Glutamate Accumulation into Synaptic Vesicles: Role of Glyceraldehyde Phosphate Dehydrogenase and 3-Phosphoglycerate Kinase. J Biol Chem 278(8) (2003). https://doi.org/10.1074/jbc.M211617200
  
  Lee, F., Su, P., Xie, YF. et al. Disrupting GluA2-GAPDH Interaction Affects Axon and Dendrite Development. Sci Rep 6, 30458 (2016). https://doi.org/10.1038/srep30458
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.13.597459v2
www.biorxiv.org

Whole-brain neural substrates of behavioral variability in the larval zebrafish

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this paper, Manley and Vaziri investigate whole-brain neural activity underlying behavioural variability in zebrafish larvae. They combine whole brain (single cell level) calcium imaging during the presentation of visual stimuli, triggering either approach or avoidance, and carry out whole brain population analyses to identify whole brain population patterns responsible for behavioural variability. They show that similar visual inputs can trigger large variability in behavioural responses. Though visual neurons are also variable across trials, they demonstrate that this neural variability does not degrade population stimulus decodability. Instead, they find that the neural variability across trials is in orthogonal population dimensions to stimulus encoding and is correlated with motor output (e.g. tail vigor). They then show that behavioural variability across trials is largely captured by a brain-wide population state prior to the trial beginning, which biases choice - especially on ambiguous stimulus trials. This study suggests that parts of stimulus-driven behaviour can be captured by brain-wide population states that bias choice, independently of stimulus encoding.
  
  Strengths:
  
  - The strength of the paper principally resides in the whole brain cellular level imaging in a well-known but variable behaviour.
  
  - The analyses are reasonable and largely answer the questions the authors ask.
  
  - Overall the conclusions are well warranted.
  
  Weaknesses:
  
  A more in-depth exploration of some of the findings could be provided, such as:
  
  - Given that thousands of neurons are recorded across the brain a more detailed parcelation of where the neurons contribute to different population coding dimensions would be useful to better understand the circuits involved in different computations.
  
  We thank the reviewer for noting the strengths of our study and agree that these findings have raised a number of additional avenues which we intend to explore in depth in future studies. In response to the reviewer’s comment above, we have added a number of additional figure panels (new Figures S1E, S3F-G, 4I(i), 4K(i), and S5F-G) and updated panels (Figures 4I(ii) and 4K(ii) in the revised manuscript) to show a more detailed parcellation of the visually-evoked neurons, noise modes, turn direction bias population, and responsiveness bias population. To do so, we have aligned our recordings to the Z-Brain atlas (Randlett et al., 2015) as shown in new Figure S1E. In addition, we provided a more detailed parcellation of the neuronal ensembles by providing projections of the full 3D volume along the xy and yz axes, in addition to the unregistered xy projection shown in Figures 4H and 4J in the revised manuscript. We also found that the distribution of neurons across our huc:h2b-gcamp6s recordings is very similar to the distribution of labeling in the huc:h2b-rfp reference image from the Z-Brain atlas (Figure S1E), which further supports our whole-brain imaging results.
  
  Overall, we find that this more detailed quantification and visualization is consistent with our interpretations. In particular, we show that the optimal visual decoding population (w<sub>opt</sub>) and the largest noise mode (e1) are localized to the midbrain (Figures S3F-G). This is expected, as in Figure 3 we first extracted a low-dimensional subspace of whole-brain neural activity that optimally preserved visual information. Additionally, we provide new evidence that the populations correlated with the turn bias and responsiveness bias are distributed throughout the brain, including a relatively dense localization to the cerebellum, telencephalon, and dorsal diencephalon (habenula, new Figures 4H-K and S5F-G).
  
  - Given that the behaviour on average can be predicted by stimulus type, how does the stimulus override the brain-wide choice bias on some trials? In other words, a better link between the findings in Figures 2 and 3 would be useful for better understanding how the behaviour ultimately arises.
  
  We agree with the reviewer that one of the most fundamental questions that this study has raised is how the identified neuronal populations predictive of decision variables (which we describe as an internal “bias”) interact with the well-studied, visually-evoked circuitry. A major limitation of our study is that the slow dynamics of the NL-GCaMP6s prevent clearly distinguishing any potential difference in the onset time of various neurons during the short trials, which might provide clues into which neurons drive versus later reflect the motor output. However, given that these ensembles were also found to be correlated with spontaneous turns, our hypothesis is that these populations reflect brain-wide drives that enable efficient exploration of the local environment (Dunn et al. 2016, doi.org/10.7554/eLife.12741). Further, we suspect that a sufficiently strong stimulus drive (e.g., large, looming stimuli) overrides these ongoing biases, which would explain the higher average pre-stimulus predictability in trials with small to intermediate-sized stimuli. An important follow-up line of experimentation could involve comparing the neuronal dynamics of specific components of the visual circuitry at distinct internal bias states, ideally utilizing emerging voltage indicators to maximize spatiotemporal specificity. For example, what is the difference between trials with a large looming stimulus in the left visual fields when the turn direction bias indicates a leftward versus rightward drive?
  
  - What other motor outputs do the noise dimensions correlate with?
  
  To better demonstrate the relationship between neural noise modes and motor activity that we described, we have provided a more detailed correlation analysis in new Figure S4A. We extracted additional features related to the larva’s tail kinematics, including tail vigor, curvature, principal components of curvature, angular velocity, and angular acceleration (S4A(i)). Some of these behavioral features were correlated with one another; for example, in the example traces, PC1 appears to capture nearly the same behavioral feature as tail vigor. The largest noise modes showed stronger correlations with motor output than the smaller noise modes, which is reminiscent of recent work in the mouse showing that some of the neural dimensions with highest variance were correlated with various behavioral features (Musall et al. 2019; Stringer et al. 2019; Manley et al. 2024). We anticipate additional motor outputs would exhibit correlations with neural noise modes, such as pectoral fin movements (not possible to capture in our preparation due to immobilization) and eye movements.
  
  The dataset that the authors have collected is immensely valuable to the field, and the initial insights they have drawn are interesting and provide a good starting ground for a more expanded understanding of why a particular action is determined outside of the parameters experimenters set for their subjects.
  
  We thank the reviewer for noting the value of our dataset and look forward to future efforts motivated by the observations in our study.
  
  Reviewer #2 (Public Review):
  
  Overview
  
  In this work, Manley and Vaziri investigate the neural basis for variability in the way an animal responds to visual stimuli evoking prey-capture or predator-avoidance decisions. This is an interesting problem and the authors have generated a potentially rich and relevant data set. To do so, the authors deployed Fourier light field microscopy (Flfm) of larval zebrafish, improving upon prior designs and image processing schemes to enable volumetric imaging of calcium signals in the brain at up to 10 Hz. They then examined associations between neural activity and tail movement to identify populations primarily related to the visual stimulus, responsiveness, or turn direction - moreover, they found that the activity of the latter two populations appears to predict upcoming responsiveness or turn direction even before the stimulus is presented. While these findings may be valuable for future more mechanistic studies, issues with resolution, rigor of analysis, clarity of presentation, and depth of connection to the prior literature significantly dampen enthusiasm.
  
  Imaging
  
  - Resolution: It is difficult to tell from the displayed images how good the imaging resolution is in the brain. Given scattering and lensing, it is important for data interpretation to have an understanding of how much PSF degrades with depth.
  
  We thank the reviewer for their comments and agree that the dependence of the PSF and resolution as a function of depth is an important consideration in light field imaging. To quantify this, we measured the lateral resolution of the fLFM as a function of distance from the native image plane (NIP) using a USAF target. The USAF target was positioned at various depths using an automated z-stage, and the slice of the reconstructed volume corresponding to that depth was analyzed. An element was considered resolved if the modulation transfer function (MTF) was greater than 30%.
  
  In new Figure S1A, we plot the resolution measurements of the fLFM as compared to the conventional LFM (Prevedel et al., 2014), which shows the increase in resolution across the axial extent of imaging. In particular, the fLFM does not exhibit the dramatic drop in lateral resolution near the NIP which is seen in conventional LFM. In addition, the expanded range of high-resolution imaging motivates our increase from an axial range of 200 microns in previous studies to 280 microns in this study.
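
The resolution criterion described above (an element counts as resolved when its modulation exceeds 30%) can be sketched as follows. This is a minimal illustration, assuming that modulation is measured as the Michelson contrast of an intensity profile drawn across the bars of a USAF-1951 element; the group and element numbers and the example profiles are hypothetical and do not come from the authors' measurements.

```python
def usaf_line_pairs_per_mm(group, element):
    """Spatial frequency of a USAF-1951 element: 2 ** (group + (element - 1) / 6)."""
    return 2.0 ** (group + (element - 1) / 6.0)


def usaf_line_width_um(group, element):
    """Width of a single bar in micrometres (half of one line pair)."""
    return 1000.0 / (2.0 * usaf_line_pairs_per_mm(group, element))


def michelson_contrast(profile):
    """(Imax - Imin) / (Imax + Imin) for an intensity profile across the bars."""
    hi, lo = max(profile), min(profile)
    return (hi - lo) / (hi + lo) if (hi + lo) > 0 else 0.0


def is_resolved(profile, threshold=0.30):
    """Apply the 30% modulation criterion stated in the response."""
    return michelson_contrast(profile) > threshold


# Hypothetical line profiles across one element at two depths.
sharp_profile = [10.0, 90.0, 10.0, 90.0, 10.0]    # near the native image plane
blurred_profile = [45.0, 55.0, 45.0, 55.0, 45.0]  # far from the native image plane

print(usaf_line_width_um(7, 1), is_resolved(sharp_profile), is_resolved(blurred_profile))
```

Repeating this test for progressively finer elements at each z-position gives the smallest resolved bar width per depth, which is the kind of curve the response describes for Figure S1A.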
  
  - Depth: In the methods it is indicated that the imaging depth was 280 microns, but from the images of Figure 1 it appears data was collected only up to 150 microns. This suggests regions like the hypothalamus, which may be important for controlling variation in internal states relevant to the behaviors being studied, were not included.
  
  The full axial range of imaging was 280 microns, i.e. spanning from 140 microns below to 140 microns above the native imaging plane. After aligning our recordings to the Z-Brain dataset, we have compared the 3D distribution of neurons in our data (new Figure S1E(i)) to the labeling of the reference brain (Figure S1E(ii)). This provides evidence that our imaging preparation largely captures the labeling seen in a dense, high-resolution reference image within the indicated 280 microns range.
  
  - Flfm data processing: It is important for data interpretation that the authors are clearer about how the raw images were processed. The de-noising process specifically needs to be explained in greater detail. What are the characteristics of the noise being removed? How is time-varying signal being distinguished from noise? Please provide a supplemental with images and algorithm specifics for each key step.
  
  We thank the reviewer for their comment. To address the reviewer’s point regarding the data processing pipeline utilized in our study, in our revised manuscript we have added a number of additional figure panels in Figure S1B-E to quantify and describe the various steps of the pipeline in greater depth.
  
  First, the raw fLFM images are denoised. The denoising approach utilized in the fLFM data processing pipeline is not novel, but rather a custom-trained variant of Lecoq et al.’s (2021) DeepInterpolation method. In our original manuscript, we also described the specific architecture and parameters utilized to train our specific variation of DeepInterpolation model. To make this procedure clearer, we have added the following details to the methods:
  
  “DeepInterpolation is a self-supervised approach to denoising, which denoises the data by learning to predict a given frame from a set of frames before and after it. Time-varying signal can be distinguished from shot noise because shot noise is independent across frames, but signal is not. Therefore, only the signal is able to be predicted from adjacent frames. This has been shown to provide a highly effective and efficient denoising method (Lecoq et al., 2021).”
  
  Therefore, time-varying signal is distinguished from noise based on the correlations of pixel intensity across consecutive imaging frames. To better visualize this process, in new Figure S1B we show example images and fluorescence traces before and after denoising.
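
The principle stated above, that signal is predictable from adjacent frames while frame-independent shot noise is not, can be illustrated with a toy example. This is only a sketch under strong assumptions: a plain average of neighbouring frames stands in for the trained DeepInterpolation network, the trace is a single simulated pixel, and the noise is Gaussian rather than true shot noise.

```python
import math
import random


def simulate(n_frames=2000, period=200.0, noise_sd=1.0, seed=0):
    """Slow sinusoidal 'signal' plus noise drawn independently for every frame."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * t / period) for t in range(n_frames)]
    noisy = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return signal, noisy


def predict_from_neighbours(noisy, k=3):
    """Estimate each frame from the k frames before and after it.

    The centre frame is excluded, so its own noise cannot be copied through.
    """
    out = []
    for t in range(k, len(noisy) - k):
        neighbours = noisy[t - k:t] + noisy[t + 1:t + k + 1]
        out.append(sum(neighbours) / len(neighbours))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


k = 3
signal, noisy = simulate()
predicted = predict_from_neighbours(noisy, k)
mse_raw = mse(noisy[k:-k], signal[k:-k])    # error of the raw noisy frames
mse_pred = mse(predicted, signal[k:-k])     # error of the neighbour-based prediction

print(f"raw MSE: {mse_raw:.3f}, predicted MSE: {mse_pred:.3f}")
```

Because the noise in the centre frame is uncorrelated with the noise in its neighbours, the prediction can only recover the temporally correlated component, so its error relative to the true signal is well below that of the raw frame.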
  
  - Merging: It is noted that nearby pixels with a correlation greater than 0.7 were merged. Why was this done? Is this largely due to cross-contamination due to a drop in resolution? How common was this occurrence? What was the distribution of pixel volumes after aggregation? Should we interpret this to mean that a 'neuron' in this data set is really a small cluster of 10-20 neurons? This of course has great bearing on how we think about variability in the response shown later.
  
  First, to be clear, nearby pixels were not merged; instead neuronal ROIs identified by CNMF-E were merged, as we had described: “the CNMF-E algorithm was applied to each plane in parallel, after which the putative neuronal ROIs from each plane were collated and duplicate neurons across planes were merged.” If this merging was not performed, the number of neurons would be overestimated due to the relatively dense 3D reconstruction with voxels of 4 µm axially. Therefore, this merging is a requisite component of the pipeline to avoid double counting of neurons, regardless of the resolution of the data.
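
A cross-plane merging step of this kind can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline or the CNMF-E implementation: it assumes each putative ROI has a plane index, a lateral centroid, and an activity trace, and it groups ROIs in adjacent planes that are laterally close and whose traces correlate above 0.7 (the threshold mentioned by the reviewer). The `Roi` class, the `max_dist_um` parameter, and the example values are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Roi:
    plane: int     # index of the axial plane the ROI was found in
    x: float       # lateral centroid, micrometres
    y: float
    trace: list    # activity over time


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def merge_duplicates(rois, max_dist_um=5.0, min_corr=0.7):
    """Group ROIs that likely belong to the same neuron (union-find)."""
    parent = list(range(len(rois)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rois)):
        for j in range(i + 1, len(rois)):
            a, b = rois[i], rois[j]
            adjacent = abs(a.plane - b.plane) == 1
            close = math.hypot(a.x - b.x, a.y - b.y) <= max_dist_um
            if adjacent and close and pearson(a.trace, b.trace) > min_corr:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rois)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


rois = [
    Roi(0, 10.0, 10.0, [0.0, 1.0, 0.0, 2.0, 0.0, 3.0]),
    Roi(1, 10.5, 10.0, [0.0, 1.1, 0.0, 1.9, 0.1, 3.0]),  # same cell, next plane
    Roi(1, 80.0, 40.0, [3.0, 0.0, 2.0, 0.0, 1.0, 0.0]),  # unrelated cell
]
groups = merge_duplicates(rois)
print(groups)
```

Counting the members of each group gives the kind of merging statistic the response reports, namely how many putative 2D ROIs contribute to each final 3D ROI.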
  
  However, we agree with the reviewer that the practical consequences of this merging were not previously described in sufficient detail. Therefore, in our revision we have added additional quantification of the two critical components of the merging procedure: the number of putative neuronal ROIs merged and the volume of the final 3D neuronal ROIs, which demonstrate that a neuron in our data should not be interpreted as a cluster of 10-20 neurons.
  
  In new Figure S1C(i), we summarize the rate of occurrence of merging by assessing the number of putative 2D ROIs which were merged to form each final 3D neuronal ROI. Across n=10 recordings, approximately 75% of the final 3D neuronal ROIs involved no merging at all, and few instances involved merging more than 5 putative ROIs. Next, in Figure S1C(ii), we quantify the volume of the final 3D ROIs. To do so, we counted the number of voxels contributing to each final 3D neuronal ROI and multiplied that by the volume of a single voxel (2.4 x 2.4 x 4 µm<sup>3</sup>). The majority of neurons had a volume of less than 1000 µm<sup>3</sup>, which corresponds to a spherical volume with a radius of roughly 6.2 µm. In summary, both the merging statistics and volume distribution demonstrate that few neuronal ROIs could be consistent with “a small cluster of 10-20 neurons”.
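
The volume arithmetic above can be checked directly. The short sketch below uses only the voxel dimensions and the 1000 µm<sup>3</sup> figure quoted in the response; the function names are illustrative.

```python
import math

# Voxel dimensions quoted in the response, in micrometres.
VOXEL_UM3 = 2.4 * 2.4 * 4.0


def roi_volume_um3(n_voxels):
    """Volume of an ROI as voxel count times single-voxel volume."""
    return n_voxels * VOXEL_UM3


def equivalent_radius_um(volume_um3):
    """Radius of the sphere with the same volume: r = (3V / (4*pi)) ** (1/3)."""
    return (3.0 * volume_um3 / (4.0 * math.pi)) ** (1.0 / 3.0)


radius = equivalent_radius_um(1000.0)
voxels_at_limit = 1000.0 / VOXEL_UM3

print(f"voxel volume: {VOXEL_UM3:.2f} um^3")
print(f"equivalent radius for 1000 um^3: {radius:.1f} um")
print(f"voxels in 1000 um^3: {voxels_at_limit:.1f}")
```

A single voxel is about 23 µm<sup>3</sup>, so the 1000 µm<sup>3</sup> bound corresponds to roughly 43 voxels and an equivalent sphere of radius about 6.2 µm, matching the figure given in the response.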
  
  - Bleaching: Please give the time constants used in the fit for assessing bleaching.
  
  As described in the Methods, the photobleaching correction was performed by fitting a bi-exponential function to the mean fluorescence across all neurons. We have provided the time constants determined by these fits for n=10 recordings in new Figure S1D(i). In addition, we provided an example of raw mean activity, the corresponding bi-exponential fit, and the mean activity after correction in Figure S1D(ii). These data demonstrate that the dominant photobleaching effect is a steep decrease in mean signal at the beginning of the recording (represented by the estimated time constant τ<sub>1</sub>), followed by a slow decay (τ<sub>2</sub>).
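
A bi-exponential photobleaching fit of the kind described can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code: the time constants are chosen by grid search, the two amplitudes are solved by linear least squares for each candidate pair, and the correction divides by the fit normalized to its initial value, which is one common choice and is assumed here. The synthetic trace and all parameter values are hypothetical.

```python
import math


def biexp(t, a1, tau1, a2, tau2):
    return a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)


def fit_biexp(times, values, tau_grid):
    """Grid search over tau1 < tau2; amplitudes from 2x2 normal equations."""
    best = None
    for i, tau1 in enumerate(tau_grid):
        e1 = [math.exp(-t / tau1) for t in times]
        for tau2 in tau_grid[i + 1:]:
            e2 = [math.exp(-t / tau2) for t in times]
            s11 = sum(u * u for u in e1)
            s22 = sum(v * v for v in e2)
            s12 = sum(u * v for u, v in zip(e1, e2))
            b1 = sum(u * y for u, y in zip(e1, values))
            b2 = sum(v * y for v, y in zip(e2, values))
            det = s11 * s22 - s12 * s12
            if abs(det) < 1e-12:
                continue  # basis functions too similar to separate
            a1 = (b1 * s22 - b2 * s12) / det
            a2 = (b2 * s11 - b1 * s12) / det
            sse = sum((y - a1 * u - a2 * v) ** 2
                      for y, u, v in zip(values, e1, e2))
            if best is None or sse < best[0]:
                best = (sse, a1, tau1, a2, tau2)
    return best[1:]


def correct_bleaching(times, values, params):
    """Divide by the fitted decay, keeping the initial fluorescence level."""
    baseline = biexp(0.0, *params)
    return [y * baseline / biexp(t, *params) for t, y in zip(times, values)]


# Synthetic mean fluorescence: fast initial drop (tau1) plus slow decay (tau2).
times = [float(t) for t in range(600)]  # seconds
values = [biexp(t, 0.3, 30.0, 0.7, 2000.0) for t in times]

tau_grid = [10.0, 30.0, 100.0, 300.0, 1000.0, 2000.0, 5000.0]
a1, tau1, a2, tau2 = fit_biexp(times, values, tau_grid)
corrected = correct_bleaching(times, values, (a1, tau1, a2, tau2))

print(f"tau1 = {tau1} s, tau2 = {tau2} s")
```

In practice a continuous nonlinear least-squares optimizer would replace the grid search; the grid is used here only to keep the sketch self-contained.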
  
  Analysis
  
  - Slow calcium dynamics: It does not appear that the authors properly account for the slow dynamics of calcium-sensing in their analysis. Nuclear-localized GCaMP6s will likely have a kernel with a multiple-second decay time constant for many of the cells being studied. The value used needs to be given and the authors should account for variability in this kernel time across cell types. Moreover, by not deconvolving their signals, the authors allow for contamination of their signal at any given time with a signal from multiple seconds prior. For example, in Figure 4A (left turns), it appears that much of the activity in the first half of the time-warped stimulus window began before stimulus presentation - without properly accounting for the kernel, we don't know if the stimulus-associated activity reported is really stimulus-associated firing or a mix of stimulus and pre-stimulus firing. This also suggests that in some cases the signals from the prior trial may contaminate the current trial.
  
  We would like to respond to each of the points raised here by the reviewer individually.
  
  (1) “It does not appear that the authors properly account for the slow dynamics of calcium-sensing in their analysis. Nuclear-localized GCaMP6s will likely have a kernel with a multiple-second decay time constant for many of the cells being studied. The value used needs to be given…”
  
  We disagree with the reviewer’s claim that the slow dynamics of the calcium indicator GCaMP were not accounted for. While we did not deconvolve the neuronal traces with the GCaMP response kernel, in every step in which we correlated neural activity with sensory or motor variables, we convolved the stimulus or motor timeseries with the GCaMP kernel, as described in the Methods. Therefore, the expected delay and smoothing effects were accounted for when analyzing the correlation structure between neural and behavioral or stimulus variables, as well as during our various classification approaches. To better describe this, we have added the following description of the kernel to our Methods:
  
  “The NL-GCaMP6s kernel was estimated empirically by aligning and averaging a number of calcium events. This kernel corresponds to a half-rise time of 400 ms and half-decay time of 4910 ms.”
  
  This approach accounts for the GCaMP kernel when relating the neuronal dynamics to stimuli and behavior, while avoiding any artifacts that could be introduced from improper deconvolution or other corrections directly to the calcium dynamics. Deconvolution of calcium imaging data, and in particular nuclear-localized (NL) GCaMP6s, is not always a robust procedure. In particular, GCaMP6s has a much more nonlinear response profile than newer GCaMP variants such as jGCaMP8 (Zhang et al. 2023, doi:10.1038/s41586-023-05828-9), as the reviewer notes later in their comments. The nuclear-localized nature of the indicator used in our study also provides an additional nonlinear effect. Accounting for a nonlinear relationship between calcium concentration and fluorescence readout is significantly more difficult because such nonlinearities remove the guarantee that the optimization approaches generally used in deconvolution will converge to global extrema. This means that deconvolution assuming nonlinearities is far less robust than deconvolution using the linear approximation (Vogelstein et al. 2010, doi: 10.1152/jn.01073.2009). Therefore, we argue that we are not currently aware of any appropriate methods for deconvolving our NL-GCaMP6s data, and take a more conservative approach in our study.
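
The regressor construction described above, in which a stimulus or motor time series is convolved with the GCaMP kernel before being correlated with neural activity, can be sketched as follows. The kernel shape is an assumption: the authors estimated theirs empirically, whereas this sketch uses a product of an exponential rise and an exponential decay whose time constants are derived from the reported half-rise (400 ms) and half-decay (4910 ms) times, so its exact half-times are only approximate. The 10 Hz frame rate and the trial timing are also illustrative.

```python
import math

FRAME_RATE_HZ = 10.0   # assumed sampling rate
HALF_RISE_S = 0.400    # reported half-rise time
HALF_DECAY_S = 4.910   # reported half-decay time


def gcamp_kernel(duration_s=30.0, rate_hz=FRAME_RATE_HZ):
    """Assumed kernel: (1 - exp(-t/tau_r)) * exp(-t/tau_d), peak-normalized."""
    tau_r = HALF_RISE_S / math.log(2)
    tau_d = HALF_DECAY_S / math.log(2)
    n = int(duration_s * rate_hz)
    k = [(1.0 - math.exp(-(i / rate_hz) / tau_r)) * math.exp(-(i / rate_hz) / tau_d)
         for i in range(n)]
    peak = max(k)
    return [v / peak for v in k]


def causal_convolve(x, kernel):
    """y[t] = sum_j kernel[j] * x[t - j]; uses only present and past input."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        acc = 0.0
        for j in range(min(t + 1, len(kernel))):
            acc += kernel[j] * x[t - j]
        y[t] = acc
    return y


# Hypothetical trial: a 3 s stimulus starting 10 s into a 40 s trace.
n_samples = int(40 * FRAME_RATE_HZ)
onset = int(10 * FRAME_RATE_HZ)
offset = int(13 * FRAME_RATE_HZ)
stimulus = [1.0 if onset <= i < offset else 0.0 for i in range(n_samples)]

regressor = causal_convolve(stimulus, gcamp_kernel())
peak_index = regressor.index(max(regressor))

print(f"regressor peaks {(peak_index - onset) / FRAME_RATE_HZ:.1f} s after stimulus onset")
```

Because the convolution is causal, the regressor is exactly zero before stimulus onset and decays slowly afterwards. The smearing therefore runs forward in time only: stimulus-evoked fluorescence cannot leak into the pre-stimulus period, whereas activity from before the stimulus can persist into the stimulus window.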
  
  We also argue that the natural smoothness of calcium imaging data is important for the analyses utilized in our study (Shen et al., 2022, doi:10.1016/j.jneumeth.2021.109431). Even if our data were deconvolved in order to estimate spike trains or more point-like activity patterns, such data are generally smoothed (e.g., by estimating firing rates) before dimensionality reduction, which is a core component of our neuronal population analyses. Further, Wei et al. (2020, doi:10.1371/journal.pcbi.1008198) showed in detail that deconvolved calcium data resulted in less accurate population decoding, whereas binned electrophysiological data and raw calcium data were equally accurate. When using other techniques, such as clustering of neuronal activity patterns (a method we do not employ in this study), spike and deconvolved calcium data were instead shown to be more accurate than raw calcium data. Therefore, we do not believe deconvolution of the neuronal traces is appropriate in this case without a better understanding of the NL-GCaMP6s response, and do not rely on the properties of deconvolution for our analyses. Still, we agree with the reviewer that one must be mindful of the GCaMP kernel when analyzing and interpreting these data, and therefore have noted the delayed and slow kinematics of the NL-GCaMP within our manuscript, for example: “To visualize the neuronal activity during a given trial while accounting for the delay and kinematics of the nuclear-localized GCaMP (NL-GCaMP) sensor, a duration of approximately 15 seconds is extracted beginning at the onset of the 3-second visual stimulus period.”
  
  (2) “… and the authors should account for variability in this kernel time across cell types.”
  
  In addition to the points raised above, we are not aware of any deconvolution procedures which have successfully shown the ability to account for variability in the response kernel across cell types in whole-brain imaging data when cell type is unknown a priori. Pachitariu et al. (2018, doi:10.1523/JNEUROSCI.3339-17.2018) showed that the best deconvolution procedures for calcium imaging data rely on a simple algorithm with a fixed kernel. Further, more complicated approaches either utilize explicit priors about the calcium kernel or learn implicit priors using supervised learning, neither of which we would be able to confirm are appropriate for our dataset without ground truth electrophysiological spike data.
  
  However, we agree with the reviewer that we must interpret the data while being mindful that there could be variability in this kernel across neurons, which is not accounted for in our fixed calcium kernel. We have added the following sentence to our revised manuscript to highlight this limitation:
  
  “The use of a fixed calcium kernel does not account for any variability in the GCaMP response across cells, which could be due to differences such as cell type or expression level. Therefore, this analysis approach may not capture the full set of neurons which exhibit stimulus correlations but exhibit a different GCaMP response.”
  
  (3) “without properly accounting for the kernel, we don't know if the stimulus-associated activity reported is really stimulus-associated firing or a mix of stimulus and pre-stimulus firing”
  
  While we agree with the reviewer that the slow dynamics of the indicator will cause a delay and smoothing of the signal over time, we would like to point out that this effect is highly directional. In particular, we can be confident that pre-stimulus activity is not contaminated by the stimulus given the data we describe in the next point regarding the timing of visual stimuli relative to the GCaMP kernel. The reviewer is correct that post-stimulus firing can be mixed with pre-stimulus firing due to the GCaMP kernel. However, our key claims in Figure 4 center around turn direction and responsiveness biases, which are present even before the onset of the stimulus. Still, we have highlighted this delay and smoothing to readers in the updated version of our manuscript.
  
  (4) “This also suggests that in some cases the signals from the prior trial may contaminate the current trial”
  
  We have carefully chosen the inter-stimulus interval for maximum efficiency of stimulation, while ensuring that contamination from the previous stimulus is negligible. The inter-stimulus interval was chosen by empirically analyzing preliminary data of visual stimulation with our preparation. New Figure S3C shows the delay and slow kinematics due to our indicator; indeed, visually-evoked activity peaks after the end of the short stimulus period. Importantly, however, the visually-evoked activity is at or near baseline at the start of the next trial.
  
  Finally, we would like to note that our stimulation protocol is randomized, as described in the Methods. Therefore, the previous stimulus has no correlation with the current stimulus, which would prevent any contamination from providing predictive power that could be identified by our visual decoding methods.
  
  - Partial Least Squares (PLS) regression: The steps taken to identify stimulus coding and noise dimensions are not sufficiently clear. Please provide a mathematical description.
  
  We have updated the Results and Methods sections of our revised manuscript to describe in more mathematical detail the approach taken to identify the relevant dimensions of neuronal activity:
  
  “The comparison of the neural dimensions encoding visual stimuli versus trial-to-trial noise was modeled after Rumyantsev et al. (2020). Partial least squares (PLS) regression was used to find a low-dimensional space that optimally predicted the visual stimuli, which we refer to as the visually-evoked neuronal activity patterns. To perform regression, a visual stimulus kernel was constructed by summing the timeseries of each individual stimulus type, weighted by the stimulus size and negated for trials on the right visual field, thus providing a single response variable encoding the location, size, and timing of all the stimulus presentations. This stimulus kernel was then convolved with the temporal response kernel of our calcium indicator (NL-GCaMP6s).
  
  PLS regression identifies a pair of normalized dimensions, one for each of two sets of paired observations, that maximize the covariance between the two sets. In our case, the visual stimulus is represented by a single variable, simplifying the problem to identifying the subspace of neural activity that optimally preserves information about the visual stimulus (sometimes referred to as PLS1 regression). That is, the N x T neural time series matrix X is reduced to a d x T matrix spanned by a set of orthonormal vectors. PLS1 regression is performed as follows:
  
  PLS1 algorithm
  
  Let X<sub>i</sub> = X and . For i = 1…d,
  
  (1)
  
  (2)
  
  (3)
  
  (4)
  
  (5) (note this is scalar)
  
  (6)
  
  The projections of the neural data {p<sub>i</sub>} thus span a subspace that maximally preserves information about the visual stimulus. Stacking these projections into the N x d matrix P, which represents the transform from the whole-brain neural state space to the visually-evoked subspace, the optimal decoding direction w<sub>opt</sub> is given by the linear least squares solution. The dimensionality d of PLS regression was optimized for each larva using 6-fold cross-validation with 3 repeats, choosing the dimensionality between d = 1 and 20 with the lowest cross-validated mean squared error. Then, w<sub>opt</sub> was computed using all time points.
  
  For each stimulus type, the noise covariance matrix was computed in the low-dimensional PLS space, given that direct estimation of the noise covariances across many thousands of neurons would likely be unreliable. A noise covariance matrix was calculated separately for each stimulus, and then averaged across all stimuli. As before, the mean activity µ<sub>i</sub> for each neuron was computed over each stimulus presentation period. The noise covariance then describes the correlated fluctuations δ<sub>i</sub> around this mean response for each pair of neurons i and j, where
  
  The noise modes e<sub>α</sub> for α = 1…d were subsequently identified by eigendecomposition of the mean noise covariance matrix across all stimuli. The angle between the optimal stimulus decoding direction w<sub>opt</sub> and each noise mode e<sub>α</sub> was then computed.”
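
The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: it uses a textbook PLS1 (NIPALS-style) deflation, takes the orthonormal weight vectors as the basis of the low-dimensional subspace, and runs on synthetic trial-by-neuron data (rows are observations, so the matrix is transposed relative to the N x T convention above). The function names `pls1_weights` and `decoding_and_noise_angle` and all toy parameters are hypothetical.

```python
import numpy as np

def pls1_weights(X, y, d):
    """Textbook PLS1 deflation (an assumption; not necessarily the study's exact steps).

    X: (observations, neurons) activity; y: (observations,) stimulus regressor.
    Returns a (neurons, d) matrix with orthonormal columns.
    """
    Xi = X - X.mean(axis=0)
    yc = y - y.mean()
    W = []
    for _ in range(d):
        w = Xi.T @ yc                    # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left that covaries with y
            break
        w = w / norm
        t = Xi @ w                       # scores along this direction
        p = Xi.T @ t / (t @ t)           # loadings (t @ t is a scalar)
        Xi = Xi - np.outer(t, p)         # deflate X before the next component
        W.append(w)
    return np.column_stack(W)

def decoding_and_noise_angle(X, y, stim_ids, d):
    """Decoding direction, leading noise mode, and their angle in the PLS subspace."""
    W = pls1_weights(X, y, d)
    Z = (X - X.mean(axis=0)) @ W         # activity projected into the subspace
    w_opt, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    covs = []
    for s in np.unique(stim_ids):
        Zs = Z[stim_ids == s]
        delta = Zs - Zs.mean(axis=0)     # fluctuations around the mean response
        covs.append(delta.T @ delta / (len(delta) - 1))
    mean_cov = np.mean(covs, axis=0)     # average noise covariance across stimuli
    _, evecs = np.linalg.eigh(mean_cov)  # eigenvalues come back in ascending order
    e1 = evecs[:, -1]                    # largest noise mode
    cos = abs(w_opt @ e1) / (np.linalg.norm(w_opt) * np.linalg.norm(e1))
    angle = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
    return W, w_opt, e1, angle

# Synthetic data: 4 stimuli x 50 trials, 50 neurons (all values hypothetical).
rng = np.random.default_rng(0)
stim_ids = np.repeat(np.arange(4), 50)
y = np.array([-2.0, -1.0, 1.0, 2.0])[stim_ids]   # signed stimulus size
a, b = rng.normal(size=50), rng.normal(size=50)  # signal and noise patterns
X = (np.outer(y, a) + np.outer(rng.normal(size=200), b)
     + 0.1 * rng.normal(size=(200, 50)))
W, w_opt, e1, angle = decoding_and_noise_angle(X, y, stim_ids, d=3)
print(f"angle between w_opt and e1: {angle:.1f} degrees")
```

Folding the sign of the inner product with `abs` keeps the angle in the range 0 to 90 degrees, which suits eigenvectors because their sign is arbitrary.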
  
  - No response: It is not clear from the methods description if cases where the animal has no tail response are being lumped with cases where the animal decides to swim forward and thus has a large absolute but small mean tail curvature. These should be treated separately.
  
  We thank the reviewer for raising the potential for this confusion and agree that forward-motion trials should not be treated the same as motionless trials. While these types of trial were indeed treated separately in our original manuscript, we have updated the Methods section of our revised manuscript to make this clear:
  
  “Left and right turn trials were extracted as described previously. Response trials included both left and right turn trials (i.e., the absolute value of mean tail curvature > σ<sub>active</sub>), whereas nonresponse trials were motionless (absolute mean tail curvature < σ<sub>active</sub>). In particular, forward-motion trials were excluded from these analyses.”
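
The distinction between trial types can be illustrated with a short sketch. It assumes that each trial is summarized by a tail curvature trace, that turns are detected from the signed mean curvature (positive for leftward turns, as in the sign convention used for Figure S2A), and that motionless trials are detected from the mean absolute curvature. The function name `classify_trial` and the threshold value are hypothetical, and the exact criteria in the manuscript may differ.

```python
def classify_trial(curvature, sigma_active):
    """Label one trial from its tail curvature trace (illustrative criteria only)."""
    mean_curv = sum(curvature) / len(curvature)
    mean_abs = sum(abs(c) for c in curvature) / len(curvature)
    if mean_curv > sigma_active:
        return "left"          # response trial, leftward turn
    if mean_curv < -sigma_active:
        return "right"         # response trial, rightward turn
    if mean_abs < sigma_active:
        return "nonresponse"   # tail essentially motionless
    return "forward"           # large absolute but small mean curvature

trials = {
    "left": [0.5, 0.6, 0.4],
    "right": [-0.5, -0.6, -0.4],
    "nonresponse": [0.01, -0.02, 0.0],
    "forward": [0.5, -0.5, 0.5, -0.5],
}
labels = {name: classify_trial(trace, sigma_active=0.2)
          for name, trace in trials.items()}
print(labels)
```

Separating the signed mean from the mean absolute value is what keeps symmetric forward swims out of the nonresponse category.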
  
  While our study has focused specifically on left and right turns, we hypothesize that the responsiveness bias ensemble may also be involved in forward movements and look forward to future work exploring the relationship between whole-brain dynamics and the full range of motor outputs.
  
  - Behavioral variability: Related to Figure 2, within- and across-subject variability are confounded. Please disambiguate. It may also be informative on a per-fish basis to examine associations between reaction time and body movement.
  
  The reviewer is correct that our previously reported summary statistics in Figure 2D-F were aggregated across trials from multiple larvae. Following the reviewer’s suggestion to make the magnitudes of across-larvae and within-larva variability clear, in our revised manuscript we have added two additional figure panels to Figure S2.
  
  New Figure S2A highlights the across-larvae variability in mean head-directed behavioral responses to stimuli of various sizes. Overall, the relationship between stimulus size and the mean tail curvature across trials is largely consistent across larvae; however, the crossing-over point between leftward (positive curvature) and rightward (negative curvature) turns for a given side of the visual field exhibits some variability across larvae.
  
  New Figure S2B shows examples of within-larva variability by plotting the mean tail curvature during single trials for two example larvae. Consistent with Figure 2G which also demonstrates within-larva variability, responses to a given stimulus are variable across trials in both examples. However, this degree of within-larva variability can appear different across larvae. For example, the larva shown on the left of Figure S2B exhibits greater overlap between responses to stimuli presented on opposite visual fields, whereas the larva shown on the right exhibits greater distinction between responses.
  
  - Data presentation clarity: All figure panels need scale bars - for example, in Figure 3A there is no indication of timescale (or time of stimulus presentation). Figure 3I should also show the time series of the w_opt projection.
  
  We appreciate the reviewer’s attention to detail in this regard. We have added scale bars to Figures 3A, 3H-I, S4B(ii), 4H, and 4J in the revised manuscript, and to all new figure panels where relevant. In addition, the caption of Figure 3A has been updated to include a description of the time period plotted relative to the onset of the visual stimulus.
  
  Additionally, we appreciate the reviewer’s idea to show w<sub>opt</sub> in Figure 3J of the revised manuscript (previously Figure 3I). This clearly shows that the visual decoding projection is inactive during the short baseline period before visual stimulation begins, whereas the noise mode is correlated with motor output throughout the recording.
  
  - Pixel locations: Given the poor quality of the brain images, it is difficult to tell the location of highlighted pixels relative to brain anatomy. In addition, given that the midbrain consists of much more than the tectum, it is not appropriate to put all highlighted pixels from the midbrain under the category of tectum. To aid in data interpretation and better connect this work with the literature, it is recommended that the authors register their data sets to standard brain atlases and determine if there is any clustering of relevant pixels in regions previously associated with prey-capture or predator-avoidance behavior.
  
  We agree with the reviewer that registration of our datasets to a standard brain atlas is a highly useful addition. While the dense, pan-neuronal labeling makes the isolation of highly specific circuit components difficult, we have shown in more detail the specific brain regions contributing to these populations by aligning our recordings to the Z-Brain atlas (Randlett et al., 2015) as shown in new Figures S1E, S3F-G, 4I, 4K, and S5F-G. In addition, we provided a more detailed parcellation of the neuronal ensembles by providing projections of the full 3D volume along the xy and yz axes, in addition to the unregistered xy projection shown in new Figures 4H and 4J. We also found that the distribution of neurons in our huc:H2B-GCaMP6s recordings is very similar to the distribution of labeling in the huc:H2B-RFP reference image from the Z-Brain atlas (new Figure S1E), which further supports our whole-brain imaging results.
  
  Overall, we find that this more detailed quantification and visualization is consistent with the interpretations in the previous version of our manuscript. In particular, we show that the optimal visual decoding population (w<sub>opt</sub>) and the largest noise mode (e<sub>1</sub>) are localized to the midbrain (new Figures S3F-G), which is expected since in Figure 3 we first extracted a low-dimensional subspace of whole-brain neural activity that optimally preserved visual information. We also provide further evidence that the populations correlated with the turn bias and responsiveness bias are distributed throughout the brain, including a relatively dense localization to the cerebellum, telencephalon, and dorsal diencephalon (habenula; new Figures 4H-K and S5F-G).
  
  Finally, the reviewer is correct that our original label of “tectum” was a misnomer; the region analyzed corresponded to the midbrain, including the tegmentum, torus longitudinalis, and torus semicircularis in addition to the tectum. We have updated the brain regions shown and labels throughout the manuscript.
  
  Interpretation
  
  - W_opt and e_1 orthogonality: The statement that these two vectors, determined from analysis of the fluorescence data, are orthogonal, actually brings into question the idea that true signal and leading noise vectors in firing-rate state-space are orthogonal. First, the current analysis is confounding signals across different time periods - one could assume linearity all the way through the transformations, but this would only work if earlier sources of activation were being accounted for. Second, the transformation between firing rate and fluorescence is most likely not linear for GCaMP6s in most of the cells recorded. Thus, one would expect a change in the relationship between these vectors as one maps from fluorescence to firing rate.
  
  Unfortunately, we are not entirely sure we have understood the reviewer’s argument. We are assuming that the reviewer’s first sentence is suggesting that the observation of orthogonality in the neural state space measured in calcium imaging precludes the possibility (“actually brings into question”, as the reviewer states) that the same neural ensembles could be orthogonal in firing rate state space measured by electrophysiological data. If this is the reviewer’s conjecture, we respectfully disagree with it. Consider a toy example of a neural network containing N ensembles of neurons, where the neurons within an ensemble all fire simultaneously, and no two ensembles ever fire at the same time. As long as the “switching” of firing between ensembles is not fast relative to the resolution of the GCaMP kernel, the largest principal components would represent orthogonal dimensions differentiating the various ensembles, whether observing firing rates or timeseries convolved by the GCaMP kernel. This is a simple example where the observed orthogonality would appear similar in both calcium imaging and electrophysiological data, demonstrating that we should not allow conclusions from fluorescence data to “bring into question” that the same result could be observed in firing rate data.
  
  We also disagree with the reviewer’s argument that we are “confounding signals across time periods”. Indeed, we must interpret the data in light of the GCaMP response kernel. However, all of the analyses presented here are performed on instantaneous measurements of population activity patterns. These activity patterns do represent a smoothed, likely nonlinear integration of recent neuronal activity, but unless the variability in the GCaMP response kernel (discussed above) is widely different across these populations (which has not been observed in the literature), we do not expect that the GCaMP transformations would artificially induce orthogonality in our analysis approach. Such smoothing operations tend to instead increase correlations across neurons and population decoding approaches generally benefit from this smoothness, as we have argued above. However, a much more problematic situation would be if we were comparing the activity of two neuronal populations at different points in time (which we do not include in this study), in which case the nonlinearities could overaccentuate orthogonality between non-time-matched activity patterns.
  
  Finally, we agree with the reviewer that the transformation between firing rate and fluorescence is very likely nonlinear and that these vectors of population activity do not perfectly represent what would be observed if one had access to whole-brain, cellular-resolution electrophysiology spike data. However, similar observations regarding the brain-wide, distributed encoding of behavior have been confirmed across recording modalities in the mouse (Stringer et al., 2019; Steinmetz et al., 2019), where large-scale electrophysiology utilizing highly invasive probes (e.g., Neuropixels) is more feasible than in the larval zebrafish. With the advent of whole-brain voltage imaging in the larval zebrafish, we expect any differences between calcium and voltage dynamics will be better understood, yet such techniques will likely continue to suffer to some extent from the nonlinearities described here.
  
  - Sources of variability: The authors do not take into account a fairly obvious source of variability in trial-to-trial response - eye position. We know that prey capture responsiveness is dependent on eye position during stimulus (see Figure 4 of PMID: 22203793). We also expect that neurons fairly early in the visual pathway with relatively narrow receptive fields will show variable responses to visual stimuli as the degree of overlap with the receptive field varies with eye movement. There can also be small eye-tracking movements ahead of the decision to engage in prey capture (Figure 1D, PMID: 31591961) that can serve as a drive to initiate movements in a particular direction. Given these possibilities indicating that the behavioral measure of interest is gaze, and the fact that eye movements were apparently monitored, it is surprising that the authors did not include eye movements in the analysis and interpretation of their data.
  
  We agree with the reviewer that eye movements, such as saccades and convergence, are important motor outputs that are well-known to play a role in the sequence of motor actions during prey capture and other behaviors. Therefore, we have added the following new eye tracking results to our revised manuscript:
  
  “In order to confirm that the observed neural variability in the visually-evoked populations was not predominantly due to eye movements, such as saccades or convergence, we tracked the angle of each eye. We utilized DeepLabCut, a deep learning tool for animal pose estimation (Mathis et al., 2018), to track keypoints on the eye which are visible in the raw fLFM images, including the retina and pigmentation (Figure S3D(i)). This approach enabled identification of various eye movements, such as convergence and the optokinetic reflex (Figure S3D(ii-iii)). Next, we extracted a number of eye states, including those based on position (more leftward vs. rightward angles) and speed (high angular velocity vs. low or no motion). Figure S3E(i) provides example stimulus response profiles across trials of the same visual stimulus in each of these eye states, similar to a single column of traces in Figure 3A broken out into more detail. These data demonstrate that the magnitude and temporal dynamics of the stimulus-evoked responses show apparently similar levels of variability across eye states. If neural variability were driven by eye movement during the stimulus presentation, for example, one would expect to see much more variability during the high angular velocity trials than during the low angular velocity trials, which is not apparent. Next, we asked whether the dominant neural noise modes vary across eye states, which would suggest that the geometry of neuronal variability is influenced by eye movements or states. To do so, the dominant noise modes were estimated in each of the individual eye conditions, as well as in bootstrapped trials from across all eye conditions. The similarity of these noise modes estimated from different eye conditions (Figure S3E(ii), right) was not significantly different from the similarity of noise modes estimated from bootstrapped random samples across all eye conditions (Figure S3E(ii), left). Therefore, while movements of the eye likely contribute to aspects of the observed neural variability, they do not dominate the observed neural variability here, particularly given our observation that the largest noise mode represents a considerable fraction of the observed neural variance (Figure 3E).”
  
  While these results provide an important control in our study, we anticipate further study of the relationship between eye movements or states, visually-evoked neural activity, and neural noise modes would identify the additional neural ensembles which are correlated with and drive this additional motor output.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this study, Manley and Vaziri designed and built a Fourier light-field microscope (fLFM) inspired by previous implementations but improved and built exclusively from commercially available components so others can more easily reproduce the design. They combined this with the design of novel algorithms to efficiently extract whole-brain activity from larval zebrafish brains.
  
  This new microscope was applied to the question of the origin of behavioral variability. In an assay in which larval zebrafish are exposed to visual dots of various sizes, the fish respond by turning left or right or not responding at all. Neural activity was decomposed into an activity that encodes the stimulus reliably across trials, a 'noise' mode that varies across trials, and a mode that predicts tail movements. A series of analyses showed that trial-to-trial variability was largely orthogonal to activity patterns that encoded the stimulus and that these noise modes were related to the larvae's behavior.
  
  To identify the origins of behavioral variability, classifiers were fit to the neural data to predict whether the larvae turned left or right or did not respond. A set of neurons that were highly distributed across the brain could be used to classify and predict behavior. These neurons could also predict spontaneous behavior that was not induced by stimuli above chance levels. The work concludes with findings on the distributed nature of single-trial decision-making and behavioral variability.
  
  Strengths:
  
  The design of the new fLFM microscope is a significant advance in light-field and computational microscopy, and the open-source design and software are promising to bring this technology into the hands of many neuroscientists.
  
  The study addresses a series of important questions in systems neuroscience related to sensory coding, trial-to-trial variability in sensory responses, and trial-to-trial variability in behavior. The study combines microscopy, behavior, dynamics, and analysis and produces a well-integrated analysis of brain dynamics for visual processing and behavior. The analyses are generally thoughtful and of high quality. This study also produces many follow-up questions and opportunities, such as using the methods to look at individual brain regions more carefully, applying multiple stimuli, investigating finer tail movements and how these are encoded in the brain, and the connectivity that gives rise to the observed activity. Answering questions about variability in neural activity in the entire brain and its relationship to behavior is important to neuroscience and this study has done that to an interesting and rigorous degree.
  
  Points of improvement and weaknesses:
  
  The results on noise modes may be a bit less surprising than they are portrayed. The orthogonality between neural activity patterns encoding the sensory stimulus and the noise modes should be interpreted within the confounds of orthogonality in high-dimensional spaces. In higher dimensional spaces, it becomes more likely that two random vectors are almost orthogonal. Since the neural activity measurements performed in this study are quite high dimensional, a more explicit discussion is warranted about the small chance that the modes are not almost orthogonal.
  
  We agree with the reviewer that orthogonality is less “surprising” in high-dimensional spaces, and we have added this important point of interpretation to our revised manuscript. Still, it is important to remember that while the full neural state space is very high-dimensional (we record the activity of up to tens of thousands of neurons simultaneously), our analyses regarding the relationship between the trial-to-trial noise modes and decoding dimensions were performed in a low-dimensional subspace (up to 20 dimensions), identified by PLS regression, that optimally preserved visual information. This is a key step in our analysis which serves two purposes: 1. it removes some of the confound described by the reviewer regarding the dimensionality of the neural state space analyzed; and 2. it ensures that the noise modes we analyze are relevant to sensorimotor processing. It would certainly not be surprising or interesting if we identified a neural dimension outside the midbrain which was orthogonal to the optimal visual decoding dimension.
  
  Regardless, in order to better control for this confound, we estimated the distribution of angles between random vectors in this subspace. As we describe in the revised manuscript:
  
  “However, in high-dimensional spaces, it becomes increasingly common that two random vectors could appear orthogonal. While this is particularly a concern when analyzing a neural state space spanned by tens of thousands of neurons, our application of PLS regression to identify a low-dimensional subspace of relevant neuronal activity partially mitigates this concern. In order to control for this confound, we compared the angles between w<sub>opt</sub> and e<sub>1</sub> across larvae to those computed with shuffled versions, w<sub>opt,shuff</sub>, estimated by randomly shuffling the stimulus labels before identifying the optimal decoding direction. While it is possible to observe shuffled vectors which are nearly orthogonal to e<sub>1</sub>, the shuffled distribution spans a significantly greater range of angles than the observed data, demonstrating that this orthogonality is not simply a consequence of analyzing multi-dimensional activity patterns.”
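
The dependence of this confound on dimensionality can be illustrated with a small simulation. The sketch below draws pairs of random unit vectors and reports the mean and spread of the angle between them. It is a generic illustration rather than the shuffle control described above, and the dimensions, sample count, and function names are hypothetical choices.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random by normalizing Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angle_deg(u, v):
    """Angle between two unit vectors, folded into the range 0 to 90 degrees."""
    dot = abs(sum(a * b for a, b in zip(u, v)))
    return math.degrees(math.acos(min(1.0, dot)))

def angle_stats(dim, n_pairs=500, seed=0):
    """Mean and standard deviation of the angle between random vector pairs."""
    rng = random.Random(seed)
    angles = [angle_deg(random_unit_vector(dim, rng), random_unit_vector(dim, rng))
              for _ in range(n_pairs)]
    mean = sum(angles) / n_pairs
    sd = math.sqrt(sum((a - mean) ** 2 for a in angles) / n_pairs)
    return mean, sd

results = {dim: angle_stats(dim) for dim in (3, 20, 1000)}
for dim, (mean, sd) in results.items():
    print(f"dim={dim:5d}  mean angle={mean:5.1f} deg  sd={sd:4.1f} deg")
```

In a subspace of about 20 dimensions, random pairs still show a visibly broad spread of angles, whereas in a space of a thousand or more dimensions nearly every pair lies close to 90 degrees. This is why restricting the comparison to a low-dimensional subspace makes near-orthogonality more informative.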
  
  The conclusion that sparsely distributed sets of neurons produce behavioral variability needs more investigation because the way the results are shown could lead to some misinterpretations. The prediction of behavior from classifiers applied to neural activity is interesting, but the results are insufficiently presented for two reasons.
  
  (1) The neurons that contribute to the classifiers (Figures 4H and J) form a sufficient set of neurons that predict behavior, but this does not mean that neurons outside of that set cannot be used to predict behavior. Lasso regularization was used to create the classifiers and this induces sparsity. This means that if many neurons predict behavior but they do so similarly, the classifier may select only a few of them. This is not a problem in itself but it means that the distributions of neurons across the brain (Figures 4H and J) may appear sparser and more distributed than the full set of neurons that contribute to producing the behavior. This ought to be discussed better to avoid misinterpretation of the brain distribution results, and an alternative analysis that avoids the confound could help clarify.
  
  We thank the reviewer for raising this point, which we agree should be discussed in the manuscript. Lasso regularization was a key ingredient in our analysis; l2 regularization alone was not sufficient to prevent overfitting to the training trials, particularly when decoding turn direction and responsiveness. Previous studies have also found that sparse subsets of neurons predict behavior better than single neurons or non-sparse populations (e.g., Scholz et al., 2018).
  
  While showing l2 regularization would not be a fair comparison given the poor performance of the l2-regularized classifiers, we opted to identify a potentially “fuller” set of neurons correlated with these biases based on the correlation between each neuron’s activity over the recording and the projection along the turn direction or responsiveness dimension identified using l1 regularization. This procedure has the potential to identify all neurons correlated with the final ensemble dynamics, rather than just a “sufficient set” for lasso regression. In new Figures S5F-G, we show the 3D distribution of all neurons significantly correlated with these biases, which appear similar to those in Figures 4H-K and widely distributed across practically the entire labeled area of the brain.
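
The idea of recovering a fuller correlated population from a sparse classifier can be sketched as follows. The example assumes a (time points x neurons) activity matrix and a sparse weight vector from an l1-regularized fit. It selects neurons by a fixed Pearson correlation threshold, which is a stand-in for whatever significance test was actually applied. The function name `correlated_neurons`, the threshold, and the synthetic data are all hypothetical.

```python
import numpy as np

def correlated_neurons(activity, sparse_weights, r_threshold=0.3):
    """Indices of neurons whose activity correlates with the ensemble projection.

    activity: (T, N) array; sparse_weights: (N,) classifier coefficients.
    r_threshold is an illustrative cutoff, not a significance test.
    """
    projection = activity @ sparse_weights          # ensemble time series
    a = activity - activity.mean(axis=0)
    p = projection - projection.mean()
    r = (a.T @ p) / (np.linalg.norm(a, axis=0) * np.linalg.norm(p))
    return np.flatnonzero(np.abs(r) > r_threshold), r

# Synthetic example: neurons 0-3 share a latent signal, neurons 4-5 are noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=500)
shared = [latent + 0.3 * rng.normal(size=500) for _ in range(4)]
noise = [rng.normal(size=500) for _ in range(2)]
activity = np.column_stack(shared + noise)

# A lasso-like solution that keeps only one of the four redundant neurons.
sparse_weights = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
selected, r = correlated_neurons(activity, sparse_weights)
print(selected)
```

Although the sparse weights retain a single neuron, the correlation step recovers all four neurons that carry the shared signal, which mirrors the difference between a "sufficient set" and the full correlated population.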
  
  (2) The distribution of neurons is shown in an overly coarse manner in only a flattened brain seen from the top, and the brain is divided into four coarse regions (telencephalon, tectum, cerebellum, hindbrain). This makes it difficult to assess where the neurons are and whether those four coarse divisions are representative or whether the neurons are in other non-labeled deeper regions. For these two reasons, some of the statements about the distribution of neurons across the brain would benefit from a more thorough investigation.
  
  We agree with the reviewer that a more thorough description and visualization of these distributed populations is warranted.
  
  While the dense, pan-neuronal labeling makes the isolation of highly specific circuit components difficult, we have shown in more detail the specific brain regions contributing to these populations by aligning our recordings to the Z-Brain atlas (Randlett et al., 2015) as shown in new Figures S1E, S3F-G, 4I, 4K, and S5F-G. In addition, we provided a more detailed parcellation of the neuronal ensembles by providing projections of the full 3D volume along the xy and yz axes, in addition to the unregistered xy projection shown in new Figures 4H and 4J. We also found that the distribution of neurons in our huc:H2B-GCaMP6s recordings is very similar to the distribution of labeling in the huc:H2B-RFP reference image from the Z-Brain atlas (new Figure S1E), which further supports our whole-brain imaging results.
  
  Overall, we find that this more detailed quantification and visualization is consistent with the interpretations in the previous version of our manuscript. In particular, we show that the optimal visual decoding population (w<sub>opt</sub>) and the largest noise mode (e<sub>1</sub>) are localized to the midbrain (new Figures S3F-G), which is expected since in Figure 3 we first extracted a low-dimensional subspace of whole-brain neural activity that optimally preserved visual information. We also provide further evidence that the populations correlated with the turn bias and responsiveness bias are distributed throughout the brain, including a relatively dense localization to the cerebellum, telencephalon, and dorsal diencephalon (habenula; new Figures 4H-K and S5F-G).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  In addition to the overall strengths and weaknesses above, I have a few specific comments that I think could improve the study:
  
  (1) In lines 334-335 you write that 'We proceeded to build various logistic regression classifiers to decode'. Do you mean you tested this with other classifier types as well (e.g. SVM, Naive Bayes) or do you mean various because you trained the classifier described in the methods on each animal? This is not clear. If it is the first, more information is needed about what other classifiers you used.
  
  We appreciate the reviewer raising this point of clarification. Here, we simply meant that we fit the multiclass logistic regression classifier in the one-vs-rest scheme. In this sense, a single multiclass logistic regression classifier was fit for each larva. We have updated our revised manuscript with this clarification: “The visual stimuli were decoded using a one-versus-rest, multiclass logistic regression classifier with lasso regularization.”
  
  (2) In Figure 3 you train the decoder on all visually responsive cells identified across the brain. Does this reliability of stimulus decoding also hold for neurons sampled from specific brain regions? For example, does this reliable decoding come from stronger and more reliable responses in the optic tectum, whereas stimulus decodability is not as good in visual encoding neurons identified in other structures?
  
  In new Figure S5B, we show the performance of stimulus decoding from various brain regions. We find that stimulus classification is possible from the midbrain and cerebellum, very poor from the hindbrain, and not possible from the telencephalon during the period between stimulus onset and the decision.
  
  (3) In relation to point 2, it would be good to show in which brain areas the visually responsive neurons are located, and maybe the average coefficients per brain area. Plots like Figures 3G, and H would benefit from a quantification into areas. Similarly, a parcellation into more specific brain areas in Figure 4 would also be valuable.
  
  In addition to providing a more detailed parcellation of the turn direction and responsiveness bias populations in Figure 4, we have provided a similar visualization and quantification of the optimal stimulus decoding population and the dominant noise mode in new Figures S3F-G, respectively.
  
  (4) In Figure 3f, it is not clear to me how this shows that w<sub>opt</sub> and e1 are orthogonal. They appear correlated.
  
  The orthogonality we quantify is related to the pattern of coefficients across neurons, not necessarily the timeseries of their projections. The slight shift in the noise mode activations as you move from stimuli on the left visual field to the right actually comes from the motor outputs. Large left stimuli tend to evoke a rightward turn and vice versa, and the example noise mode shown encodes the directionality and vigor of tail movements, resulting in the slight shifts observed.
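  
  The distinction between orthogonal coefficient patterns and correlated projections can be illustrated with a toy case. The sketch below is not the authors' analysis: the two-neuron coefficient vectors and the activity values are hypothetical, chosen only to show that a zero dot product between weight vectors does not prevent their projected time series from co-varying when the underlying neurons co-vary.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical coefficient vectors over two neurons (assumed values).
w_opt = (1.0, 0.0)  # stands in for the stimulus-decoding direction
e1 = (0.0, 1.0)     # stands in for the dominant noise mode

# Hypothetical per-trial activity of the two neurons, which co-vary.
activity = [(1.0, 1.0), (2.0, 2.5), (3.0, 2.5), (4.0, 4.0)]

proj_w = [dot(w_opt, a) for a in activity]
proj_e = [dot(e1, a) for a in activity]

coefficient_overlap = dot(w_opt, e1)               # orthogonality of the patterns
projection_correlation = pearson(proj_w, proj_e)   # correlation of the time series
```

  Here `coefficient_overlap` is exactly zero while `projection_correlation` is strongly positive, which mirrors the situation described for Figure 3f.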
  
  (5) I think the wording of this conclusion is too strong for the results and a bit illogical:
  
  'Thus, our data suggest that the neural dynamics underlying single-trial action selection are the result of a widely-distributed circuit that contains subpopulations encoding internal time-varying biases related to both the larva's responsiveness and turn direction, yet distinct from the sensory encoding circuitry.'
  
  If that is the case, how is it even possible that the larvae can do a visually guided behaviour?
  
  Especially given Suppl Fig 4C it would be more appropriate to say something along the lines of: 'When stimuli are highly ambiguous, single trial action selection is dominated by widely-distributed circuit that contains subpopulations encoding internal time-varying biases related to both the larva's responsiveness and turn direction, that encode choice distinctly from the sensory encoding circuitry'.
  
  We appreciate the reviewer’s suggestion and have re-worded this line in the discussion in order to clarify that these time-varying biases are predominant in the case of ambiguous stimuli, as shown in Figure S5C in our revised manuscript (corresponding to Figure S4C in our original submission).
  
  (6) Line 599: typo: trial-to-trail
  
  We thank the reviewer for noting this error, which has been corrected in the revised text of the manuscript.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.03.583208v2
www.biorxiv.org www.biorxiv.org

Omissions of Threat Trigger Subjective Relief and Prediction Error-Like Signaling in the Human Reward and Salience Systems

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  As you will see, the main changes in the revised manuscript pertain to the structure and content of the introduction. Specifically, we have tried to more clearly introduce our paradigm, the rationale behind the paradigm, why it is different from learning paradigms, and why we study “relief”.
  
  In this rebuttal letter, we will go over the reviewers’ comments one-by-one and highlight how we have adapted our manuscript accordingly. However, because one concern was raised by all reviewers, we will start with an in-depth discussion of this concern.
  
  The shared concern pertained to the validity of the EVA task as a model to study threat omission responses. Specifically, all reviewers questioned the effectiveness of our so-called “inaccurate”, “false” or “ruse” instructions in triggering an equivalent level of shock expectancy, and relatedly, how this effectiveness was affected by dynamic learning over the course of the task.
  
  We want to thank the reviewers for raising this important issue. Indeed, it is a vital part of our design and it therefore deserves considerable attention. It is now clear to us that the previous version of the manuscript focused too little on why we moved away from a learning paradigm, how we made sure that the instructions were successful at raising the necessary expectations, and how the instructions were affected by learning. We believe this has resulted in some misunderstandings, which consequently may have cast doubts on our results. In the following sections, we will go into these issues.
  
  The rationale behind our instructed design
  
  The main aim of our study was to investigate brain responses to unexpected omissions of threat in greater detail by examining their similarity to the reward prediction error axioms (Caplin & Dean, 2008), and exploring the link with subjective relief. Specifically, we hypothesized that omission-related responses should be dependent on the probability and the intensity of the expected-but-omitted aversive event (i.e., electrical stimulation), meaning that the response should be larger when the expected stimulation was stronger and more expected, and that fully predicted outcomes should not trigger a difference in responding.
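  
  The three hypotheses stated above can be written as simple checks on condition means. The following is a minimal sketch under stated assumptions, not the preregistered analysis: the function names, the numeric values, and the fixed tolerance used for the third check are all hypothetical, and a real evaluation would rely on statistical tests rather than raw comparisons.

```python
def is_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def check_omission_axioms(by_intensity, by_probability, fully_predicted, tolerance):
    """Check the three axioms, restated for threat omission.

    by_intensity:    mean omission response for weak, moderate, strong
    by_probability:  mean omission response for 25%, 50%, 75% instructions
    fully_predicted: mean responses to fully predicted outcomes
    tolerance:       largest difference still treated as "no difference"
    """
    return {
        # Axiom 1: stronger omitted stimulation gives a larger response.
        "magnitude": is_increasing(by_intensity),
        # Axiom 2: more expected omitted stimulation gives a larger response.
        "probability": is_increasing(by_probability),
        # Axiom 3: fully predicted outcomes do not differ from each other.
        "no_surprise": max(fully_predicted) - min(fully_predicted) <= tolerance,
    }


# Hypothetical condition means in arbitrary units (not data from the study).
result = check_omission_axioms(
    by_intensity=[0.10, 0.25, 0.40],
    by_probability=[0.15, 0.22, 0.31],
    fully_predicted=[0.02, 0.03],
    tolerance=0.05,
)
```

  A region or measure would count as prediction error-like in this sketch only if all three entries of `result` are true.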
  
  To this end, we required that participants had varying levels of threat probability and intensity predictions, and that these predictions would most of the time be violated. Although we fully agree with the reviewers that fear conditioning and extinction paradigms can provide an excellent way to track the teaching properties of prediction error responses (i.e., how they are used to update expectancies on future trials), we argued that they are less suited to create the varying probability and intensity-related conditions we required (see Willems & Vervliet, 2021). Specifically, in a standard conditioning task participants generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intraindividual variability in the prediction error responses. This precludes an in-depth analysis of the probability-related effects. Furthermore, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, intensity-related effects cannot be tested. Finally, because CS-US contingencies change over the course of a fear conditioning and extinction study (e.g. from acquisition to extinction), there is never complete certainty about when the US will (not) follow. This precludes a direct comparison of fully predicted outcomes.
  
  Another added value of studying responses to the prediction error at threat omission outside a learning context is that it can offer a way to disentangle responses to the violation of threat expectancy from those of subsequent expectancy updating.
  
  Also note that Rutledge and colleagues (2010), who were the first to show that human fMRI responses in the Nucleus Accumbens comply with the reward prediction error axioms, also did not use learning experiences to induce expectancy. In that sense, we argued it was not necessary to adopt a learning paradigm to study threat omission responses.
  
  Adaptations in the revised manuscript: We included two new paragraphs in the introduction of the revised manuscript to elaborate on why we opted not to use a learning paradigm in the present study (lines 90-112).
  
  “However, is a correlation with the theoretical PE over time sufficient for neural activations/relief to be classified as a PE-signal? In the context of reward, Caplin and colleagues proposed three necessary and sufficient criteria all PE-signals should comply to, independent of the exact operationalizations of expectancy and reward (the so-called axiomatic approach<sup>24,25</sup>; which has also been applied to aversive PE<sup>26–28</sup>). Specifically, the magnitude of a PE signal should: (1) be positively related to the magnitude of the reward (larger rewards trigger larger PEs); (2) be negatively related to likelihood of the reward (more probable rewards trigger smaller PEs); and (3) not differentiate between fully predicted outcomes of different magnitudes (if there is no error in prediction, there should be no difference in the PE signal).”
  
  “It is evident that fear conditioning and extinction paradigms have been invaluable for studying the role of the threat omission PE within a learning context. However, these paradigms are not tailored to create the varying intensity and probability-related conditions that are required to evaluate the threat omission PE in the light of the PE axioms. First, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, the magnitude-related axiom cannot be tested. Second, in conditioning tasks people generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intra-individual variability in the PE responses. Moreover, because of the relatively low signal to noise ratio in fMRI measures, fear extinction studies often pool across trials to compare omission-related activity between early and late extinction<sup>16</sup>, which further reduces the necessary variability to properly evaluate the probability axiom. Third, because CS-US contingencies change over the course of the task (e.g. from acquisition to extinction), there is never complete certainty about whether the US will (not) follow. This precludes a direct comparison of fully predicted outcomes. Finally, within a learning context, it remains unclear whether PE-related responses are in fact responses to the violation of expectancy itself, or whether they are the result of subsequent expectancy updating.”
  
  Can verbal instructions be used to raise the expectancy of shock?
  
  The most straightforward way to obtain sufficient variability in both probability and intensity-related predictions is by directly providing participants with instructions on the probability and intensity of the electrical stimulation. In a previous behavioral study, we have shown that omission responses (self-reported relief and omission SCR) indeed varied with these instructions (Willems & Vervliet, 2021). In addition, the manipulation checks that are reported in the supplemental material provided further support that the verbal instructions were effective at raising the associated expectancy of stimulation. Specifically, participants recollected having received more stimulations after higher probability instructions (see Supplemental Figure 2). Furthermore, we found that anticipatory SCR, which we used as a proxy of fearful expectation, increased with increasing probability and intensity (see Supplemental Figure 3). This suggests that it is not necessary to have expectation based on previous experience if we want to evaluate threat omission responses in the light of the prediction error axioms.
  
  Adaptations in the revised manuscript: We more clearly referred to the manipulation checks that are presented in the supplementary material in the results section of the main paper (lines 135-141).
  
  “The verbal instructions were effective at raising the expectation of receiving the electrical stimulation in line with the provided probability and intensity levels. Anticipatory SCR, which we used as a proxy of fearful expectation, increased as a function of the probability and intensity instructions (see Supplementary Figure 3). Accordingly, post-experimental questions revealed that by the end of the experiment participants recollected having received more stimulations after higher probability instructions, and were willing to exert more effort to prevent stronger hypothetical stimulations (see Supplementary Figure 2).”
  
  How did the inconsistency between the instructed and experienced probability impact our results?
  
  All reviewers questioned how the inconsistency between the instructed and experienced probability might have impacted the probability-related results. However, judging from the way the comments were framed, it seems that part of the concern was based on a misunderstanding of the design we employed. Specifically, reviewer 1 mentions that “To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; i.e., 25% of shocks are omitted regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, 0%.”, and reviewer 3 states that “... the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.” We want to emphasize that this was not what we did, and if it were true, we fully agree with the reviewers that it would have caused serious trust- and learning-related issues, given that it would be immediately evident to participants that probability instructions were false. It is clear that under such circumstances, dynamic learning would be a big issue.
  
  However, in our task 0% and 100% instructions were always accurate. This means that participants never received a stimulus following 0% instructions and always received the stimulation of the given intensity on the 100% instructions (see Supplemental Figure 1 for an overview of the trial types). Only for the 25%, 50% and 75% trials an equal reinforcement rate (25%) was maintained, meaning that the stimulation followed in 25% of the trials, irrespective of whether a 25%, 50% or 75% instruction was given. The reason for this was that we wanted to maximize and balance the number of omission trials across the different probability levels, while also keeping the total number of presentations per probability instruction constant. We reasoned that equating the reinforcement rate across the 25%, 50% and 75% instructions should not be detrimental, because (1) in these trials there was always the possibility that a stimulation would follow; and (2) we instructed the participants that each trial is independent of the previous ones, which should have discouraged them from actively counting the number of shocks in order to predict future shocks.
  
  Adaptations in the revised manuscript: We have tried to further clarify the design in several sections of the manuscript, including the introduction (lines 121-125), results (line 220) and methods (lines 478-484) sections:
  
  Adaptation in the Introduction section: “Specifically, participants received trial-by-trial instructions about the probability (0%, 25%, 50%, 75% and 100%) and intensity (weak, moderate, strong) of a potentially painful upcoming electrical stimulation, time-locked by a countdown clock (see Fig.1A). While stimulations were always delivered on 100% trials and never on 0% trials, most of the other trials (25%-75%) did not contain the expected stimulation and hence provoked an omission PE.”
  
  Adaptation in the Results section: “Indeed, the provided instructions did not map exactly onto the actually experienced probabilities, but were all followed by stimulation in 25% on the trials (except for the 0% trials and the 100% trials).”
  
  Adaptation in the Methods section: “Since we were mainly interested in how omissions of threat are processed, we wanted to maximize and balance the number of omission trials across the different probability and intensity levels, while also keeping the total number of presentations per probability and intensity instruction constant. Therefore, we crossed all non-0% probability levels (25, 50, 75, 100) with all intensity levels (weak, moderate, strong) (12 trials). The three 100% trials were always followed by the stimulation of the instructed intensity, while stimulations were omitted in the remaining nine trials. Six additional trials were intermixed in each run: Three 0% omission trials with the information that no electrical stimulation would follow (akin to 0% Probability information, but without any Intensity information as it does not apply); and three trials from the Probability x Intensity matrix that were followed by electrical stimulation (across the four runs, each Probability x Intensity combination was paired at least once, and at most twice with the electrical stimulation).”
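  
  The trial counts in the Methods passage above can be checked with a short tally of one run. This is a sketch under assumptions rather than the authors' task code: it assumes that the three additional reinforced trials are drawn from the 25% to 75% cells, and the particular cells passed to `build_run` are hypothetical.

```python
from fractions import Fraction
from itertools import product

PROBABILITIES = (25, 50, 75, 100)
INTENSITIES = ("weak", "moderate", "strong")


def build_run(reinforced_cells):
    """Build the trial list for one run from the description in the text."""
    trials = []
    # 4 probabilities x 3 intensities = 12 trials; only 100% trials are shocked.
    for prob, intensity in product(PROBABILITIES, INTENSITIES):
        trials.append({"prob": prob, "intensity": intensity, "shock": prob == 100})
    # Three 0% trials without intensity information, never shocked.
    trials += [{"prob": 0, "intensity": None, "shock": False} for _ in range(3)]
    # Three extra trials from the matrix that are followed by stimulation.
    trials += [{"prob": p, "intensity": i, "shock": True} for p, i in reinforced_cells]
    return trials


def reinforcement_rate(trials):
    """Share of shocked trials, pooled over the 25%, 50% and 75% instructions."""
    uncertain = [t for t in trials if t["prob"] in (25, 50, 75)]
    return Fraction(sum(t["shock"] for t in uncertain), len(uncertain))


# Hypothetical choice of reinforced cells for a single run.
run = build_run([(25, "weak"), (50, "moderate"), (75, "strong")])
n_trials = len(run)
n_omissions = sum(not t["shock"] for t in run)
rate = reinforcement_rate(run)
```

  Under these assumptions a run holds 18 trials, 12 of which are omissions, and 3 of the 12 trials with a 25% to 75% instruction are reinforced, which reproduces the 25% rate mentioned in the reply.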
  
  Could the incongruence between the instructed and experienced reinforcement rate have detrimental effects on the probability effect? We agree with reviewer 2 that it is possible that the inconsistency between instructed and experienced reinforcement rates could have rendered the exact probability information less informative to participants, which might have resulted in them paying less attention to the probability information whenever the probability was not 0% or 100%. This might to some extent explain the relatively larger difference in responding between 0% and 25% to 75% trials, but the relatively smaller differences between the 25% to 75% trials.
  
  However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.
  
  We added a description of these reasons to the supplementary materials in a supplementary note (supplementary note 4; lines 97-129 in supplementary materials), and added a reference to this note in the methods section (lines 488-490).
  
  “Supplementary Note 4: “Accurate” probability instructions do not alter the Probability-effect
  
  A question that was raised by the reviewers was whether the inconsistency between the probability instruction and the experienced reinforcement rate could have detrimental effects on the Probability-related results; especially because the effect of Probability was smaller when only including non-0% trials.
  
  However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but that they are inherent to “uncertain” probabilities.
  
  First, in a previously unpublished pilot study, we provided participants with “accurate” probability instructions, meaning that the instruction corresponded to the actual reinforcement rate (e.g., 75% instructions were followed by a stimulation in 75% of the trials etc.). In line with the present results and our previous behavioral study (Willems & Vervliet, 2021), the results of this pilot (N = 20) showed that the difference in the reported relief between the different probability levels was largest when comparing 0% and the rest (25%, 50% and 75%). Furthermore the overall effect size of Probability (excluding 0%) matched the one of our previous behavioral study (Willems & Vervliet, 2021): η<sub>p</sub><sup>2</sup> = +/- 0.50.”
  
  Author response image 1.
  
  Main effect of Probability including 0%: F(1.74,31.23) = 53.94, p < .001, η<sub>p</sub><sup>2</sup> = 0.75. Main effect of Probability excluding 0%: F(1.50, 28.43) = 21.03, p < .001, η<sub>p</sub><sup>2</sup> = 0.53.
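  
  The effect sizes in this caption can be cross-checked against the reported F statistics, because partial eta squared follows from F and its degrees of freedom: partial eta squared = (F x df_effect) / (F x df_effect + df_error). The sketch below applies that standard identity to the values in the caption; the function name is chosen here for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the caption above.
including_zero = partial_eta_squared(53.94, 1.74, 31.23)
excluding_zero = partial_eta_squared(21.03, 1.50, 28.43)
```

  Rounded to two decimals, both results agree with the values in the caption (0.75 and 0.53).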
  
  Second, other published studies that used CSs with varying reinforcement rates (with or without explicit written instructions about the reinforcement rates) also showed that the difference in expectations, anticipatory SCR or omission SCR was largest when comparing the CS0% to the other CSs of varying reinforcement rates (Grings & Sukoneck, 1971; Öhman et al., 1973; Ojala et al., 2022).
  
  Together, this suggests that when there is a possibility of stimulation, any additional difference in probability will have a smaller effect on the omission responses, irrespective of whether the underlying reinforcement rate is accurate or not.
  
  Adaptation to methods section: “Note that, based on previous research, we did not expect the inconsistency between the instructed and perceived reinforcement rate to have a negative effect on the Probability manipulation (see Supplementary Note 4).”
  
  Did dynamic learning impact the believability of the instructions?
  
  Although we tried to minimize learning in our paradigm by providing instructions that trials are independent from one another, we agree with the reviewers that this cannot preclude all learning. Any remaining learning effects should present themselves by downweighting the effect of the probability instructions over time. We controlled for this time-effect by including a “run” regressor in our analyses. Results of the Run regressor for subjective relief and omission-related SCR are presented in Supplemental Figure 5. These figures show that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This indicates that even though some learning might have taken place, the main manipulations of probability and intensity were still present until the end of the task.
  
  Adaptations in the revised manuscript: In the results section of the main paper, we more clearly referred to the results of the Block regressor, which were presented in the supplementary material (lines 159-162).
  
  Note that while there was a general drop in reported relief pleasantness and omission SCR over time, the effects of Probability and Intensity remained present until the last run (see Supplementary Figure 5). This further confirms that probability and intensity manipulations were effective until the end of the task.
  
  In the following sections of the rebuttal letter, we will go over the rest of the comments and our responses one by one.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Willems and colleagues test whether unexpected shock omissions are associated with reward-related prediction errors by using an axiomatic approach to investigate brain activation in response to unexpected shock omission. Using an elegant design that parametrically varies shock expectancy through verbal instructions, they see a variety of responses in reward-related networks, only some of which adhere to the axioms necessary for prediction error. In addition, there were associations between omission-related responses and subjective relief. They also use machine learning to predict relief-related pleasantness, and find that none of the a priori "reward" regions were predictive of relief, which is an interesting finding that can be validated and pursued in future work.
  
  Strengths:
  
  The authors pre-registered their approach and the analyses are sound. In particular, the axiomatic approach tests whether a given region can truly be called a reward prediction error. Although several a priori regions of interest satisfied a subset of axioms, no ROI satisfied all three axioms, and the authors were candid about this. A second strength was their use of machine learning to identify a relief-related classifier. Interestingly, none of the ROIs that have been traditionally implicated in reward prediction error reliably predicted relief, which opens important questions for future research.
  
  Weaknesses:
  
  To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; i.e. 25% of shocks are omitted, regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, or 0%. Given previous findings on interactions between verbal instruction and experiential learning (Doll et al., 2009; Li et al., 2011; Atlas et al., 2016), it seems problematic a) to treat the instructions as veridical and b) average responses over time. Based on this prior work, it seems reasonable to assume that participants would learn to downweight the instructions over time through learning (particularly in the 100% and 0% cases); this would be the purpose of prediction errors as a teaching signal. The authors do recognize this and perform a subset analysis in the 21 participants who showed parametric increases in anticipatory SCR as a function of instructed shock probability, which strengthened findings in the VTA/SN; however given that one-third of participants (n=10) did not show parametric SCR in response to instructions, it seems like some learning did occur. As prediction error is so important to such learning, a weakness of the paper is that conclusions about prediction error might differ if dynamic learning were taken into account.
  
  We thank the reviewer for raising this important concern. We believe we replied to all the issues raised in the general reply above.
  
  Lastly, I think that findings in threat-sensitive regions such as the anterior insula and amygdala may not be adequately captured in the title or abstract which strictly refers to the "human reward system"; more nuance would also be warranted.
  
  We fully agree with this comment and have changed the title and abstract accordingly.
  
  Adaptations in the revised manuscript: We adapted the title of the manuscript.
  
  “Omissions of Threat Trigger Subjective Relief and Prediction Error-Like Signaling in the Human Reward and Salience Systems”
  
  Adaptations in the revised manuscript: We adapted the abstract (lines 27-29).
  
  “In line with recent animal data, we showed that the unexpected omission of (painful) electrical stimulation triggers activations within key regions of the reward and salience pathways and that these activations correlate with the pleasantness of the reported relief.”
  
  Reviewer #2 (Public Review):
  
  The question of whether the neural mechanisms for reward and punishment learning are similar has been a constant debate over the last two decades. Numerous studies have shown that the midbrain dopamine neurons respond to both negative and salient stimuli, some of which can't be well accounted for by the classic RL theory (Delgado et al., 2007). Other research even proposed that aversive learning can be viewed as reward learning, by treating the omission of aversive stimuli as a negative PE (Seymour et al., 2004).
  
  Although the current study took an axiomatic approach to search for the PE encoding brain regions, which I like, I have major concerns regarding their experimental design and hence the results they obtained. My biggest concern comes from the false description of their task to the participants. To increase the number of "valid" trials for data analysis, the instructed and actual probabilities were different. Under such a circumstance, testing axiom 2 seems completely artificial. How does the experimenter know that the participants truly believe that the 75% is more probable than, say, the 25% stimulation? The potential confusion of the subjects may explain why the SCR and relief report were rather flat across the instructed probability range, and some of the canonical PE encoding regions showed a rather mixed activity pattern across different probabilities. Also for the post-hoc selection criteria, why pick the larger SCR in the 75% compared to the 25% instructions? How would the results change if other criteria were used?
  
  We thank the reviewer for raising this important concern. We believe the general reply above covers most of the issues raised in this comment. Concerning the post-hoc selection criteria, we took 25% < 75% as the criterion because it was quite “lenient” in the sense that it looked only at the effect of interest (i.e., did anticipatory SCR increase with increasing instructed probability?). However, even when the criterion was stricter (e.g., selecting participants only if their anticipatory SCR monotonically increased with each increase in instructed probability 0% < 25% < 50% < 75% < 100%, N = 11 participants), the probability effect (ω<sub>p</sub><sup>2</sup> = 0.08), but not the intensity effect, for the VTA/SN remained.
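  
  The difference between the lenient and the strict selection criterion can be made concrete with a small filter. This is an illustrative sketch, not the authors' code: the participant identifiers and the anticipatory SCR values are hypothetical, and only the two comparison rules are taken from the reply.

```python
LEVELS = (0, 25, 50, 75, 100)  # instructed probabilities in percent


def passes_lenient(scr):
    """Lenient rule: anticipatory SCR is larger for 75% than for 25%."""
    return scr[75] > scr[25]


def passes_strict(scr):
    """Strict rule: anticipatory SCR increases with every step in probability."""
    values = [scr[level] for level in LEVELS]
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Hypothetical mean anticipatory SCR per instructed probability (not study data).
participants = {
    "p01": {0: 0.05, 25: 0.10, 50: 0.14, 75: 0.20, 100: 0.31},
    "p02": {0: 0.04, 25: 0.12, 50: 0.09, 75: 0.18, 100: 0.25},
    "p03": {0: 0.06, 25: 0.15, 50: 0.13, 75: 0.11, 100: 0.20},
}

lenient_sample = sorted(pid for pid, scr in participants.items() if passes_lenient(scr))
strict_sample = sorted(pid for pid, scr in participants.items() if passes_strict(scr))
```

  In this toy example two participants pass the lenient rule and only one passes the strict rule, which shows why the strict criterion yields a smaller subsample.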
  
  To test axiom 3, which was to compare the 100% stimulation to the 0% stimulation conditions, how did the actual shock delivery affect the fMRI contrast result? It would be more reasonable if this analysis could control for the shock delivery, which itself could contaminate the fMRI signal, with extra confound that subjects may engage certain behavioral strategies to "prepare for" the aversive outcome in the 100% stimulation condition. Therefore, I agree with the authors that this contrast may not be a good way to test axiom 3, not only because of the arguments made in the discussion but also the technical complexities involved in the contrast.
  
  We thank the reviewer for addressing this additional confound. It was indeed impossible to control for the delivery of shock since the delivery of the shock was always present on the 100% trials (and thus completely overlapped with the contrast of interest). We added this limitation to our discussion in the manuscript. In addition, we have also added a suggestion for a contrast that can test the “no surprise equivalence” criterion.
  
  Adaptations in the revised manuscript: We adapted lines 358-364.
  
  “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”
  
  Reviewer #3 (Public Review):
  
  We thank the reviewer for their comments. Overall, based on the reviewer’s comments, we noticed that there was an imbalance between a focus on “relief” in the introduction and the rest of the manuscript and preregistration. We believe this focus raised the expectation that all outcome measures were interpreted in terms of the relief emotion. However, this was not what we did nor what we preregistered. We therefore restructured the introduction to reduce the focus on relief.
  
  Adaptations in the revised manuscript: We restructured the introduction of the manuscript. Specifically, after our opening sentence: “We experience a pleasurable relief when an expected threat stays away<sup>1</sup>” we only introduce the role of relief for our research in lines 79-89.
  
  “Interestingly, unexpected omissions of threat not only trigger neural activations that resemble a reward PE, they are also accompanied by a pleasurable emotional experience: relief. Because these feelings of relief coincide with the PE at threat omission, relief has been proposed to be an emotional correlate of the threat omission PE. Indeed, emerging evidence has shown that subjective experiences of relief follow the same time-course as theoretical PE during fear extinction. Participants in fear extinction experiments report high levels of relief pleasantness during early US omissions (when the omission was unexpected and the theoretical PE was high) and decreasing relief pleasantness over later omissions (when the omission was expected and the theoretical PE was low)<sup>22,23</sup>. Accordingly, preliminary fMRI evidence has shown that the pleasantness of this relief is correlated to activations in the NAC at the time of threat omission. In that sense, studying relief may offer important insights in the mechanism driving safety learning.”
  
  Summary:
  
  The authors conducted a human fMRI study investigating the omission of expected electrical shocks with varying probabilities. Participants were informed of the probability of shock and shock intensity trial-by-trial. The time point corresponding to the absence of the expected shock (with varying probability) was framed as a prediction error producing the cognitive state of relief/pleasure for the participant. fMRI activity in the VTA/SN and ventral putamen corresponded to the surprising omission of a high probability shock. Participants' subjective relief at having not been shocked correlated with activity in brain regions typically associated with reward-prediction errors. The overall conclusion of the manuscript was that the absence of an expected aversive outcome in human fMRI looks like a reward-prediction error seen in other studies that use positive outcomes.
  
  Strengths:
  
  Overall, I found this to be a well-written human neuroimaging study investigating an often overlooked question on the role of aversive prediction errors, and how they may differ from reward-related prediction errors. The paper is well-written and the fMRI methods seem mostly rigorous and solid.
  
  Weaknesses:
  
  I did have some confusion over the use of the term "prediction-error" however as it is being used in this task. There is certainly an expectancy violation when participants are told there is a high probability of shock, and it doesn't occur. Yet, there is no relevant learning or updating, and participants are explicitly told that each trial is independent and the outcome (or lack thereof) does not affect the chances of getting the shock on another trial with the same instructed outcome probability. Prediction errors are primarily used in the context of a learning model (reinforcement learning, etc.), but without a need to learn, the utility of that signal is unclear.
  
  We operationalized “prediction error” as the response to the error in prediction or the violation of expectancy at the time of threat omission. In that sense, prediction error and expectancy violation (which is more commonly used in clinical research and psychotherapy; Craske et al., 2014) are synonymous. While prediction errors (or expectancy violations) are predominantly studied in learning situations, the definition in itself does not specify how the “expectancy” or “prediction” arises: whether it was through learning based on previous experience or through mere instruction. The rationale why we moved away from a conditioning study in the present manuscript is discussed in our general reply above.
  
  We agree with the reviewer that studying prediction errors outside a learning context limits the ecological validity of the task. However, we do believe there is also a strength to this approach. Specifically, the omission-related responses we measure are less confounded by subsequent learning (or updating of the wrongful expectation). Any difference between our results and prediction error responses in learning situations can therefore point to this exact difference in paradigm, and can thus identify responses that are specific to learning situations.
  
  An overarching question posed by the researchers is whether relief from not receiving a shock is a reward. They take as neural evidence activity in regions usually associated with reward prediction errors, like the VTA/SN. This seems to be a strong case of reverse inference. The evidence may have been stronger had the authors compared activity to a reward prediction error, for example using a similar task but with reward outcomes. As it stands, the neural evidence that the absence of shock is actually "pleasurable" is limited, albeit there is a subjective report asking subjects if they felt relief.
  
  We thank the reviewer for cautioning us and letting us critically reflect on our interpretation. We agree that it is important not to be overly enthusiastic when interpreting fMRI results and not to carelessly attribute psychological functions to mere activations. Therefore, we will elaborate on the precautions we took to minimize detrimental reverse inference.
  
  First, prior to analyzing our results, we preregistered clear hypotheses that were based on previous research, in addition to clear predictions, regions of interest and a testing approach on OSF. With our study, we wanted to investigate whether unexpected omissions of threat: (1) triggered activations in the VTA/SN, putamen, NAc and vmPFC (as has previously been shown in animal and human studies); (2) represent PE signals; and (3) were related to self-reported relief, which has also been shown to follow a PE time-curve in fear extinction (Vervliet et al., 2017). Based on previous research, we selected three criteria all PE signals should comply with. This means that if omission-related activations were to represent true PE signals, they should comply with these criteria. However, we agree that it would go too far to conclude based on our research that relief is a reward, or even that the omission-related activations represent only PE signals. While we found support for most of our hypotheses, this does not preclude alternative explanations. In fact, in the discussion, we acknowledge this and also discuss alternative explanations, such as responding to the salience (lines 395-397; “One potential explanation is therefore that the deactivation resulted from a switch from default mode to salience network, triggered by the salience of the unexpected threat omission or by the salience of the experienced stimulation.”), or anticipation (lines 425-426; “... we cannot conclusively dismiss the alternative interpretation that we assessed (part of) expectancy instead”).
  
  Second, we have deliberately opted to only use descriptive labels such as omission-related activations when we are discussing fMRI results. Only when we are talking about how the activations were related to self-reported relief, we talk about relief-related activations.
  
  I have some other comments, and I elaborate on those above comments, below:
  
  (1) A major assumption in the paper is that the unexpected absence of danger constitutes a pleasurable event, as stated in the opening sentence of the abstract. This may sometimes be the case, but it is not universal across contexts or people. For instance, for pathological fears, any relief derived from exposure may be short-lived (the dog didn't bite me this time, but that doesn't mean it won't next time or that all dogs are safe). And even if the subjective feeling one gets is temporary relief at that moment when the expected aversive event is not delivered, I believe there is an overall conflation between the concepts of relief and pleasure throughout the manuscript. Overall, the manuscript seems to be framed on the assumption that "aversive expectations can transform neutral outcomes into pleasurable events," but this is situationally dependent and is not a common psychological construct as far as I am aware.
  
  We thank the reviewer for their comment. We have restructured the introduction because we agree with the reviewer that the introduction might have set false expectations concerning our interpretation of the results. The statements related to relief have been toned down in the revised manuscript.
  
  Still, we want to note that the initial opening statement “unexpected absence of danger constitutes the pleasurable emotion relief” was based on a commonly used definition of relief that states that relief refers to “the emotion that is triggered by the absence of expected or previously experienced negative stimulation” (Deutsch, 2015). Both aspects, that it is elicited by the absence of an otherwise expected aversive event and that it is pleasurable in nature, have received considerable empirical support in emotion and fear conditioning research (Deutsch et al., 2015; Leknes et al., 2011; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021).
  
  That said, the notion that the feeling of relief is linked to the (reward) prediction error underlying the learning of safety is included in several theoretical papers in order to explain the commonly observed dopaminergic response at the time of threat omission (both in animals and humans; Bouton et al., 2020; Kalisch et al., 2019; Pittig et al., 2020).
  
  Together, these studies indicate that the definition of relief, and its potential role in threat omission-driven learning is – at least in our research field – established. Still, we felt that more direct research linking feelings of relief to omission-related brain responses was warranted.
  
  One of the main reasons why we specifically focus on the “pleasantness” of the relief is to assess the hedonic impact of the threat omission, as has been done in previous studies by our lab and others (Leknes et al., 2011; Leng et al., 2022; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021). Nevertheless, we agree with the reviewer that the relief we measure is a short-lived emotional state that is subjected to individual differences (as are all emotions).
  
  (2) The authors allude to this limitation, but I think it is critical. Specifically, the study takes a rather simplistic approach to prediction errors. It treats the instructed probability as the subjects' expectancy level and treats the prediction error as omission-related activity to this instructed probability. There is no modeling, and any dynamic parameters affected by learning are unaccounted for in this design. That is, subjects are informed that each trial is independently determined and so there is no learning: "the presence/absence of stimulations on previous trials could not predict the presence/absence of stimulation on future trials." Prediction errors are central to learning. It is unclear if the "relief" subjects feel on not getting a shock on a high-probability trial is in any way analogous to a prediction error, because there is no reason to update your representation on future trials if they are all truly independent. The construct validity of the design is in question.
  
  (3) Related to the above point, even if subjects veered away from learning by the instruction that each trial is independent, the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.
  
  We thank the reviewer for raising these concerns. We believe that the general reply above covers the issues raised in points 2 and 3.
  
  (4) Bouton has described very well how the absence of expected threat during extinction can create a feeling of ambiguity and uncertainty regarding the signal value of the CS. This in large part explains the contextual dependence of extinction and the "return of fear" that is so prominent even in psychologically healthy participants. The relief people feel when not receiving an expected shock would seem to have little bearing on changing the long-term value of the CS. In any event, the authors do talk about conditioning (CS-US) in the paper, but this is not a typical conditioning study, as there is no learning.
  
  We fully agree with the reviewer that our study is no typical conditioning study. Nevertheless, because our research mostly builds on recent advances in the fear extinction domain, we felt it was necessary to introduce the fear extinction procedure and related findings. In the context of fear extinction learning, we have previously shown that relief is an emotional correlate of the prediction error driving acquisition of the novel safety memory (CSnoUS; Papalini et al., 2021; Vervliet et al., 2017). The ambiguity Bouton describes is the result of extinguished CS holding multiple meanings once the safety memory is acquired. Does it signal danger or safety? We agree with Bouton that the meaning of the CS for any new encounter will depend on the context, and the passage of time, but also on the initial strength of the safety acquisition (which is dependent on the size of the prediction error, and hence the amount of relief; Craske et al., 2014). However, it was not our objective to directly study the relation of relief to subsequent CS value, and our design is not tailored to do so post hoc.
  
  (5) In Figure 2 A-D, the omission responses are plotted on trials with varying levels of probability. However, it seems to be missing omission responses in 0% trials in these brain regions. As depicted, it is an incomplete view of activity across the different trial types of increasing threat probability.
  
  We thank the reviewer for pointing out this unclarity. The betas that are presented in the figures represent the ROI averages from each non-0% vs 0% contrasts (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.
  
  Adaptations in the revised manuscript: We have adapted the figure captions of figures 2 and 3.
  
  “The extracted beta-estimates in figures A-D represent the ROI averages from each non0% > 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.”
  
  (6) If I understand Figure 2 panels E-H, these are plotting responses to the shock versus no-shock (when no-shock was expected). It is unclear why this would be especially informative, as it would just be showing activity associated with shocks versus no-shocks. If the goal was to use this as a way to compare positive and negative prediction errors, the shock would induce widespread activity that is not necessarily reflective of a prediction error. It is simply a response to a shock. Comparing activity to shocks delivered after varying levels of probability (e.g., a shock delivered at 25% expectancy, versus 75%, versus 100%) would seem to be a much better test of a prediction error signal than shock versus no-shock.
  
  We thank the reviewer for this comment. The purpose of this preregistered contrast was to test whether fully predicted outcomes elicited equivalent activations in our ROIs (corresponding to the third prediction error axiom). Specifically, if a region represents a pure prediction error signal, the 100% (fully predicted shocks) > 0% (fully predicted shock omissions) contrast should be nonsignificant, and follow-up Bayes Factors would further provide evidence in favor of this null-hypothesis.
  
  We agree with the reviewer that the delivery of the stimulation triggers widespread activations in our regions of interest that confounded this contrast. However, given that it was a preregistered test for the prediction error axioms, we cannot remove it from the manuscript. Instead, we have argued in the discussion that future studies that want to take an axiomatic stance should consider alternative tests to examine this axiom.
  
  Adaptations in the revised manuscript: We adapted lines 358-364.
  
  “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”
  
  Also note that our task did not lend itself to an in-depth analysis of aversive (worse-than-expected) prediction error signals, given that there was only one stimulation trial for each probability x intensity level (see Supplemental Figure 1). The most informative contrast that can inform us about aversive prediction error signals contrasts all non-100% stimulation trials with all 100% stimulation trials. The results of this contrast are presented in Supplemental Figure 16 and Supplemental Table 11 for completeness.
  
  (7) I was unclear what the results in Figure 3 E-H were showing that was unique from panels A-D, or where it was described. The images looked redundant from the images in A-D. I see that they come from different contrasts (non0% > 0%; 100% > 0%), but I was unclear why that was included.
  
  We thank the reviewer for this comment. Our answer is related to that of the previous comment. Figure 3 presents the results of the axiomatic tests within the secondary ROIs we extracted from a wider secondary mask based on the non0%>0% contrast.
  
  (8) As mentioned earlier, there is a tendency to imply that subjects felt relief because there was activity in "the reward pathway."
  
  We thank the reviewer for their comment, but we respectfully disagree. Subjective relief was explicitly probed when the instructed stimulations stayed away. In the manuscript we only talk about “relief” when discussing these subjective reports. We found that participants reported higher levels of relief-pleasantness following omissions of stronger and more probable threat. This was an observation that matches our predictions and replicates our previous behavioral study (Willems & Vervliet, 2021).
  
  The fMRI evidence is treated separately from the “pleasantness” of the relief. Specifically, we refrain from calling the threat omission-related neural responses “relief-activity” as this would indeed imply that the activation would only be attributed to this psychological function. Instead, we talked about omission-related activity, and we assessed whether it complied with the prediction error criteria as specified by the axiomatic approach.
  
  Only afterwards, because we hypothesized that omission-related fMRI activation and self-reported relief-pleasantness were related, and because we found a similar response pattern for both measures, we examined how relief and omission-related fMRI activations within our ROIs were related on a trial-by-trial basis. To this end, we entered relief-pleasantness ratings as a parametric modulator to the omission regressor.
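
  As an illustration only (not the authors' actual pipeline), a parametric modulator of this kind is typically built by mean-centering the trial-wise ratings, weighting each omission onset by its centered rating, and convolving the result with a haemodynamic response function. The sketch below assumes a TR of 2 s, a generic double-gamma HRF shape, and hypothetical onsets and ratings; none of these values come from the manuscript.

```python
import math


def canonical_hrf(t):
    """Generic double-gamma HRF shape (an assumed form, not taken from the paper)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, value in enumerate(signal):
        if value == 0.0:
            continue
        for j, weight in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += value * weight
    return out


def build_regressors(onsets, ratings, n_scans, tr=2.0, hrf_length=32.0):
    """Return (omission regressor, rating-modulated regressor, centered ratings)."""
    mean_rating = sum(ratings) / len(ratings)
    centered = [r - mean_rating for r in ratings]  # mean-centering the modulator
    main = [0.0] * n_scans
    modulated = [0.0] * n_scans
    for onset, weight in zip(onsets, centered):
        scan = int(round(onset / tr))  # nearest scan to the omission onset
        main[scan] += 1.0
        modulated[scan] += weight
    kernel = [canonical_hrf(i * tr) for i in range(int(hrf_length / tr))]
    return convolve(main, kernel), convolve(modulated, kernel), centered


# Hypothetical omission onsets (seconds) and relief-pleasantness ratings.
omission, relief_modulated, centered = build_regressors(
    onsets=[10, 40, 70, 100], ratings=[20, 80, 50, 90], n_scans=80
)
```

  Mean-centering means the modulated regressor captures trial-to-trial deviations in relief around the average omission response, which is carried by the unmodulated regressor.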
  
  By no means do we want to reduce an emotional experience (relief) to fMRI activations in isolated regions in the brain. We agree with the reviewer that this would be far too reductionist. We therefore also ran a pre-registered LASSO-PCR analysis in order to identify whether a whole-brain pattern of activations can predict subjective relief (independent from the exact instructions we gave, and independent of our a priori ROIs). This analysis used trial-by-trial patterns of activation across all voxels in the brain as the predictor and self-reported relief as the outcome variable. It is therefore completely data-driven and can be seen as a preregistered exploratory analysis that is intended to inform future studies.
  
  (9) From the methods, it wasn't entirely clear where there is jitter in the course of a trial. This centers on the question of possible collinearity in the task design between the cue and the outcome. The authors note there is "no multicollinearity between anticipation and omission regressors in the first-level GLMs," but how was this quantified? The issue is of course that the activity coded as omission may be from the anticipation of the expected outcome.
  
  We thank the reviewer for pointing out this unclarity. Jitter was introduced in all parts of the trial: i.e., the duration of the inter-trial interval (4-7s), countdown clock (3-7s), and omission window (4-8s) were all jittered (see fig. 1A and methods section, lines 499-507). We added an additional line to the method section.
  
  Adaptations in the revised manuscript: We added an additional line to the methods section to further clarify the jittering (lines 498-500).
  
  “The scale remained on the screen for 8 seconds or until the participant responded, followed by an intertrial interval between 4 and 7 seconds during which only a fixation cross was shown. Note that all phases in the trial were jittered (i.e., duration countdown clock, duration outcome window, duration intertrial interval).”
  
  Multicollinearity between the omission and anticipation regressors was assessed by calculating the variance inflation factor (VIF) of omission and anticipation regressors in the first level GLM models that were used for the parametric modulation analyses.
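
  For readers unfamiliar with the measure, the variance inflation factor of regressor j equals 1 / (1 - R_j^2), where R_j^2 is obtained by regressing that column on all other columns; equivalently, it is the j-th diagonal element of the inverse correlation matrix. The following is a minimal pure-Python sketch of that definition with made-up regressor values. It is not the authors' code, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per regressor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Made-up anticipation and omission regressors (illustrative values only).
anticipation = [1.0, 2.0, 3.0, 4.0, 5.0]
omission = [2.0, 1.0, 4.0, 3.0, 6.0]
vifs = vif([anticipation, omission])
```

  A VIF of 1 indicates a regressor that is uncorrelated with the others; larger values indicate that its variance is increasingly explained by the remaining regressors.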
  
  Adaptations in the revised manuscript: We replaced the VIF abbreviation with “variance inflation factor” (line 423-424).
  
  “Nevertheless, there was no multicollinearity between anticipation and omission regressors in the first-level GLMs (Variance Inflation Factor, VIF < 4), making it unlikely that the omission responses purely represented anticipation.”
  
  (10) I did not fully understand what the LASSO-PCR model using relief ratings added. This result was not discussed in much depth, and seems to show a host of clusters throughout the brain contributing positively or negatively to the model. Altogether, I would recommend highlighting what this analysis is uniquely contributing to the interpretation of the findings.
  
  The main added value of this analysis is that it uses a different approach altogether. The (mass univariate) parametric modulation analysis estimated, in each voxel (and each ROI), whether the activity in this voxel/ROI covaried with the reported relief; a significant activation therefore only indicated that this voxel was related to relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network across the brain, and which regions contributed most to the prediction of relief. The multivariate LASSO-PCR analysis approach we took attempts to overcome this limitation by examining whether a whole-brain pattern can predict relief. Because we use the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data-driven and is intended to inform future studies. In addition, the LASSO-PCR model was cross-validated using five-fold cross-validation, which is also a difference (and a strength) compared to the mass univariate GLM approach.
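
  To make the general idea concrete, the sketch below shows one way a LASSO-PCR model with five-fold cross-validation can be put together: trial-by-voxel data are reduced to principal components, a LASSO regression is fit on the component scores, and the coefficients are projected back to a voxel-wise weight map. This is a simplified illustration using NumPy, not the authors' implementation; the number of components, the penalty `alpha`, the random fold assignment, and all function names are assumptions.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_lasso(Z, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Z b||^2 + alpha*||b||_1."""
    n, k = Z.shape
    beta = np.zeros(k)
    col_sq = (Z ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0:
                continue
            partial_resid = y - Z @ beta + Z[:, j] * beta[j]
            rho = Z[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return beta


def fit_lasso_pcr(X, y, n_components, alpha):
    """PCA on trials x voxels, LASSO on the scores, weights mapped back to voxels."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T              # voxels x components
    beta = fit_lasso(Xc @ V, y - y_mean, alpha)
    weights = V @ beta                   # voxel-space weight map
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def cross_validated_predictions(X, y, n_folds=5, n_components=5, alpha=0.01, seed=0):
    """Out-of-fold predictions; folds are random here (an assumption)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    preds = np.empty(len(y))
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        weights, intercept = fit_lasso_pcr(X[train], y[train], n_components, alpha)
        preds[fold] = X[fold] @ weights + intercept
    return preds
```

  In practice, trials from the same participant are usually kept within the same fold so that the held-out predictions are not inflated by within-subject dependence; the random split above ignores that for brevity.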
  
  One interesting finding that only became evident when we combined univariate and multivariate approaches is that although the parametric modulation analysis showed that omission-related fMRI responses in the ROIs were modulated by the reported relief, none of these ROIs contributed significantly to the prediction of relief based on the identified signature. Instead, some of the contributing clusters fell within other valuation and error-processing regions (e.g. lateral OFC, mid cingulate, caudate nucleus). This suggests that regions other than our a priori ROIs may have been especially important for the subjective experience of relief, at least in this task. However, all these clusters were small and require further validation in out-of-sample participants. More research is necessary to test the generalizability and validity of the relief signature to new individuals and tasks, and to compare the signature with other existing signature models (e.g., signature of pain, fear, reward, pleasure). However, this was beyond the scope of the present study.
  
  Adaptations in the revised manuscript: We altered the explanation of the LASSO-PCR approach in the results section (lines 286-295) and the discussion (lines 399-402)
  
  Adaptations in the Results section: “The (mass univariate) parametric modulation analysis showed that omission-related fMRI activity in our primary and secondary ROIs correlated with the pleasantness of the relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network of activation across the brain, and which regions contributed most to the prediction of relief. To overcome these limitations, we trained a (multivariate) LASSO-PCR model (Least Absolute Shrinkage and Selection Operator-Regularized Principal Component Regression) in order to identify whether a spatially distributed pattern of brain responses can predict the perceived pleasantness of the relief (or “neural signature” of relief)31. Because we used the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data driven and can thus identify which clusters contribute most to the relief prediction.”
  
  Adaptations in the Discussion section: “In addition to examining the PE-properties of neural omission responses in our a priori ROIs, we trained a LASSO-PCR model to establish a signature pattern of relief. One interesting finding that only became evident when we compared the univariate and multivariate approach was that none of our a priori ROIs appeared to be an important contributor to the multivariate neural signature, even though all of them (except NAc) were significantly modulated by relief in the univariate analysis.”
  
  In addition to the public peer review, the reviewers provided some recommendations on how to further improve our manuscript. We will reply to the recommendations below.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Given that you do have trial-level estimates from the classifier analysis, it would be very informative to use learning models and examine responses trial-by-trial to test whether there are prediction errors that vary over time as a function of learning.
  
  We thank the reviewer for the suggestion. However, based on the results of the run-regressor, we do not anticipate large learning effects in our paradigm. As we mentioned in our responses above, we controlled for time-related drops in omission-responding by including a “run” regressor in our analyses. Results of this regressor for subjective relief and omission-related SCR showed that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This suggests that even though some learning might have taken place, its effect was likely small and did not abolish our manipulations of probability and intensity. In any case, we cannot use the LASSO-PCR signature model to investigate learning, as this model uses the trial-level brain pattern at the time of US omission to estimate the associated level of relief. These estimates can therefore not be used to examine learning effects.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The LASSO-PCR model feels rather disconnected from the rest of the paper and does not add much to the main theme. I would suggest to remove this part from the paper.
  
  We thank the reviewer for this suggestion. However, the LASSO-PCR analysis was preregistered. We therefore cannot remove it from the manuscript. We hope to have clarified its added value in the revised version of the manuscript.
  
URL

biorxiv.org/content/10.1101/2023.08.15.553434v2
www.biorxiv.org

New submission 03/07/2023, 12:48:34

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Recommendations For The Authors):
  
  - There were no mechanistic or causation-focused investigations that could have greatly strengthened the study. The study is ultimately providing two prioritized candidate genes that may be causative, reactive, or independent of the disease.
  
  Answer: We thank the reviewer for their positive assessment and agree that our study lacks formal causal analyses. We are aware of this limitation and have made it clear throughout the text. Through triangulation of evidence across tissues and species, we point to very interesting candidates that merit further study, which is the usual scope of such systems genetics investigations. Nevertheless, to introduce some causal inference and reinforce the human relevance of our results, we have performed Mendelian randomization (MR) analysis to investigate the potential associations between MUC4’s gene expression in human colons and the risk of IBD. EPHA6 lacks detectable eQTLs in human colon so we could not include it in this analysis. We found suggestive evidence that increased expression of MUC4 in the sigmoid, but not transverse, colon may increase the risk of IBD (nominal p = 0.033).
  
  The description in the manuscript:
  
  However, it is unclear through what mechanisms the genetic variants in the candidate genes affect IBD susceptibility. One possibility is that genetic variation leads to altered levels of expression of the gene, ultimately affecting disease susceptibility. To test this possibility, we examined the GTEx resource (GTEx Consortium, 2013) and found that MUC4, but not EPHA6, has cis-eQTLs in the sigmoid and transverse colon. To establish likely causal links with IBD incidence, we used these associations as instruments in a two-sample Mendelian randomization (MR) (Hemani, Tilling and Smith, 2017; Hemani et al., 2018) analysis. Using publicly available GWAS summary statistics for IBD, Crohn’s disease, and ulcerative colitis (Liu et al., 2015; Elsworth et al., 2020) as outcomes, we found suggestive evidence that increased expression of MUC4 in the sigmoid, but not transverse, colon may increase the risk of IBD (nominal P value = 0.033, Appendix 1 - Table 6). No eQTLs were reported for EPHA6 in the colon, precluding us from investigating the potential consequences of changes in its expression in these tissues.
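
  For orientation, a two-sample MR estimate of this kind can be obtained by combining per-variant eQTL effects (exposure) with GWAS effects (outcome) using inverse-variance weighting; with a single instrument this reduces to the Wald ratio. The sketch below shows that generic calculation in plain Python. It is not the analysis code used in the study, and the effect sizes in the usage example are invented for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: per-variant effects on gene expression (eQTL betas)
    beta_outcome:  per-variant effects on disease risk (GWAS betas)
    se_outcome:    standard errors of the outcome effects
    Returns (causal estimate, standard error, two-sided p-value).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return estimate, std_error, p_value


# Invented single-instrument example: reduces to the Wald ratio 0.1 / 0.5.
estimate, std_error, p_value = ivw_estimate([0.5], [0.1], [0.05])
```

  A positive estimate would be read as higher expression being associated with higher disease risk, under the usual MR assumptions (relevant instruments, no confounding of the instrument-outcome relation, and no horizontal pleiotropy).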
  
  - Figures 3 and its supplement Figure 1: Among the 39 modules, the authors have only focused on significantly overlapping up-regulated IBD-related gene modules in both CD (M28 and M32) and HFD (M9 and M28) for their follow up analyses in Figures 4 and 5 to prioritize candidate genes. However, this reviewer thinks there is great value in also focusing on significantly overlapping down-regulated IBD-related gene modules in both CD (M17) and HFD (M15 and M26) for their follow up candidate gene prioritization analyses.
  
  Answer: Thank you for your suggestion. We had initially performed overrepresentation analyses in HFD_M15, HFD_M26 and CD_M17, but did not find enrichments related to inflammation (see Author response image 1 below). We did not include this result in the manuscript.
  
  Author response image 1.
  
  Dot plot showing the enrichment of IBD-related modules in hallmark genesets. Gene ratios higher than 0.1 are shown and represented by dot size. Dots are colored by -Log10(BH-adjusted P values).
  
  We also checked the module QTL mapping for the significantly overlapping down-regulated IBD-related gene modules in both CD and HFD. We did not find any loci that are significantly associated with these modules, indicating that they are not modulated by genetic variation and hence are less likely to inform on IBD susceptibility.
  
  The description in the manuscript:
  
  The ModQTL analysis was also performed on the modules that are significantly enriched in IBD-downregulated genes (HFD_M15, HFD_M24, and HFD_M26), but no significant or suggestive QTLs were detected. Therefore, we focused on the QTL for IBD-induced genes in HFD_M28 and annotated its candidate genes based on three criteria (Figure 5B).
  
  Reviewer #2 (Recommendations For The Authors):
  
  - One small addition that would be nice would be to indicate if the two candidate genes have cis eQTL in human tissues and/or have any protein-coding variants in humans. This would provide nice additional evidence of causality for these two genes.
  
  Answer: Thank you for your positive assessment and suggestion. MUC4 and EPHA6 both have protein-coding variants in humans, which are listed in the Appendix – Table 3 and Table 4. In addition, cis-eQTLs have been found for MUC4 in both the sigmoid and transverse colon in humans (GTEx, https://gtexportal.org/home/locusBrowserPage/ENSG00000145113.21). As indicated in our response to the first comment of Reviewer #1, we have now performed Mendelian randomization on human eQTL for MUC4. However, no eQTLs were reported for EPHA6 in the colon, preventing us from performing MR analysis on its expression.
  
  - Also, it would be helpful to include the size of the modules in the text of the manuscript. Especially the two modules that were followed up on.
  
  Answer: Thank you for your suggestion. We have indicated the size of IBD-related modules in the text of the manuscript.
  
  The description in the manuscript:
  
  Enrichment analyses indicated that modules HFD_M9 (484 genes), HFD_M16 (328 genes), and HFD_M28 (123 genes) were enriched with genes that are upregulated by DSS-induced colitis, while HFD_M15 (368 genes), HFD_M24 (159 genes), and HFD_M26 (135 genes) were significantly enriched with downregulated genes (Figure 3C). Of note, more than 20% of genes involved in HFD_M9 and HFD_M28 were part of the dysregulated genes of the acute phase of mouse UC (day6 and day7) (Figure 3C). Interestingly, genes perturbed during IBD pathogenesis in humans were also enriched in HFD_M9 and HFD_M28 (Figure 3C).
  
  While IBD-related genes were predominantly found in HFD modules, we also found that two modules, CD_M28 (185 genes) and CD_M32 (142 genes), in CD-fed mouse colons were associated with IBD (Figure 3—figure supplement 1A). These two modules significantly overlapped with the IBD-related HFD_M9 and HFD_M28 modules, respectively (BH-adjusted P value < 0.05) (Figure 3—figure supplement 1B). Moreover, the molecular signatures underlying human UC and Crohn’s disease were also clustered in these two modules (CD_M28 and CD_M32) under CD (Figure 3—figure supplement 1C). Collectively, the co-expression and enrichment analyses identify HFD_M9 and HFD_M28 as IBD-related modules on which we focus our subsequent investigation.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.03.22.533818v2
www.biorxiv.org www.biorxiv.org

New submission 09/02/2024, 08:52:19

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #2 (Public Review)
  
  Weaknesses
  
  1) The usage of young growing mice (8-10 weeks) versus adult mice (>4 months) in the murine mechanical overload experiments. The usage of adult mice would be preferable for these experiments given that maturational growth may somehow affect the outcomes.
  
  The basis for this critique is not clear as it has been shown that the longitudinal growth of bones is complete by ~8 weeks of age (e.g., PMID: 28326349, and 31997656). These studies, along with others, also indicate that 8 weeks is a post-pubescent age in mice. For these reasons, 8 weeks of age was viewed as being representative of the human equivalent of when people start to perform resistance exercise with the goal of increasing muscle mass. Also, it’s important to consider that the mice were 10-12 weeks of age when the muscles were collected, which would be equivalent to a human in their early 20s. In our human study, the mean age of the subjects was 23. Given the above points, it’s hard for us to appreciate why the use of mice that started at 8-10 weeks of age is viewed as a weakness. With that being said, we recognize that there may be age-related changes in mechanisms of mechanical load-induced growth, but it was not our intent to address this topic.
  
  1b) No consideration for biological sex.
  
  We appreciate this point and we agree that sex is an important variable to consider. In this study, we explored an uncharted topic and therefore we wanted to minimize as many known variables as possible. We did that, in part, by focusing specifically on male subjects. In the future, it will certainly be important to explore whether sex (and age) impact the structural adaptations that drive the mechanical load-induced growth of muscle fibers.
  
  2) Information on whether myofibrillogenesis is dependent on hypertrophy induced by loading, or just hypertrophy in general. To provide information on this, the authors could use, for instance, inducible Myostatin KO mice (a model where hypertrophy and force production are not always in lockstep) to see whether hypertrophy independent from load induces the same result as muscle loading regarding myofibrillogenesis.
  
  This is a great suggestion, but it goes beyond the intended scope of our study. Nevertheless, with the publication of our FIM-ID methodology, the answer to this and related questions can now be obtained in a time- and cost-effective manner.
  
  3) Limited information on Type 1 fiber hypertrophy. A "dual overload" model is used for the mouse where the soleus is also overloaded, but presumably, the soleus was too damaged to analyze. Exploring hypertrophy of murine Type 1 fibers using a different model (weight pulling, weighted wheel running, or forced treadmill running) would be a welcome addition.
  
  The point is well taken, and determining whether there are differences in how Type I vs. Type II fibers grow would be an excellent subject for future studies.
  
  Reviewer #3 (Public Review)
  
  1) Supplemental Figure 1 is not very clear.
  
  Supplemental Figure 1 is now presented as Supplemental Figure 2. We carefully reexamined this figure and, in our opinion, the key points have been appropriately conveyed. We would be more than happy to revise the figure, but we would need guidance with respect to which aspect(s) of the figure were not clear to the reviewer.
  
  Reviewer #1 (Recommendations For The Authors)
  
  Introduction.
  
  1) I do not think the first paragraph is really necessary. Cell growth is a fundamental property of cell biology that requires no further justification.
  
  We believe that it is essential to remind all readers about the importance of skeletal muscle research. For some, the detrimental impact of skeletal muscle loss on one’s quality of life and the greater burden on the healthcare system may not be known.
  
  2) I prefer "fundamental" over "foundationally".
  
  All mentions of the word “foundational” and “foundationally” have been changed to “fundamental” and “fundamentally.”
  
  3) As usual for the Hornberger lab, the authors do an excellent job of providing the (historical) context of the research question.
  
  Thank you for this positive comment.
  
  4) I prefer “Goldspink” as “Dr. Goldspink” feels too personal especially when you are critical of his studies.
  
  All instances of “Dr.” have been removed when referring to the works of others. This includes Dr. Goldspink and Dr. Tokuyasu.
  
  5) Fourth paragraph, after reference #17. I felt like this discussion was not necessary and did not really add any value to the introduction.
  
  We believe that this discussion should remain since it highlights the widely accepted notion that mechanical loading leads to an increase in the number of myofibrils per fiber, yet there is no compelling data to support this notion. This discussion highlights the need for documented evidence for the increase in myofibril number in response to mechanical loading and, as such, it serves as a major part of the premise for the experiments that were conducted in our manuscript.
  
  6) The authors do a nice job of laying out the challenge of rigorously testing the Goldspink model of myofiber hypertrophy.
  
  Thank you!
  
  Results
  
  1). For the EM images, can the authors provide a representative image of myofibril tracing? From the EM image provided, it is difficult to evaluate how accurate the tracing is.
  
  Representative images and an explanation of myofibril calculation have been provided in Supplemental Figure 5.
  
  2) In the mouse, how does the mean myofibril CSA compare between EM and FIM-ID?
  
  Author response image 1.
  
  The above figures compare the myofibril CSA and fiber CSA measurements that were obtained with EM and FIM-ID for all analyzed fibers, as well as the same fibers separated according to the fiber type (i.e., Ox vs. Gly). The above figure shows that the FIM-ID measurements of myofibril CSA were slightly, yet significantly, lower than the measurements obtained with EM. However, we believe that it would be misleading to present the data in this manner. Specifically, as shown in Fig. 4C, a positive linear relationship exists between myofibril CSA and fiber CSA. Thus, a direct comparison of myofibril CSA measurements obtained from EM and FIM-ID would only be meaningful if the mean CSA of the fibers that were analyzed were the same. As shown on the panel on the right, the mean CSA of the fibers analyzed with FIM-ID was slightly, yet significantly, lower than the mean CSA of the fibers analyzed with EM. As such, we believe that the most appropriate way to compare the measurements of the two methods is to express the values for the myofibril CSA relative to the fiber CSA and this is how we presented the data in the main figure (i.e., Fig. 4E).
  
  3) Looking at Fig. 3D, how is intermyofibrillar space calculated when a significant proportion of the ROI is odd-shaped myofibrils that are not outlined? It is not clear how the intermyofibrillar space between the odd-shaped myofibrils is included in the total intermyofibrillar space calculation for the fiber.
  
  The area occupied by the intermyofibrillar components is calculated by using our custom “Intermyofibrillar Area” pipeline within CellProfiler. Briefly, the program creates a binary image of the SERCA signal. The area occupied by the white pixels in the binary image is then used to calculate the area that is occupied by the intermyofibrillar components. To help readers, an example of this process is now provided in supplemental figure 4.
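  
  The general idea of measuring an area from a binarized signal can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' CellProfiler pipeline: the fixed threshold, the pixel area, and the toy image are all hypothetical, and a real pipeline would typically choose the threshold automatically.

```python
def binarize(image, threshold):
    """Return a binary mask: 1 where the signal is at or above threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def occupied_fraction(mask):
    """Fraction of pixels in the mask that are white (value 1)."""
    total = sum(len(row) for row in mask)
    white = sum(sum(row) for row in mask)
    return white / total


def occupied_area_um2(mask, pixel_area_um2):
    """Area covered by white pixels, given the area of one pixel."""
    return sum(sum(row) for row in mask) * pixel_area_um2


# Hypothetical 3 x 4 grayscale region of interest (0-255 intensities).
roi = [
    [10, 200, 12, 11],
    [220, 210, 9, 8],
    [7, 190, 13, 6],
]
mask = binarize(roi, threshold=128)        # assumed fixed threshold
fraction = occupied_fraction(mask)          # share of the ROI that is signal
area = occupied_area_um2(mask, 0.01)        # assumed 0.01 um^2 per pixel
```

  Because the measurement counts every white pixel in the mask, it does not depend on whether the neighboring dark objects were accepted or rejected by a shape filter.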
  
  4) What is the average percentage of each ROI that was not counted by CP (because a myofibril did not fit the shape criteria)? The concern is that the method of collection is biasing the data. In looking at EM images of myofibrils (from other studies), it is apparent that myofibrils are not always oval; in fact, it appears that often myofibrils have a more rectangular shape. These odd-shaped myofibrils are excluded from the analysis yet they might provide important information; maybe these odd-shaped myofibrils always hypertrophy such that their inclusion might change the overall conclusion of the study. I completely understand the challenges of trying to quantify odd-shaped myofibrils. I think it is important the authors discuss this important limitation of the study.
  
  First, we would like to clarify that myofibrils of a generally rectangular shape were not excluded. The intent of the filtering steps was to exclude objects that exhibited odd shapes because of an incomplete closure of the signal from SERCA. To illustrate this point we have annotated the images from Figure 3B-D with a red arrow which points to a rectangular object and blue arrows which point to objects that most likely consisted of two or more individual myofibrils that were falsely identified as a single object.
  
  Author response image 2.
  
  We appreciate the reviewer's concern that differences in the exclusion rates between groups could have biased the outcomes. Indeed, this was something that we were keeping a careful eye on during our analyses, and we hope that the reviewer will take comfort in knowing that objects were excluded at a very similar rate in both the mouse and human samples (44% vs. 46% for SHAM vs. MOV in mice, and 47% vs. 47% for PRE vs. POST in humans). We realize that this important data should have been included in our original submission and it is now contained within the results section of the revised version of our manuscript. Hopefully the explanation above, along with the inclusion of this data, will alleviate the reviewer's concerns that differences between the groups may have been biased by the filtering steps.
  
  Discussion.
  
  1) I think the authors provided a balanced interpretation of the data by acknowledging the limitation of having only one time-point. i.e., not being able to assess the myofibril splitting mechanism.
  
  Thank you!
  
  2) I think a discussion on the important limitation of only quantifying oval-shaped myofibrils should be included in the discussion.
  
  Please refer to our response to comment #4 of the results section.
  
  Reviewer #2 (Recommendations For The Authors)
  
  Overall, this is a thoughtful, clear, and impactful manuscript that provides valuable tools and information for the skeletal muscle field. My specific comments are as follows:
  
  1) In the introduction, I really appreciate the historical aspect provided on myofibrillogenesis. As written, however, I was expecting the authors to tackle the myofibril "splitting" question in greater detail with their experiments given the amount of real estate given to that topic, but this was not the case. Consider toning this down a bit as I think it sets a false expectation.
  
  We acknowledge that the study does not directly address the question about myofibril splitting. However, we believe that it is important to highlight the background of this untested theory since it serves as a major part of the premise for the experiments that were performed.
  
  2) In the introduction, is it worth citing this study? https://rupress.org/jcb/articlepdf/111/5/1885/1464125/1885.pdf.
  
  This is a very interesting study but, despite the title, we do not believe that it is accurate to say that this study investigated myofibrillogenesis. Instead (as illustrated by the author in Fig. 9) the study focused on the in-series addition of new sarcomeres at the ends of the pre-existing myofibrils (i.e., it studied in-series sarcomerogenesis). In our opinion, the study does not provide any direct evidence of myofibrillogenesis, and we are not aware of any studies that have shown that the chronic stretch model employed by the authors induces myofibrillogenesis. However, numerous studies have shown that chronic stretch leads to the in-series addition of new sarcomeres.
  
  3) Is there evidence for myofibrillogenesis during cardiac hypertrophy that could be referenced here?
  
  This is a great question, and one would think that it would have been widely investigated. However, direct evidence for myofibrillogenesis during load-induced cardiac hypertrophy is just as sparse as the evidence for myofibrillogenesis during load-induced skeletal muscle hypertrophy.
  
  4) In the introduction, perhaps mention that prolonged fixation is another disadvantage of EM tissue preparation. This typically prevents the usage of antibodies afterwards, whereas the authors have been able to overcome this using their method, which is a great strength.
  
  Thank you for the suggestion. This point has been added to the 5th paragraph of the introduction.
  
  5) In the introduction, are there not EM-compatible computer programs that could sidestep the manual tracing and increase throughput? Why could software such as this not be used? https://www.nature.com/articles/s41592-019-0396-9
  
  While we agree that automated pipelines have been developed for EM, such methods require a high degree of contrast between the measured objects. With EM, the high degree of contrast required for automated quantification is rarely observed between the myofibrils and the intermyofibrillar components (especially in glycolytic fibers). Moreover, one of the primary goals of our study was to develop a time and cost-effective method for identifying and quantifying myofibrils. As such, we developed a method that would not require the use of EM. We only incorporated EM imaging and analysis to validate the FIM-ID method. Therefore, utilizing an EM-compatible program to sidestep the manual tracing would have sped up the validation step, but it would not have accomplished one of the primary goals of our study.
  
  6) In the results, specifically for the human specimens, were "hybrid" fibers detected and, if so, how did the pattern of SERCA look? Also, did the authors happen to notice centrally nucleated muscle fibers in the murine plantaris after overload? If so, how did the myofibrils look? Could be interesting.
  
  For the analysis of the human fibers, two distinct immunolabeling methods were performed. One set of sections was stained for SERCA1 and dystrophin, while the other set was stained for SERCA2 and dystrophin. In other words, we did not perform dual immunolabeling for SERCA1 and SERCA2 on the same sections. Therefore, during the analysis of the human fibers, we did not detect the presence of hybrid fibers. Furthermore, while we did not perform nuclear staining on these sections, it should be noted that nuclei do not contain SERCA, and to the best of our recollection, we did not detect any SERCA-null objects within the center of the fibers. Moreover, our previous work has shown that the model of MOV used in this study does not lead to signs of degeneration/regeneration (You, Jae-Sung et al. (2019). doi:10.1096/fj.201801653RR). Therefore, it can be safely assumed that very few (if any) of the fibers analyzed in this study were centrally nucleated.
  
  7) In the Results, fixed for how long? This is important since, at least in my experience, with 24+ hours of fixation, antibody reactivity is significantly reduced unless an antigen retrieval step is performed (even then, not always successful). Also, presumably these tissues were drop-fixed? These details are in the Methods but some additional detail here could be warranted for the benefit of the discerning and interested reader.
  
  For both the mouse and human, the samples were immersion-fixed (presumably the equivalent of “drop-fixed”) in 4% paraformaldehyde in 0.1M phosphate buffer solution for a total of 24 hours (as described in the Methods section). We agree that prolonged aldehyde fixation can affect antibody reactivity; however, the antibodies used for FIM-ID did not require an antigen retrieval step.
  
  8) In the results regarding NADH/FAD autofluorescence imaging, a complementary approach in muscle was recently described and could be cited here: https://journals.physiology.org/doi/full/10.1152/japplphysiol.00662.2022
  
  We appreciate the reviewer’s recommendation to add this citation for the support of our method for fiber type classification and have added it to the manuscript in the second paragraph under the “Further refinement and validation of the automated measurements with FIM-ID” subsection of the Results as citation number 57.
  
  9) In the results, "Moreover, no significant differences in the mean number of myofibrils per fiber CSA were found when the results from the FIM-ID and EM-based measurements were directly compared, and this point was true when the data from all analyzed fibers was considered..." Nit-picky, but should it be "were considered" since data is plural?
  
  Thanks, this error was corrected.
  
  10) In the discussion, are the authors developing a "methodology" or a "method"? I think it may be the latter.
  
  We agree that “method” is the correct term to use. Instances of the word “methodology” have been replaced with “method.”
  
  11) In the discussion, since the same fibers were not being tracked over time, I'm not sure that saying "radial growth" is strictly correct. It is intuitive that the fibers were growing during loading, of course, but it may be safer to say "larger fibers versus control or the Pre sample" or something of the like. For example, "all the fiber types that were larger after loading versus controls" as opposed to "showed significant radial growth"
  
  While we agree that the fiber size was not tracked over time, the experiments were designed to test for a main effect of mechanical loading. Therefore, we are attributing the morphological adaptations to the mechanical loading variable (i.e., mechanical load-induced growth). Terms like “the induction of radial growth” or “the induction of hypertrophy” are commonly used in studies with the methods employed in this study. Respectfully, we believe that it would be more confusing for the readers if we used the suggested terms like "all the fiber types that were larger after loading versus controls". For instance, if I were the reader I would think to myself… but there were fiber types that were larger than others before loading (e.g., Ox vs. Gly), so what are the authors really trying to talk about?
  
  12) I would suggest making a cartoon summary figure to complement and summarize the Methods/Results/Discussion
  
  Thank you for this suggestion. We created a cartoon that summarizes the overall workflow for FIM-ID and this cartoon is now presented in Supplemental Figure 1.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.13.557204v2
www.biorxiv.org www.biorxiv.org

An Hfq-dependent post-transcriptional mechanism fine tunes RecB expression in Escherichia coli

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #2 (Public Review):
  
  The authors make a compelling case for the biological need to exquisitely control RecB levels, which they suggest is achieved by the pathway they have uncovered and described in this work. However, this conclusion is largely inferred as the authors only investigate the effect on cell survival in response to (high levels of) DNA damage and in response to two perturbations - genetic knock-out or over-expression, both of which are likely more dramatic than the range of expression levels observed in unstimulated and DNA damage conditions.
  
  In the discussion of the updated version of the manuscript, we have clarified the limits of our interpretation of the role of the uncovered regulation.
  
  Lines 411-417: “It is worth noting that the observed decrease in cell viability upon DNA damage was detected for relatively drastic perturbations such as recB deletion and RecBCD overexpression. Verifying these observations in the context of more subtle changes in RecB levels would be important for further investigation of the biological role of the uncovered regulation mechanism. However, the extremely low numbers of RecB proteins make altering its abundance in a refined, controlled, and homogeneous across cells manner extremely challenging and would require the development of novel synthetic biology tools.”
  
  Reviewer #3 (Public Review):
  
  The major weaknesses include a lack of mechanistic depth, and part of the conclusions are not fully supported by the data.
  
  (1) Mechanistically, it is still unclear why upon DNA damage, translation level of recB mRNA increases, which makes the story less complete. The authors mention in the Discussion that a moderate (30%) decrease in Hfq protein was observed in a previous study, which may explain the loss of translation repression on recB. However, given that this mRNA exists in very low copy number (a few per cell) and that Hfq copy number is on the order of a few hundred to a few thousand, it's unclear how a 30% decrease in the protein level would result in a significant change in its regulation of recB mRNA.
  
  We agree that the entire mechanistic pathway controlling recB expression may not be limited to just Hfq involvement. We have performed additional experiments, proposed by the reviewer, suggesting that a small RNA might be involved (see below, response to comments 3&4). However, we consider that the full characterisation of all players is beyond the scope of this manuscript. In addition to describing the new data (see below), we expanded the discussion to explain more precisely why changes in Hfq abundance upon DNA damage may impact RecB translation.
  
  Lines 384-391: “A modest decrease (~30%) in Hfq protein abundance has been seen in a proteomic study in E. coli upon DSB induction with ciprofloxacin (DOI: 10.1016/j.jprot.2018.03.002). While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes, where the targets with higher Hfq binding affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, it is conceivable that even modest changes in Hfq availability could result in significant changes in gene expression, and this could explain the increased translational efficiency of RecB under DNA damage conditions.”
  
  (2) Based on the experiment and the model, Hfq regulates translation of recB gene through binding to the RBS of the upstream ptrA gene through translation coupling. In this case, one would expect that the behavior of ptrA gene expression and its response to Hfq regulation would be quite similar to recB. Performing the same measurement on ptrA gene expression in the presence and absence of Hfq would strengthen the conclusion and model.
  
  Indeed, based on our model, we expect PtrA expression to be regulated by Hfq in a similar manner to RecB. However, the product encoded by the ptrA gene, Protease III, (i) has been poorly characterised; (ii) unlike RecB, is located in the periplasm (DOI: 10.1128/jb.149.3.1027-1033.1982); and (iii) is not involved in any DNA repair pathway. Therefore, analysing PtrA expression would take us away from the key questions of our study.
  
  (3) The authors agree that they cannot exclude the possibility of sRNA being involved in the translation regulation. However, this can be tested by performing the imaging experiments in the presence of Hfq proximal face mutations, which largely disrupt binding of sRNAs.
  
  (4) The data on construct with a long region of Hfq binding site on recB mRNA deleted is less convincing. There is no control to show that removing this sequence region itself has no effect on translation, and the effect is solely due to the lack of Hfq binding. A better experiment would be using a Hfq distal face mutant that is deficient in binding to the ARN motifs.
  
  We performed the requested experiments. We included this data in the manuscript in the supplementary figure (Figure S11), and our interpretation in the discussion.
  
  Lines 354-378: “While a few recent studies have shown evidence for direct gene regulation by Hfq in a sRNA-independent manner (DOI: 10.1101/gad.302547.117; DOI: 10.1111/mmi.14799; DOI: 10.1371/journal.pgen.1004440; DOI: 10.1111/mmi.12961; DOI: 10.1038/emboj.2013.205), we attempted to investigate whether a small RNA could be involved in the Hfq-mediated regulation of RecB expression. We tested Hfq mutants containing point mutations in the proximal and distal sides of the protein, which were shown to disrupt either binding with sRNAs or with ARN motifs of mRNA targets, respectively [DOI: 10.1016/j.jmb.2013.01.006, DOI: 10.3389/fcimb.2023.1282258]. Hfq mutated in either proximal (K56A) or distal (Y25D) faces were expressed from a plasmid in a ∆hfq background. In both cases, Hfq expression was confirmed with qPCR and did not affect recB mRNA levels (Supplementary Figure S11b). When the proximal Hfq binding side (K56A) was disrupted, RecB protein concentration was nearly similar to that obtained in a ∆hfq mutant (Supplementary Figure S11a, top panel). This observation suggests that the repression of RecB translation requires the proximal side of Hfq, and that a small RNA is likely to be involved as small RNAs (Class I and Class II) were shown to predominantly interact with the proximal face of Hfq [DOI: 10.15252/embj.201591569]. When we expressed Hfq mutated in the distal face (Y25D) which is deficient in binding to mRNAs, less efficient repression of RecB translation was detected (Supplementary Figure S11a, bottom panel). This suggests that RecB mRNA interacts with Hfq at this position. We did not observe full de-repression to the ∆hfq level, which might be explained by residual capacity of Hfq to bind its recB mRNA target in the point mutant (Y25D) (either via the distal face with less affinity or via the lateral rim Hfq interface).”
  
  Taken together, these results suggest that Hfq binds to recB mRNA and that a small RNA might contribute to the regulation although this sRNA has not been identified.
  
  (5) Ln 249-251: The authors claim that the stability of recB mRNA is not changed in ∆hfq simply based on the steady-state mRNA level. To claim so, the lifetime needs to be measured in the absence of Hfq.
  
  We measured recB mRNA lifetime in the absence of Hfq in a time-course experiment where transcription initiation was inhibited with rifampicin and mRNA abundance was quantified with RT-qPCR. The results confirmed that recB mRNA lifetime in hfq mutants is similar to the one in the wild type (Figure S7d, referred to on line 263 of the manuscript).
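  
  For readers unfamiliar with this kind of time course, the sketch below shows one common way to estimate an mRNA decay rate and half-life from abundance measurements taken after transcription is blocked. It assumes simple first-order (exponential) decay and fits a straight line to the log-transformed abundances. The time points and values are hypothetical and are not data from the manuscript.

```python
import math


def decay_rate(times_min, abundances):
    """Least-squares estimate of k in N(t) = N0 * exp(-k * t).

    Fits a line to ln(abundance) versus time and returns minus the slope.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return -sxy / sxx


def half_life(k):
    """Half-life corresponding to a first-order decay rate k."""
    return math.log(2) / k


# Hypothetical relative mRNA abundances after rifampicin addition.
times = [0.0, 2.0, 4.0, 6.0]            # minutes
relative_mrna = [1.0, 0.5, 0.25, 0.125]  # normalized to t = 0

k = decay_rate(times, relative_mrna)
t_half = half_life(k)
```

  Comparing the fitted half-life between wild-type and mutant strains, rather than comparing steady-state levels alone, is what distinguishes a change in stability from a change in synthesis.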
  
  (6) What's the labeling efficiency of Halo-tag? If not 100% labeled, is it considered in the protein number quantification? Is the protein copy number quantification through imaging calibrated by an independent method? Does Halo tag affect the protein translation or degradation?
  
  Our previous study (DOI: 10.1038/s41598-019-44278-0) described a detailed characterization of the HaloTag labelling technique for quantifying low-copy proteins in single E. coli cells using RecB as a test case.
  
  In that study, we showed complete quantitative agreement of RecB quantification between two fully independent methods: HaloTag-based labelling with cell fixation and RecB-sfGFP combined with a microfluidic device that lowers protein diffusion in the bacterial cytoplasm. This second method had previously been validated for protein quantification (DOI: 10.1038/ncomms11641) and provides detection of 80-90% of the labelled protein. Additionally, in our protocol, immediate chemical fixation of cells after the labelling and quick washing steps ensure that new, unlabelled RecB proteins are not produced. We, therefore, conclude that our approach to RecB detection is highly reliable and sufficient for comparing RecB production in different conditions and mutants.
  
  The RecB-HaloTag construct has been designed for minimal impact on RecB production and function. The HaloTag is translationally fused to RecB in a loop positioned after the serine present at position 47 where it is unlikely to interfere with (i) the formation of RecBCD complex (based on RecBCD structure, DOI: 10.1038/nature02988), (ii) the initiation of translation (as it is far away from the 5’UTR and the beginning of the open reading frame) and (iii) conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). In our manuscript, we showed that the RecB-HaloTag degradation rate is similar to the dilution rate due to bacterial growth. This is in line with a recent study on unlabelled proteins, which shows that RecB’s lifetime is set by the cellular growth rate (DOI: 10.1101/2022.08.01.502339).
  
  Furthermore, we have demonstrated (DOI: 10.1038/s41598-019-44278-0) that (i) bacterial growth is not affected by replacing the native RecB with RecB-HaloTag, (ii) RecB-HaloTag is fully functional upon DNA damage, and (iii) no proteolytic processing of the RecB-HaloTag is detected by Western blot.
  
  These results suggest that RecB expression and functionality are unlikely to be affected by the translational HaloTag insertion at Ser-47 in RecB.
  
  In the revised version of the manuscript, we have added information about the construct and discuss the reliability of the quantification.
  
  Lines 141-152: “To determine whether the mRNA fluctuations we observed are transmitted to the protein level, we quantified RecB protein abundance with single-molecule accuracy in fixed individual cells using the Halo self-labelling tag (Fig. 2A&B).
  
  The HaloTag is translationally fused to RecB in a loop after Ser47 (DOI: 10.1038/s41598-019-44278-0) where it is unlikely to interfere with the formation of RecBCD complex (DOI: 10.1038/nature02988), the initiation of translation and conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). Consistent with minimal impact on RecB production and function, bacterial growth was not affected by replacing the native RecB with RecB-HaloTag, the fusion was fully functional upon DNA damage and no proteolytic processing of the construct was detected (DOI: 10.1038/s41598-019-44278-0). To ensure reliable quantification in bacteria with HaloTag labelling, the technique was previously verified with an independent imaging method and resulted in > 80% labelling efficiency (DOI: 10.1038/s41598-019-44278-0, DOI: 10.1038/ncomms11641). In order to minimize the number of newly produced unlabelled RecB proteins, labelling and quick washing steps were followed by immediate chemical fixation of cells.”
  
  Lines 164-168: “Comparison to the population growth rate [in these conditions (0.017 1/min)] suggests that RecB protein is stable and effectively removed only as a result of dilution and molecule partitioning between daughter cells. This result is consistent with a recent high-throughput study on protein turnover rates in E. coli, where the lifetime of RecB proteins was shown to be set by the doubling time (DOI: 10.1038/s41467-024-49920-8).”
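As a rough illustration of the dilution argument above: if a protein is stable and removed only by dilution, its effective half-life equals the doubling time, ln(2) divided by the growth rate. The sketch below assumes that the quoted 0.017 1/min is an exponential growth rate; the function name is hypothetical and not taken from the manuscript.

```python
import math


def half_life_from_dilution(growth_rate_per_min):
    """Half-life (in minutes) of a stable protein removed only by dilution.

    Assumes exponential growth at the given rate, so the half-life
    equals the population doubling time, ln(2) / rate.
    """
    return math.log(2) / growth_rate_per_min


growth_rate = 0.017  # 1/min, the population growth rate quoted in the text
print(round(half_life_from_dilution(growth_rate), 1))  # prints 40.8
```

Under this assumption, a measured protein removal rate close to 0.017 1/min corresponds to a lifetime of roughly one doubling time, which is the comparison the quoted passage draws.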
  
  (7) Upper panel of Fig S8a is redundant as in Fig 5B. Seems that Fig S8d is not described in the text.
  
  We have now stated in the legend of Fig S8a that the data in the upper panel were taken from Fig 5B to facilitate visual comparison with the results in the lower panel. We also noticed that we had not specified this for the upper panel of Fig S9a (whose data were taken from Fig 5C for the same reason), and we have added this clarification to the legend of Fig S9 as well.
  
  We now refer to Fig S8d in the main text.
  
  Lines 283-284: “We confirmed the functionality of the Hfq protein expressed from the pQE-Hfq plasmid in our experimental conditions (Fig. S8d).”
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Experimental regime to measure protein and mRNA levels.
  
  (a) Authors expose cells to ciprofloxacin for 2 hrs. They provide a justification via a mathematical model. However, in the absence of a measurement of protein and mRNA across time, it is unclear whether this single time point is sufficient to make the conclusion on RecB induction under double-strand break.
  
  In our experiments, we only aimed to compare recB mRNA and RecB protein levels in two steady-state conditions: no DNA damage and DNA damage caused by sublethal levels of ciprofloxacin. We did not aim to look at RecB dynamic regulation from non-damaged to damaged conditions – this would indeed require additional measurements at different time points. We revised this part of the results to ensure that our conclusions are stated as steady-state measurements and not as dynamic changes.
  
  Line 203-205: “We used mathematical modelling to verify that two hours of antibiotic exposure was sufficient to detect changes in mRNA and protein levels and for RecB mRNA and protein levels to reach a new steady state in the presence of DNA damage.”
  
  (b) Authors use cell area to account for the elongation under damage conditions. However, it is unclear whether the number of copies of the recB gene are similar across these elongated cells. Hence, authors should report mRNA and protein levels with respect to the number of gene copies of RecB or chromosome number as well.
  
  Based on the experiments in DNA-damaging conditions, our main conclusion is that the average translational efficiency of RecB is increased in perturbed conditions. We believe that this conclusion is well supported by our measurements and that it does not require information about the copy number of the recB gene but only the concentration of mRNA and protein. We did observe lower recB mRNA concentration upon DNA damage in comparison to the untreated conditions, which may be due to a lower concentration of genomic DNA in elongated cells upon DNA damage, as we mention in lines 221-223.
  
  Our calculation of translation efficiency could be affected by variations of mRNA concentration across cells in the dataset. For example, longer cells that are potentially more affected by DNA damage could have lower concentrations of mRNA. We verified that this is not the case, as recB mRNA concentration is constant across cell size distribution (see the figure below or Figure S5a from Supplementary Information).
  
  Therefore, we do not think that the measurements of recB gene copy would change our conclusions. We agree that measuring recB gene copies could help to investigate the reason behind the lower recB mRNA concentration under the perturbed conditions as this could be due to lower DNA content or due to shortage of resources (such as RNA polymerases). However, this is a side observation we made rather than a critical result, whose investigation is beyond the scope of this manuscript.
  
  Author response image 1.
  
  (2) RecB as a proxy for RecBCD. Authors suggest that RecB levels are regulated by hfq. However, how does this regulatory circuit affect the levels of RecC and RecD? Ratio of the three proteins has been shown to be important for the function of the complex.
  
  A full discussion of RecBCD complex formation regulation would require a complete quantitative model based on precise information on the dynamics of complex formation, which is currently lacking.
  
  We can however offer the following (speculative) suggestions assuming that all three subunits are present in similar abundance in native conditions (DOI: 10.1038/s41598-019-44278-0 for RecB and RecC). As the complex is formed in 1:1:1 ratio (DOI: 10.1038/nature02988), we propose that the regulation mechanism of RecB expression affects complex formation in the following way. If the RecB abundance becomes lower than the level of RecC and RecD subunits, the complex formation would be limited by the number of available RecB subunits and hence the number of functional RecBCDs will be decreased. On the contrary, if the number of RecB is higher than the baseline, then, especially in the context of low numbers, we would expect that the probability of forming a complex RecBC (and then RecBCD) will be increased. Based on this simple explanation, we might speculate that regulation of RecB expression may be sufficient to regulate RecB levels and RecBCD complex formation. However, we feel that this argument is too speculative to be added to the manuscript.
  
  (3) Role of Hfq in RecB regulation. While authors show the role of hfq in recB translation regulation in non-damage conditions, it is unclear as to how this regulation occurs under damage conditions.
  
  (a) Have the authors carried out recB mRNA and protein measurement in hfq-deleted cells under ciprofloxacin treatment?
  
  We attempted to perform experiments in hfq mutants under ciprofloxacin treatment. However, the cells exhibited a very strong and pleiotropic phenotype: they had large size variability and shape changes and were also frequently lysing. Therefore, we did not proceed with mRNA and protein quantification because the data would not have been reliable.
  
  (b) How do the authors propose that Hfq regulation is alleviated under conditions of DNA damage, when RecB translation efficiency increases?
  
  We propose that Hfq could be involved in a more global response to DNA damage as follows.
  
  Based on a proteomic study where Hfq protein abundance has been found to decrease (~30%) upon DSB induction with ciprofloxacin (DOI: 10.1016/j.jprot.2018.03.002), we suggest that this could explain the increased translational efficiency of RecB. While Hfq is a highly abundant protein, it has many targets (mRNA and sRNA), some of which are also highly abundant. Therefore the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes (DOI: 10.1046/j.1365-2958.2003.03734.x), where the targets with higher Hfq binding affinity have an advantage over the ones with less efficient binding. We reason that upon DNA damage, a moderate decrease in the Hfq protein abundance (30%) can lead to a similar competition among Hfq targets where high-affinity targets outcompete low-affinity ones as well as low-abundant ones (such as recB mRNAs). Thus, the regulation of low-abundant targets of Hfq by moderate perturbations of Hfq protein level is a potential explanation for the change in RecB translation that we have observed. Potential reasons behind the changes of Hfq levels upon DNA damage would be interesting to explore; however, this would require a completely different approach and is beyond the scope of this manuscript.
  
  We have modified the text of the discussion to explain our reasoning:
  
  Lines 384-391: “A modest decrease (~30%) in Hfq protein abundance has been seen in a proteomic study in E. coli upon DSB induction with ciprofloxacin (DOI: 10.1016/j.jprot.2018.03.002). While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes, where the targets with higher Hfq binding affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, it is conceivable that even modest changes in Hfq availability could result in significant changes in gene expression, and this could explain the increased translational efficiency of RecB under DNA damage conditions.”
  
  (c) Is there any growth phenotype associated with recB mutant where hfq binding is disrupted in damage and non-damage conditions? Does this mutation affect cell viability when over-expressed or under conditions of ciprofloxacin exposure?
  
  We checked the phenotype and did not detect any difference in growth or cell viability affecting the recB-5 UTR* mutants either in normal conditions or upon exposure to ciprofloxacin. However, this is expected because the repair capacity is associated with RecB protein abundance and in this mutant, while translational efficiency of recB mRNA increases, the level of RecB proteins remains similar to the wild-type (Figure 5E).
  
  Minor points:
  
  (1) Introduction - authors should also discuss the role of RecFOR at sites of fork stalling, a likely predominant pathway for break generated at such sites.
  
  The manuscript focuses on the repair of DNA double-strand breaks (DSBs). RecFOR plays a very important role in the repair of stalled forks because of single-strand gaps but is not involved in the repair of DSBs (DOI: 10.1038/35003501). We have modified the beginning of the introduction to mention the role of RecFOR.
  
  Lines 35-39: “For instance, replication forks often encounter obstacles leading to fork reversal, accumulation of gaps that are repaired by the RecFOR pathway (DOI: 10.1038/35003501) or breakage which has been shown to result in spontaneous DSBs in 18% of wild-type Escherichia coli cells in each generation (DOI: 10.1371/journal.pgen.1007256), underscoring the crucial need to repair these breaks to ensure faithful DNA replication.”
  
  (2) Methods: The authors refer to previous papers for the method used for single RNA molecule detection. More information needs to be provided in the present manuscript to explain how single molecule detection was achieved.
  
  We added information to the Methods section on the fitting procedure used to quantify the number of mRNAs per detected focus.
  
  Lines 515-530: “Based on the peak height and spot intensity, computed from the fitting output, the specific signal was separated from false positive spots (Fig. S1a). To identify the number of co-localized mRNAs, the integrated spot intensity profile was analyzed as previously described (DOI: 10.1038/nprot.2013.066). Assuming that (i) probe hybridization is a probabilistic process, (ii) binding each RNA FISH probe happens independently, and (iii) in the majority of cases, due to low abundance, there is one mRNA per spot, it is expected that the integrated intensities of FISH probes bound to one mRNA are Gaussian distributed. In the case of two co-localized mRNAs, there are two independent binding processes and, therefore, a wider Gaussian distribution with twice higher mean and twice larger variance is expected. In fact, the integrated spot intensity profile had a main mode corresponding to a single mRNA per focus, and a second one representing a population of spots with two co-localized mRNAs (Fig. S1b). Based on this model, the integrated spot intensity histograms were fitted to the sum of two Gaussian distributions (see equation below where a, b, c, and d are the fitting parameters), corresponding to one and two mRNA molecules per focus. An intensity equivalent corresponding to the integrated intensity of FISH probes on average bound to one mRNA was computed as a result of multiple-Gaussian fitting procedure (Fig. S1b), and all identified spots were normalized by the one-mRNA equivalent.
  
  ”
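The two-Gaussian model described above can be sketched as follows. The fitting equation itself does not appear in this response, so the assignment of a, b, c, and d below is an assumption made for illustration, as are the function names. Only the constraint that the two-mRNA component has twice the mean and twice the variance comes from the text. The least-squares fit to the histogram is not shown.

```python
import math


def gaussian(x, mean, sd):
    """Unnormalised Gaussian with peak height 1 at x == mean."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sd ** 2))


def two_mrna_model(x, a, b, c, d):
    """Sum of two Gaussians for the integrated spot intensity histogram.

    Hypothetical parametrisation (not confirmed by the source):
      a: peak height of the one-mRNA component
      b: mean integrated intensity of a single mRNA
      c: standard deviation of the one-mRNA component
      d: peak height of the two-mRNA component
    The two-mRNA component uses mean 2*b and variance 2*c**2,
    i.e. a standard deviation of sqrt(2)*c.
    """
    one_mrna = a * gaussian(x, b, c)
    two_mrna = d * gaussian(x, 2.0 * b, math.sqrt(2.0) * c)
    return one_mrna + two_mrna


def mrna_per_spot(intensities, one_mrna_equivalent):
    """Normalise integrated spot intensities by the one-mRNA equivalent."""
    return [value / one_mrna_equivalent for value in intensities]
```

With this parametrisation, the fitted value of b would play the role of the one-mRNA intensity equivalent, so a spot with integrated intensity 2b would be counted as two mRNAs after normalisation.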
  
  Reviewer #2 (Recommendations For The Authors):
  
  Overall the work is carefully executed and highly compelling, providing strong support for the conclusions put forth by the authors.
  
  One point: the potential biological consequences of the post-transcriptional mechanism uncovered in the work would be enhanced if the authors could 1) tune RecB protein levels and 2) directly monitor the role that RecB plays in generating single-stranded DNA at DSBs.
  
  We agree that testing viability of cells in case of tunable changes in RecB levels would be important to further investigate the biological role of the uncovered regulation mechanism. However, this is a very challenging experiment as it is technically difficult to alter the low number of RecB proteins in a controlled and homogeneous across-cell manner, and it would require the development of precisely tunable and very low-abundant synthetic designs.
  
  We did monitor real-time RecB dynamics by tracking single molecules in live E. coli cells in a different study (DOI: 10.1101/2023.12.22.573010) that is currently under revision. There, reduced motility of RecB proteins was observed upon DSB induction indicating that RecB is recruited to DNA to start the repair process.
  
URL

biorxiv.org/content/10.1101/2021.10.23.465540v3
www.biorxiv.org www.biorxiv.org

New submission 22/06/2023, 10:44:06

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response:
  
  The following is the authors' response to the original reviews.
  
  Thank you for considering our manuscript “An Unexpected Role of Neutrophils in Clearing Apoptotic Hepatocytes In Vivo”. We also thank the referees for their review. We have addressed their comments in detail and added new data to buttress our conclusions.
  
  Reviewer #1 (Public Review):
  
  This study by Cao et al. demonstrates a role of neutrophils in clearing apoptotic hepatocytes by directly burrowing into the apoptotic hepatocytes and ingesting the effete cells from inside without causing inflammation. The authors applied intravital microscopy, immunostaining and electron microscopy to visualize perforocytosis of neutrophils in hepatocytes. They also found that neutrophil depletion impairs the clearance of apoptotic hepatocytes, causing impaired liver function and generation of autoantibodies, implying a role of defective neutrophil-mediated clearance of apoptotic cells in autoimmune liver disease. The experiments were well designed and conducted, the results were reasonably interpreted, and the manuscript was clearly written with logical inputs.
  
  Thank you for your comments.
  
  One weak point is that the signals/mechanisms that determine why neutrophils specifically target apoptotic hepatocytes in the liver, and not other organs or cells, are not clearly understood.
  
  We are still studying why neutrophils selectively phagocytose hepatocytes but not HUVEC or 293 cells. We have some intriguing preliminary data so far showing that apoptotic 293 cells have no significant increase of IL-1β production as compared with their nonapoptotic controls; both apoptotic 293 cells and HUVECs do not have increased surface selectin proteins (new Fig. S3C).
  
  Reviewer #2 (Public Review):
  
  […] By examination of HE-stained, noncancerous liver tissue sections from patients with hepatocellular carcinoma and hepatic hemangioma, the authors observed that cells with neutrophil nuclear morphology were inside apoptotic hepatocytes. The authors further characterized this observation by staining the sections with neutrophil and apoptosis markers. In addition, the authors observed the same phenomena in mouse livers using intravital microscopy, which also recorded the time course of the disappearance of a neutrophil-associated apoptotic cell. The authors went on to further characterize neutrophil-mediated efferocytosis of cultured hepatic cells in vitro and demonstrated that the process was specific for apoptotic hepatic cells, but not HEK293 or endothelial cells. The in vitro system was then used to characterize the molecular bases for neutrophil-mediated efferocytosis of apoptotic hepatic cells. Evidence was provided to suggest that IL1b and IL-8 released from, and selectins upregulated in, apoptotic hepatic cells were important. Importantly, the authors used two methods to deplete the neutrophils and showed that the neutrophil depletion increased apoptotic cells in livers. Finally, the authors showed that neutrophil depletion caused defects in liver function parameters. At the end, the authors presented evidence to suggest that AIL disease may be due to defective neutrophils that fail to perform "perforocytosis."
  
  Thank you for your comments.
  
  Point #1. Although the evidence in its totality indicates that neutrophils burrow into apoptotic hepatocytes, the significance of this "perforocytosis" phenomenon and the circumstances under which it may occur remain to be better defined. In both neutrophil depletion models, the TUNEL-positive cells were not definitively identified but were assumed to be hepatocytes.
  
  Anatomically, the apoptotic hepatocytes are randomly distributed in the hepatic plate from the central vein to the portal region (please refer to the image below: hematoxylin staining of liver tissues, black arrowhead indicates perforocytosis sites).
  
  Author response image 1.
  
  Histologically, the structure of the liver/hepatic lobe is well defined, and the cell types in the liver are easy to identify based on their location, morphology and relationship to the hepatic plate and sinusoid. In addition, hepatocytes are well known for their rich cytoplasmic components, cellular connections and prominent large round nuclei. Thus, hepatocytes are very easy to identify even without using specific molecular markers such as E-cadherin or albumin. Based on these characteristics, the TUNEL-positive cells that we displayed in Fig. 5A are apoptotic hepatocytes.
  
  Point #2. In addition, there are discrepancies in the number of neutrophils and apoptotic cells in mouse liver studies; Fig. 2a WT (many neutrophils; locations unclear) vs Fig. 5A Ctr (a few neutrophils that appear in or near a vessel), and Fig. 2a DTR (a few apoptotic cells) vs Fig. 5A Depletion (many apoptotic cells).
  
  In response, Fig. 2A shows a larger area of the mouse liver (bar, 100 µm), while Fig. 5A shows a relatively small area of the liver sample (bars, 20 µm for Ctrl and 15 µm for DTR). Similarly, the image in Fig. 2A DTR must be magnified to quantify the apoptotic cells. We apologize for the confusion; we did quantify the apoptotic cells in Fig. 2A WT vs DTR (see the bar graph next to the images in Fig. 2A).
  
  Point #3. Importantly, in Fig 5a Ctrl, which is presumably a section from a mouse without any surgical treatment or without inflammation, the sole TUNEL signal does not appear to be associated with neutrophils. Does this mean that "perforocytosis" primarily occurs in inflamed livers? (Of note, human liver samples in Fig 1 are from patients with tumors. There should be inflammation in the livers of these patients.)
  
  In Fig 5A Ctrl, the TUNEL signal indicates apoptotic hepatocytes. The neutrophils (stained with anti-NE antibody, red) are associated with the apoptotic hepatocyte (Fig. 5A). We observed that perforocytosis primarily occurs in normal noninflamed livers.
  
  Human liver samples in Fig 1 are from patients with tumors; hence it is possible that neutrophil burrowing is somehow associated with cancerous/inflammatory livers, as the reviewer pointed out. This possibility was ruled out based on our method of sample preparation and the experimental results themselves.
  
  1) Both noncancerous and cancerous liver samples were sliced based on the anatomical appearance of normal and cancer tissues (differences were rather easy to identify, and these samples were prepared by highly experienced pathologists from the Liver Cancer Center of Zhongshan Hospital, Shanghai). Furthermore, the results were confirmed by determining whether the surrounding tissue contained microlesions characteristic of metastatic tumors. We only counted apoptotic hepatocytes in noncancerous regions having normal liver lobes and morphologically normal hepatocytes, plates, sinusoid and Kupffer cells. We also excluded hepatoma, chronic inflammatory regions, and necrotic regions.
  
  2) We did not observe recruitment of neutrophils into apoptotic HCC cells, indicating that the clearance of apoptotic cancer cells was not mediated by neutrophils (unpublished observations).
  
  3) It is hard for us to obtain normal human liver samples; however, we did study samples from patients with liver hemangioma, which is characterized by aberrant vasculature in the liver but normal liver function. The hemangioma livers that we analyzed are histologically nearly identical to healthy livers (these liver samples contained no cancerous regions and there was no apparent cirrhosis or inflammation). Here we obtained similar results (shown in Fig. 1B; a total of 40 apoptotic hepatocytes were examined).
  
  4) Our data from normal mouse livers, isolated primary cells (hepatocytes and neutrophils) and cell lines (NCTC and HL60) all confirmed the central findings in this paper (Fig. 2, 3).
  
  Point #4. The data on human AIL patient neutrophils raises more questions: how many AIL patients have been examined? Do these AIL neutrophils lack IL1, IL8 receptors, and/or selectin ligands? Are there increases in apoptotic hepatocytes in AIL patients?
  
  In response, we have analyzed 16 AIL patient samples (see table below).
  
  Author response table 1.
  
  We performed a microarray assay to screen for differential gene expression in neutrophils from normal individuals and AIL patients. We found that the IL-1β receptor IL1R1 and the selectin-binding protein P-selectin glycoprotein ligand 1 (PSGL-1) were both decreased in neutrophils from the AIL patients (new Fig 7D). These findings are consistent with our observations using cells and mouse models.
  
  Point #5. Additionally, the overall numbers of apoptotic cells even in the absence of neutrophils are rare; thus, it is questionable that such rarity of apoptotic cells can cause significant AIL phenotypes.
  
  We quantified apoptotic liver cells as percentages instead of overall numbers (Fig. 5; we were not able to precisely calculate the overall numbers, which could be large since billions of cells undergo apoptosis daily). Depletion of neutrophils increased the percentage of apoptotic cells about 5-6-fold in livers, and we observed the generation of autoantibodies (Fig. 6).
  
  Reviewer #1 (Recommendations For The Authors):
  
  This study by Cao et al. was well designed and conducted, the results were reasonably interpreted, and the manuscript was clearly written with logical inputs.
  
  It would further gain the significance of this study if authors could address the following questions:
  
  1. What are the mechanisms/signals that prevent AIL liver neutrophils from burrowing into hepatocytes?
  
  We found that the IL-1β receptor IL1R1 and the selectin-binding protein P-selectin glycoprotein ligand 1 (PSGL-1) were both decreased in neutrophils from the AIL patients (new Fig 7D).
  
  2. Have authors looked if autoantigens expressed on hepatocytes, which are often found in autoimmune liver disease trigger signaling events that activate neutrophils to burrow?
  
  Thank you for the comment, we have not examined autoantigens expressed in hepatocytes and plan to carry out this research as suggested.
  
  3. Is perforocytosis observed in apoptotic hepatocytes induced by different agents like LPS, TNF-a , rapamycin, alcohol etc?
  
  We did not observe perforocytosis in LPS- or TNF-a-treated hepatocytes. One possible reason is that the LPS or TNF-a we used induced massive necrosis instead of apoptosis. However, we did observe neutrophil perforocytosis in FasL-induced apoptotic hepatocytes (unpublished observations).
  
  Reviewer #2 (Recommendations For The Authors):
  
  In addition to the questions raised in the "Public review" section, the authors are also recommended to address the following issues:
  
  1) Why is CD11b+ not associated with the apoptotic sites as neutrophils express CD11b
  
  We have co-immunostained human liver samples with CD11b antibody (from Abcam: ab133357) and MPO antibody (from R&D: AF3667) and observed that tissue-infiltrating neutrophils in livers have low to undetectable levels of CD11b expression (please refer to the image below; white arrowheads point to neutrophils). Few CD11b+ cells in liver tissues express MPO (the CD11b+ cells are mostly macrophages, unpublished observations).
  
  Based on these data, we conclude that CD11b is hardly expressed in neutrophils inside livers.
  
  Author response image 2.
  
  2) Can TUNEL signals in Fig. S1C be from apoptotic neutrophils?
  
  In response, nuclear fragmentation is a hallmark of apoptosis; hence TUNEL staining will uniformly label all fragmented parts of an apoptotic nucleus. The nuclei of NE+ neutrophils are not labelled by TUNEL staining in Fig. S1C. The TUNEL+ nuclear fragments seen inside neutrophils are nuclear debris of apoptotic hepatocytes phagocytosed by neutrophils (Fig. S1C).
  
  3) The Fig 2B experiment may be done with induced apoptosis so that neutrophil burrowing steps may be recorded from the very beginning and a better time course for the entire process can be assessed.
  
  Thank you for the suggestion. We tried many times under various conditions but have not yet succeeded in capturing the very beginning of perforocytosis in vivo. We are continuing to work on this.
  
  4) In "we found that U937 cells exhibited much lower phagocytosis of apoptotic NCTC cells than did HL60 cells (Fig. S2B, C)," the citation should be only S2C
  
  Thank you for pointing this out, we have corrected this in the manuscript.
  
  5) Both neutrophil depletion models cause neutrophil death, which may complicate the interpretation of the liver function and AIL disease phenotypes. A neutropenic model such as G-CSFR−/− or Cebpe-/- mice may be used to avoid the caveat of antibody/DTR-dependent depletion models.
  
  Thank you for this thoughtful suggestion. We have also induced AIL phenotypes in mice by using α-GalCer. α-GalCer did not cause neutrophil death but impaired neutrophil perforocytosis and further generated AIL phenotypes in mice (unpublished observations). We plan to perform similar experiments in G-CSFR−/− or Cebpe−/− mice as the reviewer suggested.
  
  6) RNAi silencing experiments need additional controls for off-target effects
  
  These RNAi silencing constructs were purchased from Santa Cruz Biotechnology, and the off-target effects have been tested by the company. No significant off-target effects were detected according to the manufacturer's report.
  
URL

biorxiv.org/content/10.1101/2023.02.08.527616v2
www.biorxiv.org www.biorxiv.org

Interrogating basal ganglia circuit function in Parkinson’s disease and dystonia

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment:
  
  This manuscript is a valuable study of the responses of GPi neurons to DBS stimulation in human PD and dystonia patients and it finds evidence for altered short-term and long-term plasticity in response to DBS between the two patient populations. This data set is of interest to both basic and clinical researchers working in the field of DBS and movement disorders. While there was enthusiasm for the potential significance of these findings, support for their conclusions was incomplete. Their data may be indicative of more interesting and complex interpretations than currently considered in the article.
  
  The authors would like to express their gratitude to the Editorial Team and Reviewers for their invaluable feedback which helped to improve the manuscript.
  
  Reviewer #1:
  
  Summary:
  
  Sumarac et al investigate differences in globus pallidus internus (GPi) spike activity and short- and long-term plasticity of direct pathway projections in patients with Parkinson's disease (PD) and dystonia. Their main claims are that GPi neurons exhibit distinct characteristics in these two disorders, with PD associated with specific power-frequency oscillations and dystonia showing lower firing rates, increased burstiness, and less regular activity. Additionally, long-term plasticity and synaptic depression appear to differ between the two conditions. The authors suggest that these findings support the concept of hyperfunctional GPi output in PD and hypofunctional output in dystonia, possibly driven by variations in the plasticity of striato-pallidal synapses. Overall enthusiasm is relatively high, but I think the discussion omits discussing findings that don't align well with standard models.
  
  Strengths:
  
  These types of studies are valuable as the data arise from patients who have dystonia or PD. This could provide unique insights into disease pathophysiology that might not be recapitulated in animal systems work.
  
  Thank you for the positive feedback.
  
  Weaknesses:
  
  - The rate model and indirect/direct pathway ideas lack explanatory power; too much of the hypothesis generation and discussion in this manuscript is set in the context of these old ideas. Their data in my view emphasize this somewhat emphatically. Most patients with the 'hypokinetic' movement disorder PD have dystonia as a part of their motor features. Dystonia is a form of excessive muscle activation that on the one hand is 'hyperkinetic' but on the other usually decreases the speed of motor tasks, even in patients with primary dystonia. Similarly, PD patients display a bewildering variety of hyperkinetic manifestations as well (rest tremor, dystonia, dyskinesia). If these are truly independent classifications, i.e. hyper- versus hypo-kinetic, the authors must acknowledge that there is considerable overlap in the spike activity across groups - numerous dystonia patients display higher discharge rates than the majority of the PD sample. Based on the firing rate alone, it would not be possible to distinguish these groups.
  
  Thank you for your insightful comments regarding the discussion of the rate model and the distinction between hyperkinetic and hypokinetic movement disorders. We acknowledge that the rate model, primarily derived from a limited number of animal subjects [1], may not fully encapsulate the complexities of Parkinson's disease (PD) and dystonia. Our study aimed to validate animal model findings in humans by correlating single-neuron features with disease symptom severity. However, we concur with the Reviewer’s comment regarding the overlapping motor features in hypokinetic and hyperkinetic disorders. We speculate that the overlap in neuronal properties may mirror the overlap in clinical features, for example the presence of hyperkinetic features in PD, as suggested by the Reviewer. Per the Reviewer’s request, we have now acknowledged this notion in the manuscript. Interestingly, hypokinetic symptoms have been reported to occur in dystonia in response to GPi-stimulation and have been associated with beta activity in the LFP [2], which reinforces the notion that neural activity may be more related to specific symptoms than to diseases as a whole. Supplementing our analyses, in addition to total UPDRSIII scores, we have now provided correlations with only the hypokinetic (i.e. bradykinesia) subscores of the UPDRSIII to allow a more direct comparison of hypokinetic features in PD versus hyperkinetic features in dystonia. We have updated our methods and results accordingly.
  
  [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.
  
  [2] R. Lofredi et al., “Pallidal Beta Activity Is Linked to Stimulation-Induced Slowness in Dystonia,” Movement Disorders, vol. 38, no. 5, pp. 894–899, 2023, doi: 10.1002/mds.29347.
  
  Amendments to the manuscript:
  
  “Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients.”
  
  “Given that UPDRSIII includes both hypokinetic and hyperkinetic symptoms of PD, we further sought to disaggregate the score by only considering items 23-26 in UPDRSIII, which assess hypokinetic symptoms of PD.”
  
  “… with a marginally stronger correlation for PD hypokinetic symptoms only (items 23-26 of UPDRSIII, Spearman's rho=0.32, p=.0330; Supplementary Fig. 3)”
  
  Supplementary Fig. 3: We provided correlations with hypokinetic (i.e., bradykinesia) subscore of the UPDRSIII. There is very little difference between correlation results of UPDRSIII total (Fig. 1) and the hypokinetic-only subscore (Supplementary Fig. 3).
  
  “though our results do not change substantially when only hypokinetic PD features are considered (Supplementary Fig. 3).”
  
  - If beta power is pathognomonic of parkinsonism, the authors found no differences in beta-related spike discharges across the groups. One would have predicted greater beta power in PD than in primary dystonia. This should be discussed explicitly and an interpretation should be provided.
  
  We agree with the reviewer that considering the previous LFP literature, one might have expected a difference in single-neuron oscillation power between PD and dystonia. However, while prior studies [3], [4] have reported significant differences in oscillatory power between the two diseases, researchers examined local field potential (LFP) activity only. Other work [5] in non-human primates investigated single-neuron oscillations and reported no differences between PD and dystonia at the single-neuron level, in line with our findings. However, despite the lack of difference in overall power presented here, we provide evidence that the strength of the beta-frequency single-neuron oscillations nevertheless correlates with symptom severity in PD but not dystonia; whereas the strength of the theta-frequency single-neuron oscillations correlates with symptom severity in dystonia but not PD.
  
  [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.
  
  [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.
  
  [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.
  
  Amendments to the manuscript:
  
  “Although previous research has reported differences in the LFP power between PD and dystonia [27,28], a study in non-human primates found no such differences in single-neuron oscillatory strength [8], as reflected in our findings. However, despite a lack of difference in overall power across disorders, we were able to derive disease/frequency-specific relationships with respect to clinical scores (Fig. 1C; oscillatory features).”
  
  - The study lacks a healthy control group, making it challenging to differentiate disease-specific findings from normal variations in GPi activity and plasticity. Although this is acknowledged in the discussion, this complicates the interpretation of the results. The sample sizes for PD and dystonia patients are relatively small, and the study combines various forms of dystonia, potentially masking subtype-specific differences. A larger and more homogenous sample could enhance the study's reliability.
  
  Indeed, intraoperative microelectrode recordings cannot be obtained in healthy individuals. We agree with the Reviewer that this limits the interpretation of the data. However, directly comparing clinical correlations with single-neuron readouts between two distinct clinical entities may, to some degree, compensate for the lack of healthy control data. This contrast, while not providing a healthy control, is still able to point to disease-specific differences. This approach has previously been used for comparisons at the LFP level [6]. While the sample size is indeed small, it is comparable to or even larger than that of similar studies that have investigated the relation between symptom severity and single-neuron readouts [7]. The Reviewer is right that we do not differentiate between generalized and cervical dystonia. We chose to pool subtypes because our subgroup analysis provided in the Supplementary Material did not suggest specific differences, though there is insufficient data from specific dystonia subtypes to make formal statistical comparisons. Indeed, future studies should investigate specific subtypes further.
  
  [6] R. Lofredi et al., “Pallidal beta bursts in Parkinson’s disease and dystonia,” Movement Disorders, vol. 34, no. 3, pp. 420–424, 2019, doi: 10.1002/mds.27524.
  
  [7] A. Gulberti et al., “Subthalamic and nigral neurons are differentially modulated during parkinsonian gait,” Brain, p. awad006, Feb. 2023, doi: 10.1093/brain/awad006.
  
  Amendments to the manuscript:
  
  “While we did not observe differences across dystonia subtypes (Supplementary Fig. 1), future studies in larger patient cohorts are warranted. Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”
  
  - While they mention that data are available on request, sharing data openly would increase transparency and allow for independent validation of the results. It is unclear how sharing deidentified data would compromise patient privacy or present ethical issues of any kind, as claimed by the authors.
  
  Much of the data in question were collected under an old Research Ethics Board (REB) protocol which did not address data sharing. However, we have consulted with our REB and gained retroactive permission to post de-identified data which are now available in the Supplementary Material.
  
  Amendments to the manuscript:
  
  “The data that support the findings of this study are available in a public repository (see: https://osf.io/nqzd2/)”
  
  - They appropriately acknowledge several limitations, such as the inability to use pharmacological interventions and the need for further research in the chronic setting.
  
  Thank you for the comment.
  
  - The manuscript highlights differences in GPi activity and plasticity between PD and dystonia but could provide more context on the clinical implications of these findings, particularly regarding what the implications would be for novel paradigms for deep brain stimulation.
  
  Thank you for the comment. Our finding that striato-pallidal plasticity decays more slowly in dystonia compared to PD may relate to the slower time course of symptom relief associated with GPi-DBS in dystonia, as presently outlined in the discussion. On the other hand, symptoms are also suppressed for longer after the cessation of stimulation in dystonia compared to PD, which may reflect long-term plastic changes [8], [9]. In the context of clinical DBS, plasticity modulation may be facilitated by intermittent stimulation algorithms that achieve the necessary plastic network change by applying stimulation for a defined time, after which stimulation could be switched off to reduce energy consumption and perhaps to mitigate side effects. DBS devices with chronic sensing may enable monitoring of evoked potential amplitudes for future adaptive stimulation applications; currently available devices are limited by low sampling rates, but future devices may overcome these technical limitations.
  
  [8] D. Ruge et al., “Deep brain stimulation effects in dystonia: time course of electrophysiological changes in early treatment.,” Mov Disord, vol. 26, no. 10, pp. 1913–1921, Aug. 2011, doi: 10.1002/mds.23731.
  
  [9] D. Ruge et al., “Shaping reversibility? Long-term deep brain stimulation in dystonia: the relationship between effects on electrophysiology and clinical symptoms.,” Brain, vol. 134, no. Pt 7, pp. 2106–2115, Jul. 2011, doi: 10.1093/brain/awr122.
  
  Amendments to the manuscript:
  
  “While further work is certainly required to better understand disease-related differences in plasticity, our findings may nevertheless motivate the development of periodic intermittent (ON/OFF) DBS strategies which periodically modulate synaptic plasticity for therapeutic benefits which outlast stimulation delivery, as have recently been employed in preclinical work [52,53].”
  
  - While statistical tests are mentioned, the manuscript could benefit from a more detailed presentation of statistical methods, including correction for multiple comparisons and effect sizes. Did the authors consider different recording sites within each patient as independent observations? I think this is not appropriate if that was the case.
  
  Thank you for your constructive feedback. In response to the concerns regarding the statistical methods, we have expanded our analysis to provide a more comprehensive statistical overview. Specifically, we implemented the Bonferroni correction for multiple comparisons across each of the seven tests conducted for the differences in single-neuron features between PD and dystonia. The adjustment revealed that only the burst index and coefficient of variation retain statistical significance after post hoc correction, while the firing rate does not. Results of the Bonferroni corrections are now presented in Supplementary Table 3. Reflecting on the initial comment about firing rates between the two disorders, our updated findings underscore the limitation of using firing rates alone to differentiate between PD and dystonia, and instead, our analysis now points to burstiness and firing irregularity as more reliable discriminators. Regarding the clinical correlations, we refined our statistical analysis by employing nonparametric Monte Carlo permutation tests with 5000 permutations, as used in recent work [10], [11]. This method is chosen for its independence from assumptions regarding data distribution. Specifically, we computed and tested the Spearman rho for significance using the permutation test. Then, to address multiple comparisons, we controlled the false discovery rate (FDR) using the Benjamini-Hochberg procedure. Results of these comparisons are now presented in Supplementary Table 4. Lastly, to address the concern regarding recording site independence within patients, we updated our plasticity analysis methodology. In our study, 6 out of 18 patients had multiple recording sites. Thus, to account for this, we employed linear mixed models (LMM) with patient ID as a random factor to appropriately account for the non-independence of these observations.
  
  [10] R. Lofredi et al., “Dopamine-dependent scaling of subthalamic gamma bursts with movement velocity in patients with Parkinson’s disease,” eLife, vol. 7, p. e31895, Feb. 2018, doi: 10.7554/eLife.31895.
  
  [11] R. Lofredi et al., “Subthalamic beta bursts correlate with dopamine-dependent motor symptoms in 106 Parkinson’s patients,” npj Parkinsons Dis., vol. 9, no. 1, Art. no. 1, Jan. 2023, doi: 10.1038/s41531-022-00443-3.
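
A minimal sketch of the permutation procedure described in this response may help readers follow it. It is not the authors' analysis code. It assumes a two-sided test on the absolute value of Spearman's rho and the common "+1" correction for the p-value, neither of which is stated in the response. All function names are hypothetical, and the example values are illustrative, not study data.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for Spearman's rho.

    One variable's ranks are shuffled n_perm times; the p-value is the
    share of shuffles whose |rho| is at least the observed |rho|.
    """
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = pearson(rx, ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ry)
        if abs(pearson(rx, ry)) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Illustrative values only: a neuronal feature and a clinical score.
    feature = [1, 2, 3, 4, 5, 6, 7, 8]
    score = [2, 1, 4, 3, 6, 5, 8, 7]
    rho, p = permutation_p(feature, score)
    print(f"rho={rho:.3f}, permutation p={p:.4f}")
```

Shuffling the ranks rather than the raw values gives the same null distribution, because Spearman's rho depends on the data only through the ranks.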
  
  Amendments to the manuscript:
  
  “For comparing differences in single-neuron features between PD and dystonia, significant results were followed up with post hoc multiple comparisons with a Bonferroni correction. For clinical correlations, non-parametric Monte Carlo permutation tests were used, avoiding assumptions about data distribution. The tested values were randomly shuffled 5,000 times to form a probability distribution, with the p-value reflecting the original sample rank. All tests underwent adjustment for multiple comparisons, controlling the false discovery rate (FDR) at an α-level of 0.05.”
  
  “analyzed using a linear mixed model (LMM) with patient ID as a random factor, normalized fEP amplitudes as the response variable, and epoch as a fixed effect”
  
  “using a LMM with patient ID as a random factor”
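
For readers less familiar with mixed models, the model described in the amendment above could be written as follows. This is only a sketch: it assumes a random intercept per patient and no random slopes, which the response does not specify, and the symbols are introduced here for illustration.

```latex
% i indexes patients, j indexes observations (recording sites and epochs) within patient i
y_{ij} = \beta_0 + \beta_1\,\mathrm{epoch}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_{ij} is the normalized fEP amplitude, epoch_{ij} is the fixed effect, and u_i is the patient-specific random intercept. The shared u_i lets recordings from the same patient be correlated, so they are not treated as independent observations.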
  
  “However, none of the clinical correlations survived Benjamini-Hochberg FDR-correction for multiple comparisons (Supplementary Table 4).”
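
The two multiple-comparison adjustments named in this response can be sketched as follows. This is a generic textbook implementation, not the authors' code, and the function names are hypothetical. A test is then declared significant when its adjusted p-value is at or below the chosen α (0.05 in the amended methods).

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: each p is multiplied by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, controlling the false discovery rate.

    The k-th smallest p-value is scaled by m / k; a running minimum taken
    from the largest p-value downward keeps the adjusted values monotone.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative p-values only, not values from the study.
    raw = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(raw))
    print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the stricter of the two, which is why a result can survive FDR control and still fail a Bonferroni threshold.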
  
  “In PD, fEP amplitudes were significantly greater after compared to before HFS (LMM; p = .0075, effect size = 5.42 ± 1.79; Fig. 2C), while in dystonia, the increase approached but did not reach statistical significance (LMM; p = .0708, effect size = 2.82 ± 1.45; Fig. 2C).”
  
  All statistics were updated in the results section and the figures.
  
  “Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”
  
  - The manuscript could elaborate on the potential mechanisms underlying the observed differences in GPi activity and plasticity and their relevance to the pathophysiology of PD and dystonia.
  
  Thank you for your feedback. We have enhanced the manuscript by integrating additional discussions on previous studies related to plasticity in dystonia and PD (e.g., [12], [13]), which highlight excessive plasticity in dystonia. Although these may appear contradictory to our findings of increased plasticity in PD compared to dystonia, we propose (also justified by previous literature) that chronic dopaminergic medication use may lead to synaptic over-sensitization, which has been hypothesized as a biological mechanism underlying levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].
  
  [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.
  
  [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.
  
  [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.
  
  Amendments to the manuscript:
  
  “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the magnitude of direct pathway plasticity [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication; which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”
  
  Reviewer #2:
  
  Summary:
  
  The authors investigated how neuronal activity and metrics of plasticity using local electrical stimulation in the GPi were different between Parkinson's disease and dystonia patients.
  
  Strengths:
  
  The introduction highlights the importance of the work and the fundamental background needed to understand the rest of the paper. It also clearly lays out the novelty (i.e., that the dynamics of plastic effects in GPi between dystonia and PD have not been directly compared).
  
  The methods are clearly described and the results are well organized in the figures.
  
  The results are strong with measurements from a large population of patients for each disease group and with distinct findings for each group.
  
  Thank you for the kind appraisal.
  
  Weaknesses:
  
  The discussion was hard to follow in several places, making it difficult to fully appreciate how well the authors' claims and conclusions are justified by their data, mostly in relation to the plasticity results. It may help to summarize the relevant findings for each section first and then further expand on the interpretation, comparison with prior work, and broader significance. Currently, it is hard to follow each section without knowing which results are being discussed until the very end of the section. With the current wording in the "Neuronal correlates.." section, it is not always clear which results are from the current manuscript, and where the authors are referring to past work.
  
  Thank you for this feedback. The main findings are now summarized in a paragraph at the beginning of the Discussion section, before being discussed in comparison to other studies in the literature in subsequent sub-sections. Moreover, throughout the Discussion, findings from our study are now always reflected by a reference to the relevant figure to more easily differentiate current findings from previous literature. Additionally, Discussion sub-sections have been expanded to consider additional literature in response to various comments throughout the Review process (including the subsequent Review comment).
  
  Amendments to the manuscript:
  
  Paper findings are referenced to figures which depict the results at hand; discussion sub-sections expanded; and the following text has been added at the start of the Discussion:
  
  “In particular, we found that GPi neurons exhibited lower firing rates, but greater burstiness and variability in dystonia compared to PD (Fig. 1A). While no differences were found in the power of spiketrain oscillations across disorders (Fig. 1B), we found that PD symptom severity positively correlated with the power of low-beta frequency spiketrain oscillations, whereas dystonia symptom severity positively correlated with the power of theta frequency spiketrain oscillations (Fig. 1C). Dystonia symptom severity moreover correlated negatively with firing rate, and positively with neuronal variability. These results are discussed in greater detail with respect to previous literature in the subsequent Discussion section entitled “Neuronal correlates of PD and dystonia.” In response to electrical stimulation (protocol depicted in Fig. 2A), we found significant increases in the amplitudes of positive-going stimulation-evoked field potential amplitudes (considered to reflect striato-pallidal synaptic strength; as exemplified in Fig. 2B) before versus after HFS in both PD and dystonia (Fig. 2C); with recording sites in PD exhibiting significantly greater increases (Fig. 2D). While changes to evoked potential amplitude before versus after stimulation can be considered to be reflective of long-term plasticity [15,18], the dynamics of evoked potentials during HFS (as depicted in Fig. 2E) can be considered as reflective of short-term synaptic plasticity [18,21]. To this end, our findings are suggestive of faster latency synaptic depression in PD compared to dystonia (Fig. 2F/G). Plasticity findings are discussed in greater detail in the Discussion section entitled “Direct pathway plasticity.”
  
  Also, I felt that more discussion could be used to highlight the significance of the current results by comparing and/or contrasting them to prior relevant work and mechanisms. The novelty or impact is not very clear as written. Could this be further substantiated in the Discussion?
  
  Thank you for the feedback. The discussion has been expanded to include additional literature that is relevant to the findings reported in the manuscript. For example, with regards to the neuronal correlates sub-section, we now highlight the important findings [15] that show changes to the discharge rates and oscillatory tendencies of GPi neurons in non-human primates in response to staged MPTP applications to progressively titrate motor severity; these results substantiate our lack of correlation with firing rates in PD, and presence of a clinical correlation with beta oscillations. We additionally now emphasize human studies that found LFP power differences between PD and dystonia [3], [4]; but simultaneously highlight studies that did not find such differences in spike-train oscillations (in non-human primates) [5], which is reflective of our own findings. With regards to our plasticity sub-section, we have added new content related to previous literature on plasticity in dystonia and PD (also addressed in response to a query from Reviewer #1). For example, we bring to light a variety of previous studies [12], [13] emphasizing excessive plasticity in dystonia. However, while such studies may seem to contradict our findings of greater plasticity in PD compared to dystonia, we additionally provide hypotheses (justified by previous literature) that prolonged use of dopaminergic medication may result in synaptic over-sensitization, thus giving rise to levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].
  
  [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.
  
  [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.
  
  [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.
  
  [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.
  
  [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.
  
  [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.
  
  [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.
  
  Amendments to the manuscript:
  
  “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression. Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients. While differences in discharge rates were nevertheless observed between PD and dystonia, it may be that the combination of rate and pattern (reflected in the BI and CV) changes best differentiates the two disorders.”
  
  “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation (LTP) at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that LTP effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication; which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”
  
  Some specific comments and questions about the Discussion:
  
  Lines 209-211 - This sentence was hard to understand, could it be clarified?
  
  Lines 211-213 - What do phasic and tonic components mean exactly? Could this be specifically defined? Are there specific timescales (as referred to in Intro)?
  
  Lines 215-217 - It's not clear what was delayed in dystonia, and how the authors are trying to contrast this with the faster time course in PD. I think some of this is explained in the introduction, but could also be re-summarized here as relevant to the results discussed.
  
  Lines 223-224 - I'm not sure I follow the implication that network reorganization leads to delayed functional benefits. Could this be further elaborated?
  
  Reply & Amendments to the manuscript: Thank you for your feedback. We've made the following concise revisions to address the comments:
  
  We've clarified lines 209-211 to explain that variations in electrical stimulation effects on pathways in PD and dystonia may reveal the operational mechanisms of DBS, despite a common target:
  
  “The variation in the modulation of these projections / pathways to electrical stimulation may also indicate the mechanism by which DBS operates across PD and dystonia, despite a common stimulation target.”
  
  In response to the second comment on lines 211-213 about phasic and tonic components, we now specify that phasic refers to dynamic muscle contractions, and tonic to continuous muscle contractions, providing clear definitions relevant to our context:
  
  “Clinical studies in dystonia have shown that DBS leads to a more rapid improvement in the transient, dynamic muscle contractions (phasic components) of the disorder when compared to the sustained, continuous muscle contractions (tonic or fixed components) [33]”
  
  For lines 215-217, we've refined our discussion to clearly contrast the delayed response in dystonia with the faster onset in PD:
  
  “This contrasts with PD, where the maximal clinical response to DBS occurs within a much faster time course [13,36].”
  
  On lines 223-224, we've expanded the explanation of how network reorganization may lead to delayed functional benefits, highlighting adjustments in neural connectivity and synaptic efficacy in response to stimulation:
  
  “which involves adjustments in neural connectivity or synaptic efficacy in response to the stimulation [14,35].”
  
  Could the absence of a relationship between FR and disease in PD be discussed?
  
  Thank you for raising this point. Despite observing higher firing rates in PD compared to dystonia, it is unexpected that these rates do not correlate with symptom severity according to the rate model of PD [1]. However, despite the lack of correlations with firing rates, our findings align with similar animal work of Muralidharan et al. [15], which reported that neuronal firing rates within the GPi of rhesus monkeys did not increase linearly with respect to varying intensities of parkinsonian motor severity. We did however show that low beta oscillatory strength within the GPi may play a significant role in the manifestation of motor symptoms in PD; which is also in line with findings of Muralidharan and colleagues. As per the Reviewer’s request, we have included this content into our discussion.
  
  [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.
  
  [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.
  
  Amendments to the manuscript:
  
  “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression.”
  
  “Indeed, Muralidharan and colleagues [25] also showed linear group-level relationships between low-beta frequency spiketrain oscillations and disease severity in parkinsonian non-human primates, despite the lack of linear relationships with spike discharge rates (as discussed above).”
  
  It wasn't very clear how the direct pathway can be attributed to plasticity changes if the GPi makes up both the direct and indirect pathways. Could this be further clarified?
  
  The reviewer brings up an important, nuanced point. Recent work from our lab [16] shows that inhibitory evoked fields in STN (which receives inhibitory inputs from GPe and from no other inhibitory source) are persistent, with very minimal depression during HFS. On the other hand, inhibitory fields in the SNr (which receives the majority of its inhibitory inputs from striatum, though some come by way of GPe as well per the anatomical literature) depress quickly. We have previously also shown these rapidly depressing fields in GPi [17], [18], which also receives the majority of its inhibitory inputs via striatum, though some also come from GPe. As such, the disaggregation of striatum-mediated versus GPe-mediated inhibitory fields is achieved based on: the lack of rapidly depressing inhibitory evoked field potentials in STN (which receives inhibitory inputs via GPe and not striatum), but a common presence of rapidly depressing evoked field potentials in SNr and GPi (which both receive most of their inhibitory inputs from striatum); differences in the morphology of purportedly GPe-mediated (fast latency) versus striatum-mediated (slow latency) evoked field potentials [16]; and the presence of slow latency caudato-nigral evoked field potentials in slices [19] that are reversed by GABA antagonist application [20]. These points are outlined in the first paragraph of the Discussion sub-section “Direct pathway plasticity.” However, we have now additionally added a point to the Limitations that inhibitory inputs to the GPi also come by way of GPe, though in lesser abundance.
  
  [16] L. A. Steiner et al., “Persistent synaptic inhibition of the subthalamic nucleus by high frequency stimulation,” Brain Stimul, vol. 15, no. 5, pp. 1223–1232, 2022, doi: 10.1016/j.brs.2022.08.020.
  
  [17] L. D. Liu, I. A. Prescott, J. O. Dostrovsky, M. Hodaie, A. M. Lozano, and W. D. Hutchison, “Frequency-dependent effects of electrical stimulation in the globus pallidus of dystonia patients.,” J Neurophysiol, vol. 108, no. 1, pp. 5–17, Jul. 2012, doi: 10.1152/jn.00527.2011.
  
  [18] L. Milosevic et al., “Modulation of inhibitory plasticity in basal ganglia output nuclei of patients with Parkinson’s disease,” Neurobiology of Disease, vol. 124, pp. 46–56, Apr. 2019, doi: 10.1016/j.nbd.2018.10.020.
  
  [19] M. Yoshida and W. Precht, “Monosynaptic inhibition of neurons of the substantia nigra by caudato-nigral fibers,” Brain Res, vol. 32, no. 1, pp. 225–228, Sep. 1971, doi: 10.1016/0006-8993(71)90170-3.
  
  [20] W. Precht and M. Yoshida, “Blockage of caudate-evoked inhibition of neurons in the substantia nigra by picrotoxin,” Brain Res, vol. 32, no. 1, pp. 229–233, Sep. 1971, doi: 10.1016/0006-8993(71)90171-5.
  
  Amendments to the manuscript:
  
  “Indeed, GPi receives the greatest abundance of inhibitory inputs from striatum (direct pathway), but it also receives inhibitory inputs by way of GPe (indirect pathway). Although we can functionally disaggregate these pathway-specific responses based on differences in morphology and dynamics of GPe-mediated versus striatum-mediated inhibitory fEPs [21], the possibility of compounded effects cannot be completely ruled out.”
  
  The mechanism of short- and long-term plasticity as applied in the protocols used in this work are outlined in reference to previous citations [15, 16, 18]. Because this is a central aspect of the current work and interpreting the results, it was difficult to appreciate how these protocols provide distinct metrics of short and long-term plasticity in GPi without some explanation of how it applies to the current work and the specific mechanisms. It would also help to be able to better link how the results fit with the broader conclusions.
  
  Short-term plasticity is measured as the dynamic change to the fEP during ongoing HFS. For long-term plasticity analyses, the fEP amplitudes during LFS were compared pre- versus post-HFS. To make this analysis more intuitive we have added a protocol illustration to Fig 2. We have moreover greatly expanded the discussion to include more literature related to disease-specific differences in plasticity, and implications of modulating plasticity using DBS.
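The long-term plasticity measure described above (fEP amplitudes during LFS compared pre- versus post-HFS) can be illustrated with a minimal sketch. The function name and the choice to average amplitudes within each epoch before taking the percentage change are assumptions made for illustration; the response does not specify how the amplitudes were aggregated.

```python
def long_term_plasticity_percent(pre_hfs_amplitudes, post_hfs_amplitudes):
    """Percentage change in mean fEP amplitude from pre-HFS to post-HFS.

    Assumption: amplitudes within each LFS epoch are averaged first.
    Positive values indicate potentiation, negative values depression.
    """
    pre_mean = sum(pre_hfs_amplitudes) / len(pre_hfs_amplitudes)
    post_mean = sum(post_hfs_amplitudes) / len(post_hfs_amplitudes)
    return 100.0 * (post_mean - pre_mean) / pre_mean


# Hypothetical amplitudes (arbitrary units) for one recording site.
change = long_term_plasticity_percent([1.0, 1.0], [1.5, 1.5])
```

A per-site value of this kind is what a group comparison such as PD versus dystonia would then operate on.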
  
  Amendments to the manuscript:
  
  Added new panel to Fig 2
  
  Author response image 1.
  
  “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long-term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to reconcile with this literature, one potential explanation is that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are withdrawn from antiparkinsonian medications for 12 hr before surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”
  
  In the Conclusion, it was difficult to understand the sentence about microcircuit interaction (line 232) and how it selectively modulates the efficacy of target synapses. Some further explanation here would be helpful. Also, it was not clear how these investigations (line 237) provide cellular-level support for closed-loop targeting. Could the reference to closed-loop targeting also be further explained?
  
  We agree with the reviewer that the current wording may be confusing. We have changed the wording to be clearer. We have additionally added content related to closed-loop DBS based on chronic monitoring of evoked potential responses.
  
  Amendments to the manuscript:
  
  “Furthermore, chronic monitoring of evoked fields may allow for tracking of subcortical neuronal projections as indexed by inhibitory fields reported in this study. microcircuit interaction to selectively modulate the efficacy of target synapses.”
  
  “future applications of DBS may also benefit from closed loop tuning of basal-ganglia-thalamo-cortical circuit dynamics and plasticity through chronic monitoring of evoked potential responses [56].”
  
  How is the burst index calculated (Methods)?
  
  Thank you for pointing out that the burst index definition was missing from the paper. It has now been added to the manuscript.
  
  Amendments to the manuscript:
  
  “The burst index was computed by taking the ratio of the means from a two-component Gaussian mixture model applied to the log interspike interval distribution, a modification of the previous mode-over-mean ISI method [20]”
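As a rough illustration of the burst index definition quoted above, the sketch below fits a two-component Gaussian mixture to log interspike intervals with a small expectation-maximisation loop and returns the ratio of the two component means. Several details are assumptions, not taken from the manuscript: ISIs are in milliseconds, natural logarithms are used, the ratio is taken as the longer-ISI component mean over the shorter-ISI component mean, and the function names are hypothetical.

```python
import math


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM; return (lower mean, upper mean)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Initialise the components from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    pooled_var = max(sum((v - grand_mean) ** 2 for v in xs) / n, 1e-6)
    var = [pooled_var, pooled_var]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every sample.
        resp0 = []
        for v in xs:
            dens = [
                weight[k]
                * math.exp(-((v - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in (0, 1)
            ]
            total = dens[0] + dens[1]
            resp0.append(dens[0] / total if total > 0 else 0.5)
        # M-step: update mean, variance and weight of each component.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            nk = sum(resp)
            if nk < 1e-9:
                continue
            mu[k] = sum(r * v for r, v in zip(resp, xs)) / nk
            var[k] = max(sum(r * (v - mu[k]) ** 2 for r, v in zip(resp, xs)) / nk, 1e-6)
            weight[k] = nk / n
    return min(mu), max(mu)


def burst_index(isis_ms):
    """Ratio of the two mixture means of the log-ISI distribution.

    Assumptions: ISIs are in milliseconds and the short-ISI component has a
    positive log mean (i.e. typical within-burst ISIs exceed 1 ms).
    """
    log_isis = [math.log(isi) for isi in isis_ms if isi > 0]
    short_mean, long_mean = fit_two_gaussians(log_isis)
    return long_mean / short_mean
```

Under these assumptions, a spike train whose ISIs form two well-separated modes (short within-burst intervals and long between-burst intervals) yields a larger index than a train with a single ISI mode.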
  
  Figures and figure captions are missing some details:
  
  Fig. 1 - What does shading represent?
  
  The shading in Fig. 1 illustrates results that were significant before adjustment for multiple comparisons.
  
  Amendments to the manuscript:
  
  “Depicted scatterplots are results that were significant before correction for multiple comparisons”
  
  Fig. 2 - Can the stimulation artifact be labeled so as not to be confused with the physiological signal? Is A representing the average of all patients or just one example? Are there confidence intervals for this data as it's not clear if the curves are significantly different or not (may not be important to show if just one example)? Same for D. What is being plotted in E? Is this the exponential fitted on data? Can this be stated in the figure citation directly so readers don't have to find it in the text, where it may not be directly obvious which figure the analyses are being applied towards?
  
  Thank you for your comments regarding Fig. 2. We have made the following revisions to address the concerns:
  
  To clarify the presence of stimulation artifacts and differentiate them from the physiological signal, we have updated Panel B and E in the updated Fig. 2 which highlight the stimulation artifacts accordingly.
  
  Regarding the comment about Panel A (now B in the updated figure), it represents one single example per disease, rather than an average of all patients.
  
  In response to the comment about what is plotted in Panel E, we have revised the figure caption to explicitly state that it includes the exponential fit on the data.
  
  Amendments to the manuscript:
  
  Figure 2 panel B and E now highlight stimulation artifacts.
  
  Author response image 2.
  
  Author response image 3.
  
  The figure captions could use more details, that can be taken from the text, so that readers can understand figures without searching for relevant details across the paper.
  
  Thank you for your feedback. We have revised the figure captions accordingly to provide more details.
  
  Amendments to the manuscript:
  
  “Fig 1 – GPi spiketrain feature analyses and clinical correlates of PD and dystonia. (A) With respect to (A) rate-based spiketrain features, firing rate was greater in PD while burst index (BI) and coefficient of variation (CV) were greater in dystonia; whereas no differences were found for (B) oscillatory spiketrain features for theta, alpha, low beta, high beta frequencies. MWU statistical results depicted are not corrected for multiple comparisons; after correction using the Bonferroni method, only CV and BI results remain significant (please see Supplementary Table 3). (C) In PD, the power of low beta spiketrain oscillations positively correlated (Spearman correlation) with symptom severity; in dystonia, neuronal firing rate negatively correlated with symptom severity, whereas CV and the power of theta spiketrain oscillations positively correlated with symptom severity. Depicted scatterplots are results that were significant before correction for multiple comparisons; however, none of the results persist after Benjamini-Hochberg correction for false discovery rate (please see Supplementary Table 4).”
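The two multiple-comparison corrections named in this caption can be sketched as follows. This is a generic illustration of the Bonferroni and Benjamini-Hochberg adjustments, not the authors' analysis code; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)
```

A result counts as surviving correction when its adjusted p-value stays below the chosen threshold, which is why findings that are significant before correction can disappear afterwards.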
  
  “Fig 2 – Long-term and short-term effects of HFS on striato-pallidal plasticity in PD and dystonia. (A) Schematic of the plasticity protocol to assess long-term plasticity via fEP amplitude comparisons pre- versus post-HFS and short-term plasticity via fEP dynamics during HFS. (B) Highlights example fEP traces for measuring long-term plasticity pre- versus post-HFS, with (C) displaying group-level fEP amplitudes pre- versus post-HFS across diseases. (D) Illustrates the amount of plasticity (i.e., percentage change in fEP amplitudes pre- versus post-HFS) in both PD and dystonia, with PD showing higher levels of plasticity. (E) Provides an example of fEP traces during HFS for assessing short-term plasticity, with (F) depicting group-level decay rates of fEP amplitudes using an exponential fit on the fEP amplitudes over the first 5 stimulus pulses across diseases. (G) Shows the half-life of the fitted exponential (i.e., rate of attenuation of fEP amplitudes) between PD and dystonia, with PD demonstrating faster fEP attenuation.”
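The short-term plasticity measure in panels F and G (an exponential fit to fEP amplitudes over the first 5 stimulus pulses, summarised by its half-life) can be sketched as below. The fitting procedure is an assumption: here a least-squares line is fitted to log amplitudes against pulse index, which requires strictly positive amplitudes, and the function names are hypothetical.

```python
import math


def fit_exponential_decay(amplitudes):
    """Fit amplitude = a0 * exp(-k * pulse_index) by linear regression on log amplitudes.

    Returns (a0, k). Assumes all amplitudes are strictly positive.
    """
    n = len(amplitudes)
    xs = list(range(n))
    ys = [math.log(a) for a in amplitudes]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def half_life(decay_rate):
    """Number of pulses needed for the fitted amplitude to halve."""
    return math.log(2) / decay_rate


# Hypothetical fEP amplitudes for the first 5 pulses of an HFS train.
a0, k = fit_exponential_decay([2.0, 1.0, 0.5, 0.25, 0.125])
pulses_to_half = half_life(k)
```

A shorter half-life corresponds to faster attenuation of the fEP during HFS.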
  
URL

biorxiv.org/content/10.1101/2023.08.26.554666v2
www.biorxiv.org www.biorxiv.org

Microglia aging in the hippocampus advances through intermediate states that drive activation and cognitive decline

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1:
  
  To gain further insight into the dynamics of microglial aging in the hippocampus, the authors used a bioinformatics method known as "pseudotime" or "trajectory inference" to understand how cells may progress through different functional states, as defined by cellular transcriptome (15,16). These bioinformatics approaches can reveal key patterns in scRNAseq / snRNAseq datasets and, in the present study, the authors conclude that a "stress response" module characterized by expression of TGFb1 represents a key "checkpoint" in microglial aging in midlife, after which the cells can move along distinct transcriptional trajectories as aging progresses. This is an intriguing possibility. However, pseudotime analyses need to be validated via additional bioinformatics as well as follow-up experiments. Indeed, Heumos et al, in their Nature Genetics "Expert Guidelines" Review, emphasize that "inferred trajectories might not necessarily have biological meaning." They recommend that "when the expected topology is unknown, trajectories and downstream hypotheses should be confirmed by multiple trajectory inference methods using different underlying assumptions."(15) Numerous algorithms are available for trajectory inference (e.g. Monocle, PAGA, Slingshot, RaceID/StemID, among many others) and their performance and suitability depends on the individual dataset and nature of the trajectories that are to be inferred. It is recommended to use dynGuidelines(16) for the selection of optimal pseudotime analysis methods. In the present manuscript, the authors do not provide any justification for their use of Monocle 3 over other trajectory inference approaches, nor do they employ a secondary trajectory inference method to confirm observations made with Monocle 3. Finally, follow-up validation experiments that the authors carry out have their own limitations and caveats (see below). 
Hence, while the microglial aging trajectories identified by this study are intriguing, they remain hypothetical trajectories that need to be proven with additional follow-up experiments.
  
  We thank the reviewer for their suggestion. Following the dynGuidelines recommended by the reviewer, we selected an additional trajectory inference tool, SCORPIUS, based on the structure of our data. This tool provided additional support that microglia progress from a homeostatic state (Cx3cr1, Mef2c) to the induction of stress genes (Hspa1, Atf3) at an intermediate point during aging progression. Furthermore, we observe a concordant increase in ribosomal protein genes at a time point in the pseudotime analysis immediately prior to activation of inflammation-related genes (Il1b, Cst7). These additional analyses support the main findings of our original pseudotime analysis and have been added to the manuscript as Figure S3C,D. Additionally, in the statistical test that uncovers differentially expressed genes along the pseudotime trajectory in this analysis, we find that Tgfb1 is one of the genes that is differentially expressed, with peak expression at an intermediate timepoint along the pseudotime trajectory. Furthermore, we have done some preliminary trajectory analysis with Slingshot (Street et al, BMC Genomics, PMID: 29914354) that found a similar trajectory with analogous gene expression patterns and dynamic expression of Tgfb1.
  
  To follow up on the idea that TGFb1 signaling in microglia plays a key role in determining microglial aging trajectories, the authors use RNAscope to show that TGFb1 levels in microglia peak in middle age. They also treat primary LPS-activated microglia with TGFb1 and show that this restores expression of microglial homeostatic gene expression and dampens expression of stress response and, potentially, inflammatory genes. Finally, they utilize transgenic approaches to delete TGFb1 from microglia around 8-10mo of age and scRNAseq to show that homeostatic signatures are lost and inflammatory signatures are gained. Hence, findings in this study support the idea that TGFb1 can strongly regulate microglial phenotype. Loss of TGFb1 signaling to microglia in adulthood has already been shown to cause decreased microglial morphological complexity and upregulation of genes typically associated with microglial responses to CNS insults(17-19). TGFb1 signaling to microglia has also been implicated in microglial responses to disease and manipulations to increase this signaling can improve disease progression in some cases(19). In this light, the findings in the present study are largely confirmatory of previous findings in the literature. They also fall short of unequivocally demonstrating that TGFb1 signaling acts as a "checkpoint" for determining subsequent microglial aging trajectory. To show this clearly, one would need to perturb TGFb1 signaling around 12mo of age and carry out sequencing (bulkRNAseq or scRNAseq) of microglia at 18mo and 24mo. Such experiments could directly demonstrate whether the whole microglial population has been diverted to the TGFb1-low aging trajectory (that progresses through a translational burst state to an inflammation state as proposed). 
Future development of tools to tag TGFb1 high or low microglia could also enable fate tracing type experiments to directly show whether the TGFb1 state in middle age predicts cell state at later phases of aging.
  
  We apologize for the use of the term “checkpoint” when referring to the role of Tgfb1 in microglial aging. Instead, our model posits that Tgfb1 expression increases in response to the early insults of the aging process in an attempt to return microglia to homeostasis. Therefore, this would predict that increasing TGFB1 levels after an insult would decrease activation and age-related progression of microglia, which we demonstrate in vitro (Figure 3). Alternatively, the loss of TGFB1 should prevent microglia from returning to a homeostatic state after an age-related stressor, and thus increase the number of microglia in activated states. We observe this increase in activated microglia in our middle-aged microglia-specific Tgfb1 knockout mouse model. Furthermore, the haploinsufficiency of Tgfb1 at this age indicates that TGFB1 signaling in microglia is sensitive to relative levels of Tgfb1. The transient increase in Tgfb1 expression further suggests that the threshold for TGFB1 signaling is dynamic. Finally, RNA-Seq analysis of both in vitro TGFB1 supplemented microglia and in vivo Tgfb1 depleted microglia highlight that TGFB1 alters the aging microglia transcriptome. Combined, these results provide evidence that Tgfb1 modulates advancement of microglia through an aging continuum.
  
  The present study would also like to draw links between features of microglial aging in the hippocampus and a decline in hippocampal-dependent cognition during aging. To this end, they carry out behavioral testing in 8-10mo old mice that have undergone microglial-specific TGFb1 deletion and find deficits in novel object recognition and contextual fear conditioning. While this provides compelling evidence that TGFb1 signaling in microglia can impact hippocampus-dependent cognition in midlife, it does not demonstrate that this signaling accelerates or modulates cognitive decline (see below). Age-associated cognitive decline refers to cognitive deficits that emerge as a result of the normative brain aging process (20-21). For a cognitive deficit to be considered age-associated cognitive decline, it must be shown that the cognitive operation under study was intact at some point earlier in the adult lifespan. This requires longitudinal study designs that determine whether a manipulation impacts the relationship between brain status and cognition as animals age (22-24). Alternatively, cross-sectional studies with adequate sample sizes can be used to sample the variability in cognitive outcomes at different points of the adult lifespan (22-24) and show that this is altered by a particular manipulation. For this specific study, one would ideally demonstrate that hippocampal-based learning/memory was intact at some point in the lifespan of mice with microglial TGFb1 KO but that this manipulation accelerated or exacerbated the emergence of deficits in hippocampal-dependent learning/memory during aging. In the absence of these types of data, the authors should tone down their claims that they have identified a cellular and molecular mechanism that contributes to cognitive decline.
  
  We agree with the reviewer that to adequately demonstrate an age-dependent effect of microglia-derived TGFB1 on cognition it is necessary to perturb microglial TGFB1 at young and mature ages and assess the age-dependent effect on cognition. To address this, we have now performed a complementary behavioral study utilizing the Tmem119-CreER mouse model to drive the microglia-specific excision of Tgfb1 in two separate cohorts of mice, one young (2-3 months) and one mature (7-8 months), followed by cognitive testing. Using the novel object recognition test, we find that young mice of all genotypes (WT, Tgfb1 Het and Tgfb1 cKO) retain the ability to recognize the novel object (as determined by having a significant preference in exploring the novel object). In contrast, only the WT mature mice demonstrate a preference for the novel object, while the Tgfb1 Het and Tgfb1 cKO mice show no preference for the novel object. These behavioral data demonstrate an age-dependent necessity for microglia-specific TGFB1 in maintaining proper hippocampal-dependent memory and are now included in the manuscript as revised Figure 4I-J. We have also included additional behavioral tests (Y-Maze and open field) that did not show any difference between the genotypes as Figure S6D-G. Unfortunately, we were unable to perform the fear conditioning testing, as our apparatus broke during this time. Together, these results reveal that there is an age-dependent necessity for microglia-derived TGFB1 for hippocampal-dependent cognitive function.
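One common way to quantify a "preference in exploring the novel object" is sketched below. This is an illustrative assumption, not the authors' stated analysis: the preference index is taken as the fraction of total exploration time spent on the novel object, and a group is compared against chance (0.5) with a one-sample t statistic. All function names and values are hypothetical.

```python
import math
from statistics import mean, stdev


def preference_index(novel_time_s, familiar_time_s):
    """Fraction of object exploration time spent on the novel object (0.5 = chance)."""
    return novel_time_s / (novel_time_s + familiar_time_s)


def one_sample_t(values, chance=0.5):
    """One-sample t statistic of the group mean against the chance level."""
    n = len(values)
    return (mean(values) - chance) / (stdev(values) / math.sqrt(n))


# Hypothetical exploration times (seconds) for three mice of one genotype.
indices = [preference_index(n, f) for n, f in [(30, 20), (35, 15), (40, 10)]]
t_stat = one_sample_t(indices)
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom to decide whether the group shows a significant preference.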
  
  A final point of clarification for the reader pertains to the mining of previously generated data sets within this study. The language in the results section, methods, and figure legends causes confusion about which experiments were actually carried out in this study versus previous studies. Some of the language makes it sound as though parabiosis experiments and experiments using mouse models of Alzheimer's Disease were carried out in this study. However, parabiosis and AD mouse model experiments were executed in previous studies (25,26), and in the present study, RNAseq datasets were accessed for targeted data mining. It is fantastic to see further mining of datasets that already exist in the field. However, descriptions in the results and methods sections need to make it crystal clear that this is what was done.
  
  The reviewer makes an excellent point. While we referenced the public dataset in the original manuscript, the citation style of superscripted numbers diminishes our ability to adequately reference the datasets. Therefore, we have added the names of the first authors (Palovics for the parabiosis dataset and Sala Frigerio for the Alzheimer’s Disease dataset) to all the instances in the results and figure legends when we refer to these datasets.
  
  Additional recommendations:
  
  Major comments.
  
  (1) There is some ambiguity surrounding how to interpret the microglial TGFb1 knockout that seems incompatible with viewing this molecule as a "checkpoint" in microglial aging. TGFb1 is believed to be primarily produced by microglia. Secreted TGFb1 is then detected by microglial TGFbR2. Are the microglia that have high levels of TGFb1 in middle age signaling to themselves (autocrine signaling)? Or contributing to a local milieu that impacts multiple neighbor microglia (paracrine signaling)? The authors could presumably look in their own dataset to evaluate microglial capacity to detect TGFb1 via its receptors.
  
  We thank the reviewer for this insightful suggestion. We have undertaken analysis of our dataset to assess whether Tgfb1 acts through autocrine or paracrine signaling. To do so, we reanalyzed our microglia aging scRNA-Seq dataset leveraging the variation in microglia Tgfb1 expression to probe the relative activity of TGFB1. Specifically, we partitioned microglia into quartiles based on their Tgfb1 expression, and subsequently investigated the expression of TGFB signaling effectors and targets. High expression of downstream TGFB signaling pathway components in microglia with high Tgfb1 expression would point to autocrine mechanisms while, alternatively, high expression of downstream TGFB signaling pathway components in microglia with low Tgfb1 expression would point to paracrine mechanisms. We observed highest expression of TGFB signaling pathway components and targets in microglia with the highest expression of Tgfb1. These data suggest that Tgfb1 acts through an autocrine mechanism. These results have been added to our manuscript as Figure S4E-G. Additionally, while our manuscript was under review, a paper by Bedolla et al (Nature Communications 2024; PMID: 38906887) was published that investigated the role of Tgfb1 in adult microglia. This paper utilized orthogonal techniques – sparse microglia-specific Tgfb1 knockout and IHC - to also suggest that microglia utilize autocrine Tgfb1 signaling. Together, these complementary data provide strong evidence that Tgfb1 acts through an autocrine mechanism in adult microglia.
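The quartile analysis described above can be sketched as follows: cells are ranked by Tgfb1 expression, split into four equal-sized bins, and the mean of a downstream TGFB signaling score is compared across bins. The rank-based binning, the per-cell pathway score, and all names are assumptions for illustration; the response does not state how ties (for example, many cells with zero counts) were handled.

```python
from statistics import mean


def pathway_score_by_tgfb1_quartile(tgfb1_expression, pathway_score):
    """Mean pathway score within each Tgfb1-expression quartile, lowest quartile first.

    Assumption: quartiles are formed by rank, so tied cells are split arbitrarily.
    Requires at least four cells.
    """
    order = sorted(range(len(tgfb1_expression)), key=lambda i: tgfb1_expression[i])
    n = len(order)
    means = []
    for q in range(4):
        cells = order[q * n // 4:(q + 1) * n // 4]
        means.append(mean(pathway_score[i] for i in cells))
    return means


# Hypothetical per-cell values: Tgfb1 expression and a TGFB target-gene score.
tgfb1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
score = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
by_quartile = pathway_score_by_tgfb1_quartile(tgfb1, score)
```

In this framing, a score that rises from the lowest to the highest Tgfb1 quartile is the pattern the response interprets as pointing to autocrine signaling, whereas a higher score in low-Tgfb1 cells would point to paracrine signaling.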
  
  (2) Conclusions of the study rest on the assumption that microglial inflammatory responses are a central driver of cognitive decline. They assume that manipulations that increase microglial progression into an inflammatory state will negatively impact cognitive function. Although there are certainly a lot of data in the field that inflammatory factors can impact synaptic function, additional experiments would be required to unequivocally demonstrate that a "TGFb1 dependent" progression of microglia to an inflammatory state underlies any observed changes in cognition. For example, in the context of microglial TGFb1 deletion, can NSAIDs or blockers of soluble TNFa (e.g. XENP345), or blockers of SPP1, etc. rescue behavior? Can microglial depletion in this context rescue behavior? Assuming behavior was carried out in the same microglial TGFb1 KO mice that were used for microglial scRNAseq, they could also carry out linear regression-type analyses to link microglial inflammatory status to the behavioral performance of individual mice. In the absence of additional evidence of this sort, the authors should tone down claims about mechanistic relationships between microglial state and cognitive performance.
  
  We thank the reviewer for realizing that the link between cognition and inflammation in our paper is speculative. Therefore, we have taken the reviewer’s advice and toned down the claims linking inflammation to cognition in our manuscript. Instead, we connect the disruption in cognition to what is observed in our data, a loss of microglia homeostasis and a shift in the microglia aging trajectories.
  
  Additional Recommendations:
  
  Minor comments:
  
  (1) Ideally at some point in the results or discussion, the authors should acknowledge that the hippocampus has highly distinct sub-regions and that microglia show different functions and properties across these sub-regions (e.g. microglia in hilus and subgranular zone vs microglia in stratum radiatum, vs microglia immediately adjacent to or embedded within stratum pyramidale). Do expression levels of TGFb1 and microglial aging trajectories vary across sub-regions? To what extent can this account for heterogeneity of aging trajectories observed in microglial aging within the hippocampus?
  
  We are interested in how microglia heterogeneity during aging is influenced by the specific functions, and thus microenvironments within the hippocampus. Therefore, we have expanded our IHC analysis of microglia to determine how the microenvironment influences microglia phenotypes by looking at several different regions of the hippocampus. We have included this regional analysis as Figure S2 in the manuscript. This analysis has revealed region-specific effects on microglia activation during aging.
  
  (2) For immunohistochemistry data, it is not particularly convincing to see one example of one cell from each condition. Generally, an accepted approach in the field is to present lower magnification images accompanied by zoom panels for several cells from each field of view. This reassures the reader that specific cells haven't simply been "cherry-picked" to support a particular conclusion.
  
  To allay the concerns of the reviewer that cells haven’t been “cherry-picked”, we have provided low magnification images for the aging CD68 and NFκB stains in Supplemental Figure S2.
  
  (3) In immunohistochemistry data, have measures been taken to ensure that observed signals are not simply autofluorescence that becomes prominent in tissues with aging? (i.e. use of trueblack or photoquenching of tissue prior to staining) See PMID 37923732
  
  We agree that autofluorescence, at least partially due to the accumulation of lipofuscin, becomes prominent in certain regions and cells of the hippocampus during aging. This most prominently occurs in the microglia of the hilus. This autofluorescence has a particular subcellular distribution, as it is localized to lyso-endosomal bodies. The microglia activation marker CD68 is also localized to lysosomes. A previous publication by Burns et al (eLife; PMID: 32579115) identified autofluorescent microglia (AF+) with unique molecular profiles that accumulate with age. They posited that these AF+ microglia resembled other microglia subsets that have pronounced storage compartments, such as the pro-inflammatory lipid droplet-containing microglia that accumulate with age reported by Marschallinger et al (Nature; PMID: 31959936). As such, autofluorescence present in microglia potentially represents distinctive and functional states of microglia. Our CD68 immunostaining accumulates with age, which could overlap with autofluorescent storage bodies. Thus, we performed a complementary CD68 immunostaining in an independent cohort of young (3 months) and aged (24 months) mice with the autofluorescence quencher TrueBlack, and found that the staining pattern and accumulation of CD68 in microglia with age persisted as previously observed after use of this quencher (see Author response image 1). Images are IBA1 (cyan) and CD68 (yellow) with the molecular layer (ML), granule cell (GC), and hilus illustrated and corresponding quantification provided (Two-way ANOVA with Sidak’s multiple comparisons test; ***P<0.001; ****P<0.0001).
  
  We would like to note that the subcellular localization of the other immunostainings included in the manuscript was distinct from CD68, and not likely to be associated with the autofluorescent storage bodies. Additionally, our RNAScope staining for Tgfb1 did not show an accumulation with age, but rather a transient increase at 12 months of age, which indicates that the interpretation of the RNAScope stain for Tgfb1 was not unduly influenced by autofluorescence.
  
  Author response image 1.
  
  (4) Ideally, more care is needed with the language used to describe microglial state during aging. The terms "dystrophic," "dysfunctional," and "inflammatory" all carry their own implications and assumptions. Many changes exhibited by microglia during aging can initially be adaptive or protective, particularly during middle age. Without additional experiments to show that specific microglial attributes during aging are actively detrimental to the tissue and additional experiments to show that microglia have ceased to be capable of engaging in many of their normal actions to support tissue homeostasis, the authors should exercise caution in using terms like dysfunctional.
  
  We appreciate the reviewer’s suggestion. To address the concern about the multiple implications of terms such as “dysfunctional” and “inflammatory”, we have replaced them throughout the text with more specific terms wherever possible.
  
  Reviewer #2:
  
  That said, given what we recently learned about microglia isolation for RNA-seq analysis, there is a danger that some of the observations result not from age but from cell stress during sample preparation (enzymatic digestion 10min at 37C; e.g. PMID: 35260865). Changes in cell state distribution along aging were made based on scRNA-seq and were not corroborated by any other method, such as imaging of cluster-specific marker expression in microglia at different ages. This analysis would allow confirming the scRNA-seq data and would also give us an idea of where the subsets are present within the hippocampus, and whether there is any interesting distribution of cell states (e.g. some are present closer to stem cells?). Since TGFb is thought to be crucial to microglia biology, it would be valuable to include more analysis of the mice with microglia-specific Tgfb deletion, e.g. what was the efficiency of recombination in microglia? Did their numbers change after induction of Tgfb deletion in Cx3cr1-creERT2::Tgfb-flox mice?
  
  We thank the reviewer for their comment regarding potential ex vivo transcriptional alterations with the approaches used in our study. We performed our aging microglia scRNA-Seq characterization prior to the release of Marsh et al (Nature Neuroscience; PMID: 35260865), which revealed the potential transcriptional artefacts induced by isolation. That being said, we took great care to minimize the amount of time samples were subjected to enzymatic digestion (15 minutes) and kept cells at 4C during the remainder of the isolation. Furthermore, we performed all isolations simultaneously, so that transcriptional changes induced by the isolation would be present across all ages and should not be observed during our analysis unless indicative of a true age-related change. Additionally, we have corroborated changes in cell state distribution across ages using several markers (Tgfb1 and KLF2 for the intermediate stress state, S6 for the translation state, and NFKB and CD68 for activation states). In the revised manuscript, we have added additional hippocampal subregion analysis of several IHC immunostains to provide spatial insights into the microglia aging process (Figure S2). This analysis reveals unique spatial dynamics of microglia aging. For example, as the reviewer foresaw, we found that the granule cell layer (the location of adult hippocampal neurogenesis) had a more pronounced age-associated progression of microglial activation than several other regions. A subset of regions had minimal levels of activation during aging, such as the molecular layer and the stratum radiatum of the CA1 (inner CA1 in the manuscript), regions enriched in synaptic terminals. Furthermore, this analysis highlights the susceptibility of microglia aging to microenvironmental influences.
  
  Regarding the temporally controlled microglia-specific genetic KO mouse model used in our original submission, the Cx3cr1-CreER allele selected (B6.129P2(Cg)-Cx3cr1tm2.1(cre/ERT2)Litt/WganJ) has been reported to have very high recombination efficiency (~94% in Parkhurst et al (Cell; PMID: 24360280)), and we used a tamoxifen induction protocol very similar to Faust et al. (Cell Reports; PMID: 37635351) that achieved ~98% recombination (they injected 100mg/kg for 5 days, while we injected 90mg/kg for 5 days). We analyzed our scRNA-Seq data for the expression of Tgfb1 and found that the knockout mice had a 67% reduction in cells expressing higher levels of Tgfb1 (see panel A in Author response image 2). This is likely a large underestimate of the recombination efficiency, as exon 3 is floxed and residual nonfunctional transcripts could be present, given nonsense-mediated decay is not realized in a number of knockout lines (Lindner et al, Methods, PMID: 33838271). We likely achieved a much higher excision efficiency. We would like to highlight that our data indicating increased microglia activation after tamoxifen treatment (Figure S5A) and the involvement of autonomous signaling (Figure S4E-G) are consistent with recently published work by Bedolla et al. (Nature Communications; PMID: 38906887). Additionally, as part of the revision process, we have now corroborated our behavioral data using an independent temporally controlled microglia-specific KO mouse model: Tmem119-CreER::Tgfb1 knockout mice (Figure 4I-K). We performed qPCR on sorted microglia to determine RNA levels in wildtype and knockout mice. Relative levels of Tgfb1 and exon 3 of Tgfb1 (the floxed exon) on technical replicates of 3 pooled samples indicated overall loss of Tgfb1 expression, as well as undetectable levels of exon 3 as normalized to Actb (see panel B in Author response image 2).
  
  Author response image 2.
  
  With respect to the effects of aging and Tgfb1 on microglia density, we find a slight region-specific increase in microglia density with age (see Author response image 3). The density of Iba1 cells across hippocampal regions was analyzed at 3 and 24 months of age (see panel A in Author response image 3) and along an aging continuum at 3, 6, 12, 18, and 24 months (see panel B in Author response image 3). These data are also included in the revised manuscript (Figure S2D-F).
  
  Author response image 3.
  
  Deletion of Tgfb1 also had region-specific effects on microglia. While there was no difference in microglia density between wildtype and heterozygous microglia, there was a significant increase in microglia density in the hilus and molecular layers in knockout mice (see Author response image 4; these data are also included in the revised manuscript as Figure S5A). These data indicate that there are subtle region-specific increases in microglia density with age, as well as following the deletion of Tgfb1 from microglia of mature mice.
  
  Author response image 4.
  
  Additional Recommendations:
  
  (1) The problem of possible digestion artifacts in scRNA-seq should be at least addressed in the discussion as a caveat in data interpretation. Staining for unique cluster markers in undigested tissue would solve the problem. It can be done with microscopy or using flow cytometry, but for this microglia, isolation should be done with no enzymes or with Actinomycin (PMID: 35260865).
  
  The ex vivo activation signature uncovered by Marsh et al. (Nature Neuroscience; PMID: 35260865) arises from the digestion methods used to isolate microglia. We took the utmost care in processing our microglia identically within experiments, which should minimize the amount of uneven ex vivo activation of microglia. This is borne out by the structure of our single-cell sequencing data. Unlike Marsh et al., who observed unique clusters after addition of their inhibitors, we do not see any clusters unique to a single condition, suggesting that any influence of ex vivo activation was evenly distributed.
  
  Importantly, as suggested by the reviewer, we have complemented our scRNA-Seq analysis by corroborating several markers for various stages of microglia aging progression using RNAScope and IHC in intact tissue. Specifically, the transient age-dependent increase in Tgfb1 high microglia was confirmed using RNAScope (Figure 3B), the age-related increase in ribosomal high microglia was confirmed using S6 immunostaining (Figure 3I), and the increase of various markers of age-associated activation (C1q, CD68 and NFkB) was confirmed using immunostaining (Figure 1F and Figure S2D-I). Additionally, we have also performed immunostainings for KLF2 and confirmed peak microglia expression at 18 months of age with lower levels at 24 months of age (Figure 2H).
  
  (2) The figures of GO and violin plots are not easy to follow sometimes... what are the data points in the violin plots, maybe worth showing them as points? For the GO, e.g. in 3D, 3J, including a short description of the figure could help, e.g. in Figure 1. it was clear.
  
  We chose not to include the datapoints in the violin plots for aesthetic purposes. Each violin plot would have had hundreds of points that would have made the plots very busy and hidden the structure of the distribution. In Author response image 5 we show the violin plot in Figure 2M with (panel A) and without (panel B) individual points. In a small format, the points overlap and become jumbled together. Therefore, we chose to present the violin plots without points for clarity on the data structure. As for the gene ontology plots in Figure 3, we have updated the descriptions in both the text and figure legends to provide clarification on what they represent.
  
  Author response image 5.
  
  (3) I'm very curious to see the mechanism of action of "aged" microglia in the TGFb-depletion model. Is it creating hostile conditions for stem cells, or we have increased synapse loss? Something else?
  
  We thank the reviewer for their insightful questions. We would like to note that during the revision process of our manuscript, a complementary study was published reporting that the loss of microglia-derived Tgfb1 leads to an aberrant increase in the density of dendritic spines in the CA1 region of the hippocampus (Bedolla et al, Nature Communications, PMID: 38906887). The data from Bedolla et al. show neurons in the CA1 sparsely labeled with an mGreenLantern-expressing virus in mice that had Tgfb1 deleted from microglia using the Cx3cr1-CreERT driver (Figure 7U,V). Additionally, McNamara et al (Nature; PMID: 36517604) demonstrated that microglia-derived Tgfb1 signaling regulates myelin integrity during development and several studies have revealed links between Tgfb1 signaling and altered neurogenesis (e.g., He et al, Nature, PMID: 24859199 and Dias et al, Neuron, PMID: 25467979). Together, this growing body of work indicates that microglia-derived TGFB1 regulates myelination, neurogenesis and synaptic plasticity, which have all been shown to play a role in cognition.
  
URL

biorxiv.org/content/10.1101/2024.04.09.588665v2
www.biorxiv.org

New submission 13/10/2023, 09:13:56

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This valuable study introduces an innovative method for measuring interocular suppression depth, which implicates mechanisms underlying subconscious visual processing. The evidence supporting the effectiveness of this method would be solid after successfully addressing concerns raised by the reviewers. The novel method will be of interest not only to cognitive psychologists and neuroscientists who study sensation and perception but also to philosophers who work on theories of consciousness.
  
  Thank you for the recognition and appreciation of our work.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Strengths:
  
  The authors introduced a new adapted paradigm from continuous flash suppression (CFS). The new CFS tracking paradigm (tCFS) allowed them to measure suppression depth in addition to breakthrough thresholds. This innovative approach provides a more comprehensive understanding of the mechanisms underlying continuous flash suppression. The observed uniform suppression depth across target types (e.g., faces and gratings) is novel and has new implications for how the visual system works. The experimental manipulation of the target contrast change rate, as well as the modeling, provided strong support for an early interocular suppression mechanism. The authors argue that the breakthrough threshold alone is not sufficient to infer about unconscious processing.
  
  Weaknesses:
  
  A major finding in the current study is the null effect of the image categories on the suppression depth measured in the tCFS paradigm, from which the authors infer an early interocular mechanism underlying CFS suppression. This is not strictly logical as an inference based on the null effect. The authors may consider statistical evaluation of the null results, such as equivalence tests or Bayesian estimation.
  
  We have now included a Bayesian model comparison (implemented in JASP), to assess the strength of evidence in favour of the alternative hypothesis (or null effect). For example in Experiment 1 (comparing discrete to tCFS), we found inconsistent evidence in favour of the null effect of image-category on suppression depth:
  
  Lines 382 – 388: “We quantified the evidence for this null-effect on suppression depth with a subsequent Bayesian model comparison. A Bayesian repeated-measures ANOVA (2 x 2; procedure x image type on suppression depth) found that the best model to explain suppression depth included the main effect of procedure (BF10 = 3231.74), and weak evidence/data insensitivity for image type (BF10 = 0.37). This indicates that the data was insensitive as to whether image-type was better at predicting suppression depth than the null model.”
  
  In Experiment 2, which was specifically designed to investigate the effect of image category on suppression depth, we found strong evidence in favour of the null:
  
  Lines 429 – 431: “A Bayesian repeated-measures ANOVA (1 x 5, effect of image categories on suppression depth), confirmed strong evidence in favour of the null hypothesis (BF01 = 20.30).”
  
  In Experiment 3, we also had image categories, but the effect of rate of contrast change was our main focus. For completeness, we have also included the Bayes factors for image-category in Experiment 3 in our text.
  
  Lines 487 – 490: “This null-effect of image-type was again confirmed with a Bayesian model comparison (3 speed x 4 image categories on suppression depth), demonstrating moderate support for the null effect of image category (BF01 = 4.06).”
  
  We have updated our Methods accordingly with a description of this procedure:
  
  Lines 297-305: “We performed Bayesian model comparison to quantify evidence for and against the null in JASP, using Bayesian repeated measures ANOVAs (uninformed prior with equal weight to all models). We report Bayes factors (B) for main effects of interest (e.g. effect of image type on suppression depth), as evidence in favour compared to the null model (BF10= B). Following the guidelines recommended in (Dienes 2021), B values greater than 3 indicate moderate evidence for H1 over H0, and B values less than 1/3 indicate moderate evidence in favour of the null. B values residing between 1/3 and 3 are interpreted as weak evidence, or an insensitivity of the data to distinguish between the null and alternative models.”
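
The interpretation rule quoted above can be written down as a short sketch. This is only an illustration of the stated thresholds (3 and 1/3), not the JASP analysis itself; the function names `interpret_bf10` and `bf01_to_bf10` are hypothetical, and the input values are the Bayes factors reported in this response. Because the quoted guideline names only the "moderate" boundary, the labels below say "at least moderate" and do not attempt to grade stronger evidence.

```python
def bf01_to_bf10(bf01: float) -> float:
    """Convert a Bayes factor for H0 over H1 into one for H1 over H0."""
    if bf01 <= 0:
        raise ValueError("Bayes factors must be positive")
    return 1.0 / bf01


def interpret_bf10(bf10: float) -> str:
    """Label a BF10 value using the 3 and 1/3 cut-offs quoted in the Methods text."""
    if bf10 <= 0:
        raise ValueError("Bayes factors must be positive")
    if bf10 > 3:
        return "at least moderate evidence for H1"
    if bf10 < 1 / 3:
        return "at least moderate evidence for H0"
    return "weak evidence / data insensitive"


# Bayes factors reported in the response, all expressed as BF10.
reported = {
    "Exp 1, procedure": 3231.74,
    "Exp 1, image type": 0.37,
    "Exp 2, image category": bf01_to_bf10(20.30),
    "Exp 3, image category": bf01_to_bf10(4.06),
}

for label, bf10 in reported.items():
    print(f"{label}: BF10 = {bf10:.3f} -> {interpret_bf10(bf10)}")
```

Expressing every value as BF10 before applying the cut-offs avoids mixing the BF10 and BF01 conventions that appear side by side in the quoted passages.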
  
  More importantly, since limited types of image categories have been tested, there may be some exceptional cases. According to "Twofold advantages of face processing with or without visual awareness" by Zhou et al. (2021), pareidolia faces (face-like non-face objects) are likely to be an exceptional case. They measured bidirectional binocular rivalry in a blocked design, similar to the discrete condition used in the current study. They reported that the face-like non-face object could enter visual awareness in a similar fashion to genuine faces but remain in awareness in a similar fashion to common non-face objects. We could infer from their results that: when compared to genuine faces, the pareidolia faces would have a similar breakthrough threshold but a higher suppression threshold; when compared to common objects, the pareidolia faces would have a similar suppression threshold but a low breakthrough threshold. In this case, the difference between these two thresholds for pareidolia faces would be larger than either for genuine faces or common objects. Thus, it would be important for the authors to discuss the boundary between the findings and the inferences.
  
  This is correct. We acknowledge that our sampling of image-categories is limited, and have added a treatment of this limitation in our discussion. We have expanded on the particular case of Zhou et al (2021), and the possibility of the asymmetries suggested:
  
  Lines 669 – 691: “As a reminder, we explicitly tested image types that in other studies have shown differential susceptibility to CFS attributed to some form of expedited unconscious processing. Nevertheless, one could argue that our failure to obtain evidence for category specific suppression depth is based on the limited range of image categories sampled in this study. We agree it would be informative to broaden the range of image types tested using tCFS to include images varying in familiarity, congruence and affect. We can also foresee value in deploying tCFS to compare bCFS and reCFS thresholds for visual targets comprising physically meaningless ‘tokens’ whose global configurations can synthesise recognizable perceptual impressions. To give a few examples, dynamic configurations of small dots varying in location over time can create the compelling impression of rotational motion of a rigid, 3D object (structure from motion) or of a human engaged in given activity (biological motion) (Grossmann & Dobbins, 2006; Watson et al., 2004). These kinds of visual stimuli are associated with neural processing in higher-tier visual areas of the human brain, including the superior occipital lateral region (e.g., Vanduffel et al., 2002) and the posterior portion of the superior temporal sulcus (e.g., Grossman et al., 2000). These kinds of perceptually meaningful impressions of objects from rudimentary stimulus tokens are capable of engaging binocular rivalry. Such stimuli would be particularly useful in assessing high-level processing in CFS because they can be easily manipulated using phase-scrambling to remove the global percept without altering low-level stimulus properties. In a similar vein, small geometric shapes can be configured so as to resemble human or human-like faces, such as those used by (Zhou et al., 2021)[1]. These kinds of faux faces could be used in concert with tCFS to compare suppression depth with that associated with actual faces.
  
  [1] Zhou et al. (2021) derived dominance and suppression durations with fixed-contrast images. In their study, genuine face images and faux faces remained suppressed for equivalent durations whereas genuine faces remained dominant significantly longer than did faux faces. The technique used by those investigators - interocular flash suppression (Wolfe, 1994) - is quite different from CFS in that it involves abrupt, asynchronous presentation of dissimilar stimuli to the two eyes. It would be informative to repeat their experiment using the tCFS procedure.
  
  Reviewer #2 (Public Review):
  
  Summary
  
  The paper introduces a valuable method, tCFS, for measuring suppression depth in continuous flash suppression (CFS) experiments. tCFS uses a continuous-trial design instead of the discrete trials standard in the literature, resulting in faster, better controlled, and lower-variance estimates. The authors measured suppression depth during CFS for the first time and found similar suppression depths for different image categories. This finding provides an interesting contrast to previous results that breakthrough thresholds differ for different image categories and refine inferences of subconscious processing based solely on breakthrough thresholds. However, the paper overreaches by claiming breakthrough thresholds are insufficient for drawing certain conclusions about subconscious processing.
  
  We agree that breakthrough thresholds can provide useful information to draw conclusions about unconscious processing – as our procedure is predicated on breakthrough thresholds. Our key point is that breakthrough provides only half of the needed information.
  
  We have amended our manuscript thoroughly (detailed below) to accommodate this nuance and avoid this overreaching claim.
  
  Strengths
  
  (1) The tCFS method, by using a continuous-trial design, quickly estimates breakthrough and re-suppression thresholds. Continuous trials better control for slowly varying factors such as adaptation and attention. Indeed, tCFS produces estimates with lower across-subject variance than the standard discrete-trial method (Fig. 2). The tCFS method is straightforward to adopt in future research on CFS and binocular rivalry.
  
  (2) The CFS literature has lacked re-suppression threshold measurements. By measuring both breakthrough and re-suppression thresholds, this work calculated suppression depth (i.e., the difference between the two thresholds), which warrants different interpretations from the breakthrough threshold alone.
  
  (3) The work found that different image categories show similar suppression depths, suggesting some aspects of CFS are not category-specific. This result enriches previous findings that breakthrough thresholds vary with image categories. Re-suppression thresholds vary symmetrically, such that their differences are constant.
  
  Thank you for this positive and succinct summary of our contribution. We have adopted your 3rd point “... suggesting that some aspects...” in our revised manuscript to more appropriately treat the ways that bCFS and reCFS thresholds may interact with suppression depths. For example:
  
  Lines 850 – 852: “These [low level] factors could be parametrically varied to examine specifically whether they modulate bCFS thresholds alone, or whether they also cause a change in suppression depth by asymmetrically affecting reCFS thresholds”.
  
  Weaknesses
  
  (1) The results and arguments in the paper do not support the claim that 'variations in breakthrough thresholds alone are insufficient for inferring unconscious or preferential processing of given image categories,' to take one example phrasing from the abstract. The same leap in reasoning recurs on lines 28, 39, 125, 566, 666, 686, 759, etc.
  
  We have thoroughly updated our manuscript with respect to mentions of preferential processing, to avoid this leap in reasoning throughout. For example, this phrase in the abstract now reads:
  
  Lines 27-30: “More fundamentally, it shows that variations in bCFS thresholds alone are insufficient for inferring whether the barrier to achieving awareness exerted by interocular suppression is weaker for some categories of visual stimuli compared to others”.
  
  Take, for example, the arguments on lines 81-83. Grant that images are inequivalent, and this explains different breakthrough times. This is still no argument against differential subconscious processing. Why are images non-equivalent? Whatever the answer, does it qualify as 'residual processing outside of awareness'? Even detecting salience requires some processing. The authors appear to argue otherwise on lines 694-696, for example, by invoking the concept of effective contrasts, but why is effective contrast incompatible with partial processing? Again, does detecting (effective) contrast not involve some processing? The phrases 'residual processing outside of awareness' and 'unconscious processing' are broad enough to encompass bottom-up salience and effective contrast. Salience and (effective) contrast are arguably uninteresting, but that is a different discussion. The authors contrast 'image categories' or semantics with 'low-level factors.' In my opinion, this is a clearer contrast worth emphasizing more. However, semantic processing is not equal to subconscious processing writ large.
  
  We are in agreement with your analysis that differential subconscious processing may contribute to differences between images, and have updated our manuscript to clarify this possibility. In particular, we have now included a section in our Discussion which offers a suggestion for future research, linking sensitivity to different low-level image features with differences in gain of the respective contrast-response functions.
  
  From Lines 692 – 722: “Next we turn to another question raised about our conclusion concerning invariant depth of suppression: If certain image types have overall lower bCFS and reCFS contrast thresholds relative to other image types, does that imply that images in the former category enjoy “preferential processing” relative to those in the latter? Given the fixed suppression depth, what might determine the differences in bCFS and reCFS thresholds? Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings and that polar patterns, once dominant, tend to maintain dominance to lower contrasts than do gratings and this happens even though the rate of contrast change is identical for both types of stimuli. But while rate of contrast change is identical, the neural responses to those contrast changes may not be the same: neural responses to changing contrast will depend on the neural contrast response functions (CRFs) of the cells responding to each of those two types of stimuli, where the CRF defines the relationship between neural response and stimulus contrast. CRFs rise monotonically with contrast and typically exhibit a steeply rising initial response as stimulus contrast rises from low to moderate values, followed by a reduced growth rate for higher contrasts. CRFs can vary in how steeply they rise and at what contrast they achieve half-max response. CRFs for neurons in mid-level vision areas such as V4 and FFA (which respond well to polar stimuli and faces, respectively) are generally steeper and shifted towards lower contrasts than CRFs for neurons in primary visual cortex (which responds well to gratings). Therefore, the effective strength of the contrast changes in our tCFS procedure will depend on the shape and position of the underlying CRF, an idea we develop in more detail in Supplementary Appendix 1, comparing the case of V1 and V4 CRFs. 
Interestingly, the comparison of V1 and V4 CRFs shows two interesting points: (i) that V4 CRFs should produce much lower bCFS and reCFS thresholds than V1 CRFs, and (ii) that V4 CRFs should produce more suppression than V1 CRFs. Our data do not support either prediction: Figure 3 shows that bCFS and reCFS thresholds are very similar for all image categories and suppression depth is uniform. There is no room in these results to support the claim that certain images receive “preferential processing” or processing outside of awareness, although there are many other kinds of images still to be tested and exceptions may potentially be found. As a first step in exploring this idea, one could use standard psychophysical techniques (e.g., (Ling & Carrasco, 2006)) to derive CRFs for different categories of patterns and then measure suppression depth associated with those patterns using tCFS.”
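
To make the CRF argument concrete, here is a minimal sketch assuming the Naka-Rushton form, a common parameterisation of contrast response functions. The quoted passage does not say which form the Supplementary Appendix uses, so the equation, the function names, and the "V1-like" and "V4-like" parameter values are all illustrative assumptions rather than the authors' model or fitted values.

```python
def naka_rushton(contrast: float, r_max: float = 1.0, c50: float = 0.2, n: float = 2.0) -> float:
    """Assumed CRF: response rises monotonically with contrast and reaches r_max / 2 at c50."""
    return r_max * contrast**n / (contrast**n + c50**n)


def contrast_for_response(response: float, r_max: float = 1.0, c50: float = 0.2, n: float = 2.0) -> float:
    """Invert the assumed CRF: the contrast needed to reach a criterion response."""
    if not 0 < response < r_max:
        raise ValueError("criterion response must lie strictly between 0 and r_max")
    return c50 * (response / (r_max - response)) ** (1.0 / n)


# Hypothetical parameters: a CRF shifted towards lower contrasts ("V4-like")
# versus one with a higher half-max contrast ("V1-like").
v1_like = {"c50": 0.30, "n": 2.0}
v4_like = {"c50": 0.10, "n": 2.0}

criterion = 0.5  # arbitrary criterion response used for illustration
for name, params in (("V1-like", v1_like), ("V4-like", v4_like)):
    c = contrast_for_response(criterion, **params)
    print(f"{name}: contrast needed for response {criterion} = {c:.3f}")
```

Under these assumed parameters the CRF with the lower half-max contrast reaches the criterion response at a lower stimulus contrast, which is the sense in which the same physical contrast ramp can have a different effective strength for different stimulus types.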
  
  We have also expanded on this nuanced line of reasoning in a new Supplementary Appendix for the interested reader.
  
  The preceding does not detract from the interest in finding uniform suppression depth. Suppression depth and absolute bCFS can conceivably be due to orthogonal mechanisms warranting their own interpretations. In fact, the authors briefly take this position in the Discussion (lines 696-704, 'A hybrid model ...'). The involvement of different mechanisms would defeat the argument on lines 668-670.
  
  We agree with this analysis, and note our response to Reviewer 1 and the possibility of exceptional cases that may affect absolute bCFS or reCFS thresholds independently.
  
  Similarly, we agree with the notion that some aspects of CFS may not be category specific. The symmetric relationship of thresholds for a given category of stimuli should be assessed in the context of other categories, such as with pointillist images and by incorporating semantic features of images into the mask as in Che et al. (2019) and Han et al. (2021). This line of reasoning and suggestions for future research is provided in the revised discussion, beginning:
  
  Lines 67: “Nevertheless, one could argue that our failure to obtain evidence for category specific suppression depth is based on a limited range of image categories….”
  
  (2) These two hypotheses are confusing and should be more clearly distinguished: a) varying breakthrough times may be due to low-level factors (lines 76-79); b) uniform suppression depth may also arise from early visual mechanisms (e.g., lines 25-27).
  
  Thank you for highlighting this opportunity for clarification. We have updated our text:
  
  Lines 25 – 27: “This uniform suppression depth points to a single mechanism of CFS suppression, one that likely occurs early in visual processing, because suppression depth was not modulated by target salience or complexity”
  
  Lines 78 – 79: “Sceptics argue, however, that differences in breakthrough times can be attributed to low-level factors such as spatial frequency, orientation and contrast that vary between images”
  
  Neutral remarks
  
  The depth between bCFS and reCFS depended on measurement details such as contrast change speed and continuous vs. discrete trials. With discrete trials, the two thresholds showed inverse relations (i.e., reCFS > bCFS) in some participants. The authors discuss possible reasons at some length (adaptation, attention, etc.). Still, a variable measure does not clearly indicate a uniform mechanism.
  
  We have ensured our revised manuscript makes no mention of a uniform mechanism, although we frequently mention our result of uniform suppression depth.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In the 'bCFS' paradigm, a monocular target gradually increases in contrast until it breaks interocular suppression by a rich monocular suppressor in the other eye. The present authors extend the bCFS paradigm by allowing the target to reduce back down in contrast until it becomes suppressed again. The main variable of interest is the contrast difference between breaking suppression and (re) entering suppression. The authors find this difference to be constant across a range of target types, even ones that differ substantially in the contrast at which they break interocular suppression (the variable conventionally measured in bCFS). They also measure how the difference changes as a function of other manipulations. The results are interpreted in terms of the processing of unconscious visual content, as well as in terms of the mechanism of interocular suppression.
  
  Thank you for your positive assessment of our methodology.
  
  Strengths:
  
  Interpretation of bCFS findings is mired in controversy, and this is an ingenious effort to move beyond the paradigm's exclusive focus on breaking suppression. The notion of using the contrast difference between breaking and entering suppression as an index of suppression depth is interesting, but I also feel like it can be misleading at times, as detailed below.
  
  Weaknesses:
  
  Here's one doubt about the 'contrast difference' measure used by the authors. The authors seem confident that a simple subtraction is meaningful after the logarithmic transformation of contrast values, but doesn't this depend on exactly what shape the contrast-response function of the relevant neural process has? Does a logarithmic transformation linearize this function irrespective of, say, the level of processing or the aspect of processing that we're talking about?
  
  Given that stimuli differ in terms of the absolute levels at which they break (and re-enter) suppression, the linearity assumption needs to be well supported for the contrast difference measure to be comparable across stimuli.
  
  Our motivation to quantify suppression depth after log-transform to decibel scale was two-fold. First, we recognised that the traditional use of a linear contrast ramp in bCFS is at odds with the well-characterised profile of contrast discrimination thresholds which obey a power law (Legge, 1981) and the observations that neural contrast response functions show the same compressive non-linearity in many different cortical processing areas (e.g.: V1, V2, V3, V4, MT, MST, FST, TEO. See (Ekstrom et al., 2009)). Increasing contrast in linear steps could thus lead to a rapid saturation of the response function, which may account for the overshoot that has been reported in many canonical bCFS studies. For example, in (Jiang et al., 2007), target contrast reached 100% after 1 second, yet average suppression times for faces and inverted faces were 1.36 and 1.76 seconds respectively. As contrast response functions in visual neurons saturate at high contrast, the upper levels of a linear contrast ramp have less and less effect on the target's strength. This approach to response asymptote may have exaggerated small differences between stimulus conditions and may have inflated some previously reported differences. In sum, the use of a log-transformed contrast ramp allows finer increments in contrast to be explored before saturation, a simple manipulation which we hope will be adopted by our field.
  
  Second, by quantifying suppression depth as a decibel change we enable the comparison of suppression depth between experiments and laboratories, which inevitably differ in presentation environments. By comparison, a bCFS reaction time of 1.36 s cannot easily be compared across studies without access to near-identical stimulation and testing environments. In addition, log-transforming the contrast ramp effectively linearises the neural contrast response function. This means that studies using different contrast levels for the masker or target can be directly compared, because a given suppression depth (for example, 15 dB) is the same proportionate difference between bCFS and reCFS regardless of the contrasts used in the particular study.
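
  The proportionality argument above can be illustrated with a minimal sketch. It assumes the common 20·log10 decibel convention for contrast, which this response does not state explicitly, and the threshold values are hypothetical numbers chosen only to show that scaling both thresholds by the same factor leaves the dB difference unchanged.

```python
import math

def contrast_to_db(contrast):
    """Convert a contrast value to decibels (assumed 20*log10 convention)."""
    return 20.0 * math.log10(contrast)

def suppression_depth_db(bcfs_contrast, recfs_contrast):
    """Suppression depth as the dB difference between bCFS and reCFS thresholds."""
    return contrast_to_db(bcfs_contrast) - contrast_to_db(recfs_contrast)

# Hypothetical thresholds from two studies whose absolute contrasts differ tenfold.
depth_high = suppression_depth_db(0.50, 0.10)
depth_low = suppression_depth_db(0.05, 0.01)

# Under this convention, a 15 dB depth corresponds to one fixed contrast ratio.
ratio_15db = 10 ** (15.0 / 20.0)
```

  Both hypothetical studies yield the same depth because only the ratio of the two thresholds enters the calculation, which is the sense in which the measure is comparable across set-ups.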
  
  We also acknowledge that different stimulus categories may engage neural and visual processing associated with different contrast gain values (e.g., magno- vs parvo-mediated processing). But the breaks and returns to suppression of a given stimulus category would be dependent on the same contrast gain function appropriate for that stimulus which thus permits their direct comparison. Indeed, this is why our novel approach offers a promising technique for comparing suppression depth associated with various stimulus categories (a point mentioned above). Viewed in this way, differences in actual durations of break times (such as we report in our paper) may tell us more about differences in gain control within neural mechanisms responsible for processing of those categories.
  
  We have now included a summary of these arguments in a new paragraph of our discussion (from lines 696- cf Reviewer 2 above), as well as a new Supplementary Appendix.
  
  Here's a more conceptual doubt. The authors introduce their work by discussing ambiguities in the interpretation of bCFS findings with regard to preferential processing, unconscious processing, etc. A large part of the manuscript doesn't really interpret the present 'suppression depth' findings in those terms, but at the start of the discussion section (lines 560-567) the authors do draw fairly strong conclusions along those lines: they seem to argue that the constant 'suppression depth' value observed across different stimuli argues against preferential processing of any of the stimuli, let alone under suppression. I'm not sure I understand this reasoning. Consider the scenario that the visual system does preferentially process, say, emotional face images, and that it does so under suppression as well as outside of suppression. In that scenario, one might expect the contrast at which such a face breaks suppression to be low (because the face is preferentially processed under suppression) and one might also expect the contrast at which the face enters suppression to be low (because the face is preferentially processed outside of suppression). So the difference between the two contrasts might not stand out: it might be the same as for a stimulus that is not preferentially processed at all. In sum, even though the authors' label of 'suppression depth' on the contrast difference measure is reasonable from some perspectives, it also seems to be misleading when it comes to what the difference measure can actually tell us that bCFS cannot.
  
  We have addressed this point with respect to the differences between suppression depth and overall value of contrast thresholds in our revised discussion (reproduced above), and supplementary appendix.
  
  The authors acknowledge that non-zero reaction time inflates their 'suppression depth' measure, and acknowledge that this inflation is worse when contrast ramps more quickly. But they argue that these effects are too small to explain either the difference between breaking contrast and re-entering contrast to begin with, or the increase in this difference with the contrast ramping rate. I agree with the former: I have no doubt that stimuli break suppression (ramping up) at a higher contrast than the one at which they enter suppression (ramping down). But about the latter, I worry that the RT estimate of 200 ms may be on the low side. 200 ms may be reasonable for a prepared observer to give a speeded response to a clearly supra-threshold target, but that is not the type of task observers are performing here. One estimate of RT in a somewhat traditional perceptual bistability task is closer to 500 ms (Van Dam & Van Ee, Vis Res 45 2005), but I am uncertain what a good guess is here. Bottom line: can the effect of contrast ramping rate on 'suppression depth' be explained by RT if we use a longer but still reasonable estimated RT than 200 ms?
  
  A 500 ms reaction time estimate would not account for the magnitude of the changes observed in Experiment 3. Suppression depths in our slow, medium, and fast contrast ramps were 9.64 dB, 14.64 dB and 18.97 dB, respectively (produced by step sizes of .035, .07 and .105 dB per video frame at 60 fps). At each rate, assuming a 500 ms reaction time for both thresholds would capture a change of 2.1 dB, 4.2 dB, 6.3 dB. This difference cannot account for the size of the effects observed between our different ramp speeds. Note that any critique using the RT argument also applies to all other bCFS studies which inevitably will have inflated breakthrough points for the same reason.
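
  The arithmetic behind these figures can be reproduced with a short sketch. The function and variable names are illustrative, not taken from the authors' analysis code. It assumes, as stated above, that the same 500 ms delay applies to both the bCFS and reCFS thresholds and that contrast changes by a fixed dB step on every frame at 60 fps.

```python
def rt_inflation_db(step_db_per_frame, fps=60, reaction_time_s=0.5, n_thresholds=2):
    """Contrast change (dB) accrued during the response delay, summed over both thresholds."""
    db_per_second = step_db_per_frame * fps
    return db_per_second * reaction_time_s * n_thresholds

# Step sizes and observed suppression depths as quoted in the response above.
steps = {"slow": 0.035, "medium": 0.07, "fast": 0.105}
observed = {"slow": 9.64, "medium": 14.64, "fast": 18.97}

# Depth predicted by reaction time alone, and what remains unexplained.
predicted = {name: rt_inflation_db(step) for name, step in steps.items()}
residual = {name: observed[name] - predicted[name] for name in steps}
```

  Under these assumptions the predicted inflation is 2.1, 4.2 and 6.3 dB, and the residual depth still grows with ramp speed, which is consistent with the point that reaction time alone does not explain the effect.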
  
  We’ve updated our discussion with this more conservative estimate:
  
  Lines 744 – 747: “For example, if we assume an average reaction time of 500 ms for appearance and disappearance events, then suppression depth will be inflated by ~4.2 dB at the rate of contrast change used in Experiments 1 and 2 (.07 dB per frame at 60 fps). This cannot account for suppression depth in its entirety, which was many times larger at approximately 14 dB across image categories.”
  
  Lines 755 – 760: [In Experiment 3] “Using the same assumptions of a 500 ms response time delay, this would predict a suppression depth of 2.1 dB, 4.2 dB and 6.3 dB for the slow, medium and fast ramp speeds respectively. However, this difference cannot account for the size of the effects (Slow 9.64 dB, Medium 14.6 dB, Fast 18.97 dB). The difference in suppression depth based on reaction-time delays (± 2.1 dB) also does not match with our empirical data (Medium - Slow = 4.96 dB; Fast - Medium = 4.37 dB)”
  
  A second remark about the 'ramping rate' experiment: if we assume that perceptual switches occur with a certain non-zero probability per unit time (stochastically) at various contrasts along the ramp, then giving the percept more time to switch during the ramping process will lead to more switches happening at an earlier stage along the ramp. So: ramping contrast upward more slowly would lead to more switches at relatively low contrast, and ramping contrast downward more slowly would lead to more switches at relatively high contrasts. This assumption (that the probability of switching is non-zero at various contrasts along the ramp) seems entirely warranted. To what extent can that type of consideration explain the result of the 'ramping rate' experiment?
  
  We agree that for a given ramp speed there is a variable probability of a switch in perceptual state for both bCFS and reCFS portions of the trial. To put it in other words, for a given ramp speed and a given observer the distribution of durations at which transitions occur will exhibit variance. We see that variance in our data (just as it’s present in conventional binocular rivalry duration histograms), for example as a non-zero probability of switches at very short durations. One might surmise that slower ramp speeds would afford more opportunity for stochastic transitions to occur and that the measured suppression depths for slow ramps are underestimates of the suppression depth produced by contrast adaptation. Yet by the same token, the same underestimation would occur during fast ramp speeds, indicating that the difference between ramp speeds may be even larger than we reported. In our revision we will spell this out in more detail, and indicate that a non-zero probability of switches at any time may lead to an underestimation of all recorded suppression depths.
  
  In our data, we believe the contribution of these stochastic switches is minimal. Our current Supplementary Figure 1(d) indicates that there is a non-zero probability of responses early in each ramp (e.g. durations < 2 seconds), yet these are a small proportion of all percept durations. This small proportion is clear in the empirical cumulative density function of percept durations, which we include below. Notably, during slow-ramp conditions, average percept durations actually increased, implying a resistance to any effect of early stochastic switching.
  
  Author response image 1.
  
  (Left) The data from Supplementary Figure 1D. (Right) The same data reproduced as a cumulative density function. The non-zero probability of a switch occurring (for example at very short percept durations) is clear, but such switches are a small proportion of all switches. Notably, in slow-ramp trials there is more time for this stochastic switching to occur, which should lead to underestimation of the overall suppression depth. Yet during slow-ramp conditions, average percept durations increased (vertical arrows), implying a resistance to any effect of early stochastic switching.
  
  When tying the 'dampened harmonic oscillator' finding to dynamic systems, one potential concern is that the authors are seeing the dampened oscillating pattern when plotting a very specific thing: the amount of contrast change that happened between two consecutive perceptual switches, in a procedure where contrast change direction reversed after each switch. The pattern is not observed, for instance, in a plot of neural activity over time, threshold settings over time, etcetera. I find it hard to assess what the observation of this pattern when representing a rather unique aspect of the data in such a specific way, has to do with prior observations of such patterns in plots with completely different axes.
  
  We acknowledge that fitting the DHO model to response order (rather than time) is a departure from previous investigations modelling oscillations over time. Our alignment to response order was a necessary step to avoid the smearing which occurs due to variation in individual participant threshold durations.
  
  Our Supplementary Figure 1 shows the variation in participant durations for the three rates of contrast change. From this pattern we can expect that fitting the DHO to perceptual changes over time would result in the poorest fit for slow rates of change (with the largest variation in durations), and best fit for fast rates of change (with least variation in durations).
  
  That is indeed what we see, reproduced in the review figure below. We include this to show the DHO is still applicable to perceptual changes over time when perceptual durations have relatively low variance (in the fast example), but not the alternate cases. Thus the DHO is not only produced by our alignment to response number - but this step is crucial to avoid the confound of temporal smearing when comparing between conditions.
  
  Author response image 2.
  
  DHO fit to perceptual thresholds over time. As a comparison to manuscript Figure 5 (aligning to response order), here we display the raw detrended changes in threshold over time per participant, and their average. Individual traces are shown in thin lines, the average is thick. Notably, in the slow and medium conditions, when perceptual durations had relatively high variance, the DHO is a poor fit to the average (shown in pink). The DHO is still an excellent fit in fast conditions, when modelling changes in threshold over time, owing to the reduced variance in perceptual durations (cf. Supplementary Figure 1). As a consequence, to remove the confound of individual participant durations, we have fitted the DHO when aligned to response order in our manuscript.
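
  The temporal-smearing argument can be illustrated with a toy sketch. This is not the fitting procedure used in the manuscript. It assumes the standard damped harmonic oscillator form A·exp(−γt)·cos(ωt + φ), arbitrary parameter values, and hypothetical participants who differ only in oscillation frequency, standing in for differences in percept duration.

```python
import math

def dho(t, amplitude=1.0, damping=0.1, omega=2.0 * math.pi, phase=0.0):
    """Damped harmonic oscillator: A * exp(-damping * t) * cos(omega * t + phase)."""
    return amplitude * math.exp(-damping * t) * math.cos(omega * t + phase)

def group_average(t, omegas):
    """Average the DHO across participants who differ only in oscillation frequency."""
    return sum(dho(t, omega=w) for w in omegas) / len(omegas)

base = 2.0 * math.pi  # one cycle per unit time (arbitrary units)
low_variance = [base * s for s in (0.98, 1.0, 1.02)]  # similar percept durations
high_variance = [base * s for s in (0.8, 1.0, 1.2)]   # widely varying durations

t_peak = 2.0  # a peak of the base-frequency oscillation
single = dho(t_peak)                             # one participant at base frequency
avg_low = group_average(t_peak, low_variance)    # time-aligned average, low variance
avg_high = group_average(t_peak, high_variance)  # time-aligned average, high variance
```

  In this toy case the time-aligned average stays close to the individual oscillation when frequencies are similar and collapses when they vary widely. That is the qualitative pattern described above for the fast versus the slow and medium conditions.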
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  The terminology used: "suppression depth". The depth of interocular suppression indexed by detection threshold has long been used in the literature, such as in Tsuchiya et al., 2006. I notice that this manuscript has created a totally different manipulative definition of the depth of suppression, the authors should make this point clear to the readers to avoid confusion.
  
  We believe that our procedure does not create a new definition for suppression depth, but rather utilises the standard definition used for many years in the binocular rivalry literature: the ratio between a threshold measured for a target while it is in the state of suppression and for that same target when in the dominance state.
  
  We have now revised our introduction to make the explicit continuation from past methods to our present methodology clear:
  
  Lines 94 – 105: “One method for measuring interocular suppression is to compare the threshold for change-detection in a target when it is monocularly suppressed and when it is dominant, an established strategy in binocular rivalry research (Alais, 2012; Alais et al., 2010; Alais & Melcher, 2007; Nguyen et al., 2003). Probe studies using contrast as the dependent variable for thresholds measured during dominance and during suppression can advantageously standardise suppression depth in units of contrast within the same stimulus (e.g., Alais & Melcher, 2007; Ling et al., 2010). Ideally, the change should be a temporally smoothed contrast increment to the rival image being measured (Alais, 2012), a tactic that precludes abrupt onset transients and, moreover, provides a natural complement to the linear contrast ramps that are standard in bCFS research. In this study, we measure bCFS thresholds as the analogue of change-detection during suppression, and as their complement, record thresholds for returns to suppression (reCFS).”
  
  The paper provides a new method to measure CFS bidirectionally. Given the possible exceptional case of pareidolia faces, it would be important to discuss how the bidirectional measurement offers more information, e.g., how the bottom-up and top-down factors would be involved in the breakthrough phase and the re-suppression phase.
  
  In our discussion, we have now included the possibility of exceptional cases (such as pareidolia faces), and how an asymmetry may arise with respect to separate image categories affecting either bCFS or reCFS thresholds orthogonally.
  
  Lines 688 - 691: “...In a similar vein, small geometric shapes can be configured so as to resemble human faces, such as those used by Zhou et al. (2021)[footnote]. These kinds of faux faces could be used in concert with tCFS to compare suppression depth with that associated with actual faces.
  
  [footnote] Zhou et al. (2021) derived dominance and suppression durations with fixed-contrast images. In their study, genuine face images and faux faces remained suppressed for equivalent durations whereas genuine faces remained dominant significantly longer than did faux faces. The technique used by those investigators - interocular flash suppression (Wolfe, 1994) - is quite different from CFS in that it involves abrupt, asynchronous presentation of dissimilar stimuli to the two eyes. It would be informative to repeat their experiment using the tCFS procedure.”
  
  What makes the individual results in the discrete condition much less consistent than the tCFS (in Figure 2c)? The authors discussed that motivation or attention to the task would change between bCFS and reCFS blocks (Line 589). But this point is not clear. Does not the attention to task also fluctuate in the tCFS paradigm, as the target continuously comes and goes?
  
  We believe the discrete conditions have greater variance owing to the blocked design of the discrete conditions. A sequence of bCFS thresholds was collected in order (over ~15 mins), before switching to a sequence of back-to-back discrete reCFS thresholds (another ~15 mins), or a sequence of the tCFS condition. As the order of these blocks was randomized, thresholds collected in the discrete bCFS vs reCFS blocks could be separated by many minutes. In contrast, during tCFS, every bCFS threshold used to calculate the average is accompanied by a corresponding reCFS threshold collected within the same trial, separated by seconds. Thus the tCFS procedure naturally controls for waxing and waning attention, as within every change in attention, both thresholds are recorded for comparison.
  
  A second advantage is that because the tCFS design changes contrast based on visibility, targets spend more time close to the threshold governing awareness. This reduced distance to threshold removes the opportunity for other influences (such as oculomotor influences, blinks, etc.) to introduce variance into the collected thresholds.
  
  Experiment 3 reported greater suppression depth with faster contrast change. Because the participant's response was always delayed (e.g., they report after they become aware that the target has disappeared), is it possible that the measured breakthrough threshold gets lower, the re-suppression threshold gets higher, just because the measuring contrast is changing faster?
  
  We have included an extended discussion of the contribution of reaction-times to the differences in suppression depth we report. Importantly, even a conservative reaction time of 500 ms, for both bCFS and reCFS events, cannot account for the difference in suppression depth between conditions.
  
  Lines 755 – 760: “Using the same assumptions of a 500 ms response time delay, this would predict a suppression depth of 2.1 dB, 4.2 dB and 6.3 dB for the slow, medium and fast ramp speeds respectively. However, this difference cannot account for the size of the effects (Slow 9.64 dB, Medium 14.6 dB, Fast 18.97 dB). The difference in suppression depth based on reaction-time delays (± 2.1 dB) also does not match with our empirical data (Medium - Slow = 4.96 dB; Fast - Medium = 4.37 dB).”
  
  In the current manuscript, some symbols are not shown properly (lines 145, 148, 150, 303).
  
  Thank you for pointing this out. We will arrange with the editors to fix the typos.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Line 13: 'time needed'-> contrast needed?
  
  This sentence was referring to previous experiments which predominantly focus on the time of breakthrough.
  
  Line 57: Only this sentence uses saliency; everywhere else in the paper uses salience.
  
  We have updated to salience throughout.
  
  Fig. 1c: The higher variance in discrete measurement results may be due to more variation in discrete trials, e.g., trial duration and inter-trial intervals (ITIs). Tighter control is indeed one advantage of the continuous tCFS design. For the discrete condition, it would help to report more information about variation across trials. How long and variable are the trials? The ITIs? This information is also relevant to the hypothesis about adaptation in Experiment 3.
  
  In the discrete condition, each trial ended after the collection of a single response. Thus the variability of the trials is the same as the variability of the contrast thresholds reported in Figure 2. The distribution of these ‘trials’ (aka percept durations), is also shown in Supplementary Figure 1.
  
  The ITI between discrete trials was self-paced, and not recorded during the experiment.
  
  Line 598: 'equivalently' is a strong word. The benefit is perhaps best stated relatively: bCFS and reCFS are measured under closer conditions (e.g., adaptation, attention) with continuous experiments compared to discrete ones.
  
  We agree - and have amended our manuscript:
  
  Lines 629 – 632: “Alternating between bCFS/reCFS tasks also means that any adaptation occurring over the trial will occur in close proximity to each threshold, as will any waning of attention. The benefit is that bCFS and reCFS thresholds are measured under closer conditions in continuous trials, compared to discrete ones.”
  
  Reviewer #3 (Recommendations For The Authors):
  
  Figure 1 includes fairly elaborate hypothetical results and how they would be interpreted by the authors, but I didn't really see any mention of this content in the main text. It wasn't until I started reading the caption that I figured it out. A more elaborate reference to the figure would prevent readers from overlooking (part of) the figure's message.
  
  We have now made it clearer in the text that those details are contained in the caption to Figure 1.
  
  Lines 113 – 115: “Figure 1 outlines hypothetical results that can be obtained when recording reCFS thresholds as a complement to bCFS thresholds in order to measure suppression depth.”
  
  A piece of text seems to have been accidentally removed on line 267.
  
  Thank you, this has now been amended.
  
URL

biorxiv.org/content/10.1101/2023.04.17.537110v4
www.biorxiv.org www.biorxiv.org

New submission 08/01/2024, 08:27:23

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  The authors have developed a compelling coarse-grained simulation approach for nucleosome-nucleosome interactions within a chromatin array. The data presented are solid and provide new insights that allow for predictions of how chromatin interactions might occur in vivo, but some of the claims should be tempered. The tools will be valuable for the chromosome biology field.
  
  Response: We want to thank the editors and all the reviewers for their insightful comments. We have made substantial changes to the manuscript to improve its clarity and temper necessary claims, as detailed in the responses, and we performed additional analyses to address the reviewers’ concerns. We believe that we have successfully addressed all the comments, and the quality of our paper has improved significantly.
  
  In the following, we provide point-by-point responses to all the reviewer comments.
  
  RESPONSE TO REFEREE 1:
  
  Comment 0: This study develops and applies a coarse-grained model for nucleosomes with explicit ions. The authors perform several measurements to explore the utility of a coarse-grained simulation method to model nucleosomes and nucleosome arrays with explicit ions and implicit water. ’Explicit ions’ means that the charged ions are modeled as particles in simulation, allowing the distributions and dynamics of ions to be measured. Since nucleosomes are highly charged and modulated by charge modifications, this innovation is particularly relevant for chromatin simulation.
  
  Response: We thank the reviewer for the excellent summary of the work.
  
  Comment 1: Strengths: This simulation method produces accurate predictions when compared to experiments for the binding affinity of histones to DNA, counterion interactions, nucleosome DNA unwinding, nucleosome binding free energies, and sedimentation coefficients of arrays. The variety of measured quantities makes both this work and the impact of this coarse-grained methodology compelling. The comparison between the contributions of sodium and magnesium ions to nucleosome array compaction, presented in Figure 3, was exciting and a novel result that this simulation methodology can assess.
  
  Response: We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank him/her for the detailed suggestions and comments.
  
  Comment 2: Weaknesses: The presentation of experimental data as representing in vivo systems is a simplification that may misrepresent the results of the simulation work. In vivo, in this context, typically means experimental data from whole cells. What one could expect for in vivo experimental data is measurements on nucleosomes from cell lysates where various and numerous chemical modifications are present. On the contrary, some of the experimental data used as a comparison are from in vitro studies. In vitro in this context means nucleosomes were formed ’in a test tube’ or under controlled conditions that do not represent the complexity of an in vivo system. The simulations performed here are more directly compared to in vitro conditions. This distinction likely impacts to what extent these simulation results are biologically relevant. In vivo and in vitro differences could be clarified throughout and discussed.
  
  Response: As detailed in Response to Comment 3, we have made numerous modifications in the Introduction, Results, and Discussion Section to emphasize the differences between reconstituted and native nucleosomes. The newly added texts also delve into the utilization of the interaction strength measured for reconstituted nucleosomes as a reference point for conceptualizing the interactions among native nucleosomes.
  
  Comment 3: In the introduction (pg. 3), the authors discuss the uncertainty of nucleosome-to-nucleosome interaction strengths in vivo. For example, the authors discuss works such as Funke et al. However, Funke et al. used reconstituted nucleosomes from recombinant histones with one controlled modification (H4 acetylation). Therefore, this study that the authors discuss is measuring nucleosomes’ in vitro affinity, and there could be significant differences in vivo due to various posttranslational modifications. Please revise the introduction, results section ”Close contacts drive nucleosome binding free energy,” and discussion to reflect and clarify the difference between in vitro and in vivo measurements. Please also discuss how biological variability could impact your findings in vivo. The works of Alexey Onufriev’s lab on the sensitivity of nucleosomes to charge changes (10.1016/j.bpj.2010.06.046, 10.1186/s13072-018-0181-5), such as some PTMs, are one potential starting place to consider how modifications alter nucleosome stability in vivo.
  
  Response: We thank the reviewer for the insightful comments and agree that native nucleosomes can differ from reconstituted nucleosomes due to the presence of histone modifications.
  
  We have revised the introduction to emphasize the differences between in vitro and in vivo nucleosomes. The new text now reads
  
  "The relevance of physicochemical interactions between nucleosomes to chromatin organization in vivo has been constantly debated, partly due to the uncertainty in their strength [cite]. Examining the interactions between native nucleosomes poses challenges due to the intricate chemical modifications that histone proteins undergo within the nucleus and the variations in their underlying DNA sequences [cite]. Many in vitro experiments have opted for reconstituted nucleosomes that lack histone modifications and feature well-positioned 601-sequence DNA to simplify the chemical complexity. These experiments aim to establish a fundamental reference point for understanding the strength of interactions within native nucleosomes. Nevertheless, even with reconstituted nucleosomes, a consensus regarding the significance of their interactions remains elusive. For example, using force-measuring magnetic tweezers, Kruithof et al. estimated the inter-nucleosome binding energy to be ∼ 14 kBT [cite]. On the other hand, Funke et al. introduced a DNA origami-based force spectrometer to directly probe the interaction between a pair of nucleosomes [cite], circumventing any potential complications from interpretations of single molecule traces of nucleosome arrays. Their measurement reported a much weaker binding free energy of approximately 2 kBT. This large discrepancy in the reported reference values complicates a further assessment of the interactions between native nucleosomes and their contribution to chromatin organization in vivo."
  
  We modified the first paragraph of the results section to read
  
  "Encouraged by the explicit ion model’s accuracy in reproducing experimental measurements of single nucleosomes and nucleosome arrays, we moved to directly quantify the strength of inter-nucleosome interactions. We once again focus on reconstituted nucleosomes for a direct comparison with in vitro experiments. These experiments have yielded a wide range of values, ranging from 2 to 14 kBT [cite]. Accurate quantification will offer a reference value for conceptualizing the significance of physicochemical interactions among native nucleosomes in chromatin organization in vivo."
  
  New text was added to the Discussion Section to emphasize the implications of simulation results for interactions among native nucleosomes.
  
  "One significant finding from our study is the predicted strong inter-nucleosome interactions under the physiological salt environment, reaching approximately 9 kBT. We showed that the much lower value reported in a previous DNA origami experiment is due to the restricted nucleosomal orientation inherent to the device design. Unrestricted nucleosomes allow more close contacts to stabilize binding. A significant nucleosome binding free energy also agrees with the high forces found in single-molecule pulling experiments that are needed for chromatin unfolding [cite]. We also demonstrate that this strong inter-nucleosomal interaction is largely preserved at longer nucleosome repeat lengths (NRL) in the presence of linker histone proteins. While posttranslational modifications of histone proteins may influence inter-nucleosomal interactions, their effects are limited, as indicated by Ding et al. [cite], and are unlikely to completely abolish the significant interactions reported here. Therefore, we anticipate that, in addition to molecular motors, chromatin regulators, and other molecules inside the nucleus, intrinsic inter-nucleosome interactions are important players in chromatin organization in vivo."
  
  The suggested references (10.1016/j.bpj.2010.06.046, 10.1186/s13072-018-0181-5) are now included as citations # 44 and 45.
  
  Comment 4: Due to the implicit water model, do you know if ions can penetrate the nucleosome more? For example, does the lack of explicit water potentially cause sodium to cluster in the DNA grooves more than is biologically relevant, as shown in Figure 1?
  
  Response: We thank the reviewer for the insightful comments. The parameters of the explicit-ion model were deduced from all-atom simulations and fine-tuned to replicate crucial aspects of the local ion arrangements around DNA (1). The model’s efficacy was demonstrated in reproducing the radial distribution function of Na+ and Mg2+ ion distributions in the proximity of DNA (see Author response image 1). Consequently, the number of ions near DNA in the coarse-grained models aligns with that observed in all-atom simulations, and we do not anticipate any significant, unphysical clustering. It is worth noting that previous atomistic simulations have also reported the presence of a substantial quantity of Na+ ions in close proximity to nucleosomal DNA (refer to Author response image 2).
  
  Author response image 1.
  
  Comparison between the radial distribution functions of Na+ (left) and Mg2+ (right) ions around the DNA phosphate groups computed from all-atom (black) and coarse-grained (red) simulations. Figure adapted from Figure 4 of Ref. 1. The coarse-grained explicit ion model used in producing the red curves is identical to the one presented in the current manuscript. (© 2011, AIP Publishing. This figure is reproduced with permission from Figure 4 in Freeman GS, Hinckley DM, de Pablo JJ (2011) A coarse-grain three-site-per-nucleotide model for DNA with explicit ions. The Journal of Chemical Physics 135:165104. It is not covered by the CC-BY 4.0 license and further reproduction of this figure would need permission from the copyright holder.)
  
  Author response image 2.
  
  Three-dimensional distribution of sodium ions around the nucleosome determined from all-atom explicit solvent simulations. Darker blue colors indicate higher sodium density, and a high density of sodium ions around the DNA is clearly visible. The crystallographically identified acidic patch has been highlighted as spheres on the surface of the histone core, and a high level of sodium condensation is observed around these residues. Figure adapted from Ref. 2. (© 2009, American Chemical Society. This figure is reproduced with permission from Figure 7 in Materese CK, Savelyev A, Papoian GA (2009) Counterion Atmosphere and Hydration Patterns near a Nucleosome Core Particle. J. Am. Chem. Soc. 131:15005–15013. It is not covered by the CC-BY 4.0 license and further reproduction of this figure would need permission from the copyright holder.)
  
  Comment 5: Histone side chain to DNA interactions, such as histone arginines to DNA, are essential for nucleosome stability. Therefore, can the authors provide validation or references supporting your model of the nucleosome with one bead per amino acid? I would like to see if the nucleosomes are stable in an extended simulation or if similar dynamic motions to all-atom simulations are observed.
  
  Response: The nucleosome model, which employs one bead per amino acid and lacks explicit ions, has undergone extensive calibration and has found application in numerous prior studies. For instance, the de Pablo group utilized a similar model to showcase its ability to accurately replicate the experimentally measured nucleosome unwinding free energy penalty (3), sequence-dependent nucleosome sliding (4), and the interaction between two nucleosomes (5). Similarly, the Takada group employed a comparable model to investigate acetylation-modulated tri-nucleosome structures (6), chromatin structures influenced by chromatin factors (7), and nucleosome sliding (8). Our group also employed this model to study the structural rearrangement of a tetranucleosome (9) and the folding of larger chromatin systems (10). In cases where data were available, simulations frequently achieved quantitative reproduction of experimental results.
  
  We added the following text to the manuscript to emphasize previous studies that validate the model accuracy.
  
  "We observe that residue-level coarse-grained models have been extensively utilized in prior studies to examine the free energy penalty associated with nucleosomal DNA unwinding [cite], sequence-dependent nucleosome sliding [cite], binding free energy between two nucleosomes [cite], chromatin folding [cite], the impact of histone modifications on tri-nucleosome structures [cite], and protein-chromatin interactions [cite]. The frequent quantitative agreement between simulation and experimental results supports the utility of such models in chromatin studies. Our introduction of explicit ions, as detailed below, further extends the applicability of these models to explore the dependence of chromatin conformations on salt concentrations."
  
  We agree that arginines are important for nucleosome stability. Since we assign positive charges to these residues, their contribution to DNA binding can be effectively captured. The model’s ability to reproduce nucleosome stability is supported by the good agreement between the simulated free energy penalty associated with nucleosomal DNA unwinding and the experimental value estimated from single-molecule experiments (Figure 1).
  
  To further evaluate nucleosome stability in our simulations, we conducted a 200-ns-long simulation of a nucleosome featuring the 601-sequence under physiological salt conditions (100 mM NaCl and 0.5 mM MgCl2), consistent with the conditions in Figure 1 of the main text. We found that the nucleosome maintains its overall structure during this simulation. The nucleosome’s radius of gyration (Rg) remained close to the value corresponding to the PDB structure (3.95 nm) throughout the entire simulation period (see Author response image 3).
  
  Author response image 3.
  
  Time trace of the radius of gyration (Rg) of a nucleosome with the 601-sequence along an unbiased, equilibrium trajectory. It is evident the Rg fluctuates around the value found in the PDB structure (3.95 nm), supporting the stability of the nucleosome in our simulation.
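  
  For readers who want to reproduce this kind of stability check, the radius of gyration of a single frame can be computed as in the minimal sketch below. The function name, the optional mass weighting, and the toy coordinates are illustrative assumptions; this is not the analysis script used for the trace above.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D bead positions.

    coords: list of (x, y, z) tuples, in any consistent length unit.
    masses: optional list of bead masses; equal masses are assumed if omitted.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    com = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)

# Toy example (hypothetical coordinates): four beads on the corners of a square.
beads = [(1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)]
rg = radius_of_gyration(beads)
```

  Applying such a function to every frame of a trajectory and plotting the result against the frame index gives a time trace that can be compared with the value computed from the reference structure.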
  
  Occasional fluctuations in Rg corresponded to momentary, partial unwrapping of the nucleosomal DNA, a phenomenon observed in single-molecule experiments. However, we advise caution due to the coarse-grained nature of our simulations, which prevents a direct mapping of simulation timescale to real time. Importantly, the rate of DNA unwrapping in our simulations is notably overestimated.
  
  It’s plausible that coarse-grained models, lacking side chains, might underestimate the barrier for DNA sliding along the nucleosome. Specifically, our model, without differentiation between interactions among various amino acids and nucleotides, accurately reproduces the average nucleosomal DNA binding affinity but may not capture the energetic variations among binding interfaces. Since sliding’s contribution to chromatin organization is minimal due to the use of strongly positioning 601 sequences, we imposed rigidity on the two nucleotides situated at the dyad axis to prevent nucleosomal DNA sliding. In future studies, enhancing the calibration of protein-DNA interactions to achieve improved sequence specificity would be an intriguing avenue. To underscore this limitation of the model, we have included the following text in the discussion section of the main text.
  
  "Several aspects of the coarse-grained model presented here can be further improved. For instance, the introduction of specific protein-DNA interactions could help address the differences in non-bonded interactions between amino acids and nucleotides beyond electrostatics [cite]. Such a modification would enhance the model’s accuracy in predicting interactions between chromatin and chromatin-proteins. Additionally, the single-bead-per-amino-acid representation used in this study encounters challenges when attempting to capture the influence of histone modifications, which are known to be prevalent in native nucleosomes. Multiscale simulation approaches may be necessary [cite]. One could first assess the impact of these modifications on the conformation of disordered histone tails using atomistic simulations. By incorporating these conformational changes into the coarse-grained model, systematic investigations of histone modifications on nucleosome interactions and chromatin organization can be conducted. Such a strategy may eventually enable the direct quantification of interactions among native nucleosomes and even the prediction of chromatin organization in vivo."
  
  Comment 6: The solvent salt conditions vary in the experimental reference data for internucleosomal interaction energies. The authors note, for example, that the in vitro data from Funke et al. differs the most from other measurements, but the solvent conditions are 35 mM NaCl and 11 mM MgCl2. Since this simulation method allows for this investigation, could the authors speak to or investigate if solvent conditions are responsible for the variability in experimental reference data? The authors conclude on pg. 8-9 and Figure 4 that orientational restraints in the DNA origami methodology are responsible for differences in interaction energy. Can the authors rule out ion concentration contributions?
  
  Response: We thank the reviewer for the insightful comment. We would like to clarify that the black curve presented in Figure 4B of the main text was computed using the salt concentration specified by Funke et al. (35 mM NaCl and 11 mM MgCl2). Furthermore, there were no restraints placed on nucleosome orientations during these calculations. Consequently, the results in Figure 4B can be directly compared with the black curve in Figure 5C. The data in Figure 5C were calculated under physiological salt conditions (150 mM NaCl and 2 mM MgCl2), which are the standard solvent salt conditions used in most studies. It is worth noting that the free energy of nucleosome binding is significantly higher at the salt concentration employed by Funke et al. (14 kBT) than the value at the physiological salt condition (9 kBT). Therefore, comparing the results in Figure 4B and 5C eliminates ion concentration as a potential cause for the almost negligible result reported by Funke et al.
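  
  To put the two binding free energies quoted above on a more familiar scale, the short sketch below converts values in kBT to kcal/mol. The temperature of 300 K is an assumption made here for illustration; the response does not state the temperature, and the function name is hypothetical.

```python
# Boltzmann constant expressed per mole, in kcal/(mol*K) (i.e. the gas constant R).
KB_KCAL_PER_MOL_K = 0.0019872041

def kbt_to_kcal_per_mol(energy_kbt, temperature_k=300.0):
    """Convert an energy given in units of kBT to kcal/mol.

    temperature_k defaults to 300 K, which is an assumed value.
    """
    return energy_kbt * KB_KCAL_PER_MOL_K * temperature_k

# The two values quoted in the response above.
funke_salt = kbt_to_kcal_per_mol(14.0)          # 35 mM NaCl, 11 mM MgCl2
physiological_salt = kbt_to_kcal_per_mol(9.0)   # 150 mM NaCl, 2 mM MgCl2
```

  At the assumed 300 K, one kBT is roughly 0.6 kcal/mol, so the 5 kBT gap between the two salt conditions corresponds to about 3 kcal/mol.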
  
  Comment 7: In the discussion on pg. 12 residual-level should be residue-level.
  
  Response: We apologize for the oversight and have corrected the grammatical error in our manuscript.
  
  RESPONSE TO REFEREE 2:
  
  Comment 0: In this manuscript, the authors introduced an explicit ion model using the coarse-grained modelling approach to model the interactions between nucleosomes and evaluate their effects on chromatin organization. The strength of this method lies in the explicit representation of counterions, especially divalent ions, which are notoriously difficult to model. To achieve their aims and validate the accuracy of the model, the authors conducted coarse-grained molecular dynamics simulations and compared predicted values to the experimental values of the binding energies of protein-DNA complexes and the free energy profile of nucleosomal DNA unwinding and inter-nucleosome binding. Additionally, the authors employed umbrella sampling simulations to further validate their model, reproducing experimentally measured sedimentation coefficients of chromatin under varying salt concentrations of monovalent and divalent ions.
  
  Response: We thank the reviewer for the excellent summary of the work.
  
  Comment 1: The significance of this study lies in the authors’ coarse-grained model which can efficiently capture the conformational sampling of molecules while maintaining a low computational cost. The model reproduces the scale and, in some cases, the shape of the experimental free energy profile for specific molecule interactions, particularly inter-nucleosome interactions. Additionally, the authors’ method resolves certain experimental discrepancies related to determining the strength of inter-nucleosomal interactions. Furthermore, the results from this study support the crucial role of intrinsic physicochemical interactions in governing chromatin organization within the nucleus.
  
  Response: We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank him/her for the detailed suggestions and comments.
  
  Comment 2: The method is simple but can be useful, given the authors can provide more details on their ion parameterization. The paper says that parameters in their "potentials were tuned to reproduce the radial distribution functions and the potential of mean force between ion pairs determined from all-atom simulations." However, no details on their all-atom simulations were provided; at some point, the authors refer to Reference 67 which uses all-atom simulations but does not employ the divalent ions. Also, no explanation is given for their modelling of protein-DNA complexes.
  
  Response: We appreciate the reviewer’s suggestion on clarifying the parameterization of the explicit-ion model. The parameterization was not carried out in reference 67 nor by us, but by the de Pablo group in citation 53. Specifically, ion potentials were parameterized to fit the potential of mean force between both monovalent and divalent ion pairs, calculated either from all-atom simulations or from the literature. The authors carried out extensive validations of the model parameters by comparing the radial distribution functions of ions computed using the coarse-grained model with those from all-atom simulations. Good agreement between coarse-grained and all-atom results supports the parameters’ accuracy in reproducing the local structure of ion interactions.
  
  To avoid confusion, we have revised the text from:
  
  "Parameters in these potentials were tuned to reproduce the radial distribution functions and the potential of mean force between ion pairs determined from all-atom simulations."
  
  to
  
  "Parameters in these potentials were tuned by Freeman et al. [cite] to reproduce the radial distribution functions and the potential of mean force between ion pairs determined from all-atom simulations."
  
  We modified the Supporting Information at several places to clarify the setup and interpretation of protein-DNA complex simulations.
  
  For example, we clarified the force fields used in these simulations with the following text
  
  "All simulations were carried out using the software Lammps [cite] with the force fields defined in the previous two sections."
  
  We added details on the preparation of these simulations as follows
  
  "We carried out a series of umbrella-sampling simulations to compute the binding free energies of a set of nine protein-DNA complexes with experimentally documented binding dissociation constants [cite]. Initial configurations of these simulations were prepared using the crystal structures with the corresponding PDB IDs listed in Fig. S1."
  
  We further revised the caption of Figure S1 (included as Author response image 4) to facilitate the interpretation of simulation results.
  
  Author response image 4.
  
  The explicit-ion model predicts the binding affinities of protein-DNA complexes well, related to Fig. 1 of the main text. Experimental and simulated binding free energies are compared for nine protein-DNA complexes [cite], with a Pearson Correlation coefficient of 0.6. The PDB ID for each complex is indicated in red, and the diagonal line is drawn in blue. The significant correlation between simulated and experimental values supports the accuracy of the model. To further enhance the agreement between the two, it will be necessary to implement specific non-bonded interactions that can resolve differences among amino acids and nucleotides beyond simple electrostatics. Such modifications will be interesting avenues for future research. See text Section: Binding free energy of protein-DNA complexes for simulation details.
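  
  The Pearson correlation coefficient quoted in this caption measures linear agreement between paired simulated and experimental values. A minimal sketch of the calculation follows; the function name and the placeholder numbers are assumptions for illustration and are not the nine data points of Figure S1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values only (hypothetical binding free energies, arbitrary units).
experimental = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 1.8, 3.4, 3.9]
r = pearson_r(experimental, simulated)
```

  A coefficient near 1 indicates that the simulated values rank and scale with the experimental ones; it does not by itself show that the absolute values agree, which is why the caption also draws the diagonal line.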
  
  Comment 3: Overall, the paper is well-written, concise and easy to follow but some statements are rather blunt. For example, the linker histone contribution (Figure 5D) is not clear and could be potentially removed. The result on inter-nucleosomal interactions and comparison to experimental values from Ref#44 is the most compelling. It would be nice to see if the detailed shape of the profile for restrained inter-nucleosomal interactions in Figure 4B corresponds to the experimental profile. Including the dependence of free energy on a vertex angle would also be beneficial.
  
  Response: We thank the reviewer for the comments and agree that the discussion on linker histone results was brief. However, we believe the results are important and demonstrate our model’s advantage over mesoscopic approaches in capturing the impact of chromatin regulators on chromatin organization.
  
  Therefore, instead of removing the result, we expanded the text to better highlight its significance, to help its comprehension, and to emphasize its biological implications. The image in Figure 5D was also redesigned to better visualize the cross contacts between nucleosomes mediated by histone H1. The added texts are quoted as below, and the new Figure 5 is included.
  
  Author response image 5.
  
  Revised main text Figure 5, with Figure 5D modified for improved visual clarity.
  
  "Importantly, we found that the weakened interactions upon extending linker DNA can be more than compensated for by the presence of histone H1 proteins. This is demonstrated in Fig. 5C and Fig. S8, where the free energy cost for tearing part two nucleosomes with 167 bp DNA in the presence of linker histones (blue) is significantly higher than the curve for bare nucleosomes (red). Notably, at larger inter-nucleosome distances, the values even exceed those for 147 bp nucleosomes (black). A closer examination of the simulation configurations suggests that the disordered C-terminal tail of linker histones can extend and bind the DNA from the second nucleosome, thereby stabilizing the internucleosomal contacts (as shown in Fig. 5D). Our results are consistent with prior studies that underscore the importance of linker histones in chromatin compaction [cite], particularly in eukaryotic cells with longer linker DNA [cite]."
  
  We further compared the simulated free energy profile, as a function of the center-of-mass distance between nucleosomes, with the experimental profile, as depicted in Author response image 6. The agreement between the simulated and experimental results is evident. The nuanced features observed between 60 and 80 Å in the simulated profile stem from DNA unwinding to accommodate the incoming nucleosome, creating a small energy barrier. It’s worth noting that such unwinding is unlikely to occur in the experimental setup due to the hybridization method used to anchor nucleosomes onto the DNA origami. Moreover, our simulation did not encompass configurations below 60 Å, resulting in a lack of data in that region within the simulated profile.
  
  We projected the free energy profile onto the vertex angle of the DNA origami device, utilizing the angle between two nucleosome faces as a proxy. Once more, the simulated profile demonstrates reasonable agreement with the experimental data (Author response image 6). Author response image 6 has been incorporated as Figure S4 in the Supporting Information.
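  
  The proxy described above, the angle between two nucleosome faces, can be evaluated from one normal vector per face. The sketch below assumes that such normals are already available; how a face normal is defined for a nucleosome is not specified in the response, so both the function and the input vectors are hypothetical.

```python
import math

def angle_between_deg(u, v):
    """Angle in degrees between two 3D vectors, e.g. two face normals."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length vector has no direction")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

# Hypothetical face normals for two nucleosomes in one simulation frame.
normal_a = (0.0, 0.0, 1.0)
normal_b = (0.0, 1.0, 1.0)
phi = angle_between_deg(normal_a, normal_b)
```

  Histogramming this angle over reweighted simulation frames is one way to project a free energy profile onto it.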
  
  Author response image 6.
  
  Explicit ion modeling reproduces the experimental free energy profiles of nucleosome binding. (A) Comparison between the simulated (black) and experimental (red) free energy profile as a function of the inter-nucleosome distance. Error bars were computed as the standard deviation of three independent estimates. The barrier observed between 60 Å and 80 Å arises from the unwinding of nucleosomal DNA when the two nucleosomes are in close proximity, as highlighted in the orange circle. (B) Comparison between the simulated (black) and experimental (red) free energy profile as a function of the vertex angle. Error bars were computed as the standard deviation of three independent estimates. (C) Illustration of the vertex angle Φ used in panel (B).
  
  Comment 4: Another limitation of this study is that the authors’ model sacrifices certain atomic details and thermodynamic properties of the modelled systems. The potential parameters of the counter ions were derived solely by reproducing the radial distribution functions (RDFs) and potential of mean force (PMF) based on all-atom simulations (see Methods), without considering other biophysical and thermodynamic properties from experiments. Lastly, the authors did not provide any examples or tutorials for other researchers to utilize their model, thus limiting its application.
  
  Response: We agree that residue-level coarse-grained modeling indeed sacrifices certain atomistic details. This sacrifice can be potentially limiting when studying the impact of chemical modifications, especially on histone and DNA methylations. We added a new paragraph in the Discussion Section to point out such limitations and the relevant text is quoted below.
  
  "Several aspects of the coarse-grained model presented here can be further improved. For instance, the introduction of specific protein-DNA interactions could help address the differences in non-bonded interactions between amino acids and nucleotides beyond electrostatics [cite]. Such a modification would enhance the model’s accuracy in predicting interactions between chromatin and chromatin-proteins. Additionally, the single-bead-per-amino-acid representation used in this study encounters challenges when attempting to capture the influence of histone modifications, which are known to be prevalent in native nucleosomes. Multiscale simulation approaches may be necessary [cite]. One could first assess the impact of these modifications on the conformation of disordered histone tails using atomistic simulations. By incorporating these conformational changes into the coarse-grained model, systematic investigations of histone modifications on nucleosome interactions and chromatin organization can be conducted. Such a strategy may eventually enable the direct quantification of interactions among native nucleosomes and even the prediction of chromatin organization in vivo."
  
  Nevertheless, it’s important to note that while the model sacrifices accuracy, it compensates with superior efficiency. Atomistic simulations face significant challenges in conducting extensive free energy calculations required for a quantitative evaluation of ion impacts on chromatin structures.
  
  The explicit ion model, introduced by the de Pablo group, follows a standard approach adopted by other research groups, such as the parameterization of ion models using the potential of mean force from atomistic simulations (11; 12). According to multiscale coarse-graining theory, reproducing the potential of mean force (PMF) enables the coarse-grained model to achieve thermodynamic consistency with the atomistic model, ensuring identical statistical properties derived from them. However, it’s crucial to recognize that an inherent limitation of such approaches is their dependence on the accuracy of atomistic force fields in reproducing thermodynamic properties from experiments, as any inaccuracies in the atomistic force fields will similarly affect the resulting coarse-grained (CG) model.
  
  We have provided the implementation of the CG model and detailed instructions on setting up and performing simulations in the GitHub repository. Examples include simulation setup for a protein-DNA complex and for a nucleosome with the 601-sequence.
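  
  The link between a radial distribution function g(r) and a pairwise potential of mean force can be made concrete with the textbook relation W(r) = -kBT ln g(r). The sketch below applies that relation to tabulated values; it illustrates the general idea only and is not the parameterization procedure of Freeman et al., and the sample g(r) values are hypothetical.

```python
import math

def pmf_from_rdf(g_values):
    """Return W(r)/kBT = -ln g(r) for each tabulated g(r) value.

    Bins with g(r) == 0 were never sampled, so the PMF there is
    reported as +infinity rather than raising an error.
    """
    return [(-math.log(g) if g > 0 else math.inf) for g in g_values]

# Hypothetical RDF samples: excluded core, first peak, bulk limit.
rdf = [0.0, 2.5, 1.0]
pmf_in_kbt = pmf_from_rdf(rdf)
```

  A peak in g(r) above 1 maps to a negative (favorable) PMF, and g(r) approaching 1 at large separation maps to a PMF approaching zero.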
  
  References [1] Freeman GS, Hinckley DM, de Pablo JJ (2011) A coarse-grain three-site-per-nucleotide model for DNA with explicit ions. The Journal of Chemical Physics 135:165104.
  
  [2] Materese CK, Savelyev A, Papoian GA (2009) Counterion Atmosphere and Hydration Patterns near a Nucleosome Core Particle. J. Am. Chem. Soc. 131:15005–15013.
  
  [3] Lequieu J, Córdoba A, Schwartz DC, de Pablo JJ (2016) Tension-Dependent Free Energies of Nucleosome Unwrapping. ACS Cent. Sci. 2:660–666.
  
  [4] Lequieu J, Schwartz DC, De Pablo JJ (2017) In silico evidence for sequence-dependent nucleosome sliding. Proc. Natl. Acad. Sci. U.S.A. 114.
  
  [5] Moller J, Lequieu J, de Pablo JJ (2019) The Free Energy Landscape of Internucleosome Interactions and Its Relation to Chromatin Fiber Structure. ACS Cent. Sci. 5:341–348.
  
  [6] Chang L, Takada S (2016) Histone acetylation dependent energy landscapes in trinucleosome revealed by residue-resolved molecular simulations. Sci Rep 6:34441.
  
  [7] Watanabe S, Mishima Y, Shimizu M, Suetake I, Takada S (2018) Interactions of HP1 Bound to H3K9me3 Dinucleosome by Molecular Simulations and Biochemical Assays. Biophysical Journal 114:2336–2351.
  
  [8] Brandani GB, Niina T, Tan C, Takada S (2018) DNA sliding in nucleosomes via twist defect propagation revealed by molecular simulations. Nucleic Acids Research 46:2788–2801.
  
  [9] Ding X, Lin X, Zhang B (2021) Stability and folding pathways of tetra-nucleosome from six-dimensional free energy surface. Nat Commun 12:1091.
  
  [10] Liu S, Lin X, Zhang B (2022) Chromatin fiber breaks into clutches under tension and crowding. Nucleic Acids Research 50:9738–9747.
  
  [11] Savelyev A, Papoian GA (2010) Chemically accurate coarse graining of double-stranded DNA. Proc. Natl. Acad. Sci. U.S.A. 107:20340–20345.
  
  [12] Noid WG (2013) Perspective: Coarse-grained models for biomolecular systems. The Journal of Chemical Physics 139:090901.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.05.16.541030v2
www.biorxiv.org www.biorxiv.org

Semantical and Geometrical Protein Encoding Toward Enhanced Bioactivity and Thermostability

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Response to Reviewer 1
  
  Summary:
  
  The authors introduce a denoising-style model that incorporates both structure and primary-sequence embeddings to generate richer embeddings of peptides. My understanding is that the authors use ESM for the primary sequence embeddings, take resolved structures (or use structural predictions from AlphaFold when they're not available), and then develop an architecture to combine these two with a loss that seems reminiscent of diffusion models or masked language model approaches. The embeddings can be viewed as ensemble-style embedding of the two levels of sequence information, or with AlphaFold, an ensemble of two methods (ESM+AlphaFold). The authors also gather external datasets to evaluate their approach and compare it to previous approaches. The approach seems promising and appears to out-compete previous methods at several tasks. Nonetheless, I have strong concerns about a lack of verbosity as well as the exclusion of relevant methods and references.
  
  Thank you for the comprehensive summary. We respond point by point to the concerns listed in the review below and have modified our manuscript accordingly.
  
  Advances:
  
  I appreciate the breadth of the analysis and comparisons to other methods. The authors separate tasks, models, and sizes of models in an intuitive, easy-to-read fashion that I find valuable for selecting a method for embedding peptides. Moreover, the authors gather two datasets for evaluating embeddings' utility for predicting thermostability. Overall, the work should be helpful for the field as more groups choose methods/pretraining strategies amenable to their goals, and can do so in an evidence-guided manner.
  
  Thank you for recognizing the strength of our work in terms of the notable contributions, the solid analysis, and the clear presentation.
  
  Considerations:
  
  (1) Primarily, a majority of the results and conclusions (e.g., Table 3) are reached using data and methods from ProteinGym, yet the best-performing methods on ProteinGym are excluded from the paper (e.g., EVE-based models and GEMME). In the ProteinGym database, these methods outperform ProtSSN models. Moreover, these models were published over a year (or even 4 years in the case of GEMME) before ProtSSN, and I do not see justification for their exclusion in the text.
  
  We decided to exclude the listed methods from the primary table as they are all MSA-based methods, which are considered few-shot methods in deep learning (Rao et al., ICML, 2021). In contrast, the proposed ProtSSN is a zero-shot method that makes inferences based on less information than few-shot methods. Moreover, it is possible for MSA-based methods to query aligned sequences based on predictions. For instance, Tranception (Notin et al., ICML, 2022) selects the model with the optimal proportions of logits and retrieval results according to the average correlation score on ProteinGym (Table 10, Notin et al., 2022).
  
  With this in mind, we only included zero-shot deep learning methods in Table 3, which require no more than the sequence and structure of the underlying wild-type protein when scoring the mutants. In the revision, we have added the performance of SaProt to Table 3, and the performance of GEMME, TranceptEVE, and SaProt to Table 5. Furthermore, we have released the model's performance on the public leaderboard of ProteinGym v1 at proteingym.org.
  
  (2) Secondly, related to the comparison of other models, there is no section in the methods about how other models were used, or how their scores were computed. When comparing these models, I think it's crucial that there are explicit derivations or explanations for the exact task used for scoring each method. In other words, if the pre-training is indeed an important advance of the paper, the paper needs to show this more explicitly by explaining exactly which components of the model (and previous models) are used for evaluation. Are the authors extracting the final hidden layer representations of the model, treating these as features, and then using these features in a regression task to predict fitness/thermostability/DDG etc.? How are the model embeddings of other methods being used, since, for example, many of these methods output a k-dimensional embedding of a given sequence, rather than one single score that can be correlated with some fitness/functional metric? Summarily, I think the text lacks an explicit mention of how these embeddings are being summarized or used, as well as how this compares to the model presented.
  
  Thank you for the suggestion. Below we address the questions in three points.
  
  (1) The task and the scoring for each method. We followed your suggestion and added a new paragraph titled “Scoring Function” on page 9 to provide a detailed explanation of the scoring functions used by other deep learning zero-shot methods.
  
  (2) The importance of individual pre-training modules. The complete architecture of the proposed ProtSSN model has been introduced on pages 7-8. Empirically, the influence of each pre-training module on the overall performance has been examined through ablation studies on page 12. In summary, the optimal performance is achieved by combining all the individual modules and designs.
  
  (3) The input of fitness scoring. For a zero-shot prediction task, the final score for a mutant is calculated by widely used functions named the log-odds ratio (for encoder models, including ours) or the log-likelihood (for autoregressive models or inverse folding models). In the revision, we explicitly define these functions in sections “Inferencing” (page 7) and “Scoring Function” (page 9).
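  
  As a rough illustration of the log-odds ratio mentioned in point (3), the sketch below sums, over the mutated positions, the log probability of the mutant residue minus that of the wild-type residue. This is a common form in the zero-shot literature and is shown here as an assumption; the manuscript's exact definition is the one in its "Scoring Function" section, and the probability table, sequence, and function name are hypothetical.

```python
import math

def log_odds_score(position_probs, wild_type, mutations):
    """Zero-shot log-odds score of a mutant relative to the wild type.

    position_probs: dict mapping a 0-based position to a dict of
        amino acid -> predicted probability at that position.
    wild_type: wild-type sequence as a string.
    mutations: list of (position, mutant_amino_acid) pairs.
    """
    score = 0.0
    for pos, mutant_aa in mutations:
        probs = position_probs[pos]
        wt_aa = wild_type[pos]
        # Positive terms mean the model prefers the mutant residue.
        score += math.log(probs[mutant_aa]) - math.log(probs[wt_aa])
    return score

# Hypothetical model output for position 1 of a three-residue toy sequence.
toy_probs = {1: {"A": 0.5, "G": 0.25, "V": 0.25}}
score = log_odds_score(toy_probs, "MAK", [(1, "G")])
```

  Here the model assigns the wild-type alanine twice the probability of glycine, so the A-to-G substitution receives a negative score of -ln 2.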
  
  (3) I think the above issues can mainly be addressed by considering and incorporating points from Li et al. 2024[1] and potentially Tang & Koo 2024[2]. Li et al.[1] make extremely explicit the use of pretraining for downstream prediction tasks. Moreover, they benchmark pretraining strategies explicitly on thermostability (one of the main considerations in the submitted manuscript), yet there is no mention of this work nor the dataset used (FLIP (Dallago et al., 2021)) in this current work. I think a reference and discussion of [1] is critical, and I would also like to see comparisons in line with [1], as [1] is very clear about what features from pretraining are used, and how. If the comparisons with previous methods were done in this fashion, this level of detail needs to be included in the text.
  
  The initial version did not include an explicit comparison with the mentioned reference due to the difference in the learning task. In particular, [1] formulates a supervised learning task on predicting the continuous scores of mutants of specific proteins. In comparison, we make zero-shot predictions, where the model is trained in a self-supervised learning manner that requires no labels from experiments. In the revision, we added discussions in “Discussion and Conclusion” (lines 476-484):
  
  Recommendations For The Authors:
  
  Comment 1
  
  I found the methods lacking in the sense that there is never a simple, explicit statement about what is the exact input and output of the model. What are the components of the input that are required by the user (to generate) or supply to the model? Are these inputs different at training vs inference time? The loss function seems like it's trying to de-noise a modified sequence, can you make this more explicit, i.e. exactly what values/objects are being compared in the loss?
  
  We have added a more detailed description in the "Model Pipeline" section (page 7), which explains the distinct input requirements for training and inference, as well as the formulation of the employed loss function. To summarize:
  
  (1) Both sequence and structure information are used in training and inference. Specifically, structure information is represented as a 3D graph with coordinates, while sequence information consists of AA-wise hidden representations encoded by ESM2-650M. During inference, instead of encoding each mutant individually, the model encodes the WT protein and uses the output probability scores relevant to the mutant to calculate the fitness score. This is a standard operation in many zero-shot fitness prediction models, commonly referred to as the log-odds-ratio.
  
  (2) The loss function compares the differences between the noisy input sequence and the output (recovered) AA sequence. Noise is added to the input sequences, and the model is trained to denoise them (see “Ablation Study” for the different types of noise we tested). This approach is similar to a one-step diffusion process or BERT-style token permutation. The model learns to recover the probability of each node (AA) being one of 33 tokens. A cross-entropy loss is then applied to compare this distribution with the ground-truth (unpermuted) AA sequence, aiming to minimize the difference.
  
  To better present the workflow, we revised the manuscript accordingly.
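  The denoising objective in point (2) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ProtSSN training code: uniform random token replacement stands in for the noise types tested in the ablation study, and `corrupt`, `cross_entropy`, and the noise rate are hypothetical names and values. Only the 33-token vocabulary and the cross-entropy against the unperturbed sequence come from the response above.

```python
import math
import random

VOCAB_SIZE = 33  # token count stated in the response

def corrupt(tokens, noise_rate, rng):
    """Replace each token by a uniformly random one with probability noise_rate."""
    return [rng.randrange(VOCAB_SIZE) if rng.random() < noise_rate else t
            for t in tokens]

def cross_entropy(logits, targets):
    """Mean negative log-probability of the clean tokens under a per-position softmax."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[target]
    return total / len(targets)

rng = random.Random(0)
clean = [rng.randrange(VOCAB_SIZE) for _ in range(8)]
noisy = corrupt(clean, noise_rate=0.25, rng=rng)   # what the model would receive
uninformed = [[0.0] * VOCAB_SIZE for _ in clean]   # stand-in for model output
loss = cross_entropy(uninformed, clean)            # scored against the clean sequence
```

  The point of the sketch is the pairing: the model sees `noisy`, but the loss is always computed against `clean`.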
  
  Comment 2
  
  Related to the above, I'm not exactly sure where the structural/tertiary structure information comes from. In the methods, they don't state exactly whether the 3D coordinates are given in the CATH repository or where exactly they come from. In the results section they mention using AlphaFold to obtain coordinates for a specific task---is the use of AlphaFold limited only to these tasks/this is to show robustness whether using AlphaFold or realized coordinates?
  
  The 3D coordinates of all proteins in the training set are derived from the crystal structures in CATH v4.3.0 to ensure a high-quality input dataset (see "Training Setup," Page 8). However, during the inference phase, we used predicted structures from AlphaFold2 and ESMFold as substitutes. This approach enhances the generalizability of our method, as in real-world scenarios, the crystal structure of the template protein to be engineered is not always available. The associated descriptions can be found in “Training Setup” (lines 271-272) and “Folding Methods” (lines 429-435).
  
  Comment 3
  
  Lines 142+144 missing reference "Section establishes", "provided in Section ."
  
  199 "see Section " missing reference
  
  214 missing "Section"
  
  Thank you for pointing this out. We have fixed all missing references in the revision.
  
  Comment 4
  
  Table 2 - seems inconsistent to mention the number of parameters in the first 2 methods, then not in the others (though I see in Table 3 this is included, so maybe should just be omitted in Table 2).
  
  In Table 2, we present the zero-shot methods used as baselines. Since many methods have different versions due to varying hyperparameter settings, we decided to list the number of parameters in the following tables.
  
  We have double-checked both Table 3 and Table 5 and confirm that there is no inconsistency in the reported number of parameters. One potential explanation for the difference noted in the comment is the difference in the number of parameters between single and ensemble methods. The ensemble method averages the predictions of multiple models, and we sum the total number of parameters across all models involved. For example, RITA-ensemble has 2210M parameters, derived from the sum of four individual models with 30M, 300M, 680M, and 1200M parameters.
  
  Comment 5
  
  In general, I found using the word "type" instead of "residue" a bit unnatural. As far as I can tell, the norm in the field is to say "amino acid" or "residue" rather than "type". This somewhat confused me when trying to understand the methods section, especially when talking about injecting noise (I figured "type" may refer to evolutionarily-close, or physicochemically-close residues). Maybe it's not necessary to change this in every instance, but something to consider in terms of ease of reading.
  
  Thank you for your suggestion. The term "type" we used is a common expression in the NLP field, similar to "class". To avoid confusing biologists, we have revised the manuscript accordingly.
  
  Comment 6
  
  197 should this read "based on the kNN "algorithm"" (word missing) or maybe "based on "its" kNN"?
  
  We have corrected the typo accordingly. It now reads “the 𝑘-nearest neighbor algorithm (𝑘NN)” (line 198).
  
  Comment 7
  
  200 weights of dimension 93, where does this number come from?
  
  The edge features are derived by Zhou et al., 2024. We have updated the reference in the manuscript for clarity (lines 201-202).
  
  Comment 8
  
  210-212 "representations of the noisy AA sequence are encoded from the noisy input" what is the "noisy AA sequence?" might be helpful to exactly defined what is "noisy input" or "noisy AA sequence". This sentence could potentially be worded to make it clearer, e.g. "we take the modified input sequence and embed it using [xyz]."
  
  We have revised the text accordingly. In the revised manuscript, see lines 211-212:
  
  Comment 9
  
  In Table 3
  
  Formatting, DTm (million), (million) should be under "# Params" likely?
  
  Also for DDG this is reported on only a few hundred mutations, it might be worth plotting the confidence intervals over the Spearman correlation (e.g. by bootstrapping the correlation coefficient).
  
  We followed the suggestion and added “million” under the "# Params". We have added the bootstrapped results for DDG and DTm to Table 6. For each dataset, we randomly sampled 50% of the data for ten independent runs. ProtSSN achieves the top performance with a considerably small variance.
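  The resampling procedure described above (ten runs, each on a random 50% of the data) can be sketched as below. This is a hedged illustration rather than the authors' script: the response does not say whether sampling was with or without replacement, so the sketch assumes subsampling without replacement, and `subsampled_spearman` and its helpers are hypothetical names. The Spearman correlation is implemented directly (Pearson correlation of average ranks) to keep the example self-contained.

```python
import random
import statistics

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def subsampled_spearman(pred, truth, fraction=0.5, runs=10, seed=0):
    """Mean and standard deviation of Spearman's rho over random subsamples."""
    rng = random.Random(seed)
    n = len(pred)
    k = max(2, int(n * fraction))
    rhos = []
    for _ in range(runs):
        idx = rng.sample(range(n), k)  # without replacement (assumption)
        rhos.append(spearman([pred[i] for i in idx], [truth[i] for i in idx]))
    return statistics.fmean(rhos), statistics.stdev(rhos)
```

  Reporting the standard deviation across runs alongside the mean gives the spread that the reviewer asked for on the smaller datasets.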
  
  Comment 10
  
  The paragraph in lines 319 to lines 328 I feel may lack sufficient evidence.
  
  "While sequence-based analysis cannot entirely replace the role of structure-based analysis, compared to a fully structure-based deep learning method, a protein language model is more likely to capture sufficient information from sequences by increasing the model scale, i.e., the number of trainable parameters."
  
  This claim is made without a citation, such as [1]. Increasing the scale of the model doesn't always align with improving out-of-sample/generalization performance. I don't feel fully convinced by the claim that worse prediction is ameliorated by increasing the number of parameters. In Table 3 the performance is not monotonic with (nor scales with) the number of parameters, even within a model. See ProGen2 Expression scores, or ESM-2 Stability scores, as a function of their model sizes. In [1], the authors discuss whether pretraining strategies are aligned with specific tasks. I think rewording this paragraph and mentioning this paper is important. Figure 3 shows that maybe there's some evidence for this but I don't feel entirely convinced by the plot.
  
  We agree that increasing the number of learnable parameters does not always result in better performance in downstream tasks. However, what we intended to convey is that language models typically need to scale up in size to capture the interactions among residues, while structure-based models can achieve this more efficiently with lower computational costs. We have rephrased this paragraph in the paper to clarify our point in lines 340-342.
  
  Comment 11
  
  Line 327 related to my major comment, " a comprehensive framework, such as ProtSSN, exhibits the best performance." Refers to performance on ProteinGym, yet the best-performing methods on ProteinGym are excluded from the comparison.
  
  The primary comparisons were conducted using zero-shot models for fairness, meaning that the baseline models were not trained on MSA and did not use test performance to tune their hyperparameters. It's also worth noting that SaProt (the current SOTA model) had not been updated on the leaderboard at the time of submitting this paper. In the revised manuscript, we have included GEMME and TranceptEVE in Table 5 and SaProt in Tables 3, 5, and 6. While ProtSSN does not achieve SOTA performance in every individual task, our key argument in the analysis is to highlight the overall advantage of hybrid encoders compared to single sequence-based or structure-based models. We made a clearer statement in the revised manuscript (line 349):
  
  Comment 12
  
  Line 347, line abruptly ends "equivariance when embedding protein geometry significantly." (?).
  
  We have fixed the typo (lines 372-373):
  
  Comment 13
  
  Figure 3 I think can be made clearer. Instead of using True/false maybe be more explicit. For example in 3b, say something like "One-hot encoded" or "ESM-2 embedded".
  
  The labels were set to True/False with the title of the subfigures so that they can be colored consistently.
  
  Following the suggestion, we have updated the captions in the revised manuscript for clarity.
  
  Comment 14
  
  Lines 381-382 "average sequential embedding of all other Glycines" is to say that the score is taken as the average score in which Glycine is substituted at every other position in the peptide? Somewhat confused by the language "average sequential embedding" and think rephrasing could be done to make things clearer.
  
  We have revised the related text accordingly for a clearer presentation (lines 406-413).
  
  Comment 15
  
  Table 5, and in mentions to VEP, if ProtSSN is leveraging AlphaFold for its structural information, I disagree that ProtSSN is not an MSA method, and I find it unfair to place ProtSSN in the "non-MSA" categories. If this isn't the case, then maybe making clearer the inputs etc. in the Methods will help.
  
  We respectfully disagree with classifying a protein encoding method based solely on its input structure. While AF2 leverages MSA sequences to predict protein structures, this information is not used in our model, and our model is not exclusive to AF2-predicted structures. When applicable, the model can encode structures derived from experimental data or other folding methods. For example, in the manuscript, we compared the performance of ProtSSN using proteins folded by both AF2 and ESMFold.
  
  However, we would like to emphasize that comparing the sensitivity of an encoding method across different structures or conformations is not the primary focus of our work. In contrast, some methods explicitly use MSA during model training. For instance, MSA-Transformer encodes MSA information directly into the protein embedding, and Tranception-retrieval utilizes different sets of MSA hyperparameters depending on the validation set's performance.
  
  To avoid further confusion, we have revised the terms "MSA methods" and "non-MSA methods" in the manuscript to "zero-shot methods" and "few-shot methods."
  
  Comment 16
  
  Table 3 they're highlighted as the best, yet on ProteinGym there's several EVE models that do better as well as GEMME, which are not referenced.
  
  The comparison in Table 3 focuses on zero-shot methods, whereas GEMME and EVE are few-shot models. Since these methods have different input requirements, directly comparing them could lead to unfair conclusions. For this reason, we reserved the comparisons with these few-shot models for Table 5, where we aim to provide a more comprehensive evaluation of all available methods.
  
  Response to Reviewer 2
  
  Summary:
  
  To design proteins and predict disease, we want to predict the effects of mutations on the function of a protein. To make these predictions, biologists have long turned to statistical models that learn patterns that are conserved across evolution. There is potential to improve our predictions however by incorporating structure. In this paper, the authors build a denoising auto-encoder model that incorporates sequence and structure to predict mutation effects. The model is trained to predict the sequence of a protein given its perturbed sequence and structure. The authors demonstrate that this model is able to predict the effects of mutations better than sequence-only models.
  
  Thank you for your thorough review and clear summary of our work. Below, we provide a detailed, point-by-point response to each of your questions and concerns.
  
  Strengths:
  
  The authors describe a method that makes accurate mutation effect predictions by informing its predictions with structure.
  
  Thank you for your clear summary of our highlights.
  
  Weaknesses:
  
  Comment 1
  
  It is unclear how this model compares to other methods of incorporating structure into models of biological sequences, most notably SaProt.
  
  (https://www.biorxiv.org/content/10.1101/2023.10.01.560349v1.full.pdf).
  
  In the revised manuscript, we have updated the performance results for SaProt's single models (both masked and unmasked versions with the pLDDT score) as well as the ensemble models. These updates are reflected in Tables 3, 5, and 6.
  
  Comment 2
  
  ProteinGym is largely made of deep mutational scans, which measure the effect of every mutation on a protein. These new benchmarks contain on average measurements of less than a percent of all possible point mutations of their respective proteins. It is unclear what sorts of protein regions these mutations are more likely to lie in; therefore it is challenging to make conclusions about what a model has necessarily learned based on its score on this benchmark. For example, several assays in this new benchmark seem to be similar to each other, such as four assays on ubiquitin performed at pH 2.25 to pH 3.0.
  
  We agree that both DTm and DDG are smaller datasets, making them less comprehensive than ProteinGym. However, we believe DTm and DDG provide valuable supplementary insights for the following reasons:
  
  (1) These two datasets are low-throughput and manually curated. Compared to datasets from high-throughput experiments like ProteinGym, they contain fewer errors from experimental sources and data processing, offering cleaner and more reliable data.
  
  (2) Environmental factors are crucial for the function and properties of enzymes, which is a significant concern for many biologists when discussing enzymatic functions. Existing benchmarks like ProteinGym tend to simplify these factors and focus more on global protein characteristics (e.g., AA sequence), overlooking the influence of environmental conditions.
  
  (3) While low-throughput datasets like DTm and DDG do not cover all AA positions or perform extensive saturation mutagenesis, these experiments often target mutations at sites with higher potential for positive outcomes, guided by prior knowledge. As a result, the positive-to-negative ratio is more meaningful than in random mutagenesis datasets, making these benchmarks more relevant for evaluating model performance.
  
  We would like to emphasize that DTm and DDG are designed to complement existing benchmarks rather than replace ProteinGym. They address different scales and levels of detail in fitness prediction, and their inclusion allows for a more comprehensive evaluation of deep learning models.
  
  Recommendations For The Authors:
  
  Comment 1
  
  I recommend including SaProt in your benchmarks.
  
  In the revision, we added comparisons with SaProt in all the Tables (3, 5 and 6).
  
  Comment 2
  
  I also recommend investigating and giving a description of the bias in these new datasets.
  
  The bias of the new benchmarks can be seen in Table 1, where the mutants are distributed evenly across different pH levels.
  
  In the revision, we added a discussion regarding the new datasets in “Discussion and Conclusion” (lines 496-504 of the revised version).
  
  Comment 3
  
  I also recommend reporting the model's ability to predict disease using ClinVar -- this experiment is conspicuously absent.
  
  Following the suggestion, we retrieved 2,525 samples from the ClinVar dataset available on ProteinGym’s website. Since the official source did not provide corresponding structure files, we performed the following three steps:
  
  (1) We retrieved the UniProt IDs for the sequences from the UniProt website and downloaded the corresponding AlphaFold2 structures for 2,302 samples.
  
  (2) For the remaining proteins, we used ColabFold 1.5.5 to perform structure prediction.
  
  (3) Among these, 12 proteins were too long to be folded by ColabFold, for which we used the AlphaFold3 server for prediction.
  
  All processed structural data can be found at https://huggingface.co/datasets/tyang816/ClinVar_PDB. Our test results are provided in the following table. ProtSSN achieves the top performance over baseline methods.
  
  Author response table 1.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.12.01.569522v3
www.medrxiv.org www.medrxiv.org

Evaluation of Clonal Hematopoiesis and Mosaic Loss of Y Chromosome in Cardiovascular Risk: an analysis in prospective studies

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  This manuscript examines the individual and dual effects of CHIP and LOY in MI employing a cohort of ~460 individuals. CHIP is assessed by NGS and LOY is assessed by PCR. The threshold for CHIP is set at 2% (an arbitrary cutoff that is often used) and LOY at 9% (according to the Discussion text - this reviewer may have missed the section that describes why this threshold was employed). The investigation assessed whether LOY could modulate inflammation, atherosclerotic burden, or MI risk associated with CHIP. Neither CHIP nor LOY independently affected hsCRP, atherosclerotic burden, or MI incidence, nor did LOY presence diminish these outcomes in CHIP+ male subjects.
  
  This study represents the first dual analysis of CHIP and LOY on CVD outcomes. The results are largely negative, contradictory to other studies (many with much larger sample sizes). I would attribute the limitation of sample size as a major contributor to the negative data. While the negative data are suspect, the "positive" finding that LOY abolishes the prognostic significance of CHIP on MI is of interest (and consistent with what is understood from mechanistic studies).
  
  Overall, I enjoyed reading the paper, and it is of interest to the research community.
  
  However, I disagree with some of the authors' interpretations of the data.
  
  Generally, many conclusions on CHIP interpretation are based on the comparison of findings from very large datasets that have been evaluated by shallow NGS DNA sequencing. These studies lack sensitivity and accuracy, but this is counterbalanced by their very large sample sizes. Thus, they draw conclusions from the sickest individuals (ICD codes) with the largest clones (explaining the 10% VAF threshold). Here, the study has a well-phenotyped cohort, but as far as this reviewer can tell, the DNA sequencing is "shallow" NGS. Typically, to assess smaller datasets, investigators employ an error-correction method (DNA barcodes, duplex sequencing, etc.) for the sensitivity and accuracy of calling variants. Thus, the current study appears to suffer from this limitation (small sample sizes combined with NGS).
  
  We thank the reviewer for his/her positive and open comment. We acknowledge that we did not use an error-corrected sequencing method in our study. However, we do not fully agree with the statement that our NGS sequencing technique is “shallow”.
  
  Across our entire sequencing panel, we achieve a sequencing depth of ≥100X and ≥300X for 100% [99%;100%] and 99% [99%;100%] of the targeted regions, respectively. This corresponds to a median depth of 2111X [1578;2574] for all regions sequenced. For “CHIP genes”, the median depth is 2694X [1875;3785] for patients from the CHAth study and 3455X [2266;4885] for patients from the 3C study. More specifically, for the DNMT3A and TET2 genes, the median sequencing depths are 2531X [1818;3313] and 3710X [2444;4901] for patients from the CHAth and 3C studies, respectively. These values are far higher than the 300X recommended by the French National Institute of Cancer for capture-based NGS. Coupling this high sequencing depth with our bioinformatic pipeline that uses 3 different variant callers, manual curation of all variants by trained hematobiologists, and a bioinformatic tool to estimate the background noise allows us to detect somatic mutations with a VAF of 1% with high accuracy. Notably, our accuracy in detecting mutations in leukemia-associated genes is tested twice a year as part of the quality control program organized by the French Group of Molecular Biologists in Hematology (GBMHM). We added the information about the depth of sequencing in the Supplementary Methods section.
  
  While the "negative" data from this study are inconclusive, the positive data (i.e. CHIP being prognostic for MI in the absence but not presence of LOY) is of interest. Thus, the investigators may want to consider a shorter report that largely focuses on this finding.
  
  We thank the reviewer for his/her interest in this result. We also agree that it would be interesting to focus specifically on demonstrating the impact of mLOY in countering the cardiovascular risk associated with CHIP. We performed additional analysis to demonstrate that this effect was independent of age and cardiovascular risk factors and included this information in the results section.
  
  However, we believe that it is also of interest to show negative results that, although probably due to the limited sample size, suggest that the cardiovascular risk associated with CHIP is not as strong and clinically pertinent as initially suggested. Of note, if CHIP really increased the risk of myocardial infarction significantly, it would be detected more frequently in subjects who suffered a MI than in those who did not, which was not observed in our cohort. Moreover, we were able to determine that if CHIP increases the risk of MI, it does so to a much lesser extent (HR = 1.03 for CHIP) than other established cardiovascular risk factors such as hypercholesterolemia or tobacco use (HR = 1.47 and HR = 1.86, respectively, in our cohort), which questions the pertinence of considering CHIP in the management of patients with atherothrombosis. These data have been added in the Results and Discussion sections.
  
  We also believe that our study has the merit of directly assessing the impact of CHIP on atheroma burden, which has been done in only a limited number of studies in the context of coronary artery disease. This would not be possible by analyzing only the male subjects in our cohort, because it would further decrease the statistical power of our analyses.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The preprint by Fawaz et al. presents the findings of a study that aimed to assess the relationship between somatic mutations associated with clonal hematopoiesis (CHIP) and the prevalence of myocardial infarction (MI). The authors conducted targeted DNA sequencing analyses on samples from 149 MI patients and 297 non-MI controls from a separate cohort. Additionally, they investigated the impact of the loss of the Y chromosome (LOY), another somatic mutation frequently observed in clonally expanded blood cells. The results of the study primarily demonstrate no significant associations, as neither CHIP nor LOY were found to be correlated with an increased prevalence of MI. Of note, the null findings regarding CHIP are in conflict with several larger studies in the literature.
  
  Strengths:
  
  Overall, this is a useful research work on an emerging risk factor for cardiovascular disease (CVD). The use of a targeted sequencing approach is a strength, as it offers higher sensitivity than the whole exome sequencing approaches used in many previous studies.
  
  Weaknesses:
  
  Reporting null findings is definitely relevant in an emerging field such as the role of somatic mutations in cardiovascular disease. Nevertheless, the study suffers from severe limitations, which casts doubts on the authors' conclusions, as detailed below:
  
  (1) The small sample size of the study population is a critical limitation, particularly when reporting null findings that conflict (partly) with positive findings in much larger studies, totaling hundreds of thousands of individuals (e.g. Zekavat et al, Nature CVR 2023, Vlasschaert et al, Circulation 2023; Zhao et al, JAMA Cardio 2024). The authors claim that they have 90% power to detect an effect size of CHIP on MI comparable to that in a previous report (Jaiswal et al, NEJM 2017). However, the methodology used to estimate statistical power is not described.
  
  We thank the reviewer for his/her pertinent and constructive comments. We totally agree that our study presents a substantially smaller sample size as compared to the studies of Zekavat et al, Vlasschaert et al or Zhao et al.
  
  The CHAth study was designed as a prospective study (which is not frequent in CHIP reports) to demonstrate that, if CHIP increases the risk of MI, it would be detected more frequently in patients who suffered a MI than in those who did not. To achieve this, we defined eligibility criteria to obtain a rather high prevalence of CHIP and optimize the statistical power of a study based on a limited number of patients. We thus enrolled patients who suffered a first MI after the age of 75 years. These patients were to be compared with subjects from the Three-City study who were 65 years or older at inclusion and had not presented any cardiovascular event before inclusion.
  
  To determine the number of patients necessary to achieve our objective, we considered a CHIP prevalence of 20% in the general population after the age of 75 years, as estimated when we set up our study (Genovese et al, NEJM 2014, Jaiswal et al, NEJM 2014, Jaiswal et al, NEJM 2017). At that time, the relative risk of MI associated with CHIP was shown to be 1.7, leading to an expected prevalence of CHIP of 37% in subjects who presented a MI. Based on these hypotheses, the recruitment of 112 patients in the CHAth study would have been sufficient to detect a significantly higher prevalence of CHIP in MI(+) patients compared to MI(-) subjects with a power of 0.90 at a type I error rate of 5%. These calculations were performed by the Research Methodology Support Unit of the University Hospital of Bordeaux. These data were added in the Supplementary Methods section to present the design and objectives of the CHAth study more clearly.
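  For readers who want to see how a sample size of this order arises, the sketch below applies a standard normal-approximation formula for comparing two proportions with unequal group sizes. It is not the calculation performed by the Research Methodology Support Unit, whose method is not described here: the unpooled-variance formula, the two-sided test, and the 2:1 control-to-case allocation are all assumptions, and `cases_needed` is a hypothetical name. Only the 20% and 37% prevalences, the power of 0.90, and the 5% type I error rate come from the response.

```python
import math
from statistics import NormalDist

def cases_needed(p_control, p_case, alpha=0.05, power=0.90, controls_per_case=2.0):
    """Number of cases needed to detect p_case vs p_control (two-sided test).

    Normal approximation with unpooled variances; the control group is
    assumed to be controls_per_case times larger than the case group.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (p_case * (1 - p_case)
                + p_control * (1 - p_control) / controls_per_case)
    n = (z_alpha + z_beta) ** 2 * variance / (p_case - p_control) ** 2
    return math.ceil(n)

n_cases = cases_needed(p_control=0.20, p_case=0.37)
n_cases_equal_groups = cases_needed(0.20, 0.37, controls_per_case=1.0)
```

  Under these assumptions the required number of cases is of the same order as the 112 patients quoted above, and it grows if the two groups are constrained to equal size.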
  
  In the end, we recruited 149 patients in the CHAth study and compared them to 297 control subjects. Although we recruited more patients than initially needed, we observed a similar prevalence of CHIP in our 2 cohorts, suggesting that the cardiovascular risk associated with CHIP is lower than the 1.7-fold increased risk claimed in most publications on CHIP in the cardiovascular field. We note that our study was not designed to demonstrate the impact of CHIP on the occurrence of MI during follow-up, which could explain our negative results due to a limited number of patients, as stated by the reviewers. This statement has been added in the Supplementary Methods section. However, performing such an analysis allowed us to confirm that the risk of MI associated with CHIP was lower than 1.7 and lower than that associated with hypercholesterolemia or smoking.
  
  We would also like to note that the eligibility criteria for both the CHAth and Three-City studies may have led to a selection bias, possibly contributing to the discrepancy between our results and those of other studies. As stated before, only patients who experienced a first MI after the age of 75 were enrolled in the CHAth study. In the Three-City study, all subjects were 65 years or older at inclusion. In contrast, most of the cohorts showing an association between CHIP and cardiovascular events were composed of younger subjects:
  
  - Bioimage: median age 70 years (55-80 years)

  - MDC: median age 60 years

  - ATVB: subjects with a MI before 45 years

  - PROMIS: subjects between 30 and 80 years

  - UK Biobank: between 40 and 70 years at inclusion, median age of 58 years in the study of Vlasschaert et al.

  - Zhao et al: median age of 53.83 years (45.35-62.39 years).
  
  This last information was added in the Discussion section (lines 452-454).
  
  Furthermore, the work by Jaiswal et al (NEJM 2017) showed a hazard ratio of approx. 2.0, but more recent work in much larger populations suggests that the overall effect of CHIP on atherosclerotic CVD is smaller, most likely due to the heterogeneity of effects of different mutated genes (e.g. Zekavat et al, Nature CVR 2023, Vlasschaert et al, Circulation 2023; Zhao et al, JAMA Cardio 2024).
  
  We thank the reviewer for emphasizing that the initial HR of 2.0 observed by Jaiswal et al was found to be smaller in more recent studies. This corresponds to what we wrote in the introduction (lines 103-109) and discussion (lines 365-370, 465-471).
  
  In addition, several analyses in the current manuscript are conducted separately in MI(+) (n= 149) and MI(-) (N=297) individuals, further limiting statistical power. Power is still lower in the investigation of the effects of LOY and its interaction with CHIP, as only men are included in these analyses. Overall, I believe the study is severely underpowered, which calls into question the validity of the reported null findings.
  
  We agree with the reviewer that the statistical power of our study is lower than that of other studies, in particular those based on several hundred thousand patients. Whenever possible, we analyzed our data by combining MI(+) and MI(-) subjects. However, for some aspects such as atherosclerosis, we did not have the same parameters available for these 2 groups and had to analyze them separately, leading to more limited statistical power. We also have to acknowledge that our study was not designed to demonstrate an effect of CHIP on incident MI (as stated before), limiting our statistical power to demonstrate an effect of CHIP +/- mLOY on the incident risk of coronary artery disease.
  
  However, when designing our prospective study (CHAth study), we aimed to address the limitations of a small cohort and obtain rapid, significant results regarding the impact of CHIP. We hypothesized that if CHIP really increases the risk of myocardial infarction (MI), it would be detected more frequently in patients who have experienced a MI than in those who have not. This study design would demonstrate the importance of CHIP in MI pathophysiology without requiring thousands of patients. However, we did not observe such an association, which calls into question the relevance of detecting CHIP for the management of patients in the field of Cardiology. This was confirmed by the fact that in our cohort, the cardiovascular risk associated with CHIP appears to be low (HR = 1.03 [0.657;1.625] after adjustment for sex, age and cardiovascular risk factors) compared to hypercholesterolemia (HR = 1.474 [0.758;2.866]) or smoking (HR = 1.865 [0.943;3.690]). These data have been added in the Results and Discussion sections.
  
  In addition, we would like to mention that despite the limited number of subjects studied, we do not have only negative results. When studying only male subjects, we were able to show that CHIP accelerates the occurrence of MI, particularly in the absence of mLOY (Figure 2D). This effect was independent of age and cardiovascular risk factors (diabetes, cholesterol and high blood pressure). We added this information to the Results section of the manuscript, although we acknowledge that this has to be confirmed in future work.
  
  (2) Related to the above, it is widely accepted that the effects of CHIP on CVD are highly heterogeneous, as some mutated genes appear to have a strong impact on atherosclerosis, whereas the effect of others is negligible (e.g. Zekavat et al, Nature CVR 2023, Vlasschaert et al, Circulation 2023, among others). TET2 mutations are frequently considered a "positive control", given the multiple lines of evidence suggesting that these mutations confer a higher risk of atherosclerotic disease.
  
  However, no association with MI or related variables was found for TET2 mutations in the current work. Reporting the statistical power specifically for assessing the effect of TET2 mutations would enhance the interpretation of these results.
  
  We thank the reviewer for this pertinent remark. It has indeed been shown that depending on the somatic mutation, the impact of CHIP on inflammation, atherosclerosis and cardiovascular risk is different. The studies cited by the reviewer suggest that DNMT3A mutations have a low impact on atherosclerosis/atherothrombosis while other “non-DNMT3A” mutations, including TET2 mutations, have a greater impact. In particular, Zekavat et al suggested that TP53, PPM1D, ASXL1 and spliceosome mutations have a similar impact on atherosclerosis/atherothrombosis to TET2.
  
  To answer the reviewer: in our cohort, we did not find a clear association between the detection of a TET2 mutation with a VAF≥2% and:
  
  - A history of MI at inclusion (p=0.5339)
  - Inflammation (p=0.440)
  - Atherosclerosis burden:
    - In the CHAth study:
      - p=0.031 for stenosis≥50%
      - p=0.442 for multitruncular lesions
      - p=0.241 for atheroma volume
    - In the 3C study:
      - p=0.792 for the presence of atheroma
      - p=0.3966 for the number of plaques
      - p=0.876 for intima-media thickness
  - Incidence of MI (p=0.5993)
  
  Similarly, we did not find any association between the detection of TET2 mutations with a VAF≥1% and:
  
  - A history of MI at inclusion (p=0.5339)
  - Inflammation (p=0.802)
  - Atherosclerosis burden:
    - In the CHAth study:
      - p=0.104 for stenosis≥50%
      - p=0.617 for multitruncular lesions
      - p=0.391 for atheroma volume
    - In the 3C study:
      - p=0.3291 for the presence of atheroma
      - p=0.2060 for the number of plaques
      - p=0.2300 for intima-media thickness
  - Incidence of MI (p=0.195)
  
  However, analyzing the specific effect of TET2 mutations reduces the cohort of CHIP(+) subjects to 61 individuals. In these conditions, considering a prevalence of “TET2-CHIP” of 13.5% (in our cohort) and a hazard ratio of 1.3 (Vlasschaert et al), the statistical power to show an increased risk of MI is only 16%.
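  The kind of power estimate quoted above can be sketched with Schoenfeld's approximation for a two-sided test on a hazard ratio. This is only an illustration: the source does not say which method the authors used, and the number of MI events is not given here, so `n_events` and the values looped over below are hypothetical placeholders.

```python
import math
from statistics import NormalDist


def schoenfeld_power(hazard_ratio, exposed_fraction, n_events, alpha=0.05):
    """Approximate power to detect `hazard_ratio` (Schoenfeld formula).

    exposed_fraction: share of subjects carrying the exposure (e.g. 0.135).
    n_events: number of observed events (hypothetical here, not from the source).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(math.log(hazard_ratio))
    information = n_events * exposed_fraction * (1 - exposed_fraction)
    return NormalDist().cdf(math.sqrt(information) * effect - z_alpha)


# Prevalence and hazard ratio are taken from the text; event counts are assumed.
for n_events in (50, 100, 200, 400):
    power = schoenfeld_power(hazard_ratio=1.3, exposed_fraction=0.135, n_events=n_events)
    print(f"{n_events} events -> power {power:.2f}")
```

  With a modest hazard ratio and a low exposure prevalence, the approximation stays well below the conventional 80% target unless the number of events is large, which is the qualitative point made in the response.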
  
  (3) One of the most essential features of CHIP is the tight correlation with age. In this study, the effect of age on CHIP (Supplementary Tables S5, S6) seems substantially milder than in previous studies. Given the relatively weak association with age here, it is not surprising that no association with MI or atherosclerotic disease was found, considering that this association would have a much smaller effect size.
  
  We thank the reviewer for highlighting this point. Although the difference in median age between subjects with and without CHIP is not very large in our cohort, we did observe a significant association of CHIP with age:
  
  - The differences in age were statistically significant in both the CHAth and 3C studies (Supplementary Tables S5 and S6)
  - We observed a significant association between age and CHIP prevalence (p<0.001 for the total cohort, p=0.0197 for the CHAth study, and p=0.0394 for the 3C cohort after adjustment for sex). This association was already shown in Figure 1. We added the significant association between age and CHIP prevalence in the Results section (line 279).
  
  As stated before, we enrolled only subjects aged ≥75 years and ≥65 years in the CHAth and 3C studies, respectively. This led to a median age in our cohort that was substantially higher than in other cohorts (in particular the UK Biobank and the different cohorts studied by Jaiswal et al). This could have contributed to an apparently milder effect of age on CHIP, even if this association was still observed.
  
  In addition, there are previous reports of sex-related differences in the prevalence of CHIP, is there an association between CHIP and age after adjusting for sex?
  
  The reviewer correctly pointed out that sex has been associated with various aspects of CHIP. While Zekavat et al reported that CHIP carriers were more frequently males, Kar et al (Nature Genetics 2022) and Kamphuis et al (Hemasphere 2023) did not observe a difference in the prevalence of CHIP between males and females, but rather a difference in the mutational spectrum. Males more frequently presented SRSF2, ASXL1, SF3B1, U2AF1, JAK2, TP53 and PPM1D mutations, while females more frequently had DNMT3A, CBL and GNB1 mutations.
  
  In our study, the association between CHIP prevalence and age was indeed significant even after adjustment for sex (p<0.001 for the total cohort, p=0.0197 for the CHAth study and p=0.0394 for the 3C).
  
  (4) The mutated genes included in the definition of "CHIP" here are markedly different than those in most previous studies, particularly when considering specifically the studies that demonstrated an association between CHIP and atherosclerotic CVD. For instance, the definition of CHIP in this manuscript includes genes such as ANKRD26, CALR, CCND2, and DDX41... that are not prototypical CHIP genes. This is unlikely to have a major impact on the main results, as the vast majority of mutations detected are indeed in bona fide CHIP genes, but it should be at least acknowledged.
  
  We agree with the reviewer that our gene panel includes genes that are not considered prototypical CHIP genes. This acknowledgment has been added in the Supplementary Methods section. To perform this study, we did not design a specific targeted sequencing panel. We used the one that is used for the diagnosis of myeloid malignancies at the University Hospital of Bordeaux. ANKRD26 and DDX41 are genes that, when mutated, predispose to the development of hematological malignancies. CALR mutations are frequently detected in myeloproliferative neoplasms, while CCND2 mutations can be detected in acute myeloid leukemia, among other diseases. As is usual in our routine practice, we analyzed all the genes in the panel. However, as stated by the reviewer, most of the mutations we detected involved bona fide CHIP genes.
  
  Furthermore, the strategy used here for the CHIP variant calling and curation seems substantially different than that used in previous studies, which precludes a direct comparison. This is important because such differences in the definition of CHIP and the curation of variants are the basis of most conflicting findings in the literature regarding the effects of this condition. Ideally, the authors should conduct sensitivity analyses restricted to prototypical CHIP genes, using the criteria that have been previously established in the field (e.g. Vlasschaert et al, Blood 2023).
  
  We agree with the reviewer that our strategy for CHIP variant calling and curation was substantially different from what has been used in other studies. We decided to apply the criteria we used in previous studies for the analysis of somatic mutations in myeloid malignancies. Because CHIP is defined by the detection of “somatic mutations in leukemia driver genes”, this appeared to follow the definition of CHIP.
  
  We also acknowledge that this discrepancy with the criteria defined by Vlasschaert et al could contribute to our findings that differ from those of other studies. We thus checked whether the variants detected were in accordance or not with the criteria defined by Vlasschaert et al. Pooling the 2 cohorts, we detected 439 variants, 381 of which were in accordance with the criteria established by Vlasschaert et al, representing a concordance rate of 86.8%. Moreover, the variants “wrongly” retained according to these criteria had an impact on the conclusion on the detection of CHIP in only 15 patients (because these variants were associated with a mutation in a bona fide CHIP gene and/or because their VAF was below 2%). Thus, CHIP variant calling and curation had only a limited impact on our results. This has been added in the discussion (lines 455-459).
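  As a quick arithmetic check of the concordance figure quoted above, the rate can be recomputed from the two counts given in the text. The function name is an illustrative choice, not something from the manuscript.

```python
def concordance_rate(n_concordant, n_total):
    """Share of variants that satisfy the reference criteria, as a percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_concordant / n_total


# Counts taken from the text: 381 of 439 pooled variants met the criteria.
rate = concordance_rate(381, 439)
discordant = 439 - 381
print(f"concordance: {rate:.1f}% ({discordant} discordant variants)")
```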
  
  However, we would like to discuss the criteria that have been defined by Vlasschaert et al, which are probably too restrictive. For some genes, such as ZRSR2, in addition to frameshift and nonsense mutations that are expected to be associated with a loss of function, only some single nucleotide variations were retained (probably those detected by this group). In our patient 20785, we detected a c.524A>G, p.(Tyr175Cys) mutation that was not reported in the list published by Vlasschaert et al. However, this variant has a VAF suggestive of a somatic origin (3%), affects the Zn finger domain of the protein and is observed in a male subject. Thus, it meets several criteria for being considered as associated with a loss of function. Similarly, the CBL variant c.1139T>C, p.(Leu380Pro) observed in our patient 21536, although not affecting residues 381-421 of the protein (the criteria defined by Vlasschaert et al), has been reported in 29 cases of hematological malignancies. It is thus likely to have a significant impact on the behavior of hematopoietic cells. Moreover, in the same patient, a TET2 c.4534G>A, p.(Ala1512Thr) variant was detected. Although not directly affecting the CD1 domain, it has been reported in a case of AML with a VAF suggestive of a somatic origin (Papaemmanuil et al, NEJM 2016). The SH2B3 gene is not considered by Vlasschaert et al as a bona fide CHIP gene, contrary to other genes involved in cell signaling such as JAK2, GNAS, GNB1 and CBL. However, inactivating mutations in SH2B3 can be detected in myeloid malignancies and were recently shown to drive the phenotype in some patients with a MPN (Zhang et al, American Journal of Hematology 2024). We could thus expect that this also happens in our patients 22591 and 21998, who harbor mutations of SH2B3 (a SNV in the PH domain and a frameshift mutation, respectively).
  
  Regarding BCOR, STAG2, SMC3 and RAD21 genes, although frameshift mutations are the most prevalent, there are several reports on the existence of SNV in the context of hematological malignancies (COSMIC, Blood (2021) 138 (24): 2455–2468, Blood Cancer Journal (2023)13:18 ; https://doi.org/10.1038/s41408-023-00790-1).
  
  We can also add that although Vlasschaert et al did not consider CSF3R and CALR as CHIP genes, Kessler et al did. Because CHIP is an emerging field, the concepts that define it are expected to evolve, as demonstrated by the recent study from Jyoti Nangalia’s group (Bernstein et al, Nature Genetics 2024), which showed that 17 additional genes (including SH2B3) should be considered as drivers of clonal hematopoiesis.
  
  (5) An important limitation of the current study is the cross-sectional design of most of the analyses. For instance, it is not surprising that no association is found between CHIP and prevalent atherosclerosis burden by ultrasound imaging, considering that many individuals may have developed atherosclerosis years or decades before the expansion of the mutant clones, limiting the possible effect of CHIP on atherosclerosis burden. Similarly, the analysis of the relationship between CHIP and a history of MI may be confounded by the potential effects of MI on the expansion of mutant clones. In this context, it is noteworthy that the only positive results here are found in the analysis of the relationship between CHIP at baseline and incident MI development over follow-up. Increasing the sample size for these longitudinal analyses would provide deeper insights into the relationship between CHIP and MI.
  
  We agree with the reviewer that increasing the sample size for longitudinal analyses would provide deeper insights into the relationship between CHIP and MI. Unfortunately, for the moment, we do not have access to additional samples of the 3C study and are not able to perform these additional analyses.
  
  (6) The description of some analyses lacks detail, but it seems that statistical analyses were exclusively adjusted for age or age and sex. The lack of adjustment for conventional cardiovascular risk factors in statistical analyses may confound results, particularly given the marked differences in several variables observed between groups.
  
  The reviewer is right that we adjusted our analyses for age and/or sex. This was done because, as stated before, our results did not show many significant differences. However, we reanalyzed our data, further adjusting the tests for conventional cardiovascular risk factors, and observed similar results. These data have been added in the Results section (lines 286-287, 303, 319, 331-332, 341).
  
  (7) The variant allele fraction (VAF) threshold for identifying clinically relevant clonal hematopoiesis is still a subject of debate. The authors state that subjects without any detectable mutation or with mutations with a VAF below 2% were considered non-CHIP carriers. While this approach is frequent in the field, it likely misses many impactful mutations with lower VAFs. Such false negatives could contribute to the null findings reported here. Ideally, the authors should determine the lower detection limit of their sequencing approach (either computationally or through serial dilution experiments) and identify the threshold of VAF that can be detected reliably with their sequencing assay. The association between CHIP and MI should then be evaluated considering all mutations above this VAF threshold, in addition to sensitivity analyses with other thresholds frequent in the literature, such as 1% VAF, 2% VAF, and 10% VAF.
  
  We agree with the reviewer that the VAF threshold for identifying clinically relevant CH is still debated. As stated in the manuscript and by the reviewer, we used the conventional threshold of 2%. Considering that different studies have shown that the increase in cardiovascular risk is greater for CHIP with a high VAF (Jaiswal et al, NEJM 2017, Kessler et al, Nature 2022, Vlasschaert et al, Circulation 2023), it is not certain that considering variants with a very low VAF (below 2%) would help us find an impact of CHIP on inflammation, atherosclerosis or atherothrombotic risk.
  
  However, as mentioned by the reviewer, variants with a low VAF could have a clinical impact, as recently reported by Zhao et al. In France, the use of biological analyses for medical purposes requires demonstrating that all aspects of the assay, including its performance, are mastered. In that context, we determined that our NGS strategy allowed us to reliably detect mutations with a VAF down to 1% (data not shown). As stated in the discussion, we also analyzed our results considering variants with a VAF of 1% and found similar results (lines 394-395). The sensitivity analyses were already mentioned in the manuscript, as we also searched for an effect of CHIP with a high VAF (≥5%) and found no effect either. We did not have a sufficient number of subjects carrying variants with a VAF≥10% to perform an analysis with this threshold.
  
  (8) The authors should justify the use of 3D vascular ultrasound imaging exclusively in the supra-aortic trunk. I am not familiar with this technique, but it seems to be most typically used to evaluate atherosclerosis burden in superficial vascular beds such as carotids or femorals. I am concerned about the potential impact of tissue depth on the accurate quantification of atherosclerosis burden in the current study (e.g. https://doi.org/10.1016/j.atherosclerosis.2016.03.002). It is unclear whether the carotids or femorals were imaged in the study population.
  
  We apologize for the lack of precision in the Methods section. As stated by the reviewer, we evaluated the atherosclerosis burden in superficial vascular beds. We measured atheroma volume at the site of the common carotid (as described by B Lopez-Melgar, in Atherosclerosis, 2016). We did not analyze femoral arteries in this study. The sentence is now corrected in the Methods (lines 176-179).
  
  (9) The specific criteria used to define LOY need to be justified. LOY is stated to be defined based on a "A cut off of 9% of cells with mLOY defined the detection of a mLOY based on the study of 30 men of less than 40 years who had a normal karyotype as assessed by conventional cytogenetic study." As acknowledged by the authors, this definition of LOY is substantially different than that used in recent studies employing the same technique to detect LOY (Mas-Peiro et al, EHJ 2023). In addition, it seems essential to provide more detailed information on the ddPCR assay used to determine LOY, including the operating range and, more importantly, the lower limit of detection (%LOY) of the assay. A dilution series of a control DNA with no LOY would be helpful in this context.
  
  We apologize if the definition of the threshold for detecting mLOY was unclear. To test the performance of our ddPCR technique, we first determined the background noise by testing DNA obtained from total leukocytes in 30 men aged ≤40 years who presented a normal karyotype as assessed by conventional cytogenetic techniques. In this control population, which is not expected to carry mLOY, we detected a proportion of cells with mLOY of 2.34 ± 1.98% (see Author response image 1, panel A). We thus considered a value above 9% as being different from background noise (mean + 3 times the standard deviation).
  
  We then compared the proportion of cells with mLOY measured by ddPCR and by conventional karyotyping and observed a fairly good correlation between the 2 techniques (R²=0.6430, p=0.0053, see Author response image 1, panel B). Finally, we tested the reliability of our ddPCR assay in detecting different levels of mLOY using a dilution series of control DNA (from an equivalent of 2% of cells with mLOY to 98% of cells with mLOY). We observed a very good correlation between the theoretical and measured proportions of cells with mLOY (R²=0.9989, p<0.001, see Author response image 1, panel C). Of note, the proportions of mLOY measured for values ≤10% were concordant with theoretical values. However, considering the background noise determined with control DNA, we were unable to confirm that this “signal” was different from the background noise. Therefore, we set a threshold of 9% to define the detection of mLOY by ddPCR. It is also noteworthy that the 10% cell population with mLOY was consistently detected by the ddPCR technique. This has been added in the Methods section (lines 228-235).
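  The "mean + 3 standard deviations" rule described above can be written out as a short sketch. The function names are illustrative, and the 9% cutoff is simply the value reported in the response; how the authors rounded from the computed value to 9% is not stated in the source.

```python
def noise_threshold(mean_pct, sd_pct, k=3.0):
    """Background-noise ceiling defined as mean + k standard deviations."""
    return mean_pct + k * sd_pct


def has_mloy(measured_pct, cutoff_pct=9.0):
    """Call mLOY only when the measured percentage exceeds the cutoff."""
    return measured_pct > cutoff_pct


# Control values reported in the text: 2.34 +/- 1.98 (% of cells with mLOY).
ceiling = noise_threshold(2.34, 1.98)
print(f"mean + 3 SD = {ceiling:.2f}%")
print("10% sample called mLOY:", has_mloy(10.0))
print("5% sample called mLOY:", has_mloy(5.0))
```

  The computed ceiling falls slightly below the 9% cutoff the authors adopted, so the reported threshold is at least as conservative as the plain mean + 3 SD value.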
  
  Author response image 1.
  
  (10) Our understanding of the relationship between CHIP and CVD is evolving fast, and the manuscript should be considered in the context of recent literature in the field. For instance, the recent work by Zhao et al (JAMA Cardio 2024, doi:10.1001/jamacardio.2023.5095) should be considered, as it used a similar targeted DNA sequencing approach as the one used here, but found a clear association between CHIP and coronary heart disease (in a population of 6181 individuals).
  
  We thank the reviewer for this pertinent reference. We did not include it in the first version of our manuscript because it was not published yet when we submitted our work. We included this reference in the discussion (lines 451, 455, 464). We also included the recent study of Heimlich et al (Circ Gen Pre Med 2024, lines 464-468) who studied the association of CHIP with atherosclerosis burden.
  
  (11) The use of subjective terms like "comprehensive" or "thorough" in the title of the manuscript does not align with the objective nature of scientific reporting.
  
  We removed the terms “comprehensive” and “thorough” from the title and the text.
  
  Recommendations for the authors:
  
  Reviewing Editor:
  
  The Editors believe that in light of the small study the word Comprehensive has to be removed (including from the title and abstract).
  
  We agree and removed the term comprehensive from the title and the text.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Other comments:
  
  It has long been recognized that hsCRP does not adequately address the inflammation associated with CHIP. For example, see Bick et al Nature 2020; 586:763. Through an assessment of a large dataset, the regulation of multiple inflammatory mediators was associated with CHIP but not with CRP.
  
  We agree that hsCRP is probably not the most sensitive marker of the inflammatory state associated with CHIP, although it is the one most commonly used in medical practice. Moreover, as indicated in the discussion (lines 418-420), we did not observe any association between CHIP and the plasma levels of different cytokines (IL1ß, IL6, IL18 and TNFα) in patients enrolled in the CHAth study.
  
  Many of the citations lack journal names, volumes, page numbers, etc.
  
  We apologize for this and corrected the citations.
  
  Please provide more details on the methodology (i.e. is CHIP assessed only through NGS with no error correction?). Specify the rationale for why the 9% LOY threshold was employed. Provide this information in the Methods section.
  
  We added more details on the methodology, as requested, in the results section (lines 212-214 and 228-235).
  
  Supplementary Table S3 lacks headings. What are the designations for columns 6-8?
  
  We apologize for this and corrected the Table. Columns 6-8 correspond to the VAF, coverage of the variants and depth of sequencing, as for Table S4.
  
URL

medrxiv.org/content/10.1101/2024.01.15.24301313v2
www.biorxiv.org www.biorxiv.org

Basolateral amygdala oscillations enable fear learning in a biophysical model

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  eLife assessment:
  
  This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because of insufficient grounding in prior experimental results and insufficient consideration of alternative explanations. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.
  
  We disagree with the overall assessment of our paper. The current reviews published below focus on two kinds of perceived inadequacies. Reviewer 1 (R1) was concerned that the fear conditioning paradigm used in the model is not compatible with some of the experiments we are modeling. The reviewer helpfully suggested in the Recommendations for the Authors some papers, which R1 believed exposed this incompatibility. In our reading, those data are indeed compatible with our hypotheses, as we will explain in our reply. Furthermore, the point raised by R1 is an issue for the entire field. We will suggest a solution to that issue based on published data.
  
  Reviewer 2 (R2) said that there is no evidence that the BLA is capable of producing, by itself, the rhythms that have been observed during fear conditioning in BLA and, furthermore, that the paper we cited to support such evidence, in fact, refutes our argument. We believe that the reasoning used by reviewer 2 is wrong and that the framework of R2 for what counts as evidence is inadequate. We spell out our arguments below in the reply to the reviewers.
  
  Finally, we believe this work is of interest far beyond investigators studying fear conditioning. The work shows how rhythms can create the timing necessary for spike-timing-dependent plasticity using multiple time scales that come from multiple different kinds of interneurons found both in BLA and, more broadly, in cortex. Thus, the work is relevant for all kinds of associative learning, not just fear conditioning. Furthermore, it is one of the first papers to show how rhythms can be central in mechanisms of higher-order cognition.
  
  Reviewer #1
  
  We thank Reviewer 1 for their kind remarks about our first set of responses and their understanding of the importance of the work. There was only one remaining point to be addressed:
  
  Deficient in this study is the construction of the afferent drive to the network, which does elicit activities that are consistent with those observed to similar stimuli. It still remains to be demonstrated that their mechanism promotes plasticity for training protocols that emulate the kinds of activities observed in the BLA during fear conditioning.
  
  It is true that some fear conditioning protocols involve non-overlapping US and CS, raising the question of how plasticity happens or whether behavioral effects may happen without plasticity. This is an issue for the entire field (Sun et al., F1000Research, 2020). Several papers (Quirk, Repa and LeDoux, 1995; Herry et al, 2007; Bordi and Ledoux 1992) show that the pips in auditory fear conditioning increase the activity of some BLA neurons: after an initial transient, the overall spike rate is still higher than baseline activity. The question remains as to whether the spiking is sustained long enough and at a high enough rate for STDP to take place when US is presented sometime after the stop of the CS.
  
  Experimental recordings cannot speak to the rate of spiking of BLA neurons during US due to recording interference from the shock. However, evidence seems to suggest that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (Muller et al., 2013; McDonald and Mott, 2021) and M1 receptors target spines receiving glutamatergic input (McDonald et al., 2019). Thus, ACh from BF should elicit a long-lasting depolarization in pyramidal cells. Indeed, the pairing of ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7 – 10 s (Unal et al., 2015). This implies that the release of ACh can affect the consequences of the CS in successive trials. This should include higher spiking rates and more sustained activity in the ECS neurons after the first presentation of US, thus ensuring a concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Hence, we suggest that the problem raised by R1 may be solved by considering the role of ACh release by BF. To the best of our knowledge, there is nothing in the literature that contradicts this potential solution. The model we have may be considered a “minimal” model that puts in by hand the higher frequency due to the cholinergic drive without explicitly modeling it. As R1 says, it is important for us to give the motivation for that higher frequency; in the next revision, we will be explicit about how the needed firing rate can come about without an overlap of CS and US in any given trial.
  
  Reviewer #2
  
  The authors of this study have investigated how oscillations may promote fear learning using a network model. They distinguished three types of rhythmic activities and implemented an STDP rule to the network aiming to understand the mechanisms underlying fear learning in the BLA.
  
  After the revision, the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered. The author added this sentence to the revised version: "A recent experimental paper, (Antonoudiou et al., 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone." In the cited paper, the authors studied gamma oscillations, and when they applied 10 uM Gabazine to the BLA slices observed rhythmic oscillations at theta frequencies. 10 uM Gabazine does not reduce the GABA-A receptor-mediated inhibition but eliminates it, resulting in rhythmic populations burst driven solely by excitatory cells. Thus, the results by Antonoudiou et al., 2022 contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices. If one extrapolates from the hippocampal studies, then this is not surprising, as the hippocampal theta depends on extra-hippocampal inputs, including, but not limited to the entorhinal afferents and medial septal projections (see Buzsaki, 2002). Similarly, respiratory related 4 Hz oscillations are also driven by extrinsic inputs. Therefore, at present, it is unclear which kind of physiologically relevant theta rhythm in the BLA networks has been modelled.
  
  Reviewer 2 (R2) says “the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered.” In our revision, we cited (Antonoudiou et al., 2022), who showed that BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings. R2 pointed out that this paper produces such theta under conditions in which the inhibition is totally removed. R2 then states that the resulting rhythmic populations burst at theta “are driven solely by excitatory cells. Thus, the results by (Antonoudiou et al., 2022) contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices.”
  
  This reasoning of R2 is faulty. With all GABAergic currents omitted, the LFP is composed of excitatory currents and intrinsic currents. Our model of the LFP includes all synaptic and membrane currents. In our model, the high theta comes from the spiking activity of the SOM cells, which increase their activity if the inhibition from VIP cells is removed. We are including a new simulation, which models the activity of the slice in the presence of kainate (as done in Antonoudiou et al., 2022), providing additional excitation to the network. If the BLA starts at high excitation, our model produces an ongoing gamma in the VIP cells that suppress SOM cells and allows a PING gamma to form between PV and F cells; with Gabazine (modeled as the removal of all the GABAergic synapses), this PING is no longer possible and so the gamma rhythm disappears. As expected, the simulation shows that the model produces theta with Gabazine; the model also shows that a PING rhythm is produced without Gabazine, and that this rhythm goes away with Gabazine because PING requires feedback inhibition (see Author response image 1). Thus, the theta increase with Gabazine in the (Antonoudiou et al., 2022) paper can be reproduced in our model, so that paper does support the model.
  
  Author response image 1.
  
  Spectral properties of the BLA network without (black) versus with Gabazine (magenta). Power spectra of the LFP proxy, which is the linear sum of AMPA, GABA (only present in the absence of Gabazine), D-, NaP-, and H-currents. Both power spectra are represented as mean and standard deviation across 10 network realizations. Bottom: inset between 35 and 50 Hz.
  
  Nevertheless, we agree that this paper alone is not sufficient evidence that the BLA can produce a low theta. We have recently learned of a new paper (Bratsch-Prince et al., 2024) that is directly related to the issue of whether the BLA by itself can produce low theta, and in what circumstances. In this study, intrinsic BLA theta is produced in slices with ACh stimulation (without needing external glutamate input) which, in vivo, would be produced by the basal forebrain (Rajebhosale et al., eLife, 2024) in response to salient stimuli. The low-theta depends on muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the VIP neurons in our model (Krabbe 2017; Mascagni and McDonald, 2003).
  
  We suspect that the low theta produced in (Bratsch-Prince et al., 2024) is the same as the low theta in our model. We do not explicitly include ACh modulation of BLA in our paper, but in current work with experimentalists, we aim to show that ACh is essential to the theta by activating the BLA VIP cells. In our re-revised version, we will discuss Bratsch-Prince et al., 2024 and its connection to our hypothesis that the theta oscillations can be produced within the BLA.
  
  Note that we have already included a paragraph stating explicitly that our hypothesis in no way contradicts the idea that inputs to the BLA may include theta oscillations. Indeed, the following paragraphs in the revised paper describe the complexity of trying to understand the origin of brain rhythms in vivo. R2 did not appear to take this complexity, and the possible involvement of neuromodulation, into account in their current position that the theta rhythms cannot be produced intrinsically in the BLA.
  
  From revised paper: “Where the rhythms originate, and by what mechanisms. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. Our model also supports the idea that intrinsic mechanisms in the BLA can support the generation of the low theta, high theta, and gamma rhythms.
  
  Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through BLA extensive communications with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to understand how these other rhythms may interact with the ones described in our paper.”
  
  We believe our current paper is important to show how detailed biophysical modeling can unearth the functional implications of physiological details (such as the biophysical bases of rhythms), which are often (indeed, usually) ignored in models, and why rhythms may be essential to some cognitive processes (including STDP). Indeed, for evaluating our paper it is necessary to go back to the purpose of a model, especially one such as ours, which is “hypothesis/data driven”. The hypotheses of the model serve to illuminate the functional roles of the physiological details, giving meaning to the data. Of course, the hypotheses must be plausible, and we think that the discussion above easily clears that bar. Hypotheses should also be checked experimentally, and a model that explains the implications of a hypothesis, such as ours, provides motivation for doing the hard work of experimental testing. We think that R1 understands this and has been very helpful.
  
  —————
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.
  
  Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”.
  
  We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA. We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. Details are below in the answer to reviewers.
  
  Reviewer #1 (Public Comments):
  
  (1) … the weakness is that their attempt to align with the experimental literature (specifically Krabbe et al. 2019) is performed inconsistently. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+).
  
  In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM to PV connection is shown to be small (Krabbe et al., 2019, Supp. Fig. 4, panel t). We also omitted PV to SOM, PV to VIP, SOM to VIP, VIP to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning.
  
  We reply with more details below to the Recommendations for the Authors, including new text.
  
  (2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.
  
  Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations.
  
  We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.
  
  Our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Despite paradigms involving both overlapping (delay conditioning, where US coterminates with CS (Lindquist et al., 2004), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs existing in the literature, we hypothesized that concomitant activity in CS- and US-encoding neuron activity should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature.
  
  We reply with more details below to the Recommendations for the Authors, including new text.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Major points:
  
  (1) This paper draws extensively from Krabbe et al. 2019, but it does not do so consistently. The paper would be strengthened if it tried to better match the circuit properties and activations.
  
  Specifically:
  
  a. Krabbe found that PV interneurons were comparably activated by the US (see Supp Fig 1). Your model does not include that. The basis for the Krabbe 2019 claim that PV US responses are weaker is that they have a slightly larger proportion of cells inhibited by the US, but this is not especially compelling. In addition, their Fig 2 showed that VIP and SOM cells receive afferents from the same set of upstream regions.
  
  b. The model excluded PV-SOM connections, but this does not agree with Krabbe et al. 2019, Table 2. PV cells % connectivity and IPSC amplitudes were comparable to those from VIP interneurons.
  
  c. ECS to PV synapses are not included. This seems unlikely given the dense connectivity between PV interneurons and principal neurons in cortical circuits and the BLA (Woodruff and Sah 2007 give 38% connection probability in BLA).
  
  We thank the Reviewer for raising these points, which allow us to clarify how we constrained our model and to do more simulations. Specifically:
  
  a. (Wolff et al., Nature, 2014), cited by (Krabbe et al. 2018), reported that PV and SOM interneurons are on average inhibited by the US during the fear conditioning. However, we agree that (Krabbe et al., 2019) added to this by specifying that PV interneurons respond to both CS+ and US, although the fraction of US-inhibited PV interneurons is larger. As noted by the Reviewer, in the model we initially considered the PV interneurons responding only to CS+ (identified as “CS” in our manuscript). For the current revision, we ran new simulations in which the PV interneuron receives the US input, instead of CS+. It turned out that this did not affect the results, as shown in the figure below: all the network realizations learn the association between CS and fear. In the model, the PING rhythm between PV and F is the crucial component for establishing fine timing between ECS and F, which is necessary for learning. Having PV responding to the same input as F, i.e., US, facilitates their entrainment in PING and, thus, successful learning.
  
  As for afferents of VIP and SOM from upstream regions, (Krabbe et al., 2019) reports that “[…] BLA SOM interneurons receive a different array of afferent innervation compared to that of VIP and PV interneurons, which might contribute to the differential activity patterns observed during fear learning.” Thus, in the model, we are agnostic about inputs to SOM interneurons; we modeled them to fire spontaneously at high theta.
  
  To address these points in the manuscript, we added some new text in what follows:
  
  (1) New Section “An alternative network configuration characterized by US input to PV, instead of CS, also learns the association between CS and fear” in the Supplementary information:
  
  “We constrained the BLA network in Fig. 2 with CS input to the PV interneuron, as reported in (Krabbe et al., 2018). However, (Krabbe et al., 2019) notes that a class of PV interneurons may be responding to US rather than CS. Fig. S3 presents the results obtained with this variation in the model (see Fig. 3 A,B for comparison) and shows that all the network realizations learn the association between CS and fear. In the model, the PING rhythm between PV and F is the crucial component for establishing fine timing between ECS and F, which is necessary for learning. Having PV responding to the same input as F, i.e., US, facilitates their entrainment in PING and, thus, successful fear learning.
  
  We model the VIP interneuron as affected by US; in addition, (Krabbe et al. 2019) reports that a substantial proportion of them is mildly activated by CS. Replacing the US by CS does not change the input to VIP cells, which is modeled by the same constant applied current. Thus, the VIP CS-induced activity is a bursting activity at low theta, similar to the one elicited by US in Fig. 2.”
  
  (2) Section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning” in Results: “Finally, since (Krabbe et al., 2019) reported that a fraction of PV interneurons are affected by US, we have also run the simulations for single neuron network with the PV interneuron affected by US instead of CS. In this case as well, all the network realizations are learners (see Fig. S3). ”
  
  (3) Section “Conditioned and unconditioned stimuli” in Materials and Methods: “To make Fig. S3, we also considered a variation of the model with PV interneurons affected by US, instead of CS, as reported in (Krabbe et al. 2019).”
  
  b. Re the SOM to PV connection: As reported in the reply to the public reviews, we considered the prominent functional connections reported in (Krabbe et al., 2019), instead of structural connections. That is, we included only those connections for which there was strong functional connectivity. For example, the SOM to PV connection is shown to be small (Supp. Fig. 4, panel t, in (Krabbe et al., 2019)). We also omitted PV to SOM, PV to VIP, SOM to VIP, and VIP to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning.
  
  In order to clarify this point, in Section “Network connectivity and synaptic currents” in Materials and Methods, we now say:
  
  “We modeled the network connectivity as presented in Fig. 2B, derived from the prominent functional, instead of structural, connections reported in (Krabbe et al., 2019).”
  
  c. Re the ECS to PV synapses: We thank the Reviewer for the reference provided; as the Reviewer says, the ECS to PV synapses are not included. Upon adding this connection in our network, we found that, unlike the connection suggested in part a above, introducing these synapses would, in fact, change the outcome. Thus, the omission of this connection must be considered an implied hypothesis. Including those synapses with a significant strength would alter the PING rhythm created by the interactions between F and PV, which is crucial for ECS and F fine timing. Thanks very much for showing us that this needs to be said. Our hypothesis does not contradict the dense connections mentioned by the Reviewer; such dense connectivity does not mean that all pyramidal cells connect to all interneurons. This hypothesis may be taken as a prediction of the model.
  
  The absence of this connection is now discussed at the end of a new Section of the Discussion entitled “Assumptions and predictions of the model”, which reads as follows:
  
  “Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for ECS and F fine timing. We note that in (Woodruff and Sah, 2007) only 38% of the pyramidal cells are connected to PV cells. The functional identity of the connected pyramidal cells is unknown. Our model suggests that successful fear conditioning requires F to PV connections and that ECS to PV must be weak or absent.”
  
  (2) Krabbe et al. 2019 and Davis et al. 2017 were referenced for the construction of the conditioned and unconditioned stimulus pairing protocol. The Davis citation is not applicable here because that study was a contextual, not cued, fear conditioning paradigm. Regarding Krabbe, the pairing protocol was radically different from what the authors used. Their conditioned stimulus was a train of tone pips presented at 0.9 Hz, which lasted 30 s, after which the unconditioned stimulus was presented after tone offset. The authors should determine how their network behaves when this protocol is used. Also, note that basolateral amygdala responses to tone stimuli are primarily brief onset responses (e.g. Quirk, Armony, and LeDoux 1997), and not the tonic activation used in the model.
  
  We replied to this point in our responses to the Reviewer’s Public Comments as follows:
  
  “We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.”
  
  Current answer to the Reviewer:
  
  There are several distinct issues raised by the Reviewer in the more detailed critique. We respectfully disagree that the model is not applicable to context-dependent fear learning where the context acts as a CS, though we should have been more explicit. Specifically, our CS input can describe both the cue and the context. We included the following text in the Results section “Interneuron rhythms provide the fine timing needed for depression-dominated STDP to make the association between CS and fear”:
  
  “In our simulations, the CS input describes either the context or the cue in contextual and cued fear conditioning, respectively. For the context, the input may come from the hippocampus or other non-sensory regions, but this does not affect its role as input in the model.”
  
  The second major issue is whether the specific training protocols used in the cited papers need to be exactly reproduced in the signals received by the elements of our model; we note that there are many transformations that can occur between the sensory input and the signals received by the BLA. In the case of auditory fear conditioning, a series of pips, rather than individual pips, are considered the CS (e.g., (Stujenske et al., 2014; Krabbe et al. 2019)). Our understanding is that a single pip does not elicit a fear response; a series of pips is required for fear learning. This indicates that it is not the neural code of a single pip that matters, but rather the signal entering the amygdala that incorporates any history-dependent signaling that could lead to spiking throughout the sequence of pips. Also, as mentioned above, intense inputs at frequencies about 6kHz and 12kHz can lead to metabotropic effects that last much longer than each brief pip (~200 ms), thus possibly producing continuous activity in neurons encoding the input. Thus, we believe that our use of the Poisson spike train is reasonable.
  
  However, we are aware that the activity of neurons encoding CS can be modulated by the pips: neurons encoding auditory CS display a higher firing rate when each pip is presented and a Poisson-like spike train between pips (Herry et al., Journal of Neuroscience, 2007). Here we confirm that potentiation is present even in the presence of the fast transient response elicited by the pips. We said in the original manuscript that there is learning for a Poisson spike train CS input at ~50 Hz; this describes the neuronal activity in between pips. For the revision, we asked whether learning is preserved when CS is characterized by higher frequencies, which would describe the CS during and right after each pip. We show in the new Fig. S4 that potentiation is ensured for a range of CS frequencies. The figure shows the learning speed as a function of CS and US frequencies. For all the CS frequencies considered, i) there is learning, ii) learning speed increases with CS frequency. Thus, potentiation is present even when pips elicit a faster transient response.
  
  To better specify this in the manuscript, we added the following sentences in the Results section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”:
  
  “We note that the CS and US inputs modeled as independent Poisson spike trains represent stimuli with no structure. Although we have not explicitly modeled pulsating pips, as common in auditory fear conditioning (e.g., (Stujenske 2014; Krabbe 2019)), we show in Fig. S4 that potentiation can be achieved over a relatively wide range of gamma frequencies. This indicates that overall potentiation is ensured if the gamma frequency transiently increases after the pip.”
  
  We added the section “The full network potentiates for a range of CS frequencies” and Fig. S4 in the Supplementary Information.
  
  We included in Materials and Methods “Conditioned and unconditioned stimuli” the following sentences:
  
  “Finally, for Fig. S4, we considered a range of frequencies for the CS stimulus. To generate the three Poisson spike trains with average frequencies from 48 to 64 Hz in Fig. S4, we set 𝜆 = 800, 1000, 1200.”
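
  As an illustration only, the following minimal sketch generates unstructured Poisson spike trains of the kind described above. It is not the code used for the manuscript: the mapping from 𝜆 to firing rate is not restated here, so the sketch is parametrized directly by the target rate, and the function name `poisson_spike_train`, the intermediate 56 Hz rate, and the 200 s duration are assumptions.

```python
import numpy as np

def poisson_spike_train(rate_hz, duration_s, rng):
    """Return sorted spike times (s) of a homogeneous Poisson process."""
    # The spike count in the window is Poisson-distributed with mean rate * duration.
    n_spikes = rng.poisson(rate_hz * duration_s)
    # Given the count, spike times are independent and uniform over the window.
    return np.sort(rng.uniform(0.0, duration_s, size=n_spikes))

rng = np.random.default_rng(seed=0)
duration_s = 200.0                 # assumed duration, long enough for stable rate estimates
target_rates = (48.0, 56.0, 64.0)  # 48 and 64 Hz span the stated range; 56 Hz is assumed

trains = {rate: poisson_spike_train(rate, duration_s, rng) for rate in target_rates}
mean_rates = {rate: len(times) / duration_s for rate, times in trains.items()}

# Coefficient of variation of inter-spike intervals; close to 1 for a Poisson process.
isi_cv = {rate: float(np.std(np.diff(times)) / np.mean(np.diff(times)))
          for rate, times in trains.items()}
```

  A train generated this way carries no temporal structure beyond its mean rate, which is the property the response relies on when using it as the CS input.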
  
  Finally, to address the comment about the need for CS and US overlapping in time to instantiate fear association, we added the following text in the Results section “Assumptions and predictions of the model”:
  
  “Finally, our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Although paradigms involving both overlapping (delay conditioning, where US co-terminates with CS (e.g., (Lindquist et al., 2004)), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs exist, we hypothesized that concomitant activity in CS- and US-encoding neuron activity should be crucial in both cases. This may be mediated by the memory effect due to metabotropic effects (Whittington et al., Nature, 1995) as suggested above, or by the contribution from other brain regions (see section “Involvement of other brain structures” in the Discussion). The fact that plasticity occurs with US memory trace is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature.”
  
  (3) As best as I could tell, only a single training trial was used in this study. Fair enough, especially given that fear learning can occur with a single trial. However, most studies of amygdala fear conditioning have multiple trials (~5 or more). How does the model perform when multiple trials are given?
  
  The association between CS and fear acquired after one trial, i.e., through a potentiated ECS to F connection, is preserved in the presence of multiple trials. Indeed, the association would be weakened or erased (through depression of the ECS to F connection) only if ECS and F did not display good fine timing, i.e., F does not fire right after ECS most of the time. However, the implemented circuit supports the role of interneurons in providing the correct fine timing, thus preventing the association acquired from being erased.
  
  In the second paragraph of the Results section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”, we made the above point by adding the following text:
  
  “We note that once the association between CS and fear is acquired, subsequent presentations of CS and US do not weaken or erase it: the interneurons ensure the correct timing and pauses in ECS and F activity, which are conducive for potentiation.”
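
  The role of fine timing under a depression-dominated rule can be illustrated with a generic pair-based STDP sketch. This is not the plasticity rule or the parameter set of the manuscript: the exponential window, the amplitudes and time constants, the all-to-all pairing, and the reading of "depression-dominated" as a window with negative total area are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical parameters, chosen so that depression dominates the window area.
A_PLUS, TAU_PLUS = 1.0, 10.0    # potentiation amplitude (a.u.) and time constant (ms)
A_MINUS, TAU_MINUS = 0.6, 30.0  # depression amplitude (a.u.) and time constant (ms)

def stdp_window(dt_ms):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms."""
    dt_ms = np.asarray(dt_ms, dtype=float)
    ltp = A_PLUS * np.exp(-np.abs(dt_ms) / TAU_PLUS)     # post fires after pre
    ltd = -A_MINUS * np.exp(-np.abs(dt_ms) / TAU_MINUS)  # post fires before pre
    return np.where(dt_ms > 0, ltp, np.where(dt_ms < 0, ltd, 0.0))

def total_weight_change(pre_ms, post_ms):
    """Sum the window over every pre/post spike pair (all-to-all pairing)."""
    dt = np.subtract.outer(np.asarray(post_ms, float), np.asarray(pre_ms, float))
    return float(stdp_window(dt).sum())

# Area under the window: negative means uncorrelated firing depresses on average.
window_area = A_PLUS * TAU_PLUS - A_MINUS * TAU_MINUS

# Sparse presynaptic spikes separated by long pauses (one spike every 200 ms).
pre_spikes = np.arange(0.0, 2000.0, 200.0)

# Postsynaptic cell fires 3 ms after each presynaptic spike: fine timing is respected.
dw_locked = total_weight_change(pre_spikes, pre_spikes + 3.0)
# Postsynaptic cell fires 3 ms before each presynaptic spike: timing is reversed.
dw_reversed = total_weight_change(pre_spikes, pre_spikes - 3.0)
```

  In this toy setting the synapse potentiates only when the postsynaptic spike reliably follows the presynaptic one and the pauses keep distant pairs out of the depression window, which is the qualitative point made in the response; a synapse that is already potentiated keeps receiving the same positive drive on later presentations.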
  
  (4) The LFP calculations are problematic. First, it is unclear how they were done. Did the authors just take the transmembrane currents they included and sum them, or were they scaled by distance from the 'electrode' and extracellular conductivity (as one would derive from the Laplace equation)? Presumably, the spatial arrangement of model neurons was neglected so distance was not a factor.
  
  Second, if this is the case, then the argument for excluding GABAergic conductances seems flawed. If the spatial arrangement of neurons is relevant to whether to include or exclude GABAergic conductances, then wouldn't a simulation without any spatial structure not be subject to the concern of laminar vs. nuclear arrangement?
  
  Moreover, to the best I can tell, the literature the authors use to justify the exclusion of GABAergic currents does not make the case for a lack of GABAergic contribution in non-laminar structures. Instead, those studies only argue that in a non-laminar structure, AMPA currents are detectable, not that GABA cannot be detected. Thus, the authors should either include the GABAergic currents when calculating their simulated LFP, or provide a substantially better argument or citation for their exclusion.
  
  We thank the Reviewer for pointing this out; this comment helped us rethink how to model the LFP. The origin of the LFP signal in BLA has not been fully determined, but factors thought to be important include differences in the spatial extension of the arborization in excitatory and inhibitory neurons, in the number of synaptic boutons, and spatial distributions of somata and synapses (Lindén et al 2011; Łęski 2013; Mazzoni et al. 2015). In the first version of the manuscript, we excluded the GABAergic currents because it is typically assumed that they add very little to the extracellular field as the inhibitory reversal potential is close to the resting membrane potential. For the revision, we re-ran the simulations during pre and post fear conditioning and we modeled the LFP as the sum of the AMPA, GABA and NaP-/H-/D- currents. With this new version of the LFP, we added a new Fig. 6 showing that there is a significant increase in the low theta power, but not in the high theta power, with fear learning (Fig. 6 C, D, E). This increase in the low theta power was mainly due to the AMPA currents created by the newly established connection from ECS to F, which allowed F to be active after fear conditioning in response to CS.
  
  However, as the Reviewer mentioned, our network has no spatial extent: neurons are modeled as point cells. Thus, our current model does not include the features necessary to model some central aspects of the LFP. Despite that, our model does clearly demonstrate how rhythmic activity in the spike timing of neurons within the network changes due to fear learning (Fig. 6B). The spiking outputs of the network are key components of the inputs to the LFP, and thus we expect the rhythms in the spiking to be reflected in more complex descriptions of the LFP. But we also discovered that different LFP proxies provide different changes in rhythmic activity comparing pre- and post-fear learning; although we have no principled way to choose a LFP proxy, we believe that the rhythmic firing is the essential finding of the model.
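
  For readers who want to see what such a proxy amounts to computationally, the sketch below sums current traces linearly and compares band power before and after a change in one component. It is a toy example, not the analysis code of the manuscript: the sinusoidal traces, the sampling rate, the function names, and the 3-6 Hz and 6-12 Hz band limits for low and high theta are all assumptions.

```python
import numpy as np

def lfp_proxy(currents):
    """Linear sum of equally long current traces given as {name: 1-D array}."""
    return np.sum(np.stack(list(currents.values())), axis=0)

def power_spectrum(signal, fs_hz):
    """One-sided periodogram of the mean-subtracted signal."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs_hz)
    power = np.abs(np.fft.rfft(signal)) ** 2 / (fs_hz * signal.size)
    return freqs, power

def band_power(freqs, power, lo_hz, hi_hz):
    """Power summed over the frequency bins with lo_hz <= f < hi_hz."""
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return float(power[mask].sum() * (freqs[1] - freqs[0]))

fs_hz = 1000.0
t = np.arange(10_000) / fs_hz  # 10 s of samples

# Toy stand-ins for simulated currents (amplitudes are arbitrary).
pre = {
    "AMPA": 0.2 * np.sin(2 * np.pi * 4.0 * t),        # weak low-theta component
    "GABA": 1.0 * np.sin(2 * np.pi * 8.0 * t),        # high-theta component
    "intrinsic": 0.1 * np.sin(2 * np.pi * 40.0 * t),  # gamma-range component
}
# After "conditioning", only the AMPA trace is stronger in this toy example.
post = dict(pre, AMPA=1.0 * np.sin(2 * np.pi * 4.0 * t))

LOW_THETA, HIGH_THETA = (3.0, 6.0), (6.0, 12.0)  # assumed band limits in Hz
f_pre, p_pre = power_spectrum(lfp_proxy(pre), fs_hz)
f_post, p_post = power_spectrum(lfp_proxy(post), fs_hz)

low_ratio = band_power(f_post, p_post, *LOW_THETA) / band_power(f_pre, p_pre, *LOW_THETA)
high_ratio = band_power(f_post, p_post, *HIGH_THETA) / band_power(f_pre, p_pre, *HIGH_THETA)
```

  Because the proxy is a plain sum, a different choice of which currents to include changes the spectrum directly, which is why the response treats the spiking rhythms, rather than any one proxy, as the essential finding.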
  
  We have added the following to the manuscript:
  
  (1) In the new version of Fig. 6, we present the power spectra of the network spiking activity (panel B), along with the power spectra of the LFP proxy that includes the GABA, AMPA, and NaP-/H-/D- currents (panels C, D, E).
  
  (2) We modified the conclusion of the Results section entitled “Increased low-theta frequency is a biomarker of fear learning” by saying:
  
  “In this section, we explore how plasticity in the fear circuit affects the network dynamics, comparing after fear conditioning to before. We first show that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also show that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase and no significant variation in the high theta power (Fig. 6 C,D,E). These results reproduce the experimental findings in (Davis et al., 2017), and Fig. 6 F,G show that the low theta increase is due to added excitation provided by the new learned pathway. The additional unresponsive ECS and F cells in the network were included to ensure we had not biased the LFP towards excitation. Nevertheless, although both the AMPA and GABA currents contribute to the power increase in the low theta frequency range (Fig. 6F), the AMPA currents show a dramatic power increase relative to the baseline (the average power ratio of AMPA and GABA post- vs pre-conditioning across 20 network realizations is 3 × 10³ and 4.6, respectively). This points to the AMPA currents as the major contributor to the low theta power increase. Specifically, the newly potentiated AMPA synapse from ECS to F ensures F is active after fear conditioning, thus generating strong currents in the PV cells to which it has strong connections (Fig. 6G). Finally, the increase in power is in the low theta range because ECS and F are allowed to spike only during the active phase of the low theta spiking VIP neurons. We have also explored another proxy for the LFP (see Supplementary Information and Fig. S6).”
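
As a rough illustration of the post- vs pre-conditioning power ratio described in the quoted passage, the sketch below computes band power from a plain discrete Fourier transform and compares two synthetic signals. The band edges (2-6 Hz for low theta, 6-12 Hz for high theta), the sampling rate, and the signal amplitudes are illustrative assumptions, not values taken from the manuscript, and `band_power` is a hypothetical helper rather than the authors' analysis code.

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Summed squared DFT magnitude for frequencies in [f_lo, f_hi) Hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
            power += (re * re + im * im) / n ** 2
    return power

def synthetic_lfp(amp_4hz, amp_8hz, fs=100, seconds=2):
    """Toy signal: one 4 Hz and one 8 Hz sinusoid (assumed stand-ins for low/high theta)."""
    n = fs * seconds
    return [amp_4hz * math.sin(2 * math.pi * 4 * t / fs)
            + amp_8hz * math.sin(2 * math.pi * 8 * t / fs) for t in range(n)]

FS = 100                      # Hz, assumed sampling rate
LOW_THETA = (2.0, 6.0)        # Hz, assumed band edges
HIGH_THETA = (6.0, 12.0)      # Hz, assumed band edges

pre = synthetic_lfp(amp_4hz=0.1, amp_8hz=1.0, fs=FS)
post = synthetic_lfp(amp_4hz=1.0, amp_8hz=1.0, fs=FS)

low_ratio = band_power(post, FS, *LOW_THETA) / band_power(pre, FS, *LOW_THETA)
high_ratio = band_power(post, FS, *HIGH_THETA) / band_power(pre, FS, *HIGH_THETA)
```

In this toy case only the 4 Hz component grows between the two signals, so the low-band ratio is large while the high-band ratio stays near one, which is the qualitative pattern the passage reports.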
  
  In the Supplementary Information, we included a figure and some text in the new section entitled “A higher low theta power increase emerges in LFP approximated with the sum of the absolute values of the currents compared to their linear sum”:
  
  “Given that our BLA network comprises a few neurons described as single-compartment cells with no spatial extension and location, the LFP cannot be computed directly from our model’s read-outs. In the main text, we choose as an LFP proxy the linear sum of the AMPA, GABA, and NaP-/H-/D-currents. We note that if the LFP is modeled as the sum of the absolute value of the currents, as suggested by (Mazzoni et al. 2008; Mazzoni et al. 2015), an even higher low theta power increase arises after fear conditioning compared to the linear sum. Differences in the power spectra also arise if other LFP proxies (e.g., only AMPA currents, only GABA currents) are considered. A principled description of an LFP proxy would require modeling the three-dimensional BLA anatomy, including that of the interneurons VIP and SOM; this is outside the scope of the current paper. (See (Feng et al. 2019) for a related project in the BLA.)”
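
The difference between the two proxies named in the quoted passage can be made concrete with a minimal sketch. The current values and the sign convention (inward AMPA currents negative, outward GABA currents positive) are assumptions chosen for illustration; the function names are hypothetical and do not come from the authors' code.

```python
def lfp_linear_sum(currents):
    """Signed sum across current types at each time step."""
    return [sum(step) for step in zip(*currents.values())]

def lfp_abs_sum(currents):
    """Sum of absolute values across current types at each time step."""
    return [sum(abs(v) for v in step) for step in zip(*currents.values())]

# Toy per-time-step currents (arbitrary units, assumed sign convention).
currents = {
    "AMPA": [-1.0, -2.0, -1.0],
    "GABA": [1.0, 1.5, 0.5],
    "intrinsic": [0.1, -0.1, 0.0],  # stand-in for the NaP-/H-/D- currents
}

linear = lfp_linear_sum(currents)
absolute = lfp_abs_sum(currents)
```

Because currents of opposite sign partly cancel in the signed sum but add in the absolute-value sum, the second proxy is never smaller in magnitude than the first at any time step. This is one reason the two proxies can give different power spectra.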
  
  (3) We updated the Materials and Methods section “Local field potentials and spectral analysis” to explain how we compute the LFP in the revised manuscript:
  
  “We considered as an LFP proxy the linear sum of all the AMPA, GABA, NaP, D, and H currents in the network. The D-current is in the VIP interneurons, and NaP-current and H-current are in SOM interneurons.”
  
  Although it is beyond the scope of the current work, an exploration of the most accurate proxy of the LFP in the amygdala is warranted. Such a study could be accomplished by adopting an approach similar to that in (Mazzoni et al., 2015), where several LFP proxies based on a point-neuron leaky integrate-and-fire network were compared with a “ground-truth” LFP obtained in an analogous realistic three-dimensional network model.
  
  To explicitly mention this issue in the paper, we add a paragraph in the “Limitations and caveats” section in the Discussion, which reads as follows:
  
  “LFPs recorded in the experiments are thought to be mainly created by transmembrane currents in neurons located around the electrode and depend on several factors, including the morphology of the arborization of contributing neurons and the location of AMPA and GABA boutons (Katzner et al. 2009; Lindén et al 2011; Łęski 2013; Mazzoni et al. 2015). Since our model has no spatial extension, we used an LFP proxy; this proxy was shown to reflect the rhythmic output of the network, which we believe to be the essential result (for more details see Results “Increased low-theta frequency is a biomarker of fear learning”, and Supplementary Information “A higher low theta power increase emerges in LFP approximated with the sum of the absolute values of the currents compared to their linear sum”).”
  
  (4) We have removed the section “Plasticity between fear neuron and VIP slows down overall potentiation” in Results and sections “Plasticity between the fear neuron (F) and VIP slows down overall potentiation” and “Plastic F to VIP connections further increase low-theta frequency power after fear conditioning” in the Supplementary Information. This material is extraneous since we are using a new proxy for LFP.
  
  Minor points:
  
  (1) In Figure 3C, the y-axis tick label for 0.037 is written as "0.37."
  
  We thank the reviewer for finding this typo; we fixed it.
  
  (2) Figure 5B is unclear. It seems to suggest that the added ECS and F neurons did not respond to either the CS or UCS. Is this true? If so, why include them in the model? How would their inclusion change the model behavior?
  
  It is correct that the added ECS and F neurons did not respond to the CS or US (UCS); they are constructed to be firing at 11 Hz in the absence of any connections from other cells. These cells were included to be part of our computation of the LFP. Specifically, adding in those cells would make the LFP take inhibition into account more, and we wanted to make sure that we were not biasing our computation away from the effects of inhibition. As shown in the paper (Fig. 6B), even with inhibition onto these non-responsive cells, the LFP has the properties claimed in the paper concerning the changes in the low theta and high theta power, because the LFP is dominated by new excitation rather than the inhibition.
  
  First, in the Results section “Network with multiple heterogeneous neurons can establish the association between CS and fear”, we commented on the added ECS and F neurons that do not respond to either CS or US by saying the following:
  
  “The ECS cells not receiving CS are inhibited by ongoing PV activity during the disinhibition window (Fig. 5B); they are constructed to be firing at 11 Hz in the absence of any connections from other cells. The lack of activity in those cells during fear conditioning implies that there is no plasticity from those ECS cells to the active F. Those cells are included for the calculation of the LFP (see below in “Increased low-theta frequency is a biomarker of fear learning”.)”
  
  Furthermore, we add the following sentence in the Results section “Increased low-theta frequency is a biomarker of fear learning”:
  
  “The additional unresponsive ECS and F cells in the network were included to ensure we had not biased the LFP towards excitation.”
  
  (3) Applied currents are given as current densities, but these are difficult to compare with current levels observed from whole-cell patch clamp recordings. Can the currents be given as absolute levels, in pA/nA?
  
  In principle, it is possible to connect current densities with absolute levels, as requested. However, we note that the number of cells in models is orders of magnitude smaller than the number being modeled. It is common in modeling to adjust physiological parameters to achieve the qualitative properties that are important to the model, rather than trying to exactly match particular recordings.
  
  We added to the Methods description why we choose units per unit area, rather than absolute units.
  
  “All the currents are expressed in units per area, rather than absolute units, to avoid making assumptions about the size of the neuron surface.”
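
To show which assumption the quoted Methods sentence avoids, the sketch below converts a current density into an absolute current for an assumed membrane area. The units (µA/cm²), the spherical-soma geometry, and the 20 µm diameter are all illustrative assumptions that the manuscript deliberately does not make; the helper names are hypothetical.

```python
import math

def sphere_area_cm2(diameter_um):
    """Surface area of a sphere, with the diameter given in micrometres."""
    radius_cm = (diameter_um / 2) * 1e-4   # 1 µm = 1e-4 cm
    return 4 * math.pi * radius_cm ** 2

def density_to_pa(density_ua_per_cm2, area_cm2):
    """Convert a current density (µA/cm²) to an absolute current in pA."""
    return density_ua_per_cm2 * area_cm2 * 1e6   # 1 µA = 1e6 pA

area = sphere_area_cm2(20.0)               # assumed 20 µm spherical soma
current_pa = density_to_pa(1.0, area)      # 1 µA/cm² over that area
current_pa_large = density_to_pa(1.0, sphere_area_cm2(40.0))
```

The same density maps to roughly 12.6 pA for the 20 µm sphere and to four times that for a 40 µm sphere, so any absolute value quoted in pA would depend directly on the assumed surface area.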
  
  (4) Regarding: "We note that the presence of SOM cells is crucial for plasticity in our model since they help to produce the necessary pauses in the excitatory projection cell activity. However, the high theta rhythm they produce is not crucial to the plasticity: in our model, high theta or higher frequency rhythms in SOM cells are all conducive to associative fear learning. This opens the possibility that the high theta rhythm in the BLA mostly originates in the prefrontal cortex and/or the hippocampus (Stujenske et al., 2014, 2022)." The chain of reasoning in the above statement is unclear. The second sentence seems to be saying contradictory things.
  
  We agree that the sentence was confusing; thank you for pointing it out. We have revised the paragraph to make our point clearer. The central points are: 1) having the SOM cells in the BLA is critical to the plasticity in the model, and 2) these cells may or may not be the source of the high theta observed in the BLA during fear learning.
  
  We deleted from the discussion the text reported by the Reviewer, and we added the following one to make this point clearer:
  
  “We note that the presence of SOM cells is crucial for plasticity in our model since they help to produce the necessary pauses in the excitatory projection cell activity. The BLA SOM cells do not necessarily have to be the only source of the high theta observed in the BLA during fear learning; the high theta detected in the LFP of the BLA also originates from the prefrontal cortex and/or the hippocampus (Stujenske et al., 2014, 2022).”
  
  (5) Regarding: "This suggests low theta power change is not just an epiphenomenon but rather a biomarker of successful fear conditioning." Not sure this is the right framing for the above statement. The power of the theta signal in the LFP reflects the strengthening of connections, but it itself does not have an impact on network activity. Moreover, whether something is epiphenomenal is not relevant to the question of whether it can serve as a successful biomarker. A biomarker just needs to be indicative, not causal.
  
  We intended to say why the low theta power change is a biomarker in the sense of the Reviewer. That is: experiments have shown that, with learning, the low theta power increases. The modeling shows in addition that, when learning does not take place, the low theta power does not increase. That means that the low theta power increases if and only if there is learning, i.e., the change in low theta power is a biomarker. To make our meaning clearer, we have changed the quoted sentences to read:
  
  “This suggests that the low theta power change is a biomarker of successful fear conditioning: it occurs when there is learning and does not occur when there is no learning.”
  
  Reviewer #2 (Public Comments):
  
  We thank the Reviewer for raising these interesting points. Below are our public replies and the changes we made to the manuscript to address the Reviewer’s objections.
  
  (1) Gamma oscillations are generated locally; thus, it is appropriate to model in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas therefore local circuits may not be sufficient to model these oscillations.
  
  Moreover, to generate the classical theta, a laminar structural arrangement is needed (where neurons form layers like in the hippocampus and cortex) (Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume conducted theta rhythm generated e.g., in the hippocampus or other layered cortical structure. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Though, there can be BLA neurons, firing of which shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA as the structure of the BLA does not support generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function which does not necessarily reflect the population activity of local principal neurons in contrast to that seen in the hippocampus.
  
  In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion by looking at ex vivo mouse slices. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta from non-laminated structures (Dudman et al., 2009, Kispersky et al., 2010, Chartove et al. 2020). We are not aware of any model description of the mechanisms of theta that does require layers.
  
  We added the following text in the introduction of the manuscript to make this point clearer: “A recent rodent experimental study (Antonoudiou et al. 2022) suggests that BLA can intrinsically generate theta oscillations (3-12 Hz).”
  
  (2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but is volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.
  
  Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that other rhythms with the same frequencies are driven by respiration. Indeed, in the response to question 1 above, we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP neurons. Using intrinsic currents known to exist in VIP neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band.
  
  To elaborate more on this in the manuscript, we added the following new section in the discussion:
  
  “Where the rhythms originate, and by what mechanisms. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. Our model also supports the idea that intrinsic mechanisms in the BLA can support the generation of the low theta, high theta, and gamma rhythms.
  
  Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by the BLA LFP because of volume conduction or through the BLA’s extensive communications with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to be understood how these other rhythms may interact with the ones described in our paper.”
  
  We also note that the presence of D-currents in the BLA VIP interneurons should be confirmed experimentally, and that the ability of VIP interneurons to generate the BLA low theta rhythm constitutes a prediction of our computational model. These points are specified in the first paragraph in the Discussion entitled “Assumptions and predictions of the model”:
  
  “The interneuron descriptions in the model were constrained by the electrophysiological properties reported in response to hyperpolarizing currents (Sosulina et al., 2010). Specifically, we modeled the three subtypes of VIP, SOM, and PV interneurons displaying bursting behavior, regular spiking with early spike-frequency adaptation, and regular spiking without spike-frequency adaptation, respectively. Focusing on VIP interneurons, we were able to model the bursting behavior by including the D-type potassium current. This current is thought to exist in the VIP interneurons in the cortex (Porter et al., 1998), but whether this current is also found in the VIP interneurons of the BLA is still unknown. Similarly, we endowed the SOM interneurons with NaP- and H-currents, as in the OLM cells in the hippocampus. Due to these currents, the VIP and SOM cells are able to show low- and high-theta oscillations, respectively. The presence of these currents and the neurons’ ability to exhibit oscillations in the theta range during fear conditioning and at baseline in BLA, which are assumptions of our model, should be tested experimentally.”
  
  (3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.
  
  The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.
  
  The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and a description of more specific connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g. cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.
  
  We elaborate more on this in a new section in the Discussion entitled “Assumptions and predictions of the model”. The paragraph related to this point reads as follows:
  
  “Our model, which is a first effort towards a biophysically detailed description of the BLA rhythms and their functions, does not include the neuron morphology, many other cell types, conductances, and connections that are known to exist in the BLA; models such as ours are often called “minimal models” and constitute the majority of biologically detailed models. Such minimal models are used to maximize the insight that can be gained by omitting details whose influence on the answers to the questions addressed in the model are believed not to be qualitatively important. We note that the absence of these omitted features constitutes hypotheses of the model: we hypothesize that the absence of these features does not materially affect the conclusions of the model about the questions we are investigating. Of course, such hypotheses can be refuted by further work showing the importance of some omitted features for these questions and may be critical for other questions. Our results hold when there is some degree of heterogeneity of cells of the same type, showing that homogeneity is not a necessary condition.”
  
  (4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).
  
  A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational works of the amygdala, e.g. (Kim et al., 2016), consider GABA-A reversal potential at -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in amygdala to be a reasonable assumption.
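
The sign argument in the preceding paragraph can be checked with a minimal sketch of the synaptic driving force, using the standard form I = g (V − E_rev). The membrane potential of −65 mV is an assumed illustrative value, not one taken from the manuscript, and the sketch reports only the sign of the current; it does not model shunting or spike threshold, which also bear on whether a synapse is functionally inhibitory.

```python
def synaptic_current(g, v_mv, e_rev_mv):
    """I = g * (V - E_rev); positive values are outward in this convention."""
    return g * (v_mv - e_rev_mv)

def effect_at(v_mv, e_rev_mv):
    """Classify the direction in which the synapse pushes the membrane potential."""
    current = synaptic_current(1.0, v_mv, e_rev_mv)
    if current > 0:
        return "hyperpolarizing"
    if current < 0:
        return "depolarizing"
    return "none"

V_ASSUMED_MV = -65.0  # assumed membrane potential, for illustration only

effects = {e_rev: effect_at(V_ASSUMED_MV, e_rev) for e_rev in (-80.0, -72.0, -54.0)}
```

At the assumed potential, reversal potentials of −80 mV and −72 mV both give an outward, hyperpolarizing current, whereas −54 mV gives an inward, depolarizing one.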
  
  In section “Network connectivity and synaptic currents” in “Materials and Methods” we provided references to motivate our choice of considering a GABA-A reversal potential around -80 mV:
  
  “The GABAa current reversal potential (𝐸!) is set to −80 mV, as is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020).”
  
  (5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.
  
  Other peptides seem to be important in overall modulation of fear, but VIP is especially important in the first part of fear learning, the subject of our paper. Re SST: we hypothesize that SST interneurons are critical in fear extinction and preventing fear generalization, but not to initial fear learning. The peptide of the CCK neurons, which overlap with VIP cells, has been proposed to promote the switch between fear and safety states after fear extinction (Krabbe et al. 2018). Thus, these other peptides are likely more important for other aspects of fear learning.
  
  In the Discussion, we have added:
  
  “We hypothesize that SST peptide is critical in fear extinction and preventing fear generalization, but not to initial fear learning. Also, the CCK peptide has been proposed to promote the switch between fear and safety states after fear extinction (Krabbe et al. 2018).”
  
  Reviewer #2 (Recommendations For The Authors):
  
  We note that Reviewer #2’s Recommendations For The Authors have the same content as the Public Comments. Thus, the changes to the manuscript we implemented above also address the private critiques listed below.
  
  (1) As the breathing-driven rhythm is a global phenomenon accompanying fear state, one might restrict the analysis to this oscillation. The rationale behind this restriction is that the 'high' theta in the BLA has an unknown origin (since it can originate from the ventral hippocampus, piriform cortex etc.).
  
  In response to point 4 made by Reviewer 1 (Recommendations for the Authors) (p. 13), referring to high theta in the BLA, we previously wrote: 1) having the SOM cells in the BLA is critical to the plasticity in the model, and 2) these cells may or may not be the source of the high theta observed in the BLA during fear learning.
  
  In the Public Critiques, Reviewer 2 relates the respiratory rhythm to the low theta. We answered this point in point 2 of the Reviewer’s Public Comments (at p. 15).
  
  (2) I would include more interneurons in the network model incorporating recent findings.
  
  This point was answered in our response to point 3 of the Reviewer’s Public Comments.
  
  (3) The reversal potential for GABA-A receptor-mediated currents would be good to set to measured values. In addition, I would use AMPA conductance values that have been measured in the BLA.
  
  We addressed this objection in our response to point 4 of the Reviewer’s Public Comments.
  
  Reviewer #3 (Public comments):
  
  Weaknesses:
  
  (1) The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry.
  
  (2) Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing.
  
  (3) Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.
  
  We are repeating here the answers we gave in response to the public comments, adding further relevant points.
  
  (1) Our neurons were constrained by electrophysiological properties in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We can reproduce these electrophysiological properties by using specific membrane currents known to be present in similar neurons in other brain regions (D-current in VIP interneurons in the cortex, and NaP- and H-currents in OLM/SOM cells in the hippocampus). Also, though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.
  
  (2) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. Furthermore, it is correct that the model relies on a strong inhibition from SOM and PV to silence the excitatory projection neurons. We agree that the placement of the SOM inhibition on the pyramidal neurons can make a difference on some aspects of the circuit behavior. We are assuming that the inhibition from the SOM cells can inhibit the pyramidal cells firing, which can be seen as a hypothesis of our model. It is well known that VIP cells disinhibit pyramidal cells through inhibition of SOM and PV cells (Krabbe et al. 2019); hence, this hypothesis is generally believed. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.
  
  Re points 1) and 2), in a new paragraph (“Assumptions and predictions of the model”) in the Discussion reported in response to Reviewer #2 (public comments)’s point 3, we stated that modeling requires the omission of many details to bring out the significance of other details.
  
  (3) 40 seconds is the temporal interval we decided to use to present the results. In the Results, we also showed that there is learning over a shorter interval of time (15 seconds) where CS and US/memory of US should both be present. Thus, our model requires 15 seconds over a single or multiple trials for associative learning to be established. We included references to additional experimental papers to support our reasoning in the last paragraph of section “Assumptions and predictions of the model” in the Discussion, also reported in response to Reviewer #1 point 2 (Recommendations for the Authors). We said there that some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity.
  
  The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.
  
  Our model accounts for the multiple rhythms observed in the context of fear learning, as well as the known involvement of multiple kinds of interneurons. We did not explain clearly enough why our complicated model may be functionally important in ways that cannot be fulfilled with a simpler model with the non-depression-dominated Hebbian rule. To explain this, we have added the following to the manuscript Discussion:
  
  “Although fear learning can occur without the depression-dominated rule, we hypothesize that it is necessary for other aspects of fear learning and regulation. That is, in pathological cases, there can be overgeneralization of learning. We hypothesize that the modulation created by the involvement of these interneurons is normally used to prevent such overgeneralization. However, this is beyond the scope of the present paper.”
  
  We have also written an extra paragraph about generalization in the Discussion “Synaptic plasticity in our model”:
  
  “With the classical Hebbian plasticity rule, we show that learning can occur without the involvement of the VIP and SOM cells. Although fear learning can occur without the depression-dominated rule, we hypothesize that the latter is necessary for other aspects of fear learning and regulation. Generalization of learning can be pathological, and we hypothesize that the modulation created by the involvement of VIP and SOM interneurons is normally used to prevent such overgeneralization. However, in some circumstances, it may be desirable to account for many possible threats, and then a classical Hebbian plasticity rule could be useful. We note that the involvement or not of the VIP-SOM circuit has been implicated when there are multiple strategies for solving a task (Piet et al., 2024). In our situation, the nature of the task (including reward structure) may determine whether the learning rule is depression-dominated and therefore whether the VIP-SOM circuit plays an important role.”
  
  Reviewer #3 (Recommendations For The Authors):
  
  We thank the Reviewer for all the recommendations. We replied to each of them below.
  
  In general, there are some inconsistencies in the naming (e.g. sometimes you write PV sometimes PV+,...), please use consistent abbreviations throughout the manuscript. You also introduce some of the abbreviations multiple times.
  
  We modified the manuscript to remove all the inconsistencies in the naming.
  
  Introduction:
  
  - In the last section you speak about one recent study but actually cite two articles.
  
  We removed the reference to (Perrenoud and Cardin, 2023), which is a commentary on the Veit et al. article.
  
  Results:
  
  - 'Brain rhythms are thought to be encoded and propagated largely by interneurons' What do you mean by encoded here?
  
  We agree with the Reviewer that the verb “to encode” is not accurate. We modified the sentence as follows:
  
  “Brain rhythms are thought to be generated and propagated largely by interneurons”.
  
  - The section 'Interneurons interact to modulate fear neuron output' could be clearer. Start with describing the elements of the circuit, then the rhythms in the baseline.
  
  We reorganized the section as follows:
  
  “Interneurons interact to modulate fear neuron output. Our BLA network consists of interneurons, detailed in the previous section, and excitatory projection neurons (Fig. 2A). Both the fear-encoding neuron (F), an excitatory projection neuron, and the VIP interneuron are activated by the noxious stimulus US (Krabbe et al., 2019). As shown in Fig. 2A (top, right), VIP disinhibits F by inhibiting both SOM and PV, as suggested in (Krabbe et al., 2019). We do not include connections from PV to SOM and VIP, nor connections from SOM to PV and VIP, since those connections have been shown to be significantly weaker than the ones included (Krabbe et al., 2019). The simplest network we consider is made of one neuron for each cell type. We introduce a larger network with some heterogeneity in the last two sections of the Results.
  
  Fig. 2A (bottom) shows a typical dynamic of the network before and after the US input onset, with US modeled as a Poisson spike train at ~50 Hz; the network produces all the rhythms originating from the interneurons alone or through their interactions with the excitatory projection neurons (shown in Fig. 1). Specifically, since VIP is active at low theta during both rest and upon the injection of US, it then modulates F at low theta cycles via SOM and PV. In the baseline condition, the VIP interneuron has short gamma bursts nested in low theta rhythm. With US onset, VIP increases its burst duration and the frequency of low theta rhythm. These longer bursts make the SOM cell silent for long periods of each low theta cycle, providing F with windows of disinhibition and contributing to the abrupt increase in activity right after the US onset. Finally, in Fig. 2A, PV lacks any external input and fires only when excited by F. Thanks to their reciprocal interactions, PV forms a PING rhythm with F, as depicted in Fig. 1C.”
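  As an illustration of the US input described above, the following is a minimal sketch of a homogeneous Poisson spike train at 50 Hz. It is not the authors' implementation: the function name, the 40-second duration (borrowed from the interval mentioned elsewhere in this response), and the seed are assumptions made for the example.

```python
import random


def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (in seconds) of a homogeneous Poisson process.

    Hypothetical helper for illustration only; not taken from the model code.
    """
    rng = random.Random(seed)
    t = 0.0
    spikes = []
    while True:
        # Inter-spike intervals of a Poisson process are exponentially
        # distributed with mean 1 / rate.
        t += rng.expovariate(rate_hz)
        if t >= duration_s:
            break
        spikes.append(t)
    return spikes


# Assumed values: ~50 Hz input over a 40 s window.
spikes = poisson_spike_train(rate_hz=50.0, duration_s=40.0, seed=0)
mean_rate = len(spikes) / 40.0
print(f"{len(spikes)} spikes, mean rate {mean_rate:.1f} Hz")
```

  The empirical rate fluctuates around the nominal 50 Hz from run to run, which is why the text describes the input as "~50 Hz".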
  
  - Figure 3C: The lower dashed line has the tick label '0.37' which should read '0.037'.
  
  We fixed it.
  
  - The section describing the network with multiple neurons could be clearer, especially, it is not really clear how these different ECS and F neurons receive their input.
  
  We answered the same objection in the reply to Reviewer #1 in point 2 under “minor issues.”
  
  Discussion:
  
  - The paragraph 'It has also been suggested that ventral tegmental area has a role in fear expression (Lesas et al.,2023). Furthermore, it has been reported that the prelimbic cortex (PL) modulates the BLA SOM cells during fear retrieval, and the latter cells are crucial to discriminate non-threatening cues when desynchronized by the PL inputs (Stujenske et al., 2022).' is merely stating facts but I don't see how they relate to the presented work.
  
  We thank the Reviewer for pointing out that this was confusing. What we meant to emphasize was that later stages of fear conditioning and extinction appear to require more than the BLA. We specifically mention the discrimination of non-threatening cues at the end of the paragraph, which now reads as follows:
  
  “Other brain structures may be involved in later stages of fear responsiveness, such as fear extinction and prevention of generalization. It has been reported that the prelimbic cortex (PL) modulates the BLA SOM cells during fear retrieval, and the latter cells are crucial to discriminate non-threatening cues when desynchronized by the PL inputs (Stujenske et al., 2022). Brain structures such as the prefrontal cortex and hippocampus have been documented to play a crucial role also in fear extinction, the paradigm following fear conditioning aimed at decrementing the conditioned fearful response through repeated presentations of the CS alone. As reported by several studies, fear extinction suppresses the fear memory through the acquisition of a distinct memory, instead of through the erasure of the fear memory itself (Harris et al., 2000; Bouton, 2002; Trouche et al., 2013; Thompson et al., 2018). Davis et al., 2017 found a high theta rhythm following fear extinction that was associated with the suppression of threat in rodents. Our model can be extended to include structures in the prefrontal cortex and the hippocampus to further investigate the role of rhythms in the context of discrimination of non-threatening cues and extinction. We hypothesize that a different population of PV interneurons plays a crucial role in mediating competition between fearful memories, associated with a low theta rhythm, and safety memories, associated with a high theta rhythm; supporting experimental evidence is in (Lucas et al., 2016; Davis et al., 2017; Chen et al., 2022).”
  
  - The comparison to other models BLA is quite short and seems a bit superficial. A more in-depth comparison seems warranted.
  
  We thank the reviewer for suggesting that a more in-depth comparison between our and other models in the literature would improve the manuscript. We rewrote entirely the first paragraph of that section. The new content reads as follows:
  
  “Comparison with other models. Many computational models that study fear conditioning have been proposed in the last years; the list includes biophysically detailed models (e.g., (Li 2009; Kim et al., 2013a)), firing rate models (e.g., Krasne 2011; Ball 2012; Vlachos 2011), and connectionist models (e.g., Moustafa 2013; Armony 1997; Edeline 1992) (for a review see (Nair et al., 2016)). Both firing rate models and connectionist models use an abstract description of the interacting neurons or regions. The omission of biophysical details prevents such models from addressing questions concerning the roles of dynamics and biophysical details in fear conditioning, which is the aim of our model. There are also biophysically detailed models (Li 2009; Kim 2013; Kim 2016; Feng 2019), which differ from ours in both the physiology included in the model and the description of how plastic changes take place. One main difference in the physiology is that we differentiated among types of interneurons, since the fine timing produced for the latter was key to our use of rhythms to produce spike-time dependent plasticity. The origin of the gamma rhythm (but not the other rhythms) was investigated in Feng et al 2019, but none of these papers connected the rhythms to plasticity.
  
  The most interesting difference between our work and that in (Li 2009; Kim 2013; Kim 2016) is the modeling of plasticity. We use spike-time dependent plasticity rules. The models in (Li 2009; Kim 2013; Kim 2016) were more mechanistic about how the plasticity takes place, starting with the known involvement of calcium with plasticity. Using a hypothesis about back propagation of spikes, the set of papers together come up with a theory that is consistent with STDP and other instantiations of plasticity (Shouval 2002a; Shouval 2002b). For the purposes of our paper, this level of detail, though very interesting, was not necessary for our conclusions. By contrast, in order for the rhythms and the interneurons to have the dynamic roles they play in the model, we needed to restrict our STDP rule to ones that are depression-dominated. Our reading of (Shouval 2002) suggests to us that such subrules are possible outcomes of the general theory. Thus, there is no contradiction between the models, just a difference in focus; our focus was on the importance of the much-documented rhythms (Seidenbecher et al., 2003; Courtin et al., 2014b; Stujenske et al., 2014; Davis et al., 2017) in providing the correct spike timing. We showed in the Supplementary Information (“Classical Hebbian plasticity rule, unlike the depression-dominated one, shows potentiation even with no strict pre and postsynaptic spike timing”) that if the STDP rule was not depression dominated, the rhythms need not be necessary. We hypothesize that the necessity of strict timing enforced by the depression-dominated rule may foster the most appropriate association with fear at the expense of less relevant associations.”
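  The contrast between a depression-dominated rule and a classical Hebbian rule can be sketched with a generic pair-based STDP window. This is an illustrative assumption, not the rule or the parameters used in the manuscript: the exponential window shape, the amplitudes, the 20 ms time constants, and the spike times below are all hypothetical. Here "depression-dominated" is taken to mean that the depression amplitude exceeds the potentiation amplitude, so spiking without a consistent pre-before-post order produces net depression.

```python
import math


def stdp_dw(dt_ms, a_plus, a_minus, tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre."""
    if dt_ms > 0:  # pre before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:  # post before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0


def net_dw(pre_ms, post_ms, a_plus, a_minus):
    """Sum the pairwise weight changes over all pre/post spike pairs."""
    return sum(
        stdp_dw(t_post - t_pre, a_plus, a_minus)
        for t_pre in pre_ms
        for t_post in post_ms
    )


# Hypothetical amplitudes chosen only to illustrate the two regimes.
DEPRESSION_DOMINATED = {"a_plus": 1.0, "a_minus": 1.5}
POTENTIATION_DOMINATED = {"a_plus": 1.0, "a_minus": 0.5}

# Strict timing: post follows pre by 5 ms, with long pauses in between.
tight_pre, tight_post = [0.0, 200.0, 400.0], [5.0, 205.0, 405.0]
# No strict timing: pre-before-post and post-before-pre lags are balanced.
loose_pre, loose_post = [0.0, 100.0, 200.0, 300.0], [10.0, 90.0, 210.0, 290.0]

tight_dd = net_dw(tight_pre, tight_post, **DEPRESSION_DOMINATED)
loose_dd = net_dw(loose_pre, loose_post, **DEPRESSION_DOMINATED)
loose_pd = net_dw(loose_pre, loose_post, **POTENTIATION_DOMINATED)
print(tight_dd, loose_dd, loose_pd)
```

  Under these assumptions, strict pre-post timing with pauses gives net potentiation even when depression dominates. The balanced, loosely timed pattern gives net depression under the depression-dominated amplitudes but net potentiation under the potentiation-dominated ones, in line with the qualitative argument in the paragraph above.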
  
  - The paragraph 'This could happen among some cells responding to weaker sensory inputs that do not lead to pre-post timing with fear neurons. This timing could be modified by the "tri-conditional rule", as suggested in (Grewe et al., 2017).' is not very clear. What exactly is 'this' in the first sentence referring to? If you mention the 'tri-conditional rule' here, please briefly explain it and how it would solve the issue at hand here.
  
  We apologize that the sentence reported was not sufficiently clear. “This” refers to “depression”. We meant that, in our model, depression during fear conditioning happens every time there is no pre-post timing between neurons encoding the neutral stimuli and fear cells; poor pre-post timing can characterize the activity of neurons responding to weaker sensory inputs and does not lead to associative learning. We modified that paragraph as follows:
  
  “The study in (Grewe et al., 2017) suggests that associative learning resulting from fear conditioning induces both potentiation and depression among coactive excitatory neurons; coactivity was determined by calcium signaling and thus did not allow measurements of fine timing between spikes. In our model, we show how potentiation between coactive cells occurs when strict pre-post spike timing and appropriate pauses in the spiking activity arise. Depression happens when one or both of these components are not present. Thus, in our model, depression represents the absence of successful fear association and does not take part in the reshaping of the ensemble encoding the association, as instead suggested in (Grewe et al., 2017). A possible follow-up of our work involves investigating how fear ensembles form and modify through fear conditioning and later stages. This follow-up work may involve using a tri-conditional rule, as suggested in (Grewe et al. 2017), in which the potential role of neuromodulators is taken into account in addition to the pre- and postsynaptic neuron activity; this may lead to both potentiation and depression in establishing an associative memory.”
  
  - In the limitations and caveats section you mention that the small size of the network implies that they represent a synchronous population. What are the potential implications for the proposed rhythm-dependent mechanism? What are your expectations for larger networks?
  
  We apologize if we were not adequately clear. We are guessing that the Reviewer thought we meant the entire population was synchronous, which it is not. We meant that, when we use a single cell to represent a subpopulation of cells of that type, that subpopulation is effectively synchronous. For larger networks in which each subtype is represented by many cells, there can be heterogeneity within each subtype. We have shown in the paper that the basic results still hold under some heterogeneity; however, they may fail if the heterogeneity is too large.
  
  We mentioned this in a new section named “Assumptions and predictions of the model”, written in response to point 3 made by Reviewer #2.
  
  - The discussion is also missing a section on predictions/new experiments that can be derived from the model. How can the model be confirmed, what experiments/results would break the model?
  
  To answer this question, we put in a new section in the Discussion entitled “Assumptions and predictions of the model”. The first paragraph of this section is in the reply to Reviewer #2 point 2; the second paragraph is in the reply to Reviewer #2 point 3; the last paragraph is in the Reply to Reviewer #1 point c; the rest of the section reads as follows:
  
  “Our study suggests that all the interneurons are necessary for associative learning provided that the STDP rule is depression-dominated. This prediction could be tested experimentally by selectively silencing each interneuron subtype in the BLA: if the associative learning is hampered by silencing any of the interneuron subtypes, this validates our study. Finally, the model prediction could be tested indirectly by acquiring more information about the plasticity rule involved in the BLA during associative learning. We found that all the interneurons are necessary to establish fear learning only in the case of a depression-dominated rule. This rule ensures that fine timing and pauses are always required for potentiation: interneurons provide both fine timing and pauses to pyramidal cells, making them crucial components of the fear circuit.
  
  The modeling of the interneurons assumes the involvement of various intrinsic currents; the inclusion of those currents can be considered hypotheses of the model. Our model predicts that blockade of D-current in VIP interneurons (or silencing VIP interneurons) will both diminish low theta and prevent fear learning. Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for fine timing between ECS and F needed for LTP.”
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.28.538604v3
www.biorxiv.org

New submission 10/10/2023, 08:57:19

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  A summary of what the authors were trying to achieve.
  
  The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture, including by scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subset, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand by vaccinations. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.
  
  An account of the major strengths and weaknesses of the methods and results.
  
  Strengths
  
  Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.
  
  Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.
  
  Weaknesses
  
  Fig. 3 provides the epitopes, and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one subject out of 4 sustainers expressed many Tfh clonotypes and explained the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, the data on the composition of the T cell subset per subject (all 8 subjects) should be provided.
  
  In accordance with the reviewer’s suggestion, we provided the composition of the T cell subset per subject (all 8 subjects) in the revised manuscript (shown below).
  
  Author response image 1.
  
  S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase a background unrelated to S protein. Another shortcoming of this strategy is the selection of only T cells amenable to cell proliferation. This strategy will miss anergic or less-responsive T cells and thus create a bias in the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.
  
  We thank the reviewer for raising the question related to our experimental strategy. We chose this method because its background unrelated to S protein was lower than that of the widely used AIM methods, as verified by reconstituting many TCRs and testing their responses in vitro. A further reason is that this method identifies S-reactive functional (proliferative) T cell clonotypes rather than anergic or less-responsive T cells, as the reviewer mentioned, which was our objective in this study. In accordance with the reviewer’s suggestion, we have carefully described the limitation and rationale of our experimental strategy in the revised manuscript.
  
  Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.
  
  As the reviewer mentioned, the pre-existing highly expanded clonotypes that we analyzed did not react to HCoV-derived peptides. After we determined the epitopes of the clonotypes, the S peptide sequences were analyzed for homology in HCoVs. The only two clonotypes whose epitope sequences were relatively conserved in HCoV strains (clonotypes #8-pre_9 and #8-pre_10) were tested for their reactivity to the similar HCoV epitope counterparts, but no activation was observed (shown below). We added these data in the revised manuscript.
  
  Author response image 2.
  
  As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.
  
  As the reviewer mentioned, some pre-existing S-reactive T cells might appear to react with S peptides judging from the NFAT-GFP expression of their reporter cell lines. However, the percentage of GFP-expressing cells is affected by many factors, such as the expression levels of the TCR and the HLA molecules. Thus, the affinity of pre-existing S-reactive T cells cannot be fully deduced from the activation of reporter cell lines shown in Fig. S3 of the present manuscript. We thank the reviewer for this constructive suggestion, but for this reason we decided not to use these data quantitatively to evaluate affinity in this manuscript.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  A short-term comparison of durability of S antibody levels after 2-dose vaccination, showing that better or more poorly sustained responses correlate with the presence of Tfh cells.
  
  Strengths:
  
  Novelty of approach in expanding, sequencing and expressing TCRs for functional studies from the implicated populations.
  
  Weaknesses:
  
  Somewhat outdated question, short timeline, small numbers, over-interpretation of sequence homology data
  
  Reviewer #2 (Recommendations For The Authors):
  
  In line with my above comments, it might be useful for the authors to look at moderating some of the assertions in what is a rather small-scale descriptive account of correlates of some quite nuanced, short-term, S antibody response differences
  
  We clearly described that some homologous microbe-derived peptides were indeed recognized by S-reactive T cells. Also, we have removed our overstatement from the revised manuscript.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the paper recruited a cohort of Covid-19 naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then mainly generated three sets of data: 1) Anti-S protein antibody titers at all timepoints. 2) A single-cell RNAseq/TCRseq dataset for divided T cells after stimulation by S-protein for 10 days. 3) Corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) Individuals having a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests Tfh-polarization of S-specific T cells can be a marker to predict the longevity of anti-S antibody. B) S-reactive T cells do exist before the vaccination, but they seem to be unable to respond to Covid-19 vaccination properly.
  
  The paper's strength is it uses a very systemic and thorough strategy trying to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship of Tfh/sustained antibody and about the S-reactive clones that exist before the vaccination. However, the main weakness is these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:
  
  (1) The biggest claim of the paper, which is the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just a UMAP as in Fig2 C, E, F. The paper only shows the pooled result, but it looks like most of the so-called Tfh cells come from a single donor #27. If separating each of the 4 decliners and sustainers and presenting their Tfh% in total CD4+ T cells respectively, will it statistically have a significant difference between those decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn based on proper sample size and statistical analysis.
  
  In accordance with the reviewer’s request, we have also analyzed the T cells separately (shown below). We observed the average frequency was much lower in decliners than sustainers, while the difference did not reach statistical significance partly because of the large deviation due to one sustainer (#27) who possessed quite a high Tfh%. We modified our description in the revised manuscript.
  
  Author response image 3.
  
  (2) The paper does not provide any information to justify its cell annotation as presented in Fig 2B, 4A. Moreover, in my opinion, it is strange to see that there are two clusters of cells sit on both the left and right side of UMAP in Fig2B but both are annotated as CD4 Tcm and Tem. Also Tfh and Treg belong to a same cluster in Fig 2B but they should have very distinct transcriptomes and should be separated nicely. Therefore I believe the paper can be more convincing if it can present more information and discussion about the basis for its cell annotation.
  
  We agree with the reviewer’s concern. Since antigen stimulation only induced the proliferation of antigen-specific T cells, the multiple clusters were mostly due to the fluctuation of cell cycle-related genes. We therefore carefully and manually annotated these clusters by selecting the cell type-related genes (Kaech et al, Nat. Rev. Immunol., 2002; Sallusto et al, Annu Rev Immunol., 2004) and determined their subsets regardless of the automatic clustering based on the whole transcriptome. Indeed, antigen-responded Tfh and Treg are close, as ICOS and PDCD1 are expressed. We mainly used IL21 and FOXP3 to distinguish the Tfh and Treg populations, respectively. We thank the reviewer for pointing out this important process that we carefully addressed. We added the description of annotation methods to the revised manuscript.
  
  (3) Line 103-104, the paper claims that the Tfh cluster likely comes from cTfh cells. However considering the cells have been cultured/stimulated for 10 days, cTfh cells might lose all Tfh features after such culture. To my best knowledge there is no literature to support the notion that cTfh cells after stimulated in vitro for 10 days (also in the presence of IL2, IL7 and IL15), can still retain a Tfh phenotype after 10 days. It is possible that what actually happens is, instead of having more S-specific cTfh cells before the cell culture, the sustainers' PBMC can create an environment that favors the Tfh cell differentiation (such as express more pro-Tfh cytokines/co-stimulations). Thus after 10-days culture, there are more Tfh-like cells detected in the sustainers. The paper may need to include more evidence to support cTfh cells can retain Tfh features after 10-days' culture.
  
  We thank the reviewer for raising this important issue. As the reviewer pointed out, culturing T cells for 10 days indeed changed the repertoire and features, so the Tfh clonotypes we detected after the expansion may not correspond to the cTfh clonotypes in vivo. Because our observation and analysis were mostly based on the dominant T cell clonotypes expanded in vitro, we modified our description and conclusion accordingly in the revised manuscript.
  
  (4) It is in my opinion inaccurate to use cell number in Fig4B to determine whether such clone expands or not, given that the cell number can be affected by many factors like the input number, the stimulation quality and the PBMC sample quality. A more proper analysis should be considered by calculating the relative abundance of each TCR clone in total CD4 T cells in each timepoint.
  
  We thank the reviewer for pointing out our inaccuracy. As the reviewer suggested, we used percentages to demonstrate the relative abundance of each clonotype in Fig. 4B of the revised manuscript.
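  The difference between raw cell number and relative abundance can be made concrete with a short sketch. The clonotype labels and cell counts below are hypothetical and are not data from the study; the function name is likewise an assumption for the example.

```python
from collections import Counter


def clonotype_percentages(cell_clonotypes):
    """Map each clonotype to its percentage of all cells at one timepoint.

    `cell_clonotypes` holds one clonotype label per sequenced cell.
    """
    counts = Counter(cell_clonotypes)
    total = sum(counts.values())
    return {clone: 100.0 * n / total for clone, n in counts.items()}


# Hypothetical example: the post-vaccination sample simply yielded more cells.
pre_cells = ["clone_A"] * 2 + ["clone_B"] * 8
post_cells = ["clone_A"] * 30 + ["clone_B"] * 270

pre_pct = clonotype_percentages(pre_cells)
post_pct = clonotype_percentages(post_cells)
print(pre_pct["clone_A"], post_pct["clone_A"])
```

  In this made-up case the cell count of clone_A rises from 2 to 30, yet its share of the sample falls from 20% to 10%, which is the kind of discrepancy that motivates reporting percentages rather than cell numbers.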
  
  (5) It is well-appreciated to express each TCR in cell line and to determine the epitopes. However, the author needs to make very sure that this analysis is performed correctly because a large body of conclusions of the paper are based on such epitope analysis. However, I notice something strange (maybe I am wrong) but for example, Table 4 donor #8 clonotype post_6 and _7, these two clonotypes have exactly the same TRAV5 and TRAJ5 usage. Because alpha chain don't have a D region, in theory these clonotypes, if have the same VJ usage, they should have the same alpha chain CDR3 sequences, however, in the table they have very different CDR3α aa sequences. I wish the author could double check their analysis and I apologize in advance if I raise such questions based on wrong knowledge.
  
  We thank the reviewer for carefully reading our manuscript. Although the two clonotypes, donor #8 clonotype post_6 and _7, have exactly the same TRAV5 and TRAJ5 usage, they have different CDR3α aa sequences due to random nucleotide addition during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 had the same TRAV9-2 and TRAJ17 usage but different CDR3α sequences.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) Related to my public review 1. To make a solid conclusion, I think the author can include more sustainers and decliners if possible, can just stimulate their PBMCs for 10 days and check the Tfh features in proliferated CD4 T cells (e.g. IL21 secretion, PD-1 expression etc). And then compare these values in sustainers vs decliners
  
  We thank the reviewer for the suggestion. Unfortunately, additional PBMCs from more sustainers and decliners are not available to us. Instead, we carefully described the current observation in the revised manuscript.
  
  (2) Related to my public review 3. The author can attempt to sort CXCR5+ cTfh and CXCR5- non cTfh, stimulate in vitro for 10 days and compare whether the stimulated cTfh still have more Tfh-related features such as increased IL-21 secretion.
  
  As the reviewer recommended, sorting and culturing the cTfh and non cTfh separately will clarify this issue. Due to the limitation of the samples, we could not perform these experiments.
  
  (3) I couldn't find information about the availability of data and code to analyze the single cell RNA-seq dataset in the manuscript
  
  We clarified the availability of data and added the codes for the single cell RNA-seq dataset in the revised manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.06.543529v3
www.biorxiv.org

ProteasomeID: quantitative mapping of proteasome interactomes and substrates for in vitro and in vivo studies

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Specific comments to improve the quality of the work:
  
  (1) The choice of subunits to tag is really not ideal. In the available structures of the human proteasome, the C-terminus of Rpn3/PSMD3 points directly toward the ATPase pore and is likely to disrupt the structure and/or dynamics of the proteasome during proteolysis (see comments regarding controls for functionality below). Similarly, the C-terminal tail of Rpt1/PSMC2 has a key role in the opening of the 20S core particle gate for substrate translocation and processing (see 2018 Nature Communications, 9:1360 and 2018 Cell Reports 24:1301-1315), and Alpha3/PSMA4 can be substituted by a second copy of Alpha4/PSMA7 in some conditions (although tagging Alpha3/PSMA4 would admittedly provide a picture of the canonical proteasome interactome while actively excluding the interactome of the non-canonical proteasomes that form via replacement of Alpha3/PSMA4). Comparison of these cell lines with lines harboring tags on subunits that are commonly used for tagging in the field because of a lack of impacts, such as the N-terminus of Rpn1/PSMD2, the C-terminus of Rpn11/PSMD14, and the C-terminus of Beta4/PSMB2, would help instill confidence that the interactome reported largely arises from mature, functional proteasomes rather than subcomplexes, defective proteasomes, or other species that may occur due to tagging at these positions.
  
  We thank the reviewer for pointing this out. The original purpose of our strategy was to establish proximity labeling of proteasomes to enable applications both in cell culture and in vivo. The choice of PSMA4 and PSMC2 was dictated by previous successful tagging with GFP in mammalian cells (Salomons et al., Exp Cell Res 2010; Bingol and Schuman, Nature 2006). However, the choice of C-terminal PSMC2 might not have been optimal. HEK293 cells overexpressing PSMC2-BirA show slower growth, and the BioID data retrieve higher enrichment of assembly factors, suggesting slower assembly of this fusion protein into the proteasome. Nevertheless, we did not observe a negative impact on overall proteasome activity, and PSMC2-BirA was (at least in part) incorporated into fully assembled proteasomes, as indicated by enrichment of 20S proteins. We apologize for not making it clear that we labeled the N-terminus of PSMD3/Rpn3 and not the C-terminus (Figure 1a and S1a). Therefore, we included in Figure S1a of the revised manuscript structures of the proteasome where the tagged subunit termini are highlighted: C-terminus for PSMA4 and PSMC2 and N-terminus for PSMD3. Additionally, we would like to point out that, unlike PSMC2-BirA, cells expressing BirA-PSMD3 did not show slower growth, and BioID data showed a more homogeneous enrichment of both 19S and 20S proteins, as compared to PSMC2-BirA (Figure 1D and 1E). However, the overall level of enrichment of proteasome subunits was not comparable to that of PSMA4-BirA and, therefore, we opted to focus the rest of the manuscript on the PSMA4-BirA construct.
  
  In support of this point, the data provided in Figure 1E, which show the change in the abundance of each proteasome subunit in the tagged line vs. the BirA control line, demonstrate substantial enrichment of the subcomplexes of the proteasome that are tagged in each case; this effect may represent the known feedback-mediated upregulation of new proteasome subunit synthesis that occurs when proteasomal proteolysis is impaired, or alternatively, the accumulation of subcomplexes containing the tagged subunit that cannot readily incorporate into mature proteasomes. Acknowledging this limitation in the text would be valuable to readers who are less familiar with the proteasome.
  
  We would like to clarify that the data shown in Figure 1E do not represent whole proteome data, but rather log2 fold changes vs. BirA* control calculated on streptavidin enrichment samples. The differences in the enrichment of the various subcomplexes between cell lines derive from the fact that the effect size of the enrichment depends not only on protein abundance in the isolated complexes, but also on the efficiency of biotinylation. The latter will be higher for proteins located in closer proximity to the bait. A similar observation was pointed out in a recent publication (PMID: 36410438) that compared BioID and Co-IP for the same bait. When a component of the nuclear pore complex (Nup158) was analyzed by BioID, only the more proximal proteins were enriched, as compared to the whole complex in Co-IP data (Author response image 1):
  
  Author response image 1.
  
  Proteins identified in the NUP158 BioID or pulldown experiments are filled in red or light red for significance intervals A or B, respectively. The bait protein NUP158 is filled in yellow. Proteins enriched in the pulldown falling outside the SigA/B cutoff are filled in gray. NPC, nuclear pore complex. SigA, significant class A; SigB, significant class B. Reproduced from Figure 6 of PMID: 36410438.
  
  However, we would like to point out that despite quantitative differences between different proteasome subunits, both 19S and 20S proteins were found to be strongly enriched (typically >2 fold) in all the constructs compared to BirA* control line (Figure 1E). This indicates that at least a fraction of all the tagged subunits are incorporated into fully assembled proteasomes.
  
  Regarding the upregulation of proteasome subunits as a consequence of proteasome dysfunction, we did not find evidence of this, at least in the case of PSMA4. The immunoblot shown in Figure 2A and its quantification in S3A indicate no increased abundance of endogenous PSMA4 upon tetracycline induction of PSMA4-BirA*.
  
  (2) The use of myc as a substrate of the proteasome for demonstration that proteolysis is unaffected is perhaps not ideal. Myc is known to be degraded via both ubiquitin-dependent and ubiquitin-independent mechanisms, such that disruption of one means of degradation (e.g., ubiquitin-dependent degradation) via a given tag could potentially be compensated by another. A good example of this is that the C-terminal tagging of PSMC2/Rpt1 is likely to disrupt interaction between the core particle and the regulatory particle (as suggested in Fig. 1D); this may free up the core particle for ubiquitin-independent degradation of myc.
  
  Aside from using specific reporters for ubiquitin-dependent vs. independent degradation or a larger panel of known substrates, analysis of the abundance of K48-ubiquitinated proteins in the control vs. tag lines would provide additional evidence as to whether or not proteolysis is generally perturbed in the tag lines.
  
  We thank the reviewer for this suggestion. We have included an immunoblot analysis showing that the levels of K48 ubiquitylation (Figure S3d) are not affected by the expression of tagged PSMA4.
  
  (3) On pg. 8 near the bottom, the authors accidentally refer to ARMC6 as ARMC1 in one instance.
  
  We have corrected the mistake.
  
  (4) On pg. 10, the authors explain that they analyzed the interactome for all major mouse organs except the brain; although they explain in the discussion section why the brain was excluded, including this explanation on pg. 10 here instead of in the discussion might be a better place to discuss this.
  
  We moved the explanation from the discussion to the results part.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Perhaps the authors can quantify the fraction of unassembled PSMA4-BirA* from the SEC experiment (Fig. 2b) to give the readers a feeling for how large a problem this could be.
  
  The percentages based on Area Under the Curve calculations have been added to Figure S3b.
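  As an illustration of how such Area Under the Curve percentages can be obtained from a size-exclusion profile, the following is a minimal sketch and not the authors' analysis. The fraction numbers, signal values, and the windows chosen for "assembled" and "unassembled" species are hypothetical placeholders; the only technique assumed is trapezoidal integration.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


def fraction_in_window(x, y, lo, hi):
    """Share of the total area that lies at sample points with lo <= x <= hi."""
    total = trapezoid_area(x, y)
    xs = [xi for xi in x if lo <= xi <= hi]
    ys = [yi for xi, yi in zip(x, y) if lo <= xi <= hi]
    return trapezoid_area(xs, ys) / total


# Hypothetical SEC profile: fraction number vs. immunoblot signal (arbitrary units).
fractions = [1, 2, 3, 4, 5, 6, 7, 8]
signal = [0.0, 2.0, 6.0, 2.0, 0.0, 1.0, 1.0, 0.0]

# Hypothetical windows: early fractions = assembled, late fractions = unassembled.
assembled = fraction_in_window(fractions, signal, 1, 5)
unassembled = fraction_in_window(fractions, signal, 5, 8)
print(f"assembled: {assembled:.1%}, unassembled: {unassembled:.1%}")
```

  Because the two windows share only the boundary point at fraction 5, the two shares add up to the full area in this toy profile.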
  
  (2) Do the authors observe any difference in the enrichment scores between proteins that are known to interact with the proteasome vs proteins that the authors can justify as "interactors of interactors" vs the completely new potential interactors? This could be an interesting way to show that the potential new interactors are not simply because of poor false positive rate calibration, but that they behave in the same way as the other populations.
  
  We thank the reviewer for this suggestion. We analyzed the enrichment scores for 20S proteasome subunits, known PIPs, first neighbors and the remaining enriched proteins. The remaining proteins (potential new interactors) have very similar scores as the first neighbors of known interactors. This plot has been added to Figure S3g.
  
  (3) Did the authors try to train a logistic model for the miniTurbo experiments, like it was done for the BirA* experiments? Perhaps combining the results of both experiments would yield higher confidence on the proteasome interactors.
  
  Following the reviewer’s suggestion, we applied the classifier on the dataset of the comparison between miniTurbo and PSMA-miniTurbo. We found a clear separation between the FPR and the TPR with 136 protein groups enriched in PSMA-miniTurbo. We have added the classifier and corresponding ROC curve to Figure S4f and S4g.
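  For readers less familiar with ROC-based cutoffs, the sketch below shows one generic way to turn classifier scores into an enrichment threshold at a chosen false positive rate. It is not the authors' classifier: the scores, the positive and negative reference labels, and the function names are hypothetical, and a protein is simply called enriched when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for every distinct score.

    labels: 1 for reference positives, 0 for reference negatives.
    A protein is called 'enriched' when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def cutoff_at_fpr(scores, labels, max_fpr=0.05):
    """Lowest threshold whose FPR stays strictly below max_fpr (None if no such threshold)."""
    ok = [thr for thr, fpr, _ in roc_points(scores, labels) if fpr < max_fpr]
    return min(ok) if ok else None


# Hypothetical scores: four reference positives followed by six reference negatives.
scores = [3.0, 2.5, 2.0, 0.5, 1.0, 0.4, 0.2, 0.1, 0.0, -0.3]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

cutoff = cutoff_at_fpr(scores, labels, max_fpr=0.05)
enriched = [s for s in scores if s >= cutoff]
print(cutoff, len(enriched))
```

  Lowering the threshold can only add calls, so the FPR never decreases as the threshold drops; taking the smallest acceptable threshold therefore maximizes sensitivity at the chosen FPR.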
  
  75 protein groups were found to be enriched for both PSMA4-BirA* and PSMA4-miniTurbo (Author response image 2), including the proteasome core particles, regulatory particles, known interactors and potential new interactors. As we focused more on the identification of substrates with PSMA4-miniTurbo, we did not pursue these overlapping protein groups further, but rather used the comparison to the mouse model to identify potential new interactors.
  
  Author response image 2.
  
  Overlap between ProteasomeID enriched proteins (fpr<0.05) between PSMA4-BirA* and PSMA4-miniTurbo.
  
  (4) Perhaps this is already known, but did the authors check if MG132 affect proteasome assembly? The authors could for example repeat their SEC experiments in the presence of MG132.
  
  We thank the reviewer for the suggestion; however, to our knowledge, there are no reports that MG132 has an effect on the assembly of the proteasome. MG132 is one of the most used proteasome inhibitors in basic research and as such has been extensively characterized in the last 3 decades. The small peptide aldehyde acts as a substrate analogue and binds directly to the active site of the protease PSMB5/β5. We therefore think it is unlikely that MG132 is interfering with the assembly of the proteasome.
  
  (5) Minor comment: at the bottom of page 8, the authors probably mean ARMC6 and not ARMC1.
  
  We have corrected the mistake.
  
  (6) It would be interesting to expand the analysis of the already acquired in vivo data to try to identify tissue-specific proteasome interactors. Can the authors draw a four-way Venn diagram with the interactors of each tissue?
  
  We thank the reviewer for this suggestion. We have generated an UpSet plot showing the overlap of ProteasomeID enriched proteins in the four tissues that gave us meaningful results (Author response image 3). In order to investigate whether the observed differences in ProteasomeID enriched proteins could be meaningful in terms of proteasome biology, we have highlighted proteins belonging to the UPS that show tissue-specific enrichments. We found proteasome activators such as PSME1/PA28alpha and PSME2/PA28beta to enrich preferentially in kidney and liver, respectively, as well as multiple deubiquitinases to enrich preferentially in the heart. These differences might be related to the specific cellular composition of the different tissues, e.g., number of immune cells present, or the tissue-specific interaction of proteasomes with enzymes involved in the ubiquitin cycle. Given the rather preliminary nature of these findings, we have opted not to include this figure in the main manuscript, but rather to include it only in this rebuttal letter.
  
  Author response image 3.
  
  UpSet plot showing overlap between ProteasomeID enriched proteins in different mouse organs.
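  The quantity an UpSet plot displays is the size of each exclusive intersection, that is, the items found in exactly one given combination of sets and in no other set. The sketch below computes these groups with the standard library only. It is an illustration under stated assumptions: the tissue labels and protein identifiers are placeholders, not results from the study.

```python
from itertools import combinations


def exclusive_intersections(sets_by_name):
    """Map each combination of set names to the items found in exactly those sets."""
    names = sorted(sets_by_name)
    result = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Items present in every set of the combination...
            inside = set.intersection(*(sets_by_name[n] for n in combo))
            # ...minus items that also occur in any set outside the combination.
            outside = set().union(*(sets_by_name[n] for n in names if n not in combo))
            members = inside - outside
            if members:
                result[combo] = members
    return result


# Placeholder data: hypothetical enriched-protein sets for three tissues.
enriched_by_tissue = {
    "heart": {"prot_a", "prot_b", "prot_c"},
    "kidney": {"prot_a", "prot_b", "prot_d"},
    "liver": {"prot_a", "prot_e"},
}

groups = exclusive_intersections(enriched_by_tissue)
for combo, members in sorted(groups.items(), key=lambda kv: -len(kv[0])):
    print(" & ".join(combo), sorted(members))
```

  Every item lands in exactly one group, so the group sizes sum to the size of the union of all sets, which is a quick sanity check on any UpSet-style tabulation.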
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) In the first paragraph of the Introduction, the authors link cellular senescence caused by partial proteasome inhibition with the efficacy of proteasome inhibitors in cancer therapy. Although this is an interesting hypothesis, I am not aware of any direct evidence for this; rather, I believe the efficacy of bortezomib/carfilzomib in haematological malignancies is most commonly attributed to these cells having adapted to high levels of proteotoxic stress (e.g., chronic unfolded protein response activation). I would suggest rephrasing this sentence.
  
  We thank the reviewer for the comment and have amended the introduction.
  
  (2) For the initial validation experiments (e.g., Fig. 1B), have the authors checked what level of Streptavidin signal is obtained with "+ bio, - tet" ? Although I accept that the induction of PSMA4-BirA* upon doxycycline addition is clear from the anti-Flag blots, it would still be informative to ascertain what level of background labelling is obtained without induction (but in the presence of exogenous biotin).
  
  We tested four different conditions +/- tet and +/- biotin (24h) in PSMA4-BirA* cell lines (Author response image 4). As expected, biotinylation was most pronounced when tet and biotin were added. When biotin was omitted, streptavidin signal was the lowest regardless of the addition of tet. Compared to the -biotin conditions, a slight increase of streptavidin signal could be observed when biotin was added but tet was not added. This could be either due to the promoter leaking (PMID: 12869186) or traces of tetracycline in the FBS we used, as we did not specifically use tet-free FBS for our experiments.
  
  Author response image 4.
  
  Streptavidin-HRP immunoblot following induction of BirA fusion proteins with tetracycline (+tet) and supplementation of biotin (+bio). For the sample used as expression control tetracycline was omitted (-tet). To test background biotinylation, biotin supplementation was omitted (-bio). Immunoblot against BirA and PSMA was used to verify induction of fusion proteins, while GAPDH was used as loading control.
  
  (3) For the proteasome structure models in Fig. 1D, a scale bar would be useful to inform the reader of the expected 10 nm labelling radius (as the authors have done later, in Fig. 2D).
  
  We have added 10 nm scale bars to Figure 1d.
  
  (4) In the "Identification of proteasome substrates by ProteasomeID" Results subsection, I believe there is a typo where the authors refer to ARMC1 instead of ARMC6.
  
  We have corrected the mistake.
  
  (5) I think Fig. S5 was one of the most compelling in the manuscript. Given the interest in confirming on-target efficacy of targeted degradation modalities, as well as identifying potential off-target effects early-on in development, I would consider promoting this out of the supplement.
  
  We thank the reviewer for the comment and share the excitement about using ProteasomeID for targeted degradation screening. We have moved the data on PROTACs (Figure S5) into a new main Figure 5.
  
  In addition, in relation to the comment of this reviewer regarding the detection of endogenous substrates, we have now included validation for one more hit emerging from our analysis (TIGD5) and included the results in Figure 4f, 4g and S4j.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.08.09.503299v3
www.biorxiv.org

An emerging view of neural geometry in motor cortex supports high-performance decoding

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Summary of reviewers’ comments and our revisions:
  
  We thank the reviewers for their thoughtful feedback. This feedback has motivated multiple revisions and additions that, in our view, have greatly improved the manuscript. This is especially true with regard to a major goal of this study: clearly defining existing scientific perspectives and delineating their decoding implications. In addition to building on this conceptual goal, we have expanded existing analyses and have added a new analysis of generalization using a newly collected dataset. We expect the manuscript will be of very broad interest, both to those interested in BCI development and to those interested in fundamental properties of neural population activity and its relationship with behavior.
  
  Importantly, all reviewers were convinced that MINT provided excellent performance, when benchmarked against existing methods, across a broad range of standard tasks:
  
  “their method shows impressive performance compared to more traditional decoding approaches” (R1)
  
  “The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method.” (R2)
  
  “The fact that performance on stereotyped tasks is high is interesting and informative…” (R3)
  
  This is important. It is challenging to design a decoder that performs consistently across multiple domains and across multiple situations (including both decoding and neural state estimation). MINT does so. MINT consistently outperformed existing lightweight ‘interpretable’ decoders, despite being a lightweight interpretable decoder itself. MINT was very competitive with expressive machine-learning methods, yet has advantages in flexibility and simplicity that more ‘brute force’ methods do not. We made a great many comparisons, and MINT was consistently a strong performer. Of the many comparisons we made, there was only one where MINT was at a modest disadvantage, and it was for a dataset where all methods performed poorly. No other method we tested was as consistent. For example, although the GRU and the feedforward network were often competitive with MINT (and better than MINT in the one case mentioned above), there were multiple other situations where they performed less well and a few situations where they performed poorly. Moreover, no other existing decoder naturally estimates the neural state while also readily decoding, without retraining, a broad range of behavioral variables.
  
  R1 and R2 were very positive about the broader impacts of the study. They stressed its impact both on decoder design, and on how our field thinks, scientifically, about the population response in motor areas:
  
  “This paper presents an innovative decoding approach for brain-computer interfaces” (R1)
  
  “presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour” (R1)
  
  “the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field” (R1)
  
  “The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality” (R2)
  
  “This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design.” (R2)
  
  “this work is also broadly impactful for neuroscientific analysis... Thus, MINT will likely impact neuroscience research generally.” (R2)
  
  We agree with these assessments, and have made multiple revisions to further play into these strengths. As one example, the addition of Figure 1b (and 6b) makes this the first study, to our knowledge, to fully and concretely illustrate this emerging scientific perspective and its decoding implications. This is important, because multiple observations convince us that the field is likely to move away from the traditional perspective in Figure 1a, and towards that in Figure 1b. We also agree with the handful of weaknesses R1 and R2 noted. The manuscript has been revised accordingly. The major weakness noted by R1 was the need to be explicit regarding when we suspect MINT would (and wouldn’t) work well in other brain areas. In non-motor areas, the structure of the data may be poorly matched with MINT’s assumptions. We agree that this is likely to be true, and thus agree with the importance of clarifying this topic for the reader. The revision now does so. R1 also wished to know whether existing methods might benefit from including trial-averaged data during training, something we now explore and document (see detailed responses below). R2 noted two weaknesses: 1) The need to better support (with expanded analysis) the statement that neural and behavioral trajectories are non-isometric, and 2) The need to more rigorously define the ‘mesh’. We agree entirely with both suggestions, and the revision has been strengthened by following them (see detailed responses below).
  
  R3 also saw strengths to the work, stating that:
  
  “This paper is well-structured and its main idea is clear.”
  
  “The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories.”
  
  “The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength.”
  
  However, R3 also expressed two sizable concerns. The first is that MINT might have onerous memory requirements. The manuscript now clarifies that MINT has modest memory requirements. These do not scale unfavorably as the reviewer was concerned they might. The second concern is that MINT is:
  
  “essentially a table-lookup rather than a model.”
  
  Although we don’t agree, the concern makes sense and may be shared by many readers, especially those who take a particular scientific perspective. Pondering this concern thus gave us the opportunity to modify the manuscript in ways that support its broader impact. Our revisions had two goals: 1) clarify the ways in which MINT is far more flexible than a lookup-table, and 2) better describe the dominant scientific perspectives and their decoding implications.
  
  The heart of R3’s concern is the opinion that MINT is an effective but unprincipled hack suitable for situations where movements are reasonably stereotyped. Of course, many tasks involve stereotyped movements (e.g. handwriting characters), so MINT would still be useful. Nevertheless, if MINT is not principled, other decoding methods would often be preferable because they could (unlike MINT in R3’s opinion) gain flexibility by leveraging an accurate model. Most of R3’s comments flow from this fundamental concern:
  
  “This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model.”
  
  “MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks.”
  
  “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”
  
  “given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”
  
  “For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not).”
  
  The manuscript has been revised to clarify that MINT is considerably more flexible than a lookup table, even though a lookup table is used as a first step. Yet, on its own, this does not fully address R3’s concern. The quotes above highlight that R3 is making a standard assumption in our field: that there exists a “movement space and associated neural space”. Under this perspective, one should, as R3 argues, fully explore the movement space. This would perforce fully explore the associated neural subspace. One can then “model the neural subspace and its association to movement”. MINT does not use a model of this type, and thus (from R3’s perspective) does not appear to use a model at all. A major goal of our study is to question this traditional perspective. We have thus added a new figure to highlight the contrast between the traditional (Figure 1a) and new (Figure 1b) scientific perspectives, and to clarify their decoding implications.
  
  While we favor the new perspective (Figure 1b), we concede that R3 may not share our view. This is fine. Part of the reason we believe this study is timely, and will be broadly read, is that it raises a topic of emerging interest where there is definitely room for debate. If we are misguided – i.e. if Figure 1a is the correct perspective – then many of R3’s concerns would be on target: MINT could still be useful, but traditional methods that make the traditional assumptions in Figure 1a would often be preferable. However, if the emerging perspective in Figure 1b is more accurate, then MINT’s assumptions would be better aligned with the data than those of traditional methods, making it a more (not less) principled choice.
  
  Our study provides new evidence in support of Figure 1b, while also synthesizing existing evidence from other recent studies. In addition to Figure 2, the new analysis of generalization further supports Figure 1b. Also supporting Figure 1b is the analysis in which MINT’s decoding advantage, over a traditional decoder, disappears when simulated data approximate the traditional perspective in Figure 1a.
  
  That said, we agree that the present study cannot fully resolve whether Figure 1a or 1b is more accurate. Doing so will take multiple studies with different approaches (indeed we are currently preparing other manuscripts on this topic). Yet we still have an informed scientific opinion, derived from past, present and yet-to-be-published observations. Our opinion is that Figure 1b is the more accurate perspective. This possibility makes it reasonable to explore the potential virtues of a decoding method whose assumptions are well-aligned with that perspective. MINT is such a method. As expected under Figure 1b, MINT outperforms traditional interpretable decoders in every single case we studied.
  
  As noted above, we have added a new generalization-focused analysis (Figure 6) based on a newly collected dataset. We did so because R3’s comments highlight a deep point: which scientific perspective one takes has strong implications regarding decoder generalization. These implications are now illustrated in the new Figure 6a and 6b. Under Figure 6a, it is possible, as R3 suggests, to explore “the whole movement space and associated neural space” during training. However, under Figure 6b, expectations are very different. Generalization will be ‘easy’ when new trajectories are near the training-set trajectories. In this case, MINT should generalize well as should other methods. In contrast, generalization will be ‘hard’ when new neural trajectories have novel shapes and occupy previously unseen regions / dimensions. In this case, all current methods, including MINT, are likely to fail. R3 points out that traditional decoders have sometimes generalized well to new tasks (e.g. from center-out to ‘pinball’) when cursor movements occur in the same physical workspace. These findings could be taken to support Figure 6a, but are equally consistent with ‘easy’ generalization in Figure 6b. To explore this topic, the new analysis in Figure 6c-g considers conditions that are intended to span the range from easy to hard. Results are consistent with the predictions of Figure 6b.
  
  We believe the manuscript has been significantly improved by these additions. The revisions help the manuscript achieve its twin goals: 1) introduce a novel class of decoder that performs very well despite being very simple, and 2) describe properties of motor-cortex activity that will matter for decoders of all varieties.
  
  Reviewer #1:
  
  Summary:
  
  This paper presents an innovative decoding approach for brain-computer interfaces (BCIs), introducing a new method named MINT. The authors develop a trajectory-centric approach to decode behaviors across several different datasets, including eight empirical datasets from the Neural Latents Benchmark. Overall, the paper is well written and their method shows impressive performance compared to more traditional decoding approaches that use a simpler approach. While there are some concerns (see below), the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field.
  
  We thank the reviewer for these comments. We share their enthusiasm for the trajectory-centric approach, and we are in complete agreement that this perspective has both scientific and decoding implications. The revision expands upon these strengths.
  
  Strengths:
  
  The adoption of a trajectory-centric approach that utilizes statistical constraints presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour. This is one of the strongest aspects of the paper.
  
  Again, thank you. We also expect the trajectory-centric perspective to have a broad impact, given its relevance to both decoding and to thinking about manifolds.
  
  The thorough evaluation of the method across various datasets serves as an assurance that the superior performance of MINT is not a result of overfitting. The comparative simplicity of the method in contrast to many neural network approaches is refreshing and should facilitate broader applicability.
  
  Thank you. We were similarly pleased to see such a simple method perform so well. We also agree that, while neural-network approaches will always be important, it is desirable to also possess simple ‘interpretable’ alternatives.
  
  Weaknesses:
  
  Comment 1) Scope: Despite the impressive performance of MINT across multiple datasets, it seems predominantly applicable to M1/S1 data. Only one of the eight empirical datasets comes from an area outside the motor/somatosensory cortex. It would be beneficial if the authors could expand further on how the method might perform with other brain regions that do not exhibit low tangling or do not have a clear trial structure (e.g. decoding of position or head direction from hippocampus).
  
  We agree entirely. Population activity in many brain areas (especially outside the motor system) presumably will often not have the properties upon which MINT’s assumptions are built. This doesn’t necessarily mean that MINT would perform badly. Using simulated data, we have found that MINT can perform surprisingly well even when some of its assumptions are violated. Yet at the same time, when MINT’s assumptions don’t apply, one would likely prefer to use other methods. This is, after all, one of the broader themes of the present study: it is beneficial to match decoding assumptions to empirical properties. We have thus added a section on this topic early in the Discussion:
  
  “In contrast, MINT and the Kalman filter performed comparably on simulated data that better approximated the assumptions in Figure 1a. Thus, MINT is not a ‘better’ algorithm – simply better aligned with the empirical properties of motor cortex data. This highlights an important caveat. Although MINT performs well when decoding from motor areas, its assumptions may be a poor match in other areas (e.g. the hippocampus). MINT performed well on two non-motor-cortex datasets – Area2_Bump (S1) and DMFC_RSG (dorsomedial frontal cortex) – yet there will presumably be other brain areas and/or contexts where one would prefer a different method that makes assumptions appropriate for that area.”
  
  Comment 2) When comparing methods, the neural trajectories of MINT are based on averaged trials, while the comparison methods are trained on single trials. An additional analysis might help in disentangling the effect of the trial averaging. For this, the authors could average the input across trials for all decoders, establishing a baseline for averaged trials. Note that inference should still be done on single trials. Performance can then be visualized across different values of N, which denotes the number of averaged trials used for training.
  
  We explored this question and found that the non-MINT decoders are harmed, not helped, by the inclusion of trial-averaged responses in the training set. This is presumably because the statistics of trial-averaged responses don’t resemble what will be observed during decoding. This statistical mismatch, between training and decoding, hurts most methods. It doesn’t hurt MINT, because MINT doesn’t ‘train’ in the normal way. It simply needs to know rates, and trial-averaging is a natural way to obtain them. To describe the new analysis, we have added the following to the text.
  
  “We also investigated the possibility that MINT gained its performance advantage simply by having access to trial-averaged neural trajectories during training, while all other methods were trained on single-trial data. This difference arises from the fundamental requirements of the decoder architectures: MINT needs to estimate typical trajectories while other methods don’t. Yet it might still be the case that other methods would benefit from including trial-averaged data in the training set, in addition to single-trial data. Alternatively, this might harm performance by creating a mismatch, between training and decoding, in the statistics of decoder inputs. We found that the latter was indeed the case: all non-MINT methods performed better when trained purely on single-trial data.”
  
  Reviewer #2:
  
  Summary:
  
  The goal of this paper is to present a new method, termed MINT, for decoding behavioral states from neural spiking data. MINT is a statistical method which, in addition to outputting a decoded behavioral state, also provides soft information regarding the likelihood of that behavioral state based on the neural data. The innovation in this approach is neural states are assumed to come from sparsely distributed neural trajectories with low tangling, meaning that neural trajectories (time sequences of neural states) are sparse in the high-dimensional space of neural spiking activity and that two dissimilar neural trajectories tend to correspond to dissimilar behavioral trajectories. The authors support these assumptions through analysis of previously collected data, and then validate the performance of their method by comparing it to a suite of alternative approaches. The authors attribute the typically improved decoding performance by MINT to its assumptions being more faithfully aligned to the properties of neural spiking data relative to assumptions made by the alternatives.
  
  We thank the reviewer for this accurate summary, and for highlighting the subtle but important fact that MINT provides information regarding likelihoods. The revision includes a new analysis (Figure 6e) illustrating one potential way to leverage knowledge of likelihoods.
  
  Strengths:
  
  The paper did an excellent job critically evaluating common assumptions made by neural analytical methods, such as neural state being low-dimensional relative to the number of recorded neurons. The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality.
  
  Thank you. We also hope that the shift in perspective is the most important contribution of the study. This shift matters both scientifically and for decoder design. The revision expands on this strength. The scientific alternatives are now more clearly and concretely illustrated (especially see Figure 1a,b and Figure 6a,b). We also further explore their decoding implications with new data (Figure 6c-g).
  
  The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method. The authors also provided nice intuition regarding why MINT may offer performance improvement in some cases and in which instances MINT may not perform as well.
  
  Thank you. We were pleased to be able to provide comparisons across so many datasets (we are grateful to the Neural Latents Benchmark for making this possible).
  
  In addition to providing a philosophical discussion as to the advantages of MINT and benchmarking against alternatives, the authors also provided a detailed description of practical considerations. This included training time, amount of training data, robustness to data loss or changes in the data, and interpretability. These considerations not only provided objective evaluation of practical aspects but also provided insights to the flexibility and robustness of the method as they relate back to the underlying assumptions and construction of the approach.
  
  Thank you. We are glad that these sections were appreciated. MINT’s simplicity and interpretability are indeed helpful in multiple ways, and afford opportunities for interesting future extensions. One potential benefit of interpretability is now explored in the newly added Figure 6e.
  
  Impact:
  
  This work is motivated by brain-computer interface applications, which it will surely impact in terms of neural decoder design. However, this work is also broadly impactful for neuroscientific analysis to relate neural spiking activity to observable behavioral features. Thus, MINT will likely impact neuroscience research generally. The methods are made publicly available, and the datasets used are all in public repositories, which facilitates adoption and validation of this method within the greater scientific community.
  
  Again, thank you. We have similar hopes for this study.
  
  Weaknesses (1 & 2 are related, and we have switched their order in addressing them):
  
  Comment 2) With regards to the idea of neural and behavioral trajectories having different geometries, this is dependent on what behavioral variables are selected. In the example for Fig 2a, the behavior is reach position. The geometry of the behavioral trajectory of interest would look different if instead the behavior of interest was reach velocity. The paper would be strengthened by acknowledgement that geometries of trajectories are shaped by extrinsic choices rather than (or as much as they are) intrinsic properties of the data.
  
  We agree. Indeed, we almost added a section to the original manuscript on this exact topic. We have now done so:
  
  “A potential concern regarding the analyses in Figure 2c,d is that they require explicit choices of behavioral variables: muscle population activity in Figure 2c and angular phase and velocity in Figure 2d. Perhaps these choices were misguided. Might neural and behavioral geometries become similar if one chooses ‘the right’ set of behavioral variables? This concern relates to the venerable search for movement parameters that are reliably encoded by motor cortex activity [69, 92–95]. If one chooses the wrong set of parameters (e.g. chooses muscle activity when one should have chosen joint angles) then of course neural and behavioral geometries will appear non-isometric. There are two reasons why this ‘wrong parameter choice’ explanation is unlikely to account for the results in Figure 2c,d. First, consider the implications of the left-hand side of Figure 2d. A small kinematic distance implies that angular position and velocity are nearly identical for the two moments being compared. Yet the corresponding pair of neural states can be quite distant. Under the concern above, this distance would be due to other encoded behavioral variables – perhaps joint angle and joint velocity – differing between those two moments. However, there are not enough degrees of freedom in this task to make this plausible. The shoulder remains at a fixed position (because the head is fixed) and the wrist has limited mobility due to the pedal design [60]. Thus, shoulder and elbow angles are almost completely determined by cycle phase. More generally, ‘external variables’ (positions, angles, and their derivatives) are unlikely to differ more than slightly when phase and angular velocity are matched. Muscle activity could be different because many muscles act on each joint, creating redundancy. However, as illustrated in Figure 2c, the key effect is just as clear when analyzing muscle activity. Thus, the above concern seems unlikely even if it can’t be ruled out entirely. 
A broader reason to doubt the ‘wrong parameter choice’ proposition is that it provides a vague explanation for a phenomenon that already has a straightforward explanation. A lack of isometry between the neural population response and behavior is expected when neural-trajectory tangling is low and output-null factors are plentiful [55, 60]. For example, in networks that generate muscle activity, neural and muscle-activity trajectories are far from isometric [52, 58, 60]. Given this straightforward explanation, and given repeated failures over decades to find the ‘correct’ parameters (muscle activity, movement direction, etc.) that create neural-behavior isometry, it seems reasonable to conclude that no such isometry exists.”
  
  Comment 1) The authors posit that neural and behavioral trajectories are non-isometric. To support this point, they look at distances between neural states and distances between the corresponding behavioral states, in order to demonstrate that there are differences in these distances in each respective space. This supports the idea that neural states and behavioral states are non-isometric but does not directly address their point. In order to say the trajectories are non-isometric, it would be better to look at pairs of distances between corresponding trajectories in each space.
  
  We like this idea and have added such an analysis. To be clear, we like the original analysis too: isometry predicts that neural and behavioral distances (for corresponding pairs of points) should be strongly correlated, and that small behavioral distances should not be associated with large neural distances. Neither prediction holds, which provides a strong argument against isometry. The reviewer’s suggested analysis makes the same larger point, and also reveals some additional facts (e.g. it reveals that muscle-geometry is more related to neural-geometry than is kinematic-geometry). The new analysis is described in the following section:
  
  “We further explored the topic of isometry by considering pairs of distances. To do so, we chose two random neural states and computed their distance, yielding dneural1. We repeated this process, yielding dneural2. We then computed the corresponding pair of distances in muscle space (dmuscle1 and dmuscle2) and kinematic space (dkin1 and dkin2). We considered cases where dneural1 was meaningfully larger than (or smaller than) dneural2, and asked whether the behavioral variables had the same relationship; e.g. was dmuscle1 also larger than dmuscle2? For kinematics, this relationship was weak: across 100,000 comparisons, the sign of dkin1 − dkin2 agreed with dneural1 − dneural2 only 67.3% of the time (with 50% being chance). The relationship was much stronger for muscles: the sign of dmuscle1 − dmuscle2 agreed with dneural1 − dneural2 79.2% of the time, which is far more than expected by chance yet also far from what is expected given isometry (e.g. the sign agrees 99.7% of the time for the truly isometric control data in Figure 2e). Indeed there were multiple moments during this task when dneural1 was much larger than dneural2, yet dmuscle1 was smaller than dmuscle2. These observations are consistent with the proposal that neural trajectories resemble muscle trajectories in some dimensions, but with additional output-null dimensions that break the isometry [60].”
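  
  The pairwise-distance comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function name `sign_agreement`, the `min_rel_diff` threshold (a stand-in for distances that are 'meaningfully' different, which the text does not quantify), and the use of Euclidean distance are all hypothetical choices.

```python
import math
import random


def sign_agreement(neural, behavior, n_comparisons=100_000,
                   min_rel_diff=0.1, seed=0):
    """Fraction of comparisons in which the ordering of two neural distances
    matches the ordering of the corresponding behavioral distances.

    neural, behavior: equal-length sequences of state vectors, one per moment.
    min_rel_diff: hypothetical threshold defining a 'meaningful' difference.
    """
    rng = random.Random(seed)
    n = len(neural)
    agree = kept = 0
    for _ in range(n_comparisons):
        # Two random pairs of moments: (i, j) and (k, l).
        i, j, k, l = (rng.randrange(n) for _ in range(4))
        d_neural1 = math.dist(neural[i], neural[j])
        d_neural2 = math.dist(neural[k], neural[l])
        diff_neural = d_neural1 - d_neural2
        # Skip comparisons whose neural distances are not clearly different.
        if abs(diff_neural) <= min_rel_diff * max(d_neural1, d_neural2):
            continue
        # Same pairs of moments, measured in behavioral (muscle or kinematic) space.
        diff_behav = (math.dist(behavior[i], behavior[j])
                      - math.dist(behavior[k], behavior[l]))
        kept += 1
        agree += (diff_neural * diff_behav) > 0  # True only if signs match
    return agree / kept if kept else float("nan")
```

  Under this sketch, unrelated geometries should give values near 0.5 (chance) and truly isometric geometries should give values near 1, which is the range against which the percentages quoted above are interpreted.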
  
  Comment 3) The approach is built up on the idea of creating a "mesh" structure of possible states. In the body of the paper the definition of the mesh was not entirely clear and I could not find in the methods a more rigorous explicit definition. Since the mesh is integral to the approach, the paper would be improved with more description of this component.
  
  This is a fair criticism. Although MINT’s actual operations were well-documented, how those operations mapped onto the term ‘mesh’ was, we agree, a bit vague. The definition of the mesh is a bit subtle because it only emerges during decoding rather than being precomputed. This is part of what gives MINT much more flexibility than a lookup table. We have added the following to the manuscript.
  
  “We use the term ‘mesh’ to describe the scaffolding created by the training-set trajectories and the interpolated states that arise at runtime. The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations.
  
  Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”
  
  We have also added Figure 4d. This new analysis documents the fact that decoded states are near training-set trajectories, which is why the term ‘mesh’ is appropriate.
  
  Reviewer #3:
  
  Summary:
  
  This manuscript develops a new method termed MINT for decoding of behavior. The method is essentially a table-lookup rather than a model. Within a given stereotyped task, MINT tabulates averaged firing rate trajectories of neurons (neural states) and corresponding averaged behavioral trajectories as stereotypes to construct a library. For a test trial with a realized neural trajectory, it then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior. The method can also interpolate between these tabulated trajectories. The authors mention that the method is based on three key assumptions: (1) Neural states may not be embedded in a low-dimensional subspace, but rather in a high-dimensional space. (2) Neural trajectories are sparsely distributed under different behavioral conditions. (3) These neural states traverse trajectories in a stereotyped order.
  
  The authors conducted multiple analyses to validate MINT, demonstrating its decoding of behavioral trajectories in simulations and datasets (Figures 3, 4). The main behavior decoding comparison is shown in Figure 4. In stereotyped tasks, decoding performance is comparable (M_Cycle, MC_Maze) or better (Area 2_Bump) than other linear/nonlinear algorithms (Figure 4). However, MINT underperforms for the MC_RTT task, which is less stereotyped (Figure 4).
  
  This paper is well-structured and its main idea is clear. The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories. The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength. However, I have several major concerns. I believe several of the conclusions in the paper, which are also emphasized in the abstract, are not accurate or supported, especially about generalization, computational scalability, and utility for BCIs. MINT is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling. These aspects will limit MINT's utility for real-world BCIs and tasks. These properties will also limit MINT's generalizability from task to task, which is important for BCIs and thus is commonly demonstrated in BCI experiments with other decoders without any retraining. Furthermore, MINT's computational and memory requirements can be prohibitive it seems. Finally, as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations. I expand on these concerns below.
  
  We thank the reviewer for pointing out weaknesses in our framing and presentation. The comments above made us realize that we needed to 1) better document the ways in which MINT is far more flexible than a lookup-table, and 2) better explain the competing scientific perspectives at play. R3’s comments also motivated us to add an additional analysis of generalization. In our view the manuscript is greatly improved by these additions. Specifically, these additions directly support the broader impact that we hope the study will have.
  
  For simplicity and readability, we first group and summarize R3’s main concerns in order to better address them. (These main concerns are all raised above, in addition to recurring in the specific comments below. Responses to each individual specific comment are provided after these summaries.)
  
  (1) R3 raises concerns about ‘computational scalability.’ The concern is that “MINT's computational and memory requirements can be prohibitive.” This point was expanded upon in a specific comment, reproduced below:
  
  I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract.
  
  The revised manuscript clarifies that our statement (that computations are simple and scalable) is absolutely accurate. There is no need to compute, or store, a massive lookup table. There are three tables: two of modest size and one that is tiny. This is now better explained:
  
  “Thus, the log-likelihood of , for a particular current neural state, is simply the sum of many individual log-likelihoods (one per neuron and time-bin). Each individual log-likelihood depends on only two numbers: the firing rate at that moment and the spike count in that bin. To simplify online computation, one can precompute the log-likelihood, under a Poisson model, for every plausible combination of rate and spike-count. For example, a lookup table of size 2001 × 21 is sufficient when considering rates that span 0-200 spikes/s in increments of 0.1 spikes/s, and considering 20 ms bins that contain at most 20 spikes (only one lookup table is ever needed, so long as its firing-rate range exceeds that of the most-active neuron at the most active moment in Ω). Now suppose we are observing a population of 200 neurons, with a 200 ms history divided into ten 20 ms bins. For each library state, the log-likelihood of the observed spike-counts is simply the sum of 200 × 10 = 2000 individual log-likelihoods, each retrieved from the lookup table. In practice, computation is even simpler because many terms can be reused from the last time bin using a recursive solution (Methods). This procedure is lightweight and amenable to real-time applications.”
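  
  To make the size and use of this table concrete, the following is a minimal sketch of a 2001 × 21 Poisson log-likelihood table and the per-state sum it supports. It is an illustration rather than the authors' implementation: the names `build_loglik_table` and `state_loglik`, the layout of the inputs, and in particular the `MIN_RATE` floor (used here only to avoid taking the logarithm of a zero rate) are assumptions. The recursive reuse of terms mentioned in the quoted passage is not shown.

```python
import math

BIN_S = 0.020      # 20 ms bins
RATE_STEP = 0.1    # firing-rate quantization, spikes/s
N_RATES = 2001     # rates 0.0, 0.1, ..., 200.0 spikes/s
MAX_COUNT = 20     # at most 20 spikes per bin
MIN_RATE = 1e-3    # hypothetical floor so that log(0) never occurs


def build_loglik_table():
    """table[q][s] = log P(s spikes in one bin | quantized rate q * RATE_STEP)."""
    table = []
    for q in range(N_RATES):
        lam = max(q * RATE_STEP, MIN_RATE) * BIN_S  # expected spikes per bin
        table.append([s * math.log(lam) - lam - math.lgamma(s + 1)
                      for s in range(MAX_COUNT + 1)])
    return table


def state_loglik(table, rates, counts):
    """Log-likelihood of observed spike counts for one candidate library state.

    rates[n][b]:  rate (spikes/s) of neuron n in history bin b for that state.
    counts[n][b]: observed spike count of neuron n in history bin b.
    """
    total = 0.0
    for rate_row, count_row in zip(rates, counts):
        for rate, count in zip(rate_row, count_row):
            q = min(round(rate / RATE_STEP), N_RATES - 1)
            total += table[q][min(count, MAX_COUNT)]  # one lookup per term
    return total
```

  With 200 neurons and ten history bins, `state_loglik` performs 2000 table lookups and additions per candidate state, matching the count given in the quoted passage.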
  
  In summary, the first table simply needs to contain the firing rate of each neuron, for each condition, and each time in that condition. This table consumes relatively little memory. Assuming 100 one-second-long conditions (rates sampled every 20 ms) and 200 neurons, the table would contain 100 x 50 x 200 = 1,000,000 numbers. These numbers are typically stored as 16-bit integers (because rates are quantized), which amounts to about 2 MB. This is modest, given that most computers have (at least) tens of GB of RAM. A second table would contain the values for each behavioral variable, for each condition, and each time in that condition. This table might contain behavioral variables at a finer resolution (e.g. every millisecond) to enable decoding to update in between 20 ms bins (1 ms granularity is not needed for most BCI applications, but is the resolution used in this study). The number of behavioral variables of interest for a particular BCI application is likely to be small, often 1-2, but let’s assume for this example it is 10 (e.g. x-, y-, and z-position, velocity, and acceleration of a limb, plus one other variable). This table would thus contain 100 x 1000 x 10 = 1,000,000 floating point numbers, i.e. an 8 MB table. The third table is used to store the probability of s spikes being observed given a particular quantized firing rate (e.g. it may contain probabilities associated with firing rates ranging from 0 – 200 spikes/s in 0.1 spikes/s increments). This table is not necessary, but saves some computation time by precomputing numbers that will be used repeatedly. This is a very small table (typically ~2000 x 20, i.e. 320 KB). It does not need to be repeated for different neurons or conditions, because Poisson probabilities depend on only rate and count.
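  
  The memory estimates above can be checked with simple arithmetic. The sketch below reproduces the three table sizes. The variable names are hypothetical, and the 8 bytes per floating-point entry is inferred from the stated totals (8 MB and 320 KB) rather than given explicitly in the text.

```python
BYTES_INT16 = 2    # rates stored as 16-bit integers, as stated above
BYTES_FLOAT = 8    # assumed 64-bit floats, inferred from the quoted totals

# Table 1: conditions x time bins (1 s at 20 ms) x neurons
rate_entries = 100 * 50 * 200
rate_table_bytes = rate_entries * BYTES_INT16

# Table 2: conditions x time samples (1 s at 1 ms) x behavioral variables
behavior_entries = 100 * 1000 * 10
behavior_table_bytes = behavior_entries * BYTES_FLOAT

# Table 3: quantized rates x spike counts (approximate dimensions from the text)
loglik_entries = 2000 * 20
loglik_table_bytes = loglik_entries * BYTES_FLOAT

total_bytes = rate_table_bytes + behavior_table_bytes + loglik_table_bytes
print(f"rate table:     {rate_table_bytes / 1e6:.2f} MB")
print(f"behavior table: {behavior_table_bytes / 1e6:.2f} MB")
print(f"loglik table:   {loglik_table_bytes / 1e3:.0f} KB")
print(f"total:          {total_bytes / 1e6:.2f} MB")
```

  Under these assumptions the three tables together occupy roughly 10 MB, and none of them grows exponentially with the number of neurons or the history length.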
  
  (2) R3 raises a concern that MINT “is essentially a table-lookup rather than a model.” R3 states that MINT
  
  “is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling.”
  
  and that,
  
  “as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations.”
  
  This concern is central to most subsequent concerns. The manuscript has been heavily revised to address it. The revisions clarify that MINT is much more flexible than a lookup table, even though MINT uses a lookup table as its first step. Because R3’s concern is intertwined with one’s scientific assumptions, we have also added the new Figure 1 to explicitly illustrate the two key scientific perspectives and their decoding implications.
  
  Under the perspective in Figure 1a, R3 would be correct in saying that there exist traditional interpretable decoders (e.g. a Kalman filter) whose assumptions better model the data. Under this perspective, MINT might still be an excellent choice in many cases, but other methods would be expected to gain the advantage when situations demand more flexibility. This is R3’s central concern, and essentially all other concerns flow from it. It makes sense that R3 has this concern, because their comments repeatedly stress a foundational assumption of the perspective in Figure 1a: the assumption of a fixed low-dimensional neural subspace where activity has a reliable relationship to behavior that can be modeled and leveraged during decoding. The phrases below accord with that view:
  
  “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”
  
  “it will not generalize… even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”
  
  “For proper training, the training data should explore the whole movement space and the associated neural space”
  
  “I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics”
  
  Thus, R3 prefers a model that 1) assumes a low-dimensional subspace that is fixed across tasks and 2) assumes a consistent ‘association’ between neural activity and kinematics. Because R3 believes this is the correct model of the data, they believe that decoders should leverage it. Traditional interpretable methods do, and MINT doesn’t, which is why they find MINT to be unprincipled. This is a reasonable view, but it is not our view. We have heavily revised the manuscript to clarify that a major goal of our study is to explore the implications of a different, less-traditional scientific perspective.
  
  The new Figure 1a illustrates the traditional perspective. Under this perspective, one would agree with R3’s claim that other methods have the opportunity to model the data better. For example, suppose there exists a consistent neural subspace – conserved across tasks – where three neural dimensions encode 3D hand position and three additional neural dimensions encode 3D hand velocity. A traditional method such as a Kalman filter would be a very appropriate choice to model these aspects of the data.
  
  Figure 1b illustrates the alternative scientific perspective. This perspective arises from recent, present, and to-be-published observations. MINT’s assumptions are well-aligned with this perspective. In contrast, the assumptions of traditional methods (e.g. the Kalman filter) are not well-aligned with the properties of the data under this perspective. This does not mean traditional methods are not useful. Yet under Figure 1b, it is traditional methods, such as the Kalman filter, that lack an accurate model of the data. Of course, the reviewer may disagree with our scientific perspective. We would certainly concede that there is room for debate. However, we find the evidence for Figure 1b to be sufficiently strong that it is worth exploring the utility of methods that align with this scientific perspective. MINT is such a method. As we document, it performs very well.
  
  Thus, in our view, MINT is quite principled because its assumptions are well aligned with the data. It is true that the features of the data that MINT models are a bit different from those that are traditionally modeled. For example, R3 is quite correct that MINT does not attempt to use a biomimetic model of the true transformation from neural activity, to muscle activity, and thence to kinematics. We see this as a strength, and the manuscript has been revised accordingly (see paragraph beginning with “We leveraged this simulated data to compare MINT with a biomimetic decoder”).
  
  (3) R3 raises concerns that MINT cannot generalize. This was a major concern of R3 and is intimately related to concern #2 above. The concern is that, if MINT is “essentially a lookup table” that simply selects pre-defined trajectories, then MINT will not be able to generalize. R3 is quite correct that MINT generalizes rather differently than existing methods. Whether this is good or bad depends on one’s scientific perspective. Under Figure 1a, MINT’s generalization would indeed be limiting because other methods could achieve greater flexibility. Under Figure 1b, all methods will have serious limits regarding generalization. Thus, MINT’s method for generalizing may approximate the best one can presently do. To address this concern, we have made three major changes, numbered i-iii below:
  
  i) Large sections of the manuscript have been restructured to underscore the ways in which MINT can generalize. A major goal was to counter the impression, stated by R3 above, that:
  
  “for a test trial with a realized neural trajectory, [MINT] then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior”.
  
  This description is a reasonable way to initially understand how MINT works, and we concede that we may have over-used this intuition. Unfortunately, it can leave the misimpression that MINT decodes by selecting whole trajectories, each corresponding to ‘a behavior’. This can happen, but it needn’t and typically doesn’t. As an example, consider the cycling task. Suppose that the library consists of stereotyped trajectories, each four cycles long, at five fixed speeds from 0.5-2.5 Hz. If the spiking observations argued for it, MINT could decode something close to one of these five stereotyped trajectories. Yet it needn’t. Decoded trajectories will typically resemble library trajectories locally, but may be very different globally. For example, a decoded trajectory could be thirty cycles long (or two, or five hundred) perhaps speeding up and slowing down multiple times across those cycles.
  
  Thus, the library of trajectories shouldn’t be thought of as specifying a limited set of whole movements that can be ‘selected from’. Rather, trajectories define a scaffolding that outlines where the neural state is likely to live and how it is likely to be changing over time. When we introduce the idea of library trajectories, we are now careful to stress that they don’t function as a set from which one trajectory is ‘declared’ to be the right one:
  
  “We thus designed MINT to approximate that manifold using the trajectories themselves, rather than their covariance matrix or corresponding subspace. Unlike a covariance matrix, neural trajectories indicate not only which states are likely, but also which state-derivatives are likely. If a neural state is near previously observed states, it should be moving in a similar direction. MINT leverages this directionality.
  
  Training-set trajectories can take various forms, depending on what is convenient to collect. Most simply, training data might include one trajectory per condition, with each condition corresponding to a discrete movement. Alternatively, one might instead employ one long trajectory spanning many movements. Another option is to employ many sub-trajectories, each briefer than a whole movement. The goal is simply for training-set trajectories to act as a scaffolding, outlining the manifold that might be occupied during decoding and the directions in which decoded trajectories are likely to be traveling.”
  
  Later in that same section we stress that decoded trajectories can move along the ‘mesh’ in non-stereotyped ways:
  
  “Although the mesh is formed of stereotyped trajectories, decoded trajectories can move along the mesh in non-stereotyped ways as long as they generally obey the flow-field implied by the training data. This flexibility supports many types of generalization, including generalization that is compositional in nature. Other types of generalization – e.g. from the green trajectories to the orange trajectories in Figure 1b – are unavailable when using MINT and are expected to be challenging for any method (as will be documented in a later section).”
  
  The section “Training and decoding using MINT” has been revised to clarify the ways in which interpolation is flexible, allowing decoded movements to be globally very different from any library trajectory.
  
  “To decode stereotyped trajectories, one could simply obtain the maximum-likelihood neural state from the library, then render a behavioral decode based on the behavioral state with the same values of c and k. This would be appropriate for applications in which conditions are categorical, such as typing or handwriting. Yet in most cases we wish for the trajectory library to serve not as an exhaustive set of possible states, but as a scaffolding for the mesh of possible states. MINT’s operations are thus designed to estimate any neural trajectory – and any corresponding behavioral trajectory – that moves along the mesh in a manner generally consistent with the trajectories in Ω.”
  
  “…interpolation allows considerable flexibility. Not only is one not ‘stuck’ on a trajectory from Φ, one is also not stuck on trajectories created by weighted averaging of trajectories in Φ. For example, if cycling speed increases, the decoded neural state could move steadily up a scaffolding like that illustrated in Figure 1b (green). In such cases, the decoded trajectory might be very different in duration from any of the library trajectories. Thus, one should not think of the library as a set of possible trajectories that are selected from, but rather as providing a mesh-like scaffolding that defines where future neural states are likely to live and the likely direction of their local motion. The decoded trajectory may differ considerably from any trajectory within Ω.”
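  
  One way to picture how interpolation lets the decoded state leave the library is to consider two nearby candidate states and ask which blend of their firing rates best explains the observed spikes. The sketch below does this with a simple grid search over a mixing weight `alpha`. It is a hedged illustration only: the function names, the grid search, the restriction to two candidates and a single time bin, and the `min_rate` floor are all assumptions, and the manuscript's actual interpolation procedure may differ.

```python
import math


def poisson_loglik(rates, counts, bin_s=0.020, min_rate=1e-3):
    """Sum of per-neuron Poisson log-likelihoods for a single time bin.

    min_rate is a hypothetical floor that avoids log(0) for silent neurons.
    """
    total = 0.0
    for rate, count in zip(rates, counts):
        lam = max(rate, min_rate) * bin_s  # expected spikes in the bin
        total += count * math.log(lam) - lam - math.lgamma(count + 1)
    return total


def interpolate(rates_a, rates_b, counts, n_steps=101):
    """Return (alpha, log-likelihood) for the best blend of two candidate states.

    The blended rates are (1 - alpha) * rates_a + alpha * rates_b, with alpha
    searched on an evenly spaced grid over [0, 1].
    """
    best_alpha, best_ll = 0.0, -math.inf
    for step in range(n_steps):
        alpha = step / (n_steps - 1)
        mixed = [(1 - alpha) * a + alpha * b for a, b in zip(rates_a, rates_b)]
        ll = poisson_loglik(mixed, counts)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha, best_ll
```

  If the observed counts fall between what the two candidates predict, the selected `alpha` lies strictly between 0 and 1, so the decoded state sits between library trajectories rather than on either one.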
  
  This flexibility is indeed used during movement. One empirical example is described in detail:
  
  “During movement… angular phase was decoded with effectively no net drift over time. This is noteworthy because angular velocity on test trials never perfectly matched any of the trajectories in Φ. Thus, if decoding were restricted to a library trajectory, one would expect growing phase discrepancies. Yet decoded trajectories only need to locally (and approximately) follow the flow-field defined by the library trajectories. Based on incoming spiking observations, decoded trajectories speed up or slow down (within limits).
  
  This decoding flexibility presumably relates to the fact that the decoded neural state is allowed to differ from the nearest state in Ω. To explore… [the text goes on to describe the new analysis in Figure 4d, which shows that the decoded state is typically not on any trajectory, though it is typically close to a trajectory].”
  
  Thus, MINT’s operations allow considerable flexibility, including generalization that is compositional in nature. Yet R3 is still correct that there are other forms of generalization that are unavailable to MINT. This is now stressed at multiple points in the revision. However, under the perspective in Figure 1b, these forms of generalization are unavailable to any current method. Hence we made a second major change in response to this concern…
  
  ii) We explicitly illustrate how the structure of the data determines when generalization is or isn’t possible. The new Figure 1a,b introduces the two perspectives, and the new Figure 6a,b lays out their implications for generalization. Under the perspective in Figure 6a, the reviewer is quite right: other methods can generalize in ways that MINT cannot. Under the perspective in Figure 6b, expectations are very different. Those expectations make testable predictions. Hence the third major change…
  
  iii) We have added an analysis of generalization, using a newly collected dataset. This dataset was collected using Neuropixels Probes during our Pac-Man force-tracking task. This dataset was chosen because it is unusually well-suited to distinguishing the predictions in Figure 6a versus Figure 6b. Finding a dataset that can do so is not simple. Consider R3’s point that training data should “explore the whole movement space and the associated neural space”. The physical simplicity of the Pac-Man task makes it unusually easy to confirm that the behavioral workspace has been fully explored. Importantly, under Figure 6b, this does not mean that the neural workspace has been fully explored, which is exactly what we wish to test when testing generalization. We do so, and compare MINT with a Wiener filter. A Wiener filter is an ideal comparison because it is simple, performs very well on this task, and should be able to generalize well under Figure 1a. Additionally, the Wiener filter (unlike the Kalman Filter) doesn’t leverage the assumption that neural activity reflects the derivative of force. This matters because we find that neural activity does not reflect dforce/dt in this task. The Wiener filter is thus the most natural choice of the interpretable methods whose assumptions match Figure 1a.
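  
  For readers unfamiliar with the comparison method, a Wiener filter in this setting amounts to a linear regression from a short history of binned spike counts to the behavioral variable. The following is a minimal sketch under assumptions that are not taken from the manuscript: the function names, the number of lags, the small ridge penalty, and the synthetic "force" signal are all illustrative.

```python
import numpy as np

def lagged_design(counts, n_lags):
    """One row per time bin: the current bin and the previous n_lags-1 bins of every neuron."""
    T, N = counts.shape
    X = np.zeros((T, N * n_lags))
    for lag in range(n_lags):
        # Block `lag` of row t holds counts[t - lag]; early rows are zero-padded.
        X[lag:, lag * N:(lag + 1) * N] = counts[:T - lag]
    return np.hstack([X, np.ones((T, 1))])  # trailing bias column

def fit_wiener(counts, behavior, n_lags=5, ridge=1e-3):
    """Regularized least squares for the filter weights (ridge value is an assumption)."""
    X = lagged_design(counts, n_lags)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ behavior)

def predict_wiener(counts, weights, n_lags=5):
    return lagged_design(counts, n_lags) @ weights

# Synthetic example: "force" is a noisy linear readout of the current spike counts.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(500, 8)).astype(float)
force = counts @ rng.normal(size=8) + 0.1 * rng.normal(size=500)

weights = fit_wiener(counts[:400], force[:400])
pred = predict_wiener(counts[400:], weights)
resid = force[400:] - pred
r2 = 1.0 - np.sum(resid ** 2) / np.sum((force[400:] - force[400:].mean()) ** 2)
```

  Because the readout is a fixed linear map, such a decoder extrapolates to any new behavior whose neural activity stays within the same subspace and preserves the same neuron-behavior correlations, which is the situation described by Figure 1a.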
  
  The new analysis is described in Figure 6c-g and accompanying text. Results are consistent with the predictions of Figure 6b. We are pleased to have been motivated to add this analysis for two reasons. First, it provides an additional way of evaluating the predictions of the two competing scientific perspectives that are at the heart of our study. Second, this analysis illustrates an underappreciated way in which generalization is likely to be challenging for any decode method. It can be tempting to think that the main challenge regarding generalization is to fully explore the relevant behavioral space. This makes sense if a behavioral space has “an associated neural space”. However, we are increasingly of the opinion that it doesn’t. Different tasks often involve different neural subspaces, even when behavioral subspaces overlap. We have even seen situations where motor output is identical but neural subspaces are quite different. These facts are relevant to any decoder, something highlighted in the revised Introduction:
  
  “MINT’s performance confirms that there are gains to be made by building decoders whose assumptions match a different, possibly more accurate view of population activity. At the same time, our results suggest fundamental limits on decoder generalization. Under the assumptions in Figure 1b, it will sometimes be difficult or impossible for decoders to generalize to not-yet-seen tasks. We found that this was true regardless of whether one uses MINT or a more traditional method. This finding has implications regarding when and how generalization should be attempted.”
  
  We have also added an analysis (Figure 6e) illustrating how MINT’s ability to compute likelihoods can be useful in detecting situations that may strain generalization (for any method). MINT is unusual in being able to compute and use likelihoods in this way.
  
  Detailed responses to R3: we reproduce each of R3’s specific concerns below, but concentrate our responses on issues not already covered above.
  
  Main comments:
  
  Comment 1. MINT does not generalize to different tasks, which is a main limitation for BCI utility compared with prior BCI decoders that have shown this generalizability as I review below. Specifically, given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).
  
  First, the authors provide a section on generalization, which is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task. The former is critical for any algorithm, but it does not imply the latter. For example, removing one direction of cycling from the training set as the authors do here is an example of generating poor training data because the two behavioral (and neural) directions are non-overlapping and/or orthogonal while being in the same space. As such, it is fully expected that all methods will fail. For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not). Many BCI studies have indeed shown this generalization ability using a model. For example, in Weiss et al. 2019, center-out reaching tasks are used for training and then the same trained decoder is used for typing on a keyboard or drawing on the 2D screen. In Gilja et al. 2012, training is on a center-out task but the same trained decoder generalizes to a completely different pinball task (hit four consecutive targets) and tasks requiring the avoidance of obstacles and curved movements. There are many more BCI studies, such as Jarosiewicz et al. 2015, that also show generalization to complex real-world tasks not included in the training set. Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement. On the contrary, MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks. So, unlike these prior BCI methods, MINT will likely actually need to include every task in its library, which is not practical.
  
  I suggest the authors remove claims of generalization and modify their arguments throughout the text and abstract. The generalization section needs to be substantially edited to clarify the above points. Please also provide the BCI citations and discuss the above limitation of MINT for BCIs.
  
  As discussed above, R3’s concerns are accurate under the view in Figure 1a (and the corresponding Figure 6a). Under this view, a method such as that in Gilja et al. or Jarosiewicz et al. can find the correct subspace, model the correct neuron-behavior correlations, and generalize to any task that uses “the same 2D computer screen and associated neural space”, just as the reviewer argues. Under Figure 1b things are quite different.
  
  This topic – and the changes we have made to address it – is covered at length above. Here we simply want to highlight an empirical finding: sometimes two tasks use the same neural subspace and sometimes they don’t. We have seen both in recent data, and it can be very non-obvious which will occur based just on behavior. It does not simply relate to whether one is using the same physical workspace. We have even seen situations where the patterns of muscle activity in two tasks are nearly identical, but the neural subspaces are fairly different. When a new task uses a new subspace, neither of the methods noted above (Gilja or Jarosiewicz) will generalize (nor will MINT). Generalizing to a new subspace is basically impossible without some yet-to-be-invented approach. On the other hand, there are many other pairs of tasks (center-out-reaching versus some other 2D cursor control) where subspaces are likely to be similar, especially if the frequency content of the behavior is similar (in our recent experience this is often critical). When subspaces are shared, most methods will generalize, and that is presumably why generalization worked well in the studies noted above.
  
  Although MINT can also generalize in such circumstances, R3 is correct that, under the perspective in Figure 1a, MINT will be more limited than other methods. This is now carefully illustrated in Figure 6a. In this traditional perspective, MINT will fail to generalize in cases where new trajectories are near previously observed states, yet move in very different ways from library trajectories. The reason we don’t view this as a shortcoming is that we expect it to occur rarely (else tangling would be high). We thus anticipate the scenario in Figure 6b.
  
  This is worth stressing because R3 states that our discussion of generalization “is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task.” We have heavily revised this section and improved it. However, it was never inaccurate. Under Figure 6b, these two concepts absolutely are mixed up. If different tasks use different neural subspaces, then this requires collecting different “informative training data” for each. One cannot simply count on having explored the physical workspace.
  
  Comment 2. MINT is shown to achieve competitive/high performance in highly stereotyped datasets with structured trials, but worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped. This shows that MINT is valuable for decoding in repetitive stereotyped use-cases. However, it also highlights a limitation of MINT for BCIs, which is that MINT may not work well for real-world and/or less-constrained setups such as typing, moving a robotic arm in 3D space, etc. This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model. Indeed, the authors acknowledge that the lower performance on MC_RTT (Figure 4) may be caused by the lack of repeated trials of the same type. However, real-world BCI decoding scenarios will also not have such stereotyped trial structure and will be less/un-constrained, in which MINT underperforms. Thus, the claim in the abstract or lines 480-481 that MINT is an "excellent" candidate for clinical BCI applications is not accurate and needs to be qualified. The authors should revise their statements accordingly and discuss this issue. They should also make the use-case of MINT on BCI decoding clearer and more convincing.
  
  We discussed, above, multiple changes and additions to the revision that were made to address these concerns. Here we briefly expand on the comment that MINT achieves “worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped”. All decoders performed poorly on this task. MINT still outperformed the two traditional methods, but this was the only dataset where MINT did not also perform better (overall) than the expressive GRU and feedforward network. There are probably multiple reasons why. We agree with R3 that one likely reason is that this dataset is straining generalization, and MINT may have felt this strain more than the two machine-learning-based methods. Another potential reason is the structure of the training data, which made it more challenging to obtain library trajectories in the first place. Importantly, these observations do not support the view in Figure 1a. MINT still outperformed the Kalman and Wiener filters (whose assumptions align with Fig. 1a). To make these points we have added the following:
  
  “Decoding was acceptable, but noticeably worse, for the MC_RTT dataset… As will be discussed below, every decode method achieved its worst estimates of velocity for the MC_RTT dataset. In addition to the impact of slower reaches, MINT was likely impacted by training data that made it challenging to accurately estimate library trajectories. Due to the lack of repeated trials, MINT used AutoLFADS to estimate the neural state during training. In principle this should work well. In practice AutoLFADS may have been limited by having only 10 minutes of training data. Because the random-target task involved more variable reaches, it may also have stressed the ability of all methods to generalize, perhaps for the reasons illustrated in Figure 1b.
  
  The only dataset where MINT did not perform the best overall was the MC_RTT dataset, where it was outperformed by the feedforward network and GRU. As noted above, this may relate to the need for MINT to learn neural trajectories from training data that lacked repeated trials of the same movement (a design choice one might wish to avoid). Alternatively, the less-structured MC_RTT dataset may strain the capacity to generalize; all methods experienced a drop in velocity-decoding R^2 for this dataset compared to the others. MINT generalizes somewhat differently than other methods, and may have been at a modest disadvantage for this dataset. A strong version of this possibility is that perhaps the perspective in Figure 1a is correct, in which case MINT might struggle because it cannot use forms of generalization that are available to other methods (e.g. generalization based on neuron-velocity correlations). This strong version seems unlikely; MINT continued to significantly outperform the Wiener and Kalman filters, which make assumptions aligned with Figure 1a.”
  
  Comment 3. Related to 2, it may also be that MINT achieves competitive performance in offline and trial-based stereotyped decoding by overfitting to the trial structure in a given task, and thus may not generalize well to online performance due to overfitting. For example, a recent work showed that offline decoding performance may be overfitted to the task structure and may not represent online performance (Deo et al. 2023). Please discuss.
  
  We agree that a limitation of our study is that we do not test online performance. There are sensible reasons for this decision:
  
  “By necessity and desire, all comparisons were made offline, enabling benchmarked performance across a variety of tasks and decoded variables, where each decoder had access to the exact same data and recording conditions.”
  
  We recently reported excellent online performance in the cycling task with a different algorithm (Schroeder et al. 2022). In the course of that study, we consistently found that improvements in our offline decoding translated to improvements in our online decoding. We thus believe that MINT (which improves on the offline performance of our older algorithm) is a good candidate to work very well online. Yet we agree this still remains to be seen. We have added the following to the Discussion:
  
  “With that goal in mind, there exist three important practical considerations. First, some decode algorithms experience a performance drop when used online. One presumed reason is that, when decoding is imperfect, the participant alters their strategy which in turn alters the neural responses upon which decoding is based. Because MINT produces particularly accurate decoding, this effect may be minimized, but this cannot be known in advance. If a performance drop does indeed occur, one could adapt the known solution of retraining using data collected during online decoding [13]. Another presumed reason (for a gap between offline and online decoding) is that offline decoders can overfit the temporal structure in training data [107]. This concern is somewhat mitigated by MINT’s use of a short spike-count history, but MINT may nevertheless benefit from data augmentation strategies such as including time-dilated versions of learned trajectories in the libraries.”
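  
  The time-dilation augmentation mentioned in the quoted passage can be illustrated with a short sketch. It is an assumption-laden example rather than the authors' procedure: the function name, the use of linear interpolation, and the dilation factors are all hypothetical choices.

```python
import numpy as np

def time_dilate(trajectory, factor):
    """Resample a (T, N) trajectory to round(T * factor) samples by linear interpolation."""
    T, N = trajectory.shape
    new_T = max(2, int(round(T * factor)))
    old_t = np.linspace(0.0, 1.0, T)      # normalized time of the original samples
    new_t = np.linspace(0.0, 1.0, new_T)  # normalized time of the resampled trajectory
    return np.column_stack(
        [np.interp(new_t, old_t, trajectory[:, n]) for n in range(N)]
    )

# Toy two-neuron "rate" trajectory tracing one cycle.
phase = np.linspace(0.0, 2.0 * np.pi, 100)
rates = np.column_stack([np.sin(phase), np.cos(phase)])

# Faster, unchanged, and slower copies that could be appended to a library.
augmented = [time_dilate(rates, f) for f in (0.8, 1.0, 1.25)]
```

  In practice the paired behavioral trajectory would need the same resampling, and time-derivative quantities such as velocity would additionally need rescaling by the dilation factor.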
  
  Comment 4. Related to 2, since MINT requires firing rates to generate the library and simple averaging does not work for this purpose in the MC_RTT dataset (that does not have repeated trials), the authors needed to use AutoLFADS to infer the underlying firing rates. The fact that MINT requires the usage of another model to be constructed first and that this model can be computationally complex, will also be a limiting factor and should be clarified.
  
  This concern relates to the computational complexity of computing firing-rate trajectories during training. Usually, rates are estimated via trial-averaging, which makes MINT very fast to train. This was quite noticeable during the Neural Latents Benchmark competition. As one example, for the “MC_Scaling 5 ms Phase”, MINT took 28 seconds to train while GPFA took 30 minutes, the transformer baseline (NDT) took 3.5 hours, and the switching nonlinear dynamical system took 4.5 hours.
  
  However, the reviewer is quite correct that MINT’s efficiency depends on the method used to construct the library of trajectories. As we note, “MINT is a method for leveraging a trajectory library, not a method for constructing it”. One can use trial-averaging, which is very fast. One can also use fancier, slower methods to compute the trajectories. We don’t view this as a negative – it simply provides options. Usually one would choose trial-averaging, but one does not have to. In the case of MC_RTT, one has a choice between LFADS and grouping into pseudo-conditions and averaging (which is fast). LFADS produces higher performance at the cost of being slower. The operator can choose which they prefer. This is discussed in the following section:
  
  “For MINT, ‘training’ simply means computation of standard quantities (e.g. firing rates) rather than parameter optimization. MINT is thus typically very fast to train (Table 1), on the order of seconds using generic hardware (no GPUs). This speed reflects the simple operations involved in constructing the library of neural-state trajectories: filtering of spikes and averaging across trials. At the same time we stress that MINT is a method for leveraging a trajectory library, not a method for constructing it. One may sometimes wish to use alternatives to trial-averaging, either of necessity or because they improve trajectory estimates. For example, for the MC_RTT task we used AutoLFADS to infer the library. Training was consequently much slower (hours rather than seconds) because of the time taken to estimate rates. Training time could be reduced back to seconds using a different approach – grouping into pseudo-conditions and averaging – but performance was reduced. Thus, training will typically be very fast, but one may choose time-consuming methods when appropriate.”
  
  Comment 5. I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract.
  
  As discussed above, the manuscript has been revised to clarify that our statement was accurate.
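  
  The memory question raised in Comment 5 turns on how a log-likelihood table is indexed. As a hedged illustration only (the manuscript's Methods define the actual table; the layout, names, and parameter values here are assumptions): if spike counts are modeled as conditionally independent Poisson variables, the log-likelihood of a population observation is a sum of per-neuron terms, so a table indexed by (discretized rate, spike count) suffices and its size does not grow with N or τ.

```python
import math
import numpy as np

def build_loglik_table(rate_levels, max_count, bin_width):
    """table[v, k] = log Poisson(k | rate_levels[v] * bin_width); rates must be > 0."""
    lam = np.asarray(rate_levels, dtype=float)[:, None] * bin_width
    k = np.arange(max_count + 1)[None, :]
    log_fact = np.array([math.lgamma(c + 1) for c in range(max_count + 1)])[None, :]
    return k * np.log(lam) - lam - log_fact

def population_loglik(table, rate_idx, counts):
    """Sum per-neuron entries; rate_idx and counts are length-N integer arrays."""
    return table[rate_idx, counts].sum()

# Hypothetical settings: V = 200 rate levels, counts up to 10, 20 ms bins.
rate_levels = np.linspace(1.0, 100.0, 200)   # spikes/s
table = build_loglik_table(rate_levels, max_count=10, bin_width=0.02)

# One candidate state for N = 50 neurons: a rate index per neuron plus observed counts.
rng = np.random.default_rng(1)
rate_idx = rng.integers(0, 200, size=50)
counts = rng.integers(0, 4, size=50)
ll = population_loglik(table, rate_idx, counts)
```

  Under these assumptions the table has V × (max_count + 1) entries, and evaluating one candidate state costs N lookups and additions.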
  
  Comment 6. In addition to the above technical concerns, I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics (e.g., fixed points/limit cycles). While it is of course valid and even insightful to propose different assumptions from existing models as the authors do here, they do not actually translate these assumptions into a new model. Without a model and by just tabulating the data, I don't believe we can provide interpretation or advance the understanding of the fundamentals behind neural computations. As such, I am not clear as to how this library building approach can advance neuroscience or how these assumptions are useful. I think the authors should clarify and discuss this point.
  
  As requested, a major goal of the revision has been to clarify the scientific motivations underlying MINT’s design. In addition to many textual changes, we have added figures (Figures 1a,b and 6a,b) to outline the two competing scientific perspectives that presently exist. This topic is also addressed by extensions of existing analyses and by new analyses (e.g. Figure 6c-g).
  
  In our view these additions have dramatically improved the manuscript. This is especially true because we think R3’s concerns, expressed above, are reasonable. If the perspective in Figure 1a is correct, then R3 is right and MINT is essentially a hack that fails to model the data. MINT would still be effective in many circumstances (as we show), but it would be unprincipled. This would create limitations, just as the reviewer argues. On the other hand, if the perspective in Figure 1b is correct, then MINT is quite principled relative to traditional approaches. Traditional approaches make assumptions (a fixed subspace, consistent neuron-kinematic correlations) that are not correct under Figure 1b.
  
  We don’t expect R3 to agree with our scientific perspective at this time (though we hope to eventually convince them). To us, the key is that we agree with R3 that the manuscript needs to lay out the different perspectives and their implications, so that readers have a good sense of the possibilities they should be considering. The revised manuscript is greatly improved in this regard.
  
  Comment 7. Related to 6, there seems to be a logical inconsistency between the operations of MINT and one of its three assumptions, namely, sparsity. The authors state that neural states are sparsely distributed in some neural dimensions (Figure 1a, bottom). If this is the case, then why does MINT extend its decoding scope by interpolating known neural states (and behavior) in the training library? This interpolation suggests that the neural states are dense on the manifold rather than sparse, thus being contradictory to the assumption made. If interpolation-based dense meshes/manifolds underlie the data, then why not model the neural states through the subspace or manifold representations? I think the authors should address this logical inconsistency in MINT, especially since this sparsity assumption also questions the low-dimensional subspace/manifold assumption that is commonly made.
  
  We agree this is an important issue, and have added an analysis on this topic (Figure 4d). The key question is simple and empirical: during decoding, does interpolation cause MINT to violate the assumption of sparsity? R3 is quite right that in principle it could. If spiking observations argue for it, MINT’s interpolation could create a dense manifold during decoding rather than a sparse one. The short answer is that empirically this does not happen, in agreement with expectations under Figure 1b. Rather than interpolating between distant states and filling in large ‘voids’, interpolation is consistently local. This is a feature of the data, not of the decoder (MINT doesn’t insist upon sparsity, even though it is designed to work best in situations where the manifold is sparse).
  
  In addition to adding Figure 4d, we added the following (in an earlier section):
  
  “The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations. Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  I appreciate the detailed methods section, however, more specifics should be integrated into the main text. For example on Line 238, it should additionally be stated how many minutes were used for training and metrics like the MAE which is used later should be reported here.
  
  Thank you for this suggestion. We now report the duration of training data in the main text:
  
  “Decoding R^2 was .968 over ~7.1 minutes of test trials based on ~4.4 minutes of training data.”
  
  We have also added similar specifics throughout the manuscript, e.g. in the Fig. 5 legend:
  
  “Results are based on the following numbers of training / test trials: MC_Cycle (174 train, 99 test), MC_Maze (1721 train, 574 test), Area2_Bump (272 train, 92 test), MC_RTT (810 train, 268 test).”
  
  Similar additions were made to the legends for Fig. 6 and 8. Regarding the request to add MAE for the multitask network, we did not do so for the simple reason that the decoded variable (muscle activity) has arbitrary units. The raw MAE is thus not meaningful. We could of course have normalized, but at this point the MAE is largely redundant with the correlation. In contrast, the MAE is useful when comparing across the MC_Maze, Area2_Bump, and MC_RTT datasets, because they all involve the same scale (cm/s).
  
  Regarding the MC_RTT task, AutoLFADS was used to obtain robust spike rates, as reported in the methods. However, the rationale for splitting the neural trajectories after AutoLFADS is unclear. If the trajectories were split based on random recording gaps, this might lead to suboptimal performance? It might be advantageous to split them based on a common behavioural state?
  
  When learning neural trajectories via AutoLFADS, spiking data is broken into short (but overlapping) segments, rates are estimated for each segment via AutoLFADS, and these rates are then stitched together across segments into long neural trajectories. If there had been no recording gaps, these rates could have been stitched into a single neural trajectory for this dataset. However, the presence of recording gaps left us no choice but to stitch together these rates into more than one trajectory. Fortunately, recording gaps were rare: for the decoding analysis of MC_RTT there were only two recording gaps and therefore three neural trajectories, each ~2.7 minutes in duration.
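  
  The relationship between recording gaps and the number of resulting trajectories can be sketched as follows. This is an illustrative example, not the authors' code: the function name, the gap tolerance, and the toy timestamps are assumptions.

```python
import numpy as np

def split_at_gaps(timestamps, expected_dt, tolerance=1.5):
    """Return (start, stop) index pairs of contiguous runs.

    A gap is any step between consecutive samples larger than tolerance * expected_dt.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    breaks = np.flatnonzero(np.diff(timestamps) > tolerance * expected_dt) + 1
    edges = np.concatenate([[0], breaks, [len(timestamps)]])
    return list(zip(edges[:-1].tolist(), edges[1:].tolist()))

# Toy sample times (ms) with two gaps, which yield three contiguous trajectories.
t = np.concatenate([np.arange(0, 100), np.arange(150, 250), np.arange(400, 500)])
segments = split_at_gaps(t, expected_dt=1.0)
```

  Each (start, stop) pair delimits one stretch of stitched rates that can be stored as a separate library trajectory.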
  
  We agree that in general it is desirable to learn neural trajectories that begin and end at behaviorally relevant moments (e.g. in between movements). However, having these trajectories potentially end mid-movement is not an issue in and of itself. During decoding, MINT is never stuck on a trajectory. Thus, if MINT were decoding states near the end of a trajectory that was cut short due to a training gap, it would simply begin decoding states from other trajectories or elsewhere along the same trajectory in subsequent moments. We could have further trimmed the three neural trajectories to begin and end at behaviorally relevant moments, but chose not to as this would have only removed a handful of potentially useful states from the library.
  
  We now describe this in the Methods:
  
  “Although one might prefer trajectory boundaries to begin and end at behaviorally relevant moments (e.g. a stationary state), rather than at recording gaps, the exact boundary points are unlikely to be consequential for trajectories of this length that span multiple movements. If MINT estimates a state near the end of a long trajectory, its estimate will simply jump to another likely state on a different trajectory (or earlier along the same trajectory) in subsequent moments. Clipping the end of each trajectory to an earlier behaviorally-relevant moment would only remove potentially useful states from the libraries.”
  
  Are the training and execution times in Table 1 based on pure Matlab functions or Mex files? If it's Mex files as suggested by the code, it would be good to mention this in the Table caption.
  
  They are based on a combination of MATLAB and MEX files. This is now clarified in the table caption:
  
  “Timing measurements taken on a MacBook Pro (on CPU) with 32GB RAM and a 2.3 GHz 8-Core Intel Core i9 processor. Training and execution code used for measurements was written in MATLAB (with the core recursion implemented as a MEX file).”
  
  As the method most closely resembles a Bayesian decoder it would be good to compare performance against a Naive Bayes decoder.
  
  We agree and have now done so. The following has been added to the text:
  
  “A natural question is thus whether a simpler Bayesian decoder would have yielded similar results. We explored this possibility by testing a Naïve Bayes regression decoder [85] using the MC_Maze dataset. This decoder performed poorly, especially when decoding velocity (R^2 = .688 and .093 for hand position and velocity, respectively), indicating that the specific modeling assumptions that differentiate MINT from a naive Bayesian decoder are important drivers of MINT’s performance.”
  
  Line 199 Typo: The assumption of stereotypy trajectory also enables neural states (and decoded behaviors) to be updated in between time bins.
  
  Fixed
  
  Table 3: It's unclear why the Gaussian binning varies significantly across different datasets. Could the authors explain why this is the case and what its implications might be?
  
  We have added the following description in the “Filtering, extracting, and warping data on each trial” subsection of the Methods to discuss how σ may vary due to the number of trials available for training and how noisy the neural data for those trials is:
  
  “First, spiking activity for each neuron on each trial was temporally filtered with a Gaussian to yield single-trial rates. Table 3 reports the Gaussian standard deviations σ (in milliseconds) used for each dataset. Larger values of σ utilize broader windows of spiking activity when estimating rates and therefore reduce variability in those rate estimates. However, large σ values also yield neural trajectories with less fine-grained temporal structure. Thus, the optimal σ for a dataset depends on how variable the rate estimates otherwise are.”
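  
  The trade-off described in the quoted passage (larger σ reduces variability in the rate estimate but blurs temporal structure) can be seen in a small sketch. The function name, bin width, kernel truncation at 4σ, and the synthetic spike train are assumptions made for illustration.

```python
import numpy as np

def gaussian_rate(spikes, sigma_ms, bin_ms=1.0):
    """Convolve binned spike counts with a unit-area Gaussian; returns spikes/s."""
    sigma_bins = sigma_ms / bin_ms
    half = int(np.ceil(4 * sigma_bins))            # truncate the kernel at +/- 4 sigma
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()                         # unit area preserves the spike count
    return np.convolve(spikes, kernel, mode="same") * (1000.0 / bin_ms)

# Synthetic neuron firing at a constant 20 spikes/s, sampled in 1 ms bins for 20 s.
rng = np.random.default_rng(0)
spikes = (rng.random(20000) < 0.02).astype(float)

narrow = gaussian_rate(spikes, sigma_ms=10.0)    # noisier, finer temporal detail
wide = gaussian_rate(spikes, sigma_ms=100.0)     # smoother, coarser temporal detail
```

  Away from the edges, the wide-kernel estimate fluctuates much less around the true rate than the narrow-kernel estimate, at the cost of smearing any rapid rate changes that a real neuron might exhibit.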
  
  An implementation of the method in an open-source programming language could further enhance the widespread use of the tool.
  
  We agree this would be useful, but we have not yet implemented the method in any other programming language. Implementation in Python remains a future goal.
  
  Reviewer #2 (Recommendations For The Authors):
  
  - Figures 4 and 5 should show the error bars on the horizontal axis rather than portraying them vertically.
  
  [Note that these are now Figures 5 and 6]
  
  The figure legend of Figure 5 now clarifies that the vertical ticks are simply to aid visibility when symbols have very similar means and thus overlap visually. We don’t include error bars (for this analysis) because they are very small and would mostly be smaller than the symbol sizes. Instead, to indicate certainty regarding MINT’s performance measurements, the revised text now gives error ranges for the correlations and MAE values in the context of Figure 4c. These error ranges were computed as the standard deviation of the sampling distribution (computed via resampling of trials) and are thus equivalent to SEMs. The error ranges are all very small; e.g., for the MC_Maze dataset the MAE for x-velocity is 4.5 +/- 0.1 cm/s (error bars on the correlations are smaller still).
  
  Thus, for a given dataset, we can be quite certain of how well MINT performs (within ~2% in the above case). This is reassuring, but we also don’t want to overemphasize this accuracy. The main sources of variability one should be concerned about are: 1) different methods can perform differentially well for different brain areas and tasks, 2) methods can decode some behavioral variables better than others, and 3) performance depends on factors like neuron-count and the number of training trials, in ways that can differ across decode methods. For this reason, the study examines multiple datasets, across tasks and brain areas, and measures performance for a range of decoded variables. We also examine the impact of training-set-size (Figure 8a) and population size (solid traces in Fig. 8b, see R2’s next comment below).
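  The resampling-based error ranges mentioned above can be sketched as a trial-level bootstrap. This is only an illustration under stated assumptions: the per-trial errors are synthetic, and the function name `bootstrap_sem`, the number of resamples, and the use of a per-trial mean absolute error are hypothetical choices, not details taken from the manuscript.

```python
import random
import statistics

def bootstrap_sem(per_trial_values, n_resamples=1000, seed=0):
    """Resample trials with replacement and return (mean, SD of resampled means).

    The SD of the sampling distribution of the mean plays the role of an SEM.
    """
    rng = random.Random(seed)
    n = len(per_trial_values)
    means = []
    for _ in range(n_resamples):
        sample = [per_trial_values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(per_trial_values), statistics.stdev(means)

# Synthetic per-trial absolute errors in cm/s (hypothetical, not real data).
data_rng = random.Random(1)
errors = [data_rng.gauss(4.5, 1.0) for _ in range(100)]
mean_error, sem = bootstrap_sem(errors)
print(f"MAE = {mean_error:.2f} +/- {sem:.2f} cm/s")
```

  As the response notes, such an interval only quantifies uncertainty for one dataset and one decoded variable; it says nothing about variability across tasks, brain areas, or neuron counts.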
  
  There is one other source of variance one might be concerned about, but it is specific to the neural-network approaches: different weight initializations might result in different performance. For this reason, each neural-network approach was trained ten times, with the average performance computed. The variability around this average was very small, and this is now stated in the Methods.
  
  “For the neural networks, the training/testing procedure was repeated 10 times with different random seeds. For most behavioral variables, there was very little variability in performance across repetitions. However, there were a few outliers for which variability was larger. Reported performance for each behavioral group is the average performance across the 10 repetitions to ensure results were not sensitive to any specific random initialization of each network.”
  
  - For Figure 6, it is unclear whether the neuron-dropping process was repeated multiple times. If not, it should be since the results will be sensitive to which particular subsets of neurons were "dropped". In this case, the results presented in Figure 6 should include error bars to describe the variability in the model performance for each decoder considered.
  
  A good point. The results in Figure 8 (previously Figure 6) were computed by averaging over the removal of different random subsets of neurons (50 subsets per neuron count), just as the reviewer requests. The figure has been modified to include the standard deviation of performance across these 50 subsets. The legend clarifies how this was done.
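  The neuron-dropping procedure described here (many random subsets per neuron count, summarized by mean and standard deviation) can be sketched as follows. The function `neuron_dropping_curve` and the toy scoring function are hypothetical; in a real analysis `score_fn` would retrain and evaluate a decoder on the kept neurons.

```python
import random
import statistics

def neuron_dropping_curve(n_neurons, counts, score_fn, n_subsets=50, seed=0):
    """For each neuron count, score random subsets and return {count: (mean, sd)}."""
    rng = random.Random(seed)
    results = {}
    for count in counts:
        scores = []
        for _ in range(n_subsets):
            kept = rng.sample(range(n_neurons), count)  # neurons that survive dropping
            scores.append(score_fn(kept))
        results[count] = (statistics.fmean(scores), statistics.stdev(scores))
    return results

# Toy stand-in for decoder performance: each neuron contributes a fixed,
# hypothetical amount of information.
weights = list(range(1, 21))

def toy_score(kept):
    return sum(weights[i] for i in kept) / sum(weights)

curve = neuron_dropping_curve(20, [5, 10, 20], toy_score)
for count, (mean, sd) in curve.items():
    print(count, round(mean, 3), round(sd, 3))
```

  The spread across subsets shrinks to zero when no neurons are dropped, which is why the standard deviation is informative only for the reduced population sizes.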
  
  Reviewer #3 (Recommendations For The Authors):
  
  Other comments:
  
  (1) [Line 185-188] The authors argue that in a 100-dimensional space with 10 possible discretized values, 10^100 potential neural states need to be computed. But I am not clear on this. This argument seems to hold only in the absence of a model (as in MINT). For a model, e.g., Kalman filter or AutoLFADS, information is encoded in the latent state. For example, a simple Kalman filter for a linear model can be used for efficient inference. This 10^100 computation isn't a general problem but seems MINT-specific, please clarify.
  
  We agree this section was potentially confusing. It has been rewritten. We were simply attempting to illustrate why maximum likelihood computations are challenging without constraints. MINT simplifies this problem by adding constraints, which is why it can readily provide data likelihoods (and can do so using a Poisson model). The rewritten section is below:
  
  “Even with 1000 samples for each of the neural trajectories in Figure 3, there are only 4000 possible neural states for which log-likelihoods must be computed (in practice it is fewer still, see Methods). This is far fewer than if one were to naively consider all possible neural states in a typical rate- or factor-based subspace. It thus becomes tractable to compute log-likelihoods using a Poisson observation model. A Poisson observation model is usually considered desirable, yet can pose tractability challenges for methods that utilize a continuous model of neural states. For example, when using a Kalman filter, one is often restricted to assuming a Gaussian observation model to maintain computational tractability.”
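  To make the tractability argument concrete, the sketch below scores a finite set of candidate neural states under a Poisson observation model by brute force. It is not MINT's actual algorithm (which, per the response, uses a recursion implemented in MATLAB/MEX); the function names and the tiny three-state library are hypothetical.

```python
import math

def poisson_log_likelihood(spike_counts, expected_counts):
    """Sum over neurons of log Poisson(k | lam), assuming independent neurons."""
    ll = 0.0
    for k, lam in zip(spike_counts, expected_counts):
        lam = max(lam, 1e-9)  # floor the rate to avoid log(0)
        ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll

def most_likely_state(spike_counts, candidate_states):
    """Return (index of best candidate, list of log-likelihoods).

    Cost is linear in the number of candidates, so a library of a few
    thousand states is cheap to evaluate exhaustively.
    """
    scores = [poisson_log_likelihood(spike_counts, s) for s in candidate_states]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical library: expected spike counts per bin for three neurons.
library = [
    [1.0, 1.0, 1.0],
    [5.0, 0.5, 2.0],
    [0.2, 3.0, 0.2],
]
observed = [5, 0, 2]
best_index, log_likelihoods = most_likely_state(observed, library)
print(best_index, log_likelihoods)
```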
  
  (2) [Figure 6b] Why do the authors set the dropped neurons to zero in the "zeroed" results of the robustness analysis? Why not disregard the dropped neurons during the decoding process?
  
  We agree the terminology we had used in this section was confusing. We have altered the figure and rewritten the text. The following, now at the beginning of that section, addresses the reviewer’s query:
  
  “It is desirable for a decoder to be robust to the unexpected loss of the ability to detect spikes from some neurons. Such loss might occur while decoding, without being immediately detected. Additionally, one desires robustness to a known loss of neurons / recording channels. For example, there may have been channels that were active one morning but are no longer active that afternoon. At least in principle, MINT makes it very easy to handle this second situation: there is no need to retrain the decoder, one simply ignores the lost neurons when computing likelihoods. This is in contrast to nearly all other methods, which require retraining because the loss of one neuron alters the optimal parameters associated with every other neuron.”
  
  The figure has been relabeled accordingly; instead of the label ‘zeroed’, we use the label ‘undetected neuron loss’.
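  The distinction between undetected and known neuron loss can be illustrated with a small sketch, assuming independent Poisson neurons. The function `masked_poisson_ll`, the two candidate states, and the spike counts are hypothetical; the point is only that a known-lost neuron can be skipped in the likelihood without retraining anything.

```python
import math

def masked_poisson_ll(spike_counts, expected_counts, active):
    """Poisson log-likelihood that ignores neurons flagged as inactive."""
    ll = 0.0
    for k, lam, is_active in zip(spike_counts, expected_counts, active):
        if not is_active:
            continue  # known-lost neuron contributes nothing to the likelihood
        lam = max(lam, 1e-9)
        ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll

true_state = [8.0, 2.0]   # expected counts for the state that generated the data
other_state = [0.5, 3.0]  # a competing candidate state
observed = [0, 2]         # neuron 0 has been lost and reports no spikes

# Undetected loss: the silent neuron is treated as real evidence.
undetected = (
    masked_poisson_ll(observed, true_state, [True, True]),
    masked_poisson_ll(observed, other_state, [True, True]),
)
# Known loss: the lost neuron is simply ignored.
known = (
    masked_poisson_ll(observed, true_state, [False, True]),
    masked_poisson_ll(observed, other_state, [False, True]),
)
print(undetected, known)
```

  In this toy case the silent neuron pushes the likelihood toward the wrong state when the loss is undetected, whereas masking it restores the correct ranking.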
  
  (3) Authors should provide statistical significance on their results, which they already did for Fig. S3a,b,c but missing on some other figures/places.
  
  We have added error bars in some key places, including in the text when quantifying MINT’s performance in the context of Figure 4. Importantly, error bars are only as meaningful as the source of error they assess, and there are reasons to be careful given this. The standard method for putting error bars on performance is to resample trials, which is indeed what we now report. These error bars are very small. For example, when decoding horizontal velocity for the MC_Maze dataset, the correlation between MINT’s decode and the true velocity had a mean and SD of the sampling distribution of 0.963 +/- 0.001. This means that, for a given dataset and target variable, we have enough trials/data that we can be quite certain of how well MINT performs. However, we want to be careful not to overstate this certainty. What one really wants to know is how well MINT performs across a variety of datasets, brain areas, target variables, neuron counts, etc. It is for this reason that we make multiple such comparisons, which provides a more valuable view of performance variability.
  
  For Figure 7, error bars are unavailable. Because this was a benchmark, there was exactly one test-set that was never seen before. This is thus not something that could be resampled many times (that would have revealed the test data and thus invalidated the benchmark, not to mention that some of these methods take days to train). We could, in principle, have added resampling to Figure 5. In our view it would not be helpful and could be misleading for the reasons noted above. If we computed standard errors using different train/test partitions, they would be very tight (mostly smaller than the symbol sizes), which would give the impression that one can be quite certain of a given R^2 value. Yet variability in the train/test partition is not the variability one is concerned about in practice. In practice, one is concerned about whether one would get a similar R^2 for a different dataset, or brain area, or task, or choice of decoded variable. Our analysis thus concentrated on showing results across a broad range of situations. In our view this is a far more relevant way of illustrating the degree of meaningful variability (which is quite large) than resampling, which produces reassuringly small but (mostly) irrelevant standard errors.
  
  Error bars are supplied in Figure 8b. These error bars give a sense of variability across re-samplings of the neural population. While this is not typically the source of variability one is most concerned about, for this analysis it becomes appropriate to show resampling-based standard errors because a natural concern is that results may depend on which neurons were dropped. So here it is both straightforward, and desirable, to compute standard errors. (The fact that MINT and the Wiener filter can be retrained many times swiftly was also key – this isn’t true of the more expressive methods). Figure S1 also uses resampling-based confidence intervals for similar reasons.
  
  (4) [Line 431-437] Authors state that MINT outperforms other methods with the PSTH R^2 metric (trial-averaged smoothed spikes for each condition). However, I think this measure may not provide a fair comparison and is confounded because MINT's library is built using PSTH (i.e., averaged firing rate) but other methods do not use the PSTH. The author should clarify this.
  
  The PSTH R^2 metric was not created by us; it was part of the Neural Latents Benchmark. They chose it because it ensures that a method cannot ‘cheat’ (on the Bits/Spike measure) by reproducing fine features of spiking while estimating rates badly. We agree with the reviewer’s point: MINT’s design does give it a potential advantage in this particular performance metric. This isn’t a confound though, just a feature. Importantly, MINT will score well on this metric only if MINT’s neural state estimate is accurate (including accuracy in time). Without accurate estimation of the neural state at each time, it wouldn’t matter that the library trajectory is based on PSTHs. This is now explicitly stated:
  
  “This is in some ways unsurprising: MINT estimates neural states that tend to resemble (at least locally) trajectories ‘built’ from training-set-derived rates, which presumably resemble test-set rates. Yet strong performance is not a trivial consequence of MINT’s design. MINT does not ‘select’ whole library trajectories; PSTH R2 will be high only if condition (c), index (k), and the interpolation parameter (α) are accurately estimated for most moments.”
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.05.535396v3
www.biorxiv.org www.biorxiv.org

mitoBKCa is functionally expressed in murine and human breast cancer cells and potentially contributes to metabolic reprogramming

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the current reviews.
  
  Our answer to the final point(s) raised is as follows:
  
  "We thank the reviewer for the comment. We checked our datasets accordingly. Typically, the n of cells showed deviations of maximally 20% from experiment to experiment (e.g. 16-24 cells per experiment). Additionally, experiments were performed using different passages of the cells. Moreover, data were validated at different time-points during the study using newly thawed cell lines."
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Bischoff et al present a carefully prepared study on a very interesting and relevant topic: the role of ion channels (here a Ca2+-activated K+ channel BK) in regulating mitochondrial metabolism in breast cancer cells. The potential impact of these and similar observations made in other tumor entities has only begun to be appreciated. That being said, the authors pursue in my view an innovative approach to understanding breast cancer cell metabolism. Considering the following points would further strengthen the manuscript:
  
  We thank reviewer #1 for the overall positive feedback on our study.
  
  Methods:
  
  (1) The authors use an extracellular Ca2+ concentration (2 mM) in their Ringer's solutions that is almost twice as high as the physiologically free Ca2+ concentration (ln 473). Moreover, the free Ca2+ concentration of their pipette solution is not indicated (ln 487).
  
  Indeed, we used 2 mM Ca2+ in the physiological live-cell imaging buffer. This concentration is, if anything, slightly lower than the total Ca2+ concentration in the body (usually 2.2 to 2.6 mM), while the free Ca2+ concentration is typically about half of that. Nevertheless, multiple studies other than ours have used 2 mM Ca2+ for live-cell experiments. The following studies represent only a small selection:
  
  https://doi.org/10.1038/s41598-019-49070-8
  
  https://doi.org/10.1016/j.bpj.2020.08.045
  
  https://doi.org/10.1016/j.redox.2022.102319
  
  However, to ensure that the applied conditions are physiologically relevant, we repeated the experiments using MMTV-PyMT WT and MMTV-PyMT BK-KO cells and compared cytosolic Ca2+ concentrations over time in response to cell stimulation with ATP, in the presence of either 1.0 mM (Author response image 1A) or 2.0 mM extracellular Ca2+ (Author response image 1B). The respective graphs are attached below for the reviewer’s inspection. As expected, the intracellular Ca2+ concentration in MMTV-PyMT WT and BK-KO cells depended on the extracellular Ca2+ concentration. Importantly, however, irrespective of the exact Ca2+ concentration applied, we observed a similar difference in basal cytosolic Ca2+ between MMTV-PyMT WT and BK-KO cells (Author response image 1C).
  
  Author response image 1.
  
  Cytosolic Ca2+ concentrations over time in the presence of 1.0 mM or 2.0 mM extracellular Ca2+.
  
  Concerning the Ca2+ concentration in the patch-pipette – we are very glad that you uncovered an error in our description and apologize for the mistake. Actually, the information the reviewer is referring to was already given in the previous version of the manuscript, but unclear because a comma was shifted (see line 487 in the originally submitted manuscript). The Ca2+ concentration of the patch-pipette was 0.1 mM in the presence of 0.6 mM EGTA, which should (according to Ca-EGTA calculator, https://somapp.ucdmc.ucdavis.edu/pharmacology/bers/maxchelator/CaEGTA-NIST.htm) be equivalent to ~30 nM of free Ca2+ in the patch pipette. We corrected the mistake in the manuscript and thank the reviewer again for spotting this inaccuracy.
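  The order of magnitude of the free Ca2+ estimate can be checked with a simple 1:1 binding calculation. The sketch below assumes an apparent EGTA dissociation constant of 150 nM, an illustrative value that is not taken from the manuscript; the true value depends strongly on pH, temperature, and ionic strength, which is why a dedicated calculator is normally used. The function name `free_calcium_um` is hypothetical.

```python
import math

def free_calcium_um(total_ca_um, total_egta_um, kd_um):
    """Free Ca2+ (µM) for 1:1 Ca-EGTA binding at equilibrium.

    Solves x^2 + (E - C + Kd) x - Kd C = 0 for the free concentration x,
    where C is total Ca2+, E is total EGTA, and Kd is the apparent
    dissociation constant (all in µM).
    """
    b = total_egta_um - total_ca_um + kd_um
    return (-b + math.sqrt(b * b + 4.0 * kd_um * total_ca_um)) / 2.0

# 0.1 mM total Ca2+ with 0.6 mM EGTA; Kd = 0.15 µM is an assumed value.
free_um = free_calcium_um(total_ca_um=100.0, total_egta_um=600.0, kd_um=0.15)
print(f"free Ca2+ = {free_um * 1000:.1f} nM")
```

  Under this assumed constant, the result is consistent with the ~30 nM stated in the response.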
  
  (2) Ca2+i measurements: The authors use ATP to elicit intracellular Ca2+ signals. Is this then a physiological stimulus for Ca2+ signaling in breast cancer? What is the rationale for using ATP? Moreover, it would be nice to see calibrated baseline values of Ca2+i.
  
  We thank the reviewer for the comment and suggestion. Importantly, it was recently demonstrated that all of the cell lines used respond to treatment with extracellular ATP with a prominent increase in [Ca2+]i, most probably indicating the expression of purinergic receptors, which was a prerequisite for observing ATP-induced changes in [Ca2+]i.
  
  https://doi.org/10.1038/s41419-022-05329-z,
  
  https://doi.org/10.1093/carcin/bgt493
  
  https://doi.org/10.1038/s41598-018-26459-5
  
  Furthermore, ATP plays a crucial role in the tumor microenvironment, where high rates of cell death occur. Hence, ATP is of pathophysiologic relevance for the utilized cancer cell lines.
  
  https://doi.org/10.1038/s41568-018-0037-0
  
  https://doi.org/10.3390/cells9112496
  
  https://doi.org/10.1002/jcp.30580
  
  Following the suggestions by Reviewer #1 (and #2), we included calibrations of Ca2+cyto and Ca2+mito in the manuscript, by depleting the intracellular Ca2+ stores using ionomycin in the absence of extracellular Ca2+ (EGTA) to validate the basal difference in Ca2+cyto and Ca2+mito. Additionally, Ca2+cyto was calibrated under basal and inhibitor-treated conditions, and values in nM are given in the text (p. 5, lines 185-190, 193-195 and 199-200, in the tracked changes version of the MS). The new data can be found in new Figure S2F – Figure S2J and new Figure S2R – Figure S2V. Moreover, we calculated basal [Ca2+]cyto in the different BKCa-proficient and BKCa-deficient cell lines and under inhibitor-treated conditions. We additionally added information about the pathophysiologic relevance of ATP in the tumor microenvironment in lines 175-178 in the tracked changes version of the manuscript.
  
  (3) Membrane potential measurements: It would be nice to see a calibration of the potential measurements; this would allow us to correlate the IV relationship with membrane potential. Without calibration, it is hard to compare unless the identical uptake of the dye is shown. Does paxilline or IbTx also induce depolarization?
  
  We thank the reviewer for the suggestion. Indeed, membrane potential calibrations/measurements using the membrane-potential-sensitive dye Dibac4(3) would be interesting, but are technically hardly feasible. The reason is that the dye’s principle is based on differential uptake in response to differences in membrane potential, and it is not ratiometric like most other dyes/sensors used. Considering this limitation, we decided to perform membrane potential measurements by patch-clamp analysis. Additionally, we performed these experiments upon inhibition of PM-located BKCa by IBTX. Current-clamp experiments confirmed the difference in basal membrane potential between MMTV-PyMT WT and BK-KO cells (consult new Figure S1C and lines 127-130 in the tracked changes version of the manuscript). Interestingly, IBTX treatment depolarized the PM potential to the BK-KO cell level, which validates that BK activity and PM potential are connected. In addition to this approach, we utilized our recently developed genetically encoded K+ sensors, which revealed basal differences in [K+]cyto between MMTV-PyMT WT and BK-KO cells. This difference between the two genotypes was also equalized by IBTX, as the treatment increased [K+]cyto only in WT cells, which most likely explains the cause of PM depolarization (consult lines 130-135 in the tracked changes version of the manuscript and new Figure S1D and Figure S1E).
  
  (4) Mito-potential measurements: Why did the authors use such a long time course and preincubate cells with channel blockers overnight? Why did they not perform paired experiments and record the immediate effect of the BK channel blockers in the mito potential?
  
  We thank the reviewer for the suggestion. We performed TMRM-based experiments with MMTV-PyMT WT cells in response to short-term exposure to paxilline, which did not significantly affect the mitochondrial membrane potential, at least within 15 minutes of treatment (Author response image 2). This indicates that processes downstream of (mito)BKCa inhibition affect the mitochondrial membrane potential (MMP), most probably including remodeling of the respiratory chain, mitochondrial ion homeostasis, or glycolytic activity, ultimately also delivering reduction equivalents to mitochondria. Our final goal was to validate potential differences between a BKCa-proficient and a BKCa-deficient cell model, whereby the latter cells have lacked the BKCa channel since their origination. Hence, “long-term” (~12 h) BKCa inhibition as performed in our experiments better reflects the BK-KO cell situation. Taken together with the new experiment (Author response image 2), we can now state that the effect of BK inhibition on the MMP is at least not the consequence of an acute (within minutes) channel blockade.
  
  Author response image 2.
  
  Mitochondrial membrane potential, as measured using TMRM, in response to acute short-term administration of 5 µM paxilline, followed by mitochondrial depolarization using FCCP.
  
  (5) MTT assays are also based on mitochondrial function - since modulation of mito function is at the core of this manuscript, an alternative method should be used.
  
  We thank the reviewer for the important comment. We performed additional, immunofluorescence-based experiments using Ki-67 staining to assess cell proliferation rates. The newly added data can be found in the text, lines 409-412 in the tracked changes version of the manuscript and new Figure S6D-F. The results obtained confirm the MTT-based results (Fig.6H-I).
  
  Results:
  
  (1) Fig. 5G: The number of BK "positive" mitoplasts is surprisingly low - how does this affect the interpretation? Did the authors attempt to record mitoBK current in the "whole-mitoplast" mode? How does the mitoBK current density compare with that of the plasma membrane? Is it possible to theoretically predict the number of mitoBK channels per mitochondrion to elicit the observed effects? Can these results be correlated with the immuno-localization of mitoBK channels?
  
  Indeed, the number of BKCa-positive mitoplasts appears low at first glance. However, as these experiments were performed in mitoplast-attached mode, it is important to keep in mind that only a very small area of the mitoplast is investigated with each patch. If no channel was detected in such a region, the patch was depicted as “empty”, as presented in Fig.5G, which does not, however, mean that the entire mitochondrion was actually BKCa-negative. Hence, the density of BKCa in the IMM might be higher than expected from our experiments. Nevertheless, earlier results using glioblastoma cell lines, considered to be among the cell lines most enriched in mitoBKCa, already demonstrated a quite low density of the BKCa β4 regulatory subunit in mitochondria (see figure 2B in 10.1371/journal.pone.0068125), which (based on the 1:1 stoichiometry of α and β subunits) also suggests that the density of the alpha subunit of BKCa might be low in this compartment.
  
  Author response image 3.
  
  Schematic representation of mitoplast-attached patch-clamp experiments.
  
  Theoretically, density predictions of mitoBK compared to PM localized BKCa would be possible if whole-mitoplast experiments were performed, however, we are unsure what added value this information would actually burst, allowing the pharmacologic modulation of structures originally located within the mitochondrial matrix. Please also consult Author response image 3. According to the most recent models, even if there are other views on this, mitoBKCa is oriented in a way, that the C-terminus with its Ca2+ binding bowl is located within the mitochondrial matrix. Hence, to allow Ca2+ sensitivity experiments of the channel, broken up (by swelling) mitoplasts are required to make the Ca2+ binding bowl accessible for Ca2+ manipulations in the bath solution. This approach does not allow us to compare the channel density to that of the PM.
  
  Finally, to the best of our knowledge, a combination of immunofluorescence with mitoplast patch-clamp experiments is not feasible yet, and would probably be impossible due to the low density of the mitoBKCa as well as the lack of highly sensitive and specific antibodies.
  
  (2) There are also reports about other mitoK channels (e.g. Kv1.3, KCa3.1, KATP) playing an important role in mitochondrial function. Did the authors observe them, too? Can the authors speculate on the relative importance of the different channels? Is it known whether they are expressed organ-/tumor-specifically?
  
  Author response image 4.
  
  Representative single channels different to mitoBKCa detected in MDAMB-453 mitoplasts.
  
  The reviewer is right: other K+ channels have been found in mitochondria, and these also play a role in tumor cells. This is consistent with our data (Fig.5G), where we observed other channels in the mitoplasts of BCCs as well, in all four cell lines tested. According to their conductance and our expectations from the literature, these channels may include, e.g., mitoIKCa, mitoSKCa, mitoKATP, or others (10.1146/annurev-biophys-092622-094853). However, as we focused on patches containing a mitoBKCa, we did not further characterize these channels pharmacologically. Two examples of channels we found in these mitoplasts besides BKCa are presented for the reviewers’ inspection (Author response image 4). As our manuscript focuses on mitoBKCa, we did not further classify these channels into smaller subgroups according to their conductance, as we feel that a differentiation between BKCa (~210 pS) and channels showing a conductance ≤150 pS or ≤100 pS is sufficient. Furthermore, this additional information would dilute our story too much, making it difficult for the (non-specialist) reader to follow the thread of the study. We have, however, added the respective information to the manuscript. Please consult lines 365-366 in the tracked changes version of the manuscript.
  
  Reviewer #1 is right: the different K+ channels observed might of course be organ- or tumor-specific. For example, it has been reported that the expression of K+ channels differs between various cancer cells and cell lines (https://doi.org/10.2174/13816128113199990032, 10.1016/j.pharmthera.2021.107874, 10.1038/nrc3635), a fact which, also according to our study, might be exploited for pharmacological manipulation aimed at affecting proliferation/apoptosis of cancer cells. Further, a recently published single-cell and spatially resolved atlas of human breast cancer implies that the expression of different K+ channels (such as mitoIKCa, mitoSKCa, mitoKATP) might even differ between cancer and non-cancer cells within a single tumour (https://doi.org/10.1038/s41588-021-00911-1).
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The large-conductance Ca2+ activated K+ channel (BK) has been reported to promote breast cancer progression, but it is not clear how. The present study carried out in breast cancer cell lines, concludes that BK located in mitochondria reprograms cells towards the Warburg phenotype, one of the metabolic hallmarks of cancer.
  
  Strengths:
  
  The use of a wide array of modern complementary techniques, including metabolic imaging, respirometry, metabolomics, and electrophysiology. On the whole, experiments are astute and well-designed and appear carefully done. The use of BK knock-out cells to control for the specificity of the pharmacological tools is a major strength. The manuscript is clearly written.
  
  There are many interesting original observations that may give birth to new studies.
  
  Weaknesses:
  
  The main conclusion regarding the role of a BK channel located in mitochondria is not sufficiently supported. Other perfectible aspects are the interpretation of co-localization experiments and the calibration of Ca2+ dyes. These points are discussed in more detail in the following paragraphs:
  
  We thank reviewer #2 for the thorough assessment of our study.
  
  (1) May the metabolic effects be ascribed to a BK located in mitochondria? Unfortunately not, at least with the available evidence. While it is clear these cells have a BK in mitochondria (characteristic K+ currents detected in mitoplasts) and it is also well substantiated that the metabolic effects in intact cells are explained by an intracellular BK (paxilline effects absent in the BK KO), it does not follow that both observations are linked. Given that ectopic BKDEC appeared at the surface, a confounding factor is the likely expression of BK in other intracellular locations such as ER, Golgi, endosomes, etc. To their credit, authors acknowledge this limitation several times throughout the text ("...presumably mitoBK...") but not in other important places, particularly in the title and abstract.
  
  We thank the reviewer for this important comment and have amended the title and abstract accordingly. The title of the manuscript was changed to “mitoBKCa is functionally expressed in murine and human breast cancer cells and potentially contributes to metabolic reprogramming.” Additionally, we changed appropriate passages in the text to emphasize that mitoBKCa potentially mediates the metabolic reprogramming, but that other intracellular channels could also contribute to these processes.
  
  (2) MitoBK subcellular location. Pearson correlations of 0.6 and about zero were obtained between the locations of mitoGREEN on one side, and mRFP or RFP-GPI on the other (Figs. 1G and S1E). These are nice positive and negative controls. For BK-DECRFP however, the Pearson correlation was about 0.2. What is the Z resolution of apotome imaging? Assuming an optimum optical section of 600 nm, as obtained by a 1.4 NA objective with a confocal, that mitochondria are typically 100 nm in diameter and that BK-DECRFP appears to stain more structures than mitoGREEN, the positive correlation of 0.2 may not reflect colocalization. For instance, it could be that BK-DECRFP is not just in mitochondria but in a close underlying organelle e.g. the ER. Along the same line, why did BK-RFP also give a positive Pearson? Isn’t that unexpected? Considering that BK-DEC was found by patch clamping at the plasma membrane, the subcellular targeting of the channel is suspect. Could it be that the endogenous BK-DEC does actually reside exclusively in mitochondria (a true mitoBK), but overflows to other membranes upon overexpression? Regarding immunodetection of BK in the mitochondrial Percoll preparation (Fig. S5), the absence of NKA demonstrates the absence of plasma membrane contamination but does not inform about contamination by other intracellular membranes.
  
  Indeed, it seems that BKCa-DEC is not an exclusive mitoBKCa, at least not upon (over-/)expression in MCF-7 cells. It is known from the literature that mitochondrial K+ channels are encoded by the nuclear genome, as no obvious gene for a K+ channel is found in the mitochondrial genome. Channel proteins are synthesized by cytosolic ribosomes and likely translocated into mitochondria via the TOM/TIM system. Although some K+ channels possess a mitochondrial targeting sequence at the N-terminus, their import is mostly far from a general mechanism, and this also seems to be true for BK channels. In the case of the K+ channel Kv1.3, an even more complex scenario is hypothesized, as the channel located in the PM could be transferred to mitochondria via mitochondria-associated membrane (MAM) structures of the ER (https://doi.org/10.3390/ijms20030734). Yet, the detailed mechanism for BK shuttling to mitochondria is not fully understood. Possibly, overflow is exactly what is happening, due to very high levels of BK-DEC expression upon transfection. However, that the channel translocates to the IMM upon transfection is not surprising and was also demonstrated for other cell models including HEK293 (see e.g. 10.1038/s41598-021-904653). Unfortunately, the transfection efficiency of MCF-7 is quite low compared to HEK293; hence, quantitative statements from mito-patches upon transfection are difficult.
  
  To ensure that the mitochondrial colocalization is not a matter of poor microscope resolution, we repeated these experiments using confocal imaging on a Zeiss LSM980 with an Airyscan 2 detector, yielding a z resolution of ~450 nm. These experiments confirmed the increased colocalization of BKCa-DEC with mitochondria compared to BKCa lacking the DEC exon. Furthermore, this imaging at higher resolution showed that colocalization may unfortunately not be the best analysis: fragmented mitochondria in particular showed a clearly MitoGREEN-stained matrix surrounded by red fluorescence derived from BKCa-DECRFP present in the IMM (revised Fig. 1G).
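
  The limitation described above can be illustrated with a minimal sketch of a pixel-wise Pearson colocalization score. It is not the analysis pipeline used in the study; the function name and the toy intensity values are hypothetical, and real images would first need background subtraction and masking. The point is only that a membrane signal surrounding a matrix signal can yield a low or negative coefficient even when both labels sit in the same organelle.

```python
import math


def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical flattened pixel intensities along a line across one mitochondrion.
green_matrix = [10, 50, 200, 220, 30, 15]      # matrix label
red_overlapping = [12, 55, 190, 230, 28, 20]   # red signal in the same pixels
red_surrounding = [200, 180, 20, 10, 190, 210] # red signal around the matrix

overlap_score = pearson_colocalization(green_matrix, red_overlapping)
ring_score = pearson_colocalization(green_matrix, red_surrounding)
```

  In this toy case the overlapping signal scores close to 1, whereas the surrounding signal scores below zero, although both would be assigned to the same mitochondrion by eye.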
  
  To validate the results derived from immunoblotting, we additionally stained the membranes for TMX1, a marker for the ER membrane. This analysis confirmed the high purity of the mitochondrial isolation without ER-membrane contamination after percoll purification, and hence validated the presence of BKCa in the mitochondrial membrane (revised Fig. S5D). The additional information can be found in lines 156-159 in the tracked changes version of the manuscript.
  
  (3) Calibration of fluorescent probes. The conclusion that BK blockers or BK expression affects resting Ca2+ levels should be better supported. Fluorescent sensors and dyes provide signals or ratios that need to be calibrated if comparisons between different cell types or experimental conditions are to be made. This is implicitly acknowledged here when monitoring ER Ca2+, with an elaborate protocol to deplete the organelle in order to achieve a reading at zero Ca2+.
  
  We thank the reviewer for the important comment. Please note that at no point in the manuscript do we aim to compare different cell lines concerning their intracellular Ca2+ concentration; we only compare the same cell lines after different treatments, as we are aware of this limitation of fluorescent probes. However, to validate the differences in intracellular Ca2+ concentrations, we calibrated the signals derived from Fura-2 and 4mtD3cpV using ionomycin in combination with cellular Ca2+ depletion/saturation. The newly added data can be found in the text, lines 185-190, 192-195, 199-200, and 228-230 in the tracked changes version of the manuscript, as well as in new Figures S2F to S2J and new Figures S2R to S2V.
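
  For readers unfamiliar with ratiometric calibration, the following minimal sketch shows how a Fura-2 ratio is commonly converted to a concentration with the Grynkiewicz equation, [Ca2+] = Kd · β · (R − Rmin) / (Rmax − R). The response does not state which formula was applied, so the function name, the parameter names and the default Kd of 224 nM (a commonly cited Fura-2 value) are assumptions for illustration. The sketch does not cover the FRET-based 4mtD3cpV sensor.

```python
def fura2_calcium_nM(ratio, r_min, r_max, beta, kd_nM=224.0):
    """Convert a Fura-2 340/380 ratio to free [Ca2+] in nM (Grynkiewicz equation).

    ratio  -- measured ratio R
    r_min  -- ratio after Ca2+ depletion (zero Ca2+)
    r_max  -- ratio after Ca2+ saturation (e.g. ionomycin plus high Ca2+)
    beta   -- 380 nm fluorescence at zero Ca2+ divided by that at saturating Ca2+
    kd_nM  -- dissociation constant; 224 nM is an assumed default, not from the study
    """
    if not r_min < ratio < r_max:
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nM * beta * (ratio - r_min) / (r_max - ratio)


# Hypothetical calibration values, chosen only to make the arithmetic easy to follow.
example_nM = fura2_calcium_nM(ratio=1.0, r_min=0.5, r_max=1.5, beta=2.0, kd_nM=100.0)
```

  With these made-up numbers the ratio lies midway between Rmin and Rmax, so the result is simply Kd · β.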
  
  Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).
  
  Author response image 5.
  
  Dot plot for data shown in Figure 2I.
  
  The reviewer is right; it seems that BKCaRFP may also affect [Ca2+]mito. However, the effect is not significant (p>0.999, Kruskal-Wallis test followed by Dunn's multiple comparison test, chosen because the data are not normally distributed). By contrast, p=0.0002 for ctrl vs. BKCa-DECRFP and p=0.0022 for BKCaRFP vs. BKCa-DECRFP. We added a scatter dot plot of the respective data as Author response image 5 for the reviewer's inspection. Additionally, first, even using a more stringent statistical test by only comparing ctrl vs. BKCaRFP using the Mann-Whitney test, the results are not significant (p=0.4467), and second, we performed the requested Ca2+ calibration using ionomycin under these conditions, which confirmed the difference between ctrl cells and BKCa-DECRFP-expressing cells, but not BKCaRFP-expressing ones. Please see Figure S2V.
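
  As a rough illustration of the omnibus step of the analysis named above, the sketch below computes the Kruskal-Wallis H statistic for three groups using only the standard library. It is an assumption-laden toy: the function names and data are hypothetical, the p-value uses the chi-square approximation with 2 degrees of freedom (valid only for exactly three groups), and Dunn's post hoc comparisons are not implemented here.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(g1, g2, g3):
    """Return (H, p) for three independent groups, with tie correction."""
    groups = [g1, g2, g3]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
    return h, p


# Hypothetical basal ratios for three conditions (not data from the study).
h_stat, p_value = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

  A significant omnibus result would then be followed by pairwise post hoc comparisons with multiplicity correction, which is the role Dunn's test plays in the response.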
  
  Reviewer #3 (Public Review):
  
  The original research article, titled "mitoBKCa is functionally expressed in murine and human breast cancer cells and promotes metabolic reprogramming" by Bischof et al, has demonstrated the underlying molecular mechanisms of alterations in the function of Ca2+ activated K+ channel of large conductance (BKCa) in the development and progression of breast cancer. The authors also proposed that targeting mitoBKCa in combination with established anti-cancer approaches, could be considered as a novel treatment strategy in breast cancer treatment.
  
  The paper is clearly written, and the reported results are interesting.
  
  Strengths:
  
  Rigorous biophysical experimental proof in support of the hypothesis.
  
  Weaknesses:
  
  A combinatorial synergistic study is missing.
  
  We thank reviewer #3 for the positive summary of our study. Indeed, we propose that targeting of mitoBKCa in combination with established anti-cancer drugs may represent a novel anti-cancer treatment strategy. Unfortunately, we feel that the manuscript is very condensed already, and that adding respective required experiments and data to support this hypothesis will make the flow of the manuscript more complex or even incomprehensible. As no attempts linking mitoBKCa activity with anti-cancer therapies have been made so far, we removed the respective information from the abstract and only discuss this aspect.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Statistics: Legends have to contain information about the number of biological replicates (N) and cells analysed (n). Statistics must be calculated with the averages of the replicates.
  
  Author response image 6.
  
  Representative single cell responses of Fura-2 loaded MMTV-PyMT WT cells.
  
  We thank the reviewer for the comment and added the missing details to all figure legends.
  
  We feel that using each cell represents exactly the power of high-resolution live-cell imaging, as there is no better biological replicate than a single separated cell, which is observed by fluorescence microscopy. This analysis is also able to visualize cell-to-cell differences in the microscopy area, similarly to patch-clamp experiments, where each single cell or mitoplast patched is used as a single replicate. Please find a representative dataset derived from fluorescence microscopy of different responses of neighboring single cells in Author response image 6.
  
  (2) Fig. 1G: This is a poor resolution figure, mostly because of its far too small size; in its current form it bears very little information.
  
  We agree with reviewer #1 and reperformed the imaging experiments using high resolution confocal imaging and exchanged the respective images. We feel that this increased the quality of the images significantly. Unfortunately, we were not able to increase the size of the images in the main figure, hence, we added magnifications of the respective images as new Figure S1I.
  
  (3) Fig. 1H: What do the dotted grey lines and the labels stand for?
  
  We believe Reviewer #1 is probably referring to Figure 1G. As indicated in the figure panel and in the text, the grey dotted lines and labels indicate the colocalization scores of mtRFP and RFP-GPI with MitoGREEN, respectively. These data are also shown in Figure S1H, including error bars and statistics. We added additional information in the text to make the meaning of the lines clearer to the reader. Please consult lines 149 – 150 in the tracked changes version of the manuscript.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) May the metabolic effects be ascribed to a BK located in mitochondria? Short of a way to tackle BK function and metabolism specifically in mitochondria, the conclusion may best be toned down to "intracellular BK". For the time being the term "mitoBK" appears too ambitious.
  
  We feel that you are right and that our previous overstatement requires adaptation, as a clear (100%) attribution of the observed metabolic effects solely to mitoBKCa is not definitively possible. We have therefore amended all relevant passages in the entire MS accordingly.
  
  (2) MitoBK subcellular location. Please address the points raised in the Public Review.
  
  As stated above we addressed the point raised in the public review accordingly (please consult new Figure S1I and revised Figure 1G).
  
  (3) Calibration of fluorescent probes. Please provide calibrations for cytosolic and mitochondrial Ca2+, for example, the standard high Ca2+/ionophore/metabolic inhibition treatment to reach saturation followed by Ca2+ chelation to obtain zero Ca2+.
  
  We thank the reviewer for the comment. As you can see from our response to the public review, we performed the respective experiments, and datasets were added in the manuscript.
  
  (4) Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).
  
  Please consult our response to the (same) comment in the public review.
  
  (5) Line 228. The statement "Similar results were obtained in MDA-MB-453 cells" is confusing. As shown in Fig. 3, pax reduced ECAR and OCR in MMTV-PyMT WT cells. As ibtx was without effect, it is suggested that intracellular BK support metabolism. However, the effect of pax on MDA cells was the opposite. Doesn't this divergence speak against a universal role of intracellular BKs in promoting metabolism in BCCs? A similar point may be made regarding metabolomics, which showed no effects of pax on lactate and pyruvate in MMTV-PyMT WT cells but stimulation in MDA cells. Perhaps the word "promotes" in the title of the figure should be replaced by something more neutral like "affects" or "alters", as used elsewhere.
  
  We thank the reviewer for pointing out the overstatement regarding intracellular BK functions and changed the title of the figure as suggested.
  
  With regard to the experiments mentioned, we would like to point out the following aspects:
  
  First, the cell lines used strongly differ in their metabolic settings under basal conditions. While both MMTV-PyMT and MDA-MB-453 cells seem to show similar basal ECAR levels (if BKCa was present), their OCR seems to differ strongly. MMTV-PyMT cells seem to show a basal OCR which is almost at the maximum already, while MDA-MB-453 cells possess a tremendous capacity in their OCR, as observed upon mitochondrial uncoupling using FCCP. Of note, both ECAR and OCR are indirect metabolic measures. On the one hand, ECAR measures extracellular acidification, which is accomplished by H+ along with lactate secretion. However, lactate secretion is not the only process leading to extracellular acidification, and ECAR may hence measure a variety of H+-releasing processes, including processes of vesicle secretion. On the other hand, OCR is not directly linked to ATP production, as mitochondrial complex IV consumes O2, whereas ATP is produced by mitochondrial complex V. This becomes even more evident when looking at OCRs after FCCP treatment: under these conditions, the H+ gradient is destroyed and ATP synthase activity is reduced; OCR, however, increases to the maximum due to increased supply of mitochondrial complex IV with H+.
  
  Second, please note that the LC-MS-based metabolomics data derive from a static single time point and not from an over-time "live" read-out. Moreover, the underlying dynamics of the parameters measured cannot be assessed. Hence, as an example, increasing levels of pyruvate can indicate faster generation or slower subsequent degradation/metabolization. A clear in-depth statement about what is happening under basal and BKCa inhibitor-treated conditions is hence not possible. The only conclusion that can be drawn from these experiments is that paxilline treatment differentially affects metabolic pathways in these cells.
  
  Based on these limitations of both methods, we decided to perform our in-depth fluorescence microscopy-based analysis, which provided strong evidence for an effect of intracellular BKCa channels on mitochondrial ATP production. Despite opposing effects of BKCa inhibition on OCR in MMTV-PyMT WT and MDA-MB-453 cells, mitochondrial ATP production was reduced if BKCa-DECRFP was expressed/intracellular BKCa was functional.
  
  In line with these findings, mitoBKCa was recently described as an uncoupling protein, which could furthermore explain the differential effects of intracellular BKCa inhibition on OCR. https://doi.org/10.1038/s41598-021-90465-3
  
  Minor
  
  (6) Fig. 1C. Average fluorescence intensity in 6 experiments was about 20% higher in BK-KO cells relative to WT. Such a small difference is significant but should not be evident to the eye. The pictures selected for illustration appear to show a much larger difference and therefore may not be representative. If this is the case, please omit them. The same goes for the other representative pictures.
  
  Author response image 7.
  
  Representative images at different brightnesses.
  
  Please note that the analysis of the images was done in an unbiased way using a Fiji macro. After analysis, we chose representative images, which were closest to the average.
  
  Furthermore, we must kindly disagree with the reviewer, as changes of 20% in fluorescence intensity are indeed evident to the eye (consult Author response image 7). These panels show the same image at different brightness levels with intensity differences of 20%. Hence, we feel that all the images the reviewer was referring to are representative of the values given.
  
  (7) Line 130. The definition of "recent" is of course relative, but 10 years?
  
  We are very glad that you have discovered this "inconsistency", and reworded the respective phrase accordingly.
  
  (8) Line 327. "conductivity" is the property of a medium, "conductance" is the property of a component, such as a channel.
  
  We thank the reviewer for the important comment. We revised the text accordingly.
  
  (9) Various figures. FRET sensor data are expressed as Ratio(FRET/CFP). This is unusual, typically it should be FRET ratio (YFP/CFP), FRET ratio(mTFP/Venus), etc. Please note that the FRET partners differ between sensors.
  
  We acknowledge the comment of the reviewer. It is correct that fluorescent proteins vary widely between the sensors (used). Please note, however, the following: The emission measured from these sensors actually represents FRET, as CFP but not YFP is directly excited. Hence, emission is FRET, not the “intrinsic” fluorescence of the YFP. This is getting more and more important to differentiate, as there are probes existing, which can also be “alternately” excited, i.e. CFP and YFP separately, which will then yield the YFP/CFP ratio (https://doi.org/10.1021/acssensors.8b01599). In case of only CFP excitation, we feel, that the term FRET/CFP is preferable over other labelings such as YFP/CFP.
  
  (10) BK-DEC makes BCCs cells less oxidative. However, BK-DEC was first described in cardiomyocytes, which are among the most oxidative cell types. It would be useful if authors could address this apparent contradiction in the Discussion Section.
  
  That is an exciting point that we addressed as follows in the revised MS:
  
  First, it is important to mention that cardiac myocytes do not show a metabolic Warburg setting and are – under physiologic conditions – maintained in a high O2 environment.
  
  Second, a recent study from our group addressed the question about the role of mitoBKCa in primary cardiac myocytes. Indeed, mitoBKCa was functionally expressed in these cells. Interestingly, under physiologic conditions, the channel did not alter (multiple) cell behaviours nor overall cardiac physiology in a mouse model. However, upon induction of ischemia/ reperfusion injury, a lack of BK increased cardiac susceptibility to cell death resulting in increased infarction size (https://doi.org/10.1161/CIRCULATIONAHA.117.028723). Hence, also in this cell model, BKCa only played a role under oxygen limited conditions/ conditions where mitochondria were not properly functioning. Thus, the results derived from cardiac myocytes support our recent findings in BCCs, as BKCa mediates BCC resistance to hypoxic stress/ makes BCCs more independent from oxidative metabolism.
  
  Parts of this discussion were included in the revised MS. Please consult lines 490-500 in the tracked changes version of the manuscript.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) The study is very well designed and most of the computational analyses were done rigorously.
  
  We highly appreciate the positive feedback by reviewer #3.
  
  (2) The authors should discuss the expression of BKCa in different subsets of breast cancer. Authors may also debate on the level of steroid receptors and BKCa expressions.
  
  We thank reviewer #3 for the important suggestion and added the requested information in the discussion, lines 445-447 and 450-454 in the tracked changes version of the manuscript.
  
  (3) In the discussion section, the authors mentioned that the MCF7 cell is the best model to study this hypothesis. Does it imply that triple-negative breast cancer cell lines express lower levels of BKCa? The authors should discuss this.
  
  We thank the reviewer for the interesting comment; we would like to point out that the ERα-positive MCF-7 cell line was used to study experimental overexpression of BKCa at an otherwise low baseline level. This does not imply that BKCa is expressed at lower levels in TNBC cell lines; in fact a recent study showed the opposite, i.e. overexpression of BKCa in TNBC patients (10.1186/s12885-020-07071-1). Consistent with our work, the authors conclude that the channel could even be a new strategy for development of a targeted therapy in TNBC. We also added this information in the discussion, lines 450-454 in the tracked changes version of the manuscript.
  
  (4) The authors propose that combinatorial targeting of mitoBKCa along with known breast cancer chemotherapeutics can open a new horizon in breast cancer treatment. However, the authors did not perform any experiment to show the synergistic effect as mentioned.
  
  As already stated in the public reviews, we feel that the manuscript is very condensed already, and that adding the respective experiments and data will make the flow of the study even more complex. For the moment, we removed all information and statements linking mitoBKCa with anti-cancer treatment strategies from the abstract and only discuss this aspect. We hope that the reviewer agrees with us that an extensive analysis of the functional mitoBKCa status in the context of established breast cancer therapies must be addressed by (our) future studies.
  
  Minor Comments:
  
  There are several typos and grammatical errors that need further attention and rephrasing.
  
  We thank the reviewer for the comment and revised the text accordingly.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.10.02.560571v2
www.biorxiv.org www.biorxiv.org

KDM5 demethylases suppress R-loop-mediated “viral mimicry” and DNA damage in breast cancer cells

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  We thank the reviewers for their careful and positive assessment of our manuscript. Our findings are perhaps best summarized in the model below, showing that KDM5 inhibition/loss mediates a viral mimicry and DNA damage response through the generation of R-loops in genomic repeats. This is a different mechanism from the better-studied double-stranded RNA-induced “viral mimicry” response. Our studies also suggest that KDM5 inhibition may have a larger therapeutic window than STING agonists, since KDM5 inhibition seemingly does not induce “viral mimicry” in normal breast epithelial cells.
  
  Author response image 1.
  
  Model of viral mimicry activation. De-repression of repetitive elements may trigger dsRNA formation, which activates the RIG-I/MDA5 pathway, as well as PKR. Alternatively, de-repression of these elements may induce transcription-replication conflicts (TRCs), resulting in R-loop formation. R-loops can lead to DNA damage, and/or activate the cGAS/STING pathway. Both the MAVS pathway and the cGAS/STING pathway converge to activate type I interferon (IFN) responses, resulting in decreased cell fitness and/or increased immunogenicity.
  
  We do agree with the assessment that the study would be strengthened by in vivo studies. However, there are 4 different isoforms of KDM5 (3 in females), and existing KDM5-specific inhibitors do not have adequate PK/PD properties for in vivo studies. We would also like to note that most mouse studies have not been proven to accurately predict immunotherapy responses in patients. Future studies in ex vivo tumor models would strengthen the clinical relevance of these studies. In the interim, we have added some normal macrophage studies in Figure S5 and an example of studies in normal T-cells below. Such studies will also be important to ensure that future KDM5 inhibitors do not have adverse effects on the immune system. Here, we observe that KDM5 inhibition appears to have a neutral or slightly negative effect on T-cell viability (Author response image 2a). However, KDM5 inhibition also results in increased CD107a expression in T-cells, indicative of a more cytotoxic phenotype (Author response image 2b). These studies suggest that KDM5 inhibitors do not have significant adverse effects on T-cells or macrophages (Figure S5) in the normal immune environment.
  
  Author response image 2.
  
  KDM5 inhibition does not have significant adverse effects on T-cells. a) Fold change in proliferation of T-cells from 2 different human donors (left and right panels on graph) activated with 0.25 µg/ml CD3 and treated with the indicated concentrations of C48 or a positive control (CBLB) compared to vehicle controls. b) FACS plots and histograms of CD107a surface expression (x-axis) versus forward scatter (FSC, y-axis) of T-cells from 2 different human donors activated with 0.25 µg/ml or 0.5 µg/ml CD3 and treated with the indicated concentrations of C48.
  
  Specific comments and answers to Reviewer #1:
  
  We have added some additional analysis of data from other breast cancer cell lines to strengthen our points (Figure S2f, Figure S3e, Figure S4g-h, k). We have also uploaded all the data to GEO with the following accession numbers:
  
  GSE296387: H3K4me3 CUT-and-Tag data
  
  GSE296584: S9.6 CUT-and-Tag data
  
  GSE296974: RNA-sequencing data
  
  Responses to Reviewer #1 (Recommendations for the authors):
  
  (1) We have not conducted genomic studies comparing KDM5 expression to retroelement activation status in the tumor data sets but recognize that this is important for future studies. Again, there are several KDM5 isoforms and looking at repeat expression in these larger data sets is complex. We have added some data correlating KDM5 expression with ISG signatures in Figure S3j-l as well as in the graph below (Author response image 3). The correlation with ISG and AP signatures is modest, but strongest for KDM5B and C in breast cancer data sets, consistent with our disruption data for these 2 isoforms. As mentioned above, we do agree that future studies of KDM5s along with a broader analysis of other epigenetic modifying enzymes over repeats in various cancer types will shed light on the role of histone modifying enzymes in suppressing “viral mimicry” in tumors.
  
  Author response image 3.
  
  Correlation between gene expression and IFN gene set GSVA scores in breast cancer cell lines. a) Pearson correlation score between gene expression and IFN signature (ISG) gene set variation analysis (GSVA) scores in breast cancer cell lines as reported in DepMap. Higher ranks indicate an inverse correlation between expression of the individual gene and the expression of the ISG gene set. Correlation ranks for KDM5A, B and C are highlighted. b) as in a), but comparing gene expression to antigen presentation (AP) GSVA scores.
  
  (2) We apologize for the mislabeling in Figure 2B; this has been corrected in the revised version.
  
  (3) We agree that blocking the cGAS/STING pathway only partially rescues the ISRE-GFP and HLA-A, B, C phenotype in HCC1428 cells. We have added data (Figure S2f) showing that this rescue is stronger in MCF7 cells. It is possible that the MDA5/MAVS pathway may also contribute to activation of the Type I interferon response. However, we have data that MAVS plays a minor (if any) role in this context, as MAVS KO minimally decreases C48-induced ISRE-GFP activity and HLA-A, B, C surface expression in HCC1428 cells (added Figure S2g).
  
  Furthermore, there is no significant increase in dsRNA observed (using J2 antibody as a readout in immunofluorescence experiments) with C48 treatment as compared to 5’-azacytidine treatment or ADAR K/O (data not included). However, we have not performed MAVS/PKR K/O experiments to completely rule out the involvement of the dsRNA sensing pathways.
  
  (4) These experiments were performed on the Operetta imaging system, rather than by confocal imaging, and therefore we do not have such images. Quantification of RNaseH1-GFP in the whole cell is reported in the figure, as RNaseH1-GFP signal is increased in both the nucleus and the cytoplasm with C48 treatment. This is not unexpected, as our data suggest that R-loop formation occurs in repetitive regions of the genome that are de-repressed by KDM5 inhibition in the nucleus, and the RNA/DNA hybrids generated from R-loops may activate the cGAS/STING pathway in the cytoplasm.
  
  (5) Disruption of XPF and XPG is relatively toxic in itself. Complete knockouts in breast cancer cells were not viable, so we partially knocked down XPF using siRNA instead. We do agree that these kinds of rescue studies need to be expanded upon in future studies, but they served as further proof of the conclusions presented here.
  
  (6) We have provided all the data in GEO, and alternative representations can be made.
  
  (7) Unfortunately, CUT-and-Tag experiments were not performed in cells expressing siXPF and therefore we cannot provide this data. However, XPF has been previously shown to be responsible for excising R-loops from the genome, rendering them detectable by cGAS/STING in the cytoplasm (Crossley et al, 2022, referenced in the current MS). Therefore, while we demonstrate that XPF knockdown attenuates type I IFN pathway activation upon KDM5 inhibition, it may not necessarily reduce R-loop formation in retroelements; it may just prevent their excision and downstream cGAS/STING activation. We do agree that CUT-and-Tag experiments in cells treated with siXPF versus siControl will have to be performed in the future to test this hypothesis.
  
  Responses to Reviewer #2 (Recommendations for the authors):
  
  (1) We have modified the text as well as the figure legend to state that this is a simplistic representation of the pathway in normal cells. As stated in the introduction, these pathways can be modified in tumors. The data presented suggest that the dsRNA pathway can be activated in all breast cancer cell lines tested, whereas more variation is observed in the activation of the STING pathway.
  
  (2) The ADAR guides target ADAR p110 and p150 but not ADAR2. This has been clarified in the text.
  
  (3) The guides have been renamed in the figure as the reviewer suggests.
  
  (4) It has been shown by others that KDM5 can occupy the STING promoter (https://pubmed.ncbi.nlm.nih.gov/30080846/), which supports the reviewer’s suggestion that STING upregulation in HMECs may be due to increased H3K4me3 at the STING gene. However, we argue that STING upregulation is not sufficient to activate “viral mimicry” due to the absence of “tumor-specific R-loops” (due to an increase in TRC in tumor cells) in normal cells. It is interesting to note that the S9.6 signal in subtelomeric regions is increased in HMECs, similar to what is observed in tumor cells. However, the S9.6 signal over other repeats is not (Author response image 4), suggesting that C48-induced increases over non-telomeric repeats are tumor specific. This suggests that the tumor-specific increases in R-loop formation, which lead to “viral mimicry” activation, are not driven by those formed in subtelomeric regions. Future studies will have to expand on these findings.
  
  Author response image 4.
  
  Percent of S9.6 reads that align to repetitive genome in HMEC cells. (a) % of total aligned S9.6 reads that map to subtelomeric region in HMEC cells treated with DMSO or 2.5 μM C48. (b) % of total aligned S9.6 reads that map to repetitive elements in general in HMEC cells treated as in a).
  
  (5) Clarification of the R-loop quantification has been added to the figure legend as well as to the Materials and Methods section. Mean fluorescence intensity in the whole cell (this includes both nuclear and cytoplasmic signals) was quantified together and normalized to the number of DAPI-stained nuclei per well. As mentioned above, all quantification was performed on the Operetta imaging system.
  
  (6) We have added some data showing that increases in H3K4me3 are observed in and around ISGs upon KDM5 inhibition (Figure S4f). However, without time course experiments it is difficult to assess whether these are direct effects of the KDM5 inhibitor or indirect effects from activation of Type I IFN (similarly to what has previously been reported with 5’-azacytidine induction of “viral mimicry”, https://pubmed.ncbi.nlm.nih.gov/26317465/).
  
  (7) We have previously included data showing that S9.6 reads in repeats that do not display C48-mediated increases in H3K4me3 also do not increase with C48 treatment (this is now Figure S4o). In addition, we have added some data showing that repeats with increased H3K4me3 and repeats with increased transcription upon C48 treatment also have increased S9.6 reads. Repeats that display both increases in H3K4me3 and mRNA expression have even greater increases in S9.6 signal compared to repeats that have increases in either one (Figure S4m-n). Taken together, these data suggest that KDM5 inhibition increases H3K4me3 in repeats, thereby allowing for their transcription, which can increase the probability of transcription-replication conflicts (TRCs) and R-loop formation at such loci.
  
  (8) As mentioned earlier in this response, while we observe increased S9.6 reads in subtelomeric regions of HCC1428 cells upon KDM5 inhibition, we also observe this in normal HMEC cells. Since KDM5 inhibition does not induce viral mimicry in HMEC cells, this suggests that R-loops formed in subtelomeric regions do not dictate the response observed with C48 treatment in breast cancer cells.
  
  We hope that these answers to the reviewers' comments, as well as the additional data provided, strengthen our findings.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.02.26.640279v2
www.biorxiv.org www.biorxiv.org

Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Summary of Revisions
  
  We sincerely thank the editors and reviewers for their thorough assessment and constructive feedback, which has greatly improved our manuscript. We have carefully addressed all concerns as summarized below:
  
  In response to the requests made by Reviewer #1:
  
  • Clarified task design and acknowledged its limitations regarding endpoint accuracy control.
  
  • Included analysis comparing the effects of cerebellar block on within-trial versus inter-trial movements.
  
  • Clearly defined target groupings, replacing the term “single-joint” with “movements with low coupling torques” and “multi-joint” with “movements with high coupling torques”: definitions which are now supported by a supplementary material describing the net torque data as a function of the targets.
  
  • Added detailed descriptions of trial success criteria, based on timing and positional constraints.
  
  • Expanded figures illustrating the effect of the cerebellar block on movement decomposition and variability in joint space and across different target directions.
  
  In response to the requests made by Reviewer #2:
  
  • Included an explicit discussion highlighting why the acute reduction in muscle torque during cerebellar block is likely due to agonist weakness rather than cocontraction, emphasizing the rationale behind our torque-centric analysis.
  
  • Clearly defined trial success criteria and included the timing and accuracy constraints used in our study.
  
  • Clarified our rationale for grouping targets based on shoulder flexion/extension, clearly justified by interaction torque analysis.
  
  • Revised the caption and legend of Figure 3d for clarity and included partial correlation results to account for the variability across monkeys for the analysis of reduction in hand velocity vs. coupling torque in control.
  
  In response to the requests made by Reviewer #3:
  
  • Included electrophysiological validation of the accuracy of targeting the superior cerebellar peduncle from one of the monkeys used in the experiment.
  
  • Provided new analyses comparing movement decomposition and variability between slower and faster movements within the cerebellar block condition.
  
  • Revised manuscript text to clarify terminology and clearly explained the rationale behind target groupings and torque analyses.
  
  • Expanded discussion sections to better explain the relationships between timing deficits, movement decomposition, trajectory variability, and faulty motor commands.
  
  • Clarified methodological choices regarding our analysis timeframe and acknowledged limitations related to the distinction between feedforward and feedback control.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for single-joint movements. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.
  
  Strengths:
  
  This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.
  
  We thank the reviewer for their positive feedback on our study. We particularly appreciate their recognition of the novelty and importance of our experimental approach in non-human primates, as well as their insightful summary of our key findings.
  
  Weaknesses:
  
  My major concerns are described below.
  
  If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.
  
  We appreciate the reviewer’s point that cerebellar contributions can be particularly critical near the endpoint of a reach. In our task design, monkeys were indeed required to hold at the target briefly—100 ms for Monkeys S and P, and 150 ms for Monkeys C and M—before receiving the reward. However, given the size of the targets and the velocity of movements, it often happened that the monkeys didn’t have to stop their movements fully to obtain the reward. Importantly, we relaxed the task’s requirements (by increasing the target size and reducing the temporal constraints) to enable the monkeys to perform a sufficient number of successful trials under both the control and the cerebellar block conditions. This was necessary as we found that strict criteria regarding these parameters yielded a very low success rate in the cerebellar block condition. Nevertheless, as we appreciate now, this task design is suboptimal for studying endpoint accuracy which is an important aspect of cerebellar control. In the methods section of our revised manuscript, we have clarified this aspect of the task design and acknowledged that it is sub-optimal for examining the role of the cerebellum in end-point control (lines 475-485). The task design of our future studies will explicitly address this point more carefully.
  
  Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.
  
  This is an insightful suggestion. The behavioral task in the present study was designed with a focus on task-relevant, reward-associated reaching movements. Nevertheless, we also have data on the inter-trial movements (e.g., return-to-center reaches) under continued cerebellar stimulation, which were not directly associated with reward. In response to the reviewer’s comment, we compared the effects of cerebellar block on endpoint velocities between these two types of movements. We found that reductions in peak hand velocity during inter-trial movements were significantly smaller than those observed during the target directed reaches. We have updated the Results section of our manuscript (lines 125-137) and expanded our supplementary document (Supplementary Figure S1) to include this analysis.
  
  If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.
  
  The reviewer is correct. Movements to all targets involved both shoulder and elbow joints, but the degree to which each joint participated varied in a target-specific manner. In our original manuscript, we used the term “single-joint” to refer to movements in which one joint was nearly stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5, the net torque (and thus acceleration) at the elbow was negligible, causing the shoulder to experience low coupling torques (as illustrated in Figure 3c of our revised manuscript). Following this comment and to avoid confusion, we have now explained this explicitly in the revised manuscript (lines 178-187). This is supported by Supplementary Figure S2 demonstrating the net torques at the shoulder and elbow for movements to each target. We have also replaced the terms ‘single-joint movements’ and ‘multi-joint movements’ with ‘movements with low coupling torques’ and ‘movements with high coupling torques’ respectively in our revised manuscript (lines 178-180, 204-207, 225-227, 230-232, 305-307, and 362-365).
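  
  The link between elbow motion and shoulder coupling torque can be sketched with the textbook equations of motion for a planar two-link arm. This is an illustration only, not the authors' inverse-dynamics model: it assumes movement in a horizontal plane (no gravity term), uses one common sign convention (muscle torque = net torque minus coupling torque), and every segment parameter and the function name `shoulder_torque_terms` are hypothetical placeholders.

```python
import math


def shoulder_torque_terms(theta_e, dtheta_s, dtheta_e, ddtheta_s, ddtheta_e,
                          m2=1.0, l1=0.3, r2=0.15, i1_total=0.05, i2=0.02):
    """Split shoulder torque of a planar two-link arm into net, coupling, muscle.

    theta_e   : elbow angle relative to the upper arm (rad)
    dtheta_*  : shoulder (s) and elbow (e) angular velocities (rad/s)
    ddtheta_* : shoulder (s) and elbow (e) angular accelerations (rad/s^2)
    m2, l1, r2, i1_total, i2 are hypothetical segment parameters:
    forearm mass, upper-arm length, forearm centre-of-mass distance,
    upper-arm inertia about the shoulder, forearm inertia about its centre of mass.
    """
    c, s = math.cos(theta_e), math.sin(theta_e)
    # Effective inertia seen by the shoulder for its own acceleration.
    m11 = i1_total + i2 + m2 * (l1 ** 2 + r2 ** 2 + 2 * l1 * r2 * c)
    # Inertial coupling from elbow acceleration onto the shoulder.
    m12 = i2 + m2 * (r2 ** 2 + l1 * r2 * c)
    # Coriolis and centripetal terms; both vanish when the elbow is not moving.
    vel_terms = -m2 * l1 * r2 * s * (2 * dtheta_s * dtheta_e + dtheta_e ** 2)

    net = m11 * ddtheta_s
    coupling = -(m12 * ddtheta_e + vel_terms)
    muscle = net - coupling
    return net, coupling, muscle
```

  With the elbow stationary (zero elbow velocity and acceleration), the coupling term is zero and the muscle torque equals the net torque, which corresponds to the low-coupling case described for Targets 1 and 5.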
  
  Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.
  
  While some of the same animals and stimulation protocol were presented in prior work, the inverse-dynamics modeling, the analyses exploring progressive velocity changes across trials under a cerebellar block, and the relationship of motor noise to movement velocity are newly reported in this manuscript. We have included a clear statement in the Methods section specifying which components of the dataset and analyses are entirely new (lines 582-589).
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Before the results are presented, it is useful to present the experimental paradigm in more detail. For example, after the center-out movement was completed, was the monkey required to hold at the target location? How did the next trial begin (re-centering movement)? Next, specify the stimulation protocol, noting that each session was divided into 3-4 blocks of stimulation and not stimulation, with each block 50-80 trials.
  
  We have updated the results section of our revised manuscript (lines 91-104) to present the experimental paradigm in more detail according to the reviewer’s advice.
  
  (2) Figure 1. Hand velocity does not show how the reach was completed. Did the subjects stop at the target or simply shoot through it and turn around without stopping? Why are the traces cut off?
  
  Monkeys were indeed required to hold at the target briefly (100-150 ms) before receiving the reward. However, given the size of the targets and the velocity of movements, it often happened that the monkeys didn’t have to stop their movements fully to obtain the reward. The hand velocity profile shown in Figure 1b and the torque profiles shown in Figures 2a and 2b correspond to the period from movement onset to the entry of the control cursor into the peripheral target which marked the end of the movement for the trial. Since the monkeys didn’t have to stop their movements fully for the trial to end, the traces appear cut off at the beginning of the deceleration/stopping phase of the movement. We have updated the captions of Figures 1b, 2a, and 2b to include this information (lines 869-872 and 882-884).
  
  (3) Maybe state that the data regarding reaction times are not presented because of the task design in which the go signal was predictable.
  
  In monkeys M and C, the timing of the go signal was fixed and therefore predictable. Furthermore, they were also allowed a grace period of 200 ms before the go signal to facilitate predictive timing which often resulted in negative reaction times. However, in Monkeys S and P, the go signal was variable in timing and the monkeys were not allowed to initiate the movements before the go signal. In our previous studies (Nashef et al., 2019; Israely et al. 2025), we reported increased reaction times under cerebellar block. However, since the present study focuses specifically on execution-related motor deficits, we did not analyze reaction time data.
  
  (4) Please provide the data and analysis regarding the entire reach, including the period after the cursor crosses the target and returns to the center position.
  
  We compared the peak hand velocity of the target-directed movements to the inter-trial return-to-center movements. Cerebellar block produced significantly smaller reductions in peak hand velocity during inter-trial movements compared to within-trial reaches. The results section of our revised manuscript (lines 125-137) and the supplementary material (Supplementary Figure S1) have been updated accordingly. While the behavioral task in the present study was designed with a focus on task-relevant, reward-associated reaching movements, it will be interesting to examine in detail the effect of cerebellar block on spontaneous movements in a future study.
  
  (5) Figure 5. To illustrate the decomposition of multijoint movements into a sequence of single joint movements, I suggest plotting movements in joint space (in addition to Cartesian space as you have done now). The results in Figure 5 are most interesting and thus should be expanded. Please provide this data using the format in Figure 1C, that is, as a function of direction.
  
  Following the reviewer’s suggestion, we have plotted sample trajectories in joint-velocity (Supplementary Figures 3a and b) and position space (Supplementary Figures 4a and b) to highlight the decomposition of multi-joint movements and increased inter-trial trajectory variability respectively during the cerebellar block. Additionally, we also analyzed movement decomposition and trajectory variability as a function of target direction (Supplementary Figures 3c and 4c respectively). The corresponding text in the Results section has been updated accordingly (lines 256-261, 267-271, 277-278 and 280-288).
  
  Reviewer #2 (Public review):
  
  This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data was clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.
  
  We thank the reviewer for their thoughtful and constructive feedback. We are grateful for their recognition of the significance of our findings regarding acute and compensatory motor responses following a cerebellar block.
  
  Primary comments:
  
  (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.
  
  We appreciate the reviewer's emphasis on distinguishing between reduced muscle tone and altered co-contraction patterns as potential explanations for decreased limb velocity. Our focus on torques per se arises from previous studies suggesting that a core deficit in cerebellar ataxia is impaired prediction of passive coupling torques (Bastian et al., 1996). In our study, we demonstrate that motor deficits in cerebellar ataxia result in fact from both the inability to compensate for passive coupling torques and an acute insufficiency in the ability to generate active muscle torques.
  
  The muscle torque, representing the sum of all muscle forces acting at a joint, can indeed be reduced by either of two mechanisms: (i) co-contraction of agonist and antagonist muscles, and/or (ii) insufficient agonist muscle activity (i.e., agonist weakness). In cerebellar ataxia, co-contraction has been proposed as a simplifying strategy to stabilize stationary joints during decomposed multi-joint movements (Bastian et al., 1996). In our experiments, this strategy would likely emerge gradually following cerebellar block, similar to the adaptive slowing of movements aimed at reducing inter-joint interactions. However, we found that irrespective of the magnitude of coupling torques involved, reduction in the velocity of movements also occurred immediately following cerebellar block, a pattern less consistent with gradually emerging compensatory strategies. We therefore argue that this acute onset of movement slowing was mainly driven by agonist weakness. Our argument is further supported by previous studies that attributed the slowing of voluntary movements in individuals with cerebellar lesions to reduced agonist muscle activity (Hallet et al., 1991; Wild et al., 1996). Additionally, early studies have also reported muscle weakness (asthenia) and hypotonia acutely following cerebellar injury in humans (Haines et al., 2007) and experimental lesions in animals (Luciani, 1893; Bremer et al., 1935; Fulton & Dow, 1937; Granit et al., 1955).
  
  We have modified the discussion section of our revised manuscript (lines 366-376) to explain/clarify this. Additionally, we have also underscored that the observed velocity deficits primarily reflect a reduction of self-generated torque at the shoulder (whether acute or adaptive), rather than any reduction in passive torque (lines 350-352).
  
  (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.
  
  Successful trials were defined as trials in which monkeys didn’t leave the center position before the “Go” signal and entered the peripheral target within a permitted movement time. We have updated the results (lines 91-104) and methods (lines 475-485) section of our revised manuscript to include (i) the timing criteria of each phase of the trials and (ii) the size of the peripheral targets indicating the tolerance for endpoint accuracy.
  
  (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torques deficits are seen with cerebellar block.
  
  We acknowledge the reviewer's observation regarding the classification of targets 1 and 5 as side-to-side movements rather than strictly "outward" or "inward." In the initial section of our results, we grouped the targets based on shoulder joint movements: "outward" targets involved shoulder flexion, while "inward" targets involved shoulder extension. This classification highlighted the more pronounced effect of cerebellar block on movements requiring shoulder flexion compared to those requiring shoulder extension. For subsequent analyses, we focused on the effects of cerebellar block on movements to "outward" targets, which included directions involving low (target 1) or high (targets 2–4) coupling torques. To clarify this aspect, we have revised our manuscript to explain our definition of "outward" (targets 1–4) and "inward" (targets 5–8) target groupings based on shoulder flexion and extension movements respectively (lines 117-120).
  
  (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.
  
  We have revised the legend of Figure 3d to include a detailed explanation of how the value along each axis is computed (lines 908-920 of the revised manuscript). Please note that the color coding of the data points is as per the target number (T1-T4) and not the monkey number (as denoted in the figure legend). Also, pooling of data across monkeys was done after confirming that data from each animal expressed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Following the reviewer’s feedback, we have now performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.32, p < 0.001) between reduction in peak hand velocities during cerebellar block and the net coupling torque impulse. We have updated the manuscript to include the result of the partial correlation analysis (lines 173-176).
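  
  A partial correlation that controls for a categorical factor such as animal identity can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that "controlling for variability across monkeys" amounts to removing each animal's mean from both variables before correlating them (equivalent to regressing out animal identity as a dummy-coded covariate), and the function name `partial_corr_by_group` is hypothetical. It returns only the coefficient; a p-value would additionally need degrees of freedom reduced by the number of groups.

```python
import numpy as np


def partial_corr_by_group(x, y, group):
    """Correlate x and y after removing each group's mean from both variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)

    x_res = x.copy()
    y_res = y.copy()
    for g in np.unique(group):
        mask = group == g
        # Subtracting the group mean removes between-group offsets.
        x_res[mask] -= x[mask].mean()
        y_res[mask] -= y[mask].mean()

    # Pearson correlation of the residuals is the partial correlation.
    return float(np.corrcoef(x_res, y_res)[0, 1])
```

  In a toy case where two groups have different offsets but the same within-group slope, the pooled correlation can even be negative while the partial correlation recovers the within-group relationship.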
  
  (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.
  
  The reviewer is right and the relations between timing, decomposition and variability need to be explicitly explained. In the results section of our revised manuscript, we have explained how decomposed movements and trajectory variability may reflect impaired temporal coordination across multiple joints—a critical cerebellar function (lines 235-244).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Rephrase the findings, starting Line 232. Here the authors state, "Next, we asked whether movement decomposition was mainly due to lower hand velocities. We therefore selected a subset of control trials that matched the cerebellar block trials in their peak velocity. However, even though movement decomposition in these control trials was higher compared to all control trials, it was still significantly lower than velocity matched cerebellar block trials." I suggest inverting the final sentence to: "Movement decomposition in control trials was significantly lower than velocity-matched cerebellar block trials, even though these control trials themselves had somewhat higher decomposition indices than all control trials together." A similar issue pops up with trajectory variability below that simply requires some editing to be less clunky.
  
  Following the reviewer’s suggestion, we have revised the sentences related to movement decomposition and trajectory variability. These sentences now read as follows:
  
  (lines 267-271 in the revised manuscript): “Movement decomposition in control trials was significantly lower than velocity-matched cerebellar block trials (p < 0.001; Figure 5c), even though these control trials themselves had 11.0% (CI [5.2, 17.0], p = 0.03) higher decomposition than the mean value calculated across all control trials.”
  
  (lines 280-288 in the revised manuscript): “When we compared the subset of velocity-matched control and cerebellar block trials, we found that cerebellar block trials exhibited 34.6% (CI [26.2, 43.2], p < 0.001) higher trajectory variability (Figure 5e). Normally, slower movements are also less variable due to the speed-accuracy tradeoff (Plamondon and Alimi 1997). Indeed, the trajectory variability in this subset of slower control trials was 5.5% (CI [0.9, 9.9], p = 0.02) lower than that of all control trials. In other words, despite slower movements, cerebellar block led to increased trajectory variability.”
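  
  The idea of selecting velocity-matched control trials can be sketched as below. The matching procedure actually used in the manuscript is not described in this response, so this is only one simple possibility: a greedy one-to-one nearest-neighbour match on peak velocity within a tolerance. The function name `match_control_trials` and the `tol` parameter are hypothetical.

```python
import numpy as np


def match_control_trials(block_peaks, control_peaks, tol):
    """Return indices of control trials matched one-to-one to block trials.

    Each cerebellar-block trial is paired with the unused control trial whose
    peak velocity is closest, provided the difference is within `tol`.
    """
    control_peaks = np.asarray(control_peaks, dtype=float)
    available = np.ones(control_peaks.size, dtype=bool)
    matched = []
    for v in np.asarray(block_peaks, dtype=float):
        if not available.any():
            break
        # Already-used control trials are excluded by setting their distance to inf.
        diffs = np.where(available, np.abs(control_peaks - v), np.inf)
        j = int(np.argmin(diffs))
        if diffs[j] <= tol:
            matched.append(j)
            available[j] = False
    return matched
```

  Decomposition or variability measures would then be compared between the block trials and the control trials at the returned indices.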
  
  (2) Typo: Ln 73 sequences, not sequence.
  
  The typo was corrected (line 75 of the revised manuscript).
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.
  
  Strengths:
  
  Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.
  
  The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).
  
  In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.
  
  We thank the reviewer for their detailed assessment and thoughtful comments and greatly appreciate their positive feedback.
  
  Weaknesses:
  
  The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.
  
  We agree that our approach could more explicitly exploit the rapid reversibility of high-frequency stimulation (HFS) by examining post-stimulation ‘washout’ periods. However, for the present dataset, we ended the session after the set of cerebellar block trials without using an explicit washout period. We plan to study the effect of the cerebellar block on immediate post-block washout trials in the future.
  
  The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.
  
  The reviewer is right and movements to Targets 6–8 (inward) were seemingly unaffected despite also involving significant interaction torques. Specifically, we noted that while outward targets (2–4) tend to involve higher coupling torque impulses on average, this alone does not fully explain the differential impact of cerebellar block, as illustrated by discrepancies at the individual target level (e.g., target 7 vs. target 1). We propose two possible explanations: (1) a bias toward shoulder flexion in the effect of cerebellar block—consistent with earlier studies showing ipsilateral flexor activation or tone changes following stimulation or lesioning of the deep cerebellar nuclei; and (2) posture-related facilitation of inward (shoulder extension) movements from the central starting position. This point is addressed in the Discussion section (lines 404-433 in the revised manuscript).
  
  The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.
  
  The reviewer is right that the superior cerebellar peduncle carries both descending and ascending fibers, and that cerebellar nuclei project to subcortical as well as cortical targets. Therefore, we cannot rule out the possibility that the effect of HFS may be mediated in part through pathways other than the cerebello-thalamo-cortical pathway (as mentioned in the Discussion section). However, it is also important to note that in primates the cerebello-thalamo-cortical (CTC) pathway greatly expanded (at the expense of the cerebello-rubro-spinal tract) in mediating cerebellar control of voluntary movements (Horne and Butler, 1995). The cerebello-subcortical pathways diminished in importance over the course of evolution (Nathan and Smith, 1982, Padel et al., 1981, ten Donkelaar, 1988). Previously we found that the ascending spinocerebellar axons which enter the cerebellum through the superior cerebellar peduncle (SCP) are weakly task-related and the descending system is quite small (Cohen et al., 2017). We have clarified these points and acknowledged that HFS disrupts cerebellar communication broadly, rather than solely the cerebello-thalamo-cortical pathway, in the methods section of our revised manuscript (lines 531-544).
  
  The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.
  
  We recognize the reviewer’s concern about linking movement decomposition and trial-to-trial trajectory variability with motor noise. We interpret these motor abnormalities as a form of motor noise in the sense that they are generated by faulty motor commands. We base this interpretation on previous research showing that the cerebellum aids in the state estimation of the limb and the subsequent generation of accurate feedforward commands. Therefore, disruption of the cerebellar output may lead to faulty motor commands, resulting in the observed asynchronous joint activations (i.e., movement decomposition) and unpredictable trajectories (i.e., increased trial-to-trial variability). Both observed deficits resemble increased motor noise. This point is presented in our Discussion section (lines 436-458 of the revised manuscript).
  
  Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.
  
  We agree that to strictly focus on feedforward control, we could have examined the measured variables in the first 50-100 ms of the movement which has been shown to be unaffected by feedback responses (Pruszynski et al. 2008, Todorov and Jordan 2002, Pruszynski and Scott 2012, Crevecoeur et al. 2013). However, in our task, the amplitude of movements made by the monkeys was small, and therefore the response measures in the first 50-100 ms were too small for a robust estimation. Also, fixing a time window led to an unfair comparison between control and cerebellar block trials, in which velocity was significantly reduced and therefore movement time was longer. Therefore, we used the peak velocity, torque impulse at the peak velocity, and maximum deviation of the hand trajectory as response measures. We have acknowledged this point in the methods section of our revised manuscript (lines 590-600). We have also refrained from using the term feedforward control throughout the text of our revised manuscript as suggested by the reviewer.
  
  The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.
  
  Indeed, as reviewer #1 also noted, movements to targets 1 and 5 are not purely single-joint but rather have relatively low coupling torques. Movements to all targets involved both shoulder and elbow joints, but the degree to which each joint participated varied in a target-specific manner. In our original manuscript, we used the term “single-joint” to refer to movements in which one joint was largely stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5, the net torque—and thus acceleration—at the elbow was negligible, causing the shoulder to experience low coupling torques (as illustrated in Figure 3c of our revised manuscript). Following this comment and to avoid confusion, we have now explained this explicitly in the revised manuscript (lines 178-187). This is supported by Supplementary Figure S2 demonstrating the net torques at the shoulder and elbow for movements to each target. We have also replaced the term ‘single-joint movements’ and ‘multi-joint movements’ with ‘movements with low coupling torques’ and ‘movements with high coupling torques’ respectively in our revised manuscript (lines 178-180, 204-207, 225-227, 230-232, 305-307, and 362-365).
  
  The labels in Figure 3d are confusing and could use more explanation in the figure legend. In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?
  
  We have revised the legend of Figure 3d to include a detailed explanation of how the values along each axis are computed (lines 908-920 of the revised manuscript). Please note that the pooling of data across monkeys was done after confirming that data from each animal expressed a similar trend. Specifically, the correlation coefficients were positive in all four monkeys and statistically significant in 3 of the 4. Moreover, following the reviewers’ feedback, we also did a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.32, p < 0.001) between reduction in peak hand velocities during cerebellar block and the net coupling torque impulse. We have updated the manuscript to include the result of the partial correlation analysis (lines 173-176).
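
The idea behind a partial correlation that controls for animal identity can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the only covariate is a categorical monkey label, in which case partialling it out amounts to subtracting each monkey's mean from both variables before correlating. The function names and the toy numbers are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within_groups(values, groups):
    """Subtract each group's mean, removing between-group offsets."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partial_corr_by_group(x, y, groups):
    """Correlation of x and y after removing a categorical group effect."""
    return pearson(center_within_groups(x, groups),
                   center_within_groups(y, groups))


# Toy data (hypothetical): a between-animal offset creates a positive pooled
# correlation even though the within-animal relationship is negative.
x = [0, 1, 2, 10, 11, 12]
y = [2, 1, 0, 12, 11, 10]
monkey = ["A", "A", "A", "B", "B", "B"]
pooled_r = pearson(x, y)
partial_r = partial_corr_by_group(x, y, monkey)
```

In the toy data the pooled and partial coefficients have opposite signs, which is the kind of spurious pooling effect the reviewer asks about. When testing such a coefficient for significance, the degrees of freedom should be reduced by the number of groups removed.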
  
  In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?
  
  The breakdown of the percentage increase in failure rate due to cerebellar block as a function of target direction is shown in Author response image 1, included in this response.
  
  Author response image 1.
  
  Effect of cerebellar block on failure rate. The change in failure rate for the cerebellar block trials was computed relative to the control trials per session per target. The depicted values are the mean ± 95% confidence intervals across all sessions pooled from all four monkeys. The individual means of each monkey are overlaid. Statistical significance is denoted as follows: NS, p ≥ 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001. [T1-8: Targets 1-8].
  
  The increase in failure rate due to cerebellar block was not affected by the target direction (linear mixed model analysis, target x trial-type interaction effect: p = 0.44). However, it should be noted that success/failure depends on several factors beyond just the execution related impaired limb dynamics. In a previous study (Nashef et al. 2019) we identified several causes of failure such as (i) not entering the central target in time, (ii) premature exit from the central target before the ‘go’ signal, (iii) reaction time longer than the time permitted to reach the peripheral target after the ‘go’ signal, or (iv) not holding at the peripheral target for the required time at the end of the movement.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) It would be helpful to provide some supplemental information on electrophysiological validation of the targeting in each monkey. Was any variability in targeting observed (e.g., some targeting was more effective at eliciting cortical responses)? If so, does targeting variability relate to any of the variability in behavioral effects of HFS across monkeys?
  
  Although we currently do not have an exact measure of the proportion of fibers blocked by HFS, our targeting approach consistently elicited robust cortical responses across monkeys. Specifically, we implanted the stimulating electrode at the location that produced the maximum peak-to-peak evoked responses in the primary motor cortex. Author response image 2 in this response demonstrates that even a slight deviation (~0.5 mm) from this optimal site reduced these responses substantially.
  
  Author response image 2.
  
  Evoked responses in the primary motor cortex as a function of the location of the stimulation site. [LEFT] Coronal T2-weighted MRI showing the planned trajectory to target the superior cerebellar peduncle (location marked by the tip of the arrowhead) through a round chamber suitably positioned over the skull. [RIGHT] Evoked multi-unit (300-7500 Hz) responses from one of the recording electrodes in the primary motor cortex are used to guide the stimulating electrode to the correct implant site. As the stimulating electrode was lowered deeper, maximum peak-to-peak evoked responses were obtained at a depth of 32.5 mm relative to the cortical surface. This was chosen as the implant site. Elevating or lowering the electrode by ~0.5 mm from this depth reduced the peak-to-peak response amplitude.
  
  (2) The emphasis in the Introduction that HFS provides direct insight into deficits seen in patients with cerebellar disease or injury is a bit overstated. Patients have very diverse etiologies, only a modest number of which might be faithfully mimicked by SCP HFS. I would suggest some text acknowledging that this is only a limited model for cerebellar disease or injury.
  
  We agree with the reviewer that the high-frequency stimulation of the superior cerebellar peduncle provides a limited model that does not fully replicate the diverse pathologies seen in cerebellar disease or injury. In fact, in the introduction section (lines 53-59 of our revised manuscript) we have mentioned that the discrepancy in the conclusions of various clinical studies may reflect the heterogeneity of the individuals with cerebellar lesions who often have differences in lesion etiology and associated damage beyond the cerebellum itself. While this may preclude the generalization of our findings to the wider clinical population per se, our approach offers a precise and controlled method to investigate the immediate and adaptive changes in motor behavior following the disruption of cerebellar signals.
  
  (3) Do animals with HFS show less decomposition and trajectory variability in their slower movements when compared to their faster movements? Comparisons are only made with velocity-matched control blocks, but the comparison of slower vs. faster reaches during HFS blocks would also be informative.
  
  To answer this point we classified movements during cerebellar block as either slow or fast based on the median peak hand velocity of the cerebellar block trials per target per session. We then computed the decomposition index and trajectory variability for the fast and slow movements during cerebellar block relative to control in the same way as in Figure 5 of our manuscript (i.e., the percentage change relative to control). Our analysis revealed significantly lower movement decomposition (p < 0.001) and reduced trajectory variability (p < 0.001) for slower movements compared to faster ones within the cerebellar block condition (Author response image 3).
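
The median split described above can be sketched as follows. This is only an illustration under stated assumptions: the trial records are represented as dictionaries with hypothetical keys (`session`, `target`, `peak_velocity`), and trials exactly at the median are assigned to the slow group, a tie rule the response does not specify.

```python
from collections import defaultdict
from statistics import median


def label_slow_fast(trials):
    """Label each trial 'slow' or 'fast' relative to the median peak
    velocity of its own (session, target) cell."""
    by_cell = defaultdict(list)
    for t in trials:
        by_cell[(t["session"], t["target"])].append(t["peak_velocity"])
    cutoffs = {cell: median(v) for cell, v in by_cell.items()}
    return [
        "slow" if t["peak_velocity"] <= cutoffs[(t["session"], t["target"])]
        else "fast"
        for t in trials
    ]


# Hypothetical cerebellar block trials from one session and two targets.
trials = [
    {"session": 1, "target": 1, "peak_velocity": 10.0},
    {"session": 1, "target": 1, "peak_velocity": 20.0},
    {"session": 1, "target": 1, "peak_velocity": 30.0},
    {"session": 1, "target": 1, "peak_velocity": 40.0},
    {"session": 1, "target": 2, "peak_velocity": 100.0},
    {"session": 1, "target": 2, "peak_velocity": 200.0},
]
labels = label_slow_fast(trials)
```

Splitting within each session and target, rather than across the pooled data, keeps target-specific and day-to-day differences in speed from determining which trials count as slow.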
  
  Author response image 3.
  
  Effect of slow and fast movements during cerebellar block on movement decomposition and trajectory variability. [LEFT] Change in decomposition index (i.e., the proportion of the movement time during which the movement was decomposed) for slow and fast cerebellar block trials relative to all control trials. The change in median decomposition was computed per session per target and then averaged across all eight targets to arrive at one value per session. The depicted values are the mean ± 95% confidence intervals across all sessions pooled from all four monkeys. The individual means of each monkey are overlaid. [RIGHT] Change in inter-trial trajectory variability for slow and fast cerebellar block trials relative to all control trials. The trajectory variability was measured as the standard deviation of the maximum perpendicular distance of the trajectories from the Y-axis after transforming them as in Figure 5d of the main text. The change in trajectory variability for the fast and slow cerebellar block trials was then computed per session per target and averaged across all eight targets to arrive at one value per session. The depicted values are the mean ± 95% confidence intervals across all sessions pooled from all four monkeys. The individual means of each monkey are overlaid. Statistical significance is denoted as follows: NS, p ≥ 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001. [Cbl: Cerebellar block].
  
  (4) Line 220- 'velocity' should be 'speed' or 'absolute velocity'?
  
  The term velocity was changed to speed in the revised manuscript (line 255).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.21.595172v5
www.biorxiv.org www.biorxiv.org

RGS10 deficiency facilitates distant metastasis by inducing epithelial–mesenchymal transition in breast cancer

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This study presents a valuable finding on the mechanism to promote distant metastasis in breast cancer. The evidence supporting the claims of the authors is convincing. The work will be of interest to medical biologists working on breast cancer.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Strengths
  
  The paper has shown the expression of RGS10 is related to the molecular subtype, distant metastasis, and survival status of breast cancer. The study utilizes bioinformatic analyses, human tissue samples, and in vitro and in vivo experiments which strengthen the data. RGS10 was validated to inhibit EMT through a novel mechanism dependent on LCN2 and miR-539-5p, thereby reducing cancer cell proliferation, colony formation, invasion, and migration. The study elaborated the function of RGS10 in influencing the prognosis and biological behavior which could be considered as a potential drug target in breast cancer.
  
  Weakness
  
  The mechanism by which the miR-539-5p/RGS10/LCN2 axis may be related to the prognosis of cancer patients still needs to be elucidated. In addition, the sample size used is relatively limited. In particular, further exploration of the LCN2-related pathways and mechanisms using organoid models, together with verification of RGS10 as a biomarker and therapeutic target for clinical translation, would make the data more convincing.
  
  Answer: Thank you for your comments and suggestions. In future research, we will utilize large clinical cohorts and organoid models to further explore relevant research mechanisms.
  
  Reviewer #2 (Public Review):
  
  Liu et al., by focusing on the regulation of G protein-signaling 10 (RGS10), reported that RGS10 expression was significantly lower in patients with breast cancer, compared with normal adjacent tissue. Genetic inhibition of RGS10 caused epithelial-mesenchymal transition, and enhanced cell proliferation, migration, and invasion, respectively. These results suggest an inhibitory role of RGS10 in tumor metastasis. Furthermore, bioinformatic analyses determined signaling cascades for RGS10-mediated breast cancer distant metastasis. More importantly, both in vitro and in vivo studies evidenced that alteration of RGS10 expression by modulating its upstream regulator miR-539-5p affects breast cancer metastasis. Altogether, these findings provide insight into the pathogenesis of breast tumors and hence identify potential therapeutic targets in breast cancer.
  
  The conclusions of this study are mostly well supported by data. However, there is a weakness in the study that needs to be clarified.
  
  In Figure 2A, although some references supported that SKBR3 and MCF-7 possess poorly aggressive and less invasive abilities, examining only RGS10 expression in those cells, it could not be concluded that 'RGS10 acts as a tumor suppressor in breast cancer'. It would be better to introduce a horizontal comparison of the invasive ability of these 3 types of cells using an invasion assay.
  
  Answer: Thank you for your comments and suggestions. MDA-MB-231, SKBR3, and MCF-7 originate from triple-negative breast cancer (high invasiveness), Her-2-overexpressing breast cancer (relatively weak invasiveness), and luminal-type breast cancer (relatively weak invasiveness), respectively. Previous studies have demonstrated the invasive ability of these 3 types of cells (PMID: 34390568).
  
  Reviewer #3 (Public Review):
  
  Distant metastasis is the major cause of death in patients with breast cancer. In this manuscript, Liu et al. show that RGS10 deficiency elicits distant metastasis via epithelial-mesenchymal transition in breast cancer. As a prognostic indicator of breast cancer, RGS10 regulates the progress of breast cancer and affects tumor phenotypes such as epithelial-mesenchymal transformation, invasion, and migration. The conclusions of this paper are mostly well supported by data, but some analyses need to be clarified.
  
  (1) Because diverse biomarkers have been identified for EMT, it is recommended to declare the advantages of using RGS10 as an EMT marker.
  
  Answer: Thank you for your comments. Dysregulation of RGS protein expression has been associated with various types of cancer (PMID: 26293348). Previous studies have shown that RGS10 knockdown can lead to resistance of ovarian cancer cells to paclitaxel, cisplatin, and vincristine. In colorectal tumors, the transcription of RGS10 is regulated by DNA methylation and histone deacetylation. As a key regulatory factor in the G protein signaling pathway, RGS10 is involved in processes relevant to tumor development, including survival, polarization, adhesion, chemotaxis, and differentiation. These findings suggest that RGS10 might be a marker for EMT in breast cancer.
  
  (2) The authors utilized databases to study the upstream regulatory mechanisms of RGS10. It is recommended to clarify why the authors focused on miRNAs rather than other epigenetic modifications.
  
  Answer: Thank you for your comments. miRNAs are short non-coding RNA molecules that bind to the 3' untranslated region (3' UTR) of the target mRNA to cause mRNA degradation or translation inhibition, thus regulating gene expression in cells. These small molecules play a crucial role in regulating the expression of cancer-related genes and can act as tumor promoters or tumor suppressors. To further clarify the molecular mechanism by which RGS10 influences the malignant biological behavior of breast cancer cells, we verified through bioinformatics prediction and in vitro experiments that miR-539-5p might be an upstream regulator of RGS10.
  
  (3) The role of miR-539-5p in breast cancer has been described in previous studies. Hence, it is recommended to provide detailed elaboration on how miR-539-5p regulates the expression of RGS10.
  
  Answer: Thank you for your comments. To verify that miR-539-5p regulates the expression of RGS10, we transfected miR-539-5p mimic, miR-539-5p mimic NC, miR-539-5p inhibitor, or miR-539-5p inhibitor NC into SKBR3 cells and MDA-MB-231 cells, respectively, and measured the expression of RGS10 by RT-qPCR and Western blot. The results showed that, compared with SKBR3 cells transfected with miR-539-5p mimic NC or wild-type SKBR3 cells, RGS10 mRNA and protein levels were significantly reduced in SKBR3 cells transfected with the miR-539-5p mimic. Conversely, after MDA-MB-231 cells were transfected with miR-539-5p inhibitor to inhibit the expression of miR-539-5p, RGS10 mRNA and protein levels in MDA-MB-231 cells were significantly increased (Fig. 3.4A-C, Fig. 3.5A-C). This indicates that miR-539-5p can target and regulate RGS10.
  
  (4) To enhance the clarity and interpretability of the Western blot results, it would be advisable to mark the specific kilodalton (kDa) values of the proteins.
  
  Answer: Thank you for your comments and suggestions. We have now marked the specific kilodalton (kDa) values of the proteins in the Western blots.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  The function of RGS10 in breast cancer was identified in the paper. However, some major issues in this paper need to be specified:
  
  (1) From reading the introduction section and its references, RGS proteins participate in multiple essential cellular processes and may be tumor initiators or suppressors (Li et al., 2023). Because this article focuses on the significance of RGS10 in breast cancer, it is recommended to show how the function of RGS10 exhibits therapeutic significance in other types of cancer.
  
  Answer: Thanks for your comments and suggestions on our findings. Dysregulation of RGS protein expression has been associated with various types of cancer, especially ovarian cancer (PMID: 26293348). RGS10 expression in ovarian cancer cells is lower than in normal ovarian cells (PMID: 21044322). In addition, knocking down RGS10 enhances the viability of ovarian cancer cells and promotes chemoresistance by activating the Rheb GTP/mTOR signaling pathway (PMID: 26319900). One study suggests that RGS10 regulates inflammatory signaling in SKOV-3 ovarian cancer cells, which show high expression of TNF and COX-2 after RGS10 knockdown. In colorectal tumors, RGS10 transcription is regulated by DNA methylation and histone deacetylation (PMID: 35810565). RGS10 expression is also associated with poor prognosis in laryngeal cancer, hepatocellular carcinoma, and pediatric acute myeloid leukemia (PMID: 32776811, PMID: 26516143, PMID: 30538250).
  
  (2) The authors characterize RGS10 protein expression in the breast cancer cell lines MDA-MB-231, MCF7, and SKBR3 in vitro (Figure 2A). However, more information would strengthen the data, e.g., information on the expression of RGS10 protein and survival in public databases, as well as the correlation between RGS10 and Her-2 expression.
  
  Answer: Thanks for your comments. We have checked the correlation between RGS10 expression and survival in Her-2-positive breast cancer patients in a public database. Although the difference is not statistically significant, patients with high RGS10 expression tend to have a more favorable prognosis than patients with low RGS10 expression after the 100th month.
  
  Author response image 1.
  
  (3) Regarding the current situation of clinical trials in the RGS family, the potential to develop RGS 10 for clinic translation is a driving factor for EMT.
  
  Answer: Thank you for your comments. The RGS (regulator of G protein signaling) gene family provides an important "braking" function for the G protein-coupled receptor (GPCR) family of cell receptors. GPCRs control hundreds of important functions in cells throughout the body and are the largest class of drug targets, with over one-third of FDA-approved drugs treating diseases by binding to GPCRs and altering their activity. When GPCRs are activated by hormones or neurotransmitters, they initiate signaling cascades within host cells through signal-carrying proteins called G proteins. The function of RGS proteins is to inactivate G proteins, thereby shutting down this signaling cascade, which limits G protein signal transduction and allows cells to reset and receive new incoming signals. Without RGS proteins, the signals triggered by GPCRs would inappropriately remain on, and signal transduction would become dysfunctional (PMID: 33007266). The potential to develop RGS10 as a driving factor of EMT is meaningful for clinical translation.
  
  (4) In Figure 3A, the paper showed that differential gene expression revealed 70 genes were significantly upregulated in RGS10-depleted SKBR3 cells. The authors didn't show any data on the expression of other EMT-related proteins in pathway analysis.
  
  Answer: Thank you for your comments. The enrichment analysis of RNA sequencing in RGS10-depleted SKBR3 cells identified factors highly correlated with EMT, such as TAGLN, TNFSF10, NDUFA4L2, CCN5, PHGDH, ST3GAL5, ANG, and LCN2.
  
  (5) In Figure 3B, the paper focuses on LCN2 in pathway analysis, however, the author did not elaborate on the significance of LCN2-related pathways in EMT.
  
  Answer: Thank you for your comments. Several studies have demonstrated the significance of LCN2-related pathways in EMT. It was confirmed that LCN2 upregulation triggered by PTEN insufficiency induces EMT to promote migration and invasion in MCF7 cells (PMID: 27466505). The activation of STAT3 contributes to an increase in LCN2 expression, which activates ERK pathway-dependent EMT, thus promoting lung metastasis of MDA-MB-231 breast cancer cells (PMID: 33473115). The silencing of LCN2 reduced the migration and invasion of SUM149 cells and the proportion of tumor stem cells, suggesting that LCN2 may mediate the invasion and metastasis of cancer cells by regulating the stemness of breast cancer cells. The biological effects of the LCN2 small-molecule inhibitors ZINC00640089 and ZINC00784494 on IBC cells have been confirmed. The siRNA-mediated silencing of LCN2 in IBC cells significantly reduces cell proliferation, viability, migration, and invasion (PMID: 34445288).
  
  (6) Minor: the author did not conduct a semi-quantitative analysis of the immunohistochemical results of RGS10.
  
  Answer: Thank you for your suggestion. We would like to demonstrate the qualitative analysis of RGS10 immunohistochemistry. The semi-quantitative analysis is not required in the paper.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The role of RGS10 was well characterized in this study. However, some minor points need to be modified.
  
  (1) Page 15 line 296, description of cell proliferation was missing, please modify.
  
  Answer: Thank you for your comments. We have corrected the description of cell proliferation on Page 15 highlighted in red.
  
  (2) In Figure 2C, the title of the Y-axis was missing.
  
  Answer: Thank you for your comments. We have corrected the description of the Y-axis title in Figure 2C.
  
  (3) Describe the transfection reagent that was used in this study, and incorporate the description into the methods section.
  
  Answer: Thank you for your comments. We have added the description of the transfection reagent to the methods section.
  
  (4) The manuscript needs proofreading.
  
  Answer: Thank you for your comments. We have proofread the manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.04.583283v3
www.biorxiv.org www.biorxiv.org

New submission 28/12/2023, 08:20:46

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This fundamental study provides compelling evidence to explain how chemical variations within a set of kinase inhibitors drive the selection of specific Erk2 conformations. Conformational selection plays a critical role in targeting medically relevant kinases such as Erk2 and the findings reported here open new avenues for designing small molecule inhibitors that block the active site while also steering the population of the enzyme into active or inactive conformations. Since protein dynamics and conformational ensembles are essential for enzyme function, this work will be of broad interest to those working in drug development, signal transduction, and enzymology.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary: The authors set out to determine how chemical variation on kinase inhibitors determines the selection of Erk2 conformations and how inhibitor binding affects Erk2 structure and dynamics.
  
  Strengths: The study is beautifully presented both verbally and visually. The NMR experiments and the HDX experiments complement each other for the study of Erk2 solution dynamics. X-ray crystallography of Erk2 complexes with inhibitors shows small but distinct structural changes that support the proposed model for the impact of inhibitor binding.
  
  Weaknesses: A discussion of compound residence time for the different compounds and kinase constructs and how it could affect the very slow HDX rates might be helpful. For example, could any of the observed effects in Figure 4 be due to slow compound dissociation rather than slowed down kinase dynamics? What would be the implications?
  
  Response: Rate constants for kon and koff were estimated for three inhibitors using surface plasmon resonance:
  
  Author response table 1.
  
  SPR estimates of Kd for selected inhibitors ranged from 0.03 to 3 nM. All HDX time courses involved prebinding of 20 µM inhibitor and 17 µM ERK2 for 30 min (predicted occupancy 99.9%), followed by deuteration time courses with 20 µM inhibitor and 1.7 µM ERK2. Estimated rates of dissociation were ~0.0003-0.007 s⁻¹ and rates of binding were 20-100 s⁻¹ for the inhibitors tested. Because the binding rates are faster than the intrinsic H-D exchange rate at pD 7 (~1 s⁻¹), we expect ligands to rebind and form the enzyme:ligand complex faster than the free enzyme undergoes exchange. Therefore, HDX rates should mostly reflect deuteration of the inhibitor-bound enzyme for all inhibitors.
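
The predicted occupancy quoted above can be checked with a short calculation. This is a sketch under stated assumptions rather than the authors' own calculation: it assumes simple 1:1 binding at equilibrium and accounts for ligand depletion, which matters here because the inhibitor and enzyme concentrations are similar and far above Kd. The function name is hypothetical.

```python
from math import sqrt


def fraction_bound(protein_total, ligand_total, kd):
    """Fraction of protein in the 1:1 complex at equilibrium.

    All three arguments must be in the same concentration units.
    Solves the quadratic binding equation, so ligand depletion is included.
    """
    b = protein_total + ligand_total + kd
    complex_conc = (b - sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


# Prebinding step: 17 µM ERK2 with 20 µM inhibitor, Kd converted from nM to µM.
for kd_nm in (0.03, 3.0):
    occupancy = fraction_bound(17.0, 20.0, kd_nm / 1000.0)
    print(f"Kd = {kd_nm} nM: occupancy = {occupancy:.4f}")
```

At the weakest affinity in the quoted range (Kd = 3 nM) this gives an occupancy of about 99.9%, consistent with the value stated in the response; tighter binders give values closer to 100%.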
  
  Reviewer #2 (Public Review):
  
  Erk2 is an essential element of the MAP kinase signaling cascade and directly controls cell proliferation, migration, and survival. Therefore, it is one of the most important drug targets for cancer therapy. The catalytic subunit of Erk2 has a bilobal architecture, with the small lobe harboring the nucleotide-binding pocket and the large lobe harboring the substrate-binding cleft. Several studies by the Ahn group revealed that the catalytic domain hops between (at least) two conformational states: active (R) and inactive (L), which exchange in the millisecond time scale based on the chemical shift mapping. The R state is a signature of the double phosphorylated Erk2 (2P-Erk2), while the L state has been associated with the unphosphorylated kinase (0P-Erk2). Interestingly, the X-ray structures reveal only minimal differences between these two states, a feature that led to the conclusion that active and inactive states are structurally similar but dynamically very different. The Ahn group also found that ATP-competitive inhibitors can steer the populations of Erk2 either toward the R or the L state, depending on their chemical nature. The latter opens up the possibility of modulating the activity of this kinase by changing the chemistry of the ATP-competitive inhibitor. To prove this point, the authors present a set of nineteen compounds with diverse chemical substituents. From their combined NMR and HDX-Mass Spec analyses, fourteen inhibitors drive the kinase toward the R state, while four compounds keep the kinase hopping between the R and L states. Based on these data, the authors rationalize the effects of these inhibitors and the importance of the nature of the substituents on the central scaffold to steer the kinase activity. While all these inhibitors target the ATP binding pocket, they display diverse structural and dynamic effects on the kinase, selecting a specific structural state. 
  
  Although the inhibited kinase is no longer able to phosphorylate substrates, it can initiate signaling events by functioning as a scaffold for other proteins. Therefore, by changing the chemistry of the inhibitors, it may be possible to affect the MAP kinase cascade in a predictable manner. This concept, recently introduced as a proof of principle, finds here its significance and practical implications. The design of next-generation inhibitors must take these design principles into account. The research is well executed, and the data support the authors' conclusions.
  
  Reviewer #3 (Public Review):
  
  Summary: Anderson et al utilize an array of orthogonal techniques to highlight the importance of protein dynamics for the function and inhibition of the kinase ERK2. ERK2 is important for a large variety of biological functions.
  
  Strengths: This is a thorough and detailed study that uses a variety of techniques to identify critical molecular/chemical parameters that drive ERK2 in specific states.
  
  Weaknesses: No detailed rules were identified that would allow novel inhibitors to be designed. Nevertheless, the mode of action of these existing inhibitors is much better defined.
  
  Response: As recommended we added a sentence to the Discussion suggesting that inhibitors that perturb the β1-β2-β3 sheet in such a way that moves helix αC and αL16 away from the binding site might confer R-state selection. We view this as a preliminary model for predicting conformation selection in ERK2.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Maybe the authors can comment on how the HDX timescale and the NMR timescale relate to each other and how such different timescales can report on the same event. In particular, the HDX timescale appears to be on the scale of minutes to tens of hours (e.g. 2P state). How would inhibitor dissociation and rebinding affect the observed HDX signal? Is it worth considering compound residence time for the different compounds/kinase states?
  
  Response: The HDX-MS and NMR experiments report on different processes; therefore, their timescales do not necessarily match. For native-state proteins at neutral pH, HDX-MS reports fluctuations that allow solvent exposure of backbone amide N-H, reflecting conformational mobility of the main chain. This is often modeled as a two-state interconversion between “closed” (HDX protected) and “open” (HDX accessible) states. Because the µs-ms timescale of main chain fluctuations is faster than the intrinsic rate of HDX (kexch, ~1 s⁻¹), the observed HDX rate (kobs) can be approximated as (kopen/kclosed) × kexch = Kop × kexch. Therefore, kobs can be considered a thermodynamic measurement that reflects Kop.
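
The approximation described in this response can be checked numerically. The sketch below compares the EX2 limit (kobs = Kop × kexch) with the steady-state Linderstrøm-Lang expression for a two-state open/closed model, which holds when opening is much slower than closing. The opening and closing rates are hypothetical values chosen only to sit in the µs-ms regime; only kexch ≈ 1 s⁻¹ comes from the response.

```python
def hdx_rate_steady_state(k_open, k_closed, k_exch):
    """Observed HDX rate for a closed<->open model, assuming k_open << k_closed."""
    return k_open * k_exch / (k_closed + k_exch)


def hdx_rate_ex2(k_open, k_closed, k_exch):
    """EX2 limit (k_closed >> k_exch): k_obs = K_op * k_exch."""
    k_op_equilibrium = k_open / k_closed
    return k_op_equilibrium * k_exch


# Hypothetical opening/closing rates (s^-1); k_exch ~ 1 s^-1 as quoted above.
K_OPEN = 10.0
K_CLOSED = 1.0e4
K_EXCH = 1.0

steady_state = hdx_rate_steady_state(K_OPEN, K_CLOSED, K_EXCH)
ex2_limit = hdx_rate_ex2(K_OPEN, K_CLOSED, K_EXCH)
relative_error = abs(steady_state - ex2_limit) / ex2_limit

print(f"steady-state k_obs = {steady_state:.6e} s^-1")
print(f"EX2 k_obs          = {ex2_limit:.6e} s^-1")
print(f"relative error     = {relative_error:.2e}")
```

With these assumed rates the two expressions differ by about one part in ten thousand, which illustrates why kobs tracks the equilibrium constant Kop rather than the rate of any single fluctuation.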
  
  The [methyl 13C,1H] NMR CPMG experiment that we used to identify global exchange behavior in Xiao et al (PNAS, 2014) modeled the 2P-ERK2 apoenzyme by a two-state equilibrium (L⇌R) between methyl-ILV conformers, yielding rate constants kL→R = 240 s⁻¹ and kR→L = 60 s⁻¹. Some methyls had large enough chemical shift differences between L and R that they appeared as separate peaks in HMQC spectra that matched the L and R populations estimated by CPMG. In this study, the HMQC peaks shown in Figures 1, 6, and 9 are those that report shifts in L vs R populations and conformation selection for the R-state by VTX11e, BVD523 and triazolopyridine inhibitors.
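
For a two-state equilibrium, the populations follow directly from the two rate constants quoted above. The short sketch below does that arithmetic; the function and variable names are illustrative, and the populations it prints are derived here from the quoted rates rather than taken from the cited paper.

```python
K_L_TO_R = 240.0  # s^-1, quoted above for the 2P-ERK2 apoenzyme
K_R_TO_L = 60.0   # s^-1, quoted above


def two_state_populations(k_forward, k_reverse):
    """Return (p_reactant, p_product, k_ex) for reactant <-> product at equilibrium."""
    k_ex = k_forward + k_reverse
    p_product = k_forward / k_ex
    return 1.0 - p_product, p_product, k_ex


p_L, p_R, k_ex = two_state_populations(K_L_TO_R, K_R_TO_L)
print(f"p_L = {p_L:.2f}, p_R = {p_R:.2f}, k_ex = {k_ex:.0f} s^-1")
# prints: p_L = 0.20, p_R = 0.80, k_ex = 300 s^-1
```

A total exchange rate of 300 s⁻¹ corresponds to a lifetime of a few milliseconds, in line with the millisecond exchange regime that CPMG experiments probe.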
  
  Where HDX and NMR agree is in their ability to report changes in populations of L and R in 2P-ERK2. This was first shown when both HDX and NMR measurements reported perturbations at the activation loop induced by inhibitors with differential selection for the R- vs L-states (Pegram et al. PNAS, 2019). CPMG measurements then confirmed that methyl probes in the activation loop are included in the global exchange process (Iverson et al., Biochemistry, 2020). Therefore, the HDX and NMR experiments reflect shifts in the equilibrium between L and R conformers, rather than motions with specific timescales.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I believe the paper is suitable for the special issue of eLife dedicated to protein kinases after the authors address minor concerns/comments.
  
  a) Introduction, page 3: "[..] But within the ATP binding site, the conserved residues ...are largely overlapping." Do the authors mean that the residues are overlapping in the X-ray structures? If so, what is the rmsd among the X-ray structures?
  
  Response: The overlap between conserved residues K52, E69, D147, N152 and D165 in 2P- and 0P-ERK2 is presented in Fig. S1C, which shows an overlay between their apoenzyme crystal structures (PDBID: 2ERK, 5UMO). The RMSDs of the atoms in each residue are: K52 0.63 Å (9 atoms); E69 0.15 Å (9 atoms); D147 0.055 Å (8 atoms); D165 0.88 Å (8 atoms). As recommended, this information was added to the legend to Suppl. Fig. S1.
  
  b) Introduction, page 5: "[...] For example binding of VTX11 partially inhibits...[..]" Please provide a citation.
  
  Response: As recommended we added a citation at end of this sentence (Pegram et al. PNAS, 2019).
  
  c) Introduction, page 5: "[...] N-lobe deformities..." What do the authors mean by deformities? Are there frustrated conformations?
  
  Response: We used the term “deformities” to mean conformational differences, which may be but are not necessarily due to frustration. To avoid confusion, we removed the term “deformities” and replaced it with “conformational changes”.
  
  d) Supplementary Information. The authors report the chemical shift perturbations for several inhibitors. Does the extent of the chemical shift perturbation reflect the strength of the binding for each inhibitor? In other words, do the largest chemical shift perturbations correspond to the highest binding affinity?
  
  Response: The concentrations used in the NMR ligand binding experiments (150 µM ERK2, 180 µM inhibitor) allow 99.9+% complex formation over the 0.03 - 3 nM range of Ki for all inhibitors. Therefore, the chemical shifts report changes in electronic environment between bound and free enzyme. These can be ascribed to first or second sphere contacts with ligand or distal allosteric effects. But they are not likely to reflect differences in binding affinity.
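
The saturation claim can be checked with the exact solution for 1:1 binding. The sketch below assumes simple 1:1 binding and treats Ki as the dissociation constant, neither of which is stated explicitly in the response; the function name is illustrative.

```python
import math


def fraction_bound(protein_total, ligand_total, k_d):
    """Fraction of protein in complex for 1:1 binding (all arguments in the same units)."""
    b = protein_total + ligand_total + k_d
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


ERK2_UM = 150.0       # from the response
INHIBITOR_UM = 180.0  # from the response

for ki_nm in (0.03, 3.0):  # range of Ki quoted in the response
    k_d_um = ki_nm / 1000.0  # assumption: K_d is approximated by K_i
    bound = fraction_bound(ERK2_UM, INHIBITOR_UM, k_d_um)
    print(f"Ki = {ki_nm} nM -> {100.0 * bound:.3f}% of ERK2 bound")
```

Under these assumptions, even the weakest inhibitor in the quoted range leaves well under 0.1% of the enzyme free, consistent with the 99.9+% figure.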
  
  New Suppl. Fig. S3 now adds HMQC titrations of VTX11e and GDC0994 into 2P-ERK2, which confirm binding saturation based on the disappearance of free enzyme peaks.
  
  e) Do the authors have any evidence for the dynamic effects of the different inhibitors? Of course, a systematic analysis of the protein dynamics by NMR will require a significant amount of time and effort beyond this work. However, did the authors measure the effects of the inhibitors on the linewidths of the methyl groups distal from the binding site?
  
  Response: As recommended, we examined linewidths of selected peaks in the presence and absence of inhibitors. The results show no significant systematic differences between bound and free ERK2. Therefore dynamic effects of different inhibitors are not indicated by the available data.
  
  f) The authors identified the β3-αC loop as a critical element for the internal network of interactions. Can this structural element be targeted by small molecules as well?
  
  Response: Yes, in fact the X-ray structures of 0P-ERK2 bound to the inhibitor, SCH772984, and 2P-ERK2 bound to the related compound, SCHCPD336, both show inhibitor occupying a pocket between strand β3 and helix αC, leading to disruption of β3-αC contacts (Chaikaud et al., NSMB 2014; Pegram et al., PNAS 2019). To the extent that β3-αC contacts are important for conformation selection to the R-state, this may explain why SCH772984 favors the L-state. We revised the Discussion to add this point.
  
  g) The authors should mention a recent paper suggesting that it is possible to control substrate-binding affinity by changing the nature of the ATP-binding inhibitors (DOI: 10.1126/sciadv.abo0696).
  
  Response: As recommended, we added this point and citation to the Discussion.
  
  Reviewer #3 (Recommendations For The Authors):
  
  3.1. The manuscript is well written, but very long and sometimes repetitive. Some parts of the introduction are repeated in the result section and parts of the result section are repeated in the discussion. It will be easy to shorten the work to make it easier to read.
  
  Response: As recommended we streamlined the Discussion to remove some of the repetitive elements, while trying to retain the main conclusions and rationale for readers who are not well versed in kinase structure.
  
  3.2. Only specific residues are shown for the NMR spectra figures - while this is helpful to understand the concept, full spectra need to be shown to allow for direct comparison of the data quality (i.e. in supplemental material). If statements are made that measurements are done under full saturation - it should be shown that saturation is achieved in the measurements. All relaxation data should be made available - similar to CSPs.
  
  Response: As recommended, new Suppl. Figs. S2 and S9 were added to show the full spectra of each inhibitor complex analyzed by NMR. New Suppl. Fig. S3 now adds titrations of 2P-ERK2 with VTX11e and GDC0994. The results confirm binding saturation based on the disappearance of free enzyme peaks.
  
  3.3. No validation report was provided, nor a PDB number - so it is unclear if the crystal structures have been submitted - they need to be submitted in order to also access an mtz file, which is critical to understanding the quality of the structure (especially the ligand). This makes it difficult to assess the quality of the structures.
  
  Response: Table S1 has been revised to show data collection and refinement parameters for PDBID: 8U8K (2P-ERK2:Inh#8, Fig. 8C) and 8U8J (2P-ERK2:Inh#16, Fig. 8D). RCSB validation reports are attached and PDB depositions have been approved and will be released upon VOR assignment.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.12.557258v2
www.biorxiv.org www.biorxiv.org

Periaqueductal gray activates antipredatory neural responses in the amygdala of foraging rats

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  Recommendations for the authors:
  
  We sincerely value the insightful and constructive feedback (italicized) provided by the reviewers, which has been instrumental in identifying areas of our manuscript that required further clarification or amendment. In response to these valuable comments, we have significantly revised the manuscript to enhance clarity and accuracy. Specifically, we have corrected an oversight related to the robot’s velocity and secondary antibody ratios, and addressed previously missing values in Figs. 3E and 4E. Importantly, these corrections did not alter the outcomes of our results. Additionally, we have enriched our manuscript with new data analyses, as reflected in Figures 1B, 1F, 2H-J, 4D, 4F-H, S1A, S1C-E, S3H, S5, and Table 1, ensuring a more comprehensive presentation of our findings. Below are our responses detailing each comment and explaining the modifications integrated into the revised manuscript.
  
  Reviewer 1:
  
  (1) To address the question of whether PAG photostimulation biases the cells that respond to the robot, a counterbalanced experiment, in which the BLA activity is initially recorded during the foraging vs. robot test and the PAG stimulation happens at the end of the session, should have been performed.
  
  In our study, we investigated fear behavior and BLA cell responses to intrinsic dPAG photostimulation (320 pulses) in naïve animals, followed by their reactions to an extrinsic predatory robot. We recognize the reviewer's concern regarding the potential influence of initial dPAG photostimulation on BLA neuron responses to the robot. We address this issue in our discussion (pg. 13) as follows: “However, it is crucial to consider the recent discovery that optogenetic stimulation of CA3 neurons (3000 pulses) leads to gain-of-function changes in CA3-CA3 recurrent (monosynaptic) excitatory synapses (Oishi et al., 2019). Although there is no direct connection between dPAG neurons and the BLA (Vianna and Brandao 2003, McNally, Johansen, and Blair 2011, Cameron et al. 1995), and no studies have yet demonstrated gain-of-function changes in polysynaptic pathways to our knowledge, the potential for our dPAG photostimulation (320 pulses) to induce similar changes in amygdalar neurons, thereby enhancing their sensitivity to predatory threats, cannot be dismissed.”
  
  (2) In Figure 3, it is unclear which criteria (e.g. response latency, minimum Z score, spike fidelity) were used to identify the BLA neurons that were indirectly activated by PAG stimulation. A graphic containing at least the distribution of the response latencies for each BLA neuron after PAG laser activation is needed.
  
  We have specified the criteria for determining the responsiveness of BLA neurons to dPAG stimulation on page 22. This involves analyzing the first 500-ms post-stimulation across five 0.1-s bins. Units were classified as ‘stim cells’ if they showed z-scores greater than 3 (z > 3) in any of the bins during the initial 500-ms period post-stimulation. Neurons activated by both pellet procurement and dPAG stimulation were not included in the 'stim cell' category. Additionally, we have included a graphic in the revised manuscript (Fig. S3C) that presents the distribution of response latencies of BLA neurons to dPAG stimulation.
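
The classification rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the z-scores have already been computed per 0.1-s bin, and it reports latency as the index of the first bin that crosses threshold, which is an assumed definition rather than the one used for Fig. S3C.

```python
Z_THRESHOLD = 3.0  # z > 3, as stated in the response
N_POST_BINS = 5    # five 0.1-s bins covering the first 500 ms post-stimulation


def first_responsive_bin(post_stim_z):
    """Index of the first post-stimulation bin with z > threshold, or None."""
    for index, z in enumerate(post_stim_z[:N_POST_BINS]):
        if z > Z_THRESHOLD:
            return index
    return None


def is_stim_cell(post_stim_z, pellet_responsive=False):
    """Apply the 'stim cell' rule; pellet-responsive units are excluded."""
    if pellet_responsive:
        return False
    return first_responsive_bin(post_stim_z) is not None


# Hypothetical z-scores for one unit; only the first five bins are evaluated.
example_z = [0.4, 1.2, 3.6, 2.1, 0.8, 5.0]
print(is_stim_cell(example_z), first_responsive_bin(example_z))
# prints: True 2
```

Because the rule accepts a crossing in any of five bins, it is more permissive than a single-bin test, which may matter when comparing the proportion of responsive units across studies.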
  
  (3) To strengthen the claim that it is a BLA-PVT-PAG circuit that carries information about predatory threat, a new experiment using CTB and cFos could be used to demonstrate that PAG neurons that project to PVT are recruited during the robot exposure.
  
  Our study primarily aimed to explore the transmission of threat signals between the dPAG and BLA. We acknowledge that our evidence for the PVT’s intermediary role, derived from CTB injections in the BLA and subsequent CTB+cFos co-labeling analysis in the PVT (Fig. 4G and 4H), is limited. Accordingly, we have moderated the emphasis on the PVT’s involvement in both the abstract and introduction. We now present the PVT’s role as a promising direction for future research in the discussion section of our revised manuscript.
  
  (4) In Fig 2, the authors' interpretation is that photostimulation of PAG neurons elicits fleeing responses in the rats. However, there is a vast literature demonstrating that the PAG is also involved in nociception. Although this is recognized by the authors in the first part of the introduction and briefly described in the discussion, the authors should more explicitly explain that PAG stimulation produces analgesia and thus is unlikely to underlie the escaping responses observed. This may not be intuitive for a broader audience.
  
  We appreciate the reviewer's insightful suggestion to elaborate on the PAG involvement in nociception and analgesia, as supported by the literature. While our initial manuscript acknowledged these functions, we have now expanded our discussion to address the PAG’s multifaceted roles (pg. 12): “As mentioned in the introduction, the dPAG is recognized as part of the ascending nociceptive pathway to the BLA (De Oca et al. 1998, Gross and Canteras 2012, Herry and Johansen 2014, Kim, Rison, and Fanselow 1993, Ressler and Maren 2019, Walker and Davis 1997). The dPAG is also implicated in non-opioid analgesia (e.g., Bagley and Ingram 2020, Cannon et al. 1982, Fields 2000). However, it is essential to emphasize that, despite its roles in pain modulation, the primary behavior observed in dPAG-stimulated, naive rats foraging for food in an open arena was goal-directed escape to the safe nest, underscoring the dPAG’s critical function in survival behaviors.” Note that this aligns with human studies on PAG stimulation (e.g., Carrive and Morgan 2012, Magierek et al. 2003), particularly those by Amano et al. (Amano et al. 1982), which reported patients feeling an urge to escape, similar to being chased, upon PAG stimulation.
  
  (5) To truly demonstrate the functional links between the PAG and BLA, more experiments are needed. For example, one could record from BLA neurons during the robot surge while performing optogenetic inhibition of the PAG neurons. There is also no evidence that activity in the indirect pathway that connects the PAG to the BLA is indispensable for the expression of defensive responses towards the robot (e.g., causality tests using chemogenetic or optogenetic inactivation).
  
  We agree that incorporating optogenetic inhibition of PAG neurons while simultaneously recording from BLA neurons during a robot surge would strengthen the evidence for the functional connectivity between the PAG and BLA. Such an experiment would necessitate the transfection and photoinhibition of a wide array of dPAG neurons responsive to predatory threats. This procedure is technically more viable in transgenic mouse models, given their suitability for genetic manipulation. In light of this, and in response to the suggestions in the Joint Public Review, we have revised the abstract, introduction, and discussion to offer a more cautious interpretation of our findings. This revision reflects a careful consideration of both the evidence and the limitations inherent in our study (pg. 13): “While our findings demonstrate that opto-stimulation of the dPAG is sufficient to trigger both fleeing behavior and increased BLA activity, we have not established that the dPAG is necessary for the BLA’s response to predatory threats. To establish causality, it is essential to conduct experiments such as optogenetic inhibition to determine whether the dPAG is indispensable for activating BLA neurons and initiating escape behavior in the face of threats. The complexity of targeting the dPAG, which includes its dorsomedial, dorsolateral, lateral, and ventrolateral subdivisions (e.g., Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993), suggests the need for future studies using transgenic mouse models. Should inactivation of the dPAG negate the BLA's response to predatory threats, it would underscore the dPAG's central role in this defensive mechanism. Conversely, if BLA responses remain unaffected by dPAG inactivation, this could indicate the existence of multiple pathways for antipredatory defense mechanisms.”
  
  (6) The manuscript lacks information about the number of rats and trials that were used across the experiments (e.g. Fig 2G-J). On some occasions, the authors start the experiments with a specific number of animals and then reduce the N by half without providing a rationale (e.g. Fig. 3). Equally confusing is the experimental timeline. For example: a) Were the pre-robot, robot, and post-robot sessions always performed within the same day? b) It was described that microdrivable arrays were used, but did the same rats experience the robot test more than one time? c) How many bins were used for normalization during the Z-score calculation and when were the data binned at 100 ms versus 1 s? d) How many trials were used for each analysis? For example, to identify robot cells, did the authors establish a minimum number of trials per animal to calculate the peristimulus time histograms? Having a significant number of trials is critical to make sure that the observed neuronal responses are replicable across the trials. e) How was the neuronal activity related to "pellet retrieval" aligned during robot sessions? Was the activity aligned with the moment in which the rat touches the pellet or when the animal returns to the nest with the pellet? f) How did the authors control for trials in which the rat consumed the pellets in the same location vs. those in which they returned to the nest to eat it? All these points are extremely important for future replicability.
  
  We apologize for any confusion caused by the initial lack of detail in our experimental procedures. The revised manuscript has been updated with comprehensive methodological details:
  
  (i) The study involved thirteen rats (ChR2, n = 9; EYFP, n = 4), subjected to dPAG stimulation using fixed light parameters (473 nm, 20 Hz, 10-ms pulse width, 2 s duration) during Long and Short pellet distance trials (refer to Fig. 2E-G). The stimulation intensity was adjusted to each animal's response (fleeing behavior), ranging from 1-3 mW. Additional testing occurred over multiple days, with incremental adjustments to stimulation parameters (intensity, frequency, duration) after confirming normal baseline foraging behavior (Fig. 2H-J, at x = 0). These details are now clearly depicted in the manuscript.
  
  (ii) The primary objective was to investigate BLA neuron responses to dPAG opto-stimulation. Six rats were initially tested, with three later assessed for their reactions to dPAG stimulation in the presence of an actual predator, to gauge behavioral effects.
  
  (iii) Regarding the experimental timeline:
  
  a) Pre-robot, robot, and post-robot sessions were conducted successively on the same day.
  
  b) Sessions with the robot predator were repeated until habituation occurred or when unit recordings were deemed invalid due to microdrive limitations or the absence of unit detection. Throughout these sessions, the success rate for pellet retrieval remained consistently low. Specifically, the mean success rate for the dPAG recordings was 2.803% ± 1.311. For the BLA recordings, animals did not succeed in retrieving pellets during any of the robot trials. To provide a more detailed account of the methodology, the manuscript has been updated to include the number of recording days and the units recorded in the "Behavioral Procedures" section.
  
  c) As described in Materials and Methods, unit recording data were binned at 0.1-s intervals and normalized against a 5-s pre-event baseline (50 bins). For statistical analyses in Figure 1F’s rightmost column, 1-s bins were used to simplify post-hoc analysis corrections.
  
  d) Each recording session consisted of 5-15 trials. Trials were excluded if rats attempted to procure the pellet within 10 s post-dPAG stimulation or robot activation, ensuring accurate characterization of unit responsiveness. Consequently, the number of trials varied among subjects.
  
  e) Pellet retrieval was indicated by the animal entering a designated zone 19 cm from the pellet, driven by hunger.
  
  f) Animals were trained to retrieve pellets and return to their nest for consumption prior to robot testing sessions, as elaborated in the “Baseline foraging” section.
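
The normalization described in item c) above (0.1-s bins, z-scored against the 50 bins of a 5-s pre-event baseline) can be sketched as below. The use of the population standard deviation and the handling of a silent baseline are assumptions made for illustration; the response does not specify either.

```python
import statistics

BASELINE_BINS = 50  # 5-s pre-event baseline at 0.1 s per bin, as stated above


def zscore_against_baseline(baseline_counts, response_counts):
    """Z-score binned spike counts against the pre-event baseline bins."""
    if len(baseline_counts) != BASELINE_BINS:
        raise ValueError("expected 50 baseline bins (5 s at 0.1 s per bin)")
    mean = statistics.fmean(baseline_counts)
    # Assumption: population SD of the baseline bins.
    sd = statistics.pstdev(baseline_counts)
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return [(count - mean) / sd for count in response_counts]


# Hypothetical spike counts per 0.1-s bin (baseline mean 2, SD 1).
baseline = [1, 3] * 25
response = [2, 6, 9]
print(zscore_against_baseline(baseline, response))
# prints: [0.0, 4.0, 7.0]
```

Units with very low baseline firing can produce a zero or near-zero standard deviation, so any real pipeline needs an explicit policy for that case; raising an error here simply makes the issue visible.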
  
  (7) In the abstract, the authors mention that predictive cues are ambiguous during naturalistic predatory threats, but it is not clear what they mean by ambiguous. In addition, in the introduction section, the authors describe that the present study will investigate how the dPAG and BLA communicate threat signals. However, the authors should clarify right at the beginning that these two regions are not monosynaptically connected with each other and cite the proper references.
  
  The abstract’s original sentence, “…where predictive cues are ambiguous and do not afford reiterative trial-and-error learning…” has been refined to “…characterized by less explicit cues and the absence of reiterative trial-and-error learning events …” This adjustment more accurately reflects that cues in natural settings often lack the clear and consistent quality of those in controlled experimental settings, which is necessary for the straightforward process of trial-and-error learning.
  
  Regarding the dPAG and BLA connectivity, the revised introduction (pg. 5) now states: “Considering the lack of direct monosynaptic projections between dPAG and BLA neurons (Vianna and Brandao 2003, McNally, Johansen, and Blair 2011, Cameron et al. 1995), we utilized anterograde and retrograde tracers in the dPAG and BLA, respectively. This was complemented by c-Fos expression analysis following exposure to predatory threats. Our anatomical findings suggest that the paraventricular nucleus of the thalamus (PVT) may be part of a network that conveys predatory threat information from the dPAG to the BLA.”
  
  (8) In the introduction section, the authors should clarify that the US information is conveyed from the PAG to BLA via the lateral thalamus (posterior intralaminar nucleus, medial geniculate nucleus) or dorsal midline thalamus (paraventricular nucleus of the thalamus). The statement regarding how "the PAG functions as part of the ascending pain transmission pathway, providing footshock US information to the BLA" is misleading because the PAG does not send monosynaptic projections directly to the BLA.
  
  The revised text (pg. 3) now reads: “…suggest that the dPAG is part of the ascending US pain transmission pathway to the BLA, the presumed site for CS-US association formation (De Oca et al. 1998, Gross and Canteras 2012, Herry and Johansen 2014, Kim, Rison, and Fanselow 1993, Ressler and Maren 2019, Walker and Davis 1997). This pathway is thought to be mediated through the lateral and dorsal-midline thalamus regions, including the posterior intralaminar nucleus and paraventricular nucleus of the thalamus (Krout and Loewy, 2000; McNally, Johansen, and Blair, 2011; Yeh, Ozawa, and Johansen, 2021; but see Brunzell and Kim, 2001).”
  
  (9) The authors' assumption that threat information flows from the PAG to the BLA, rather than BLA to PAG, based on electrical stimulation and lesion experiments performed in previous studies is problematic for at least three reasons: a) Electrical stimulation can activate fibers of passage as well as presynaptic neurons antidromically. b) The lesion approach may not have targeted 100% of the neurons in PAG, which extends anatomically along the antero-posterior axis of the midbrain for several millimeters in rats. This observation also disagrees with more recent studies using optogenetics and imaging tools demonstrating that the PAG is the downstream target of the BLA-CeA pathway. c) The authors cited prior reports describing the role of the amygdala-PAG pathway in dampening the US response and providing a negative signal to the PAG. However, a series of previous studies demonstrating that the PAG serves as the downstream target of the central nucleus of the amygdala for the expression of defensive responses are completely ignored by the authors. Here are just some examples: Massi et al, 2023, PMID: 36652513; Tovote et al 2016, PMID: 27279213; Penzo et al, 2014 PMID: 24523533.
  
  We recognize the complexities in interpreting findings from electrical stimulation and lesion studies. Our prior work (Kim et al. 2013) supports the conclusion that predatory threat information directionally flows from the dPAG to the BLA, as evidenced by distinct behavioral outcomes from experimental manipulations of dPAG and BLA. Specifically, dPAG stimulation-induced fleeing behavior was blocked by BLA lesions (as well as muscimol inactivation), whereas BLA stimulation-induced fleeing was unaffected by dPAG or combined dPAG+vPAG lesions (refer to Fig. 5A), suggesting a flow from dPAG to BLA. Our manuscript further clarifies that dPAG optostimulation results confirmed that escape behavior in foraging rats, induced by dPAG electrical stimulation (Kim et al. 2013), was activated by intrinsic dPAG neurons rather than by fibers of passage or current spread to other brain regions.
  
  Furthermore, the PAG’s anatomical and functional diversity, with distinct segments along its longitudinal axis associated with different defensive behaviors, reinforces our conclusions. The dPAG is implicated in flight responses, while the vPAG is associated with freezing behavior (e.g., Bandler and Shipley 1994, Kim, Rison, and Fanselow 1993, Lefler, Campagner, and Branco 2020, Morgan, Whitney, and Gold 1998). The studies referenced in the critique primarily focus on the BLA-CeA-vPAG circuit's role in freezing during Pavlovian fear conditioning, contrasting with our emphasis on the dPAG-PVT-BLA circuit and its mediation of escape behavior in response to naturalistic predatory threats.
  
  We also note that different invasive procedures can yield varying behavioral outcomes. For example, both acute (e.g., optogenetic and muscimol inactivation) and chronic (e.g., surgical ablation) manipulations within the same brain circuit have shown diverse effects across species (Otchy et al. 2015). Moreover, optogenetics comes with its own set of conceptual and technical challenges (Adamantidis et al. 2015), including the difficulty of targeting, quantifying and photo-inhibiting 100% of PAG neurons. Despite the limitations of each technique, our collective evidence from lesions, inactivation, electrical stimulation (Kim et al. 2013), optostimulation, and single-unit recordings (the present study) supports the premise that the dPAG acts upstream of the BLA in processing predatory threat information.
  
  (10) In the discussion, the authors suggest that the PVT may be the interface between the PAG and the BLA for the expression of antipredatory defensive behavior during their foraging vs. robot test, but previous studies looking at the role of PVT in antipredator defensive behavior and/or approach-avoidance conflict tasks are not cited and discussed in the manuscript (Engelke et al, 2021, PMID: 33947849; Choi et al 2019, PMID: 30979815; Choi and McNally 2017, PMID: 28193686).
  
  We thank the reviewer for pointing out these pivotal studies, which we have carefully reviewed and integrated into the revised manuscript (pg. 14): “These results, in conjunction with previous research on the roles of the dPAG, PVT, and BLA in producing flight behaviors in naïve rats (Choi and Kim 2010, Daviu et al. 2020, Deng, Xiao, and Wang 2016, Kim et al. 2013, Kim et al. 2018, Kong et al. 2021, Ma et al. 2021, Reis et al. 2021), the anterior PVT’s involvement in cat odor-induced avoidance behavior (Engelke et al. 2021), and the PVT’s regulation of behaviors motivated by both appetitive and aversive stimuli (Choi and McNally 2017, Choi et al. 2019), suggest the involvement of the dPAG→PVT→BLA pathways in antipredatory defensive mechanisms, particularly as rats leave the safety of the nest to forage in an open arena (Figure 4I) (Reis et al. 2023).”
  
  (11) The authors use the expression "looming robot predator" in many cases throughout the manuscript. However, it is unclear whether the defensive responses observed in the rats are elicited by the looming stimulus produced by the movement of the robot towards the rats. The authors describe that rats do not respond to a stationary robot, but would the sound produced by the movement of the robot elicit defensive responses? Would non-approaching lateral or dorsoventral movements (not associated with looming) be sufficient to induce defensive behavior in the rats? There is a vast literature in the field about defensive behaviors induced by looming stimuli. The authors should empirically demonstrate that the escaping responses induced by the robot are mediated by looming or refrain from using the looming terminology to avoid confusion.
  
  Our use of "looming robot predator" is based on empirical evidence from a prior parametric study, which identified the forward, or 'looming,' motion of the Robogator as the key stimulus eliciting a flight response in rats (Kim, Choi, and Lee 2016). This reaction significantly decreased when the robot moved backward from the same starting position, producing a similar sound, and was absent when the robot remained stationary. This suggests that neither sound alone nor the mere presence of a novel object provokes goal-directed escape behavior (Kong et al. 2021). This aligns with studies indicating that simulated looming stimuli, like an expanding disk, induce flight or freezing responses in mice (De Franceschi et al. 2016, Yilmaz and Meister 2013).
  
  It should be noted that the 2013 study by Yilmaz & Meister (Yilmaz and Meister 2013) on the looming disk paradigm showed that not all mice responded to the stimuli (e.g., Figs. 2A and 3A), with those that did exhibiting rapid habituation by the second exposure. This contrasts with our predatory robot paradigm (Choi and Kim 2010), where all rats consistently fled from the looming robotic predator across multiple trials, underscoring the critical role of looming motion in simulating predator attacks that trigger flight behavior in rats.
  
  Thus, the term "looming" accurately captures the nature of the robot's movement and its effect on eliciting defensive responses in rats. Nonetheless, should the editors agree with the reviewer's suggestion to minimize potential confusion, we are willing to substitute "looming" with "approaching," although we consider the terms to be synonymous in the context of our study.
  
  (12) If the authors are citing the Rescorla-Wagner model, they should include at least one additional sentence to explain it, as many people in the field are not familiar with this model.
  
  In response to the request for clarification on the Rescorla-Wagner model, we have added an explanatory sentence (pg. 4): “Fundamentally, the negative feedback circuit between the amygdala and the dPAG serves as a biological implementation of the Rescorla–Wagner (1972) model, a foundational theory of associative learning that emphasizes the importance of prediction errors in reinforcement (i.e., US), as applied to FC (Fanselow 1998).”
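  
  The prediction-error idea in the sentence above can be illustrated with a minimal sketch of the standard Rescorla–Wagner update, in which associative strength V changes on each trial by a learning rate times the difference between the outcome and the current prediction. The function name, the learning-rate value of 0.3, and the asymptote of 1.0 are illustrative assumptions and are not taken from the manuscript.

```python
def rescorla_wagner(trials, learning_rate=0.3, asymptote=1.0, v0=0.0):
    """Return the associative strength V after each trial.

    trials: iterable of booleans, True when the US is delivered.
    learning_rate: combined salience term (alpha * beta); illustrative value.
    asymptote: maximum associative strength supported by the US (lambda).
    """
    v = v0
    history = []
    for us_present in trials:
        outcome = asymptote if us_present else 0.0
        prediction_error = outcome - v      # large when the US is unexpected
        v += learning_rate * prediction_error
        history.append(v)
    return history


# Ten CS-US pairings: learning is fast at first and slows as the
# US becomes predicted (the prediction error shrinks toward zero).
acquisition = rescorla_wagner([True] * 10)
```

  In this sketch the shrinking prediction error plays the role of the negative feedback described in the quoted sentence: once the CS fully predicts the US, the US no longer supports further learning.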
  
  (13) The authors need to include the normality test used to determine whether a parametric or non-parametric statistical analysis was the most appropriate test for each experiment.
  
  We have included the outcomes of the normality tests, detailed in Table S1.
  
  (14) In Fig. 1F, the authors show a representative PAG neuron with peristimulus-time histogram and rasters reaching frequencies higher than 100 Hz and sustained firing rates of >50 Hz following robot activation. The authors should include a firing rate analysis (e.g., average firing rate and maximum firing rate before and after robot activation) of the 22 robot-responsive PAG neurons recorded during the session to clarify whether this high firing rate, which is atypical in other brain regions, is commonly observed in the PAG. Showing the isolated waveforms of some representative neurons would help to clarify whether the activity is being recorded from a single-isolated unit instead of multiple units within the same channel.
  
  In response to the critique, we have expanded our analysis to include both average and maximum firing rates before and after robot activation for the 22 robot-responsive PAG neurons. This detailed firing rate analysis, illustrating their distribution, has been incorporated into the revised manuscript (refer to Figure S1C and S1D). Furthermore, to alleviate concerns regarding the identification of single-unit activity versus potential multi-unit recordings, we have included peri-event raster plots and waveforms for two additional representative neurons in Figure 1F.
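  
  As a rough illustration of the kind of summary described above, the following sketch computes the average and maximum binned firing rate in equal windows before and after an event. It is not the analysis code used for Figure S1C and S1D; the 2-s window, the 100-ms bin, and the function name are assumptions made only for this example, and spike times are given as integer milliseconds to keep the binning exact.

```python
def firing_rate_summary(spike_times_ms, event_ms, window_ms=2000, bin_ms=100):
    """Mean and maximum binned firing rate (Hz) before and after an event.

    Window and bin sizes are illustrative assumptions, not the authors' values.
    """
    n_bins = window_ms // bin_ms

    def binned_rates(start_ms):
        counts = [0] * n_bins
        for t in spike_times_ms:
            if start_ms <= t < start_ms + window_ms:
                counts[(t - start_ms) // bin_ms] += 1
        # Convert spike counts per bin to rates in Hz.
        return [c * 1000 / bin_ms for c in counts]

    pre = binned_rates(event_ms - window_ms)
    post = binned_rates(event_ms)
    return {
        "pre_mean": sum(pre) / n_bins,
        "pre_max": max(pre),
        "post_mean": sum(post) / n_bins,
        "post_max": max(post),
    }


# Synthetic unit: sparse baseline firing, then a burst after the event at 5000 ms.
spikes = [3100, 3900, 4500, 4950]
spikes += list(range(5000, 5500, 10))   # 100 Hz burst for 500 ms
spikes += list(range(5500, 7000, 50))   # 20 Hz sustained firing
summary = firing_rate_summary(spikes, event_ms=5000)
```

  Applying such a function to each of the 22 robot-responsive units would give one pre/post pair of mean and peak rates per neuron, which is the form of data a distribution plot requires.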
  
  (15) In Figure 2, the authors should indicate when the recordings are performed on anesthetized vs. freely-moving awake animals.
  
  In the original manuscript, we specified that the optrode recordings depicted in Figure 2B were conducted on anesthetized rats. To enhance clarity and directly address the critique, we have now clearly indicated this condition in Figure 2A as well.
  
  (16) The optogenetic stimulation parameters used in Fig 2H indicate that 0.5 mW was sufficient to induce behavioral changes. This is surprising because most optogenetic experiments in the field use much higher intensities (> 5 mW). If much lower intensities are sufficient to drive PAG-mediated behaviors, this may be a very important observation that should be conveyed to the field. I recommend the authors clarify if they in fact used 0.5 mW and then discuss that the laser intensity used in the experiments was 10X lower than that required for other brain regions.
  
  In our study, we indeed observed that 0.5 mW of dPAG stimulation increased the latency to procure the pellet without completely preventing the action. Notably, at 1 mW, more than half of the animals (n = 5/9 rats; Fig. 2H) and at 3 mW, all rats (9/9) failed to procure the pellet and fled from the foraging area to the nest (Fig. 2G). These results indicate that even lower intensities were sufficient to elicit behavioral changes through dPAG stimulation in a large foraging arena, highlighting the dPAG's sensitivity to optogenetic manipulation. This finding is consistent with our earlier research on dPAG electrical stimulation, which required significantly lower intensities to provoke defensive behaviors compared to the BLA. Specifically, the stimulation intensity needed for aversive behavior in the dPAG was substantially lower (dPAG: 65.0 ± 6.85 µA) than for the BLA (BLA: 275.0 ± 24.44 µA) (Kim et al. 2013). Furthermore, Deng et al. (Deng, Xiao, and Wang 2016) showed that 1 mW of blue light could elicit a 60% freezing response, with 2 mW triggering flight behavior within a latency of 0.6 seconds.
  
  (17) In Fig 2 G-J, how many animals are being used per group and how was the sequence of the experiments performed? This is very important for replicability.
  
  A total of three rats were utilized for the robot testing experiments depicted in Fig. 2 G-J. The experimental sequence for these animals consisted of successive pre-stimulation, stimulation, post-stimulation, and robot sessions. We have updated the manuscript to include this information.
  
  (18) For the photostimulation of PAG neurons in Figs. 2 and 3, the authors need to clarify if the same parameters of laser stimulation used during the anesthetized recordings were also used during the behavioral tests. Also, the wavelength corresponding to the blue laser should be 473 nm instead of 437 nm.
  
  We thank the reviewer for identifying the error. We confirm that the opto-stimulation parameters (473 nm, 10-ms pulse width, 2 s duration) were consistently applied across both anesthetized recordings and behavioral tests. This consistency has been explicitly stated in the revised manuscript to ensure clarity regarding our experimental approach.
  
  (19) In Fig. 3I, how were the representative trials selected? Instead of picking the most representative trials, the authors should demonstrate the response of the cell during the entire session.
  
  In response to the critique, we clarify that the color-coded PETH shown in Fig. 3I represents averaged BLA activity across a comprehensive set of trials. This includes 8 pre-stimulation, 10 stimulation, and 8 post-stimulation trials for the robot-activated sessions, with a similar distribution for non-stimulated sessions. This approach was chosen to provide a representative overview of the cell's response throughout the entire session. To address the request for more detailed data, we have added traditional PETHs to the revised manuscript (see Fig. S3H), which depict the cell's response across all trials.
  
  (20) Fig 4 D should demonstrate a colabeling between the anterograde PAG fibers in the PVT and the retrogradely labeled neurons from BLA instead of PAG fibers only.
  
  We wish to clarify that Fig. 4D is intended to show the distribution of dPAG terminals within the midline thalamic nuclei, as noted in prior research (Krout and Loewy 2000). Although dPAG terminals are distributed throughout the midline thalamus, our observations have specifically highlighted a notable increase in c-Fos expression within the paraventricular nucleus of the thalamus (PVT) in rats subjected to the robotic predator stimulus, in contrast to those in the foraging-only control condition (Fig. 4E). Addressing the reviewer's point, we direct attention to Fig. 4G, which includes images labeled "Robot-experienced" and "Merge." This figure demonstrates a subset of PVT neurons that were retrogradely labeled with CTB injected into the BLA, anterogradely labeled with AAV injected into the dPAG, and activated (as indicated by c-Fos expression) in response to the robotic predator. This provides specific colabeling evidence between anterograde PAG fibers in the PVT and retrogradely labeled neurons from the BLA, directly addressing the critique.
  
  (21) The resolution of the cFos images is very low and makes it hard to appreciate.
  
  We have updated Figs. 4F and 4G with high-resolution versions to ensure the details are more clearly visible. Furthermore, should there be a need for even greater clarity, we are prepared to supply the images as TIFF files, which are known for preserving high image quality.
  
  Reviewer 2:
  
  (1) The text is clearly written, and I appreciated the inclusion of interesting citations, such as the one about paintings by cavemen. The authors also do a good job of discussing the underlying theoretical framework and the figures are easy to understand. Although the topic is very interesting, the amount of novel work is somewhat low. Figure 1 shows that dPAG cells are activated by the predator, and this has been shown by many prior reports. Similarly, Figure 2 shows that dPAG activation creates defensive responses, and this too has been shown by many prior reports.
  
  We appreciate the reviewer’s positive remarks. We acknowledge the rich body of research documenting dPAG neuronal activation by various predator cues such as odors (e.g., fox urine) (Lu et al. 2023), and scenarios involving anesthetized or spontaneously moving rat/cat predators, either physically partitioned or harness-restrained (Bindi et al. 2022, Deng, Xiao, and Wang 2016, Esteban Masferrer et al. 2020). Nevertheless, our study distinguishes itself by examining dPAG neuronal responses to a robotic predator, uniquely designed to replicate consistent looming motions across multiple trials and subjects within an environment that simulates natural foraging conditions, inclusive of a safe nest (cf. Choi and Kim, 2010). This approach allowed us to not only reveal the immediate activation of dPAG neurons in response to a rapidly approaching predator but also to explore the consequent fleeing behavior towards safety, thereby providing new insights into the dPAG's role in mediating goal-directed defensive responses in a more ecologically-relevant setting. Furthermore, our investigation extends beyond these findings to assess the impact of dPAG activation on BLA neuronal responses and their functional connectivity during predator-prey interactions, offering a fresh perspective on the neural circuits that support survival behaviors in animals when confronted with naturalistic threats.
  
  (2) The results in Figure 3 are novel and interesting, but the characterization of BLA activity is incomplete. For example, what are the percentages of BLA cells that are inhibited or activated by all major behaviors observed? These behaviors include approach to pellet, escape from robot, freezing, stretch-attend postures, etc. These same analyses should also be added to dPAG activity in Figure 1. How does BLA single cell encoding of these behaviors relate to their responsivity to dPAG stimulation? And, finally, it is unclear what is the significance of BLA correlated synchronous firing. Is the animal more or less likely to be performing certain behaviors when correlated BLA firing occurs?
  
  Our analysis, as presented in Figs. 3I, 3K, and S3D-F, selectively focused on BLA cell responses during distinct behaviors such as approaching a pellet and escaping from the robot. These behaviors were selected because their precise temporal markers allow for accurate correlation with BLA cell activity, building on the findings of our previous research (Kim et al. 2018, Kong et al. 2021).
  
  The robot's motion, programmed to advance a fixed distance before retreating to its starting position, is designed to repeatedly elicit foraging, thus facilitating analysis of neural changes during conflict situations involving food approach and predator avoidance. However, this also leads to the rapid diminution of freezing and stretch-attend postures inside the nest as animals quickly adapt to the robot's movement pattern, rendering a time-stamped analysis of these behaviors unfeasible under our experimental conditions. Including these behaviors would be insightful, especially in extended interaction scenarios where the robot advances to the nest opening and remains there before returning in a less predictable manner. Such conditions, however, would likely reduce foraging behavior due to increased fear, deviating from our study's primary objective of elucidating the interactions between the dorsal periaqueductal gray (dPAG) and the basolateral amygdala (BLA).
  
  Regarding the significance of BLA correlated synchronous firing, our findings, particularly in Figures 3M-O and S4, demonstrate significant synchronous activity among BLA neuronal pairs during encounters with the robot, as opposed to pre-stim, stim, and post-stim sessions. This synchrony is notably prominent among neurons responsive to dPAG stimulation, indicating that BLA neurons involved in processing dPAG signals may play a crucial role in enhancing BLA network coherence to effectively manage predatory threat information (pg. 13).
  
  (3) In Figure 4, the authors identify the PVT as a potential region that can mediate dPAG to BLA communication via anatomical tracing. However, functional assays are missing. For example, if the PVT is inhibited chemogenetically, does this result in a smaller number of BLA cells that are activated by dPAG stimulation? Does activation of the dPAG-PVT or the PVT-BLA projections cause defensive behaviors? Functionally showing that the dPAG-PVT-BLA circuit controls defensive actions would be a major advance in the field and would greatly enhance the significance of this paper. It would also provide an anatomical substrate to support the view that the BLA is downstream of the dPAG, which was first demonstrated by the authors in their elegant 2013 PNAS paper.
  
  We appreciate the reviewer’s constructive critique and valuable suggestions on the necessity for functional validation of the dPAG-PVT-BLA circuit's involvement in mediating defensive behaviors. In light of these comments, we have carefully considered and included a discussion on the importance of these proposed experiments as a direction for future research in our manuscript revision (also see response to Reviewer 1’s critique #5).
  
  Our initial work in 2013 (Kim et al. 2013) laid the groundwork for identifying BLA neurons responsive to dPAG stimulation and suggested the PVT as a potential relay in this neural circuit. Recognizing the limitations of our current study, which does not include direct functional assays, we have adjusted our manuscript to convey the speculative aspect of the dPAG-PVT-BLA circuit’s role more accurately. Moreover, we have enriched our discussion by citing relevant studies that lend support to our proposed circuit mechanism. These references serve to place our findings within the broader context of existing research and highlight the imperative for subsequent studies to empirically confirm the functional significance of the dPAG-PVT-BLA pathway in driving defensive behaviors.
  
  Reviewer 3:
  
  (1) The Introduction refers to a negative feedback amygdala-dPAG from a study of the Johansen group, but in this case, the authors were referring to the ventrolateral and not the dorsal PAG.
  
  We thank the reviewer for pointing out the need to distinguish between the dPAG and vPAG regions in our introduction. While Johansen et al. (2010) investigated the role of the PAG (including both dPAG and vPAG regions; see their Supplementary Figs. 4, 5, and 10), their initial publication did not explicitly differentiate the specific contributions of each region to the amygdala's negative feedback mechanism. This distinction was elaborated in later work by the same group (Yeh, Ozawa, and Johansen 2021), which specifically illuminated the dPAG's role in conditioned fear memory formation and its neural pathways to the PVT that influence fear learning. To reflect this nuanced understanding, we have revised our introduction (pg. 3): “In parallel, Johansen et al. (2010) found that pharmacological inhibition of the PAG, encompassing both dPAG and vPAG regions, diminishes the behavioral and neural responses in the amygdala elicited by periorbital shock US, thereby impairing the acquisition of auditory FC.”
  
  (2) In the experiments recording dPAG in response to the predator threat, the authors mentioned cells activated by the predator threat, referred to as "robot cells." Were these cells inhibited in response to threat?
  
  In the Results and Materials and Methods sections, we report that 23.4% (22 out of 94) of dPAG neurons, termed “robot cells,” showed a significant increase in firing rates (z > 3) within a latency of less than 500 ms during exposure to the looming robot threat, but not during the pre- and post-robot sessions. These cells are highlighted in Figures 1E-G. In contrast, we identified only a single unit exhibiting a decrease in activity (z-score < -3) in response to the robot threat. Given the overwhelming prevalence of cells with excitatory responses to the threat, our discussions and analyses have primarily centered on these excited cells. Nevertheless, to ensure a full depiction of our observations, we have included data on the inhibited unit in the revised manuscript, specifically in Figure S1E.
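  
  The classification criterion stated above (z > 3 within 500 ms of robot activation) can be sketched as follows. This is a simplified illustration, not the authors' analysis code: the baseline definition, the 100-ms bin size, the use of the sample standard deviation, and the function name are all assumptions. Latency is approximated as the start of the first bin that crosses the threshold.

```python
from statistics import mean, stdev


def is_robot_responsive(baseline_rates, post_rates, bin_ms=100,
                        max_latency_ms=500, z_threshold=3.0):
    """Return (responsive, latency_ms) for one unit.

    baseline_rates: binned firing rates (Hz) before robot activation.
    post_rates: binned firing rates (Hz) starting at robot activation.
    Bin size and baseline definition are illustrative assumptions.
    """
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        return False, None              # z-score undefined for a flat baseline
    n_bins = max_latency_ms // bin_ms   # only bins starting before 500 ms count
    for i, rate in enumerate(post_rates[:n_bins]):
        z = (rate - mu) / sigma
        if z > z_threshold:
            return True, i * bin_ms     # latency approximated by bin start
    return False, None


baseline = [2, 4, 3, 5, 4, 2, 3, 4, 3, 5]
fast_unit = is_robot_responsive(baseline, [4, 30, 45, 20, 10, 5])
late_unit = is_robot_responsive(baseline, [3, 4, 5, 4, 3, 40])
```

  Here `fast_unit` crosses the threshold in the second bin and is classified as responsive, whereas `late_unit` only increases its rate after the 500-ms latency limit and is not. An inhibited unit could be detected symmetrically by testing for z < -3.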
  
  (3) The authors claim that tetrodes were implanted in the dorsal PAG; however, the electrodes' tips shown in the figures are positioned more ventrally in the lateral PAG (see Figures 1B, S5A).
  
  The PAG is anatomically organized into dorsomedial (dmPAG), dorsolateral (dlPAG), lateral (lPAG), and ventrolateral (vlPAG) columns along the rostro-caudal axis of the aqueduct. The designation "dorsal PAG" (dPAG) traditionally encompasses the dmPAG, dlPAG, and lPAG regions, a classification supported by extensive tract-tracing, neurochemical, and immunohistochemical evidence (e.g., Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993). As Bandler and Shipley (1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…" Similarly, Schenberg et al. (2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switch-off behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, electrode placements were strictly within these specified dPAG regions. The electrode tip locations depicted in Figures 1B and S5A correspond with the -6.04 mm template (left panel below) from Paxinos & Watson’s atlas (Paxinos and Watson 1998), situated anteriorly to the emergence of the vlPAG (right panel below). To clarify this in our manuscript, we provide a detailed definition of the dPAG that includes the dmPAG, dlPAG, and lPAG, and support our electrode placement rationale with references to established literature (pg. 5).
  
  Author response image 1.
  
  (4) It would be nice to include a series of observations applying inhibitory tools (i.e., optogenetic photo inhibition) in the dPAG and BLA and see how they affect the behavioral responses in the 'approach food-avoid predator' paradigm. Moreover, it would be interesting to explore how inhibiting the dPAG to PVT pathway influences the flee response during the robot surge.
  
  We appreciate the suggestion to explore the effects of optogenetic inhibition in the dPAG and BLA on behavioral responses within the 'approach food-avoid predator' paradigm, as well as the potential impact of inhibiting the dPAG to PVT pathway on flee responses during robot surge incidents. As mentioned in our response to Reviewer 1’s critique #5, the application of optogenetic inhibition necessitates transfecting, quantifying, and photoinhibiting a comprehensive set of dPAG neurons activated by predatory threats. This approach is more viable in future studies that can leverage transgenic mouse models for their genetic tractability. Following the Joint Public Review’s recommendations, we have revised our manuscript to ensure a more measured interpretation of our data, carefully balancing the evidence from tracer studies against the limitations of our current methodology.
  
  Furthermore, referencing Reviewer 1’s critique #9, it is important to consider that various invasive techniques can yield different behavioral outcomes. For instance, research by Olveczky and colleagues (Otchy et al. 2015) demonstrated that acute manipulations (i.e., optogenetic and muscimol inactivation) and chronic surgical ablation of the same brain circuit can produce distinct effects in rats and finches. Despite these methodological constraints, our collective results from lesion, inactivation, electrical stimulation (Kim et al. 2013), optostimulation, and single-unit recording (present) studies cohesively suggest that the dPAG functions upstream of the BLA in processing predatory threat signals.
  
  (5) The authors should also examine whether 'synaptic' appositions exist between the anterogradely labeled terminals from the dPAG and the double labeled CTB and cFOS neurons in the PVT.
  
  We appreciate the suggestion to investigate the presence of synaptic appositions, which could potentially offer valuable insights into the synaptic connections and functional interactions within this neural circuit. However, due to the specialized nature of electron microscopy required for these examinations and the extensive resources it entails, this line of inquiry falls beyond the scope of our current study. We hope to address this aspect in future studies, where we can dedicate the necessary resources and expertise to conducting these intricate analyses.
  
  (6) It is odd to see the projection fields shown in Fig. 4D, where the projection to the PVT looks much sparser compared to other targets in the thalamus and hypothalamus. If the projection to the PVT has such an important function, why does it seem so weak? This should be discussed. Also, because the projection to the PVT seems sparse, the authors should consider alternative paths like the one involving the cuneiform nucleus. The cuneiform nucleus is an important region responding to looming shadows with strong bidirectional links to the dorsolateral periaqueductal gray, providing strong projections to the rostral PVT.
  
  The perceived scarcity of the dPAG-PVT pathway might not reflect its functional significance accurately. The PVT's small size could make its projections appear less dense in broad anatomical studies. To address this, we have updated Figure 4D with a high-resolution image that offers a detailed view of the PVT region. This enhancement (refer to the updated Fig. 4, bottom) more accurately depicts the projection density within the PVT. It is also critical to consider that the functional impact of neural pathways is not solely dependent on the quantity of projecting neurons. For instance, work by Deisseroth and colleagues (Rajasethupathy et al. 2015) has shown that even relatively sparse monosynaptic projections from the anterior cingulate cortex to the hippocampus can exert significant effects on neural circuit dynamics. Additionally, we have expanded our discussion to consider the potential roles of other circuits, such as the cuneiform nucleus, in driving the behavioral responses observed in our study (pg. 15): “Given the recent significance attributed to the superior colliculus in detecting innate visual threats (Lischinsky and Lin 2019, Wei et al. 2015, Zhou et al. 2019) and the cuneiform nucleus in the directed flight behavior of mice (Bindi et al. 2023, Tsang et al. 2023), further exploration into the communication between these structures and the dPAG-BLA circuitry is warranted.”
  
  (7) Finally, in the Discussion, it would be nice to comment on how the BLA mediates flee responses. Which pathways are likely involved?
  
  This excellent suggestion has been incorporated in the discussion (pg. 15): “Future studies will also need to delineate the downstream pathways emanating from the BLA that orchestrate goal-directed flight responses to external predatory threats as well as internal stimulations from the dPAG/BLA circuit. Potential key structures include the dorsal/posterior striatum, which has been associated with avoidance behaviors in response to airpuff in head-fixed mice (Menegas et al. 2018) and flight reactions triggered by auditory looming cues (Li et al. 2021). Additionally, the ventromedial hypothalamus (VMH) has been implicated in flight behaviors in mice, evidenced by responses to the presence of a rat predator (Silva et al. 2013) and upon optogenetic activation of VMH Steroidogenic factor 1 (Kunwar et al. 2015) or the VMH-anterior hypothalamic nucleus pathway (Wang, Chen, and Lin 2015). Investigating the indispensable role of these structures in flight behavior could involve lesion or inactivation studies. Such interventions are anticipated to inhibit flight behaviors elicited by amygdala stimulation and predatory threats, confirming their critical involvement. Conversely, activating these structures in subjects with an inactivated or lesioned amygdala, which would typically inhibit fear responses to external threats (Choi and Kim 2010), is expected to induce fleeing behavior, further elucidating their functional significance.”
  
  Adamantidis, A., S. Arber, J. S. Bains, E. Bamberg, A. Bonci, G. Buzsaki, J. A. Cardin, R. M. Costa, Y. Dan, Y. Goda, A. M. Graybiel, M. Hausser, P. Hegemann, J. R. Huguenard, T. R. Insel, P. H. Janak, D. Johnston, S. A. Josselyn, C. Koch, A. C. Kreitzer, C. Luscher, R. C. Malenka, G. Miesenbock, G. Nagel, B. Roska, M. J. Schnitzer, K. V. Shenoy, I. Soltesz, S. M. Sternson, R. W. Tsien, R. Y. Tsien, G. G. Turrigiano, K. M. Tye, and R. I. Wilson. 2015. "Optogenetics: 10 years after ChR2 in neurons--views from the community." Nat Neurosci 18 (9):1202-12. doi: 10.1038/nn.4106.
  
  Amano, K., T. Tanikawa, H. Kawamura, H. Iseki, M. Notani, H. Kawabatake, T. Shiwaku, T. Suda, H. Demura, and K. Kitamura. 1982. "Endorphins and pain relief. Further observations on electrical stimulation of the lateral part of the periaqueductal gray matter during rostral mesencephalic reticulotomy for pain relief." Appl Neurophysiol 45 (1-2):123-35.
  
  Bagley, E. E., and S. L. Ingram. 2020. "Endogenous opioid peptides in the descending pain modulatory circuit." Neuropharmacology 173:108131. doi: 10.1016/j.neuropharm.2020.108131.
  
  Bandler, R., P. Carrive, and S. P. Zhang. 1991. "Integration of somatic and autonomic reactions within the midbrain periaqueductal grey: viscerotopic, somatotopic and functional organization." Prog Brain Res 87:269-305. doi: 10.1016/s0079-6123(08)63056-3.
  
  Bandler, R., and K. A. Keay. 1996. "Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression." Prog Brain Res 107:285-300. doi: 10.1016/s0079-6123(08)61871-3.
  
  Bandler, R., and M. T. Shipley. 1994. "Columnar organization in the midbrain periaqueductal gray: modules for emotional expression?" Trends Neurosci 17 (9):379-89. doi: 10.1016/0166-2236(94)90047-7.
  
  Bindi, R. P., C. C. Guimaraes, A. R. de Oliveira, F. F. Melleu, M. A. X. de Lima, M. V. C. Baldo, S. C. Motta, and N. S. Canteras. 2023. "Anatomical and functional study of the cuneiform nucleus: A critical site to organize innate defensive behaviors." Ann N Y Acad Sci 1521 (1):79-95. doi: 10.1111/nyas.14954.
  
  Bindi, R. P., R. G. O. Maia, F. Pibiri, M. V. C. Baldo, S. L. Poulter, C. Lever, and N. S. Canteras. 2022. "Neural correlates of distinct levels of predatory threat in dorsal periaqueductal grey neurons." Eur J Neurosci 55 (6):1504-1518. doi: 10.1111/ejn.15633.
  
  Cameron, A. A., I. A. Khan, K. N. Westlund, and W. D. Willis. 1995. "The efferent projections of the periaqueductal gray in the rat: a Phaseolus vulgaris-leucoagglutinin study. II. Descending projections." J Comp Neurol 351 (4):585-601. doi: 10.1002/cne.903510408.
  
  Cannon, J. T., G. J. Prieto, A. Lee, and J. C. Liebeskind. 1982. "Evidence for opioid and non-opioid forms of stimulation-produced analgesia in the rat." Brain Res 243 (2):315-21. doi: 10.1016/0006-8993(82)90255-4.
  
  Carrive, P., and M. M. Morgan. 2012. "Periaqueductal Gray." In The Human Nervous System, edited by J. K. Mai and G. Paxinos, 367-400. London: Academic Press.
  
  Carrive, P. 1993. "The periaqueductal gray and defensive behavior: functional representation and neuronal organization." Behav Brain Res 58 (1-2):27-47. doi: 10.1016/0166-4328(93)90088-8.
  
  Choi, E. A., P. Jean-Richard-Dit-Bressel, C. W. G. Clifford, and G. P. McNally. 2019. "Paraventricular Thalamus Controls Behavior during Motivational Conflict." J Neurosci 39 (25):4945-4958. doi: 10.1523/JNEUROSCI.2480-18.2019.
  
  Choi, E. A., and G. P. McNally. 2017. "Paraventricular Thalamus Balances Danger and Reward." J Neurosci 37 (11):3018-3029. doi: 10.1523/JNEUROSCI.3320-16.2017.
  
  Choi, J. S., and J. J. Kim. 2010. "Amygdala regulates risk of predation in rats foraging in a dynamic fear environment." Proc Natl Acad Sci U S A 107 (50):21773-7. doi: 10.1073/pnas.1010079108.
  
  De Franceschi, G., T. Vivattanasarn, A. B. Saleem, and S. G. Solomon. 2016. "Vision Guides Selection of Freeze or Flight Defense Strategies in Mice." Curr Biol 26 (16):2150-4. doi: 10.1016/j.cub.2016.06.006.
  
  De Oca, B. M., J. P. DeCola, S. Maren, and M. S. Fanselow. 1998. "Distinct regions of the periaqueductal gray are involved in the acquisition and expression of defensive responses." J Neurosci 18 (9):3426-32. doi: 10.1523/JNEUROSCI.18-09-03426.1998.
  
  Deng, H., X. Xiao, and Z. Wang. 2016. "Periaqueductal Gray Neuronal Activities Underlie Different Aspects of Defensive Behaviors." J Neurosci 36 (29):7580-8. doi: 10.1523/JNEUROSCI.4425-15.2016.
  
  Engelke, D. S., X. O. Zhang, J. J. O'Malley, J. A. Fernandez-Leon, S. Li, G. J. Kirouac, M. Beierlein, and F. H. Do-Monte. 2021. "A hypothalamic-thalamostriatal circuit that controls approach-avoidance conflict in rats." Nat Commun 12 (1):2517. doi: 10.1038/s41467-021-22730-y.
  
  Esteban Masferrer, M., B. A. Silva, K. Nomoto, S. Q. Lima, and C. T. Gross. 2020. "Differential Encoding of Predator Fear in the Ventromedial Hypothalamus and Periaqueductal Grey." J Neurosci 40 (48):9283-9292. doi: 10.1523/JNEUROSCI.0761-18.2020.
  
  Fanselow, M. S. 1998. "Pavlovian conditioning, negative feedback, and blocking: mechanisms that regulate association formation." Neuron 20 (4):625-7. doi: 10.1016/s0896-6273(00)81002-8.
  
  Fields, H. L. 2000. "Pain modulation: expectation, opioid analgesia and virtual pain." Prog Brain Res 122:245-53. doi: 10.1016/s0079-6123(08)62143-3.
  
  Gross, C. T., and N. S. Canteras. 2012. "The many paths to fear." Nat Rev Neurosci 13 (9):651-8. doi: 10.1038/nrn3301.
  
  Herry, C., and J. P. Johansen. 2014. "Encoding of fear learning and memory in distributed neuronal circuits." Nat Neurosci 17 (12):1644-54. doi: 10.1038/nn.3869.
  
  Kim, E. J., O. Horovitz, B. A. Pellman, L. M. Tan, Q. Li, G. Richter-Levin, and J. J. Kim. 2013. "Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats." Proc Natl Acad Sci U S A 110 (36):14795-800. doi: 10.1073/pnas.1310845110.
  
  Kim, E. J., M. S. Kong, S. G. Park, S. J. Y. Mizumori, J. Cho, and J. J. Kim. 2018. "Dynamic coding of predatory information between the prelimbic cortex and lateral amygdala in foraging rats." Sci Adv 4 (4):eaar7328. doi: 10.1126/sciadv.aar7328.
  
  Kim, J. J., J. S. Choi, and H. J. Lee. 2016. "Foraging in the face of fear: Novel strategies for evaluating amygdala functions in rats." In Living without an amygdala, edited by D. G. Amaral and R. Adolphs, 129-148. The Guilford Press.
  
  Kim, J. J., R. A. Rison, and M. S. Fanselow. 1993. "Effects of amygdala, hippocampus, and periaqueductal gray lesions on short- and long-term contextual fear." Behav Neurosci 107 (6):1093-8. doi: 10.1037//0735-7044.107.6.1093.
  
  Kong, M. S., E. J. Kim, S. Park, L. S. Zweifel, Y. Huh, J. Cho, and J. J. Kim. 2021. "'Fearful-place' coding in the amygdala-hippocampal network." Elife 10. doi: 10.7554/eLife.72040.
  
  Krout, K. E., and A. D. Loewy. 2000. "Periaqueductal gray matter projections to midline and intralaminar thalamic nuclei of the rat." J Comp Neurol 424 (1):111-41. doi: 10.1002/1096-9861(20000814)424:1<111::aid-cne9>3.0.co;2-3.
  
  Kunwar, P. S., M. Zelikowsky, R. Remedios, H. Cai, M. Yilmaz, M. Meister, and D. J. Anderson. 2015. "Ventromedial hypothalamic neurons control a defensive emotion state." Elife 4. doi: 10.7554/eLife.06633.
  
  Lefler, Y., D. Campagner, and T. Branco. 2020. "The role of the periaqueductal gray in escape behavior." Curr Opin Neurobiol 60:115-121. doi: 10.1016/j.conb.2019.11.014.
  
  Li, Z., J. X. Wei, G. W. Zhang, J. J. Huang, B. Zingg, X. Wang, H. W. Tao, and L. I. Zhang. 2021. "Corticostriatal control of defense behavior in mice induced by auditory looming cues." Nat Commun 12 (1):1040. doi: 10.1038/s41467-021-21248-7.
  
  Lischinsky, J. E., and D. Lin. 2019. "Looming Danger: Unraveling the Circuitry for Predator Threats." Trends Neurosci 42 (12):841-842. doi: 10.1016/j.tins.2019.10.004.
  
  Lu, B., P. Fan, M. Li, Y. Wang, W. Liang, G. Yang, F. Mo, Z. Xu, J. Shan, Y. Song, J. Liu, Y. Wu, and X. Cai. 2023. "Detection of neuronal defensive discharge information transmission and characteristics in periaqueductal gray double-subregions using PtNP/PEDOT:PSS modified microelectrode arrays." Microsyst Nanoeng 9:70. doi: 10.1038/s41378-023-00546-8.
  
  Magierek, V., P. L. Ramos, N. G. da Silveira-Filho, R. L. Nogueira, and J. Landeira-Fernandez. 2003. "Context fear conditioning inhibits panic-like behavior elicited by electrical stimulation of dorsal periaqueductal gray." Neuroreport 14 (12):1641-4. doi: 10.1097/00001756-200308260-00020.
  
  McNally, G. P., J. P. Johansen, and H. T. Blair. 2011. "Placing prediction into the fear circuit." Trends Neurosci 34 (6):283-92. doi: 10.1016/j.tins.2011.03.005.
  
  Menegas, W., K. Akiti, R. Amo, N. Uchida, and M. Watabe-Uchida. 2018. "Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli." Nat Neurosci 21 (10):1421-1430. doi: 10.1038/s41593-018-0222-1.
  
  Morgan, M. M., P. K. Whitney, and M. S. Gold. 1998. "Immobility and flight associated with antinociception produced by activation of the ventral and lateral/dorsal regions of the rat periaqueductal gray." Brain Res 804 (1):159-66. doi: 10.1016/s0006-8993(98)00669-6.
  
  Otchy, T. M., S. B. Wolff, J. Y. Rhee, C. Pehlevan, R. Kawai, A. Kempf, S. M. Gobes, and B. P. Olveczky. 2015. "Acute off-target effects of neural circuit manipulations." Nature 528 (7582):358-63. doi: 10.1038/nature16442.
  
  Paxinos, G., and C. Watson. 1998. The Rat Brain in Stereotaxic Coordinates. San Diego: Academic Press.
  
  Rajasethupathy, P., S. Sankaran, J. H. Marshel, C. K. Kim, E. Ferenczi, S. Y. Lee, A. Berndt, C. Ramakrishnan, A. Jaffe, M. Lo, C. Liston, and K. Deisseroth. 2015. "Projections from neocortex mediate top-down control of memory retrieval." Nature 526 (7575):653-9. doi: 10.1038/nature15389.
  
  Ressler, R. L., and S. Maren. 2019. "Synaptic encoding of fear memories in the amygdala." Curr Opin Neurobiol 54:54-59. doi: 10.1016/j.conb.2018.08.012.
  
  Schenberg, L. C., R. M. Povoa, A. L. Costa, A. V. Caldellas, S. Tufik, and A. S. Bittencourt. 2005. "Functional specializations within the tectum defense systems of the rat." Neurosci Biobehav Rev 29 (8):1279-98. doi: 10.1016/j.neubiorev.2005.05.006.
  
  Silva, B. A., C. Mattucci, P. Krzywkowski, E. Murana, A. Illarionova, V. Grinevich, N. S. Canteras, D. Ragozzino, and C. T. Gross. 2013. "Independent hypothalamic circuits for social and predator fear." Nat Neurosci 16 (12):1731-3. doi: 10.1038/nn.3573.
  
  Tsang, E., C. Orlandini, R. Sureka, A. H. Crevenna, E. Perlas, I. Prankerd, M. E. Masferrer, and C. T. Gross. 2023. "Induction of flight via midbrain projections to the cuneiform nucleus." PLoS One 18 (2):e0281464. doi: 10.1371/journal.pone.0281464.
  
  Vianna, D. M., and M. L. Brandao. 2003. "Anatomical connections of the periaqueductal gray: specific neural substrates for different kinds of fear." Braz J Med Biol Res 36 (5):557-66. doi: 10.1590/s0100-879x2003000500002.
  
  Walker, D. L., and M. Davis. 1997. "Involvement of the dorsal periaqueductal gray in the loss of fear-potentiated startle accompanying high footshock training." Behav Neurosci 111 (4):692-702. doi: 10.1037//0735-7044.111.4.692.
  
  Wang, L., I. Z. Chen, and D. Lin. 2015. "Collateral pathways from the ventromedial hypothalamus mediate defensive behaviors." Neuron 85 (6):1344-58. doi: 10.1016/j.neuron.2014.12.025.
  
  Wei, P., N. Liu, Z. Zhang, X. Liu, Y. Tang, X. He, B. Wu, Z. Zhou, Y. Liu, J. Li, Y. Zhang, X. Zhou, L. Xu, L. Chen, G. Bi, X. Hu, F. Xu, and L. Wang. 2015. "Processing of visually evoked innate fear by a non-canonical thalamic pathway." Nat Commun 6:6756. doi: 10.1038/ncomms7756.
  
  Yeh, L. F., T. Ozawa, and J. P. Johansen. 2021. "Functional organization of the midbrain periaqueductal gray for regulating aversive memory formation." Mol Brain 14 (1):136. doi: 10.1186/s13041-021-00844-0.
  
  Yilmaz, M., and M. Meister. 2013. "Rapid innate defensive responses of mice to looming visual stimuli." Curr Biol 23 (20):2011-5. doi: 10.1016/j.cub.2013.08.015.
  
  Zhou, Z., X. Liu, S. Chen, Z. Zhang, Y. Liu, Q. Montardy, Y. Tang, P. Wei, N. Liu, L. Li, R. Song, J. Lai, X. He, C. Chen, G. Bi, G. Feng, F. Xu, and L. Wang. 2019. "A VTA GABAergic Neural Circuit Mediates Visually Evoked Innate Defensive Responses." Neuron 103 (3):473-488 e6. doi: 10.1016/j.neuron.2019.05.027.
  
URL

biorxiv.org/content/10.1101/2023.05.19.541463v2

Interleukin-1 prevents SARS-CoV-2-induced membrane fusion to restrict viral transmission via induction of actin bundles

1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  SARS-CoV-2 infection induces syncytia formation, which promotes viral transmission. In this paper, the authors aimed to understand how host-derived inflammatory cytokines IL-1α/β combat SARS-CoV-2 infection.
  
  Strengths:
  
  First, they used a cell-cell fusion assay developed previously to identify IL-1α/β as the cytokines that inhibit syncytia formation. They co-cultured cells expressing the spike protein and cells expressing ACE2 and found that IL-1β treatment decreased syncytia formation and S2' cleavage.
  
  Second, they investigated the IL-1 signaling pathway in detail, using knockouts or pharmacological perturbation to understand the signaling proteins responsible for blocking cell fusion. They found that IL-1 prevents cell-cell fusion through MyD88/IRAK/TRAF6 but not TAK1/IKK/NF-κB, as only knocking out MyD88/IRAK/TRAF6 eliminates the inhibitory effect on cell-cell fusion in response to IL-1β. This revealed that the inhibition of cell fusion did not require a transcriptional response and was mediated by IL-1R proximal signaling effectors.
  
  Third, the authors identified RhoA/ROCK activation by IL-1 as the basis for this inhibition of cell fusion. By visualizing a RhoA biosensor and actin, they found a redistribution of RhoA to the cell periphery and cell-cell junctions after IL-1 stimulation. This triggered the formation of actin bundles at cell-cell junctions, preventing fusion and syncytia formation. The authors confirmed this molecular mechanism by using constitutively active RhoA and an inhibitor of ROCK.
  
  Diverse cell types and in vivo models were used, and the findings were consistent across them. These results were convincing and well presented.
  
  Weaknesses:
  
  As the authors point out in the discussion, whether IL-1-mediated RhoA activation is specific to viral infection or regulates other RhoA-regulated processes is unclear. We would also require high-magnification images of the subcellular organization of the cytoskeleton to appreciate the effect of IL-1 stimulation.
  
  Thanks for the suggestions. We tested the role of IL-1β in other RhoA-regulated processes, and found that IL-1β-mediated RhoA activation also reduced cell migration in a cell scratch assay (see Author response image 1). We also provided high-magnification images in the revised Figures 4 and 5, as well as their respective figure supplements.
  
  Author response image 1.
  
  (A) Cell scratch assay images of HEK293T cells treated with PBS or IL-1β. (B) Quantification of cell migration in (A).
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this study, Zheng et al. investigated the role of inflammatory cytokines in protecting cells against SARS-CoV-2 infection. They demonstrate that soluble factors in the supernatants of TLR-stimulated THP1 cells reduce fusion events between HEK293 cells expressing SARS-CoV-2 S protein and the ACE2 receptor. Using qRT-PCR and ELISA, they demonstrate that IL-1 cytokines are (not surprisingly) upregulated by TLR treatment in THP1 cells. Further, they convincingly demonstrate that recombinant IL-1 cytokines are sufficient to reduce cell-to-cell fusion mediated by the S protein. Using chemical inhibitors and CRISPR knock-out of key IL-1 receptor signaling components in HEK293 cells, they demonstrate that components of the myddosome (MYD88, IRAK1/4, and TRAF6) are required for fusion inhibition, but that downstream canonical signaling (i.e., TAK1 and NFKB activation) is not required. Instead, they provide evidence that IL-1-dependent non-canonical activation of RhoA/Rock is important for this phenotype. Importantly, the authors demonstrate that expression of a constitutively active RhoA alone is sufficient to inhibit fusion and that chemical inhibition of Rock could reverse this inhibition. The authors followed up these in vitro experiments by examining the effects of IL-1 on SARS-CoV-2 infection in vivo, and they demonstrate that recombinant IL-1 can reduce viral burden and lung pathogenesis in a mouse model of infection. However, the contribution of the RhoA/Rock pathway and inhibition of fusion to IL-1-mediated control of SARS-CoV-2 infection in vivo remains unclear.
  
  Strengths:
  
  (1) The bioluminescence cell-cell fusion assay provides a robust quantitative method to examine cytokine effects on viral glycoprotein-mediated fusion.
  
  (2) The study identifies a new mechanism by which IL-1 cytokines can limit virus infection.
  
  (3) The authors tested IL-1 mediated inhibition of fusion induced by many different coronavirus S proteins and several SARS-CoV-2 strains.
  
  Weaknesses:
  
  (1) The qualitative assay demonstrating S2 cleavage and IL-1 mediated inhibition of this phenotype is extremely variable across the data figures. Sometimes it appears like S2 cleavage (S2') is reduced, while in other figures immunoblots show that total S2 protein is decreased. Based on the proposed model the expectation would be that S2 abundance would be rescued when cleavage is inhibited.
  
  In our present manuscript, IL-1-mediated changes in the full-length spike varied somewhat between the authentic SARS-CoV-2 infection model and the HEK293T-S + HEK293T-ACE2 co-culture model, whereas in both models IL-1 inhibited S2' cleavage and reduced the S2 subunit.
  
  In the authentic SARS-CoV-2 infection model, we observed that IL-1 inhibited S2' cleavage, accompanied by a reduction in both the S2 subunit and the full-length spike protein. This is likely because the S2 subunit and full-length spike protein in this model derive not only from infected cells but also from intracellular viral particles. IL-1 inhibited SARS-CoV-2-induced cell-cell fusion and reduced the viral load in host cells; therefore, the abundance of both the S2 subunit and the full-length spike protein was reduced.
  
  In the HEK293T-based co-culture model, IL-1 inhibited S2' cleavage, accompanied by a reduction in the S2 subunit, while the full-length spike protein was more or less rescued. Based on our previous study, the R685A and ΔRRAR spike mutants cannot generate the S2 subunit but still generate the S2' fragment to induce cell-cell fusion, and the S2' fragment produced from these mutants was only slightly reduced compared with that from the wild-type spike protein, suggesting that the S2' fragment is mainly derived directly from the full-length spike and, to a minimal extent, from the S2 subunit (Fig. 4B and 4G, PMID: 34930824). Thus, inhibition of S2' cleavage by IL-1 mainly rescued the full-length spike protein.
  
  (2) The text referencing Figure 1H suggests that TLR-stimulated THP-1 cell supernatants "significantly" reduce syncytia, but image quantification and statistics are not provided to support this statement.
  
  Thanks for pointing out this issue. We have provided fluorescence image quantification and statistics in the revised version of our manuscript (Figure 1D, Figure 1-figure supplement 1A, Figure 1H-1I, Figure 2H-2I, Figure 1-figure supplement 1D-1E, Figure 1-figure supplement 1H-1I, Figure 2-figure supplement 1C-1D, Figure 2-figure supplement 2B-2E, Figure 2-figure supplement 2G-2H, Figure 2-figure supplement 6A-6B, Figure 2-figure supplement 7F-7G).
  
  (3) The authors conclude that because IL-1 accumulates in TLR2-stimulated THP1 monocyte supernatants, this cytokine accounts for the ability of these supernatants to inhibit cell-cell fusion. However, they do not directly test whether IL-1 is required for the phenotype. Inhibition of the IL-1 receptor in supernatant-treated cells would help support their conclusion.
  
  Thanks for the suggestion. Accordingly, we performed this experiment and found that IL-1RA treatment reduced the inhibitory effect of PGN-stimulated THP-1 cell culture supernatant on cell-cell fusion, suggesting that IL-1 is required for the inhibition. This result has been added to our revised manuscript (Figure 2J and Figure 2-figure supplement 4C).
  
  (4) Immunoblot analysis of IL-1 treated HEK293 cells suggests that this cytokine does not reduce the abundance of ACE2 or total S protein in cells. However, it is possible that IL-1 signaling reduces the abundance of these proteins on the cell surface, which would result in a similar inhibition of cell-cell fusion. The authors should confirm that IL-1 treatment of their cells does not change Ace2 or S protein on the cell surface.
  
  Thanks for the suggestion. Accordingly, we applied Wheat Germ Agglutinin (WGA) to stain the cell surface of HEK293T cells and observed that IL-1β treatment did not change ACE2 or Spike protein levels on the cell surface. This result has been added to our revised manuscript (Figure 5-figure supplement 3A-D).
  
  (5) In Figure 5A, expression of constitutively active RhoA appears to have profound effects on how ACE2 runs by SDS-PAGE, suggesting that RhoA may have additional effects on ACE2 biology that might account for the decreased cell-cell fusion. This phenotype should be addressed in the text and explored in more detail.
  
  Thanks for pointing this out. We also noticed that the occurrence of cell-cell fusion reduced the amount of ACE2, whereas inhibition of cell-cell fusion restored ACE2 abundance. Taking the original Figure 5A (revised Figure 4-figure supplement 2B) as an example, the increased ACE2 protein should be attributed to the decreased cell-cell fusion upon RhoA-CA transfection, as Spike binding to ACE2 leads to clathrin- and AP2-dependent endocytosis, resulting in ACE2 degradation in the lysosome (PMID: 36287912).
  
  In addition, we examined the potential effect of RhoA-CA on ACE2 and found that RhoA-CA affected neither ACE2 expression nor Spike binding to ACE2 (revised Figure 5-figure supplement 2E); it also did not affect ACE2 distribution on the cell surface (revised Figure 5-figure supplement 2F and G).
  
  (6) The experiments linking IL-1-mediated restriction of SARS-CoV-2 fusion to the control of virus infection in vivo are incomplete. The reported data demonstrate that recombinant IL-1 can restrict virus replication in vivo, but they fall short of confirming that the in vitro mechanism described (reduced fusion) contributes to the control of SARS-CoV-2 replication in vivo. A critical piece of data that is missing is the demonstration that the ROCK inhibitor phenocopies IL-1RA treatment of SARS-CoV-2-infected mice (viral infection and pathology).
  
  Thanks for this suggestion. Accordingly, we applied the ROCK inhibitor in vivo to confirm its role in SARS-CoV-2-infected mice and found a phenotype similar to that of the IL-1RA treatment experiment. That is, Y-27632 treatment prevented the formation of IL-1β-induced actin bundles at cell-cell junctions, thereby promoting syncytia formation and further viral transmission in vivo (revised Figure 7).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  I suggest providing single-channel images in a supplementary figure for the live-cell images in Figures 4 and 5. Higher magnification images would also help distinguish the subcellular details of the cytoskeleton organization.
  
  Thanks for the suggestion. We have provided the single channel images and higher magnification images in the revised Figures 4 and 5, as well as their respective figure supplements.
  
  In Figure 4, the authors showed that IL-1 activates RhoA and induces the accumulation of activated RhoA at the cell-cell junctions. They also showed that IL-1 promotes the formation of actin bundles at cell-cell junctions. However, the authors have not shown any connection between RhoA and actin yet, but in lines 263-264, they claim that actin bundle formation is induced by RhoA. Evidence for this part was shown in later results, but at this moment, it is lacking. The same applies to lines 282-284; I think this conclusion that IL-1-induced actin bundle formation is through the RhoA-ROCK pathway should come after showing how RhoA affects actin bundle formation at cell-cell junctions. To this end, I suggest moving Supplementary Figures S12B and S12D to the main figure, as they provide strong evidence of the IL-1-RhoA-ROCK-actin pathway.
  
  We appreciate these valuable comments. As suggested, we have moved the respective supplementary figures to the main figures to support our findings in the revised manuscript (Figure 4E and Figure 4-figure supplement 2B; Figure 5C and Figure 5-figure supplement 2A); the text has also been adjusted accordingly.
  
URL

biorxiv.org/content/10.1101/2024.05.16.594569v2

Jointly looking to the past and the future in visual working memory

1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  eLife assessment
  
  This important study advances our understanding of how past and future information is jointly considered in visual working memory by studying gaze biases in a memory task that dissociates the locations during encoding and memory tests. The evidence supporting the conclusions is convincing, with state-of-the-art gaze analyses that build on a recent series of experiments introduced by the authors. This work, with further improvements incorporating the existing literature, will be of broad interest to vision scientists interested in the interplay of vision, eye movements, and memory.
  
  We thank the Editors and the Reviewers for their enthusiasm and appreciation of our task, our findings, and our article. We also wish to thank the Reviewers for their constructive comments that we have embraced to improve our article. Please find below our point-by-point responses to this valuable feedback, where we also state relevant revisions that we have made to our article.
  
  In addition, please note that we have now also made our data and code publicly available.
  
  Reviewer 1, Comments:
  
  In this study, the authors offer a fresh perspective on how visual working memory operates. They delve into the link between anticipating future events and retaining previous visual information in memory. To achieve this, the authors build upon their recent series of experiments that investigated the interplay between gaze biases and visual working memory. In this study, they introduce an innovative twist to their fundamental task. Specifically, they disentangle the location where information is initially stored from the location where it will be tested in the future. Participants are tasked with learning a novel rule that dictates how the initial storage location relates to the eventual test location. The authors leverage participants' gaze patterns as an indicator of memory selection. Intriguingly, they observe that microsaccades are directed toward both the past encoding location and the anticipated future test location. This observation is noteworthy for several reasons. Firstly, participants' gaze is biased towards the past encoding location, even though that location lacks relevance to the memory test. Secondly, there's a simultaneous occurrence of an increased gaze bias towards both the past and future locations. To explore this temporal aspect further, the authors conduct a compelling analysis that reveals the joint consideration of past and future locations during memory maintenance. Notably, microsaccades biased towards the future test location also exhibit a bias towards the past encoding location. In summary, the authors present an innovative perspective on the adaptable nature of visual working memory. They illustrate how information relevant to the future is integrated with past information to guide behavior.
  
  Thank you for your enthusiasm for our article and findings as well as for your constructive suggestions for additional analyses that we respond to in detail below.
  
  This short manuscript presents one experiment with straightforward analyses, clear visualizations, and a convincing interpretation. For their analysis, the authors focus on a single time window in the experimental trial (i.e., 0-1000 ms after retro cue onset). While this time window is most straightforward for the purpose of their study, other time windows are similarly interesting for characterizing the joint consideration of past and future information in memory. First, assessing the gaze biases in the delay period following the cue offset would allow the authors to determine whether the gaze bias towards the future location is sustained throughout the entire interval before the memory test onset. Presumably, the gaze bias towards the past location may not resurface during this delay period, but it is unclear how the bias towards the future location develops in that time window. Also, the disappearance of the retro cue constitutes a visual transient that may leave traces on the gaze biases which speaks again for assessing gaze biases also in the delay period following the cue offset.
  
  Thank you for raising this important point. We initially focused on the time window during the cue given that our central focus was on gaze-biases associated with mnemonic item selection. By zooming in on this window, we could best visualize our main effects of interest: the joint selection (in time) of past and future memory attributes.
  
  At the same time, we fully agree that examining the gaze biases over a more extended time window yields a more comprehensive view of our data. To this end, we have now also extended our analysis to include a wider time range that includes the period between cue offset (1000 ms after cue onset) and test onset (1500 ms after cue onset). We present these data below. Because we believe our future readers are likely to be interested in this as well, we have now added this complementary visualization as Supplementary Figure 4 (while preserving the focus in our main figure on the critical mnemonic selection period of interest).
  
  Author response image 1.
  
  Supplementary Figure 4. Gaze biases in an extended time window as a complement to Figure 1 and Supplementary Figure 2. This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset, the gaze bias towards the future location persists (panel a), and that while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus (panel b).
  
  This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset (consistent with our prior reports of this bias), the gaze bias towards the future location persists. Moreover, as revealed by the data in panel b above, while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus.
  
  We now also call out these additional findings and figure in our article:
  
  Page 2 (Results): “Gaze biases in both axes were driven predominantly by microsaccades (Supplementary Fig. 2) and occurred similarly in horizontal-to-vertical and vertical-to-horizontal trials (Supplementary Fig. 3). Moreover, while the past bias was relatively transient, the future bias continued to increase in anticipation of the test stimulus and increasingly incorporated eye-movements beyond the microsaccade range (see Supplementary Fig. 4 for a more extended time range)”.
  
  Moreover, assessing the gaze bias before retro-cue onset allows the authors to further characterize the observed gaze biases in their study. More specifically, the authors could determine whether the future location is considered already during memory encoding and the subsequent delay period (i.e., before the onset of the retro cue). In a trial, participants encode two oriented gratings presented at opposite locations. The future rule indicates the test locations relative to the encoding locations. In their example (Figure 1a), the test locations are shifted clockwise relative to the encoding location. Thus, there are two pairs of relevant locations (each pair consists of one stimulus location and one potential test location) facing each other at opposite locations and therefore forming an axis (in the illustration the axis would go from bottom left to top right). As the future rule is already known to the participants before trial onset it is possible that participants use that information already during encoding. This could be tested by assessing whether more microsaccades are directed along the relevant axis as compared to the orthogonal axis. The authors should assess whether such a gaze bias exists already before retro cue onset and discuss the theoretical consequences for their main conclusions (e.g., is the future location only jointly used if the test location is implicitly revealed by the retro cue).
  
  Thank you – this is another interesting point. We fully agree that additional analysis looking at the period prior to retrocue onset may also prove informative. In accordance with the suggested analysis, we have therefore now also analysed the distribution of saccade directions (including in the period from encoding to retrocue) as a function of the future rule (presented below, and now also included as Supplementary Fig. 5). Complementary recent work from our lab has shown how microsaccade directions can align to the axis of memory contents during retention (see de Vries & van Ede, eNeuro, 2024). Based on this finding, one may predict that if participants retain the items in a remapped fashion, their microsaccades may align with the axis of the future rule, and this could potentially already happen prior to cue onset.
  
  These complementary analyses show that saccade directions are predominantly influenced by the encoding locations rather than the test locations, as seen most clearly by the saccade distribution plots in the middle row of the figure below. To obtain time-courses, we categorized saccades as occurring along the axis of the future rule or along the orthogonal axis (bottom row of the figure below). Like the distribution plots, these time course plots also did not reveal any sign of a bias along the axis of the future rule itself.
  
  Importantly, note that this does not argue against our main findings of joint selection of past and future memory attributes: for that central analysis we focused on saccade biases that were specific to the selected memory item, whereas the analyses we present below focus on biases in the axes in which both memory items are defined, not only the cued/selected memory item.
  
  Author response image 2.
  
  Supplementary Figure 5. Distribution of saccade directions relative to the future rule from encoding onset. (Top panel) The spatial layouts in the four future rules. (Middle panel) Polar distributions of saccades during 0 to 1500 ms after encoding onset (i.e., the period between encoding onset and cue onset). The purple quadrants represent the axis of the future rule and the grey quadrants the orthogonal axis. (Bottom panel) Time courses of saccades along the above two axes. We did not observe any sign of a bias along the axis of the future rule itself.
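  The along-axis versus orthogonal-axis categorization described above can be sketched as follows. This is a minimal illustration and not the analysis code used for the figure: the function names, the use of saccade displacement vectors as input, and the assumption that each axis is captured by two opposite 90-degree quadrants centred on it are all hypothetical choices made for this example.

```python
import math


def classify_saccade(dx, dy, axis_deg):
    """Label one saccade as 'along' or 'orthogonal' to an axis.

    dx, dy   : horizontal and vertical displacement of the saccade
               (hypothetical input format, any consistent unit).
    axis_deg : orientation of the axis of interest in degrees
               (e.g. 45.0 for bottom-left to top-right).

    A saccade counts as 'along' when its direction lies within 45 degrees
    of the axis in either direction along it (assumed quadrant width).
    """
    direction = math.degrees(math.atan2(dy, dx))
    # Fold the angular difference into an axial distance in [0, 90],
    # so that opposite directions along the same axis are equivalent.
    diff = abs((direction - axis_deg + 90.0) % 180.0 - 90.0)
    return "along" if diff < 45.0 else "orthogonal"


def axis_counts(saccades, axis_deg):
    """Count saccades along the axis and along the orthogonal axis."""
    counts = {"along": 0, "orthogonal": 0}
    for dx, dy in saccades:
        counts[classify_saccade(dx, dy, axis_deg)] += 1
    return counts


# Toy saccade displacements (made up purely to exercise the functions).
example_saccades = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 0.5)]
example_counts = axis_counts(example_saccades, 45.0)
```

  Comparing the two counts within successive time bins would give time courses of the kind shown in the bottom row of the figure; the binning itself is left out of this sketch.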
  
  We agree that these additional results are important to bring forward when we interpret our findings. Accordingly, we now mention these findings at the relevant section in our Discussion:
  
  Page 5 (Discussion): “First, memory contents could have directly been remapped (cf. 4,24–26) to their future-relevant location. However, in this case, one may have expected to exclusively find a future-directed gaze bias, unlike what we observed. Moreover, using a complementary analysis of saccade directions along the axis of the future rule (cf. 24), we found no direct evidence for remapping in the period between encoding and cue (Supplementary Fig. 5)”.
  
  Reviewer 2, Comments:
  
  The manuscript by Liu et al. reports a task designed to examine the extent to which "past" and "future" information is encoded in working memory; the task combines a retro cue with rules that indicate the location of an upcoming test probe. An analysis of microsaccades on a fine temporal scale shows the extent to which shifts of attention track the location of the encoded item (past) and the location of the future item (test probe). The locations of the encoded grating and the test probe were always on orthogonal axes (horizontal, vertical) so that biases in microsaccades could be used to track shifts of attention to one or the other axis (or mixtures of the two). The overall goal here was then to (1) create a methodology that could tease apart memory for the past and future, respectively, (2) to look at the time course of attention to past/future, and (3) to test the extent to which microsaccades might jointly encode past and future memoranda. Finally, some remarks are made about the plausibility of various accounts of working memory encoding/maintenance based on the examination of these time courses.
  
  Strengths:
  
  This research has several notable strengths. It has a clear statement of its aims, is lucidly presented, and uses a clever experimental design that neatly orthogonalizes "past" and "future" as operationalized by the authors. Figure 1b-d shows fairly clearly that saccade directions have an early peak (around 300 ms) for the past and a "ramping" up of saccades moving in the forward direction. This seems to be a nice demonstration that the method can measure shifts of attention at a fine temporal resolution and differentiate past from future-oriented saccades due to the orthogonal cue approach. The second analysis, shown in Figure 2, reveals a dependency in saccade direction such that saccades toward the probe future were more likely also to be toward the encoded location than away from the encoded direction. This suggests saccades are jointly biased by both locations "in memory".
  
  Thank you for your overall appreciation of our work and for highlighting the above strengths. We also thank you for your constructive comments and call for clarifications that we respond to below.
  
  Weaknesses:
  
  (1) The "central contribution" (as the authors characterize it) is that "the brain simultaneously retains the copy of both past and future-relevant locations in working memory, and (re)activates each during mnemonic selection", and that: "... while it is not surprising that the future location is considered, it is far less trivial that both past and future attributes would be retained and (re)activated together. This is our central contribution." However, to succeed at the task, participants must retain the content (grating orientation, past) and probe location (future) in working memory during the delay period. It is true that the location of the grating is functionally irrelevant once the cue is shown, but if we assume that features of a visual object are bound in memory, it is not surprising that location information of the encoded object would bias processing as indicated by microsaccades. Here the authors claim that joint representation of past and future is "far less trivial"; this needs to be evaluated from the standpoint of prior empirical data on memory decay in such circumstances, or with some reference to the time course of the "unbinding" of features in an encoded object.
  
  Thank you. We agree that our participants have to use the future rule – as otherwise they do not know to which test stimulus they should respond. This was a deliberate decision when designing the task. Critically, however, this does not require (nor imply) that participants have to incorporate and apply the rule to both memory items already prior to the selection cue. It is at least as conceivable that participants would initially retain the two items at their encoded (past) locations, then wait for the cue to select the target memory item, and only then consider the future location associated with the target memory item. After all, in every trial, there is only 1 relevant future location: the one associated with the cued memory item. The time-resolved nature of our gaze markers argues against such a scenario, by virtue of our observation of the joint (simultaneous) consideration of past and future memory attributes (as opposed to selection of past-before-future). These temporal dynamics are central to the insights provided by our study.
  
  In our view, it is thus not obvious that the rule would be applied at encoding. In this sense, we do not assume that the future location is part of both memory objects from encoding, but rather ask whether this is the case – and, if so, whether the future location takes over the role of the past location, or whether past and future locations are retained jointly.
  
  Our statements regarding what is “trivial” and what is “less trivial” regard exactly this point: it is trivial that the future is considered (after all, our task demanded it). However, it is less trivial that (1) the future location was already available at the time of initial item selection (as reflected in the simultaneous engagement of past and future locations), and (2) that in presence of the future location, the past location was still also present in the observed gaze biases.
  
  Having said that, we agree that an interesting possibility is that participants remap both memory items to their future-relevant locations ahead of the cue, but that the past location is not yet fully “unbound” by the time of the cue. This may trigger a gaze bias not only to the new future location but also to the “sticky” (unbound) past location. We now acknowledge this possibility in our discussion (also in response to comment 3 below) where we also suggest how future work may be able to tap into this:
  
  Page 6 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”
  
  (2) The authors refer to "future" and "past" information in working memory and this makes sense at a surface level. However, once the retrocue is revealed, the "rule" is retrieved from long-term memory, and the feature (e.g. right/left, top/bottom) is maintained in memory like any other item representation. Consider the classic test of digit span. The digits are presented and then recalled. Are the digits of the past or future? The authors might say that one cannot know, because past and future are perfectly confounded. An alternative view is that some information in working memory is relevant and some is irrelevant. In the digit span task, all the digits are relevant. Relevant information is relevant precisely because it is thought to be necessary in the future. Irrelevant information is irrelevant precisely because it is not thought to be needed in the immediate future. In the current study, the orientation of the grating is relevant, but its location is irrelevant; and the location of the test probe is also relevant.
  
  Thank you for this stimulating reflection. We agree that in our set-up, past location is technically “task-irrelevant” while future location is certainly “task-relevant”. At the same time, the engagement of the past location suggests to us that the brain uses past location for the selection – presumably because the brain uses spatial location to help individuate/separate the items, even if encoded locations are never asked about. Therefore, whether something is relevant or irrelevant ultimately depends on how one defines relevance (past location may be relevant/useful for the brain even if technically irrelevant from the perspective of the task). In comparison, the use of “past” and “future” may be less ambiguous.
  
  It is also worth noting how we interpret our findings in relation to demands on visual working memory, inspired by dynamic situations whereby visual stimuli may be last seen at one location but expected to re-appear at another, such as a bird disappearing behind a building (the example in our introduction). Thus, past for us does not refer to the memory item per se (like in the digit span analogue) but, rather, quite specifically to the past location of a dynamic visual stimulus in memory (which, in our experiment, was operationalised by the future rule, for convenience).
  
  (3) It is not clear how the authors interpret the "joint representation" of past and future. Put aside "future" and "past" for a moment. If there are two elements in memory, both of which are associated with spatial bindings, the attentional focus might be a spatial average of the associated spatial indices. One might also view this as an interference effect, such that the location of the encoded location attracts spatial attention since it has not been fully deleted/removed from working memory. Again, for the impact of the encoded location to be exactly zero after the retrieval cue, requires zero interference or instantaneous decay of the bound location information. It would be helpful for the authors to expand their discussion to further explain how the results fit within a broader theoretical framework and how it fits with empirical data on how quickly an irrelevant feature of an object can be deleted from working memory.
  
  Thank you also for this point (that is related to the two points above). As we stated in our reply to comment 1 above, we agree that one possibility is that the past location is merely “sticky” and pulls the task-relevant future bias toward the past location. If so, our time courses suggest that such “pulling” occurs only until approximately 600 ms after cue onset, as the past bias is only transient. An alternative interpretation is that the past location may not be merely a residual irrelevant trace, but actually be useful and used by the brain.
  
  For example, the encoded (past) item locations provide a coordinate system in which to individuate/separate the two memory items. While the future locations also provide such a coordinate system, the brain may benefit from holding onto both coordinate systems at the same time, rendering our observation of joint selection in both frames. Indeed, in a recent VR experiment in which we had participants (rather than the items) rotate, we also found evidence for the joint use of two spatial frames, even if neither was technically required for the upcoming task (see Draschkow, Nobre, van Ede, Nature Human Behaviour, 2022). Though highly speculative at this stage, such reliance on multiple spatial frames may make our memories more robust to decay and/or interference. Moreover, while past location was never explicitly probed in our task, in daily life the past location may sometimes (unexpectedly) become relevant, hence it may be useful to hold onto it, just in case. Thus, considering the past location merely as an “irrelevant feature” (that takes time to delete) may not do sufficient justice to the potential roles of retaining past locations of dynamic visual objects held in working memory.
  
  As also stated in response to comment 1 above, we now added these relevant considerations to our Discussion:
  
  Page 5 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”
  
  Reviewer 3, Comments:
  
  This study utilizes saccade metrics to explore what the authors term the "past and future" of working memory. The study features an original design: in each trial, two pairs of stimuli are presented, first a vertical pair and then a horizontal one. Between these two pairs comes the cue that points the participant to one target of the first pair and another of the second pair. The task is to compare the two cued targets. The design is novel and original but it can be split into two known tasks - the first is a classic working memory task (a post-cue informs participants which of two memorized items is the target), which the authors have used before; and the second is a classic spatial attention task (a pre-cue signals that attention should be oriented left or right), which was used by numerous other studies in the past. The combination of these two tasks in one design is novel and important, as it enables the examination of the dynamics and overlapping processes of these tasks, and this has a lot of merit. However, each task separately is not new. There are quite a few studies on working memory and microsaccades and many on spatial attention and microsaccades. I am concerned that the interpretation of "past vs. future" could mislead readers to think that this is a new field of research, when in fact it is the (nice) extension of an existing one. Since there are so many studies that examined pre-cues and post-cues relative to microsaccades, I expected the interpretation here to rely more heavily on the existing knowledge base in this field. I believe this would have provided a better context for these findings, which are not only on "past" vs. "future" but also on "working memory" vs. "spatial attention".
  
  Thank you for considering our findings novel and important, while at the same time reminding us of the parallels to prior tasks studying spatial attention in perception and working memory. We fully agree that our task likely engages both attention to the (past) memory item as well as spatial attention to the upcoming (future) test stimulus. At the same time, there is a critical difference in spatial attention for the future in our task compared with ample prior tasks engaging spatial cueing of attention for perception. In our task, the cue never directly cues the future location. Rather, it exclusively cues the relevant memory item. It is the memory item that is associated with the relevant future location, according to the future rule. This integration of the rule-based future location into the memory representation is distinct from classical spatial-attention tasks in which attention is cued directly to a specific location via, for example, a spatial cue such as an arrow.
  
  Thus, if we wish to think about our task as engaging cueing of spatial attention for perception, we have to at least also invoke the process of cueing the relevant location via the appropriate memory item. We feel it is more parsimonious to think of this as attending to both the past and future location of a dynamic visual object in working memory.
  
  If we return to our opening example, when we see a bird disappear behind a building, we can keep in working memory where we last saw it, while anticipating where it will re-appear to guide our external spatial attention. Here too, spatial attention is fully dependent on working-memory content (the bird itself) – mirroring the dynamic setting in our study. Thus, we believe our findings contribute a fresh perspective, while of course also extending established fields. We now contextualize our finding within the literature and clarify our unique contribution in our revised manuscript:
  
  Page 5 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”
  
  Reviewer 2, Recommendations:
  
  It would be helpful to set up predictions based on existing working memory models. Otherwise, the claim that the joint coding of past/future is "not trivial" is simply asserted, rather than contradicting an existing model or prior empirical results. If the non-trivial aspect is simply the ability to demonstrate the joint coding empirically through a good experimental design, make it clear that this is the contribution. For example, it may be that prevailing models predict exactly this finding, but nobody has been able to demonstrate it cleanly, as the authors do here. So the non-triviality is not that the result contradicts working memory models, but rather relates to the methodological difficulty of revealing such an effect.
  
  Thank you for your recommendation. First, please see our point-by-point responses to the individual comments above, where we also state relevant changes that we have made to our article, and where we clarify what we meant by “non trivial”. As we currently also state in our introduction, our work took as a starting point the framework that working memory is inherently about the past while being for the future (cf. van Ede & Nobre, Annual Review of Psychology, 2023). By virtue of our unique task design, we were able to empirically demonstrate that visual contents in working memory are selected via both their past and their future-relevant locations – with past and future memory attributes being engaged together in time. By “not trivial” we merely intend to make clear that there are viable alternatives to the findings we observed. For example, past could have been replaced by the future, or it could have been that item selection (through its past location) was required before its future-relevant location could be considered (i.e. past-before-future, rather than joint selection as we reported). We outline these alternatives in the second paragraph of our Discussion:
  
  Page 5 (Discussion): “Our finding of joint utilisation of past and future memory attributes emerged from at least two alternative scenarios of how the brain may deal with dynamic everyday working memory demands in which memory content is encoded at one location but needed at another.
  
  First, [….]”
  
  Our work was not motivated by a particular theoretical debate and did not aim to challenge ongoing debates in the working-memory literature, such as: slot vs. resource, active vs. silent coding, decay vs. interference, and so on. To our knowledge, none of these debates makes specific claims about the retention and selection of past and future visual memory attributes – despite this being an important question for understanding working memory in dynamic everyday settings, as we hoped to make clear by our opening example.
  
  Reviewer 3, Recommendations:
  
  I recommend that the present findings be more clearly interpreted in the context of previous findings on working memory and attention. The task design includes two components - the first (post-cue) is a classic working memory task and the second (the pre-cue) is a classic spatial attention design. Both components were thoroughly studied in the past and this previous knowledge should be better integrated into the present conclusions. I specifically feel uncomfortable with the interpretation of past vs. future. I find this framework to be misleading because it reads like this paper is on a topic that is completely new and never studied before, when in fact this is a study on the interaction between working memory and spatial attention. I recommend the authors minimize this past-future framing or be more explicit in explaining how this new framework relates to the more common terminology in the field and make sure that the findings are not presented in a vacuum, as another contribution to the vibrant field that they are part of.
  
  Thank you for these recommendations. Please also see our point-by-point responses to the individual comments above. There, we explained our logic behind using the terminology of past vs. future (in addition, see also our response to point 2 of reviewer 2). There, we also stated relevant changes that we have made to our manuscript to explain how our findings complement – but are also distinct from – prior tasks that used pre-cues to direct spatial attention to an upcoming stimulus. As we explained above, in our task, the cue itself never contained information about the upcoming test location. Rather, the upcoming test location was a property of the memory item (given the future rule). Hence, we referred to this as a “future attribute” of the cued memory item, rather than as the “cued location” for external spatial attention. Still, we agree the future bias likely (also) reflects spatial allocation to the upcoming test array, and we explicitly acknowledge this in our discussion. For example:
  
  Page 5 (Discussion): “This signal may reflect either of two situations: the selection of a future-copy of the cued memory content or anticipatory attention to the anticipated location of its associated test-stimulus. Either way, by the nature of our experimental design, this future signal should be considered a content-specific memory attribute for two reasons. First, the two memory contents were always associated with opposite testing locations, hence the observed bias to the relevant future location must be attributed specifically to the cued memory content. Second, we cued which memory item would become tested based on its colour, but the to-be-tested location was dependent on the item’s encoding location, regardless of its colour. Hence, consideration of the item’s future-relevant location must have been mediated by selecting the memory item itself, as it could not have proceeded via cue colour directly.”
  
  Page 6 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.01.30.526235v3
www.biorxiv.org www.biorxiv.org

High resolution deep mutational scanning of the melanocortin-4 receptor enables target characterization for drug discovery

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer 1 (Public reviews):
  
  Summary
  
  Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.
  
  Strengths
  
  They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.
  
  Weaknesses
  
  While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells. Additionally, there are several questions that it would be helpful for authors to clarify.
  
  (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?
  
  Are variants evenly represented in the library?
  
  We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60 amino acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.
  
  Author response image 1.
  
  We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below:
  
  Author response image 2.
  
  Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?
  
  We agree long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected for an alternate method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from somatic mutations introduced by the E. coli cloning or replication process, which should be low frequency. We have used our in-house full plasmid sequencing method, OCTOPUS, to sample and spot check this for several other DMS libraries we have generated using the same cloning methods. We have found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.
  
  Finally, could the authors provide details on the correlation between experimental replicates under each condition?
  
  Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below, which have been added as a panel to Supplementary Figure 1, show two representative correlations of barcode read counts at the RNA-seq stage for the low α-MSH conditions.
  
  We added the following text to reference this panel:
  
  (see Methods > Sequence processing for barcode expression): “The correlation (r) of barcode readcounts between replicates was ~0.5 and ~0.4 for the Gs and Gq assays, respectively (Supplementary Fig. 1E).”
  
  One important advantage of our statistical model is that it’s able to leverage information from barcodes regardless of the number of replicates they appear in.
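  As a rough illustration of the replicate correlation quoted above, the sketch below computes a Pearson r over barcode read counts from two replicates. It is not the authors' pipeline: the barcode names and counts are made up, and restricting the comparison to barcodes seen in both replicates (with no log transform) is an assumption made here for simplicity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(counts_a, counts_b):
    """Correlate read counts over the barcodes observed in both replicates.

    counts_a, counts_b: dicts mapping barcode -> read count (hypothetical layout).
    """
    shared = sorted(set(counts_a) & set(counts_b))
    return pearson_r([counts_a[b] for b in shared],
                     [counts_b[b] for b in shared])


# Hypothetical toy data; "bc4" and "bc5" each appear in only one replicate.
rep1 = {"bc1": 8, "bc2": 12, "bc3": 3, "bc4": 20}
rep2 = {"bc1": 10, "bc2": 9, "bc3": 5, "bc5": 7}
r = replicate_correlation(rep1, rep2)
```

  Barcodes missing from one replicate are simply dropped in this sketch, whereas the response notes that the authors' statistical model can use barcodes regardless of how many replicates they appear in.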
  
  (2) Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?
  
  The text has been added in the manuscript as follows:
  
  (in Methods > Running DMS Assays): “Given the seeding density (~17x10<sup>6</sup> cells per 150 mm replicate dish), time from seeding to collection, and doubling time of HEK293T cells, approximately 25.5x10<sup>6</sup> cells were collected per replicate. This translates to approximately 30-60x cellular coverage per amino acid variant in each replicate.”
  
  (in Methods > Sequence processing for barcode expression): “Total mapped reads per replicate at the RNA-seq stage were as follows:
  
  - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3
  
  - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5
  
  - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5”
  
  The median read counts per sample per barcode were 8, 10, and 6 reads for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively. The median number of barcodes per variant across all samples (the “median of medians”) was 56 for Gs/CRE, 28 for Gq/UAS, and 44 for Gs/CRE+Chaperone.”
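  The “median of medians” summary can be read as a two-step calculation: take the median number of barcodes per variant within each sample, then take the median of those per-sample values. The sketch below shows that reading with made-up numbers; the sample names, variant labels, and counts are hypothetical, and the exact aggregation the authors used is an assumption.

```python
from statistics import median


def median_of_medians(barcodes_per_variant_by_sample):
    """Median across samples of the per-sample median barcode count per variant.

    barcodes_per_variant_by_sample: dict of sample -> {variant: n_barcodes}.
    """
    per_sample = [median(counts.values())
                  for counts in barcodes_per_variant_by_sample.values()]
    return median(per_sample)


# Hypothetical toy data for three samples of one assay.
samples = {
    "sample_1": {"var_a": 50, "var_b": 60, "var_c": 58},
    "sample_2": {"var_a": 40, "var_b": 56, "var_c": 70},
    "sample_3": {"var_a": 52, "var_b": 54, "var_c": 61},
}
summary = median_of_medians(samples)
```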
  
  (3) It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.
  
  We account for this heterogeneity in several ways. First, as shown above (see Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we utilize compositional control conditions like forskolin or unstimulated conditions to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) as compared to treated conditions. For example, premature stops behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effect model accounts for barcode-specific deviations from the expected mutant effect (e.g., a “high count” barcode consistently being high relative to the mean).
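  The idea of using a control condition to separate compositional from biological effects can be sketched in a few lines. This is a deliberately simplified stand-in, not the authors' mixed-effect model: each barcode's treated frequency is divided by its control frequency, the log2 ratios are averaged per variant, and the WT mean is subtracted. The function name, data layout, and counts are all hypothetical.

```python
import math


def control_normalized_effects(barcodes, wt_label="WT"):
    """Per-variant log2 effect relative to WT, normalized to a control condition.

    barcodes: list of (variant, treated_count, control_count), one per barcode.
    All counts must be positive in this simplified sketch.
    """
    treated_total = sum(t for _, t, _ in barcodes)
    control_total = sum(c for _, _, c in barcodes)
    per_variant = {}
    for variant, treated, control in barcodes:
        if treated <= 0 or control <= 0:
            raise ValueError("this sketch requires positive counts")
        # Ratio of library-size-normalized frequencies, treated vs control.
        lfc = math.log2((treated / treated_total) / (control / control_total))
        per_variant.setdefault(variant, []).append(lfc)
    means = {v: sum(vals) / len(vals) for v, vals in per_variant.items()}
    wt_mean = means[wt_label]
    return {v: m - wt_mean for v, m in means.items() if v != wt_label}


# Hypothetical toy data: the second barcode of each variant is twice as
# abundant in BOTH conditions, mimicking uneven barcode representation.
toy = [
    ("WT", 100, 100),
    ("WT", 200, 200),
    ("mutX", 10, 100),
    ("mutX", 20, 200),
]
effects = control_normalized_effects(toy)
```

  In this toy case both mutX barcodes yield the same effect despite their twofold difference in abundance, because the control counts absorb the compositional difference. A mixed-effect model of the kind the authors describe would additionally estimate barcode-specific deviations rather than averaging them away.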
  
  (4) Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?
  
  Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.
  
  (5) To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?
  
  The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., Boss, Talpade, and Murphy 1996). We have added this reference to the text (see Results > Assays for disease-relevant mechanisms) to further support the use of the Gq assay.
  
  (6) Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?
  
  While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant inherently means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).
  
  We have revised text in the Methods section as follows (see Running DMS Assays) to better articulate this: “For chaperone experiments, cells were washed 3x with 10 mL DMEM to remove Ipsen 17 prior to agonist stimulation as it has been shown to be an antagonist of α-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al. 2014).”
  
  (7) As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?
  
  We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.
  
  Reviewer 2 (Public reviews):
  
  Overview
  
  In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.
  
  They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.
  
  Strengths
  
  The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.
  
  The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.
  
  The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain of function and loss of function effects, which many deep mutational scanning approaches cannot do.
  
  The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.
  
  The authors use the deep mutational scanning data they collect to map how different variants affect the ability of small molecule agonists to activate MC4R signaling. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.
  
  Weaknesses
  
  The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.
  
  Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.
  
  Impact
  
  In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.
  
  Recommendations for the authors:
  
  (1) Page 7 - the Gq reporter relay system is clever. Could the authors include the original data showing that the simpler design didn't work at all, or at least revise the text to say more precisely what "not suitable due to weak SNR" means?
  
  We added a panel (D) to Supplementary Figure 2 showing that the native NFAT reporter was ~10x weaker than the CRE reporter, and the relay system amplified the NFAT signal to be comparable to the CRE reporter.
  
  (2) Page 7 - Even though the relay system gives some signal, it's clearly less sensitive/higher background than Gs. How does that play out in the quantitative analysis?
  
  —AND—
  
  (4) Page 10 - The Gq library had fewer barcodes per variant, and, as noted above, the Gq reporter doesn't work quite as well as the Gs one. It would be nice if the authors could comment on how these aspects of the Gq experiments affected data quality/power to detect effects.
  
  Following the reviewer's excellent suggestion, we updated Supplementary Figure 2B to better contextualize the quantitative effects of the difference in signal-to-noise ratio of the Gq versus the Gs reporter system (see changes below). These distributions show the Z-statistic for testing either each stop mutation (red) or all possible coding variants against WT. Thus, |Z| > 1.96 corresponds to p < 0.05 in a two-sided Wald test. We can see that in the Gs reporter, 95% of the stops are nominally significantly different from WT (visualized above with the majority of the red distribution being < -1.96). In contrast, only 64% of stops are nominally significantly different from WT in Gq. This implies that it will be more difficult to detect effects in the Gq system, especially those less severe than stops.
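
The link between the Z-statistic and the nominal significance threshold quoted above can be checked with a short sketch. It uses the standard normal approximation for a two-sided Wald test. The function names and the example estimate and standard error are illustrative assumptions, not part of the authors' released framework.

```python
import math

def wald_z(estimate, standard_error):
    """Wald Z-statistic for testing an effect estimate against zero (i.e., against WT)."""
    return estimate / standard_error

def two_sided_p(z):
    """Two-sided p-value under a standard normal null: P(|N(0,1)| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

# |Z| = 1.96 sits at the conventional 5% threshold.
p_at_threshold = two_sided_p(1.96)

# Hypothetical stop-codon effect: estimate -1.0 with standard error 0.26.
z_stop = wald_z(-1.0, 0.26)
p_stop = two_sided_p(z_stop)
```

With the same underlying effect, a larger standard error shrinks |Z| toward zero, which is why fewer stops clear the threshold in the noisier reporter.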
  
  In addition to the overall signal-to-noise ratio being lower in the Gq system, there were also fewer barcodes per variant (28 vs 56 barcodes per variant on average for Gq vs Gs). As demonstrated in Supplementary Figure 2C, the error bars on our estimates are related to the number of barcodes per variant (Standard Error ~ 1 / sqrt(Number of Barcodes), as shown in the plot below). This suggests that our estimates of mutant effects will be less certain in the Gq library than the Gs library. For example, the average standard error in the Gq library was 0.260, which was ~1.58 times larger than the Gs library's 0.165. Finally, we believe this further reiterates the power of our statistical framework, as it naturally enables formalized hypothesis testing that takes these errors into account when making comparisons both within reporters and across reporters.
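
The numbers in this paragraph can be put side by side in a small worked calculation. It applies only the stated scaling, Standard Error ~ 1 / sqrt(Number of Barcodes). Attributing the remaining gap to the lower signal-to-noise ratio of the Gq reporter is an interpretation consistent with the text, not a result reported by the authors.

```python
import math

# Values quoted in the response.
barcodes_gs, barcodes_gq = 56, 28
se_gs, se_gq = 0.165, 0.260

# Ratio expected from barcode count alone, if SE scales as 1 / sqrt(n).
expected_ratio = math.sqrt(barcodes_gs / barcodes_gq)

# Ratio actually observed between the two libraries.
observed_ratio = se_gq / se_gs

# Factor left unexplained by barcode count (per-barcode noise, under this simple model).
residual_factor = observed_ratio / expected_ratio

print(f"expected from barcodes: {expected_ratio:.2f}")
print(f"observed:               {observed_ratio:.2f}")
print(f"residual factor:        {residual_factor:.2f}")
```

Halving the number of barcodes predicts a standard error about 1.41 times larger, so most of the observed ~1.58-fold difference is accounted for by barcode count under this simple model.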
  
  (3) Page 9 - it would be nice to see the analysis framework applied to a few existing datasets from other types of assays, to really judge its performance. That's not the main point of this paper, and it's fine, but it would be lovely!
  
  We agree with the reviewer and hope others apply our framework to their problems to further refine its utility and applicability! To that end, we’ve open-sourced it under a permissive license to help encourage the community to use it. Part of the challenge in applying it to other existing datasets is that few DMS experiments leverage variant-level replication through barcodes. While we re-analyzed older DMS data from Jones et al. 2020 to produce the distributions in Supplementary Figure 2B, a more thorough comparison is outside the scope of this paper. That said, we have two additional manuscripts in preparation that leverage this framework to analyze DMS data in different proteins and assay types.
  
  (5) Page 10 - In discussing the relationship of the data to ClinVar and AM, the authors use qualitative comparisons like "majority" and "typically." Just giving numbers would better help the reader appreciate how the data compare.
  
  We added specific proportions for these statements to the text for the ClinVar and AlphaMissense comparisons as follows:
  
  (See Results > Comprehensive Deep Mutational Scanning of MC4R): “For example, the majority (63.3%, 31/49) of human MC4R variants classified as pathogenic or likely pathogenic in ClinVar (Landrum et al., 2014) lead to a significant reduction of Gs signaling under low α-MSH stimulation conditions (significance threshold: false discovery rate (FDR) < 1%; Fig. 2C). Variants that are significantly loss-of-function in this condition are rarer in the human population, and more common human variants have no significant effect on MC4R function (significance threshold: FDR < 1%; Fig. 2D). Loss-of-function variants by our DMS assay are also typically (e.g., AlphaMissense: 93.4%, 1894/2028) predicted to be deleterious by commonly used variant effect predictors like AlphaMissense (Cheng et al., 2023) and popEVE (Orenbuch et al., 2023) (Supplementary Fig. 5).”
  
  (6) Pages 10-12, Figures 2C, E. The data look really nice, but the correlation with clinvar and the Huang data is not perfect (e.g. many pathogenic variants are classified as WT and partial LoF variants too). Can the authors comment on this discrepancy? For ClinVar, they should say when ClinVar was accessed and also how they filtered variants. I would recommend using variants with at least 1 star. Provided they did use high-quality clinical classifications, do they think the classifications are wrong, or their data? The same goes for Huang.
  
  —AND—
  
  (7) Page 13 - similar to previous comments, I'm curious about the 5 path/likely path ClinVar variants that are not LoF in the assay. Are they high noise/fewer barcodes? Or does the assay just miss some aspect of human biology?
  
  ClinVar data was accessed on January 5, 2024 (see Methods: Comparison to human genetics data and variant effect predictors). No annotation quality filtering was performed, and we have revised the text as follows to clarify this:
  
  (see Methods > Comparison to Human Genetics Data and Variant Effect Predictors): “Pathogenicity classifications of MC4R missense and nonsense variants were obtained from ClinVar (Landrum et al., 2014) on January 5, 2024, and all available annotations were included in the analysis regardless of ClinVar review status metric.”
  
  A substantial proportion of the discrepancy between our data and ClinVar is, as the reviewer suggests, likely due to low-quality ClinVar annotations. Of the five variants that the reviewer notes were reported as pathogenic/likely pathogenic but did not result in loss of protein function in any of our DMS assays, two (V50M and V166I) have been reclassified in ClinVar to uncertain or conflicting interpretation since we accessed annotations in early 2024. An additional two of the five discrepant variants (Q43K and S58C) currently have 0 star ratings to support their pathogenic/likely pathogenic annotation. The remaining discrepant variant (S94N) has a 1 star rating supporting an annotation of “likely pathogenic.”
  
  The Huang et al. paper did an admirably thorough job of aggregating variant annotations from more than a dozen primary literature sources that each reported functional validation data for small panels of variants. However, one inherent limitation of this approach is that the resulting annotation classes are based on experiments that were carried out using inconsistent methods and/or scoring criteria. For example, classifications in the Huang et al. paper are based on an inconsistent mix of functional assay types (e.g., Gs signaling, Gq signaling, protein cell surface expression, etc.), and different variants were tested in different cell types (e.g., HEK293T, CHO, Cos-7, etc.). In principle, DMS assays should provide a more accurate assessment of the relative quantitative differences between alleles since each variant was tested using identical experimental conditions and analysis parameters.
  
  That being said, while very good, our assays are likely missing or only indirectly reporting on at least some aspects of MC4R biology. For example, in addition to Gs and Gq signaling, MC4R interfaces with β-arrestin. Variants that are protective against obesity-related phenotypes have been shown to increase recruitment of β-arrestin to MC4R, and we did not directly assess this function.
  
  (8) Page 15, Fig 3C - The three variants they highlight all have paradoxical changes in bias as a-MSH dose is increased (e.g. the bias inverts). I'm not a GPCR expert, but this seems interesting and a little weird. Perhaps the authors could comment on it?
  
  We agree this is an interesting observation that deserves further study, but unfortunately is outside the scope of our priorities at the moment. As noted, all three highlighted variants in this region have a biased basal activity, and this bias inverts upon stimulation. While we don’t have a good explanation for why this would be the case, this phenomenon has been previously observed for 158R (Paisdzior et al., 2020). Our DMS data emphasizes how diverse biased effects can be and further highlights the importance of characterizing these effects. It would be interesting if further studies could elucidate the mechanistic basis for this behavior and how it may be related to G protein coupling in this region.
  
  (9) Page 16 - I'm not familiar with the A21x1 formalism. For the general reader, maybe the authors could introduce this formalism.
  
  Given the shared structural topology of GPCRs, others have developed a variety of numbering schemes that describe where a variant is located and allow more direct comparisons between different GPCRs. We use the GPCRDB.org numbering scheme (e.g., F202<sup>5x48</sup>) as it takes experimentally determined structures into account. Roughly speaking, the number preceding the “x” corresponds to which transmembrane domain (one through seven) or region the residue is located in. The numbers following the “x” correspond to where that residue is located in that region relative to a structurally conserved residue that is always assigned 50. For example, F202<sup>5x48</sup> means that F202 is located in the 5th transmembrane helix and is 2 residues before the most conserved M204<sup>5x50</sup>. We updated the text to clarify this accordingly:
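
As a rough illustration of how these labels are read, the sketch below splits a generic number such as 5x48 into its segment and position and reports the offset from the reference position 50. The function name is hypothetical, and the parser handles only the simple two-digit form described above. It deliberately rejects longer position strings, which this sketch does not attempt to interpret.

```python
import re

def parse_generic_number(label):
    """Split a label like '5x48' into (segment, position, offset_from_x50).

    A negative offset means the residue lies before the reference residue (x50).
    """
    match = re.fullmatch(r"(\d+)x(\d{2})", label)
    if match is None:
        raise ValueError(f"unsupported generic number: {label!r}")
    segment = int(match.group(1))
    position = int(match.group(2))
    return segment, position, position - 50

# F202 (5x48): transmembrane helix 5, two residues before M204 (5x50).
segment, position, offset = parse_generic_number("5x48")
print(segment, position, offset)
```

The same reading applies to the labels used elsewhere in the response, for example 6x48 for W258 and 3x40 for I137.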
  
  (see Results > Structural Insights into Biased Signaling): “Upon ligand binding, W258 (W258<sup>6x48</sup> in https://gpcrdb.org/ nomenclature, where 6 corresponds to the 6th transmembrane helix and 48 denotes 258 is 2 residues before the most conserved residue in that helix (Isberg et al., 2015)) of the conserved CWxP motif undergoes a conformational rearrangement that is translated to L133<sup>3x36</sup> and I137<sup>3x40</sup>, of the conserved PIF motif (MIF in melanocortin receptors).”
  
  (10) Page 17, Figure 3A - Since 137, 254, and 140 are not picked out on the structure, I have no idea where they are. If the authors want to show readers these residues, perhaps they could be annotated or a panel added. Since ~1 entire page of the manuscript is dedicated to this cascade, it might make sense to add a panel. Just amplifying the comment above as regards position 79, others were discussed in that paragraph but not highlighted.
  
  We updated Supplementary Fig. 6C,D to label all of the listed residues on the protein structure for easy reference.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.11.617882v2
www.biorxiv.org www.biorxiv.org

Circular RNA HMGCS1 sponges miR-4521 to aggravate type 2 diabetes-induced vascular endothelial dysfunction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  HMGCS1, 3-hydroxy-3-methylglutaryl-CoA synthase 1, is predicted to be involved in the acetyl-CoA metabolic process and the mevalonate-cholesterol pathway. To induce diet-induced diabetes, they fed wild-type littermates either a standard chow (Control) or a high fat-high sucrose (HFHG) diet, where the diet composition consisted of 60% fat, 20% protein, and 20% carbohydrate (H10060, Hfkbio, China). The dietary regimen was maintained for 14 weeks. Throughout this period, body weight and fasting blood glucose (FBG) levels were measured on a weekly basis. Although the authors induced diabetes with a diet also rich in fat, the cholesterol concentration or metabolism was not investigated. After the treatment, did the animals have endothelial dysfunction? How was the blood pressure of the animals?
  
  Thank you for your comments and kind suggestions. We have conducted a study on the impact of the HFHG diet on the serum levels of total cholesterol (T-CHO) in mice over a 14-week period. Our findings indicated that the HFHG diet significantly elevated T-CHO levels in the serum of mice (Supplementary Figure 5E). Additionally, the HFHG diet was associated with an increase in blood pressure (Figure 5F), and it exacerbated the progression of endothelial dysfunction in mice (Figure 5H-L).
  
  Strengths:
  
  To explore the potential role of circHMGCS1 in regulating endothelial cell function, the authors cloned exons 2-7 of HMGCS1 into lentiviral vectors for ectopic overexpression of circHMGCS1 (Figure S2). The authors could use this experiment as a proof of concept and investigate the glucose concentration in the cell culture medium. Does pLV-circHMGCS1 transduction in HUVECs increase glucose release? (Line 163)
  
  In the manuscript, we utilized a DMEM culture medium containing 4500 mg/L glucose. Given that the HUVEC cell culture is glucose-dependent for its metabolic processes, it was challenging to precisely evaluate the relationship between pLV-circHMGCS1 transduction and the glucose concentration in the medium.
  
  Weaknesses:
  
  (1) Pg 20. The cells were transfected with miR-4521 mimics, miR-inhibitor, or miR-NC and incubated for 24 hours. Subsequently, the cells were treated with PAHG for another 24 hours. Were the cells transfected with Lipofectamine? The protocol or the Lipofectamine kit used should be described. The Lipofectamine protocol suggests using an incubation time of 72 hours. Why did the authors incubate for only 24 hours? If the authors did the mimic and inhibitor curves, these should be added to the supplementary figures. Please describe the miRNA mimic and antagomir concentration used in cell culture.
  
  For detailed transfection methods of the miRNA mimic and its inhibitor, please refer to “Transfection of miRNA mimic or inhibitor” (Line 587) in the revised Experimental Section. We employed the Hieff Trans® siRNA/miRNA in vitro transfection reagent (Yeasen, China, 40806ES03), with a transfection duration of 48 h. The miR-4521 content in HUVECs post-transfection was quantified using qRT-PCR. Transfection of the miR-4521 mimic for 48 h notably enhanced its expression in HUVECs (Supplementary Figure 3B), whereas transfection of the miR-4521 inhibitor for the same duration significantly suppressed its expression (Supplementary Figure 3C). The concentration used for both miRNA mimic and inhibitor transfection was 50 nM. In the revised manuscript, we have corrected the transfection time and clarified that we did not utilize miRNA antagomirs in our experiments.
  
  (2) Pg 20, line 507. What was the miR-4521 agomiR used for treatment of the animals?
  
  miRNA agomir serves as a valuable experimental tool for elucidating miRNA function, used to simulate the overexpression of a specific miRNA. A miRNA agomir is a chemically modified RNA molecule identical in sequence to the target miRNA, engineered for enhanced stability and transfection efficacy. Utilizing a miRNA agomir enables the overexpression of the target miRNA, facilitating the investigation of miRNA functions and mechanisms in vivo. In our study, we employed a miRNA mimic for cellular studies and a miRNA agomir for in vivo applications to achieve high expression of the miRNA (Fu et al, 2019).
  
  (3) Figure 1B. The results show the RT-qPCR for only 5 circRNAs; however, the results show 48 circRNAs were upregulated, and 18 were downregulated (Figure S1D). Why were the other circRNAs not confirmed? The upregulated circRNAs with high expression are not necessarily those with the best differential expression when comparing control vs. PAHG groups. Furthermore, Figure 1A and S1D show downregulated circRNAs that also have high expression. Why were these circRNAs not confirmed?
  
  Our study aims to identify potential biomarkers for endothelial dysfunction in type 2 diabetes. To this end, we focused on circRNAs that exhibited significant upregulation following PAHG treatment. In our sequencing data, the p-values for these top upregulated circRNAs were notably below the threshold of 0.001, prompting their selection for further validation. We employed qRT-PCR to ascertain the consistency of their expression levels with the RNA-sequencing findings. Among these, circHMGCS1 was identified as a promising candidate with regulatory potential in endothelial dysfunction. Additionally, circRNAs that were significantly downregulated will be the subject of our ongoing research endeavors.
  
  (4) Figure 1B shows the relative circRNAs expression. Were host genes expressed in the same direction?
  
  circRNAs are generated from specific exons or introns of their host genes, either individually or in combination, and the main function of a circRNA depends on its non-coding RNA characteristics. The expression levels of circRNAs are not necessarily correlated with those of their host genes, and similarly, the functions of circRNAs do not inherently relate to the functions of the host genes (Kristensen et al, 2019; Liu & Chen, 2022). Consequently, the data presented in Figure 1B were primarily aimed at validating the accuracy of circRNA-seq. Although we did not conduct host gene expression analysis for the identified circRNAs, our subsequent results indicated that the overexpression of circHMGCS1 did not influence the expression levels of HMGCS1 (Figure 2A).
  
  (5) Line 128. The circRNA RT-qPCR methodology was not described. The methodology should be described in detail in the Methods section.
  
  The only difference between the circRNA RT-qPCR method and that used for other genes is that random primers must be used for reverse transcription. Unlike linear RNAs, which possess a 3' poly(A) tail that allows the use of oligo(dT) primers, circRNAs require random primers to initiate reverse transcription. Otherwise, the procedure is the same as standard qRT-PCR. We have revised the method "Isolation of RNA and miRNA for quantitative Real Time-PCR (qRT-PCR) analysis" in the revised version (Line 695).
  
  (6) Line 699. The relative gene expression was calculated using the 2^-ΔΔCt method. This is not correct; the miRNA and gene expression values are represented as a percentage of control.
  
  We first used the 2^-ΔΔCt method to determine the relative gene expression levels. We then multiplied all values by 100, so that the values read as a percentage of control, to make the observed differences easier to visualize.
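
The calculation described in this response can be written out as a short sketch of the standard 2^-ΔΔCt formula followed by the ×100 scaling. The function name and the Ct values are hypothetical and serve only to show how a fold change becomes a percentage of control.

```python
def relative_expression_percent(ct_target_treated, ct_ref_treated,
                                ct_target_control, ct_ref_control):
    """Relative expression by 2^-ddCt, scaled so that the control equals 100."""
    # dCt: target gene Ct normalized to the reference (housekeeping) gene.
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    # ddCt: treated sample relative to the control sample.
    ddct = dct_treated - dct_control
    fold_change = 2 ** (-ddct)
    return 100 * fold_change

# Hypothetical Ct values: the target amplifies one cycle earlier after treatment.
treated_percent = relative_expression_percent(24.0, 18.0, 25.0, 18.0)

# A control sample compared with itself is 100 by construction.
control_percent = relative_expression_percent(25.0, 18.0, 25.0, 18.0)
```

A one-cycle decrease in normalized Ct corresponds to a two-fold increase, which is reported as 200 on the scaled axis.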
  
  (7) Line 630. Detection of ROS for tissue and cells. The methodology for tissue was described, but not for cells.
  
  We have added the detailed description of the cellular ROS detection methods in the revised manuscript as follows:
  
  For ROS detection in cells, the treated cells were washed once by PBS, then 20 μM DHE was added, and incubated at 37°C for 30 min away from light, then washed three times by PBS and then colorless DMEM medium was added, followed by fluorescence microscopy for observation (Line 640-643).
  
  (8) Line 796. RNA Fluorescent In Situ Hybridization (RNA-FISH). RNA-FISH confirmed the robust expression of cytoplasmic circHMGCS1 in HUVECs (Figure 1F). However, in the methods, lines 804 and 805 state that probes targeting circMAP3K5 and miR-4521 were applied to the sections. Hybridization was performed in a humid chamber at 37°C overnight. Is this correct?
  
  We have made a correction in the revised manuscript. The corrected description is "the probes targeting circHMGCS1 and miR-4521 were applied to the sections" (Line 816).
  
  (9) Line 14. Fig 1-H. The authors state that qRT-PCR demonstrated that circHMGCS1 displayed a stable half-life exceeding 24 h, whereas the linear transcript HMGCS1 mRNA had a half-life of less than 8 h (Figure 1H). Several of the antibodies may contain trace amounts of RNases that could degrade target RNA and could result in loss of RNA hybridization signal or gene expression. Thus, all of the solutions should contain RNase inhibitors. The HMGCS1 mRNA could be degraded over the incubation time (0-24 h), leading to incorrect results. Moreover, the methods do not mention whether an RNase inhibitor was used. Could the authors please discuss this and provide the information?
  
  This experiment was performed in cell culture as described in our Experimental Methods (Line 753), where we added actinomycin D directly into the cell culture well plates, and the cells remained in a healthy state during this treatment. We did not directly extract mRNA from cells for this experiment. Additionally, all solutions utilized throughout the experiment were prepared using RNase-free water, ensuring the integrity of the mRNA.
  
  (10) Further experiments demonstrated that the overexpression of circHMGCS1 stimulated the expression of adhesion molecules (VCAM1, ICAM1, and ET-1) (Figures 2B and 2C), suggesting that circHMGCS1 is involved in VED. How were these genes expressed in the RNA-seq?
  
  In the manuscript, we focused exclusively on circRNA and miRNA sequencing and did not perform mRNA sequencing. Consequently, we employed qRT-PCR and Western blot to assess the expression alterations of ET-1, ICAM1, and VCAM1 at the gene and protein levels. The findings revealed that the overexpression of circHMGCS1 significantly upregulated the expression of adhesion molecules (VCAM1, ICAM1, and ET-1).
  
  (11) Line 256. By contrast, the combined treatment of circHMGCS1 and miR-4521 agomir did not significantly affect the body weight and blood glucose levels. OGTT and ITT experiments demonstrated that miR-4521 agomir considerably enhanced glucose tolerance and insulin resistance in diabetic mice (Figures 5C, 5D, and Figures S5B and S5C). Why did the miR-4521 agomir treatment considerably enhance glucose tolerance and insulin resistance in diabetic mice, but not the blood glucose levels?
  
  Our results showed that miR-4521 agomir could effectively suppress the increase of body weight and blood glucose in mice (Figure 5A-B).
  
  (12) In the experiments related to pull-down, the authors performed Biotin-coupled miR-4521 or its mutant probe, which was employed for circHMGCS1 pull-down. This result only confirms the Luciferase experiments shown in Figure 4A. The experiment that the authors need to perform is pull-down using a biotin-labeled antisense oligo (ASO) targeting the circHMGCS1 backsplice junction sequence followed by pulldown with streptavidin-conjugated magnetic beads to capture the associated miRNAs and RNA binding proteins (RBPs). Also, the ASO pulldown assay can be coupled to miRNA RT-qPCR and western blotting analysis to confirm the association of miRNAs and RBPs predicted to interact with the target circRNA.
  
  This point is correct. As suggested, we utilized a biotin-labeled circHMGCS1 probe for pull-down experiments. Because circRNA-miRNA interactions are mainly mediated by the RNA-induced silencing complex, which includes Argonaute 2 (AGO2), we examined the levels of miR-4521 and AGO2 in the captured material. Our results demonstrated that circHMGCS1 significantly captured miR-4521 in the cells, with a concomitant acquisition of AGO2. These findings have been integrated into the revised manuscript (Supplementary Figures 4D and 4E).
  
  (13) In Figure 5, the authors showed that the results suggest that miR-4521 can inhibit the occurrence of diabetes, whereas circHMGCS1 specifically dampens the function of miR-4521, weakening its protective effect against diabetes. In this context, what are the endogenous target genes for the miR-4521 that could be regulating diabetes?
  
  In this study, we focused on the role of miR-4521 in endothelial function. Our animal experiments involving ARG1 knockdown revealed that the reduction of ARG1 expression resulted in the inability of miR-4521 to modulate the progression of type 2 diabetes. Consequently, ARG1 is likely an endogenous target gene of miR-4521, potentially implicated in the regulation of diabetes.
  
  (14) In the western blot of Figure 5, the β-actin band appears to be different from the genes analyzed. Was the same membrane used for the four proteins? The Ponceau S membrane should be provided.
  
  As described in our experimental methodology (Western blot analysis), we used PVDF membranes for our Western blot experiments. β-actin, a highly expressed housekeeping gene, yields distinct bands with minimal background noise. Because of its high abundance, the β-actin band can spread laterally from the loading wells during electrophoresis, so that it does not align with the lanes of the target proteins. The other three proteins show clearly separated lanes because their expression is not as high as that of β-actin. In the revised manuscript, we replaced the β-actin panel with one that has a similar background (Figure 5L).
  
  (15) Why did the authors use AAV9, since the AAV9 has a tropism for the liver, heart, skeletal muscle, and not to endothelial vessels?
  
  AAV9 has garnered significant interest as a gene delivery vector due to its extensive tissue penetration, minimal immunogenicity, and stable gene expression profile. Its application in cardiovascular disease research and therapy has been widely reported (Barbon et al, 2023; Yao et al, 2018; Zincarelli et al, 2008). In addition, we delivered AAV9 via tail vein injection in mice, and as shown in Figure 5J and Figure 7Q, we observed GFP signals carried by AAV9 in the thoracic aorta of mice. These findings suggest that AAV9 is capable of infecting endothelial cells effectively.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors observed an aggravated vascular endothelial dysfunction upon overexpressing circHMGCS1 and inhibiting miR-4521. This study discovered that circHMGCS1 promotes arginase 1 expression by sponging miR-4521, which accelerated the impairment of vascular endothelial function.
  
  Strengths:
  
  The study is systematic and establishes the regulatory role of the circHMGCS1-miR-4521 axis in diabetes-induced cardiovascular diseases.
  
  Weaknesses:
  
  (1) The authors selected the miR-4521 as the target based on their reduced expression upon circHMGCS1 overexpression. Since the miRNA level is downregulated, the downstream target gene is expected to be upregulated even in the absence of circRNA. The changes in miRNA expression opposite to the levels of target circRNA could be through Target RNA-Directed MicroRNA Degradation. In addition, miRNA can also be stabilized by circRNAs. Hence, selecting miRNA targets based on opposite expression patterns and concluding miRNA sponging by circRNA needs further evidence of direct interactions.
  
  Thank you for your positive comments and kind suggestions.
  
  As suggested by Public Reviewer #1 (12), we employed a biotin-tagged circHMGCS1 to capture miR-4521 and AGO2 in HUVECs (Supplementary Figures 4D and 4E), and dual luciferase assays confirmed that miR-4521 can bind to circHMGCS1 directly. Furthermore, RNA pull-down and RIP assays demonstrated the direct binding capability of circHMGCS1 for miR-4521. Collectively, these findings underscore the direct interaction between circHMGCS1 and miR-4521.
  
  (2) The majority of the experiments were performed with an overexpression vector which can generate a lot of linear RNAs along with circRNAs. The linear RNAs produced by the overexpression vectors can have a similar effect to the circRNA due to sequence identity.
  
  In our manuscript, the vectors incorporate inverted repeat sequences that facilitate efficient circularization of circRNAs. This design ensures robust circularization upon insertion of the circRNA sequence into the multiple cloning site, thereby enhancing the overexpression of circRNAs (Supplementary Figure 2). Moreover, we used a lentivirus as the vector for circRNA overexpression, not direct plasmid transfection. As demonstrated in Figure 2A, upon overexpression of circHMGCS1, we observed a significant upregulation in circHMGCS1 levels compared to the pLV-circNC and Control groups. Notably, the expression levels of the linear HMGCS1 mRNA did not exhibit significant alterations.
  
  (3) There is a lack of data of circHMGCS1 silencing and its effect on target miRNA & mRNAs.
  
  According to your suggestion, we employed shRNA to knock down circHMGCS1 in HUVECs, and qRT-PCR was used to assess the expression levels of miR-4521 and ARG1. The shRNAs significantly reduced the expression of circHMGCS1 in HUVECs without obviously affecting the levels of HMGCS1 mRNA. We then selected circHMGCS1 shRNA1 for further investigation. We observed that the knockdown of circHMGCS1 resulted in an upregulation of miR-4521 and a downregulation of ARG1 expression.
  
  Author response image 1.
  
  The impact of circHMGCS1 knockdown on ARG1 and miR-4521 expression levels in HUVECs. The cells were transfected with either circHMGCS1 shRNA1 or circHMGCS1 shRNA2, and the expression levels of circHMGCS1 and HMGCS1 (A), miR-4521 (B) and ARG1 (C and D) in HUVECs were detected by qRT-PCR and Western blot. n=3 in each group. *p < 0.05, **p < 0.01. Significant differences were determined by one-way ANOVA followed by the Bonferroni multiple comparison post hoc test; error bars indicate SD.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  I suggest improving the discussion based on the literature.
  
  (1) Line 131. .... (hsa_circ_0008621, 899 nt in length, identified as circHMGCS1 in subsequent studies because of its host gene being HMGCS1). Please, provide the reference.
  
  We appreciate the valuable comments. We have added the reference at Line 133 (Liang et al, 2021).
  
  (2) The authors conclude that both in vitro and in vivo data suggest that the miR-4521 or circHMGCS1 fails to regulate the effect of diabetes-induced VED in the absence of ARG1. Therefore, ARG1 may serve as a promising VED biomarker, and circHMGCS1 and miR-4521 play a key role in regulating diabetes-induced VED by ARG1. In this context, they should re-evaluate whether this is the best title. "Circular RNA HMGCS1 sponges miR-4521 to aggravate type 2 diabetes-induced vascular endothelial dysfunction"
  
  This manuscript begins with circRNA as the focal point of study (Figure 1 and Figure 2). It then delves into the miRNAs associated with circRNA and elucidates their interactions (Figure 3, Figure 4 and Figure 5). Subsequently, the manuscript identifies the target genes of miRNA and validates the regulatory effects of circRNA and miR-4521 on ARG1 (Figure 6). The study culminates with the application of the ceRNA theory to confirm the significance of ARG1 in the functional interplay between circHMGCS1 and miR-4521 (Figure 7). The findings throughout the manuscript are dedicated to uncovering the pivotal roles of circHMGCS1 and miR-4521 in modulating vascular endothelial function. Notably, the interaction between circHMGCS1 and miR-4521 represents a novel discovery of our research. Therefore, we aim to emphasize the critical function of circHMGCS1 and miR-4521 in the regulation of vascular endothelial dysfunction in type 2 diabetes within the manuscript.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I have a few suggestions for improving the study further.
  
  (1) Although the experiments suggest the role of circHMGCS1, miR-4521 in vascular endothelial function, the direct regulation or interaction of circHMGCS1-miR-4521-ARG1 is unclear. A rescue experiment that checks the effect of circHMGCS1 silencing with/without inhibition of miR-4521 on ARG1 expression must be performed to prove the circHMGCS1- miR-4521 regulatory axis.
  
  Thank you very much for your constructive comments.
  
  According to your suggestion, we utilized shRNA to knock down circHMGCS1 in HUVECs. Subsequent expression analysis via qRT-PCR was conducted to assess the levels of miR-4521 and ARG1. The knockdown significantly reduced the expression of circHMGCS1 in HUVECs without influencing the expression of the host gene HMGCS1. Concurrently, the knockdown of circHMGCS1 resulted in an upregulation of miR-4521 (Supplementary Figure 4B) and a downregulation of ARG1 (Figure 6P and 6Q). In our manuscript, the upregulation in ARG1 expression caused by circHMGCS1 overexpression was reduced by miR-4521, and the downregulation in ARG1 expression caused by miR-4521 overexpression was also reversed by circHMGCS1. When miR-4521 was knocked down, the expression of ARG1 increased, and circHMGCS1 abrogated its regulatory effect on the expression of ARG1. Collectively, these findings indicate that the interplay between circHMGCS1 and miR-4521 significantly influences ARG1 expression.
  
  Author response image 2.
  
  The impact of circHMGCS1 knockdown on ARG1 and miR-4521 expression levels in HUVECs. The cells were transfected with either circHMGCS1 shRNA1 or circHMGCS1 shRNA2, and the expression levels of circHMGCS1 and HMGCS1 (A), miR-4521 (B) and ARG1 (C and D) in HUVECs were detected by qRT-PCR and Western blot. n=3 in each group. *p < 0.05, **p < 0.01. Significant differences were determined by one-way ANOVA followed by the Bonferroni multiple comparison post hoc test; error bars indicate SD.
  
  (2) It is unclear how the authors arrived at the circHMGCS1-miR-4521 pair. The pull down of circHMGCS1 followed by qPCR enrichment analysis of all target miRNAs must be performed to select the target miRNA.
  
  In this manuscript, we profiled miRNA expression under PAHG treatment through miRNA sequencing, and then screened out 4 miRNAs with potential binding sites for circHMGCS1 using the miRanda database. Subsequently, we employed qRT-PCR and Western blot analysis to confirm the regulatory influence of miR-4521 on endothelial function (Figure 3). Following this, RIP, RNA pull-down, dual luciferase and RNA-FISH experiments were conducted to map the interaction between circHMGCS1 and miR-4521 (Figure 4). The direct interaction between circHMGCS1 and miR-4521 was further substantiated through overexpression and knockdown studies (Figures 5-7). While the reviewer's method may offer a more direct validation, our methodology initially involved a database-driven screening of candidate miRNAs with the potential to target and bind circHMGCS1, followed by experimental validation of these interactions. Both methodologies are capable of establishing the interaction sites between circHMGCS1 and miR-4521.
  
  (3) Since the back splicing is not that efficient, the linear RNA from the overexpression construct may produce many linear RNAs with miRNA binding sites. The effect seen in the case of overexpression experiments needs to consider the level of linear and circular HMGCS1 produced by the vector.
  
  In this manuscript, the vector's multiple cloning site is flanked by inverted repeat sequences that facilitate efficient circRNA looping. This design enables the inserted sequence to form a stable loop and undergo circularization upon transcription, leading to the overexpression of circRNA (Supplementary Figure 2). For the validation of circular RNA, we employed divergent primers that straddle the circRNA splicing junction. These primers are specific for circRNA amplification and do not amplify the corresponding linear RNA, as demonstrated in Figure 2A. Upon overexpression of circHMGCS1, we observed a significant increase in circHMGCS1 levels compared to the empty vector and Control groups, while there was no significant change in the expression level of HMGCS1 mRNA.
  
  (4) As miR-4521 has multiple miRNA binding sites on circHMGCS1, it is not very clear which sites were mutated in circHMGCS1-MUT.
  
  We have made corrections to Supplementary Figure 4C. Utilizing the miRanda algorithm, we identified 10 potential binding sites for miR-4521 on circHMGCS1. Subsequently, we selected the site with the highest binding affinity for mutational analysis (miR-4521 binding positions 3-15, circHMGCS1 binding positions 260-281, binding rate 91.67%, binding ability -17.30 kcal/mol). We employed a dual-luciferase assay to confirm the direct interaction between circHMGCS1 and miR-4521.
  
  (5) Since the ceRNA network works efficiently in an equimolar concentration of the regulatory molecules, providing the copy number of circHMGCS1, miR-4521, and target mRNAs would be helpful.
  
  We employed qRT-PCR for absolute quantification of copy numbers, following established methodologies (Nolan et al, 2006; Wagatsuma et al, 2005; Zhang et al, 2009). Our qRT-PCR data reveal that the circHMGCS1 copy number is 2343±529. In comparison, the ARG1 mRNA copy number is 88±27, while the miR-4521 copy number is considerably higher, at 36277±9407.
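
  To relate these values to the reviewer's question about stoichiometry, the reported means can be turned into simple ratios. The snippet below is only an illustrative sketch: it uses the mean copy numbers quoted above, ignores the reported SDs, and assumes that all three values are expressed per the same reference amount (the response does not state the unit). The function name `ratio` is hypothetical.

```python
# Mean copy numbers quoted in the response above (SDs are ignored here).
# Assumption: all three values refer to the same reference amount.
copy_numbers = {
    "circHMGCS1": 2343,
    "miR-4521": 36277,
    "ARG1 mRNA": 88,
}


def ratio(numerator: str, denominator: str) -> float:
    """Return the ratio of two reported mean copy numbers."""
    return copy_numbers[numerator] / copy_numbers[denominator]


if __name__ == "__main__":
    print(f"miR-4521 : circHMGCS1 = {ratio('miR-4521', 'circHMGCS1'):.1f} : 1")
    print(f"circHMGCS1 : ARG1 mRNA = {ratio('circHMGCS1', 'ARG1 mRNA'):.1f} : 1")
```

  On these means, miR-4521 is present at roughly 15 copies per circHMGCS1 copy, so the three molecules are far from equimolar under the stated assumption.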
  
  Author response image 3.
  
  The distribution of copy numbers for circHMGCS1, miR-4521 and ARG1 in HUVECs.
  
  (6) The yellow highlighted "cyclization-mediated sequence-F & R" does not seem to be complementary sequences. The method section may include the details of the vectors and cloning strategies for the overexpression constructs.
  
  The figure below illustrates the schematic representation of the complementary structure between the upstream and downstream sequences that facilitate circRNA circularization. This strategic pairing is designed to enhance the circularization efficiency of circRNA while concurrently suppressing mRNA synthesis (Liang & Wilusz, 2014). Details of this design have been integrated into the experimental method (Line 539). The specific additions are as follows:
  
  The circHMGCS1 sequence [NM_001098272: 43292575-43297268], the splice site AG/GT and ALU elements were inserted into the pCDH-circRNA-GFP vector (upstream ALU: AAAGTGCTGAGATTACAGGCGTGAGCCACCACCCCCGGCCCACTTTTTGTAAAGGTACGTACTAATGACTTTTTTTTTATACTTCAG, downstream ALU: GTAAGAAGCAAGGAAAAGAATTAGGCTCGGCACGGTAGCTCACACCTGTAATCCCAGCA). The restriction enzyme sites selected were EcoRI and NotI.
  
  Author response image 4.
  
  (7) Since circHMGCS1 is a multi-exonic circRNA that can undergo alternative splicing and divergent primers only validate the backsplice junction, the full-length sequence of mature circHMGCS1 needs to be checked by circRNA-RCA PCR followed by Sanger sequencing.
  
  In compliance with your guidance, we have enriched the revised manuscript with additional data. Specifically, we have included the full-length nucleic acid electrophoresis diagram of circHMGCS1 in Supplementary Figure 1F, the Sanger sequencing results in Supplementary Figure 1G, and a comparative analysis of the circHMGCS1 sequences obtained from Sanger sequencing with those referenced in the circBase database, presented in Supplementary Figure 1H.
  
  References:
  
  Barbon, E., C. Kawecki, S. Marmier, A. Sakkal, F. Collaud, S. Charles, G. Ronzitti, C. Casari, O.D. Christophe, C.V. Denis, P.J. Lenting, and F. Mingozzi. 2023. Development of a dual hybrid AAV vector for endothelial-targeted expression of von Willebrand factor. Gene Ther. 30: 245-254.
  
  Fu, Y., J. Chen, and Z. Huang. 2019. Recent progress in microRNA-based delivery systems for the treatment of human disease. ExRNA. 1: 24.
  
  Kristensen, L.S., M.S. Andersen, L.V.W. Stagsted, K.K. Ebbesen, T.B. Hansen, and J. Kjems. 2019. The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet. 20: 675-691.
  
  Liang, D., and J.E. Wilusz. 2014. Short intronic repeat sequences facilitate circular RNA production. Genes Dev. 28: 2233-2247.
  
  Liang, J., X. Li, J. Xu, G.M. Cai, J.X. Cao, and B. Zhang. 2021. hsa_circ_0072389, hsa_circ_0072386, hsa_circ_0008621, hsa_circ_0072387, and hsa_circ_0072391 aggravate glioma via miR-338-5p/IKBIP. Aging (Albany NY). 13: 25213-25240.
  
  Liu, C.X., and L.L. Chen. 2022. Circular RNAs: Characterization, cellular roles, and applications. Cell. 185: 2016-2034.
  
  Nolan, T., R.E. Hands, and S.A. Bustin. 2006. Quantification of mRNA using real-time RT-PCR. Nat Protoc. 1: 1559-1582.
  
  Wagatsuma, A., H. Sadamoto, T. Kitahashi, K. Lukowiak, A. Urano, and E. Ito. 2005. Determination of the exact copy numbers of particular mRNAs in a single cell by quantitative real-time RT-PCR. J Exp Biol. 208: 2389-2398.
  
  Yao, C., T. Veleva, L. Scott, Jr., S. Cao, L. Li, G. Chen, P. Jeyabal, X. Pan, K.M. Alsina, I.D. Abu-Taha, S. Ghezelbash, C.L. Reynolds, Y.H. Shen, S.A. Lemaire, W. Schmitz, F.U. Müller, A. El-Armouche, N. Tony Eissa, C. Beeton, S. Nattel, X.H.T. Wehrens, D. Dobrev, and N. Li. 2018. Enhanced Cardiomyocyte NLRP3 Inflammasome Signaling Promotes Atrial Fibrillation. Circulation. 138: 2227-2242.
  
  Zhang, X.X., T. Zhang, M. Zhang, H.H. Fang, and S.P. Cheng. 2009. Characterization and quantification of class 1 integrons and associated gene cassettes in sewage treatment plants. Appl Microbiol Biotechnol. 82: 1169-1177.
  
  Zincarelli, C., S. Soltys, G. Rengo, and J.E. Rabinowitz. 2008. Analysis of AAV serotypes 1-9 mediated gene expression and tropism in mice after systemic injection. Mol Ther. 16: 1073-1080.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.06.583717v2
www.biorxiv.org

The identification of extensive samples of motor units in human muscles reveals diverse effects of neuromodulatory inputs on the rate coding

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This study explores the neural control of muscle by decomposing the firing activity of constituent motor units from grids of surface electromyography (EMG) electrodes over the Tibialis Anterior (TA) and Vastus Lateralis (VL) during isometric contractions. The study involves extensive samples of motor units across the broadest range of voluntary contraction intensities, up to 80% of MVC. The authors examine the rate coding of the population of motor units, which describes the instantaneous firing rate of each motor unit as a function of muscle force. This relationship is characterized by a natural logarithm function that delineates two distinct phases: an initial phase with a steep acceleration in firing rate, particularly pronounced in low-threshold motor units, and a subsequent modest linear increase in firing rate, more significant in high-threshold motor units.
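
  As an illustration of what a logarithmic firing rate/force relation looks like in practice, the sketch below fits rate = a * ln(force) + b by ordinary least squares. This is not the authors' analysis code: the exact functional form and fitting procedure used in the paper are not given in this section, the function name `fit_log` is hypothetical, and the data are synthetic values generated from known coefficients.

```python
import math


def fit_log(force, rate):
    """Least-squares fit of rate = a * ln(force) + b; returns (a, b).

    force: positive values (e.g. % MVC), rate: firing rates (e.g. pulses/s).
    """
    x = [math.log(f) for f in force]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(rate) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rate))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic example: firing rates generated from a = 3.0 and b = 8.0.
force = [1, 2, 5, 10, 20, 40, 80]
rate = [3.0 * math.log(f) + 8.0 for f in force]
a, b = fit_log(force, rate)

# The local slope of a * ln(force) is a / force: steep at low force,
# shallow at high force, which gives the two-phase appearance.
slope_at_1 = a / force[0]
slope_at_80 = a / force[-1]
```

  With such a fit, a larger `a` corresponds to a steeper initial acceleration, which is one simple way to compare units with different recruitment thresholds.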
  
  Strengths:
  
  The study makes a significant contribution to the field of neuromuscular physiology by providing a detailed analysis of motor unit behavior during muscle contractions in a few ways.
  
  (1) The significance lies in its comprehensive framework of motor unit activity during isometric contractions in a broad range of intensities, providing insights into the non-linear relationship between the firing rate and the muscle force. The extensive sample of motor units across the pool confirms the observation in animal studies in which the spinal motoneuron exhibits a discharge consisting of distinct phases in response to synaptic currents, under the influence of persistent inward currents. As such, it is now reasonable to state the human motor units across the pool are also under the control of gain modulation via some neuromodulatory effects in addition to synaptic inputs arising from ionotropic effects.
  
  (2) The firing scheme across the entire motoneuron pool revealed in this study reconciles the discrepancy in firing organization under debate; i.e., whether it is 'onion skin' like or not (Heckman and Enoka 2012). The 'onion skin' model states that low-threshold motor units discharge at higher rates than high-threshold motor units; it has been held for a long time because firing behaviors were examined over only a partial range of contraction forces due to technical limitations. This reconciliation is crucial because it is fundamental to modelling the organization of motor unit recruitment and rate coding to achieve a desired force generation to advance our understanding of motor control.
  
  (3) The extensive data collection with a novel blind source separation algorithm on the expanded number of channels of surface EMG signal provides a robust dataset that enhances the reliability and validity of findings, setting a new standard for empirical studies in the field.
  
  Collectively, this study fills several knowledge gaps in the field and advances our understanding of the mechanism underlying the isometric force generation.
  
  We thank the reviewer for their positive appreciation of our work.
  
  Weaknesses:
  
  Although the findings and claims based on them are mostly well aligned, some accounts of the methods and claims need to be clarified.
  
  (1) The authors examine the input-output function of a motor unit by constructing models, using force as an input and discharge rate as an output. It sounds circular, or the wrong way around, to use the muscle force as an input variable, because the muscle force is the result of motor unit discharges, not the cause that elicits the discharges. More specifically, the gross muscle force is attained as a result of non-linear interactions of synchronous and/or asynchronous discharges of a population of a given motoneuron pool that give rise to transient increases in, or maintenance of, twitch force. I acknowledge that it is extremely challenging experimentally to measure synaptic currents impinging upon the spinal motoneurons in human subjects, and the authors assume that the force could be used as a proxy of synaptic currents. However, it is necessary to explicitly provide the caveats and rationale behind that. Force could be used as the input variable for modelling.
  
  Force is indeed used in this study as a proxy of the common excitatory synaptic currents as their direct measurement is not possible in vivo in humans. It is worth noting that this approach has been extensively used in the past by many groups to study rate coding (e.g., Monster & Chan, De Luca’s, Heckman’s, and Fuglevand’s groups). Heckman’s, Gorassini’s, Fuglevand’s groups and others have considered the non-linearities in the relation between motor unit firing rates and muscle force in humans as an indicator of the impact of neuromodulation on motor unit behaviour and changes of the intrinsic properties of motoneurons.
  
  One could also use the cumulative spike train as a more direct estimate of common excitatory inputs, assuming that it is possible to identify a group of motor units not influenced by PICs, as done when selecting a reference low-threshold motor neuron in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020). However, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did to generate force/firing rate relations on the widest range of force.
  
  We added a sentence in the discussion to highlight this limitation (P19, L470):
  
  ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020)’.
  
  (2) The authors examine the firing organizations in TA and VL in this study without explicit purposes and rationale for choosing these muscles. The lack of accounts makes it hard for the readers to interpret the data presented, particularly in terms of comparing the results from the different muscles.
  
  We wanted to compare the rate coding of pools of motor units from proximal (VL) and distal (TA) muscles within the lower limb. Indeed, distal and proximal muscles exhibit differences in rate coding and spatial recruitments (De Luca et al., 1982, J Physiol), potentially due to different levels of recurrent inhibition (Cullheim & Kellerth, 1978, J Physiol; Rossi & Mazzocchio, 1991, Exp Brain Res; Edgley et al., 2021, J Neurosci) or different levels of neuromodulation depending on their involvement (or not) in postural control (Hounsgaard et al., 1988, J Physiol; Kim et al., 2020, J Neurophysiol).
  
  We added a paragraph at the beginning of the result section to support our muscle choice (P6; L137): ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.
  
  Another factor that guided our muscle choice was the low risk of crosstalk. For this, we verified with ultrasound that our arrays of 256 electrodes only covered the muscle of interest, staying away from the neighbouring muscles. This was possible as superficial muscles from the leg are bulkier than those from the upper limb. Given the small diameter of each electrode (2 mm), it is unlikely that the motor units from the neighbouring muscles were in the recorded muscle volume (Farina et al., 2003, IEEE Trans Biomed Eng).
  
  (3) In the methods, the author described the manual curation process after applying the blind source separation algorithm. For the readers to understand the whole process of decomposition and to secure rigor and robustness of the analyses, it would be necessary to provide details on what exact curation is performed with what criteria.
  
  The manual curation of EMG decomposition with blind source separation is different from what is classically done with intramuscular EMG and template-matching algorithms.
  
  In short, our decomposition algorithm uses fast independent component analysis (fastICA) to retrieve motor unit spike trains from the EMG signals. For this, it iteratively optimises a set of weights, i.e., a separation vector, for each motor unit. The projection of the EMG signals on this separation vector generates a sparse motor unit pulse train, with most of its samples close to zero and only a few samples close to one (Figure 1B). The discharge times are estimated from this motor unit pulse train using a peak detection function and a k-means classification with two classes to separate the high peaks (spikes) from the low peaks (noise and other motor units).
  
  The manual curation consists of inspecting the automatic detection of the peaks of the motor unit pulse train and manually adding missed peaks (missed discharge times) or removing wrongly detected peaks. Then, the separation vector is updated using the corrected discharge times, and the motor unit pulse train is recalculated. This procedure generally improves the distance between the discharge times and the noise, which confirms the accuracy of the manual curation. If that is not the case, the motor unit is discarded from the analyses.
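
  The peak detection and two-class k-means step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decomposition software used in the study: the local-maximum rule stands in for the unspecified peak detection function, the separation score is a simplified silhouette-like measure on squared distances to the two centroids, and all function names are hypothetical.

```python
def detect_peaks(pulse_train):
    """Return indices of simple local maxima in the pulse train."""
    return [
        i
        for i in range(1, len(pulse_train) - 1)
        if pulse_train[i - 1] < pulse_train[i] >= pulse_train[i + 1]
    ]


def two_class_kmeans(values, max_iter=100):
    """1-D k-means with two classes; returns (low_centroid, high_centroid)."""
    low, high = min(values), max(values)
    for _ in range(max_iter):
        high_group = [v for v in values if abs(v - high) < abs(v - low)]
        low_group = [v for v in values if abs(v - high) >= abs(v - low)]
        if not high_group:  # all peaks have the same height
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return low, high


def classify_spikes(pulse_train):
    """Return (spike_indices, separation_score) for one pulse train."""
    peaks = detect_peaks(pulse_train)
    if not peaks:
        return [], 0.0
    heights = [pulse_train[i] for i in peaks]
    low, high = two_class_kmeans(heights)
    spikes = [i for i, h in zip(peaks, heights) if abs(h - high) < abs(h - low)]
    if not spikes:
        return [], 0.0
    # Simplified separation score (assumption): compare the distance of the
    # spike peaks to their own centroid (within) and to the noise centroid.
    within = sum((pulse_train[i] - high) ** 2 for i in spikes)
    between = sum((pulse_train[i] - low) ** 2 for i in spikes)
    return spikes, (between - within) / max(between, within)


# Toy pulse train: three high peaks (spikes) among small noise peaks.
pulse_train = [0.0, 0.05, 0.0, 0.9, 0.1, 0.0, 0.08, 0.0,
               1.0, 0.0, 0.12, 0.0, 0.95, 0.0]
spikes, score = classify_spikes(pulse_train)
```

  In this toy example the score is close to 1 because the spikes sit far from the noise peaks; a manual edit that pulls the two classes further apart would raise such a score, which is the logic of accepting an edit only when the separation improves or stays above a threshold.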
  
  We added a section on manual editing in the methods (P23, L615):
  
  ‘At the end of these automatic steps, all the motor unit pulse trains and identified discharge times were visually inspected, and manual editing was performed to correct the false identification of artifacts or the missed discharge times (Del Vecchio et al., 2020; Hug et al., 2021; Avrillon et al., 2023). The manual editing consisted of i) removing the spikes causing erroneous discharge rates (outliers), ii) adding the discharge times clearly separated from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the entire EMG signals, and v) repeating this procedure until the selection of all the discharge times is achieved. The manual editing of potential missed discharge times and falsely identified discharge times was never immediately accepted. Instead, the procedure was consistently followed by the application of the updated motor unit separation vector on the entire EMG signals to generate a new motor unit pulse train. Then, the manual editing was only accepted when the silhouette value increased or stayed well above the threshold of 0.9 quantified with the silhouette value (Negro et al., 2016b). Only these motor units were retained for further analysis.’
  
  (4) In Figure 3, the early recruited units tend to become untraceable in the higher range of contraction. This is more pronounced in the VL muscle. This limitation would make the whole firing curve along the force axis ambiguous; therefore, the limitation and the applicability of the approach to different muscles need to be discussed.
  
  The loss of low-threshold motor units in the higher range of contractions was caused either by the decrease in signal-to-noise ratio for small motor units when many larger ones are recruited, or by the cancellation of the surface action potentials of the small units in the interference electromyographic signal, or by the recruitment of a motor unit with a very similar spatio-temporal filter (an example is shown in the figure below). In the latter case, the motor unit pulse train contains peaks that represent the discharge times of both motor units (green and red dots in the simulated example below), making them indistinguishable by the operator during manual editing.
  
  Author response image 1.
  
  This was discussed in the results (P7; L190):
  
  ‘On average, we tracked 67.1 ± 10.0% (25th–75th percentile: 53.9 – 80.1%) of the motor units between consecutive contraction levels (10% increments, e.g., between 10% and 20% MVC) for TA and 57.2 ± 5.1% (25th–75th percentile: 46.6 – 68.3%) of the motor units for VL (Figure S2). There are two explanations for the inability to track all motor units across consecutive contraction levels. First, some motor units are recruited at higher targets only. Second, it is challenging to track small motor units beyond a few contraction levels due to a lower signal-to-noise ratio for the small motor units when larger motor units are recruited, or signal cancellation (Keenan et al., 2005; Farina et al., 2014a).’
  
  However, we believe that it had a limited impact on the output of the paper, as the non-linear portion of the rate coding/force relation due to the persistent inward currents occurs during the first seconds after recruitment, before plateauing (for a review see Binder et al., 2020, Physiology).
  
  (5) It is unclear how commonly the notion "the long-held belief that rate coding is similar across motor units from the same pool" is held among the community without a reference. Different firing organizations have been modelled and discussed in the seminal paper by Fuglevand et al. (1993) and as far as I understand, the debate has not converged to a specific consensus. As such, any reference would be required to support the claim the notion is widely recognized.
  
  In the paper of Fuglevand et al., (1993, J Neurophysiol), all the motor units had the same rate coding pattern relative to the excitatory input, though they changed the slope of the relations and the saturation threshold of motor units between simulations. This is similar to the paper of De Luca & Contessa (2012, J Neurophysiol), where the equation used to simulate the rate coding was non-linear, but consistent across motor units.
  
  We added these citations to the text:
  
  ‘Overall, we found that motor units within a pool exhibit distinct rate coding with changes in force level (Figure 2 and 3), which contrasts with the long-held belief that rate coding is similar across motor units from the same pool (Fuglevand et al., 1993; De Luca and Contessa, 2012).’
  
  (6) The authors claim that the firing behavior as a function of force is well characterized by a natural logarithmic function, which consists of initial steep acceleration followed by a modest increase in firing rate. Arguably the gain modulation in firing rate could be attributed to a neuromodulatory effect on the spinal motoneuron, which has been suggested by a number of animal studies. However, the complexity of the interactions between ionotropic and neuromodulatory inputs to motoneurons may require further elucidation to fully understand the mechanisms of neural control; it is possible to consider the differential acceleration among different threshold motor units as a differential combinatory effect of ionotropic and neuromodulatory inputs, but it is not trivially determined how differentially or systematically the inputs are organized. Likewise, the authors make an account for the difference in firing rate between TA and VL in terms of different amounts or balances of excitatory and inhibitory inputs to the motoneuron pool, but again this could be explained by other factors, such as a different extent of neuromodulatory effects. To determine the complexity of the interactions, further studies will be warranted.
  
  We appreciate the reviewer’s view on this point, as we indeed only indirectly inferred the combination of neuromodulatory and ionotropic inputs to motoneurons in this study. A more direct manipulation of the sources of neuromodulatory and ionotropic inputs will be required in the future to directly highlight the mechanisms responsible for these variations in rate coding within pools. However, it is also worth noting that the acceleration in firing rate, the increase in firing rate during the ramp up, and the hysteresis between ramps up and down have been used to infer the distribution of ionotropic and neuromodulatory inputs from the firing rate/force relations (Johnson et al., 2017; Beauchamp et al., 2023; Chardon et al., 2023). This approach has been validated with hundreds of thousands of simulations using a biophysical model of motor neurons (Chardon et al., 2023). There is also a series of studies in humans showing how the suppression of neuromodulation, either via inhibitory inputs (Revill & Fuglevand, 2017) or via medication blocking serotonin receptors (Goodlich et al., 2023), impacts the non-linearity of the firing rate/force relation. Therefore, we are confident that the differences observed within and between pools are linked to different distributions of excitatory/inhibitory inputs and neuromodulation.
  
  We added a sentence in the discussion to highlight this point (P18; L435):
  
  ‘Taken together, these results show how ionotropic and neuromodulatory inputs to motoneurons uniquely combine to generate distinct rate coding across the pool, even if a more direct manipulation of the sources of neuromodulatory and ionotropic inputs will be required to directly estimate their interactions.’
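  
  The natural logarithmic description of the firing rate/force relation raised in this comment can be illustrated with a simple least-squares fit. This is a minimal sketch under stated assumptions, not the authors' fitting procedure: the model form rate = a * ln(force) + b, the function name fit_log_relation, and the synthetic data are all hypothetical.

```python
import math

def fit_log_relation(forces, rates):
    """Least-squares fit of rate = a * ln(force) + b (assumed model form).

    forces must be strictly positive (e.g. % of maximal force).
    Returns the tuple (a, b).
    """
    x = [math.log(f) for f in forces]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(rates) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rates))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Synthetic, noise-free data generated from a = 4, b = 6 (illustrative values only).
forces = [1, 2, 5, 10, 20, 40]
rates = [4 * math.log(f) + 6 for f in forces]
a, b = fit_log_relation(forces, rates)
```

  Because the slope of this model is a/force, the fitted curve rises steeply at low force and flattens as force grows, which is the shape the reviewer describes.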
  
  (7) It is unclear with the account " ... the bandwidth of muscle force is < 10Hz during isometric contraction" in the manuscript alone, and therefore, it is difficult to understand the following claim. It appears very interesting and crucial for motor unit discharge and force generation and maintenance because it would pose a question of why the discharge rate of most motor units is higher than 10Hz, despite the bandwidth being so limited, but needs to be elaborated.
  
  We described the slow fluctuations in smoothed firing rates associated with the variations in force observed during isometric contractions. The bandwidth of muscle force is lower than 10 Hz due to the contractile properties of muscle tissues (Baldissera et al., 1998, J Physiol). Having an average firing rate higher than this bandwidth enables the pool of motor neurons to effectively transmit the common inputs (the main determinant of muscle force) over this bandwidth without distortion (Farina et al., 2014, J Physiol). Increasing the firing rate beyond the muscle bandwidth also increases the power of the spike train at the direct current frequency (frequency equal to 0) since this power is related to the number of spikes per second. Thus, increasing the firing rate well beyond the muscle bandwidth still has a clear effect on force. To illustrate this point, note that electrical stimuli delivered at 100 Hz can lead to an increase in muscle force.
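  
  The link between firing rate and the direct current (0 Hz) component of a spike train can be checked numerically. The sketch below is only an illustration under simplifying assumptions: spike trains are perfectly regular and binary, and the helper names regular_spike_train and dft_magnitude are hypothetical. At 0 Hz, the discrete Fourier sum reduces to the spike count, so doubling the firing rate doubles the DC term.

```python
import cmath

def regular_spike_train(rate_hz, duration_s, fs_hz):
    """Binary spike train with one spike every fs_hz / rate_hz samples."""
    n = int(duration_s * fs_hz)
    period = int(round(fs_hz / rate_hz))
    return [1 if i % period == 0 else 0 for i in range(n)]

def dft_magnitude(signal, freq_hz, fs_hz):
    """Magnitude of the discrete Fourier sum of the signal at one frequency."""
    acc = sum(
        x * cmath.exp(-2j * cmath.pi * freq_hz * i / fs_hz)
        for i, x in enumerate(signal)
    )
    return abs(acc)

fs = 1000                                  # sampling rate in Hz (assumed)
slow = regular_spike_train(10, 2.0, fs)    # 10 spikes/s for 2 s
fast = regular_spike_train(20, 2.0, fs)    # 20 spikes/s for 2 s

dc_slow = dft_magnitude(slow, 0.0, fs)     # equals the number of spikes in slow
dc_fast = dft_magnitude(fast, 0.0, fs)     # equals the number of spikes in fast
```

  Real motor unit spike trains are irregular, so this only shows the scaling of the DC term with the number of spikes per second, not the full spectrum of a physiological discharge.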
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The motivation for this study is to provide a comprehensive assessment of motor unit firing rate responses of entire pools during isometric contractions. The authors have used new quantitative methods to extract more unique motor units across contractions than prior studies. This was achieved by recording muscle fibre action potentials from four high-density surface electromyogram (HDsEMG) arrays (Caillet et al., 2023), quantifying residual EMG comparing the recorded and data-based simulation (Figure 1A-B), and developing a metric to compare the spatial identification for each motor unit (Figure 1D-E). From identified motor units, the authors have provided a detailed characterization of recruitment and firing rate responses during slow voluntary isometric contractions in the vastus lateralis and tibialis anterior muscles up to 80% of maximum intensity. In the lower limb, it is interesting how lower threshold motor units have firing rate responses that saturate, whereas higher threshold units that presumably produce higher muscle contractile forces continue to increase their firing rate. In many ways, these results agree with the rate coding of motor units in the extensor digitorum communis muscle (Monster and Chan, 1977). The paper is detailed, and the analyses are well explained. However, there are several points that I think should be addressed to strengthen the paper.
  
  We thank the reviewer for their positive appreciation of our work.
  
  General comments:
  
  (1) The authors claim they have measured the complete rate coding profiles of motor units in the vastus lateralis and tibialis anterior muscles. However, this study quantified rate coding during slow and prolonged voluntary isometric contractions whereas the function of rate coding during movements (Grimby and Hannerz, 1977) or more complex isometric contractions (Cutsem and Duchateau, 2005; Marshall et al., 2022) remains unexplored. For example, supraspinal inputs may not scale the same way across low and higher threshold motor units, or between muscles (Devanne et al., 1997), making the response of firing rates to increasing isometric contraction force less clear.
  
  We agree with the reviewer that rate coding strategies may vary with the velocity and the type of contractions (Duchateau & Enoka, 2008, J Physiol). It is thus likely that the firing rate would increase during the first milliseconds of fast contractions, with the occurrence of doublets (Cutsem and Duchateau, 2005, J Physiol; Del Vecchio et al., 2019, J Physiol), or that motor unit firing rate may be lower during lengthening than shortening contractions (Duchateau & Enoka, J Physiol).
  
  However, the decomposition of EMG signals in non-stationary conditions remains challenging, and is still limited to slowly varying patterns of force (Chen et al., 2000; Oliveira & Negro, 2021; Mendez Guerra et al., 2024; Yeung et al., 2024). Future methodological developments will be required to expand our findings to other patterns of force.
  
  Conceptually, the authors focus on the literature on intrinsic motoneurone properties, but in vivo, other possibilities are that descending supraspinal drive, spinal network dynamics, and afferent inputs have different effects across motor unit sizes, muscles, and types of contractions. Also, the influence from local muscles that act as synergists (e.g., vastii muscles for the vastus lateralis, and peroneal muscles that evert the foot for the tibialis anterior) or antagonists (coactivation during higher contraction intensities would stiffen the joint) may provide differential forms of proprioceptive feedback across motor pools.
  
  The reviewer is right that differences in spinal network dynamics and afferent inputs may explain the differences in rate coding observed between the two muscles. Indeed, computational models have shown how the pattern of inhibitory inputs may affect the increase in firing rate during a linear increase in force (Powers & Heckman, 2017, J Neurophysiol; Chardon et al., 2023, eLife). Specifically, the difference observed between proportional inhibitory inputs and a push-pull pattern mirrors the differences observed here between the TA (push-pull like pattern) and the VL (proportional pattern). This difference may reflect the impact of various pathways of inhibition, such as reciprocal inhibition or recurrent inhibition from homonymous motor units or motor units from synergistic muscles.
  
  These points have been further discussed in the manuscript (P19; L475):
  
  ‘The increase in firing rate was also significantly greater for TA motor units than for those in VL. This difference may reflect a varying balance between excitatory/inhibitory synaptic inputs and neuromodulation due to multiple spinal circuits (Heckman and Binder, 1993; Heckman et al., 2008; Johnson et al., 2017; Powers and Heckman, 2017; Chardon et al., 2023; Škarabot et al., 2023). Specifically, the strength of recurrent and reciprocal inhibitory inputs to motoneurons innervating VL and TA, and their proportional or inverse covariation with excitatory inputs, respectively, may explain the differences in rate limiting and maximal firing rates (Heckman and Binder, 1993; Heckman et al., 2008; Johnson et al., 2017; Powers and Heckman, 2017; Chardon et al., 2023; Škarabot et al., 2023). Thus, the motor units from the VL may receive more recurrent inhibition than those of distal muscles, though direct evidence of these differences remains to be found in humans (Windhorst, 1996). Interestingly, similar differences in rate coding were previously observed between proximal and distal muscles of the upper limb (De Luca et al., 1982). However, other muscles that serve different functions within the human body, such as muscles from the face, have different rate coding characteristics with much higher firing rates (Kirk et al., 2021). Future work should investigate those muscles and other to reveal the myriads of rate coding strategies in human muscles.’
  
  (2) The evidence that the entire motor unit pool was recorded per muscle is not clear. There appears to be substantial residual EMG (Figure 1B), signal cancellation of smaller motor units (lines 172-176), some participants had fewer than 20 identified motor units, and contractions never went above 80% of MVC. Also, to my understanding, there remains no gold-standard in awake humans to estimate the total motor unit number in order to determine if the entire pool was decomposed.
  
  The reviewer is right that we did not decode the full pool of motor units. As indicated in the initial version of the manuscript (e.g. title, introduction), we considered that we identified an extensive sample of motor units representative of the dynamics of the pool. This claim was supported by the identification of motor units with recruitment thresholds ranging from 0 to 75% of the maximal force.
  
  This statement was in the introduction (P4; L109): ‘We were able to identify up to ~200 unique active motor units per muscle and per participant in two human muscles in vivo, yielding extensive samples of motor units that are representative of the entire motoneuron pools (Caillet et al., 2023a).’
  
  Furthermore, using four HDsEMG arrays also raises questions about how some channels were placed over non-target muscles, and if motor units were decomposed from surrounding synergists.
  
  A factor that guided our muscle choice was the low risk of crosstalk. For this, we verified with ultrasound that our arrays of 256 electrodes only covered the muscle of interest, staying away from the neighbouring muscles. This was possible as superficial muscles from the leg are bulkier than those from the upper limb. Given the small diameter of each electrode (2 mm), it is unlikely that the motor units from the neighbouring muscles were in the recorded muscle volume.
  
  (3) The authors claim (Abstract L51; Discussion L376) that a commonly held view in the field is that rate coding is similar across motor units from the same pool. Perhaps this is in reference to some studies that have carefully assessed lower threshold motor units during lower force ramp contractions (e.g., Fuglevand et al., 2015; Revill and Fuglevand, 2017). However, a more complete integration of the literature exploring motor unit firing rate responses during rapid isometric contractions, comparing different muscles and contraction intensities would be helpful. From Figure 3, the range of rate coding in the tibialis anterior (~7-40 Hz) is greater than the vastus lateralis (~5-22 Hz) muscle across contraction levels. In agreement with other studies, the range of rate coding within some muscles is different than others (Kirk et al., 2021) and during maximal intensity (Bellemare et al., 1983) or rapid contractions (Desmedt and Godaux, 1978). Likewise, within a motor pool, there is a diversity of firing rate responses across motor units of different sizes as a function of isometric force (Monster and Chan, 1977; Desmedt and Godaux, 1977; Kukula and Clamann, 1981; Del Vecchio et al., 2019; Marshall et al., 2022). A strength of this paper is how firing rate responses are quantified across a wide range of motor unit recruitment thresholds and between two muscles. I suggest improving clarity for the general reader, especially in the motivation for testing two lower limb muscles, and elaborating on some of the functional implications.
  
  We thank the reviewer for his input on this question. We have added references to these works and lines of research in the discussion:
  
  (P18; L449): ‘In addition, rate coding patterns should also vary with the pattern of contractions, with fast contractions lowering the range of recruitment thresholds within motoneuron pools (Desmedt and Godaux, 1977b, 1979; van Bolhuis et al., 1997). The variability in rate coding observed here between motor units from the same pool could lead to small deviations from the size principle sometimes observed between pairs of units during isometric contractions with various patterns of force (Desmedt and Godaux, 1979; Marshall et al., 2022) or during the derecruitment phase (Bracklein et al., 2022).’
  
  (P19; L487): ‘However, other muscles that serve different functions within the human body, such as muscles from the face, have different rate coding characteristics with much higher firing rates (Kirk et al., 2021). Future work should investigate those muscles and other to reveal the myriads of rate coding strategies in human muscles.’
  
  In addition to the responses above, we have added a section at the beginning of the results to motivate the choice of the muscles (P6; L137):
  
  ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  This is an interesting manuscript that uses state-of-the-art experimental and simulation approaches to quantify motor unit discharge patterns in the human TA and VL. The non-linear profiles of motor unit discharge were calculated and found to have an initial acceleration phase followed by an attenuation phase. Lower threshold motor units had a larger gain of the initial acceleration whereas the higher threshold motor unit had a higher gain in the attenuation phase. These data represent a technical feat and are important for understanding how humans generate and control voluntary force.
  
  Strengths:
  
  The authors used rigorous, state-of-the-art analyses to decompose and validate their motor unit data during a wide range of voluntary efforts.
  
  The analyses are clearly presented, applied, and visualized.
  
  The supplemental data provides important transparency.
  
  We thank the reviewer for their positive appreciation of our work.
  
  Weaknesses:
  
  The number of participants and muscles tested are quite small - particularly given the constraints on yield. It is unclear if this will translate to other motor pools. The justification for TA and VL should be provided.
  
  One strength of our study is to provide relations between key parameters of rate coding (acceleration in firing rate, increase in firing rate, hysteresis) and the recruitment thresholds of motor units within two different pools, and for each individual participant. These relations were consistent across all the participants (Figures 2 to 4), making us confident that increasing the sample size would not change the conclusions of the study.
  
  It is likely that the differences observed here between the VL and TA will also appear between other muscles of the leg, due to differences in the arrays of excitatory and inhibitory inputs they receive, the pattern of inhibitory inputs during increases in force (recurrent/reciprocal inhibition), and different levels of neuromodulation (Johnson et al., 2017, J Neurophysiol; Beauchamp et al., 2023, J Neural Eng). We have added a paragraph in the results to motivate our choice of muscles (P6; L137):
  
  ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.
  
  While an impressive effort was made to identify and track motor units across a range of contractions, it appears that a substantial portion of muscle force was not identified. Though high-intensity contractions are challenging to decompose - the authors are commended for their technical ability to record population motor unit discharge times with recruitment thresholds up to 75% of a participant's maximal voluntary contractions. However previous groups have seen substantial recruitment of motor units above 80% and even 90% maximum activation in the soleus. Given the innervation ratios of higher threshold motor units, if recruitment continued to 100%, the top quartile would likely represent a substantial portion of the traditional fast-fatigable motor units. It would be highly interesting to understand the recruitment and rate coding of the highest threshold motor units, at a minimum I would suggest using terms other than "entire range" or "full spectrum of recruitment thresholds"
  
  Motor units were indeed identified between 0 and 80% of the maximal force in this study. This is due to the requirements of the decomposition algorithm, which needs sustained and stable contractions to converge toward a set of separation vectors that generate sparse spike trains. However, our participants could not sustain contractions above 80% MVC without generating fatigue.
  
  However, it is important to note that only a few motor units are recruited above 80% of the maximal force in the TA (Van Cutsem et al., 1998, J Physiol), as well as in other muscles of the lower limb (Oya et al., 2009, J Physiol; Aeles et al., 2020, J Neurophysiol). Thus, we may have only missed a few motor units recruited above 80% of the maximal force. Nevertheless, we removed the terms ‘full spectrum of recruitment thresholds’ and ‘entire range’ from the manuscript to now read ‘most of the spectrum of recruitment thresholds observed in humans’.
  
  The quantification of hysteresis using torque appears to make self-evident the observation that lower threshold motor units demonstrate less hysteresis with respect to torque. If there is motor unit discharge there will be force. I believe this limitation goes beyond the floor effects discussed in the manuscript. Traditionally, individuals have used the discharge of a lower threshold unit as the measure on which to apply hysteresis analyses to infer ion channel function in human spinal motoneurons.
  
  We agree with the reviewer that the hysteresis is classically estimated using the firing rate of a ‘reporter unit’ with the delta F method (introduced in humans by Gorassini et al.), or, more recently with the advances in motor unit identification, using the cumulative spike train of the identified motor units. Researchers use these data as a proxy of the synaptic drive, and compare their values at the recruitment and derecruitment thresholds of the ‘test unit’.
  
  As mentioned above in response to reviewer 1, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did here to generate force/firing rate relations on the widest range of force. This limitation is now highlighted in the discussion section (P19; L470): ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020).’.
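  
  For readers unfamiliar with the delta F method mentioned here, the following sketch shows the core calculation: the smoothed firing rate of a low-threshold reporter unit is read at the recruitment and derecruitment times of a higher-threshold test unit, and the difference is taken. It is a simplified illustration, not the analysis used in this study; the function names rate_at and delta_f and the triangular rate profile are hypothetical.

```python
def rate_at(times, rates, t):
    """Linearly interpolate a smoothed firing-rate profile at time t."""
    points = list(zip(times, rates))
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return r0 + (r1 - r0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the recorded profile")

def delta_f(times, reporter_rates, test_recruit_t, test_derecruit_t):
    """Reporter-unit rate at test-unit recruitment minus its rate at derecruitment."""
    at_recruitment = rate_at(times, reporter_rates, test_recruit_t)
    at_derecruitment = rate_at(times, reporter_rates, test_derecruit_t)
    return at_recruitment - at_derecruitment

# Hypothetical reporter unit: 5 -> 15 -> 5 spikes/s over a 10 s triangular ramp.
times = [0.0, 5.0, 10.0]
reporter_rates = [5.0, 15.0, 5.0]

# Hypothetical test unit recruited at 3 s and derecruited at 8.5 s.
result = delta_f(times, reporter_rates, 3.0, 8.5)
```

  A positive value means the test unit kept discharging at a lower estimated synaptic drive than was needed to recruit it, which is the hysteresis the method is designed to capture.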
  
  The main findings are not entirely novel. See Monster and Chan 1977 and Kanosue et al 1979.
  
  We agree with the reviewer that the results of the paper are remarkably aligned with previous experimental findings in humans, in animals, or with in vitro and in silico models. However, we believe that our study shows in humans the incredible variety of rate coding patterns within a pool of motor units that span most of the spectrum of recruitment thresholds observed in humans. It also highlights the variability of rate coding patterns between motor neurons that have a similar recruitment threshold. Finally, we observe differences between pools of motor neurons innervating two different muscles in the lower limb, mirroring what has been done in the past in upper limb muscles.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  The wording 'decode' across the manuscript may sound somewhat unsuitable for the context, because 'decode' would involve interpreting the signals and activities to understand how they relate to specific variables or proxies of behavior. Here in this study it does not necessarily involve the interpretation, but sounds to be used for decomposing the signal into the constituent motor units. As such, it might be appropriate to use other words such as decompose, read out, or extract.
  
  ‘Decode’ was removed from the manuscript to now read motor unit ‘identification’.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Figures 1 and 2 are informative and interesting. Figures 3 and 4 are harder to interpret. For example, in Figure 4, data plotted along the diagonal is overplotted and not as informative.
  
  For the sake of clarity, we separated the lines of the fits and the scatter plots in the right panels of Figure 3. In Figure 4, we removed the scatter plots and only reported the lines of the fits for each participant.
  
  Do you think the different durations of the isometric plateau across contraction intensities influenced motor unit derecruitment? Longer duration in lower threshold motor units would have resulted in a larger effect of PICs?
  
  We did not find an effect of the duration of the plateau on the derecruitment threshold. Notably, a computational study found that the duration of the plateau may impact the delta F, due to the combination of PICs, spike threshold accommodation and spike frequency adaptation (Revill & Fuglevand, 2011, J Neurophysiol). However, we did not use the delta F value here to estimate the effect of PICs on the hysteresis.
  
  L703. For the measure of firing rate hysteresis the difference between recruitment and derecruitment was calculated, but why not use the delta-F method? This is more commonly used to assess hysteresis as a rough estimate of intrinsic dynamics.
  
  As further discussed above, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did here to generate force/firing rate relations on the widest range of force.
  
  This was mentioned in the discussion (P19; L470):
  
  ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020).’
  
  L144. The standard deviation seems high. Some participants had fewer than 20 motor units and your number of participants per muscle was eight, could you state the complete range?
  
  A table was added in the results section to indicate the yields of the decomposition per contraction.
  
  If other studies are able to randomly sample motor units with intramuscular electrodes does this also represent an estimate of rate coding from the 'entire' pool? One criticism of HDsEMG arrays is that they are biased towards decomposing superficial larger motor units and in the male sex.
  
  The decomposition of EMG signals recorded with arrays of surface electrodes is indeed biased toward the identification of motor units with the larger action potentials in the signal (large and superficial; Farina & Holobar, 2016, Proceedings of the IEEE). We took advantage of the latter limitation by performing successive contractions at different levels of force with the objective to identify the last recruited motor units (larger units according to the size principle), while tracking the smaller ones. In that way, we were able to sequentially identify motor units recruited from 0% to 75% of the maximal force. A similar approach could be applied to selective intramuscular electrodes. However, because identifying motor units up to maximal force requires a highly selective pair of fine wires or needle electrodes, the procedure described above should be repeated hundreds of times to reach the same samples as those obtained in our study.
  
  L151-161. The ratio between simulated and decomposed surface EMG reached 55% for the TA and 70% for the VL. How does this provide support that the "entire" MU pool was sampled?
  
  As said above, we do not identify all the motor units during each contraction, but rather the larger ones with the larger action potentials within the EMG signals. However, we used here a sequential approach to identify new motor units during each trial while tracking smaller units. In that way, we were able to sequentially identify on average 130 motor units per muscle.
  
  To avoid any confusion, we removed the references to ‘entire’ pools in the manuscript.
  
  L266. How is it possible that in some participants no motor units were recruited below 5% of MVC? Do the authors suspect they produced force from synergist muscles or that the decomposition failed to identify these presumably smaller and deeper motor units?
  
  This mostly results from the limitations of the decomposition algorithm. In these participants, it is likely that the decomposition was biased toward motor units only active during the plateau of force or recruited at the end of the ramp.
  
  Figure 2B. Do the higher threshold motor units with linear responses receive more inhibitory input (coactivation) or are devoid of large PIC effects?
  
  Were antagonist muscles recorded? During higher contraction intensities, greater antagonist coactivation in some trials or participants may have linearized the firing rate profiles (e.g., Revill and Fuglevand, 2017).
  
  L427. This is a neat finding that higher threshold motor units are less likely to have the functional hallmark of a strong PIC effect and may therefore be more representative of extrinsic inputs. Could this be an advantage to increase the precision of stronger contractions or reduce the fatigability of muscle fibres during repeated strong contractions?
  
  Synaptic contacts with Renshaw cells (Fyffe, 1991, J Neurophysiol) and Ia inhibitory interneurons (Heckman & Binder, 1991, J Neurophysiol) are widespread within pools of motor units, which induces homogeneously distributed inhibitory inputs. However, the amplitude of these inhibitory inputs can increase with muscle force. We found that the EMG amplitude of the soleus and the gastrocnemius medialis recorded with bipolar EMG during dorsiflexion increased with force. Therefore, the greater inhibition at higher force may also contribute to the linearisation of the force/firing rate relations observed with high-threshold motor neurons, as suggested by Revill and Fuglevand (2017, J Physiol).
  
  We discussed this point in the new version of the manuscript (P17; L415):
  
  ‘The level of recurrent and reciprocal inhibition has also probably increased with the increase in force during the ramp up, progressively blunting the effect of persistent inward currents for late-recruited motor units (Kuo et al., 2003; Hyngstrom et al., 2007; Revill and Fuglevand, 2017). This may also explain the larger percentage of high-threshold motor units with a linear fit for the firing rate/force relation (Figure 2), as the integration of larger inhibitory inputs should linearise the firing rate/force relation (Revill and Fuglevand, 2017).’.
  
  In Figure 2B, it makes sense that linear firing rate responses occur later in the ramp contraction when myotendinous slack is lower. Do the authors think contractile dynamics are matched to the firing rate profiles?
  
  To our knowledge, there is no direct data on the link between the linearity of the force/firing rate relation and the stiffness of the tendon. A recent study by Mazzo et al. (2021, J Physiol) has shown that repeated stretches of calf muscles, which induce a decrease in their stiffness, induced an increase in motor unit firing rate at low levels of force. This indicates that the contractile properties of the muscle may also impact the profile of rate coding when considered as a function of force.
  
  We added this point in the discussion (P20; L512):
  
  ‘On a different note, the steep increase in firing rate over the first percentages of the ramp-up may also enable the motor units to produce the required level of force despite having a more compliant muscle-tendon unit (Mazzo et al., 2021).’
  
  L371. It is likely that Marshall et al., 2022, recorded over 100 unique motor units from the same animal.
  
  The reviewer is right that Marshall et al. may have identified hundreds of motor units across sessions in one non-human primate. However, there is no way to verify this statement as they used fine wire electrodes inserted in different locations in each session, which made it impossible to verify the uniqueness of each identified unit. Conversely, we verified in our study that all the motor units were unique using the distribution of their surface action potentials across the 256 surface electrodes.
  
  L378. What do the authors mean by "rate coding is similar"? I find this statement confusing. Is this regarding the absolute firing rate range, response to force increases, hysteresis, or how they scale with contraction intensity?
  
  This statement was removed from the discussion to avoid any confusion.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The authors may want to consider other mechanisms of the linearization of discharge rates of medium and high threshold motor units. Monica's work may suggest that, over time, there is a subthreshold activation of the PIC, which serves to linearize the eventual suprathreshold activation underlying repetitive discharge. Additionally, Andy has shown that inhibitory drive from cutaneous inputs can linearize the initial acceleration of low threshold motor units - cutaneous inputs, or even Ib inputs, may be greater later in the contraction and serve to linearize discharge rates.
  
  We thank the reviewer for their input on the discussion, where we now discuss this point:
  
  ‘The level of recurrent and reciprocal inhibition has also probably increased with the increase in force during the ramp up, progressively blunting the effect of persistent inward currents for late-recruited motor units (Kuo et al., 2003; Hyngstrom et al., 2007; Revill and Fuglevand, 2017). This may also explain the larger percentage of high-threshold motor units with a linear fit for the firing rate/force relation (Figure 2), as the integration of larger inhibitory inputs should linearise the firing rate/force relation (Revill and Fuglevand, 2017).’
  
  Lines 433 - intrinsic properties, in particular the afterhyperpolarization, will likely influence maximal discharge rate and provide a ceiling to the change in firing rate.
  
  This point is now discussed in the draft (P17; L428):
  
  ‘This difference may be explained by smaller excitatory synaptic inputs onto low- than high-threshold motoneurons (Powers and Binder, 2001; Heckman and Enoka, 2012), lower synaptic driving potential of the dendritic membrane (Powers and Binder, 2000; Cushing et al., 2005; Fuglevand et al., 2015), and longer and larger afterhyperpolarisation phase in low- than high-threshold motoneurons (Bakels and Kernell, 1993; Gardiner, 1993; Deardorff et al., 2013; Caillet et al., 2022).’
  
  The actual yield per contraction is not entirely clear. Figure S2 is quite nice in this regard, but a table with this and other information on it may be helpful. This would help with the beginning of the abstract and discussion when it is stated that, on average over 100 motor units were identified per person.
  
  We added a table in the results to give the number of motor units identified per contraction.
  
  Are the thin film units represented in S2 and S3?
  
  Only motor units identified from signals recorded with arrays of surface electrodes are presented in figures S2 and S3.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.25.568607v2
www.biorxiv.org

Reorganization of the Flagellum Scaffolding Induces a Sperm Standstill During Fertilization

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This important work advances our understanding of sperm motility regulation during fertilization by uncovering the midpiece/mitochondria contraction associated with motility cessation and structural changes in the midpiece actin network as its mode of action. The evidence supporting the conclusion is solid, with rigorous live cell imaging using state-of-the-art microscopy, although more functional analysis of the midpiece/mitochondria contraction would have further strengthened the study. The work will be of broad interest to cell biologists working on the cytoskeleton, mitochondria, cell fusion, and fertilization.
  
  Strengths:
  
  The authors demonstrate that structural changes in the flagellar midpiece F-actin network are concomitant to midpiece/mitochondrial contraction and motility arrest during sperm-egg fusion by rigorous live cell imaging using state-of-the-art microscopy.
  
  Response P1.1: We thank the reviewer for her/his positive assessment of our manuscript.
  
  Weaknesses:
  
  Many interesting observations are listed as correlated or in time series but do not necessarily demonstrate the causality and it remains to be further tested whether the sperm undergoing midpiece contraction are those that fertilize or those that are not selected. Further elaboration of the function of the midpiece contraction associated with motility cessation (a major key discovery of the manuscript) would benefit from a more mechanistic study.
  
  Response P1.2: We thank the reviewer for this point. We have toned down some of our statements since some of the observations are indeed temporal correlations. We will explore some of these possible connections in future experiments. In addition, we have now incorporated additional experiments and possible explanations about the function of the midpiece contraction.
  
  Reviewer #2 (Public Review):
  
  (1) The authors used various microscopy techniques, including super-resolution microscopy, to observe the changes that occur in the midpiece of mouse sperm flagella. Previously, it was shown that actin filaments form a double helix in the midpiece. This study reveals that the structure of these actin filaments changes after the acrosome reaction and before sperm-egg fusion, resulting in a thinner midpiece. Furthermore, by combining midpiece structure observation with calcium imaging, the authors show that changes in intracellular calcium concentrations precede structural changes in the midpiece. The cessation of sperm motility by these changes may be important for fusion with the egg. Elucidation of the structural changes in the midpiece could lead to a better understanding of fertilization and the etiology of male infertility. The conclusions of this manuscript are largely supported by the data, but there are several areas for improvement in data analysis and interpretation. Please see the major points below.
  
  Response P2.1: We thank the reviewer for the positive comments.
  
  (2) It is unclear whether an increased FM4-64 signal in the midpiece precedes the arrest of sperm motility. This needs to be clarified to argue that structural changes in the midpiece cause sperm motility arrest. The authors should analyze changes in both motility and FM4-64 signal over time for individual sperm.
  
  Response P2.2: We have conducted single-cell experiments tracking both FM4-64 and motility, as the reviewer suggested (Supplementary Fig S1). In all cases, cells gradually decreased their beating frequency and increased FM4-64 fluorescence in the midpiece until complete motility arrest was observed. A representative example is shown in this figure, and we will reinforce this concept in the results section.
  
  (3) It is possible that sperm stop moving because they die. Figure 1G shows that the FM4-64 signal is increased in the midpiece of immotile sperm, but it is necessary to show that the FM4-64 signal is increased in sperm that are not dead and retain plasma membrane integrity by checking sperm viability with propidium iodide or other means.
  
  Response P2.3: This is a very good point. In our experiments, we always considered sperm that were motile to hypothesize about the relevance of this observation. We have two types of experiments:
  
  (1) Sperm-egg fusion: In experiments where sperm and eggs were imaged to observe their fusion, sperm were initially moving and, after fusion, the midpiece contraction (an increase in FM4-64 fluorescence) was observed, indicating that the change in the midpiece (which was observed consistently in all fusing cells analyzed) is part of the process.
  
  (2) Sperm that underwent acrosomal exocytosis (AE): we have observed two behaviours as shown in Figure 1:
  
  a) Sperm that underwent AE and remained motile without midpiece contraction (these are certainly alive);
  
  b) Sperm that underwent AE and stopped moving with an increase in FM4-64 fluorescence. We propose that this contraction during AE is not desired because it will impede sperm from moving forward to the fertilization site when they are in the female reproductive tract. In this case, we acknowledge that the cessation of sperm motility may be attributed to cellular death, potentially correlating with the increased FM4-64 signal observed in the midpiece of immotile sperm that have undergone AE. To address this hypothesis, we conducted image-based flow cytometry experiments, which are well-suited for assessing cellular heterogeneity within large populations.
  
  Author response image 1 illustrates the relationship between cell death and spontaneous AE in non-capacitated mouse sperm, where intact acrosomes are marked by EGFP. Cell death was evaluated using Sytox Blue staining, a dye that is impermeable to live cells and shows affinity for DNA. AE was assessed by the absence of EGFP in the acrosome.
  
  Author response image 1a indicates a lack of correlation between Sytox and EGFP fluorescence. Two populations of sperm with EGFP signals were found (EGFP+ and EGFP-), each showing a broad distribution of Sytox signal, enabling the distinction between cells that retain plasma membrane integrity (live sperm: Sytox-) and those with compromised membranes (dead cells: Sytox+). The observed bimodal distribution of EGFP signal, regardless of live versus dead cell populations, indicates that the fenestration of the plasma membrane known to occur during AE is a regulated process that does not necessarily compromise the overall plasma membrane integrity.
  
  These observations are reinforced by the single-cell examples in Author response image 1b, where we were able to identify sperm in four categories: live sperm with intact acrosome (EGFP+/Sytox-), live sperm with acrosomal exocytosis (EGFP-/Sytox-), dead sperm with intact acrosome (EGFP+/Sytox+), and dead sperm with AE (EGFP-/Sytox+). Note the case of AE (lacking EGFP signal) which bears an intact plasma membrane (lacking Sytox Blue signal). Author response image 2 shows single-cell examples of the four categories observed with confocal microscopy to reinforce the observations from Author response image 1a.
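
  The four categories described above amount to a two-threshold gate on the EGFP and Sytox Blue signals. A minimal sketch of that logic follows; the function name `classify_sperm` and both threshold parameters are hypothetical, since the response does not state the gate values used in the ImageStream analysis.

```python
def classify_sperm(egfp, sytox, egfp_threshold, sytox_threshold):
    """Assign one event to a viability/acrosome category.

    egfp, sytox: fluorescence intensities (arbitrary units).
    egfp_threshold, sytox_threshold: hypothetical gate positions; the
    actual gates are not given in the response.
    """
    viability = "dead" if sytox >= sytox_threshold else "live"
    acrosome = "intact acrosome" if egfp >= egfp_threshold else "AE"
    return viability, acrosome


# Example with made-up intensities and gates, for illustration only.
print(classify_sperm(egfp=5.0, sytox=100.0, egfp_threshold=50.0, sytox_threshold=200.0))
```

  With these illustrative inputs the event is low in both channels, which corresponds to the live sperm with AE (EGFP-/Sytox-) case highlighted in the text.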
  
  Author response image 1.
  
  Image-based flow cytometry analysis (ImageStream Merk II) of non-capacitated mouse sperm, showing the distribution of EGFP signal (acrosome integrity) against Sytox Blue staining (cell viability). (A) The quadrants show: Sytox Blue + / EGFP low (17.6%), Sytox Blue + / EGFP high (40.1%), Sytox Blue - / EGFP high (20.2%), and Sytox Blue - / EGFP low (21.7%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented in a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A). The top row displays sperm with compromised plasma membrane integrity (Sytox Blue +), showing low (left) and high (right) EGFP signals. The bottom row shows sperm with intact plasma membrane (Sytox Blue -), displaying high (left) and low (right) EGFP signal. It is worth noting that when analyzing the percentages in (A), we observed that the data also encompass a population of headless flagella, which was present in all observed categories. Therefore, the percentages should be interpreted with caution.
  
  Author response image 2.
  
  Confocal Microscopy Examples of AE and cell viability. The top row features sperm with compromised plasma membrane integrity (Sytox Blue +) and high EGFP expression; the second row displays sperm with compromised membrane and low EGFP expression; the third row illustrates sperm with intact membrane (Sytox Blue -) and high EGFP expression; the bottom row shows sperm with intact membrane and low EGFP expression.
  
  Author response images 3-5 provide insight into the relationship between FM4-64 and Sytox Blue fluorescence intensities in non-capacitated sperm (CTRL, Author response image 3), capacitated sperm and acrosome exocytosis events stimulated with 100 µM progesterone (PG, Author response image 4), and capacitated sperm stimulated with 20 µM ionomycin (IONO, Author response image 5). Two populations of sperm with Sytox Blue signals were clearly distinguished (Sytox+ and Sytox-), enabling the discernment between live and dead sperm. Interestingly, the upper right panels of Author response images 3A, 4A, and 5A (Sytox Blue+ / FM4-64 high) consistently show a positive correlation between FM4-64 and Sytox Blue. This observation aligns with the concern raised by Reviewer 2, suggesting that compromised membranes due to cell death provide more binding sites for FM4-64.
  
  Nonetheless, the lower panels of Author response images 3A, 4A and 5A (Sytox Blue-) show no correlation with FM4-64 fluorescence, indicating that this population can exhibit either low or high FM4-64 fluorescence. As expected, in stark contrast with the CTRL case, the stimulation of AE with PG or IONO in capacitated sperm increased the population of live sperm with high FM4-64 fluorescence (Sytox Blue- / FM4-64 high: CTRL: 7.85%, PG: 8.73%, IONO: 13.5%).
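
  One way to read these percentages is to normalise the Sytox Blue- / FM4-64 high quadrant by the total Sytox Blue- population in each condition. The sketch below does that with the quadrant values copied from the captions of Author response images 3 to 5; the normalisation itself and the names `QUADRANTS` and `live_high_fraction` are assumptions made for illustration and are not part of the authors' analysis. The caveat about headless flagella raised for Author response image 1 may apply here as well.

```python
# Quadrant percentages (% of all events) from the captions of
# Author response images 3 (CTRL), 4 (PG) and 5 (IONO).
QUADRANTS = {
    "CTRL": {"dead_low": 13.3, "dead_high": 49.8, "live_low": 28.1, "live_high": 7.85},
    "PG": {"dead_low": 9.04, "dead_high": 61.6, "live_low": 19.7, "live_high": 8.73},
    "IONO": {"dead_low": 4.52, "dead_high": 60.6, "live_low": 20.5, "live_high": 13.5},
}


def live_high_fraction(quadrants):
    """Fraction of Sytox Blue- (live) events that are FM4-64 high."""
    live_total = quadrants["live_low"] + quadrants["live_high"]
    return quadrants["live_high"] / live_total


fractions = {name: live_high_fraction(q) for name, q in QUADRANTS.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of live events are FM4-64 high")
```

  Under this normalisation the ordering CTRL < PG < IONO is preserved, which is consistent with the trend described in the text.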
  
  Single-cell examples are shown in Author response images 3B, 4B, and 5B, where the four categories are represented: dead sperm with low FM4-64 fluorescence (Sytox Blue+ / FM4-64 low), dead sperm with high FM4-64 fluorescence (Sytox Blue+ / FM4-64 high), live sperm with low FM4-64 fluorescence (Sytox Blue- / FM4-64 low), and live sperm with high FM4-64 fluorescence (Sytox Blue- / FM4-64 high).
  
  Author response image 3.
  
  Relationship between cell death and FM4-64 fluorescence in capacitated sperm without an inducer of AE. Image-based flow cytometry analysis of non-capacitated mouse sperm loaded with FM4-64 and Sytox Blue dyes, with one and two minutes of incubation time, respectively. (A) The quadrants show: Sytox Blue+ / FM4-64 low (13.3%), Sytox Blue+ / FM4-64 high (49.8%), Sytox Blue- / FM4-64 low (28.1%), and Sytox Blue- / FM4-64 high (7.85%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented on a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A).
  
  Author response image 4.
  
  Relationship between cell death and FM4-64 fluorescence in capacitated sperm stimulated with progesterone. Image-based flow cytometry analysis of non-capacitated mouse sperm loaded with FM4-64 and Sytox Blue dyes, with one and two minutes of incubation time, respectively. (A) The quadrants show: Sytox Blue+ / FM4-64 low (9.04%), Sytox Blue+ / FM4-64 high (61.6%), Sytox Blue- / FM4-64 low (19.7%), and Sytox Blue- / FM4-64 high (8.73%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented on a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A).
  
  Author response image 5.
  
  Relationship between cell death and FM4-64 fluorescence in capacitated sperm stimulated with ionomycin. Image-based flow cytometry analysis of non-capacitated mouse sperm loaded with FM4-64 and Sytox Blue dyes, with one and two minutes of incubation time, respectively. (A) The quadrants show: Sytox Blue+ / FM4-64 low (4.52%), Sytox Blue+ / FM4-64 high (60.6%), Sytox Blue- / FM4-64 low (20.5%), and Sytox Blue- / FM4-64 high (13.5%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented on a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A).
  
  Based on the data presented in Author response images 1 to 5, we derive the following conclusions:
  
  (1) There is no direct relationship between cell death (Sytox Blue) and AE (EGFP) (Author response images 1 and 2).
  
  (2) There is bistability in the FM4-64 fluorescence intensity. Below a certain threshold, there is no correlation between FM4-64 and Sytox Blue signals, indicating no cell death. Above this threshold, however, the FM4-64 signal becomes correlated with Sytox Blue+ cells, indicating cell death (Author response images 3-5).
  
  (3) The Sytox Blue- population of capacitated sperm is sensitive to AE stimulation with progesterone, leading to the expected increase in FM4-64 fluorescence.
  
  Therefore, because the FM4-64 signal alone is not a definitive marker for either AE or cell death, it is crucial to use additional viability assessments, such as Sytox Blue, to accurately differentiate between live and dead sperm in studies of acrosome exocytosis and sperm motility. In the present work, we did not use a cell viability marker due to the complex multicolor, multidimensional fluorescence experiments. However, cell viability was always considered, as any imaged sperm was chosen based on motility, indicated by a beating flagellum. The determination of whether selected sperm die during or after AE remains to be elucidated. The results presented in Figure 2 and Supplementary S1 show examples of motile sperm that experience an increase in FM4-64 fluorescence.
  
  All this information is added to the manuscript (Supplementary Figure 1D).
  
  (4) It is unclear how the structural change in the midpiece causes the entire sperm flagellum, including the principal piece, to stop moving. It will be easier for readers to understand if the authors discuss possible mechanisms.
  
  Response P2.4: As requested, we have incorporated a possible explanation in the discussion section (see lines 644-656). We propose three possible hypotheses for the cessation of sperm motility, which can be attributed to the simultaneous occurrence of various events:
  
  (1) Rapid increase in [Ca2+]i levels: A rapid increase in [Ca2+]i levels may trigger the activation of Ca2+ pumps within the flagellum. This process consumes local ATP levels, disrupting glycolysis and thereby depleting the energy required for motility.
  
  (2) Reorganization of the actin cytoskeleton: Alterations in the actin cytoskeleton can lead to changes in the mechanical properties of the flagellum, impacting its ability to move effectively.
  
  (3) Midpiece contraction: Contraction in the midpiece region can potentially interfere with mitochondrial function, impeding the energy production necessary for sustained motility.
  
  (5) The mitochondrial sheath and cell membrane are very close together when observed by transmission electron microscopy. The image in Figure 9A with the large space between the plasma membrane and mitochondria is misleading and should be corrected. The authors state that the distance between the plasma membrane and mitochondria approaches about 100 nm after the acrosome reaction (Line 330 - Line 333), but this is a very long distance and large structural changes may occur in the midpiece. Was there any change in the mitochondria themselves when they were observed with the DsRed2 signal?
  
  Response P2.5: The authors appreciate the reviewer’s observation regarding the need to correct the image in Figure 9A, as the original depiction conveys a misleading representation of the spatial relationship between the mitochondrial sheath and the plasma membrane. This figure has been corrected to accurately reflect a more realistic proximity, while keeping in mind that it is a cartoonish representation.
  
  Regarding the comments about the distances mentioned between former lines 330 and 333, the measurement was not intended to describe the gap between the plasma membrane and the mitochondria but rather the distance between F-actin and the plasma membrane.
  
  Author response image 6 shows high-resolution scanning electron microscopy (SEM) of two sperm fixed with a protocol tailored to preserve plasma membranes (ref), where the insets clearly show the flagellar architecture in the midpiece with an intact plasma membrane covering the mitochondrial network. A non-capacitated sperm with an intact acrosome is shown in panel A, and a capacitated sperm that has experienced AE is shown in panel B.
  
  Notably, the results depicted in Author response image 6 demonstrate that, irrespective of the AE status, the distance between the plasma membrane and mitochondria consistently remains less than 20 nm, thus confirming the close proximity of these structures in both physiological states. As Reviewer 2 pointed out, if there is no significant difference in the distance between the plasma membrane and mitochondria, then the observed structural changes in the actin network within the midpiece should somehow alter the actual arrangement of mitochondria within the midpiece. Figure 5D-F shows that midpiece contraction is associated with a decrease in the helical pitch of the actin network; the distance between turns of the actin helix decreases from l = 248 nm to l = 159 nm. This implies a net change in the number of turns the helix makes per 1 µm, from approximately 4 to 6 µm⁻¹.
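
  The turn densities quoted above can be checked with a short calculation, assuming that the number of turns per unit length n is simply the reciprocal of the helical pitch l (this relation is our reading of the statement and is not given explicitly in the response):

```latex
\[
n = \frac{1}{l}, \qquad
n_{\text{before}} = \frac{1000\ \text{nm}/\mu\text{m}}{248\ \text{nm}} \approx 4.0\ \mu\text{m}^{-1}, \qquad
n_{\text{after}} = \frac{1000\ \text{nm}/\mu\text{m}}{159\ \text{nm}} \approx 6.3\ \mu\text{m}^{-1}.
\]
```

  Both values round to the 4 and 6 turns per micron stated in the text.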
  
  Author response image 6.
  
  SEM image showing the proximity between plasma membrane and mitochondria. Scale bar 100 nm.
  
  Additionally, a structural contraction can be observed in Figure 5D-F, where the radius of the helix decreases by about 50 nm. To clarify this point, we sought to measure the arrangement of individual DsRed2 mitochondria using computational superresolution microscopy (FF-SRM: SRRF and MSSR), Structured Illumination Microscopy (SIM), or a combination of both (SIM + MSSR), in 2D. Author response image 7 shows that these three approaches allow the observation of individual DsRed2 mitochondria; however, the complexity of their 3D arrangement, combined with the limited space between mitochondria (as seen in Author response image 6), precludes a reliable estimation of mitochondrial organization within the midpiece. To overcome these challenges, we decided to study the midpiece architecture via SEM experiments on non-capacitated versus capacitated sperm stimulated with ionomycin to undergo the AE.
  
  Author response image 7.
  
  Organization of mitochondria observed via FF-SRM and SIM. Scale bar 2 µm. F.N: Fluorescence normalized. F: Frequency
  
  Author response image 8 presents a single-cell comparison of the midpiece architecture in non-capacitated (NC) and acrosome-intact (AI) versus acrosome-reacted (AR) sperm, along with measurements of the midpiece diameter throughout its length. Notably, the diameter of the midpiece increases from the base of the head to more distal regions, ranging from 0.45 µm to 1.10 µm (as shown in Author response images 7 and 8). A significant correlation between the diameter of the flagellum and its curvature was observed (Author response image 9), suggesting a reorganization of the midpiece due to shearing forces. This is further exemplified in Author response images 8 and 9, which provide individual examples of this phenomenon.
  
  Author response image 8.
  
  Comparison of the midpiece architecture in acrosome-intact and acrosome-reacted sperm using scanning electron microscopy (SEM).
  
  As expected, the overall diameter of the midpiece in AI sperm was larger than in AR sperm, with measurements of 0.731 ± 0.008 µm for AI and 0.694 ± 0.007 µm for AR (p = 0.013, Kruskal-Wallis test, n > 100, N = 2), as shown in Author response image 10. Additionally, Author response image 7 indicates that the reorganization of the midpiece architecture involves a change in the periodicity of the mitochondrial network, with frequencies shifting from fNC to fEA mitochondria per micron.
  
  Author response image 9.
  
  Comparison of the midpiece architecture in acrosome-intact (A) and acrosome-reacted (B) sperm using scanning electron microscopy (SEM).
  
  Collectively, the structural results presented in Figure 5 and Author response images 6 to 10 demonstrate that the AE involves a comprehensive reorganization of the midpiece, affecting its diameter, pitch, and the organization of both the actin and mitochondrial networks. All this information is now incorporated in the new version of the paper (Figure 2F).
  
  Author response image 10.
  
  Quantification of the midpiece diameter of the sperm flagellum in acrosome-intact and acrosome-reacted sperm analyzed by scanning electron microscopy (SEM). Data are presented as mean ± SEM. A Kruskal-Wallis test was employed, p = 0.013 (AI n = 85, AR n = 72).
  
  (6) In the TG sperm used, the green fluorescence of the acrosome disappears when sperm die. Figure 1C should be analyzed only with live sperm by checking viability with propidium iodide or other means.
  
  Response P2.6: We concur with Reviewer 2 that ideally, any experiment conducted for this study should include an intrinsic cell viability test. However, the current research employs a wide array of multidimensional imaging techniques that are not always compatible with, or might be suboptimal for, simultaneous viability assessments. In agreement with the reviewer's concerns, it is recognized that the data presented in Figure 1C may inherently be biased due to cell death. Nonetheless, Author response image 1 demonstrates that the relationship between AE and cell death is more complex than a straightforward all-or-nothing scenario. Specifically, Author response image 1C illustrates a case where the plasma membrane is compromised (Sytox Blue+) yet maintains acrosomal integrity (EGFP+). This observation contradicts Reviewer 2's assertion that "the green fluorescence of the acrosome disappears when sperm die," as discussed more comprehensively in response P2.3.
  
  In light of these observations, we have meticulously revisited the entire manuscript to address and clarify potential biases in our results due to cell death. Consequently, Author response image 5 and its detailed description have been incorporated into the supplementary material of the manuscript to contribute to the transparency and reliability of our findings.
  
  Reviewer #3 (Public Review):
  
  (1) While progressive and also hyperactivated motility are required for sperm to reach the site of fertilization and to penetrate the oocyte's outer vestments, during fusion with the oocyte's plasma membrane it has been observed that sperm motility ceases. Identifying the underlying molecular mechanisms would provide novel insights into a crucial but mostly overlooked physiological change during the sperm's life cycle. In this publication, the authors aim to provide evidence that the helical actin structure surrounding the sperm mitochondria in the midpiece plays a role in regulating sperm motility, specifically the motility arrest during sperm fusion but also during earlier cessation of motility in a subpopulation of sperm post acrosomal exocytosis. The main observation the authors make is that a subpopulation of sperm undergoing acrosomal exocytosis and sperm that fuse with the plasma membrane of the oocyte display a decrease in midpiece parameter due to a 200 nm shift of the plasma membrane towards the actin helix. The authors show the decrease in midpiece diameter via various microscopy techniques, all based on membrane dyes; bright-field images and other orthogonal approaches like electron microscopy would confirm those observations if true, but are missing. The lack of additional experimental evidence and the fact that the authors simultaneously observe an increase in membrane dye fluorescence suggest that the membrane dyes instead might be internalized and are now staining intracellular membranes, creating a false-positive result. The authors also propose that the midpiece diameter decrease is driven by changes in sperm intracellular Ca2+ and structural changes of the actin helix network. Important controls and additional experiments are needed to prove that the events observed by the authors are causally dependent and not simply a result of sperm cells dying.
  
  Response P3.1: We appreciate the reviewer's observations and critiques. In response, we have expanded our experimental approach to include alternative methodologies such as mathematical modeling and electron microscopy, alongside further fluorescence microscopy studies. This diversified approach aims to mitigate potential interpretation artifacts and substantiate the validity of our observations regarding the contraction of the sperm midpiece. Additionally, we have implemented further control experiments to fortify the credibility and robustness of our findings, ensuring a more comprehensive and reliable set of results.
  
  First, we acknowledge the concerns raised by Reviewer 2 regarding the interpretation of the magnitude of the observed contraction of the sperm flagellum's midpiece (see response P2.5). Specifically, we believe that the assertion that "... there is a decrease in midpiece parameter due to a 200 nm shift of the plasma membrane towards the actin helix" stated by reviewer 3 needs careful examination. We recognize that the fluorescence microscopy data provided might not conclusively support such a substantial shift. Our live cell imaging and superresolution microscopy experiments indicate that there is a significant decrease in the diameter of the sperm flagellum associated with AE. This is supported by colocalization experiments where FM4-64-stained structures (fluorescing upon binding to membranes) are observed moving closer to Sir-Actin-labeled structures (binding to F-actin). Quantitatively, Figure S5 describes the spatial shift between FM4-64 and Sir-Actin signals, narrowing from a range of 140-210 nm to 50-110 nm (considering the 2nd and 3rd quartiles of the distributions). The mean separation distance between both signals changes from 180 nm in AI cells to 70 nm in AR cells, a net shift of 110 nm. This observation suggests caution regarding the claim of a "200 nm shift of the plasma membrane towards the actin cortex."
  
  Moreover, the concerns raised by Reviewer #3 about the potential internalization of membrane dyes, which might create a false-positive result by staining intracellular membranes, offer an alternative mechanism to explain a shift of up to 100 nm. This perspective is also supported by the critique from Reviewer #2 regarding the substantial distance (about 100 nm) between the plasma membrane and mitochondria post-acrosome reaction: “The authors state that the distance between the plasma membrane and mitochondria approaches about 100 nm after the acrosome reaction (…), but this is a very long distance and large structural changes may occur in the midpiece”. These insights have prompted us to refine our methodology and interpretation of the data to ensure a more accurate representation of the underlying biological processes.
  
  Author response image 11 shows a first-principles approach in two spatial dimensions to explore three scenarios where a membrane dye, such as FM4-64, stains structures at and within the midpiece of a sperm flagellum, yet does not result in a net change of diameter. Author response image 11A-C illustrates three theoretical arrangements of fluorescent dyes: Model 1 features two rigid, parallel structures that mimic the plasma membrane surrounding the midpiece of the flagellum. Model 2 builds on Model 1 by incorporating the possibility of dye internalization into structures located near the membrane, suggesting a slightly more complex interaction with nearby membranous intracellular structures. Model 3 represents an extreme scenario where the fluorescent dyes stain both the plasma membrane and internal structures, such as mitochondrial membranes, indicating extensive dye penetration and binding. Author response image 11D-F displays the convolution of the theoretical fluorescent signals from Models 1 to 3 with the theoretical point spread function (PSF) of a fluorescent microscope, represented by a Gaussian-like PSF with a sigma of 19 pixels (approximately 300 nm). This process simulates how each model's fluorescence would manifest under microscopic observation, showing subtle differences in the spatial distribution of fluorescence among the models. Author response image 11G-I reveals the superresolution images obtained through Mean Shift Super Resolution (MSSR) processing of the models depicted in Author response image 11D-F.
  
  By analyzing the three scenarios, it becomes clear that the signals from Models 2 and 3 shift towards the center compared to Model 1, as depicted in Author response image 11J. This shift in fluorescence suggests that the internalization of the dye and its interaction with internal structures might significantly influence the perceived spatial distribution and intensity of fluorescence, thereby impacting the interpretation of structural changes within the midpiece. Consequently, the experimentally observed contraction of up to 100 nm could represent an actual contraction of the sperm flagellum's midpiece, a relocalization of the FM4-64 membrane dyes to internal structures, or a combination of both scenarios.
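  
  The convolution step behind Author response image 11D-F can be illustrated with a minimal one-dimensional sketch. It is not the simulation used for the figure and it does not implement MSSR; the dye positions, relative weights, and the RMS-width readout are assumptions chosen for illustration, while the PSF sigma of 19 pixels (about 300 nm) is taken from the text.

```python
import math

SIGMA_PX = 19             # PSF sigma quoted in the text (about 300 nm)
NM_PER_PX = 300 / 19      # pixel size implied by that statement
N = 401                   # profile length in pixels (assumed)
CENTER = N // 2


def gaussian_kernel(sigma, radius):
    """Normalized, truncated Gaussian used as a stand-in for the PSF."""
    k = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def profile(sources):
    """Build a symmetric cross-section from {offset_px: weight} pairs."""
    p = [0.0] * N
    for offset, weight in sources.items():
        p[CENTER + offset] += weight
        p[CENTER - offset] += weight
    return p


def convolve_same(signal, kernel):
    """Direct convolution that keeps the output the same length as the input."""
    r = len(kernel) // 2
    out = [0.0] * len(signal)
    for i, v in enumerate(signal):
        if v == 0.0:
            continue
        for j, kv in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < len(out):
                out[idx] += v * kv
    return out


def rms_width_nm(image):
    """Intensity-weighted RMS distance from the flagellar axis, in nm."""
    total = sum(image)
    var = sum(v * (i - CENTER) ** 2 for i, v in enumerate(image)) / total
    return math.sqrt(var) * NM_PER_PX


# Hypothetical dye layouts (offsets in pixels from the axis, relative weights).
models = {
    "Model 1": {25: 1.0},                     # plasma membrane only
    "Model 2": {25: 1.0, 20: 0.5},            # plus structures near the membrane
    "Model 3": {25: 1.0, 20: 0.5, 12: 0.5},   # plus deeper internal membranes
}

kernel = gaussian_kernel(SIGMA_PX, 4 * SIGMA_PX)
widths = {
    name: rms_width_nm(convolve_same(profile(src), kernel))
    for name, src in models.items()
}

for name, width in widths.items():
    print(f"{name}: RMS width = {width:.1f} nm")
```

  Under these assumptions the blurred profile becomes narrower from Model 1 to Model 3 even though the outer membrane positions are identical, which is the qualitative effect described above.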
  
  To discern between these possibilities, we implemented a scanning electron microscopy (SEM) approach. The findings presented in Figure 5 and Author response images 7 to 9 conclusively demonstrate that the AE involves a comprehensive reorganization of the midpiece. This reorganization affects its diameter, which changes by approximately 50 nm, as well as the pitch and the organization of both the actin and mitochondrial networks. This data corroborates the structural alterations observed and supports the validity of our interpretations regarding midpiece dynamics during the AE.
  
  Author response image 11.
  
  Modeling three scenarios of midpiece staining with membrane fluorescent dyes.
  
  Secondly, we wish to clarify that in some of our experiments, we have utilized changes in the intensity of FM4-64 fluorescence as an indirect measure of midpiece contraction. This approach is supported by a linear inverse correlation between these variables, as illustrated in Figure S2D. It is important to note that this observation is correlative and indirect; therefore, our data does not directly substantiate the claim that "in a subpopulation of sperm undergoing AE and sperm that fuse with the plasma membrane of the oocyte, there is a decrease in midpiece parameter due to a 200 nm shift of the plasma membrane towards the actin helix". Specifically, we have not directly measured the distance between the plasma membrane and actin cortex in experiments involving gamete fusion.
  
  All the concerns highlighted in this Response P1.1 have been addressed and incorporated into the manuscript. This addition aims to provide comprehensive insight into the experimental observations and methodologies used, ensuring that the data is transparent and accessible for thorough review and replication.
  
  Editor Comment:
  
  As the authors can see from the reviews, the reviewers had quite different degrees of enthusiasm, thus discussed extensively. The major points in consensus are summarized below and it is highly recommended that the authors consider their revisions.
  
  (1) Causality of midpiece contraction with motility arrest is not conclusively supported by the current evidence. Time-resolved imaging of FM4-64 and motility is needed and the working model needs to be revised with two scenarios - whether the sperm contracting indicates a fertilizing sperm or sperm to be degenerated.
  
  (2) The rationale for using FM4-64 as a plasma membrane marker is not clear as it is typically used as an endo-membrane marker, which is also related to the discrepancy of Fluo-4 signal diameter vs. FM4-64 (Figure 4E). The viability of sperm with increased FM4-64 needs to be demonstrated.
  
  (3) The mechanism of midpiece contraction in motility cessation along the whole flagellum is not discussed.
  
  (4) The use of an independent method to support the changes in midpiece diameter/structural changes such as DsRed (transgenic) or TEM.
  
  (5) The claim of Ca2+ change needs to be toned down.
  
  Response Editor: We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. We have addressed all these points in the current version. Briefly,
  
  (1) Time-resolved images showing the correlation between the FM4-64 fluorescence increase and motility were incorporated.
  
  (2) The rationale for using FM4-64 was added.
  
  (3) The mechanism of midpiece contraction was discussed in the paper.
  
  (4) An independent method was included to support our conclusions (SEM and other markers not based on membrane dyes).
  
  (5) The results related to the calcium increase were toned down.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) To claim midpiece actin polymerization/re-organization is required for AE, demonstrating that AE does not occur in the presence of actin depolymerizing drugs (e.g., Latrunculin A, Cytochalasin D) would be necessary since the current data only shows the association/correlation. Was the block of AE by actin depolymerization observed?
  
  Response R1.1: We agree with the reviewer, but unfortunately, since actin polymerization and/or depolymerization in the head are important for exocytosis, we cannot use this experimental approach to dissect both events. Addition of these inhibitors blocks the occurrence of AE (PMID: 12604633).
  
  (2) Please provide the rationale for using FM4-64 to visualize the plasma membrane since it has been reported to selectively stain membranes of vacuolar organelles. What is the principle of increase of FM4-64 dye intensity, other than the correlation with midpiece contraction? For example, in lines 400-402: the authors mentioned that 'some acrosome-reacted moving sperm within the perivitelline space had low FM4-64 fluorescence in the midpiece (Figure 6C). After 20 minutes, these sperm stopped moving and exhibited increased FM4-64 fluorescence, indicating midpiece contraction (Figure 6D).' While recognizing the increase of FM4-64 dye intensity can be an indicator of midpiece contraction, without knowing how and when the intensity of FM4-64 dye changes, it is hard to understand this observation. Please discuss.
  
  Response R1.2: FM4-64 is an amphiphilic styryl fluorescent dye that preferentially binds to the phospholipid components of cell membranes, embedding itself in the lipid bilayer where it interacts with phospholipid head groups. Due to their amphiphilic nature, FM dyes primarily anchor to the outer leaflet of the bilayer, which restricts their internalization. It has been demonstrated that FM4-64 enters cells through endocytic pathways, making these dyes valuable tools for studying endocytosis.
  
  Upon binding, FM4-64's fluorescence intensifies in a more hydrophobic environment that restricts molecular rotation, thus reducing non-radiative energy loss and enhancing fluorescence. These photophysical properties render FM dyes useful for observing membrane fusion events. When present in the extracellular medium, FM dyes rapidly reach a chemical equilibrium and label the plasma membrane in proportion to the availability of binding sites.
  
  In wound healing studies, for instance, the fluorescence of FM4-64 is known to increase at the wound site. This increase is attributed to the repair mechanisms that promote the fusion of intracellular membranes at the site of the wound, leading to a rise in FM4-64 fluorescence. Similarly, an increase in FM4-64 fluorescence has been reported in the heads of both human and mouse sperm, coinciding with AE. In this scenario, the fusion between the plasma membrane and the acrosomal vesicle provides additional binding sites for FM4-64, thus increasing the total fluorescence observed in the head. This dynamic response of FM4-64 makes it an excellent marker for studying these cellular processes in real-time.
  
  This study is the first to report an increase in FM4-64 fluorescence in the midpiece of the sperm flagellum. Figure 5 and Author response images 6 to 9 demonstrate that during the contraction of the sperm flagellum, structural rearrangements occur, including the compaction of the mitochondrial sheath and other membranous structures. Such contraction likely increases the local density of membrane lipids, thereby elevating the local concentration of FM4-64 and enhancing the probability of fluorescence emission. Additionally, changes in the microenvironment such as pH or ionic strength during contraction might further influence FM4-64’s fluorescence properties, as detailed by Smith et al. in the Journal of Membrane Biology (2010). The photophysical behavior of FM4-64, including changes in quantum yield due to tighter membrane packing or alterations in curvature or tension, may also contribute to the increased fluorescence observed. Notably, Figure S2 indicates that other fluorescent dyes like Memglow 700, Bodipy-GM, and FM1-43 also show a dramatic increase in their fluorescence during the midpiece contraction. Investigating whether the compaction of the plasma membrane or other mesoscale processes occur in the midpiece of the sperm flagellum could be a valuable area for future research. The use of fluorescent dyes such as LAURDAN or Nile Red might provide further insights into these membrane dynamics, offering a more comprehensive understanding of the biochemical and structural changes during sperm motility and gamete fusion events.
  
  (3) As the volume of the whole midpiece stays the same while the diameter decreases along the whole midpiece (midpiece contraction), the authors need to describe what changes in the midpiece length they observe during the contraction. Was the length of the midpiece during the contraction measured and compared before and after contraction?
  
  Response R1.3: As requested, we have measured the length of the midpiece in AI and AR sperm. As shown in Author response image 12 (For review purposes only), no statistically significant differences were observed.
  
  Author response image 12.
  
  Midpiece length measured by the length of mitochondrial DsRed2 fluorescence in EGFP-DsRed2 sperm. Measurements were done before (acrosome-intact) and after (acrosome-reacted) acrosome exocytosis and midpiece contraction. Data is presented as the mean ± sem of 14 cells induced by 10 µM ionomycin. Paired t-test was performed, resulting in no statistical significance.
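  
  The statistics named in this legend (mean ± SEM and a paired t-test) can be sketched as follows. The function names are hypothetical and the example values are placeholders, not the measurements behind Author response image 12; the p-value lookup is omitted because it requires a t-distribution table or a statistics library.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(values) / math.sqrt(len(values))


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (spread / math.sqrt(n)), n - 1


# Placeholder midpiece lengths in micrometres (illustrative only).
acrosome_intact = [21.0, 22.5, 20.8, 21.9]
acrosome_reacted = [21.2, 22.1, 21.0, 21.8]

t_stat, dof = paired_t(acrosome_intact, acrosome_reacted)
print(f"intact:  {mean(acrosome_intact):.2f} ± {sem(acrosome_intact):.2f}")
print(f"reacted: {mean(acrosome_reacted):.2f} ± {sem(acrosome_reacted):.2f}")
print(f"paired t = {t_stat:.3f} with {dof} degrees of freedom")
```

  The pairing matters here because each cell is measured before and after exocytosis, so the test is applied to the per-cell differences rather than to the two group means.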
  
  (4) Most of all, it is not clear what the midpiece, thus mitochondria, contraction means in terms of sperm bioenergetics and motility cessation. Would the contraction induce mitochondrial depolarization or hyperpolarization, increase or decrease of ATP production/consumption? It will be great if this point is discussed. For example, an increase in mitochondrial Ca2+ is a good indicator of mitochondrial activity (ATP production).
  
  Response R1.4: That is an excellent point. We have discussed this idea in the discussion (lines 620-624). We are currently exploring this idea using different approaches because we also think that these changes in the midpiece may have an impact on the function of the mitochondria and perhaps on their fate once they are incorporated into the egg after fertilization.
  
  (5) The authors claimed that Ca2+ signal propagates from head to tail, which is the opposite of the previous study (PMID: 17554080). Please clarify if it is a speculation. Otherwise, please support this claim with direct experimental evidence (e.g., high-speed calcium imaging of single cells).
  
  Response R1.5: In that study, it was claimed that a [Ca2+]i increase that propagates from the tail to the head occurs when CatSper is stimulated. They did not evaluate the occurrence of AE when monitoring calcium.
  
  Our data is in agreement with our previous results (PMID: 26819478) that consistently indicated that only the [Ca2+]i rise originating in the sperm head is able to promote AE.
  
  (6) Figure 4E: Please explain how come Fluo4 signal diameter can be smaller than FM4-64 dye if it stains plasma membrane (at 4' and 7').
  
  Response R1.6: When colocalizing a diffraction-limited image (Fluo4) with a super-resolution image (FM4-64), discrepancies in signal sizes and locations can become apparent due to differences in resolution. The Fluo4 signal, being diffraction-limited, adheres to a resolution limit of approximately 200-300 nanometers under conventional light microscopy. This limitation causes the fluorescence signal to appear broader and less defined. Conversely, super-resolution microscopy techniques, such as SRRF (Super-Resolution Radial Fluctuations), achieve resolutions down to tens of nanometers, allowing FM4-64 to reveal finer details at the plasma membrane and display potentially smaller apparent sizes of stained structures. Although both dyes might localize to the same cellular regions, the higher resolution of the FM4-64 image allows it to show a more precise and smaller diameter of the midpiece of the flagellum compared to the broader, less defined signal of Fluo4. To address this, the legend of Figure 4E has been slightly modified to clarify that the FM4-64 image possesses greater resolution.
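  
  The diffraction-limited figure quoted above can be checked with the standard Abbe and Rayleigh expressions. This is a back-of-the-envelope sketch: the emission wavelength (520 nm, roughly that of Fluo-4) and the numerical aperture (1.4) are assumed values, not parameters reported in this response.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Abbe lateral resolution limit: d = lambda / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def rayleigh_limit_nm(wavelength_nm, numerical_aperture):
    """Rayleigh criterion: d = 0.61 * lambda / NA."""
    return 0.61 * wavelength_nm / numerical_aperture


# Assumed imaging parameters (hypothetical, for illustration only).
EMISSION_NM = 520.0
NA = 1.4

print(f"Abbe limit:     {abbe_limit_nm(EMISSION_NM, NA):.0f} nm")
print(f"Rayleigh limit: {rayleigh_limit_nm(EMISSION_NM, NA):.0f} nm")
```

  With these assumed values both estimates are of the order of 200 nm, far coarser than the tens of nanometers attributed to SRRF above, so the Fluo4 outline cannot be compared pixel for pixel with the FM4-64 outline.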
  
  (7) Figure 5D-G: the midpiece diameter of AR intact cells was shown ~ 0.8 um or more in Figure 2, while now the radius in Figure 5 is only 300 nm. Since the diameter of the whole midpiece is nearly uniform when the acrosome is intact, clarify how and what brings this difference and where the diameter/radius measurement is done in each figure.
  
  Response R1.7: The difference lies in what is being measured. In Figure 2, the total diameter of the cell is measured as the distance between the maximum peaks of FM4-64 fluorescence, a plasma membrane probe. In Figure 5, the radius shown refers to the radius of the actin double helix within the midpiece. To that end, cells were fixed and stained with phalloidin, an F-actin probe.
  
  Minor points
  
  (8) Figure S1 title needs to be changed. The "Midpiece contraction" concept is not introduced when Figure S1 is referred to.
  
  Response R1.8: This was corrected in the new version.
  
  (9) Reference #19: the authors are duplicated.
  
  Response R1.9: This was corrected in the new version.
  
  (10) Line 315-318: sperm undergoing contraction -> sperm undergoing AR/AE?
  
  Response R1.10: This was corrected in the new version.
  
  (11) Line 3632 -> punctuation missing.
  
  Response R1.11: Modified as requested.
  
  (12) Movie S7: please add an arrow to indicate the spermatozoon of interest.
  
  Response R1.12: The arrow was added as suggested.
  
  (13) Line 515: One result of this study was that the sperm flagellum folds back during fusion coincident with the decrease in the midpiece diameter. The authors did not provide an explanation for this observation. Please speculate the function of this folding for the fertilization process.
  
  Response R1.13: As requested, this is now incorporated in the discussion. We speculate that the folding of the flagellum during fusion further facilitates sperm immobilization because it makes it more difficult for the flagellum to beat. Such processes can enhance stability and increase the probability of fusion success. Mechanistically, the folding may occur as a consequence of the deformation-induced stress that develops during the decrease of midpiece diameter.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Figure 2C, D, E. Does "-1" on the X-axis mean one minute before induction? If so, the diameter is already smaller and FM4-64 fluorescence intensity is higher before the induction in the spontaneous group. Does the acrosome reaction already occur at "-1" in this group?
  
  Response R2.1: Yes, “-1” means that the measurements of the diameter and FM4-64 fluorescence were done one minute before the induction. It is also correct that the diameter is smaller and FM4-64 fluorescence higher in the spontaneous group, because these sperm underwent acrosome exocytosis before the induction, that is, spontaneously.
  
  (2) Figure 3D. Purple dots are not shown in the graph on the right side.
  
  Response R2.2: Modified as requested.
  
  (3) Lines 404-406. "These results suggest that midpiece contraction and motility cessation occur only after acrosome-reacted sperm penetrate the zona pellucida". Since midpiece contraction and motility cessation also occur before the passage through the zona pellucida (Figure 9B), "only" should be deleted.
  
  Response R2.3: Modified as requested.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) Do the authors have a hypothesis as to why the observed decrease in midpiece parameter results in cessation of sperm motility? It would be beneficial for the manuscript to include a paragraph about potential mechanisms in the discussion.
  
  Response R3.1: As requested, a potential mechanism has been proposed in the discussion section (line 644-656).
  
  (2) Since the authors propose in Gervasi et al. 2018 that the actin helix might be responsible for the integrity of the mitochondrial sheath and the localization of the mitochondria, is it possible that the proposed change in plasma membrane diameter and actin helix remodeling for example alters the localization of the mitochondria? TEM should be able to reveal any associated structural changes. In its current state, the manuscript lacks experimental evidence supporting the author's claim that the "helical actin structure plays a role in the final stages of motility regulation". The authors should either include additional evidence supporting their hypothesis or tone down their conclusions in the introduction and discussion.
  
  Response R3.3: We agree with the reviewer. This is an excellent point. As suggested by this reviewer as well as the other reviewers, we have performed SEM to examine the changes in the midpiece after its contraction, chiefly to confirm this observation using a different approach that does not involve membrane dyes. As shown in Author response images 6-10, we have observed that, in addition to the change in midpiece diameter, there is a reorganization of the mitochondrial sheath that is also suggested by the SIM experiments. These observations will be explored with more experiments to confirm the structural and functional changes that mitochondria undergo during the contraction. We are currently investigating this phenomenon. These results are now included in the new Figure 2F.
  
  (3) In line 134: The authors write: 'Some of the acrosome reacted sperm moved normally, whereas the majority remained immotile". Do the authors mean that a proportion of the sperm was motile prior to acrosomal exocytosis and became immotile after, or were the sperm immotile to begin with? Please clarify.
  
  Response R3.4: This statement is based on the quantification of the motile sperm after induction of AE within the AR population (Fig. 1C).
  
  (4) The authors do not provide any experimental evidence supporting the first scenario. In video 1 a lot of sperm do not seem to be moving to begin with, only a few sperm show clear beating in and out of the focal plane. The highlighted sperm that acrosome-reacted upon exposure to progesterone don't seem to be moving prior to the addition of progesterone. In contrast, the sperm that spontaneously acrosome react move the whole time. In video 1 this reviewer was not able to identify one sperm that stopped moving upon acrosomal exocytosis. Similarly in video 3, although the resolution of the video makes it difficult to distinguish motile from non-motile sperm. In video 2 the authors only show sperm that are already acrosome reacted. Please explain and provide additional evidence and statistical analysis supporting that sperm stop moving upon acrosomal exocytosis.
  
  Response R3.5: In videos 1 and 3, the cells are attached to the glass with concanavalin-A. This lectin makes sperm immotile (if well attached) because both the head and tail stick to the glass. The motility observed in some sperm in these videos is likely due to them not being properly attached to the glass, which is completely normal. In contrast, in videos 2 and 4, sperm are attached to the glass with laminin. This glycoprotein binds the sperm to the glass only through the head, which is why they move freely.
  
  (5) Could the authors provide additional information about the FM4-64 fluorescent dye?
  
  What is the mechanism, and how does it visualize structural changes at the flagellum? Since the whole head lights up, does that mean that the dye is internalized and now stains additional membranes, similar to during wound healing assays (PMID 20442251, 33667528). Or is that an imaging artifact? How do the authors explain the correlation between FM4-64 fluorescence increase in the midpiece and the observed change in diameter? Does FM4-64 have solvatochromatic properties?
  
  Response R3.6: We appreciate the insightful queries posed by Reviewer 3, which echo the concerns initially brought forward by Reviewer 1. For a detailed explanation of the mechanism of the FM4-64 dye, how we interpret that it visualizes structural changes in the flagellum, and its behavior during cellular processes, please refer to Response R1.2. In brief, FM4-64 is a lipophilic styryl dye that preferentially binds to the outer leaflets of cellular membranes due to its amphiphilic nature. Upon binding, the dye becomes fluorescent, allowing for the visualization of membrane dynamics. The increase in fluorescence in the sperm head or midpiece likely results from the dye’s accumulation in areas where membrane restructuring occurs, such as during AE or in response to changes in the flagellum structure.
  
  Regarding the specific questions about internalization and whether FM4-64 stains additional membranes similarly to what is observed in wound healing assays, it's important to note that FM4-64 can indeed be internalized through endocytosis and subsequently label internal vesicular structures. Additionally, FM4-64 may experience changes in its fluorescence as a result of fusion events that increase the lipid content of the plasma membrane, as observed in studies cited (PMID 20442251, 33667528). This characteristic makes FM4-64 valuable not only for outlining cell membranes but also for tracking the dynamics of both internal and external membrane systems, particularly during cellular events that involve significant membrane remodeling, such as wound healing or AE.
  
  Concerning whether the increased fluorescence and observed changes in diameter are artifacts or reflect real biological processes, the correlation observed likely indicates actual changes in the midpiece architecture through molecular mechanisms that remain to be further elucidated. The data presented in Figure 5 and Author response images 6-10 support that this increase in fluorescence is not merely an artifact but a feature of how FM4-64 interacts with its environment.
  
  Finally, regarding the solvatochromatic properties of FM4-64, while the dye does show changes in its fluorescence intensity in different environments, its solvatochromatic properties are generally less pronounced than those of dyes specifically designed to be solvatochromatic. FM4-64's fluorescence changes are more a result of membrane interaction dynamics and dye concentration than of solvatochromatic shifts.
  
  (6) For the experiment summarized in Figure S1, did the authors detect sperm that acrosome-reacted upon exposure to progesterone and kept moving? This reviewer is wondering how the authors reliably measure FM4-64 fluorescence if the flagellum moves in and out of the focal plane. If the authors observe sperm that keep moving, what was the percentage within a sperm population and how did FM4-64 fluorescence change?
  
  Response R3.6: We did identify sperm that underwent acrosome reaction upon exposure to progesterone and continued to exhibit movement. However, due to the issue raised by the reviewer regarding the flagellum going out of focus, we opted to quantify the percentage of sperm that were adhered to the slide (using laminin). This approach allows for the observation of flagellar position over time, facilitating an easy assessment of fluorescence changes. The percentage of sperm that maintained movement after AE is depicted in Figure 1C.
  
  (7) In Figure S1B it doesn't look like the same sperm is shown in all channels or time points, the hook shown in the EGFP channel is not always pointing in the same direction. If FM4-64 is staining the plasma membrane, how do the authors explain that the flagellum seems to be more narrow in the FM4-64 channel than in the brightfield and DsRed2 channel?
  
  Response 3.7: It is the same sperm, but due to technical limitations images were sequentially acquired. For example, for time 5 minutes after progesterone, all images in DIC were taken, then all images in the EGFP channel, then DsRed2* and finally FM4-64. The reason for this was to acquire images as fast as possible, particularly in DIC images which were then processed to get the beat frequency.
  
  Regarding the flagellum appearing narrower in the FM4-64 channel than in the BF or DsRed2 channel, the explanation is that the DsRed2 signal is more intense than the other two. This stronger signal may have increased the number of photons captured by the detector.
  
  (8) Overall, it would be beneficial to include statistics on how many sperm within a population did change FM4-64 fluorescence during AE and how many did not, in addition to information about motility changes and viability. Did the authors exclude that the addition of FM4-64 causes cell death which could result in immotile sperm or that only dying sperm show an increase in FM4-64 fluorescence?
  
  Response 3.8: The relationship between cell death and the increase in FM4-64 fluorescence is widely discussed in Response P2.3. In our experiments, we always considered sperm that were motile to hypothesize about the relevance of this observation. We have two types of experiments:
  
  (1) Sperm-egg fusion: In experiments where sperm and eggs were imaged to observe their fusion, sperm were initially moving and, after fusion, midpiece contraction (an increase in FM4-64 fluorescence) was observed, indicating that the change in the midpiece, which was seen consistently in all fusing cells analyzed, is part of the process.
  
  (2) Sperm that underwent AE: we have observed two behaviours as shown in Figure 1:
  
  a) Sperm that underwent AE and remained motile without midpiece contraction (these cells are certainly alive);
  
  b) Sperm that underwent AE and stopped moving with an increase in FM4-64 fluorescence. We propose that this contraction during AE is not desirable because it would prevent sperm from moving forward to the fertilization site when they are in the female reproductive tract. In this case, we acknowledge that the cessation of sperm motility may be attributed to cellular death, potentially correlating with the increased FM4-64 signal observed in the midpiece of immotile sperm that have undergone AE. To address this hypothesis, we conducted image-based flow cytometry experiments, which are well-suited for assessing cellular heterogeneity within large populations.
  
  Regarding the relationship between the increase in FM4-64 and AE, we have always observed that AE is followed by an increase in FM4-64 in the head in mice (PMID: 26819478) as well as in human (PMID: 25100708) sperm. This was originally corroborated with the EGFP sperm. However, not all the cells that undergo AE increase the FM4-64 fluorescence in the midpiece.
  
  (9) The authors report that a fraction of sperm undergoes AE without a change in FM4-64 fluorescence (Figure 1F). How does the [Ca2+]i change in those cells? Again statistics on the distribution of a certain pattern within a population in addition to showing individual examples would be very helpful.
  
  Response 3.9: A recent work shows that an initial increase in [Ca2+]i is required to induce changes in flagellar beating necessary for hyperactivation (Sánchez-Cárdenas et al., 2018). However, when [Ca2+]i increases beyond a certain threshold, flagellar motility ceases. These conclusions are based on single-cell experiments in murine sperm with different concentrations of the Ca2+ ionophore, A23187. The authors reported that complete loss of motility was observed when using ionophore concentrations higher than 1 μM. In contrast, spermatozoa incubated with 0.5 μM A23187 remained motile throughout the experiment. Once the Ca2+ ionophore is removed, the sperm would reduce the concentration of this ion to levels compatible with motility and hyperactivation (Navarrete et al., 2016). However, some of the washed cells did not recover mobility in the recorded time window (Sánchez-Cárdenas et al., 2018). These results would indicate that due to the increase in [Ca2+]i induced by the ionophore, irreversible changes occurred in the sperm flagellum that prevented recovery of mobility, even when the ionophore was not present in the recording medium.
  
  Taking into account our results, one possible scenario to explain this irreversible change would be the contraction of the midpiece. Our results demonstrate that the increase in [Ca2+]i observed in the midpiece (whether by induction with progesterone, ionomycin or occurring spontaneously) causes the contraction of this section of the flagellum and its subsequent immobilization.
  
  (10) While the authors results show that changes in [Ca2+]i correlate with the observed reduction of the midpiece diameter, they do not provide evidence that the structural changes are triggered by Ca2+i influx. It could just be a coincidence that both events spatially overlap and that they temporarily follow each other. The authors should either provide additional evidence or tone down their conclusion.
  
  Response 3.10: We agree with the reviewer. As suggested, we have toned down our conclusion.
  
  (11) Are the authors able to detect the changes in the midpiece diameter independent from FM4-64 or other plasma membrane dyes? An alternative explanation could be that the dyes are internalized due to cell death and instead of staining the plasma membrane they are now staining intracellular membranes, resulting in increased fluorescence and giving the illusion that the midpiece diameter decreased. How do the authors explain that the Bodipy-GM1 Signal directly overlaps with DsRed2 and SIR-actin, shouldn't there be some gap? Since the rest of the manuscript is based on that proposed decrease in midpiece diameter the authors should perform orthogonal experiments to confirm their observation.
  
  Response 3.11: As requested by the reviewer, we have now used two new methods to visualize the change in sperm diameter in the midpiece, neither of which relies on a membrane dye. First, we performed immunofluorescence to detect a membrane protein (GLUT3). Second, we used scanning electron microscopy. The results are now incorporated in the new Figure 2F-G. In both experiments, a change in the midpiece diameter was observed. Please also see Response P2.5 and Author response images 8 to 10.
  
  Regarding the overlap between the signal of Bodipy GM1 (membrane) and the fluorescence of DsRed2 (mitochondria) and SiR-actin (F-actin), it is only observed in acrosome-reacted sperm, not in acrosome-intact sperm (Figure S4). In our view, these structures come closer together after midpiece contraction, and the resolution of the images is insufficient to distinguish them clearly. This issue is also evident in Figure 5B. Therefore, we conducted additional experiments using more powerful super-resolution techniques such as STORM (Figures 5D-F).
  
  (12) The proposed gap of 200 nM between the actin helix and the plasma membrane, has been observed by TEM? Considering that the diameter of the mouse sperm midpiece is about 1 um, that is a lot of empty space which leaves only about 600 nm for the rest of the flagellum. The axoneme is 300 nm and there needs to be room for the ODFs and the mitochondria. Please explain.
  
  Response 3.12: Unfortunately, the filament of polymerized actin cannot be observed by TEM. Furthermore, we were discouraged from trying other approaches, such as utilizing phalloidin gold, because for some reason, it does not work properly.
  
  In our view, the 200 nm gap between the actin cytoskeleton and the plasma membrane is occupied by the mitochondria (this is the size frequently reported based on TEM; see https://doi.org/10.1172/jci.insight.166869).
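  
  The spatial budget discussed in this exchange can be written out as a short calculation. The sketch below is illustrative only: it uses the approximate figures quoted in the reviewer's comment (a midpiece diameter of about 1 µm, a 200 nm gap on each side, and a 300 nm axoneme), and the variable and function names are hypothetical.

```python
# Illustrative arithmetic only; all values are the approximate figures quoted
# in the reviewer's comment, not new measurements.
MIDPIECE_DIAMETER_NM = 1000  # "about 1 um" midpiece diameter
GAP_NM = 200                 # proposed gap between actin helix and plasma membrane
AXONEME_DIAMETER_NM = 300    # axoneme diameter quoted by the reviewer


def space_inside_actin_helix(diameter_nm: int, gap_nm: int) -> int:
    """Diameter left once the gap is subtracted on both sides of the flagellum."""
    return diameter_nm - 2 * gap_nm


inner_nm = space_inside_actin_helix(MIDPIECE_DIAMETER_NM, GAP_NM)
remaining_nm = inner_nm - AXONEME_DIAMETER_NM  # left for structures other than the axoneme
print(inner_nm, remaining_nm)
```

  Under these assumed figures, about 600 nm lies inside the actin helix, of which roughly 300 nm is not taken up by the axoneme.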
  
  (13) The results provided by the authors do not convince this reviewer that the actin helix moves, either closer to the plasma membrane or toward the mitochondria, the observed differences are minor and not confirmed by statistical analysis.
  
  Response 3.13: As requested, the title of that section was changed. Moreover, our conclusion is exactly as the reviewer is suggesting: “Since the results of the analysis of SiR-actin slopes were not conclusive, we studied the actin cytoskeleton structure in more detail”. This conclusion is based on the statistical analysis shown in Figure S5D-E.
  
  (14) The fluorescence intensity of all plasma membrane dyes increases in all cells chosen by the authors for further analysis. Could the increase in SiR-Actin fluorescence be explained by a microscopy artifact instead of actin helix remodeling? Alternatively, can the authors exclude that the observed increase in SIR-Actin might be an artifact caused by the increase in FM4-64 fluorescence? Since the brightness in the head similarly increases to the fluorescence in the flagellum the staining pattern looks suspiciously similar. Did the authors perform single-stain controls?
  
  Response 3.14: We had similar concerns when we were doing the experiments using SiR-actin. In addition to performing single-stain controls, to make sure that actin helix remodelling occurs during midpiece contraction we performed experiments using higher-resolution techniques such as STORM with a different probe to stain actin (phalloidin).
  
  (15) Should actin cytoskeleton remodeling indeed result in a decrease of actin helix diameter, what do the authors propose is the underlying mechanism? Shouldn't that result in changes in mitochondrial structure or location and be visible by TEM? This reviewer is also wondering why the authors focus so much on the actin helix, while the plasma membrane based on the author's results is moving way more dramatically.
  
  Response 3.15: This raises an intriguing point. Currently, we lack an understanding of the underlying mechanism driving actin remodeling, and we are eager to conduct further experiments to explore this aspect. For instance, we are investigating the potential role of Cofilin in remodeling the F-actin network. Initial experiments utilizing STORM imaging have revealed the localization of Cofilin in the midpiece region, where the actin helix is situated.
  
  Regarding mitochondria, thus far, we have not uncovered any evidence suggesting that acrosome reaction or fusion with the egg induces a rearrangement of these organelles within the structure. The rationale for investigating polymerized actin in depth stems from the fact that, alongside the axoneme and other flagellar structures such as the outer dense fibers and fibrous sheath, these are the sole cytoskeletal components present in that particular tail region.
  
  (14) The fact that the authors observe that most sperm passing through the zona pellucida, which requires motility, display high FM4-64 fluorescence, doesn't that contradict the authors' hypothesis that midpiece contraction and motility cessation are connected? Videos confirming sperm motility and information about pattern distribution within the observed sperm population in the perivitelline space should be provided.
  
  Response 3.14: We believe it is a matter of timing. As depicted in Figure 1D, our model shows that the cells first lose the acrosome while remaining motile with low FM4-64 fluorescence in the midpiece (pattern II); afterwards, they lose motility and FM4-64 fluorescence in the midpiece increases (pattern III). That is why we think that sperm present pattern II when they pass the zona pellucida and evolve into pattern III after some time.
  
  (15) In the experiments summarized in Figure 8, did all sperm stop moving? Considering that 74 % of the observed sperm did not display midpiece contraction upon fusion, again doesn't that contradict the authors' hypothesis that the two events are interdependent? Similarly, in earlier experiments, not all acrosome-reacted sperm display a decrease in midpiece diameter or stop moving, questioning the significance of the event. If some sperm display a decrease in midpiece diameter and some don't, or undergo that change earlier or later, what is the underlying mechanism of regulation? The observed events could similarly be explained by sperm death: Sperm are dying → plasma membrane integrity changes and plasma membrane dyes get internalized → [Ca2+]i simultaneously increases due to cell death → sperm stop moving.
  
  Response 3.15: The percentage of sperm that did not exhibit midpiece contraction in Fig.8B is 26%, not 74%, indicating that it does not contradict our hypothesis. However, this still represents a significant portion of sperm that remain unchanged in the midpiece, leaving room for various explanations. For instance, it's possible that: i) the change in fluorescence was not detected due to the event occurring after the recording concluded, or ii) in some instances, this alteration simply does not occur. Nevertheless, we did not track subsequent events in the oocyte, such as egg activation, to definitively ascertain the success of fusion. Incorporation of the dye only manifests the initiation of the process.
  
  (16) The authors propose changes in Ca2+ as one potential mechanism to regulate midpiece contraction, however, the Ca2+ measurements during fusion are flawed, as the authors write in the discussion, by potential Ca2+ fluorophore dilution. Considering that the authors observe high Ca2+ in all sperm prior to fusion, could that be a measuring artifact? Were acrosome-intact sperm imaged with the same settings to confirm that sperm with low and high Ca2+ can be distinguished? Should [Ca2+]i changes indeed be involved in the regulation of motility cessation during fusion, could the authors speculate on how [Ca2+]i changes can simultaneously be involved in the regulation of sperm hyperactivation?
  
  Response 3.16: We agree with the reviewer that our experiments using calcium probes are not conclusive because of several technical problems. We have toned down our conclusions in the new version of the manuscript.
  
  (17) 74: AE takes place for most cells in the upper segment of the oviduct, not all of them.
  
  Please correct.
  
  Response 3.17: Corrected in the new version.
  
  (18) 88: Achieved through, or achieved by, please correct.
  
  Response 3.18: Corrected in the new version.
  
  (19) 243: Acrosomal exocytosis initiation by progesterone, please specify.
  
  Response 3.19: Modified in the new version.
  
  (20) 277: "The actin cytoskeleton approaches the plasma membrane during the contraction of the midpiece" is misleading. The author's results show the opposite.
  
  Response 3.20: As suggested, this statement was modified.
  
  (21) 298: Why do the authors find it surprising that the F-actin network was unchanged in acrosome-intact sperm that do not present a change in midpiece diameter?
  
  Response 3.21: The reviewer is right. The sentence was modified.
  
  (22) Figures 5D,F: The provided images do not support a shift in the actin helix diameter.
  
  Response 3.22: The shift in the actin helix diameter is provided in Figure 5E and 5G.
  
  (23) Figure S5C: The authors should show representative histograms of spontaneous, progesterone-induced, and ionomycin-induced AE. Based on the quantification the SiR-actin peaks don't seem to move when the AR is induced by progesterone.
  
  Response 3.23: As requested, an ionomycin-induced sperm has been incorporated.
  
  (24) 392: Which experimental evidence supports that statement?
  
  Response 3.24: A reference was incorporated.
  
  Reference 13 is published, please update.
  
  Response 3.25: Updated as requested.
  
URL

biorxiv.org/content/10.1101/2023.06.22.546073v4
www.biorxiv.org www.biorxiv.org

New submission 20/07/2023, 09:16:18

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  First, the authors would like to thank the reviewers and editors for their thoughtful comments. The comments were used to guide our revision, which is substantially improved over our initial submission. We have addressed all comments in our responses below, through a combination of clarification, new analyses and new experimental data.
  
  Reviewer #1 (Public Review):
  
  In this manuscript, the authors identified and characterized the five C-terminus repeats and a 14aa acidic tail of the mouse Dux protein. They found that repeats 3 and 5, but not other repeats, contribute to transcriptional activation when combined with the 14aa tail. Importantly, they were able to narrow down to a 6 aa region that can distinguish "active" repeats from "inactive" repeats. Using proximal labeling proteomics, the authors identified candidate proteins that are implicated in Dux-mediated gene activation. They were able to showcase that the C-terminal repeat 3 binds to some proteins, including Smarcc1, a component of SWI/SNF (BAF) complex. In addition, by overexpressing different Dux variants, the authors characterized how repeats in different combinations, with or without the 14aa tail, contribute to Dux binding, H3K9ac, chromatin accessibility, and transcription. In general, the data is of high quality and convincing. The identification of the functionally important two C-terminal repeats and the 6 aa tail is enlightening. The work sheds light on the mechanism of DUX function.
  
  A few major comments that the authors may want to address to further improve the work:
  
  We thank the reviewer for their efforts and constructive comments, which have guided our revisions.
  
  1) The summary table for the Dux domain construct characteristics in Fig. 6a could be more accurate. For example, C3+14 clearly showed moderately weaker Dux binding and H3K9ac enrichment in Fig 3c and 3e. However, this is not illustrated in Fig. 6a. The authors may consider applying statistical tests to more precisely determine how the different Dux constructs contribute to DNA binding (Fig. 3c), H3K9ac enrichment (Fig. 3e), Smarcc1 binding (Fig. 5e), and ATAC-seq signal (Fig. 5f).
  
  We thank the reviewer for this comment, and agree that there were some modest differences in construct characteristics that were not captured in the Summary Table (6a). To better reflect the differences between constructs, we added additional dynamic range to our depiction/scoring, and believe that the new scoring system provides sufficient qualitative range to capture the difference without imposing a statistical approach.
  
  2) Another concern is that exogenous overexpressed Dux was used throughout the experiments. The authors may consider validating some of the protein-protein interactions using spontaneous or induced 2CLCs (where Dux is expressed).
  
  We agree that it would be helpful to determine endogenous DUX interaction with our BioID candidates. Here, we attempted co-IPs for endogenous DUX protein with the DUX antibody and were unsuccessful, which indicated that the DUX antibody is useful for detection but not efficient in the primary IP. This is why we utilized the mCherry tag for DUX IP experiments, which worked exceptionally well.
  
  3) It could be technically challenging, but the authors may consider validating the Dux and Smarcc1 interaction in a biologically more relevant context such as mouse 2-cell embryos where both proteins are expressed. Will Smarcc1 binding be dramatically reduced in 4-cell embryos due to loss of Dux expression?
  
  While we agree that it would be interesting to validate the in vivo interaction of DUX and SMARCC1 in the early embryo, it is not technically feasible for us to conduct the experiment, as the IP would require thousands of two-cell embryos, and we have the issue of poor co-IP quality with the DUX antibody.
  
  Reviewer #2 (Public Review):
  
  In this manuscript, Smith et al. delineated novel mechanistic insights into the structure-function relationships of the C-terminal repeat domains within the mouse DUX protein. Specifically, they identified and characterised the transcriptionally active repeat domains, and narrowed down to a critical 6aa region that is required for interacting with key transcription and chromatin regulators. The authors further showed how the DUX active repeats collaborate with the C-terminal acidic tail to facilitate chromatin opening and transcriptional activation at DUX genomic targets.
  
  Although this study attempts to provide mechanistic insights into how DUX4 works, the authors will need to perform a number of additional experiments and controls to bolster their claims, as well as provide detailed analyses and clarifications.
  
  We thank this reviewer for their constructive comments, and have conducted several new analyses, additional experiments and clarifications, which have strengthened the manuscript in several locations. Highlights include a statistical approach to the similarity of mouse repeats to themselves and to orthologs (Figure S1d) and clarified interpretations, a wider dynamic range to better reflect changes in DUX construct behaviors (Figure 6a), and additional data on construct behavior, including ‘inactive’ constructs (e.g. C1+14aa in Figure 1a,d, new ATAC-seq in Figure S1g), and active constructs such as C3+C5+14aa and C3+C5Δ14aa (in Figure S1b).
  
  Reviewer #3 (Public Review):
  
  Dux (or DUX4 in human) is a master transcription factor regulating early embryonic gene activation and has garnered much attention also for its involvement in reprogramming pluripotent embryonic stem cells to totipotent "2C-like" cells. The presented work starts with the recognition that DUX contains five conserved c. 100-amino acid carboxy-terminal repeats (called C1-C5) in the murine protein but not in that of other mammals (e.g. human DUX4). Using state-of-the-art techniques and cell models (BioID, Cut&Tag; rescue experiments and functional reporter assays in ESCs), the authors dissect the activity of each repeat, concluding that repeats C3 and C5 possess the strongest transactivation potential in synergy with a short C-terminal 14 AA acidic motif. In agreement with these findings, the authors find that full-length and active (C3) repeat containing Dux leads to increased chromatin accessibility and active histone mark (H3K9Ac) signals at genomic Dux binding sites. A further significant conclusion of this mutational analysis is the proposal that the weakly activating repeats C2 and C4 may function as attenuators of C3+C5-driven activity.
  
  By next pulling down and identifying proteins bound to Dux (or its repeat-deleted derivatives) using BioID-LC/MS/MS, the authors find a significant number of interactors, notably chromatin remodellers (SMARCC1), a histone chaperone (CHAF1A/p150) and transcription factors (ZSCAN4D) previously implicated in embryonic gene activation.
  
  The experiments are of high quality, with appropriate controls, thus providing a rich compendium of Dux interactors for future study. Indeed, a number of these (SMARCC1, SMCHD1, ZSCAN4) make biological sense, both for embryonic genome activation and for FSHD (SMCHD1).
  
  A critical question raised by this study, however, concerns the function of the Dux repeats, apparently unique to mice. While it is possible, as the authors propose, that the weakly activating C1, C2 and C4 repeats may exert an attenuating function on activation (and thus may have been selected for under an "adaptationist" paradigm), it is also possible that they are simply the result of Jacobian evolutionary bricolage (tinkering) that happens to work in mice. The finding that Dux itself is not essential, and in fact appears to be redundant with (or to cooperate with) the OBOX4 factor, in addition to the absence of these repeats in the DUX protein of all other mammals (as pointed out by the authors), might indeed argue for the second, perhaps less attractive possibility.
  
  In summary, while the present work provides a valuable resource for future study of Dux and its interactors, it fails, however, to tell a compelling story that could link the obtained data together.
  
  We appreciated the reviewer’s views regarding the high quality of the work and our generation of an important dataset of DUX interactors. We also appreciate the comments provided to improve the work, and have performed and included in the revised version a set of clarifications, additional analyses and additional experiments that have served to reinforce our main points and provide additional mechanistic links. We also agree that more remains to be done to understand the function and evolution of repeats C1, C2 and C4.
  
  Reviewer #1 (Recommendations For The Authors):
  
  1) For immuno-blots, authors may indicate the expected bands to help readers better understand the results.
  
  Agreed, and we have included the predicted molecular weight of proteins in the Figure Legends. We note that our work shows that the C-terminal domains confer anomalous migration in SDS-PAGE.
  
  2) Fig. 5b, a blot missing for the mCherry group?
  
  Figure 5b is a volcano plot, so we believe the reviewer is referring to Figure 5d, which is a coimmunoprecipitation experiment between SMARCC1 and mCherry-tagged DUX constructs. However, we are unsure of the comment as an anti-mCherry sample is present in that panel.
  
  3) Line 99-100, Fig. S1d, it seems that repeat2, but not repeat3, is more similar to human DUX4 C-terminal region.
  
  This comment and one by another reviewer have prompted us to re-examine the similarities of the DUX repeats, and we have new analyses (Figure S1d) and an alternative framing in the manuscript as a result. We have expanded on this in our response to Reviewer #2, point #1 – and direct the reviewer there for our expanded treatment.
  
  4) A few references are misplaced. For example, line 48, the studies that reported the role of Dux in inducing 2CLCs should be from Hendrickson et al., 2017, De Iaco et al., 2017, and Whiddon et al., 2017. The authors may want to double check all references.
  
  Thanks for pointing these out. These issues have been corrected in the manuscript.
  
  5) In the materials & methods section, a few potential errors are noticed. For example, concentrations of PD0325901 and CHIR99021 in mESC medium appear ~1000-fold higher than standards.
  
  Thanks – corrected.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Major Points
  
  1) Line 99 - The authors claimed that the "human DUX4 C-terminal region is most similar to the 3rd repeat of mouse DUX", but based on Supp. Fig. 1d, the human DUX4 C-term should be most similar to the 2nd repeat of mouse DUX. If this is indeed the case, it will undermine the rest of this study, since the authors claim that the 3rd repeat is transcriptionally active, whereas the 2nd repeat is transcriptionally inactive, and the bulk of this study largely focused on how the active repeats, not the inactive repeats, are critical in recruiting key transcriptional and chromatin regulators to induce the embryonic gene expression program.
  
  We thank the reviewer for their comments here. Since submission, and as mentioned above for reviewer #1, we have revisited the issue of similarity of the DUX4 C-terminal region to the mouse C-terminal repeats, with a BLAST-based approach that is more rigorous and informed by statistics. This analysis is in Author response table 1 and now in the manuscript as Figure S1d, and has affected our interpretation. Our prior work involved a simple % identity comparison table and we now appreciate that some of the similarity analyses did not meet statistical significance, and therefore we are unable to draw certain conclusions. We have made the appropriate modifications in the text. For example, we no longer state that the DUX4 C-terminus appears to be most similar to mouse repeats 3 and 5. This does not affect the main conclusions of the paper regarding interactions of the C-terminus with chromatin-related proteins, only our speculation on which repeat might have represented the original single repeat in the mouse, an issue we think is of some interest, but which did not rise to the level of mentioning in the original or current abstract.
  
  Author response table 1.
  
  Parameters: PAM250 matrix. Gap costs of existence: 15 and extension: 3. Numbers represent e-value of each pairwise comparison
  
  *No significant similarities found (>0.05).
  
  2) In Supp Fig 1d, it seems that the rat DUX4 C-terminal region is most similar to the 4th repeat of mouse DUX, which according to the author is supposedly transcriptionally inactive. This weakens the authors justification that the 3rd or 5th repeat is likely the "parental repeat for the other four", and further echoes my concern in point 1 where the human DUX4 C-term is most similar to the 2nd (inactive) repeat of mouse DUX.
  
  The reviewer’s point is well taken and is addressed in point #1 above.
  
  3) In Fig. 1d, the authors showed that DUX4-containing C3 and C5, but lacking acidic tail, can promote MERVL::GFP expression, albeit to a slightly lower extent compared to FL. However, in Fig. 2b, C3 or C5 alone (lacking acidic tail) completely failed to promote MERVL::GFP expression. However, in the presence of the acidic tail, both versions were able to promote MERVL::GFP expression, similar to that of FL. The latter would suggest that it is the acidic tail that is crucial for MERVL::GFP expression, and this does not quite agree with Fig 1b, where C12345 (lacking acidic tail) was able to promote MERVL::GFP expression. Although C12345 did not activate MERVL to a similar level as FL, it is clearly proficient, compared to C3 or C5 alone (lacking acidic tail) where there is no increase in MERVL at all. Additional constructs will be helpful to clarify these points. For example, 'C3+C5 minus acidic tail' and 'HD1+HD2+acidic tail only' constructs.
  
  We agree that constructs such as those mentioned would add to the work. First, we have done the additional construct HD1+HD2+14aa tail, which is presented as ΔC12345+14aa in Figure 2a and in S2a. Additionally, we performed experiments on the requested C3+C5+14aa and C3+C5Δ14aa (see samples 6 and 7 in Author response image 1, which are now included in Supplemental Figure 2b). The results reinforce our hypothesis of an additive effect toward DUX target gene activation by increasing C-terminal repeats and including the 14aa tail.
  
  Author response image 1.
  
  4) Related to the above, the flow cytometry data for the MERVL::GFP reporter as presented in Figures 1 and 2, as well as in Supp. Fig. 2, show a considerably large difference in the %GFP|mCherry for the FL construct, ranging from ~6-26%. This makes it difficult to convince the reader which of the different DUX domain constructs cannot or can partially induce GFP|mCherry signal when compared to FL, and hence it is tough to definitively ascertain the exact contribution of each of the 5 C-terminal repeats with high confidence, as it appears that there exists a significant amount of variability in this MERVL::GFP reporter system. The authors need to address this issue since this is their primary method to elucidate the transcriptional activity of each of the mouse DUX repeat domains.
  
  We note that with the Dux-/- cell lines we used throughout the timeline of the study, the %GFP|mCherry expression progressively and slowly decreased, possibly due to slow/modest epigenetic silencing of the reporter. However, we always used the full-length DUX construct to establish the dynamic range. We emphasize that the relative differences between constructs over multiple cell line replicates remained relatively consistent. However, we elected to show absolute values in each experiment, rather than simply normalizing the full-length to 100% and showing relative values.
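  
  To make the distinction between absolute and relative values concrete, the following minimal sketch shows what normalizing each construct to the full-length (FL) signal from the same experiment would look like. All numbers are hypothetical and chosen for illustration, apart from the roughly 6-26% FL range mentioned in the reviewer's comment; the function and construct names are likewise assumptions and are not taken from the manuscript.

```python
# Hypothetical example values; only the ~6-26% range for the full-length (FL)
# construct comes from the reviewer's comment.
from typing import Dict


def relative_to_full_length(
    percent_gfp: Dict[str, float], reference: str = "FL"
) -> Dict[str, float]:
    """Express each construct's %GFP|mCherry as a fraction of the FL value
    measured in the same experiment."""
    ref = percent_gfp[reference]
    if ref <= 0:
        raise ValueError("reference construct must have a positive signal")
    return {name: value / ref for name, value in percent_gfp.items()}


# Two made-up experiments in which the absolute FL signal differs but the
# ratio between a construct and FL is the same.
early = {"FL": 26.0, "construct_A": 13.0}
late = {"FL": 6.0, "construct_A": 3.0}

print(relative_to_full_length(early))
print(relative_to_full_length(late))
```

  In this made-up case the absolute values differ severalfold between experiments while the construct-to-FL ratio is identical, which is the kind of consistency the response describes.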
  
  5) Lines 140-142 - The authors claimed that the functional difference between the transcriptionally active and inactive repeats could be narrowed down to a "6aa region which is conserved between repeats C3 and C5, but not conserved in C1, C2 and C4". Assuming the 6aa sequence is DPLELF, why does C1C3a elicit almost twice the intensity of GFP|mCherry signal compared to C3C1c, despite both constructs having the exact same 6aa sequence?
  
  Indeed, C1C3a and C3C1c both contain the ‘active’ DPL sequence but have different relative levels of %GFP|mCherry. This is consistent with these sequences having a positive role in DUX target gene regulation, but likely in combination with other regions which potentiate its effect, possibly through interacting proteins or post-translational modifications.
  
  Why does DPLEPL (the intermediate C3C1b construct) induce a similar extent of GFP|mCherry signal as the FL construct, even though the former includes 3aa from a transcriptionally inactive repeat? In contrast, GSLELF (the other intermediate C1C3b construct) that also includes 3aa from a transcriptionally inactive repeat is almost completely deficient in inducing any GFP|mCherry signal. Why is that so? Is DPL the most crucial sequence? It will be important to mutate these 3 (or the above 6) residues on FL DUX4 to examine if its transcriptional activity is abolished.
  
  These are interesting points. DPL does appear to be the most important region in the mouse DUX repeats. However, DPL is not shared in the C-terminus of human DUX4. Notably, the DUX4 C-terminus is sufficient to activate the mouse MERVL::GFP reporter when cloned to mouse homeodomains (see Author response image 2, second sample) and other DUX target genes (initially published in Whiddon et al. 2017). One clear possibility is that the DPL region is helping to coordinate the additive effects of multiple DUX repeats, which only exist in the mouse protein.
  
  Author response image 2.
  
  6) Line 154 - The intermediate DUX domain construct C1C3b occupied a different position on the PCA plot from the C1C3c construct that does not contain any of the critical 6aa sequence, as shown in Fig. 2e. However, both these constructs appear to be similarly deficient in inducing any GFP|mCherry signal, as seen in Fig. 2c. Why is that so?
  
  The PCA plot assesses the impact on the whole transcriptome and not just the MERVL::GFP reporter, suggesting the 3aa region has transcriptional effects on the genome beyond what is detected in the MERVL::GFP reporter.
  
  7) To strengthen the claim that "Chromatin alterations at DUX bindings sites require a transcriptionally active DUX repeat", the authors should also perform CUT&Tag for constructs containing transcriptionally inactive DUX repeats (e.g. C1+14aa), and show that such constructs fail to occupy DUX binding sites, as well as are deficient in H3K9ac accumulation.
  
  This is a good comment. We elected to control this with constructs containing or lacking an active repeat. Although we have not pursued this by CUT&TAG, we have examined the impact of DUX constructs with inactive repeats (including the requested C1+14aa, new Figure S1g) by ATAC-seq (see #12, ATAC-seq section, below), and observe no chromatin opening, suggesting that the lack of transcriptional activity is rooted in the inability to open chromatin.
  
  8) It would be good if the authors could also include CUT&Tag data for some of the C1C3 chimeric constructs that were used in Fig. 2, since the authors argued that the minimal 6aa region is sufficient to activate many of the DUX target genes. This would also strengthen the authors’ case that the transcriptionally active, not inactive, repeats are critical for binding at DUX binding sites and ensuring H3K9ac occupancy.
  
  We agree that these would be helpful, and have examined the inactive repeats in transcription and ATAC-seq formats during revision (new data in Figures 1d and S1g), but not yet the CUT&TAG format.
  
  9) Line 213 - "SMARCA4" should have been "SMARCA5"? Based on Fig. 4d, SMARCA5 is picked up in the BirA*-DUX interactome, not SMARCA4.
  
  Thanks – corrected.
  
  10) Lines 250-252 - The authors compared the active BirA-C3 against the inactive BirA-C1 to elucidate the interactome of the transcriptionally active C3 repeat, as illustrated in Fig. 5c. They found 12 proteins more enriched in C1 and 154 proteins in C3. This information should be presented clearly as a separate tab in Supp Table 2. What are the proteins common to both constructs, i.e. enriched to a similar extent? Do they include chromatin remodellers too? Although the authors sought to identify differential interactors between the 2 constructs, it is also meaningful to perform 2 separate comparisons - active BirA-C3 against BirA alone control, and inactive BirA-C1 against BirA alone control - like in Fig. 4d, so as to more accurately define whether the active C3 repeat, and not the inactive C1 repeat, interacts with proteins involved in chromatin remodeling.
  
  We thank the reviewer for this comment, and we have modified the manuscript by adding a second sheet in Supplementary Table 2 including the results for enriched proteins in BirA-C1 vs. C3. Additionally, due to limitations of annotation between BirA alone and BirA*-C3 being sequenced in different mass spectrometry experiments, it is difficult to quantitatively compare the two datasets with pairwise comparisons.
  
  11) Fig 5d: The authors mentioned in the legend that endogenous IP was performed for SMARCC1. However, in line 266, they stated Flag-tagged SMARCC1. Is SMARCC1 overexpressed? The reciprocal IP should also be presented. More importantly, C1 constructs (e.g. C1+14aa and C1Δ14aa) should also be included.
  
  To clarify, Figure 4e used exogenously overexpressed FLAG-SMARCC1 in HEK-293T cells to confirm the results of the full-length DUX BioID experiment. Figure 5d was performed with overexpressed DUX construct, but involved endogenous SMARCC1 in mESCs. This has now been made clearer in the revised manuscript.
  
  12) For both the SMARCC1 CUT&Tag and ATAC-seq experiments shown in Figures 5e and 5f respectively, the authors need to include DUX derivatives that contain transcriptionally inactive repeats with and without the 14aa acidic tail, i.e. C1+14aa and C1Δ14aa, and show that these constructs prevent the binding/recruitment of SMARCC1 to DUX genomic targets, and correspondingly display a decrease in chromatin accessibility. Only then can they assert the requirement of the transcriptionally active repeat domains for proper DUX protein interaction, occupancy and target activation.
  
  We agree that examination of an inactive repeat in certain approaches would improve the manuscript. Importantly, we have now included C1+14 in our ATAC-seq experiments, and show two individual replicates in Author response image 3, which constitute the new Figure S1g. Compared to the transcriptionally active DUX constructs, which see opening at DUX binding sites, we do not see chromatin opening at DUX binding sites with transcriptionally inactive C1+14.
  
  Author response image 3.
  
  13) To prove that DUX-interactors are important for embryonic gene expression, it will be important to perform loss of function studies. For instance, will the knockdown/knockout of SMARCC1 in cells expressing the active DUX repeat(s) lead to a loss of DUX target gene occupancy and activation?
  
  We agree that it would be interesting to better understand SMARCC1 cooperation with DUX function in the embryo, but we believe this is beyond the scope of this paper.
  
  Minor Points
  
  1) Lines 124-126 - What is the reason/rationale for why the authors used one linker (GGGGS2) for constructs with a single internal deletion, but 2 different linkers (GGGGS2 and GAGAS2) for constructs with 2 internal deletions?
  
  With Gibson cloning, each PCR amplicon carries homology overhang arms that must be specific to its overlap. Additionally, the amplicons need to be sufficiently distinct from one another so that all inserts (up to 5 in this manuscript) are included and oriented in the right order. The linker sequences were included in the homology arm overlaps, so the nucleotide sequences of the linkers needed to be distinct enough to include all inserts. This is a general rule of Gibson cloning. Additionally, both GGGGS2 and GAGAS2 are common linker sequences used in molecular biology and their amino acid structures are similar to one another, suggesting there is no functional difference between the linkers.
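
  The requirement that overlaps be distinct can be illustrated with a minimal sketch. It assumes that "GGGGS2" and "GAGAS2" denote two tandem copies of each pentapeptide, and the nucleotide sequences below are hypothetical codon choices, not the sequences used in the manuscript. The mismatch threshold and the Hamming-distance comparison are likewise illustrative simplifications rather than an established design rule.

```python
from itertools import combinations

# Hypothetical 30-nt encodings (assumed codon choices, not from the manuscript).
LINKER_GGGGS_X2 = "GGTGGAGGCGGTTCAGGCGGAGGTGGCTCT"  # G G G G S G G G G S
LINKER_GAGAS_X2 = "GGTGCTGGCGCATCAGGAGCTGGTGCCTCT"  # G A G A S G A G A S


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def overlaps_are_distinct(overlaps, min_mismatches=5):
    """Check that every pair of overlaps differs at enough positions.

    Each pair is compared directly and against the reverse complement,
    since either orientation could allow an unintended junction.
    """
    for a, b in combinations(overlaps, 2):
        if hamming(a, b) < min_mismatches:
            return False
        if hamming(a, reverse_complement(b)) < min_mismatches:
            return False
    return True


distinct = overlaps_are_distinct([LINKER_GGGGS_X2, LINKER_GAGAS_X2])
reused = overlaps_are_distinct([LINKER_GGGGS_X2, LINKER_GGGGS_X2])
```

  Under these assumptions, two different linkers pass the check, whereas reusing the same linker at both junctions fails it, which is the situation that a second linker sequence avoids.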
  
  2) Line 704 - 705: In the figure legend, the authors stated that 'Constructs with a single black line have the linker GGGGS2 and constructs with two black lines have linkers with GGGGS2 and GAGAS2, respectively.'. This was not obvious in the figures.
  
  Constructs used for flow and genomics experiments that are depicted in Figure 2, Supplementary Figure 2, Figure 3, Figure 4, and Figure 5 are drawn with black lines where deletions are present. At each deletion, a linker is included in order to preserve spacing and mobility for the protein.
  
  3) Line 160 - Clusters #1 and #2 are likely written in the wrong order. It should have been "activating the majority of DUX targets in cluster #2, not cluster #1" and "failed to activate those in cluster #1, not cluster #2", based on the RNA-seq heatmap in Fig. 2f.
  
  We thank the reviewer for this comment, and the error has been corrected in the manuscript.
  
  4) Line 188 - Delete the word "of" in the following sentence fragment: "DUX binding sites correlating with the of transcriptional".
  
  Thanks – corrected.
  
  5) Line 191 - Delete the word "aids" in the following sentence fragment: "important for conferring H3K9ac aids at bound".
  
  Thanks – corrected.
  
  6) Line 711 - "C1-C3 a,b,d" should be "C1-C3 a,b,c".
  
  Thanks – corrected.
  
  7) Lines 711-712 - The colors "pink to blue" and "blue to pink" are likely written in the wrong order. Based on Fig. 2c, the blue to pink bar graphs should represent C1-C3 a,b,c in that order, and likewise the pink to blue bar graphs should represent C3-C1 a,b,c in that order.
  
  Thanks – corrected.
  
  8) There is an overload of data presented in Fig. 2c, such that it is difficult to follow which part of the figure represents each data segment as written in the figure legend. It is recommended that the data presented here is split into 2 sub-figures.
  
  Figure 2c has a supporting figure in Supplementary Figure 2b. While the main panel of Figure 2c contains both a graphical depiction of the constructs and the data, we have depicted it this way to make it as clear as possible for the reader to interpret the complexity and presence of amino acids in each of the constructs.
  
  9) Line 717 - "following" is misspelt.
  
  Thanks – corrected.
  
  10) Lines 720-721 - "(Top)" and "(Bottom)" should be replaced with "(Left)" and "(Right)", as the 2 bar graphs presented in Fig. 2d are placed side by side to each other, not on the top and bottom.
  
  Thanks – corrected.
  
  11) Lines 725 and 839 - "Principle" is misspelt. It should be "Principal".
  
  Thanks – corrected.
  
  12) In Figures 3d and 3e, the sample labeled "C3+14_1" should be re-labeled to "C3+14", in accordance with the other sub-figures. Additionally, for the sake of consistency, "aa" should be appended to the relevant constructs, e.g. "C3+14aa" and "C3Δ14aa".
  
  Thanks – corrected.
  
  13) Line 773 - Were the DUX domain constructs over-expressed for 12hr (as written in the figure legend) or 18hr (as labeled in Fig. 5d)?
  
  Thanks – corrected.
  
  14) Related to minor point 19 above, is there a reason/rationale for why some of the experiments used 12hr over-expression of DUX domain constructs (e.g. for CUT&TAG in Fig. 3), whereas in other experiments 18hr over-expression was chosen instead (e.g. flow cytometry for MERVL::GFP reporter in Figures 1 and 2, and co-IP validations of BirA*-DUX interactions in Fig. 4)?
  
  Thanks for the opportunity to explain. In this work, experiments that reported on proteins that are translated following DUX gene activation (e.g. MERVL:GFP via flow) were done at 18hr to allow for enough time for transcription and translation of GFP (or other DUX target genes). For experiments that report on the impact of DUX on chromatin and transcription, such as RNA-seq, CUT&Tag, and ATAC-seq, we induced DUX domain constructs for 12 hours.
  
  15) Line 804 - "ΔHDs" is missing between "C2345+14aa" and "ΔHD1".
  
  Thanks – corrected.
  
  16) In Fig. 5c, "Chromatin remodelers" is misspelt.
  
  Thanks – corrected.
  
  17) There is no reference in the manuscript to the proposed model that is presented in Fig. 6b.
  
  Thanks – corrected.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Given the uncertainty of the function of the Dux peptide repeats in mice, could the underlying repeated nature of the (coding) DNA not also be relevant? That is, could these DNA repeats exert a regulatory function on Dux transcription itself (also given the dire consequences of misregulated DUX4 expression as seen in FSHD, for example)?
  
  Yes, it remains possible that the internal coding repeats within Dux are playing a role in locus regulation, and might be interesting to examine. However, we consider this question as being outside the scope of the current paper.
  
  Finally, it would be interesting to know whether these repeats are, in fact, present in all mouse species. Already no longer present in rat, do they exist, or not, in more "distant" mice, e.g. M. caroli?
  
  Determining whether all mouse strains contain C-terminal repeats in DUX is a question we also considered. However, Dux and its orthologs are present in long and very complex repeat arrays that are not present in the sequencing data or annotation in other mouse strains. Therefore, we are unable to answer this question from existing sequencing data. Answering would require a considerable genome sequencing and bioinformatics effort, or alternatively a considerable effort aimed at cloning ortholog cDNAs from 2-cell embryos.
  
  Minor points:
  
  line 169: here it seems, in fact, that the 'inactive' C2, C4 repeats are more similar to each other (my calculation: 91 and 96% identity at the protein and DNA level, respectively) than the active C3 and C5 repeats (82 and 89% identity, resp.), the outlier being C1.
  
  Thanks for this comment, which was mentioned by other reviewers as well and has been addressed through new statistical analyses and interpretation (see new Figure S1d).
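
  For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of a percent identity calculation over two pre-aligned sequences. The example sequences are hypothetical placeholders, not the DUX repeat sequences, and the choice to exclude gapped columns from the denominator is an assumption; other conventions give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptides: one substitution in ten columns.
identity_no_gaps = percent_identity("ACDEFGHIKL", "ACDEYGHIKL")
# One gapped column is skipped, leaving five identical columns.
identity_with_gap = percent_identity("ACD-FG", "ACDEFG")
```

  The same function applies to protein or DNA alignments, which is how separate protein-level and DNA-level values such as those quoted by the reviewer could be obtained.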
  
  line 191: I'm not sure this sentence parses correctly ("...14AA tail is important for conferring H3K9Ac aids at bound sites...")
  
  We thank the reviewer for this comment, and we have corrected the sentence in the manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.03.29.534786v2
www.biorxiv.org www.biorxiv.org

Human airway macrophages are metabolically reprogrammed by IFN-γ resulting in glycolysis dependent functional plasticity

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The researchers demonstrated that when cytokine priming is combined with exposure to pathogens or pathogen-associated molecular patterns, human alveolar macrophages and monocyte-derived macrophages undergo metabolic adaptations, becoming more glycolytic while reducing oxidative phosphorylation. This metabolic plasticity is greater in monocyte-derived macrophages than in alveolar macrophages.
  
  Strengths:
  
  This study presents evidence of metabolic reprogramming in human macrophages, which significantly contributes to our existing understanding of this field primarily derived from murine models.
  
  Weaknesses:
  
  The study has limited conceptual novelty.
  
  We acknowledge that the study has limited conceptual novelty; however, the current manuscript provides the field with evidence of the changes in the phenotype and functions of human macrophages in response to IFN-γ or IL-4, which is currently lacking in the literature. Moreover, our data show for the first time that human airway macrophages change their function in response to IFN-γ.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors aimed to functionally characterize primary human airway macrophages and monocyte-derived macrophages, correlating their glycolytic shift in metabolism. They conducted this macrophage characterization in response to type II interferon and IL-4 priming signals, followed by different stimuli of irradiated Mycobacterium tuberculosis and LPS.
  
  Strengths:
  
  (1) The study employs a thorough measurement of metabolic shift in metabolism by assessing extracellular acidification rate (ECAR) and oxygen consumption rate (OCR) of differentially polarized primary human macrophages using the Seahorse XFe24 Analyzer.
  
  (2) The effect of differential metabolic shift on the expression of different surface markers for macrophage activation is evaluated through immunofluorescence flow cytometry and cytokine measurement via ELISA.
  
  (3) The authors have achieved their aim of preliminarily characterizing the glycolysis-dependent cytokine profile and activation marker expression of IFN-g and IL-4 primed primary human macrophages.
  
  (4) The results of the study support its conclusion of glycolysis-dependent phenotypical differences in cytokine secretion and activation marker expression of AMs and MDMs.
  
  Weaknesses:
  
  (1) The data are presented in duplicates for cross-analyses.
  
  (2) The data presented support a distinct functional profile of airway macrophages (AMs) compared to monocyte (blood)-derived macrophages (MDMs) in response to the same priming signals. However, the study does not attempt to explore the underlying mechanism for this difference.
  
  (3) The study is descriptive in nature, and the results validate IFN-g-mediated glycolytic reprogramming in primary human macrophages without providing mechanistic insights.
  
  (1) We acknowledge the data is presented in duplicate for cross-analyses. This duplication allowed us to examine both (A) the effect of IFN-γ or IL-4 on primary human airway and monocyte derived macrophages in the presence or absence of distinct stimulations and (B) to directly compare the fold change in function occurring in the AM with the changes in the MDM.
  
  (2 & 3) We acknowledge that our study is descriptive; however, by inhibiting glycolysis using 2DG, we have demonstrated that increased flux through glycolysis is mechanistically required to mediate enhanced cytokine responses in both primary human AM and MDM primed with IFN-γ. However, we acknowledge that we have not determined the differential molecular mechanisms downstream of IFN-γ in the AM versus the MDM. IFN-γ promotes both pro- and anti-inflammatory cytokines in AM and this was reduced by inhibiting glycolysis with 2DG. This identifies glycolysis as a key mechanistic pathway which can be therapeutically targeted in AM to modulate inflammation. Mechanistic studies on human AM are limited due to the low number of AM retrieved from BAL samples. Nevertheless, the differences between AM and MDM identified in the current study indicate that future mechanistic studies are warranted to identify, for example, why IFN-γ promotes IL-10 in AM and not MDM, and why TNF is differentially regulated by glycolysis in the two macrophage subpopulations.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this manuscript, the authors explore the contribution of metabolism to the response of two subpopulations of macrophages to bacterial pathogens commonly encountered in the human lung, as well as the influence of priming signals typically produced at a site of inflammation. The two subpopulations are resident airway macrophages (AM) isolated via bronchoalveolar lavage and monocyte-derived macrophages (MDM) isolated from human blood and differentiated using human serum. The two cell types were primed using IFNγ and IL-4, which are produced at sites of inflammation as part of initiation and resolution of inflammation respectively, followed by stimulation with either irradiated Mycobacterium tuberculosis (Mtb) or LPS to simulate interaction with a bacterial pathogen. The authors use human cells for this work, which makes use of widely reported and thoroughly described priming signals, as well as model antigens. This makes the observations on the functional response of these two subpopulations relevant to human health and disease. To examine the relationship between metabolism and functional response, the authors measure rates of oxidative phosphorylation and glycolysis under baseline conditions, primed using IFNγ or IL-4, and primed and stimulated with Mtb or LPS.
  
  Strengths:
  
  • The data indicate that both populations of macrophages increase metabolic rates when primed, but MDMs decrease their rates of oxidative phosphorylation after IL-4 priming and bacterial exposure while AMs do not.
  
  • It is demonstrated that glycolysis rates are directly linked to the expression of surface molecules involved in T-cell stimulation and while secretion of TNFα in AM is dependent on glycolysis, in MDM this is not the case. IL-1β is regulated by glycolysis only after IFN-γ priming in both MDM and AM populations. It is also demonstrated that Mtb and LPS stimulation produces responses that are not metabolically consistent across the two macrophage populations. The Mtb-induced response in MDMs differed from the LPS response, in that it relies on glycolysis, while this relationship is reversed in AMs. The difference in metabolic contributions to functional outcomes between these two macrophage populations is significant, despite acknowledgement of the reductive nature of the system by the authors.
  
  • The observations that AM and MDM rely on glycolysis for the production of cytokines during a response to bacterial pathogens in the lung, but that only MDM shift to Warburg Metabolism, though this shift is blocked following exposure to IL-4, are supported by the data and a significant contribution to the study of the innate immune response.
  
  Weaknesses:
  
  • It is unclear whether changes in glycolysis and oxidative phosphorylation in primed cells are due to priming or subsequent treatments. ECAR and OCR analyses were therefore difficult to interpret.
  
  All data sets have been presented and analysed relative to the unprimed unstimulated control to show both the effect of priming and subsequent stimulation. A second analysis was subsequently conducted where each data set was normalised to its own baseline in terms of percentage change. Therefore, each of unprimed, IFN-γ and IL-4 primed cells were set to 100% in order to assess the effect of stimulation independent of the baseline priming effect. For clarity we have removed the following line:
  
  “Percentage change for ECAR and OCR was calculated from the respective baseline of each data set to visualise the differential ability of IFN-γ, IL-4 primed or unprimed AM to respond to stimulation (Figure S1C,D).”
  
  We have amended the text in the manuscript (lines 164-173) to “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D). These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure 1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C). IFN-γ increased the percent change in OCR of AM in response to both bacterial stimuli compared to the unstimulated IFN-γ primed control (Figure 1D). These data indicate that priming AM alters the metabolic baselines of human tissue resident macrophages and not their ability to respond to bacterial stimuli.”
  
  • The data may not support a claim that AM has greater "functional plasticity" without a direct comparison of antigen presentation. Moreover, MDM secrete more IL-1β than AM. The claim that AM "have increased ability to produce all cytokines assayed in response to Mtb stimulation" does not appear to be supported by the data.
  
  Our data suggest that the MDM are more phenotypically plastic (in terms of their ability to alter expression of cell surface markers in response to cytokine cues), whereas AM have a greater ability to alter cytokine production, our measure of functional plasticity. We have now defined the use of the terms ‘functional plasticity’ and ‘phenotypic plasticity’ in the context of our paper in lines 60-63. To account for the different culture and plating requirements of MDM versus AM, cytokine production was analysed relative to the average of the unprimed Mtb or LPS control of the respective MDM or AM. This allowed us to draw more accurate comparisons between the two macrophage populations by examining their relative ability to increase their cytokine production (expressed as fold change) rather than defining this functional plasticity only in terms of concentrations of cytokine produced in culture.
  
  We have therefore added the following sentence into the conclusion of the manuscript. “Cumulatively, the data presented herein suggests that the MDM may be more phenotypically plastic than the AM, while the AM have enhanced functional plasticity in their ability to modulate cytokine production after exposure to Th1 and Th2 cytokines.”
  
  We have edited the discussion (lines 421-423) to clarify the following "have increased ability to produce all cytokines assayed in response to Mtb stimulation" and changed it to “stimulated with Mtb have significantly more production of IL-1β, TNF and IL-10 compared with unprimed controls. This is in contrast with IFN-γ primed MDM which only upregulate TNF compared to their unprimed controls.”
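
  The relative normalisation described in this response (cytokine production expressed as fold change over the average unprimed stimulated control of the same cell type) can be sketched as follows. The concentrations are hypothetical values chosen for illustration and are not data from the study; the function and variable names are also illustrative.

```python
from statistics import mean

# Hypothetical TNF concentrations (pg/ml) per donor; not data from the study.
tnf = {
    "AM": {"unprimed": [100.0, 200.0, 300.0], "IFN-g": [400.0, 600.0, 800.0]},
    "MDM": {"unprimed": [1000.0, 2000.0, 3000.0], "IFN-g": [2000.0, 3000.0, 4000.0]},
}


def fold_vs_unprimed_mean(cell_data, condition):
    """Express each value as fold change over the mean unprimed control
    of the same cell type, so cell types plated differently can be compared."""
    control_mean = mean(cell_data["unprimed"])
    return [value / control_mean for value in cell_data[condition]]


am_fold = fold_vs_unprimed_mean(tnf["AM"], "IFN-g")
mdm_fold = fold_vs_unprimed_mean(tnf["MDM"], "IFN-g")
```

  In this made-up example the MDM secrete higher absolute concentrations, yet the AM show the larger fold change over their own control, which is the distinction between absolute output and relative capacity that the response draws.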
  
  • The claim that AM are better for "innate training" via IFNγ may not be consistent with increased IL1β and a later claim that MDM have increased production and are "associated with optimal training."
  
  We have removed the word “better” and now simply state that AM are a tractable target to induce innate training in the human lung.
  
  • Statistical analyses may not appropriately support some of the conclusions.
  
  We have consulted with a statistician. Please see response to reviewer 3 recommendations for authors point 1 below.
  
  • AM populations would benefit from further definition; presumably this is a heterogeneous, mixed population.
  
  AM used in the current study are routinely >97% CD68+CD14+ (Author response image 1). However, we acknowledge that tissue resident macrophages represent a spectrum of phenotypes. Given limitations in cell numbers from primary human AM derived from BALF, we have not attempted to define the function of discrete subpopulations of AM.
  
  • The term "functional plasticity" could also be more stringently defined for the purposes of this study.
  
  We define functional plasticity as the macrophages’ ability to alter their production of cytokines in response to external cues like IFN-γ and IL-4, whereas phenotypic plasticity is measured by their ability to alter the cell surface expression of activation markers. We have now defined this in the manuscript (lines 60-63).
  
  Author response image 1.
  
  Expression of macrophage markers on AM.
  
  Conclusion:
  
  Overall, the authors succeed in their goals of investigating how inflammatory and anti-inflammatory cytokine priming contributes to the metabolic reprogramming of AM and MDM populations. Their conclusions regarding the relationship between cytokine secretion and inflammatory molecule expression in response to bacterial stimuli are supported by the data. The involvement of metabolism in innate immune cell function is relevant when devising treatment strategies that target the innate immune response during infection. The data presented in this paper further our understanding of that relationship and advance the field of innate immune cell biology.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Authors are suggested to provide rationale for their choice of cytokines as IFN-gamma and IL-4. This will be useful for the readers.
  
  We have updated the following sentence (line 44-46) in the manuscript to add more rationale for the choice of IFN-γ and IL-4. “There is a paucity of data on the role of metabolism in response to Th1 or Th2 microenvironments induced by cytokines, such as IFN-γ or IL-4 respectively, in human macrophages, especially in tissue resident macrophages, such as AM.”
  
  (2) Authors have shown the final outcome of metabolic reprogramming in terms of expression of HLA-DR and CD-40, and cytokine release. What pathways/receptors are activated or associated with IL-4 and IFN-gamma priming as a first line of response?
  
  The relationship between IFN-γ or IL-4 induced expression of CD40 is established in haematological cell lines and fibroblasts as well as APC, with roles for the JAK/STAT pathways and upregulation of IRFs defined (1-3). Similarly, the relationship between exogenous IFN-γ and upregulation of HLA-DR expression on human monocytes or endothelial cells is established (4, 5). Whilst our work does not outline the signalling pathways downstream of Th1 or Th2 cytokine priming, we have shown for the first time that glycolysis mechanistically underpins the shift in phenotype and function observed in human macrophages upon priming with IFN-γ or IL-4.
  
  (3) What are the intracellular signals leading to glycolytic shift?
  
  One of the most likely mechanisms that underpin the shift to glycolytic metabolism is the stabilisation of HIF-1α mediated by activation of mTOR (see response below and Author response image 2).
  
  (4) Additional evidence is required to show Warburg effect such as stabilization and activation of HIF1alpha.
  
  We acknowledge that we have not shown the activation and stabilisation of HIF-1α, however, we have provided functional evidence of increased glycolysis with concomitant decreased oxidative phosphorylation indicative of Warburg metabolism.
  
  In order to address this gap in evidence we have reworded the manuscript to describe this functional change to “Warburg-like metabolism” throughout the manuscript. In addition, we have undertaken Western Blotting to provide evidence of mTOR activation when cells are primed with IFN-γ (Author response image 2).
  
  Author response image 2.
  
  IFN-γ activates mTOR in primary human monocytes. Monocytes were isolated from healthy donor PBMC using magnetic separation. Monocytes were left untreated (-), or treated with rapamycin as a negative control (Rap; 50 nM), IFN-γ (10 ng/ml), or IFN-γ and rapamycin simultaneously (IFN-γ + Rap) for 15 minutes. Phosphorylation of S6 was used as a readout of mTOR activation and was measured by western blot using β-actin as a control. The blot (A) and densitometry results (B) are shown; densitometry is expressed as the relative expression of pS6:β-actin. Graphs show data of n=1 of unprimed (black dot) vs IFN-γ primed (red) with and without rapamycin. ImageLab (Bio-Rad) software was used to perform densitometric analysis.
  
  (5) What is the importance of showing percentage change vs fold change in figure 1 (1C vs 1A)?
  
  All data sets have been presented and analysed relative to the unprimed unstimulated control to show the effect of first priming and subsequent stimulation (Figure 1A). A second analysis was subsequently conducted where each data set was normalised to its own baseline in terms of percentage change (Figure 1C). Therefore, each of unprimed, IFN-γ or IL-4 primed cells were set to 100% to assess the effect of stimulation independent of the pre-existing effect of priming on the baseline metabolism. For clarity we have removed the following line:
  
  “Percentage change for ECAR and OCR was calculated from the respective baseline of each data set to visualise the differential ability of IFN-γ, IL-4 primed or unprimed AM to respond to stimulation (Figure S1C,D).”
  
  We have amended the text (lines 164-173) in the manuscript to “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D). These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure S1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C). IFN-γ increased the percent change in OCR of AM in response to both bacterial stimuli compared to the unstimulated IFN-γ primed control (Figure 1D). These data indicate that priming AM alters the metabolic baselines of human tissue resident macrophages and not their ability to respond to bacterial stimuli.”
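
  The difference between the two normalisations can be made concrete with a short sketch. The ECAR readings below are hypothetical numbers chosen for illustration, not measurements from the study, and the function names are illustrative rather than taken from the authors' analysis.

```python
# Hypothetical ECAR readings (mpH/min); not data from the study.
ecar = {
    "unprimed": {"baseline": 20.0, "stimulated": 30.0},
    "IFN-g": {"baseline": 30.0, "stimulated": 45.0},
    "IL-4": {"baseline": 18.0, "stimulated": 27.0},
}


def fold_change_vs_reference(value, reference):
    """Fold change relative to one shared reference
    (here the unprimed, unstimulated rate)."""
    return value / reference


def percent_of_own_baseline(stimulated, baseline):
    """Stimulated rate as a percentage of the same group's baseline,
    so every group starts at 100% regardless of priming."""
    return 100.0 * stimulated / baseline


reference = ecar["unprimed"]["baseline"]
fold = {group: fold_change_vs_reference(v["stimulated"], reference)
        for group, v in ecar.items()}
percent = {group: percent_of_own_baseline(v["stimulated"], v["baseline"])
           for group, v in ecar.items()}
```

  With these made-up values, the fold change against the shared reference is highest for the IFN-γ primed group because its baseline is already elevated, whereas the percentage of own baseline is identical across groups. This mirrors the interpretation in the amended text: priming can shift the baseline without changing the capacity to respond to stimulation.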
  
  (6) Why IL-4 primed cells have lower glycolysis than unprimed control cells even in absence of pathogen in Figure 1A?
  
  IL-4 primed AM do not have statistically significant changes in glycolysis compared with unprimed control cells in the absence of stimulation.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The manuscript entitled "Human airway macrophages are metabolically reprogrammed by IFN-γ resulting in glycolysis dependent functional plasticity" by Cox et al., characterizes glycolytic-linked cytokine secretion and surface receptor expression of primary human airway macrophages (AM) and monocyte-derived macrophages (MDM). The authors primed the primary macrophages with type II interferon (IFN-γ) or interleukin-4 (IL-4) into Th1 and Th2 polarized states. This was followed by measurement of the shift in macrophage metabolism to glycolysis (ECAR measurement) and/or oxidative phosphorylation (OCR measurement) in response to lipopolysaccharide and irradiated Mycobacterium tuberculosis. The authors then utilize 2-DG (an inhibitor of glycolysis) to show the reliance of glycolytic shift in metabolism to drive the expression of different macrophage activation markers in MDMs and cytokine secretion in AMs.
  
  Significance:
  
  The study provides important validation of IFN-γ-mediated glycolytic shift and its correlated functionalities in primary human macrophage populations.
  
  Highlights: The study characterizes glycolytic-linked cytokine secretion and expression of macrophage activation markers in primary human resident (lung) and monocyte (blood)-derived macrophages. The study also shows data in support of IFN-γ alone in mediating glycolytic reprogramming of human primary macrophages.
  
  Limitations:
  
  The study lacks novelty and does not provide any new or different information in relation to IFN-γ-mediated glycolytic shift in the metabolism of human macrophages.
  
  Major comments:
  
  (1) The authors have relied on irradiated Mycobacterium tuberculosis (Mtb) and LPS stimulation to measure different correlates of macrophage functions. Additionally, the authors have discussed their results with irradiated Mtb with that of infection with live Mtb. There are also recent reports that show Mtb infection limiting glycolytic reprogramming in murine and human macrophages (PMID: 31914380) in contrast to their observation with irradiated Mtb. The authors should also include live Mtb infection or other replicative live bacterium for the induction of surface activation markers and cytokine release in their setup.
  
  We thank the reviewer for this suggestion; however, this is beyond the scope of the current study which was to assess AM and MDM in the context of immune stimulation in a reductive manner using TLR4 ligand LPS and a more complete whole bacteria stimulation. The selected bacterial ligands were employed in the study to allow us to model an optimal macrophage host response. This minimises the confounding variable of live bacteria which can perturb cellular metabolism and immune responses, which we have highlighted in the discussion. Since both LPS and irradiated Mtb induced similar metabolic and phenotypic profiles, it is likely that the effects of priming are maintained with diverse stimuli.
  
  (2) The authors should add a quantitative measure (like extracellular lactate secretion or ECAR level) for the extent of glycolytic inhibition by the use of 5 mM 2-DG in their setup.
  
  We would like to draw the attention of the reviewer to the data represented in supplementary figure 2B, demonstrating that 5 mM 2DG lowers ECAR at both 1 and 24 h post stimulation with iH37Rv by an average of approximately 40%. In addition, we have acknowledged that inhibition with 5 mM 2DG does not fully inhibit glycolysis as outlined in the study limitations (lines 477-480).
  
  (3) Percent change and fold change have been used to show the same or similar result in Fig. 1 and 2. Whereas, supplementary Fig. 1 shows absolute ECAR/OCR values in addition to fold change. The authors can plot either fold change or percent change in different measurements to avoid confusion. For example, do ECAR changes upon LPS stimulation in Fig. 1A and 1C come from the same dataset? One of the data points in percent change shows a decrease in percent ECAR change under no cytokine control, whereas all the data points in fold change show an increase.
  
  We have addressed this comment above in response to reviewer 1 point 5 (recommendations for the authors).
  
  We thank the reviewer for highlighting this single error in the data points for percent change. We have fixed this data point which was a result of a calculation error. All data throughout the manuscript has now been rechecked.
  
  Minor comments:
  
  (1) The manuscript for review should be line-marked for referencing and commenting during review.
  
  We have now included line-marking on the manuscript.
  
  (2) The authors can depict marker legends differently for all figures. In all figures, circles to squares or triangles represent treatment/stimulation with iH37Rv or LPS. The authors can depict this as circles to squares/triangles in contrast to different legends.
  
  We have changed the legend to include a more detailed description of the data represented, inserting additional information regarding the colours and symbols used in the figures.
  
  (3) Describe bars in supplementary figure 1A - 1H in its legend?
  
  We thank the reviewer for highlighting this oversight. We have amended the legend to state “error bars represent standard deviation”.
  
  (4) Discuss the significant increase in CD86 expression in IFN-γ and IL-4 primed unstimulated AMs in Fig. 3E.
  
  We have updated the results section to state that IFN-γ increased the expression of CD86 when isolated in the absence of bacterial stimulations in Fig. 3E (lines 271-272). There is no significant increase in CD86 by IL-4 primed unstimulated AM. IL-4 primed human AM only upregulated CD86 when treated with 2DG or in the presence of stimulation.
  
  (5) Contrary to Fig. 2, the data points of unstimulated cells in Fig. 4 vary for different treatment conditions (no cytokine, IFN-γ, and IL-4) for each cytokine measurement. What is the difference between unstimulated cells in Fig. 4 (for each cytokine) from that of Fig. 2 (for each receptor MFI)?
  
  Unstimulated cells change their surface activation markers and phenotype in response to IFN-γ and IL-4 in Fig. 2. For Fig. 4, IFN-γ and IL-4 are not sufficient to induce cytokine secretion in the absence of stimulation with bacterial ligands.
  
  (6) The methodology for seeding and treatment of cells is reemphasized for almost all results. Defining macrophage priming and stimulation of macrophages in the method section and once at the start of results should be fine.
  
  Plating happens differently for Seahorse compared to the flow cytometric phenotyping and ELISA for cytokine production. For clarity we have stated and reemphasized the seeding and treatment of cells throughout the results section.
  
  (7) Clarify "IL-4 reduced glycolysis in response to LPS stimulation" in relation to the results depicted in Fig. 1A and 1C. Similarly, clarify "IL-4 resulting in reduced IL-1β and IL-10 production" in relation to Fig. 4E.
  
  For clarity we have added the following lines (157-160, 164-170) to the manuscript:
  
  “IL-4 primed iH37Rv stimulated AM increased ECAR to similar extent as unprimed controls (Figure 1A; left). Conversely, IL-4 primed AM stimulated with LPS AM did not increase their ECAR to the same extent as controls (Figure 1A; right), suggesting that IL-4 reduces the AM ability to increase ECAR in response to LPS stimulation.”
  
  “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D). These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure S1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C).”
  
  For clarity we have amended the sentence the reviewer has highlighted (lines 214-215): “IL-4 primed AM had reduced fold change in glycolysis upon stimulation with LPS compared with controls”.
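The relationship between the two normalisations discussed here can be made concrete with a minimal sketch. The function names and the ECAR values below are hypothetical and are not taken from the manuscript; the sketch only assumes that both measures are computed against the same baseline rate.

```python
import math


def fold_change(baseline, stimulated):
    """Ratio of the stimulated rate to the baseline rate."""
    return stimulated / baseline


def percent_change(baseline, stimulated):
    """Change from baseline, expressed as a percentage of the baseline rate."""
    return (stimulated - baseline) / baseline * 100.0


# Hypothetical ECAR readings (arbitrary units), for illustration only.
baseline_ecar = 20.0
increased = 30.0
decreased = 15.0

for stimulated in (increased, decreased):
    fc = fold_change(baseline_ecar, stimulated)
    pc = percent_change(baseline_ecar, stimulated)
    # Both measures carry the same information: pc == (fc - 1) * 100.
    assert math.isclose(pc, (fc - 1.0) * 100.0)
    print(f"fold change = {fc:.2f}, percent change = {pc:.1f}%")
```

Under this assumption a negative percent change always corresponds to a fold change below 1, so a data point derived from the same baseline should not show a decrease in one measure and an increase in the other.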
  
  Since IFN-γ priming induced large effect sizes, we statistically analysed the IL-4 primed and unprimed data sets in the absence of the IFN-γ primed data sets to determine how IL-4 influenced macrophage function. The only data where this resulted in any statistical significance was in response to cytokine production. We have now clarified this in the methods and relevant figure legends by stating, “Statistically significant differences were determined using two-way ANOVA with a Tukey post-test (AD); *P≤0.05, **P≤0.01, ***P≤0.001, ****P≤0.0001 or #P≤0.05, ##P≤0.01 (where IFN-γ primed data sets were excluded for post-test analysis to analyse statistical differences between no cytokine and IL4 treated data sets).”
  
  To further clarify this, we have amended the text of the manuscript (lines 307-310) to reflect this. “All stimulated AM secreted IL-10 regardless of priming (Figure 4E). IFN-γ significantly enhanced iH37Rv induced IL-10 in AM compared to unprimed or IL-4 primed comparators (Figure 4E). IL-4 priming of human AM significantly reduced IL-10 production in response to iH37Rv compared with unprimed AM (Figure 4E). LPS strongly induced IL-10 production in unprimed MDM, which was significantly attenuated by either IFN-γ or IL-4 priming (Figure 4F).”
  
  (8) Clarify whether data points in unstimulated, iH37Rv stimulated, and LPS-stimulated control cells in Fig. 3A - 3F are from independent experiments from those in Fig. 2A - 2F? The distribution of data points of control (no 2-DG treatment) in Fig. 3 is highly similar to the corresponding data points in Fig. 2. Similarly, provide clarification for similarity in Fig. 5A - 5F and Fig. 4A - 4F.
  
  The data illustrated in figures 2 and 3 are from one very large dataset, as are the data in figures 4 and 5. This large experiment was designed to test the effect of priming macrophages with IFN-γ or IL-4 (in the presence or absence of stimulation), and also to determine if the differential responses elicited due to priming were dependent on glycolysis (by inhibiting with 2DG). For clarity and transparency, the same stimulated dataset is repeated in both figures. Given the size and complexity of the experiment, we chose to present the data this way to aid the reader.
  
  (9) Clarify the statement "where data was reanalyzed in the absence of IFN-γ" in the section pertaining to Statistical analysis. The authors should clearly mention the nature of biological and technical replicates for each experiment in its figure legend. The authors should also confirm multiple comparison correction in all 2-way ANOVA tests done in each figure legend.
  
  We have amended the text (lines 133-136) to clarify this point “P-values of ≤0.05 were considered statistically significant and denoted with an asterisk. Alternatively, P-values of ≤0.05 were denoted with a hashtag where data was analysed in the absence of IFN-γ primed data sets, to analyse statistical differences between no cytokine and IL-4 treated data sets.”
  
  Figures represent biological replicates (which are the average of technical replicates, presented as a single data point). This is indicated by the following sentence in each figure legend: “Each linked data point represents the average of technical duplicates for one individual biological donor”.
  
  Each legend has been amended to include the multiple comparison post-test applied.
  
  (10) Discuss the differences and similarities of IFN-γ driven metabolic reprogramming of primary murine macrophages with the results of this study relative to cytokine secretion and activation marker expression.
  
  We have added additional discussion and detail comparing human and murine macrophages in lines 381-382, 403, 407 and 412-415 of the manuscript.
  
  (11) The repetitive data plots of similar results can be significantly reduced to improve the interpretation of the results.
  
  The benefit of plotting the data in this way is a clearer understanding and representation of the data. The repetitive data plots make it possible to first delineate the effect of priming and priming plus stimulation and then, separately, to further examine the differences in AM versus MDM. The repetition of the primed data points then allows the reader to determine the effect of inhibiting glycolysis with 2DG on unprimed and primed macrophages (with and without stimulation).
  
  Reviewer #3 (Recommendations For The Authors):
  
  The methods used and data reported in this manuscript contribute to our understanding of the role of metabolism in programming of macrophages during priming. Suggestions for improving the presentation and interpretation of results include:
  
  • Consult with a statistician regarding analyses of the multiple conditions used during these assays. The use of repeated statistical analyses with different comparison groups in the same figure/data set seems atypical and should either be amended or fully justified in the text. Also, use of two-way vs. one-way ANOVA should be evaluated and clarified.
  
  We have now consulted a statistician. We have amended the text (lines 133-136) to clarify this point “P-values of ≤0.05 were considered statistically significant and denoted with an asterisk. Alternatively, P-values of ≤0.05 were denoted with a hashtag where data was analysed in the absence of IFN-γ primed data sets, to analyse statistical differences between no cytokine and IL-4 treated groups.”
  
  There are two variables in the data sets, cytokine priming and stimulation status; therefore, we opted for a two-way ANOVA rather than a one-way ANOVA. There are three stimulation groups: unstimulated, Mtb-stimulated and LPS-stimulated. Cytokine priming also has three groups: no cytokine, IFN-γ, or IL-4. There are two variables (priming and stimulation), each with 3 groups i.e., six treatment conditions in total, therefore two-way ANOVA with multiple comparisons tests helps pinpoint exactly which groups (e.g., the 6 different levels of the 'stimulation' and 'cytokine' treatments) are significantly different from each other. This was important for understanding the specific effects of our treatments. The reader can therefore also deduce how these six treatment conditions compare to each other.
  
  In contrast, performing multiple single comparisons independently of the rest of the dataset (e.g. t tests) increases the risk of false positives (type 1 error). ANOVA with multiple comparisons post-tests adjusts for this, helping to reduce the likelihood of a type 1 error. These statistics are more stringent, and it is therefore harder to get P values <0.05. Hence, if we compared all six treatment groups without adjustment, we would increase the chance of finding false positives due to the sheer number of comparisons, leading to biased and incorrect conclusions.
  
  In our case, multiple comparisons tests were essential after the two-way ANOVA because they helped to objectively identify specific treatment group differences and control the overall error rate when we were extracting our conclusions, thereby reducing any risk of biases in our conclusions.
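The inflation of the type 1 error rate described above can be illustrated with a short calculation. This is a minimal sketch under the simplifying assumption that the pairwise tests are independent (they are not strictly independent in practice, so the figure is only approximate). It uses a Bonferroni threshold as a simpler stand-in for the Tukey post-test named in the response, and the function names are hypothetical.

```python
from math import comb


def familywise_error_rate(n_comparisons, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(n_comparisons, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_comparisons


n_groups = 6                   # the six groups referred to in the response
n_pairs = comb(n_groups, 2)    # number of unadjusted pairwise comparisons

print("pairwise comparisons:", n_pairs)
print("unadjusted family-wise error rate:", round(familywise_error_rate(n_pairs), 3))
print("Bonferroni per-test threshold:", bonferroni_alpha(n_pairs))
```

With six groups there are 15 pairwise comparisons, and under the independence assumption the chance of at least one false positive at an unadjusted threshold of 0.05 is a little over one half.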
  
  A one-way ANOVA is used to test the effect of a single variable with more than two groups contained in the dataset. For example, in our case if you only want to test how different 'stimulation' groups affect ECAR or OCR, only in unprimed macrophages, a one-way ANOVA would be used.
  
  The current study used two-way ANOVA to test the effects of two variables (priming and stimulation, or in some cases priming and inhibition) each containing 3 groups, and see if there is any interaction between the two factors. For example, in our case this allowed us to examine how the 'stimulation' and the 'cytokine' priming affect ECAR/OCR levels and to determine if the effect of 'stimulation' depends on the 'cytokine' priming.
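The partition of variance that a two-way ANOVA performs (two main effects plus their interaction) can be sketched as follows. This is an illustrative, balanced, between-groups version only: it ignores the donor-linked (repeated measures) structure of the data, computes no P values or post-tests, and does not reproduce the analysis in the manuscript. The function name and all numbers are hypothetical.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[i][j] holds the replicate values for level i of factor A
    (e.g. priming) and level j of factor B (e.g. stimulation).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    grand = mean(values)
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_means),
        "B": a * n * sum((m - grand) ** 2 for m in b_means),
        "AxB": n * sum(
            (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "error": sum(
            (x - cell_means[i][j]) ** 2
            for i in range(a) for j in range(b) for x in cells[i][j]
        ),
        "total": sum((x - grand) ** 2 for x in values),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1),
          "error": a * b * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f_ratio = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return ss, df, f_ratio


# Hypothetical readings: rows = priming (none, IFN-γ, IL-4),
# columns = stimulation (unstimulated, Mtb, LPS), two replicates per cell.
data = [
    [[10.0, 12.0], [18.0, 20.0], [25.0, 27.0]],
    [[15.0, 17.0], [28.0, 30.0], [36.0, 40.0]],
    [[9.0, 11.0], [17.0, 19.0], [15.0, 17.0]],
]
ss, df, f_ratio = two_way_anova(data)
print(df)
print({k: round(v, 2) for k, v in f_ratio.items()})
```

The interaction term ("AxB") is the quantity that addresses whether the effect of stimulation depends on cytokine priming; a one-way ANOVA restricted to a single priming group would have no such term.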
  
  • More justification could be given for the dose of IFNγ used for priming. Inflammatory priming is typically performed with a "low-dose" treatment (e.g., ~1 ng/ml), whereas the authors use 10 ng/ml, which would be considered a high dose. It would be useful to repeat select experiments with a more standard low-dose treatment of IFNg to demonstrate that this is also sufficient to induce the observed metabolic changes.
  
  Previous work has identified little difference in the response of AM and peripheral monocytes to low versus high doses of IFN-γ (6). We have inserted the following into the study limitations (lines 479-481).
  
  “Furthermore, only one dose of IFN-γ was utilised due to limitations in AM yield, however, recently both low and high doses of IFN-γ have been shown to have similar effects on AM in vitro (6).”
  
  • Check for accuracy of the Fig.4 legend. Also check that 4G and 4B math is consistent.
  
  The legend for Figure 4 has been amended to replace the incorrect A,B with G,H. The math has been double checked for accuracy and is correct. 3 out of 10 MDM donors produced IL-1β in the absence of IFN-γ in Figure 4B, therefore the average used to calculate the data represented in Figure 4G was brought down markedly by donors who produced little or no IL-1β.
  
  • Functional plasticity is a vague term and difficult to interpret in this context. It is stated that AM have greater functional plasticity, but MDMs appear to have greater capacity to secrete IL-1β and respond more robustly to IL-4 in terms of T cell stimulation. On that note, the claims regarding antigen presentation would be more impactful if a direct comparison of antigen presentation capacity was made between AM and MDM.
  
  Our data suggest that AM have a greater ability to alter cytokine production, such as IL-1β. To account for the different culture and plating requirements of MDM versus AM, cytokine concentration was normalised and expressed in terms of fold change. This gives a more controlled and accurate comparison of the ability of IFN-γ or IL-4 to modulate cytokine production in AM compared with MDM.
  
  The terms ‘functional plasticity’ and ‘phenotypic plasticity’ have now been defined in the manuscript in lines 60-63.
  
  We have therefore added the following sentence into the conclusion of the manuscript (lines 490-493). “Cumulatively, the data presented herein suggests that the MDM maybe more phenotypically plastic than the AM, while the AM have enhanced functional plasticity in their ability to produce cytokine after exposure Th1 and Th2 cytokines.”
  
  However, we acknowledge that the MDM may be regarded as more plastic because of their ability to respond robustly to IL-4, whereas the phenotypic and functional changes in the AM in response to IL-4 are more limited. Whilst the focus of our work was to determine if AM are a tractable target to promote immunity in the lungs through upregulation of pro-inflammatory effector function, their ability to downregulate inflammation in response to IL-4 is less profound than that of MDM.
  
  We acknowledge the shortcomings of our work which did not allow us to directly measure antigen processing in the AM, due to limitations in the cellular yield from BALF. We have edited the text (lines 251-252 and 286) to clarify this for the reader.
  
  • Inconsistent normalization complicates interpretation of metabolic data. For example, it is unclear whether changes in glycolysis and oxidative phosphorylation in primed cells are due to priming or subsequent treatments. Check harmony of methods for analysis of "metabolic assays" with Fig.1 data, axis, and legend.
  
  We have addressed this comment, which is similar to points made by the other reviewers and amended the manuscript to increase clarity. These changes are outlined in the response to reviewer 1, point 5 (recommendations for the author). In addition, we have amended the metabolic assay method (lines 111-112) to state that “Post stimulation the ECAR and OCR were continually sampled at 20-minute intervals for times indicated.”
  
  • A direct comparison of cytokine production after priming and stimulation with Mtb or LPS is limited by inconsistent axes. The data may not support a claim that AM has greater "functional plasticity" without a direct comparison of antigen presentation. Moreover, MDM secrete more IL-1β than AM. The claim that AM "have increased ability to produce all cytokines assayed in response to Mtb stimulation" does not appear to be supported by the data.
  
  We have amended the text to clarify this issue (lines 313-315). “These data suggest that the AM have greater functional plasticity in terms of their ability to upregulate cytokine production in response to IFN-γ, compared with the MDM. IFN-γ primed AM have enhanced IL-10 and TNF production in response to Mtb and LPS, respectively.”
  
  We have amended the manuscript and have replaced “IFN-γ primed AM have increased ability to produce all cytokines assayed in response to Mtb stimulation” with the following (lines 421-423) “IFNγ primed AM stimulated with Mtb have significantly more production of IL-1β, TNF and IL-10 compared with unprimed controls. This is in contrast with IFN-γ primed MDM which only upregulate TNF compared to their unprimed controls.”
  
  • AM populations could be defined experimentally.
  
  Airway macrophages were adherence purified from bronchoalveolar lavage fluid and defined as CD68+CD14+ as per rebuttal figure 1. The purpose of this study was to examine if human peripherally derived or lung resident macrophages were plastic in response to the classical polarising cytokines IFNγ and IL-4. We have identified that the AM and MDM do indeed have different functional and metabolic responses to these cytokines. However, determining functional differences within the AM subpopulations is beyond the scope of the current study and hampered by low cell numbers in human BALF.
  
  References
  
  (1) Conzelmann M, Wagner AH, Hildebrandt A, Rodionova E, Hess M, Zota A, Giese T, Falk CS, Ho AD, Dreger P, Hecker M, Luft T. IFN-γ activated JAK1 shifts CD40-induced cytokine profiles in human antigen-presenting cells toward high IL-12p70 and low IL-10 production. Biochemical pharmacology 2010; 80: 2074-2086.
  
  (2) Fries KM, Sempowski GD, Gaspari AA, Blieden T, Looney RJ, Phipps RP. CD40 Expression by human fibroblasts. Clinical Immunology and Immunopathology 1995; 77: 42-51.
  
  (3) Gu W, Chen J, Yang L, Zhao KN. TNF-α promotes IFN-γ-induced CD40 expression and antigen process in Myb-transformed hematological cells. TheScientificWorldJournal 2012; 2012: 621969.
  
  (4) Hershman MJ, Appel SH, Wellhausen SR, Sonnenfeld G, Polk HC, Jr. Interferon-gamma treatment increases HLA-DR expression on monocytes in severely injured patients. Clinical and experimental immunology 1989; 77: 67-70.
  
  (5) Maenaka A, Kenta I, Ota A, Miwa Y, Ohashi W, Horimi K, Matsuoka Y, Ohnishi M, Uchida K, Kobayashi T. Interferon-γ-induced HLA Class II expression on endothelial cells is decreased by inhibition of mTOR and HMG-CoA reductase. FEBS open bio 2020; 10: 927-936.
  
  (6) Thiel BA, Lundberg KC, Schlatzer D, Jarvela J, Li Q, Shaw R, Reba SM, Fletcher S, Beckloff SE, Chance MR, Boom WH, Silver RF, Bebek G. Human alveolar macrophages display marked hyporesponsiveness to IFN-γ in both proteomic and gene expression analysis. PLoS One 2024; 19: e0295312.
  
Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.20.585747v2
www.biorxiv.org www.biorxiv.org

Early parafoveal semantic integration in natural reading

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. 
  Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.
  
  We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:
  
  “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”
  
  We have also rephrased our research questions in the introduction section on Page 5 with tracked changes:
  
  “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”
  
  Secondly, we would like to elucidate the significance of investigating the timing of semantic integration and why this complements existing findings of parafoveal processing (POF) during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:
  
  “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”
  
  Thirdly, we now better highlight the contributions of the Antúnez et al. (2022) paper, as it has provided important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3. The revised passage is as follows: “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”
  
  References:
  
  Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281
  
  Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200
  
  Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212
  
  Antúnez M, Milligan S, Andrés Hernández-Cabrera J, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986
  
  (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.
  
  We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.
  
  (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled word-level factors in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words has been strategically embedded into two sentences, allowing for the creation of both congruent and incongruent sentence pairs through the interchange of these words. We now have explicitly specified this design in all sentences, as reflected in the edited manuscript on Page 38. For example, considering the exemplar pair of “brother/jacket”,
  
  “Last night, my lazy brother/jacket came to the party one minute before it was over.
  
  Lily says this blue jacket/brother will be a big fashion trend this fall.”
  
  In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.
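The counterbalancing logic described above can be sketched in a few lines. This is only an illustration of the word-swapping scheme using the exemplar pair quoted in the response; the function name and data layout are hypothetical and do not describe the authors' actual stimulus-generation procedure.

```python
def build_versions(frame_1, frame_2, word_1, word_2):
    """Return congruent and incongruent versions of a sentence pair.

    The incongruent versions are made by swapping the two target words
    between the two sentence frames.
    """
    congruent = (frame_1.format(target=word_1), frame_2.format(target=word_2))
    incongruent = (frame_1.format(target=word_2), frame_2.format(target=word_1))
    return {"congruent": congruent, "incongruent": incongruent}


# Exemplar pair quoted in the response ("brother/jacket").
frame_1 = "Last night, my lazy {target} came to the party one minute before it was over."
frame_2 = "Lily says this blue {target} will be a big fashion trend this fall."
versions = build_versions(frame_1, frame_2, "brother", "jacket")

for condition, sentences in versions.items():
    for sentence in sentences:
        print(f"{condition}: {sentence}")

# Each target word occurs exactly once per condition, so word-level
# properties (length, frequency, and so on) are matched across conditions.
for word in ("brother", "jacket"):
    for sentences in versions.values():
        assert sum(word in s for s in sentences) == 1
```

Because the same two words fill both conditions, any difference between congruent and incongruent trials cannot be attributed to properties of the target words themselves.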
  
  (2) We acknowledge that the consideration of word-level information is crucial when making claims about contextual integration in the current study. However, we don’t think there are many cases in the stimulus set where a single feature like animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word or even a specific semantic feature, so that appreciating the mismatch requires the comprehender to integrate the word into the context (and especially to integrate the word with the immediately preceding one). However, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility, which is a limitation of the stimulus set and has been discussed in the revised manuscript, as indicated by the tracked changes on Page 16, as copied below:
  
  “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”
  
  References:
  
  Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206
  
  Reviewer #2 (Public Review):
  
  This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.
  
  The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.
  
  We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.
  
  Major points:
  
  (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).
  
  We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16, and are copied as follows:
  
  “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”
  
  References:
  
  Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206
  
  (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.
  
  Source estimation was exclusively provided for the tagging response rather than the congruency effect because we posit that this conditional contrast would emanate from the same brain regions exhibiting the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.
  
  Author response image 1.
  
  We discuss the necessity of using MEG in the revised manuscript (tracked changes on Page 20); the passage is copied below:
  
  “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”
  
  (3) The earliest semantic preview effects occurred around 100 ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.
  
  We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:
  
  “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”
  
  References:
  
  Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005
  
  Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058
  
  Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061
  
  McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972
  
  Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.
  
  Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752
  
  (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.
  
  We acknowledge that the correlation analysis between the RIFT effect and reading speed on the group level lacks causality, making it less ideal for addressing this question. We have incorporated this acknowledgment as one of the limitations of the current study in the revised manuscript on Page 16, as indicated by the tracked changes, and the relevant passage is provided below:
  
  “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”
  
  We reformulated the last sentence:
  
  “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”
  
  (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.
  
  We thank the reviewer for this valuable suggestion. EOG recordings were not included in the current study because of restrictions on MEG data collection at the University of Birmingham during the COVID-19 pandemic. In future studies, we will follow the reviewer's suggestion and incorporate EOG recordings in data collection. This addition will facilitate optimal rejection of eye movement-related artifacts through ICA, as recommended by Dimigen in his methodological paper:
  
  Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.
  
  (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).
  
  In the revised manuscript, we have provided a more comprehensive explanation of eye movements and saccade planning, aligning it with the SWIFT model. Please refer to Page 15 with tracked changes; the updated passage is provided below:
  
  “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, it may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”
  
  References:
  
  Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. SWIFT: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.
  
  We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.
  
  Earlier debates about whether the N400 is more linked to access or integration have resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs like access, priming, and integration, as Reviewer 1 also pointed out in comment #2.
  
  (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?
  
  We thank the reviewer for highlighting the confusion regarding the sensor selection procedure. In response, we have clarified this procedure in the Methods section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:
  
  "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."
  
  (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?
  
  In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:
  
  “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”
  
  (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).
  
  We thank the reviewer for pointing out the confusion, and we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:
  
  "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."
  
  References:
  
  Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.
  
  (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?
  
  We agree with the Reviewer that, given the prior application of a 0.5 Hz high-pass filter to the data, the detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have provided additional clarification on this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:
  
  "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."
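  The per-channel linear detrending described in this passage can be illustrated with a minimal sketch. It assumes the continuous recording is held in a NumPy array of shape (channels, samples); the array names, sampling rate, and simulated drift are hypothetical and not taken from the manuscript, and `scipy.signal.detrend` stands in for whichever toolbox routine the authors actually used.

```python
import numpy as np
from scipy.signal import detrend


def detrend_channels(raw):
    """Remove a least-squares linear trend from each channel (row) independently."""
    return detrend(raw, axis=-1, type="linear")


# Hypothetical data: 3 channels, 10 s sampled at 1000 Hz, each with its own linear drift.
rng = np.random.default_rng(0)
t = np.arange(10_000) / 1000.0
drift = np.array([[0.5], [-1.0], [2.0]]) * t
raw = drift + 0.1 * rng.standard_normal((3, t.size))

clean = detrend_channels(raw)

# Slope of a straight-line fit to each detrended channel; should be numerically zero.
residual_slopes = np.polyfit(t, clean.T, 1)[0]
```

  Because only the straight-line component is removed, a signal that has already been high-pass filtered at 0.5 Hz would be expected to change very little, which is consistent with the authors' remark above.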
  
  (6) Source analysis, p. 25f.: How was the beamformer regularized?
  
  This information was already included in the original manuscript on Page 26. The original text is provided below for reference:
  
  “No regularisation was performed to the CSD matrices (lambda = 0).”
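  For readers unfamiliar with this parameter, the sketch below shows where a regularisation term would enter a generic unit-gain beamformer filter, W = (Lᴴ C⁻¹ L)⁻¹ Lᴴ C⁻¹, and that lambda = 0 leaves the CSD matrix untouched. Scaling lambda by the mean sensor power is one common convention and is an assumption here; the leadfield, the simulated CSD matrix, and the function name are hypothetical and do not reproduce the authors' pipeline.

```python
import numpy as np


def beamformer_filter(leadfield, csd, lam=0.0):
    """Unit-gain spatial filter for a single source location.

    leadfield: (n_sensors, n_orient) forward field (hypothetical values below)
    csd:       (n_sensors, n_sensors) Hermitian cross-spectral density matrix
    lam:       fraction of the mean sensor power added to the diagonal (0 = none)
    """
    n = csd.shape[0]
    csd_reg = csd + lam * (np.trace(csd).real / n) * np.eye(n)
    csd_inv = np.linalg.inv(csd_reg)
    gram = leadfield.conj().T @ csd_inv @ leadfield
    return np.linalg.solve(gram, leadfield.conj().T @ csd_inv)


# Simulated, full-rank CSD from random complex sensor data (illustration only).
rng = np.random.default_rng(1)
n_sensors = 8
x = rng.standard_normal((n_sensors, 200)) + 1j * rng.standard_normal((n_sensors, 200))
csd = x @ x.conj().T / 200
leadfield = rng.standard_normal((n_sensors, 2))

w_unreg = beamformer_filter(leadfield, csd, lam=0.0)   # no regularisation
w_reg = beamformer_filter(leadfield, csd, lam=0.05)    # 5% diagonal loading
```

  Both filters pass the modelled source with unit gain; the regularised one trades some spatial selectivity for numerical stability, which matters mainly when the CSD matrix is poorly conditioned.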
  
URL

biorxiv.org/content/10.1101/2022.09.26.509511v3
www.biorxiv.org www.biorxiv.org

The penetration ring is a novel infection structure formed by the penetration peg for invading plant cell membrane in rice blast fungus

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study focuses on characterizing a previously identified gene, encoding the secreted protein Ppe1, that may play a role in rice infection by the blast fungus Magnaporthe oryzae. Magnaporthe oryzae is a hemibiotrophic fungus that infects living host cells before causing disease. Infection begins with the development of a specialized infection cell, the appressorium, on the host leaf surface. The appressorium generates enormous internal turgor that acts on a thin penetration peg at the appressorial base, forcing it through the leaf cuticle. Once through this barrier, the peg elaborates into bulbous invasive hyphae that colonize the first infected cell before moving to neighboring cells via plasmodesmata. During this initial biotrophic growth stage, invasive hyphae invaginate the host plasma membrane, which surrounds growing hyphae as the extra-invasive hyphae membrane (EIHM). To avoid detection, the fungus secretes apoplastic effectors into the EIHM matrix via the conventional ER-Golgi secretion pathway. The fungus also forms a plant-derived structure called the biotrophic interfacial complex (BIC) that receives cytoplasmic effectors through an unconventional secretion route before they are delivered into the host cell. Together, these secreted effector proteins act to evade or suppress host innate immune responses. Here the authors contribute to our understanding of M. oryzae infection biology by showing how Ppe1, which localizes to both the appressorial penetration peg and to the appressorial-like transpressoria associated with invasive hyphal movements into adjacent cells, maximizes host cell penetration and disease development and is thus a novel contributor to rice blast disease.
  
  We sincerely appreciate the reviewer’s thoughtful evaluation of our work. We are grateful for your recognition of Ppe1 as a novel contributor to M. oryzae infection biology and your insightful summary of its spatio-temporal localization and functional importance in host penetration. We also appreciate the time you devoted to providing constructive feedback, which has greatly strengthened our manuscript.
  
  Strengths:
  
  A major goal of M. oryzae research is to understand how the fungus causes disease, either by determining the physiological underpinnings of the fungal infection cycle or by identifying effectors and their host targets. Such new knowledge may point the way to novel mitigation strategies. Here, the authors make an interesting discovery that bridges both fungal physiology and effector biology research by showing how a secreted protein Ppe1, initially considered an effector with potential host targets, associates with its own penetration peg (and transpressoria) to facilitate host invasion. In a previous study, the authors had identified a small family of small secreted proteins that may function as effectors. Here they suggest Ppe1 (and, later in the manuscript, Ppe2/3/5) localizes outside the penetration peg when appressoria develop on surfaces that permit penetration, but not on artificial hard surfaces that prevent peg penetration. Deleting the PPE1 gene reduced (although did not abolish) penetration, and a fraction of those that penetrated developed invasive hyphae that were reduced in growth compared to WT. Using fluorescent markers, the authors show that Ppe1 forms a ring underneath appressoria, likely where the peg emerges, which remained after invasive hyphae had developed. The ring structure is smaller than the width of the appressorium and also lies within the septin ring known to form during peg development. This so-called penetration ring also formed at the transpressorial penetration point as invasive hyphae moved to adjacent cells. This structure is novel, and required for optimum penetration during infection. Furthermore, Ppe1, which carries a functional signal peptide, may form on the periphery of the peg, together suggesting it is secreted and associated with the peg to facilitate penetration. Staining with aniline blue also suggests Ppe1 is outside the peg. 
Together, the strength of the work lies in identifying a novel appressorial penetration ring structure required for full virulence.
  
  We are deeply grateful to the reviewer for the clear understanding and insightful evaluation of our work. Your recognition of the novel contribution and scientific merit of our study is both encouraging and motivating. We sincerely appreciate the time, expertise and constructive feedback dedicated to reviewing our manuscript, as the comments have been instrumental in enhancing the quality of this work.
  
  Weaknesses:
  
  The main weakness of the paper is that, although Ppe1 is associated with the peg and optimizes penetration, the function of Ppe1 is not known. The work starts off considering Ppe1 a secreted effector, then a facilitator of penetration by associating with the peg, but what role it plays here is only often speculated about. For example, the authors consider at various times that it may have a structural role, a signaling role orchestrating invasive hyphae development, or a tethering role between the peg and the invaginated host plasma membrane (called throughout the host cytoplasmic membrane, a novel term that is not explained). However, more effort should be expended to determine which of these alternative roles is the most likely. Otherwise, as it stands, the paper describes an interesting phenomenon (the appressorial ring) but provides no understanding of its function.
  
  We sincerely appreciate the reviewer’s comments. We have revised "host cytoplasmic membrane" to "host plasma membrane" throughout the manuscript for consistency. To further investigate the role of Ppe1 in the interaction between M. oryzae and rice, we overexpressed PPE1 in rice ZH11. A vector containing a signal peptide and an N-terminal GFP tag (pCXUN-SP-GFP-Ppe1) was constructed, and 35 GFP-PPE1-OX plants (T0) were obtained through Agrobacterium-mediated rice transformation. PCR and qRT-PCR validation were then performed on the T0 transgenic plants. The PCR results showed that the inserted plasmid could be amplified from the genomic DNA extracted from the leaves of all the resulting T0 plants (Author response image 1A). The qRT-PCR results indicated that most T0 transgenic plants expressed PPE1 transcripts (Author response image 1B). T0 plants with higher expression levels were selected for western blot analysis, which confirmed the presence of GFP-Ppe1 bands of the expected size (Author response image 1C). To further explore the targets of Ppe1 in rice, the leaf sheaths of T0 plants were inoculated with M. oryzae strain Guy11. Total proteins were extracted at 24 hours post-inoculation (hpi) and subjected to immunoprecipitation using GFP magnetic beads. Silver staining revealed more interacting protein bands in T0 plants compared to ZH11 and GFP-OX controls (Author response image 1D). These samples were then analyzed by mass spectrometry, which identified 331 rice proteins that potentially interact with Ppe1 (Author response image 1E). Subsequently, yeast two-hybrid assays were performed on 13 putative interacting proteins with higher coverage, but no interaction was detected between Ppe1 and these proteins (Author response image 1F-G). 
Considering that the identification and functional validation of interacting proteins is a labor-intensive and time-consuming endeavor, we will focus our future efforts on in-depth studies of Ppe1's function in rice.
  
  Author response image 1.
  
  Screening of Ppe1 candidate targets in rice. (A) PCR detection of the GFP-PPE1 construct in transgenic rice. (B) Expression of PPE1 in transgenic rice (T0) verified by qRT-PCR. (C) Western blot analysis of Ppe1 expression in transgenic rice. (D) Rapid silver staining for detection of the purified proteins captured by the GFP-beads. (E) Venn diagram comparing the number of proteins captured in the different samples. (F) Identity of the potential targets of Ppe1 in rice. (G) Yeast two-hybrid assay showing negative interaction of Ppe1 with rice candidate proteins.
  
  The inability to nail down the function of Ppe1 likely stems from two underlying assumptions with weak support. Firstly, the authors assume that Ppe1 is secreted and associated with the peg to form a penetration ring between the plant cell wall and cytoplasm membrane. However, the authors do not demonstrate it is secreted (for instance by blocking Ppe1 secretion and its association with the peg using brefeldin A).
  
  To investigate the secretion pathway of Ppe1 in M. oryzae, we tested the effect of brefeldin A (BFA), an inhibitor of conventional ER-to-Golgi secretion in fungi, as suggested by the reviewer. We inoculated rice leaf sheaths with conidial suspensions from the Ppe1-mCherry and PBV591 strains (the latter co-expressing Pwl2-mCherry-NLS and Bas4-GFP constructs) and treated them with BFA. We found that, even after exposure to BFA for 5 to 11 hours, Ppe1-mCherry still formed its characteristic ring conformation (Author response image 2). Similarly, in the BFA-treated samples, the cytoplasmic effector Pwl2-mCherry accumulated at the BIC, while the apoplastic effector Bas4-GFP was retained in the invasive hyphae (Author response image 2). These results indicate that Ppe1 is not secreted through the conventional ER-Golgi secretion pathway.
  
  Author response image 2.
  
  The secretion of Ppe1 is not affected by BFA treatment. (A) and (B) The Ppe1-mCherry fluorescent signal was still observed both in the presence and absence of BFA. (C) Following BFA treatment, the secretion of the apoplastic effector Bas4-GFP was blocked while that of the cytoplasmic effector Pwl2-mCherry was not affected. The rice leaf sheath tissue was treated with 50 μg/mL BFA (0.1% DMSO) at 17 hpi. Images were captured at 22 hpi for A and 28 hpi for B and C. Scale bars = 10 µm.
  
  Also, they do not sufficiently show that Ppe1 localizes on the periphery of the peg. This is because confocal microscopy is not powerful enough to see the peg. The association they are seeing (for example in Figure 4) shows localization to the bottom of the appressorium and around the primary hyphae, but the peg cannot be seen. Here, the authors will need to use SEM, perhaps in conjunction with gold labeling of Ppe1, to show it is associating with the peg and, indeed, is external to the peg (rather than internal, as a structural role in peg rigidity might predict). It would also be interesting to repeat the microscopy in Figure 4C but at much earlier time points, just as the peg is penetrating but before invasive hyphae have developed - Where is Ppe1 then? Finally, the authors speculate, but do not show, that Ppe1 anchors penetration pegs on the plant cytoplasm membrane. Doing so may require FM4-64 staining, as used in Figure 2 of Kankanala et al, 2007 (DOI: 10.1105/tpc.106.046300), to show connections between Ppe1 and host membranes. Note that the authors also do not show that the penetration ring is a platform for effector delivery, as speculated in the Discussion.
  
  We sincerely appreciate the reviewer's valuable suggestion regarding SEM with immunogold labeling to precisely visualize Ppe1's association with the penetration peg. While we fully acknowledge this would be an excellent approach, after consulting several experts in the field, we realized that the specialized equipment and technical expertise required for fungal immunogold-SEM are currently unavailable to us. We sincerely hope that the reviewer will understand this technical limitation.
  
  To further strengthen our evidence for Ppe1's role in anchoring the penetration peg to the plant plasma membrane, we provided new co-localization images of Ppe1 and the penetration peg (Fig. S7). At 16 hours post-inoculation (hpi), when the penetration peg was just forming and prior to the development of invasive hyphae, the Ppe1-mCherry fluorescence formed a tight ring-like structure closely associated with the base of the appressorium. At 23 hpi, the circular Ppe1-mCherry signal was still detectable beneath the appressorium and around the penetration peg, which had differentiated into the primary invasive hyphae. Furthermore, we obtained 3D images of the strain expressing both Ppe1-mCherry and Lifeact-GFP during primary invasive hyphal development. The results revealed that Ppe1 forms a ring-like structure that remains anchored to the penetration peg during fungal invasion (Fig. S6).
  
  We also conducted an FM4-64 staining experiment as recommended by the reviewer. Although the experiment provided valuable insights, we found that the resolution was insufficient to precisely delineate the spatial relationship between Ppe1 and host membranes at the penetration peg (Author response image 3). To better resolve this relationship, we examined the co-localization of the Ppe1-mCherry ring with the rice plasma membrane marker GFP-OsPIP2 (Fig. S8). These new results provide complementary evidence supporting our conclusion that Ppe1 functions extracellularly at the host-pathogen interface. We hope these additional data will help address the reviewer's concerns regarding Ppe1's localization.
  
  Author response image 3.
  
  FM4-64-stained rice leaf sheath inoculated with an M. oryzae strain expressing Ppe1-GFP. The Ppe1-GFP ring was positioned above the primary invasive hyphae. Scale bar = 5 µm.
  
  Secondly, the authors assume Ppe1 is required for host infection due to its association with the peg. However, its role in infection is minor. The majority of appressoria produced by the mutant strain penetrate host cells and elaborate invasive hyphae, and lesion sizes are only marginally reduced compared to WT (in fact, the lesion density of the 70-15 WT strain itself seems reduced compared to what would be expected from this strain). The authors did not analyze the lesions for spores to confirm that the mutant strains were non-pathogenic (non-pathogenic mutants sometimes form small pinprick-like lesions that do not sporulate). Thus, the pathogenicity phenotype of the knockout mutant is weak, which could contribute to the inability to accurately define the molecular and cellular function of Ppe1.
  
  We appreciate the reviewer’s comments. To ensure the reliability of our findings, we conducted spray inoculation experiments with multiple independent repeats. Our results consistently demonstrated that deletion of the PPE1 gene significantly attenuates the virulence of M. oryzae. Further analysis of lesion development and sporulation in the Δppe1 mutant revealed that it retains the ability to produce conidia. To validate these observations, we generated a PPE1 knockout in the wild-type reference strain Guy11. Similarly, we observed a significant decrease in the pathogenicity of the Δppe1 mutants generated from the wild-type Guy11 strain compared to Guy11 in the spray assay (Fig. S2). These results collectively indicate the importance of Ppe1 in the pathogenicity of M. oryzae to rice.
  
  In summary, it is important that the role of Ppe1 in infection be determined.
  
  Reviewer #2 (Public review):
  
  The article focuses on the study of Magnaporthe oryzae, the fungal pathogen responsible for rice blast disease, which poses a significant threat to global food security. The research delves into the infection mechanisms of the pathogen, particularly the role of penetration pegs and the formation of a penetration ring in the invasion process. The study highlights the persistent localization of Ppe1 and its homologs to the penetration ring, suggesting its function as a structural feature that facilitates the transition of penetration pegs into invasive hyphae. The article provides a thorough examination of the infection process of M. oryzae, from the attachment of conidia to the development of appressoria and the formation of invasive hyphae. The discovery of the penetration ring as a structural element that aids in the invasion process is a significant contribution to the understanding of plant-pathogen interactions. The experimental methods are well-documented, allowing for reproducibility and validation of the results.
  
  We sincerely appreciate the thoughtful and insightful evaluation of our work. Thank you for recognizing the significance of our findings regarding the penetration ring and the functional role of Ppe1 during host invasion.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Line 48: "after appressorium- or transpressorium-mediated penetration of plant cell wall" - transpressoria do not penetrate the plant cell wall.
  
  Thank you for your valuable suggestion. For improved clarity, we have rephrased the sentence as follows: In this study, we showed that a penetration ring is formed by penetration pegs after appressorium-mediated penetration of plant cell wall.
  
  Line 143: "approximately 25% of the 143 appressoria formed by the Δppe1 mutant had no penetration peg" - It is not possible to see the penetration peg by confocal microscopy.
  
  Thank you for your valuable suggestion. We have revised the sentence as follows: In contrast, approximately 25% of the appressoria formed by the Δppe1 mutant had no penetration.
  
  Line 159: "inner cycle" -should be inner circle?
  
  We gratefully acknowledge the reviewer's careful reading. The typographical error has been corrected throughout the revised manuscript.
  
  Line 255: "These results indicate that initiation of penetration peg formation is necessary for the formation of the penetration ring." Actually, more precisely, they indicate that penetration is necessary.
  
  We appreciate this suggestion and have revised the text to be more concise: These results indicate that penetration is necessary for the formation of the penetration ring.
  
  Line 282: "unlike subcellular localizations of other effectors"- is this an effector if no plant targets are known?
  
  We appreciate this suggestion and have revised the text as follows: unlike subcellular localizations of Bas4, Slp1, Pwl2, and AvrPiz-t.
  
  Line 299: "it may function as a novel physical structure for anchoring penetration pegs on the surface of plant cytoplasm membrane after cell wall penetration" - an interaction with the plant plasma membrane was not shown and this is speculative.
  
  We have provided new evidence to show the spatial positioning of the Ppe1-mCherry ring relative to the rice plasma membrane (see Figure S8).
  
  Line 301: "It is also possible that this penetration ring functions as a collar or landmark that is associated with the differentiation of penetration pegs (on the surface of cytoplasm membrane) into primary invasive hyphae enveloped in the EIHM cytoplasm membrane (Figure 7)." The alternative conclusions for Ppe1 function, either interacting with host membranes or acting as a developmental landmark, need to be resolved here.
  
  We appreciate this suggestion and have revised the text as follows: It is also possible that this penetration ring functions as a collar that is associated with the differentiation of penetration pegs into primary invasive hyphae enveloped in the EIHM (Figure 7).
  
  Line 317: "is likely a structural feature or component for signaling the transition of penetration pegs to invasive hyphae",- if the authors think Ppe1 has these roles, why do they refer to Ppe1 as an effector?
  
  Many thanks for these comments. We have revised this and refer to Ppe1 as a secreted protein throughout the revised manuscript.
  
  Line 337: "After the penetration of plant cell wall, the penetration ring may not only function as a physical structure but also serve as an initial effector secretion site for the release of specific effectors to overcome plant immunity in early infection stages"- which is it? Also, no evidence is provided to suggest it is a platform for effector secretion.
  
  We sincerely appreciate your valuable suggestion. We have revised this sentence as follows: After the penetration of plant cell wall, the penetration ring may not only function as a physical structure but also serve as a secretion site for the release of specific proteins to overcome plant immunity during the early infection stages.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) While the study suggests the penetration ring as a structural feature, it remains unclear whether it also serves as a secretion site for effectors. Further exploration of this aspect would strengthen the conclusions.
  
  We thank the reviewer for this useful suggestion. In this study, we demonstrated that Ppe1 proteins form a distinct penetration ring structure at the site where the penetration peg contacts the plant plasma membrane prior to differentiation into primary invasive hyphae (Figs. 2 and 7). Thus, we reasoned that the penetration ring may function as a novel physical structure. Notably, additional Ppe family members (Ppe2, Ppe3, and Ppe5) were also found to localize to this penetration ring (Fig. 6B), suggesting that it also serves as a secretion site for releasing proteins. To test whether Ppe1 and Ppe2 localize to the same site, we analyzed the colocalization between Ppe1-GFP and Ppe2-mCherry. The results showed that Ppe1-GFP and Ppe2-mCherry are well colocalized (Author response image 4). This study primarily focuses on the discovery and characterization of the penetration ring. The potential role of this structure in effector translocation will be investigated in future studies.
  
  Author response image 4.
  
  Ppe1 co-localizes with Ppe2 at the penetration ring in M. oryzae. Line graphs were generated along the directions indicated by the white arrows. Scale bar = 2 µm.
  
  (2) The article could benefit from a discussion on the broader implications of these findings for developing resistant crop varieties or new fungicidal strategies.
  
  We have incorporated this discussion as suggested (lines 358-360).
  
  (3) What is the significance of the formation of the penetration ring in the pathogenicity of the rice blast fungus? Or, how does it assist the fungus in its infection process?
  
  Our findings have several significant implications. First, we believe that the discovery of the penetration ring as a novel physical structure associated with the differentiation of invasive hyphae represents a breakthrough in plant-pathogen interactions that will be of interest to fungal biologists, pathologists and plant biologists. Second, our study presents a new role for the peg as a specialized platform for secretory protein deployment, in addition to its commonly known role as a physical penetration tool for the pathogen. Third, we identify Ppe1 as a potential molecular target for controlling the devastating rice blast disease, as Ppe homologs are absent in plants and mammals. We have incorporated this discussion in the revised manuscript (lines 354-362).
  
URL

biorxiv.org/content/10.1101/2024.07.11.603048v2
www.biorxiv.org

Cortico-striatal action control inherent of opponent cognitive-motivational styles

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Strengths:
  
  Overall there are some very interesting results that make an important contribution to the field. Notably, the results seem to point to differential recruitment of the PL-DMS pathway in goal-tracking vs sign-tracking behaviors.
  
  Thank you.
  
  Weaknesses:
  
  There is a lot of missing information and data that should be reported/presented to allow a complete understanding of the findings and what was done. The writing of the manuscript was mostly quite clear, however, there are some specific leaps in logic that require more elaboration, and the focus at the start and end on cholinergic neurons and Parkinson's disease are, at the moment, confusing and require more justification.
  
  In the revised paper, we provide additional graphs and information in support of results, and we further clarify procedures and findings. Furthermore, we expanded the description of the proposed interpretational framework that suggests that the contrasts between the cortical-striatal processing of movement cues in sign- versus goal trackers are related to previously established contrasts between the capacity for the cortical cholinergic detection of attention-demanding cues.
  
  Reviewer #2 (Public review):
  
  Strengths:
  
  The power of the sign- and goal-tracking model to account for neurobiological and behavioral variability is critically important to the field's understanding of the heterogeneity of the brain in health and disease. The approach and methodology are sound in their contribution to this important effort.
  
  The authors establish behavioral differences, measure a neurobiological correlate of relevance, and then manipulate that correlate in a broader circuitry and show a causal role in behavior that is consistent with neurobiological measurements and phenotypic differences.
  
  Sophisticated analyses provide a compelling description of the authors' observations.
  
  Thank you.
  
  Weaknesses:
  
  It is challenging to assess what is considered the "n" in each analysis (trial, session, rat, trace (averaged across a session or single trial)). Representative glutamate traces (n = 5 traces (out of hundreds of recorded traces)) are used to illustrate a central finding, while more conventional trial-averaged population activity traces are not presented or analyzed. The latter would provide much-needed support for the reported findings and conclusions. Digging deeper into the methods, results, and figure legends, provides some answers to the reader, but much can be done to clarify what each data point represents and, in particular, how each rat contributes to a reported finding (ie. single trial-averaged trace per session for multiple sessions, or dozens of single traces across multiple sessions).
  
  Representative traces should in theory be consistent with population averages within phenotype, and if not, discussion of such inconsistencies would enrich the conclusions drawn from the study. In particular, population traces of the phasic cue response in GT may resemble the representative peak examples, while smaller irregular peaks of ST may be missed in a population average (averaged prolonged elevation) and could serve as a rationale for more sophisticated analyses of peak probability presented subsequently.
  
  We have added two new Tables to clarify the number of rats per phenotype and sex used for each experiment described in the paper (Table 1), and the number of glutamate traces (range, median and total number) extracted for each analysis of performance-associated glutamate levels and the impact of CNO-mediated inhibition of fronto-striatal glutamate (Table 3).
  
  As the timing of glutamate peaks varies between individual traces and subjects, relative to turn and stop cue onset or reward delivery, subject- and trial-averaged glutamate traces would “wash out” the essential findings of phenotype- and task event-dependent patterns of glutamate peaks. In the detailed responses to the reviewers, we illustrate the results of an analysis of averaged traces to substantiate this view. Furthermore, as detailed in the section on statistical methods, and as mentioned by the reviewer under Strengths, we used advanced statistical methods to ensure that data from individual animals contribute equally to the overall result, and to minimize the possibility that an inordinate number of trials obtained from just one or a couple of rats biased the overall analysis.
  
  Reviewer #3 (Public review):
  
  Strengths:
  
  Overall these studies are interesting and are of general relevance to a number of research questions in neurology and psychiatry. The assessment of the intersection of individual differences in cue-related learning strategies with movement-related questions - in this case, cued turning behavior - is an interesting and understudied question. The link between this work and growing notions of corticostriatal control of action selection makes it timely.
  
  Thank you.
  
  Weaknesses:
  
  The clarity of the manuscript could be improved in several places, including in the graphical visualization of data. It is sometimes difficult to interpret the glutamate results, as presented, in the context of specific behavior, for example.
  
  We appreciate the reviewer’s concerns about the complexity of some of the graphics, particularly the results from the arguably innovative analysis illustrated in Figure 6. Figure 6 illustrates that the likelihood of a cued turn can be predicted based on single and combined glutamate peak characteristics. The revised legend for this figure provides additional information and examples to ease the readers’ access to this figure. In addition, as already mentioned above, we have added several graphs to further illustrate our findings.
  
  (Recommendations for the authors)
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The differences in behavioral phenotype according to vendor (Figure 1c) are slightly concerning, could the authors please elaborate on why they believe this difference is? Are there any other differences in these stocks- i.e. weight, appearance, other types of behaviors?
  
  Differences in PCA behavior across vendors or specific breeding colonies were documented previously and may reflect the impact of environmental, developmental and genetic factors (references added in the revised manuscript). We included animals from both vendors to increase phenotypic variability and due to animal procurement constraints during COVID-related restrictions.
  
  (2) Possibly related to the above, the rats in Figure 1a and Figure 2 are different strains. Please clarify.
  
  In the revised legend of Figure 2 we clarify that the rat shown in the photographs is a Long-Evans rat that was not part of the experiments described in this paper. This rat was used to generate these photos as the black-spotted fur provided better contrast against the white treadmill belt.
  
  (3) Figure 3c, the pairwise comparison showing a significant increase from Day 1 to Day 3 is hard to understand unless this is a lasting change. Is this increase preserved at Day 4? Examination of either a linear trend across days or a simple comparison of either Day 1 & 2 against Day 3 & 4 or, minimally Day 1 against Day 4 would communicate this message. Otherwise, there doesn't seem to be much of a case for improvement across test sessions, which would also be fine in my view.
  
  As the analysis of post-criterion performance also revealed an effect of DAY, we felt compelled to report and illustrate the results of pairwise comparisons in Fig. 3c. In agreement with the reviewer’s point, we did not further comment on this finding in the manuscript.
  
  (4) Figure 4e. I find it extremely unlikely that every included electrode was located exactly at anterior 0.5mm. Please indicate the range - most anterior and most posterior of the included electrodes in the study.
  
  The schematic section shown in Fig. 4e depicted the AP level of that section, with all placements collapsed onto that level. As detailed in Methods, electrode placements needed to be within the following stereotaxic space: AP: -0.3 to 0.6 mm, ML: 2 to 2.5 mm, and DV: -4.2 to -5 mm. To clarify this issue, the text in Results and the legend were modified, and the 0.5 mm label was removed from Fig. 4e.
  
  (5) The paper generally is quite data light and there are a lot of extra results reported that aren't shown in the figures. There are 17 instances of the phrase "not shown", some are certainly justified, but a lot of results are missing…
  
  We followed the reviewer’s suggestion and added several graphs. The revised Figure 5 includes the new graph 5d, which shows the number of glutamate traces with just 1, 2 or 3 peaks occurring during the cue presentation period. Likewise, the revised Figure 7 includes the new graph 7h, which shows the number of glutamate traces with just 1, 2 or 3 peaks following the administration of CNO or its vehicle. In both cases, we also revised the analysis of peak number data by counting the number of cases (or traces) with just 1, 2 or 3 peaks and using Chi-squared tests to determine the impact of phenotype and, in the latter case, of CNO. In addition, the revised Figure 7 now includes a graph showing the main effects of phenotype and CNO on reward delivery-locked maximum glutamate peak concentrations (Fig. 7k). In revising these sections, we also removed the prior statement about glutamate current rise times, as this isolated observation had no impact on subsequent analyses or the discussion.
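
  As an illustration of the kind of test named above, the following minimal sketch computes a Pearson Chi-squared statistic for a table of trace counts by number of peaks. The counts, the row labels, and the function name are hypothetical and are not the study's data or code; the closed-form p-value used here is valid only for 2 degrees of freedom.

```python
import math


def chi_squared_statistic(table):
    """Return (Pearson chi-squared statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts of traces with 1, 2, or 3 peaks (NOT the study's data).
hypothetical_counts = [
    [60, 30, 10],  # hypothetical phenotype A
    [30, 40, 30],  # hypothetical phenotype B
]

statistic, dof = chi_squared_statistic(hypothetical_counts)

# For exactly 2 degrees of freedom, the chi-squared survival function is exp(-x / 2).
p_value = math.exp(-statistic / 2) if dof == 2 else None
```

  With a 2 x 3 table such as this one, the test has 2 degrees of freedom; a general implementation would use a statistics library for the p-value rather than the special-case formula.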
  
  Concerning the reviewer’s point 5d (DMS eGFP transfection correlations Figure 8), the manuscript clarifies that the absence of such a correlation was expected given that eGFP expression in the DMS does not accurately reproduce the prelimbic-DMS projection space that was inhibited by CNO. In contrast, the correlations between the efficacy of CNO and DREADD expression measures in prelimbic cortex were significant and are graphed (Figs. 8g and 8j).
  
  (6) Please clarify the exact number of animals in each experiment. The caption of Figure 3 seems to suggest there are 29 GTs and 22 STs in the initial experiment, but the caption of Figure 5b seems to suggest there are N=30 total rats being analyzed (leaving 21 un-accounted for), or is this just the number of GTs (meaning there is one extra)?
  
  We have added Table 1 to clarify the number of animals used across different experiments and stages. Additionally, we have included a new Table 3 that identifies, for each graph showing results from the analyses of glutamate concentrations, the number of rats from which recordings were obtained and the number of traces per rat (range, median, and total).
  
  (7) Relatedly, in Figures 5c-f and Figures 7g-i, the data seem to be analyzed by trial rather than subject-averaged, please clarify and what is the justification for this?
  
  As detailed in Experimental design and statistical analyses, we employed linear mixed-effects models (LMMs) to analyze the amperometric data that generated Figures 5 and 7, to minimize the risk of bias due to an excessive number of trials obtained from specific rats. LMMs were chosen to analyze these repeated (non-independent) data to address issues that may be present with subject-averaged data. For clarity, throughout the results for these figures, the numerator in the F-ratio reflects the degrees of freedom from the fixed effects (phenotype/sex) and the denominator reflects the error term influenced by the number of subjects and the within-subject variance.
  
  Concerning the illustration and analysis of trial- or subject-averaged glutamate traces, please see reviewer 2, point 1 and the graph in that section. Within a response bin, such as the 2-s period following turn cues, glutamate peaks (as defined in Methods) occur at variable times relative to cue onset. Averaging traces over a population of rats or trials would “wash out” the phenotype- and task event-dependent patterns of glutamate concentration peaks, yielding, for example, a single, nearly 2-s long plateau for cue-locked glutamate recordings from STs (see Figure 5b versus the graph shown in response to reviewer 2, point 1).
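
  The averaging argument can be sketched numerically. The example below is a toy simulation, not the authors' analysis: each synthetic trace contains one Gaussian-shaped peak of unit amplitude, and the peak times, peak width, and sampling step are all assumed values chosen only to show how averaging peaks that occur at variable times produces a low, broad elevation.

```python
import math


def gaussian_peak(t, center, amplitude=1.0, width=0.15):
    """Value at time t (s) of a Gaussian-shaped peak centered at `center` (s)."""
    return amplitude * math.exp(-((t - center) ** 2) / (2 * width ** 2))


# A 2-s response bin sampled every 10 ms (assumed sampling step).
times = [i * 0.01 for i in range(201)]

# Hypothetical peak times, one per trace, spread across the bin.
peak_times = [0.3, 0.6, 0.9, 1.2, 1.5, 1.8]

# One synthetic trace per peak time; every trace has a single peak of amplitude 1.
traces = [[gaussian_peak(t, center) for t in times] for center in peak_times]

# Peak amplitude measured trace by trace, then summarized.
mean_of_trace_maxima = sum(max(trace) for trace in traces) / len(traces)

# Point-by-point average across traces, then its maximum.
averaged_trace = [sum(values) / len(traces) for values in zip(*traces)]
max_of_averaged_trace = max(averaged_trace)
```

  In this toy case the per-trace maxima stay near 1, whereas the averaged trace never rises much above about one fifth of that value and remains roughly flat across the span of the peak times.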
  
  (8) Likewise on page 22, the number of animals from which these trials were taken should be stated "The characteristics of glutamate traces (maximum peak concentration, number of peaks, and time to peak) were extracted from 548 recordings of turn cue trials, 364 of which yielded a turn (GTs: 206, STs: 158) and 184 a miss (GTs: 112, STs: 72).".
  
  The number of animals is now included in the text and listed in Table 3.
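
  For readers who want to see how a phenotype-by-outcome contingency table yields a relative turn probability, here is a minimal sketch using the trial counts quoted in the reviewer's comment above. It computes only the raw ratio of turn proportions; it is an assumption that this simple ratio is informative on its own, and it is not meant to reproduce any model-based value reported in the manuscript.

```python
# Turn-cue trial counts quoted in the reviewer's comment (548 recordings in total).
counts = {
    "GT": {"turn": 206, "miss": 112},
    "ST": {"turn": 158, "miss": 72},
}


def turn_probability(phenotype):
    """Proportion of turn-cue trials that yielded a turn for one phenotype."""
    cell = counts[phenotype]
    return cell["turn"] / (cell["turn"] + cell["miss"])


total_trials = sum(sum(cell.values()) for cell in counts.values())

p_turn_gt = turn_probability("GT")
p_turn_st = turn_probability("ST")

# Raw relative probability of a turn in GTs versus STs.
relative_probability = p_turn_gt / p_turn_st
```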
  
  (9) The control group for Figure 7 given the mCherry fluorophore - given the known off-target effects of CNO, this is a very important control. Minimally, this data should be shown, but it is troubling that the ST group has n=2, I don't really understand how any sort of sensible stats can be conducted with a group this size, and obviously it's too small to find any significant differences if they were there.
  
  As discussed on p. 14-15 in the manuscript under the section Clozapine N-Oxide, the conversion rate of CNO to clozapine suggests that approximately 50-100 times the dose of clozapine (compared to our 5.0 mg/kg CNO dosage) would be required to produce effects on rodent behavior (references on p. 14-15).
  
  Regarding evidence from control rats expressing the empty construct, the revised manuscript clarifies that no effects of CNO on cued turns were found in 5 GTs expressing the empty control vector. Although CNO had no effects in STs expressing the DREADD, we also tested the effects of CNO in 2 STs expressing the empty control vector (individual turn rates following vehicle and CNO are reported for these 2 STs). Moreover, we extracted turn cue-locked glutamate traces (vehicle: 18 traces; CNO: 16 traces) from an empty vector-expressing GT and found that administration of CNO reduced neither maximum glutamate peak concentrations nor the proportion of traces with just one peak. The absence of effects of CNO on cued turning performance and on turn cue-locked glutamate dynamics is consistent with prior studies showing no effects of 5.0 mg/kg CNO in rats not expressing the DREADD vector (references in manuscript).
  
  (10) Figure 8b - the green circle indicated by 1 is definitely not the DMS, this is the DLS, and animals with virus placement in this region should be excluded.
  
  The reviewer of course is correct and that exactly was the point of that illustration, as such a transfection space would have received the lowest possible rating (as indicated by the “1” in the green space). Fig. 8b was intended to illustrate expression efficacy ratings and does not indicate actual viral transfection spaces. Because the results described in the manuscript did not include data from a brain with a striatal transfection space as was illustrated in green in the original Fig. 8b, we removed that illustration of an off-target transfection space.
  
  (11) Figure 8j, the correlation specifically counts double-labeled PL hM4Di + eGFP neurons. Separating dual-labeled cells from all mCherry-labeled cells seems very strange given the nature of the viral approach. There seems to be an assumption that there are some neurons that express the mCherry-hM4Di that don't also have the AAV-Cre (eGFP). Obviously, if that were true this poses a huge problem for your viral approach and would mean that you're inhibiting a non-selective population of neurons. More likely, the AAV-Cre (eGFP) is present in all of your mCherry-hM4Di cells, just not at levels visible without GFP antibody amplification. Ideally, staining should be done to show that all cells with mCherry also have eGFP, but minimally this correlation should include all cells expressing mCherry with the assumption that they must also have the AAV-Cre.
  
  As noted on page 15 in the Visualization and Quantification of eGFP/mCherry-Expressing Neurons section, eGFP expression in our viral approach was notably bright and did not necessitate signal enhancement. First, given the topographic organization of prelimbic-DMS projections on the one hand, and the variable transfection spaces in cortex and striatum on the other hand, the speculation that AAV-Cre may have been present in all mCherry cells is without basis. Second, there certainly are mCherry-positive cells that do not also express the retrogradely transported AAV-Cre, and that therefore were not affected by CNO. Third, the entire point of this dual vector strategy was to selectively inhibit prelimbic-striatal projections, and the strong correlation between double-labeled neuron numbers and cued turn scores substantiates the usefulness of this approach.
  
  (12) Discussion, a bit more interpretation of the results would be good. Specifically - does the PL-DMS inhibition convert GTs to STs? There were several instances where the behavior and glutamate signals seemed to be pushed to look like STs but also a lot of missing data so it is hard to say. One would assume this kind of thing if, as I think is being said (please clarify), the ST phenotype is being driven by glutamatergic drive either locally or from sources other than PL cell bodies, presumably silencing the PL cell body inputs in GTs also leaves other glutamatergic inputs as the primary sources?
  
  We agree with the reviewer that one could say, perhaps somewhat colloquially, that PL-DMS inhibition turns GTs to STs, in terms of turning performance and associated glutamate peak dynamics. The newly added data graphs are consistent with this notion. However, there are of course numerous other neurobiological characteristics which differ between GTs and STs and are revealed in the context of other behavioral or physiological functions. In the Discussion, and as noted by the reviewer, we discuss alternative sources of glutamatergic control in STs and the functional implications of bottom-up mechanisms. In the revised manuscript, we have updated references and made minor revisions to improve this perspective.
  
  (13) I found the abstract really detailed and very dense, it is pretty hard to understand in its current form for someone who hasn't yet read the paper. At this level, I would recommend more emphasis on what the results mean rather than listing the specific findings, given that the task is still quite opaque to the reader.
  
  We revised the abstract, in part by deleting two rather dense but non-essential statements of results and by adding a more accessible conclusion statement.
  
  (14) There are a lot of abbreviations: CTTT, PD, PCA, GT, ST, MEA, GO, LMM, EMMs, PL, DMS. Some of these are only mentioned a few times: MEA, LMM, and EMMs are all mentioned less than 5 times. To reduce mental load for the reader, you could spell these ones out, or include a table somewhere with all of the abbreviations.
  
  We added a list of Abbreviations and Acronyms and eliminated abbreviations that were used infrequently.
  
  (15) Generally, the logic that cortico-striatal connections contribute to GT vs ST seems easy to justify, however, the provided justification is missing a line of connection: "As such biases of GTs and STs were previously shown to be mediated in part via contrasting cholinergic capacities for the detection of cues (Paolone et al., 2013; Koshy Cherian et al., 2017; Pitchers et al., 2017a; Pitchers et al., 2017b), we hypothesized that contrasts in the cortico-striatal processing of movement cues contribute to the expression of these opponent biases." Please elaborate on why specifically cholinergic involvement suggests corticostriatal involvement. I think there are probably more direct reasons for the current hypothesis.
  
  Done – see p. 4-5.
  
  (16) Along the same line, paragraph 3 of the intro about Parkinson's disease and cholinergics seems slightly out of place. This is because the specific or hypothesized link between these things and corticostriatal glutamate has not been made clear. Consider streamlining the message specifically to corticostriatal projections in the context of the function you are investigating.
  
  Done – see p. 4-5.
  
  (17) Page 8, paragraph 2. There is a heading or preceding sentence missing from the start of this paragraph: "Contrary to the acclimation training phase, during which experimenters manually controlled the treadmill, this phase was controlled entirely by custom scripts using Med-PC software and interface (MedAssociates).".
  
  Revised and clarified.
  
  (18) Page 13 "We utilized a pathway-specific dual-vector chemogenetic strategy (e.g., Sherafat et al., 2020) to selectively inhibit the activity of fronto-cortical projections to the DMS". The Hart et al (2018) reference seems more appropriate being both the same pathway and viral combination approach.
  
  Yes, thank you, we’ve updated the citation.
  
  (19) Pages 20-21: "Maximum glutamate peak concentrations recorded during the cue period were significantly higher in GTs than in STs (phenotype: F(1,28.85)=8.85, P=0.006, ηp²=0.23; Fig. 5c). In contrast, maximum peak amplitudes locked to other task events all were significantly higher in STs." The wording here is misleading, both Figures 5c and 5d report glutamate peaks during the turn cue, the difference is what the animal does. So, it should be something like "Maximum glutamate peak concentrations recorded during the cue period were significantly higher in GTs than in STs when the animal correctly made a turn (stats) but this pattern reversed on missed trials when the animal failed to turn (stats)..." or something similar.
  
  Yes, thank you. We have revised this section accordingly.
  
  (20) Same paragraph: "Contingency tables were used to compare phenotype and outcome-specific proportions and to compute the probability for turns in GTs relative to STs." What is an outcome-specific proportion?
  
  This has been clarified.
  
  (21) Page 22 typo: "GTs were only 0.74 times as likely as GTs to turn".
  
  Fixed.
  
  (22) The hypothesis for the DREADDs experiment isn't made clear enough. Page 23 "In contrast, in STs, more slowly rising, multiple glutamate release events, as well as the presence of relatively greater reward delivery-locked glutamate release, may have reflected the impact of intra-striatal circuitry and ascending, including dopaminergic, inputs on the excitability of glutamatergic terminals of corticostriatal projections" As far as I can understand, the claim seems to be that glutamate release might be locally modulated in the case of ST, on account of the profile of glutamate release- more slowly rising, multiple events, and reward-locked. Please clarify why these properties would preferentially suggest local modulation.
  
  We have revised and expanded this section to clarify the basis for this hypothesis.
  
  (23) The subheadings for the section related to Figure 7 "CNO disrupts..." "CNO attenuates..." presumably you mean fronto-striatal inhibition disrupts/attenuates. As it stands, it reads like the CNO per se is having these effects, off-target.
  
  Fixed.
  
  (24) The comparison of the results in the discussion against a "hypothetical" results section had the animals not been phenotyped behaviorally is unnecessary and overly speculative, given that 30-40% of rats don't fall into either of these two categories. I think the point here is to emphasize the importance of taking phenotype into account. This point can surely be made directly in its own sentence, probably somewhere towards the end of the discussion.
  
  We have partly followed the reviewer’s advice and separated the discussion of the hypothetical results from the summary of main findings. However, we did not move this discussion toward the end of the Discussion section as we believe that it justifies the guiding focus of the discussion on the impact of phenotype.
  
  (25) The discussion, like the introduction, talks a lot about cholinergic activity. As noted, this link is unclear - particularly how it links with the present results, please clarify or remove. Likewise high-frequency oscillations.
  
  We have revised relevant sections in the Introduction (see above) and Discussion sections. However, given the considerable literature indicating contrasts between the cortical cholinergic-attentional capacities of GTs and STs, the interpretation of the current findings in that larger context is justified.
  
  (26) Typo DSM in the discussion x 2.
  
  Thanks, fixed.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) As mentioned in the Public Review, it is challenging to assess what is considered the "n" in each analysis, particularly for the glutamate signal analysis (trial, session, rat, trace (averaged across session or single trial)). Representative glutamate traces are used to illustrate a central finding, while more conventional trial-averaged population activity traces are not presented or analyzed. For example, n = 5 traces, out of hundreds of recorded traces, with each rat contributing 1-27 traces across multiple sessions suggests ~1-2% of the data are shown as time-resolved traces. Representative traces should in theory be consistent with population averages within phenotype, and if not, discussion of such inconsistencies would enrich the conclusions drawn from the study. In particular, population traces of the phasic cue response in GT may resemble the representative peak examples, while smaller irregular peaks of ST may be missed in a population average (averaged prolonged elevation in signal) and could serve as rationale for more sophisticated analyses of peak probability presented subsequently (and relevant to opening paragraph of discussion where hypothetical data rationale is presented).
  
  We have added the new Table 1 to provide a complete account of the number of rats, per phenotype and sex, for each component of the experiments. In addition, the new Table 3 provides the range, median and total number of glutamate traces that were analyzed and formed the foundation of the individual data graphs depicting the results of glutamate concentration analyses.
  
  We chose not to present trial- or subject-averaged traces, as glutamate peaks occur at variable times relative to the onset of turn and stop cues and reward delivery, and therefore averaging across a population of rats or trials would obscure phenotype- and task event-dependent patterns of glutamate peaks. The attached graph serves to illustrate this issue. The graph shows turn cue-locked glutamate concentrations (M, SD) from trials that yielded turns, averaged over all traces used for the analysis of the data shown in Fig. 5d (see also Table 3, top row). Because of the variability of peak times, trial- and subject-averaging of traces from STs yielded a nearly 2-s long elevated plateau of glutamate concentrations (red triangles), contrasting with the presence of single and multiple peaks in STs as illustrated in Figs. 5b and 5e. Furthermore, averaging of traces from GTs obscured the presence of primarily single turn cue-locked peaks. Because of the relatively large variances of averaged data points, again reflecting the variability of peak times, analysis of glutamate levels during the cue period did not indicate an effect of phenotype (F(1,190)=1.65, P=0.16). Together, subject- or trial-averaged traces would not convey the glutamate dynamics that form the essence of the amperometric findings obtained from our study. We recognize, as inferred by the reviewer, that smaller irregular peaks in STs may have been missed given the definition of a glutamate peak (see Methods). It is in part for that reason that we conducted a prospective analysis of the probability for turns given a combination of peak characteristics (maximum peak concentration and peak numbers; Fig. 6).
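
The averaging argument in this response can be illustrated with a small synthetic sketch. Everything below is an assumption made for illustration only: the Gaussian peak shape, the 4 µM amplitude, the 0.15 s width and the five peak times are hypothetical and are not taken from the recorded data. Only the 5 Hz sampling over a 2-s cue period follows the text.

```python
import math

# 5 Hz sampling (200 ms steps) across a 2-s cue period, as described in the text.
SAMPLE_INTERVAL_S = 0.2
times = [i * SAMPLE_INTERVAL_S for i in range(11)]


def single_peak_trace(center_s, amplitude_um=4.0, width_s=0.15):
    """Synthetic trace with one Gaussian peak (shape and size are hypothetical)."""
    return [
        amplitude_um * math.exp(-0.5 * ((t - center_s) / width_s) ** 2)
        for t in times
    ]


# Hypothetical trials: each has one clear peak, but at a different latency.
peak_indices = [1, 3, 5, 7, 9]
traces = [single_peak_trace(times[i]) for i in peak_indices]

# Point-by-point average across trials.
averaged = [sum(column) / len(traces) for column in zip(*traces)]

for t, value in zip(times, averaged):
    print(f"t = {t:.1f} s  mean = {value:.2f} uM")
```

In this toy case every individual trace peaks at 4 µM, while the averaged trace stays below 1 µM and is nearly flat across the cue period, which is the kind of plateau the response describes.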
  
  (2) To this latter point, the relationship between the likelihood to turn and the size of glutamate peak is focused on the GT phenotype, which limits understanding of how smaller multiple peaks relate to variables of interest in ST (missed turns, stops, reward). If it were possible to determine the likelihood for each phenotype, without a direct contrast of one phenotype relative to the other, this would be a more straightforward description of how signal frequency and amplitude relate to relevant behaviors in each group. Depending on the results, this could be done in addition to or instead of the current analysis in Figure 6.
  
  We considered the reviewer’s suggestion but could not see how attempts to analyze the role of maximum glutamate concentrations and number of peaks within a single phenotype would provide any significant insights beyond the current description of results. Moreover, as stressed in the 2nd paragraph of the Discussion (see Reviewer 1, point 24), the removal of the phenotype comparison would nearly completely abolish the relationships between glutamate dynamics and behavior from the current data set.
  
  Author response image 1.
  
  (3) If Figure 6 is kept, a point made in the text is that GT is 1.002x more likely than ST to turn at a given magnitude of Glu signal. 1.002 x more likely is easily (perhaps mistakenly) interpreted as nearly identical likelihood. Looking closely at the data, perhaps what is meant is @ >4uM the difference between top-line labeled {b} and bottom-line labeled {d,e} is 1.002? If not, there may be a better way to describe the difference as 1x could be interpreted as the same/similar.
  
  Concerning the potential for misinterpretation, the original manuscript stated (key phrase marked here in red font): Comparing the relative turn probabilities at maximum peak concentrations >4 µM, GTs were 1.002 times more likely (or nearly exactly twice as likely) as STs to turn if the number of cue-evoked glutamate peaks was limited to one (rhombi in Fig. 6a) when compared to the presence of 2 or 3 peaks (triangles in Fig. 6a). However, we appreciate the reviewer’s concern about the complexity of this statement and, as it merely re-emphasized a result already described, it was deleted.
  
  (4) For Figure 7e, the phenotype x day interaction is reported, but posthocs are looking within phenotype (GT) at treatment effects. Is there a phenotype x day x treatment, or simply phenotype x treatment (day collapsed) to justify within-group treatment posthocs?
  
  We have revised the analysis and illustration of the data shown in Figs 7e and 7f, by averaging the test scores from the two tests, per animal, of the effects of vehicle and CNO, to be able to conduct a simpler 2-way analysis of the effects of phenotype and treatment.
  
  (5) Ideally, viral control is included as a factor in this analysis as well. The separate analysis for viral controls was likely done due to low n, however negative findings from an ANOVA in which an n=2 (ST) should be interpreted with extreme caution. The authors already have treatment control (veh, CNO) and may consider dropping the viral controls completely due to the lack of power to perform appropriate analyses.
  
  This issue has been clarified – see reviewer 1, point 9.
  
  Minor:
  
  (1) In the task description, it could be clearer how reward delivery relates to turns and stops. For example, does the turn cue indicate the rat will be rewarded at the port behind it? Does the stop cue indicate that the rat will be rewarded at the port in front of it? This makes logical sense, but the current text does not describe the task in this way, instead focusing on what is the correct action (seemingly but unlikely independent of reinforcement).
  
  We have updated the task description in Methods and the legend of Figure 2 to indicate the location of reward delivery following turns and stops.
  
  (2) For the peak analysis, what is the bin size for determining peaks? It is indicated that the value before and after the peak is >1 SD below the peak value, so it is helpful to know the temporal bin resolution for this definition.
  
  As detailed on p 11-12 under Amperometry Data Processing and Analysis of Glutamate Peaks, we analyzed glutamate concentrations recorded at a frequency of 5 Hz (200 ms bins) throughout the 2-second-long presentation of turn and stop cues and for a 2-second period following reward delivery.
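
As a rough sketch of the peak criterion mentioned in this exchange (the samples immediately before and after a peak lie more than 1 SD below the peak value), the following shows one way such a rule could be coded at 200 ms resolution. This is not the authors' implementation: how the SD is computed is not stated here, so it is passed in as a caller-supplied value, and the trace values are made up.

```python
SAMPLING_HZ = 5          # from the text: 5 Hz, i.e. 200 ms bins
WINDOW_S = 2             # 2-s cue or post-reward window
N_BINS = SAMPLING_HZ * WINDOW_S


def find_peak_bins(trace, sd):
    """Return indices whose two neighbours are both more than `sd` below them.

    `sd` is supplied by the caller; which SD the original analysis used
    is an open assumption in this sketch.
    """
    peaks = []
    for i in range(1, len(trace) - 1):
        drop_before = trace[i] - trace[i - 1]
        drop_after = trace[i] - trace[i + 1]
        if drop_before > sd and drop_after > sd:
            peaks.append(i)
    return peaks


# Hypothetical 2-s trace (10 bins, concentrations in uM).
example_trace = [1.0, 1.1, 3.0, 1.2, 1.0, 1.1, 2.5, 1.0, 1.0, 1.0]
peak_bins = find_peak_bins(example_trace, sd=0.5)
print(peak_bins, [round(i / SAMPLING_HZ, 1) for i in peak_bins])
```

With 200 ms bins, a peak defined this way spans at least three samples (600 ms), which is the temporal resolution the reviewer asked about.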
  
  (3) Long Evans rats are pictured in Figure 2 (presumably contrast with a white background is better here), while SD rats are pictured in Figure 1. Perhaps stating why LE rats are pictured would help clear up any ambiguity about the strains used, as a quick look gives the impression two strains are used in two different tasks.
  
  Yes, see reviewer 1, point 2.
  
  (4) In Figure 7e, the ST and GT difference in turns/turn cue does not seem to replicate prior findings for tracking differences for this measure (Figure 3b). ST from the chemogenetic cohort seems to perform better than rats whose behavior was examined prior to glutamate sensor insertion. What accounts for this difference? Training and testing conditions/parameters?
  
  The reviewer is correct. The absence of a significant difference between vehicle-treated GTs and vehicle-treated STs in Fig. 7e reflects a relatively lower turn rate in GTs than was seen in the analysis of baseline behavior (Fig. 3b; note the different ordinates of the two figures, needed to show the impact of CNO in Fig. 7e). Notably, the data in Fig. 7e are based on fewer rats (12 versus 29 GTs and 10 versus 22 STs; Table 1) and on rats which at this point had undergone additional surgeries to infuse the DREADD construct and implant electrode arrays. We can only speculate that these surgeries had greater detrimental effects in GTs, perhaps consistent with evidence suggesting that immune challenges trigger a relatively greater activation of their innate immune system (Carmen et al., 2023). We acknowledged this issue in the revised Results.
  
  (5) The authors are encouraged to revise for grammar (are vs. is, sentence ending with a preposition, "not only" clause standing alone) and word choice (i.e. in introduction: insert, import, auditorily). Consider revising the opening sentence on page 5 for clarity.
  
  We have revised the entire text to improve grammar and word choice.
  
  (6) Do PD fallers refer to rats or humans? if the latter, this may be a somewhat stigmatizing word choice.
  
  We have replaced such phrases using more neutral descriptions, such as referring to people with PD who frequently experience falls.
  
  (7) Page 27 What does "non-instrumental" behavior mean?
  
  We have re-phrased this statement without using this term.
  
  (8) The opening paragraph of the discussion is focused on comparing reported results (with phenotype as a factor) to a hypothetical description of results (without phenotype as a factor) that were not presented in the results section. There is one reference to a correlation analysis on collapsed data, but otherwise, no reporting of data overall rats without phenotype as a factor. If this is a main focus, including these analyses in the results would be warranted. If this is only a minor point leading to discussion, authors could consider omitting the hypothetical comparison.
  
  We have revised this section - see reviewer 1 point 24.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) These are really interesting studies. I think there are issues in data presentation/analysis that make it difficult to parse what exactly is happening in the glutamate signals, and when. Overall the paper is just a bit of a difficult read. A generally standard approach for showing neural recording data of many kinds, including, for example, subject-averaged traces, peri-event histograms, heatmaps, etc summarizing and quantifying the results - would be helpful. Beyond the examples in Figure 5, I would suggest including averaged traces of the glutamate signals and quantification of those traces.
  
  We have addressed these issues in multiple ways, see the response to several points of reviewers 1 and 2, particularly reviewer 2, point 1.
  
  (2) Figure 6 (and the description in the response letter) is also very non-intuitive. It's unclear how the examples shown relate to the reported significance indicators/labels/colors etc in the figure. I would suggest rethinking this figure overall, and if there is a more direct quantitative way to connect signal features with behavior. Again, drawing from standard visualization approaches for neural data could be one approach.
  
  See also reviewer 2 points 1 and 3. Furthermore, we have revised the text in Results and the legend to improve the accessibility of Fig. 6.
  
  (3) As far as I can tell, all of the glutamate sensor conclusions reflect analysis collapsed across 100s of trials. Do any of the patterns hold for a subjects-wise analysis? How variable are individual subjects?
  
  We employed linear mixed-effect model analyses and added a random subject intercept to account for subject variability outside fixed effects (phenotype and treatment). The variance of the intercept ranged 0.01-1.71 SEM across outcome (cued turns/cued stops/misses). See also reviewer 1, point 7 and reviewer 2, point 1.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.12.584623v4
www.biorxiv.org www.biorxiv.org

Potassium-mediated bacterial chemotactic response

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This important study reports a novel measurement for the chemotactic response to potassium by Escherichia coli. The authors convincingly demonstrate that these bacteria exhibit an attractant response to potassium and connect this to changes in intracellular pH level. However, some experimental results are incomplete, with additional controls/alternate measurements required to support the conclusions. The work will be of interest to those studying bacterial signalling and response to environmental cues.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This paper shows that E. coli exhibits a chemotactic response to potassium by measuring both the motor response (using a bead assay) and the intracellular signaling response (CheY phosphorylation level via FRET) to step changes in potassium concentration. They find that an increase in potassium concentration induces a considerable attractant response, with an amplitude larger than aspartate, and cells can quickly adapt (but possibly imperfectly). The authors propose that the mechanism for potassium response is through modifying intracellular pH; they find both that potassium modifies pH and other pH modifiers induce similar attractant responses. It is also shown, using Tar- and Tsr-only mutants, that these two chemoreceptors respond to potassium differently. Tsr has a standard attractant response, while Tar has a biphasic response (repellent-like then attractant-like). Finally, the authors use computer simulations to study the swimming response of cells to a periodic potassium signal secreted from a biofilm and find a phase delay that depends on the period of oscillation.
  
  Strengths:
  
  The finding that E. coli can sense and adapt to potassium signals and the connection to intracellular pH is quite interesting and this work should stimulate future experimental and theoretical studies regarding the microscopic mechanisms governing this response. The evidence (from both the bead assay and FRET) that potassium induces an attractant response is convincing, as is the proposed mechanism involving modification of intracellular pH.
  
  Weaknesses:
  
  The authors show that changes in pH impact fluorescent protein brightness and modify the FRET signal; this measurement explains the apparent imprecise adaptation they measured. However, this effect reduces confidence in the quantitative accuracy of the FRET measurements. For example, part of the potassium response curve (Fig. 4B) can be attributed to chemotactic response and part comes from the pH modifying the FRET signal. Measuring the full potassium response curve of the no-receptor mutants as a control would help quantify the true magnitude of the chemotactic response and the adaptation precision to potassium.
  
  Response: We thank the reviewer for the suggestion. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.
  
  We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
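
The correction described in this response (dividing each channel's post-KCl signal by the post/pre ratio measured in the no-receptor strain at the same KCl concentration) can be sketched as follows. The ratios and photon counts below are placeholders chosen for illustration, not values from Fig. S4C, and the function and table names are hypothetical.

```python
# Hypothetical post/pre signal ratios from a no-receptor control, per channel,
# keyed by KCl concentration in mM. Placeholder numbers, not measured data.
CONTROL_RATIO = {
    "CFP": {10.0: 0.98, 30.0: 0.95},
    "YFP": {10.0: 0.96, 30.0: 0.90},
}


def ph_corrected(signal_post, channel, kcl_mm):
    """Divide the post-KCl signal by the control ratio for that channel and dose."""
    return signal_post / CONTROL_RATIO[channel][kcl_mm]


# Hypothetical post-KCl counts from a strain with receptors, at 30 mM KCl.
cfp_corrected = ph_corrected(950.0, "CFP", 30.0)
yfp_corrected = ph_corrected(720.0, "YFP", 30.0)
print(round(cfp_corrected, 1), round(yfp_corrected, 1))
```

Applied to the control strain itself, this division returns the pre-stimulus level by construction, so whatever change remains in a receptor-containing strain after correction is attributed to the chemotactic response.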
  
  The measured response may also be impacted by adaptation. For other strong attractant stimuli, the response typically shows a low plateau before it recovers (adapts). However, in the case of Potassium, the FRET signal does not have an obvious plateau following the stimuli. Do the authors have an explanation for that? One possibility is that the cells may have already partially adapted when the response reaches its minimum, which could indicate a different response and/or adaptation dynamics from that of a regular chemo-attractant? In any case, directly measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels would shed more light on the problem.
  
  Response: We appreciate the reviewer’s insightful questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.
  
  The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).
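
The "less than 10%" estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that adaptation follows a single-exponential recovery; the response does not state the functional form, so this is an assumption rather than the authors' model.

```python
def fraction_adapted(elapsed_s, half_time_s):
    """Fraction of the response recovered after `elapsed_s`,
    assuming single-exponential adaptation (an illustrative assumption)."""
    return 1.0 - 2.0 ** (-elapsed_s / half_time_s)


# Values from the text: ~80 s adaptation half-time, <10 s medium exchange.
loss_during_exchange = fraction_adapted(elapsed_s=10.0, half_time_s=80.0)
print(f"{loss_during_exchange:.3f}")
```

Under this assumption, roughly 8% of the response would have adapted away by the time the medium exchange completes, consistent with the stated bound.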
  
  We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (Sourjik & Berg, PNAS 99:123, 2002), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1 below, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.
  
  The relevant text was added at line 413-424.
  
  Author response image 1.
  
  The response of the cheRcheB mutant (HCB1382-pVS88) to different concentrations of KCl. The blue solid line denotes the original signal, while the red dots represent the pH-corrected signal. The vertical purple (green) dashed lines indicate the moment of adding (removing) 0.01 mM, 0.1 mM, 0.3 mM, 1 mM, 3 mM, 10 mM and 30 mM KCl, in chronological order.
  
  There seems to be an inconsistency between the FRET and bead assay measurements, the CW bias shows over-adaptation, while the FRET measurement does not.
  
  Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.
  
  We have now clarified this at line 315.
  
  The small hill coefficient of the potassium response curve and the biphasic response of the Tar-only strain, while both very interesting, require further explanation since these are quite different than responses to more conventional chemoattractants.
  
  Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5) and the biphasic response of the Tar-only strain (Fig. 5C). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to the response to MeAsp (1.2) (ref. 46). We suspected that this Hill coefficient of slightly less than 1 resulted from the different responses of Tar and Tsr receptors to potassium.
  
  The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Zhang et al. investigated the biophysical mechanism of potassium-mediated chemotactic behavior in E. coli. Previously, it was reported by Humphries et al. that the potassium waves from oscillating B. subtilis biofilm attract P. aeruginosa through chemotactic behavior of motile P. aeruginosa cells. It was proposed that K+ waves alter PMF of P. aeruginosa. However, the mechanism of this behaviour remained elusive. In this study, Zhang et al. demonstrated that motile E. coli cells accumulate in regions of high potassium levels. They found that this behavior likely results from the chemotaxis signalling pathway, mediated by an elevation of intracellular pH. Overall, a solid body of evidence is provided to support the claims. However, the impacts of pH on the fluorescent proteins need to be better evaluated. In its current form, the evidence is insufficient to say that the fluorescence intensity ratio results from FRET. It may well be an artefact of pH change. Nevertheless, this is an important piece of work. The text is well written, with a good balance of background information to help the reader follow the questions investigated in this research work.
  
  In my view, the effect of pH on the FRET between CheY-eYFP and CheZ-eCFP is not fully examined. The authors demonstrated in Fig. S3 that CFP intensity itself changes with KCl, likely due to pH. They showed that CFP itself is affected by pH. This result raises a question of whether the FRET data in Figs. 3-5 could result from the intensity changes of FPs, but not FRET. The measured dynamics may have nothing to do with the interaction between CheY and CheZ. It should be noted that CFP and YFP have different sensitivities to pH. So, the measurement is likely confounded by the change in intracellular pH. Without further experiments to evaluate the effect of pH on CFP and YFP, the data using this FRET pair is inconclusive.
  
  Response: We thank the reviewer for pointing this out. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.
  
  We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
  
  The data in Figure 1 is convincing. It would be helpful to include example videos. There is also ambiguity in the method section for this experiment. It states that 100 mM KCl was flowed into the source channel. However, it is not clear if 100 mM KCl was prepared in water or in the potassium-depleted motility buffer. If KCl was prepared with water, there would be a gradient of other chemicals in the buffer, which would confound the data.
  
  Response: We apologize for the ambiguity. The KCl solution used in this work was prepared in the potassium-depleted motility buffer. We have now clarified this at both lines 116 and 497. We have also provided an example video, Movie S1, with the relevant text added at line 123.
  
  The authors show FRET data with both KCl and K2SO4 and conclude that the chemotactic response mainly results from potassium ions. However, this was only measured by FRET. It would be more convincing if the motility assay in Fig. 1 were also performed with K2SO4.
  
  Response: We thank the reviewer for the suggestion. The aim of comparing the responses to KCl and K2SO4 was to determine the role of chloride ions in the response and to prove that the chemotactic response of E. coli to KCl comes primarily from its response to potassium ions. It is more sensitive to compare the responses to KCl and K2SO4 by using the FRET assay. In contrast, the microfluidic motility assay is less sensitive in revealing the difference in the chemotactic responses, making it difficult to determine the potential role of chloride ions.
  
  Methods:
  
  Please clarify the promoters used for the constitutive expression of FliCsticky and LacI.
  
  Response: The promoters used for the constitutive expression of LacIq and FliCsticky were the Iq promoter and the native promoter of fliC, respectively (ref. 57).
  
  These have now been clarified at line 471.
  
  Fluorescence filters and imaging conditions (exposure time, light intensity) are missing.
  
  Response: Thank you for the suggestion. We have now added more descriptions at lines 535-546: The FRET setup was based on a Nikon Ti-E microscope equipped with a 40× 0.60 NA objective. The illumination light was provided by a 130-W mercury lamp, attenuated by a factor of 1024 with neutral density filters, and passed through an excitation bandpass filter (FF02-438/24-25, Semrock) and a dichroic mirror (FF458-Di02-25x36, Semrock). The epifluorescent emission was split into cyan and yellow channels by a second dichroic mirror (FF509-FDi01-25x36, Semrock). The signals in the two channels were then filtered by two emission bandpass filters (FF01-483/32-25 and FF01-542/32-25, Semrock) and collected by two photon-counting photomultipliers (H7421-40, Hamamatsu, Hamamatsu City, Japan), respectively. Signals from the two photomultipliers were recorded at a sampling rate of 1 Hz using a data-acquisition card installed in a computer (USB-1901(G)-1020, ADlink, New Taipei, Taiwan).
  
  Please clarify if the temperature was controlled in motility assays.
  
  Response: All measurements in our work were performed at 23 ℃. It was clarified at line 496.
  
  L513. It is not clear how theta was selected. Was theta set to be between 0 and pi? If not, P(theta) can be negative?
  
  Response: The θ was set to be between 0 and π. This has now been added at line 581.
  
  Typo in L442 (and) and L519 (Koff)
  
  Response: Thank you. Corrected.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) From the motor measurements the authors find that the CW bias over-adapts to a level larger than prestimulus, but this is not seen in the FRET measurements. What causes this inconsistency? Fig. 2D seems to rule out any change in CheY binding to the motor.
  
  Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.
  
  We have now clarified this at line 315.
  
  (2) It would be useful to compare the response amplitude for potassium (Fig. 3C) to a large concentration of both MeAsp and serine. This is a fairer comparison since your work shows potassium acts on both Tar and Tsr. Alternatively, testing a much larger concentration (~10^6 micromolar) at which MeAsp also binds to Tsr would also be useful.
  
  Response: We thank the reviewer for pointing this out. We have now recalculated the response to potassium by correcting the pH-induced effects on fluorescence intensity of CFP and YFP. The response to 30 mM KCl was 1.06 ± 0.10 times as large as that to 100 μM MeAsp. The aim of the comparison between the responses to potassium and MeAsp was to provide an idea of the magnitude of the chemotactic response to potassium. The stimulus of 100 μM MeAsp is already a saturating amount of attractant and induces zero-kinase activity, thus using a higher stimulus (adding serine or a larger concentration of MeAsp) is probably not needed. Moreover, a larger concentration (~10^6 micromolar) of MeAsp would also induce an osmotactic response.
  
  (3) The fitted Hill coefficient (~0.5) to the FRET response curve is quite small and the authors suggest this indicates negative cooperativity. Do they have a proposed mechanism for negative cooperativity? Have similar coefficients been measured for other responses?
  
  Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the differing responses of Tar and Tsr receptors to potassium.
  
  (3a) The authors state a few times that the response to potassium is "very sensitive", but the low Hill coefficient indicates that the response is not very sensitive (at least compared to aspartate and serine responses).
  
  Response: We apologize for the confusion. We described the response to potassium as “very sensitive” due to the small value of K0.5. This has now been clarified at line 236.
  
  (3b) Since the measurements are performed in wild-type cells the response amplitude following the addition of potassium may be biased if the cell has already partially adapted. This seems to be the case since the FRET time series does not plateau after the addition of the stimulus. The accuracy of the response curve and Hill coefficient would be more convincing if the experiment was repeated with a cheR cheB deficient mutant.
  
  Response: We thank the reviewer for raising these questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.
  
  The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).
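  The size of this effect can be checked with a short calculation. The sketch below assumes that the response recovers exponentially with the quoted half-time, which is a simplifying assumption made here for illustration and is not stated in the response; the function name is hypothetical.

```python
def fraction_adapted(t_seconds, half_time_seconds):
    """Fraction of the initial response lost after t_seconds.

    Assumes simple exponential recovery with the given half-time
    (an illustrative assumption, not the authors' model).
    """
    return 1.0 - 0.5 ** (t_seconds / half_time_seconds)


# 10 s medium exchange against an 80 s adaptation half-time.
loss = fraction_adapted(10.0, 80.0)
print(f"{loss:.3f}")  # about 0.083, i.e. below 10%
```

  Under this assumption the loss during the 10 s exchange is about 8%, in line with the stated bound of less than 10%.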
  
  We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (ref. 46), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.
  
  The relevant text was added at lines 413-424.
  
  (4) The authors show that the measured imprecise adaptation can be (at least partially) attributed to pH impacting the FRET signal by changing eCFP and eYFP brightness.
  
  (4a) Comparing Fig. 5C and D, the chemosensing and pH response time scales look similar. Therefore, does the pH effect bias the measured response amplitude (just as it biases the adapted FRET level)?
  
  Response: We agree with the reviewer that the pH effect on CFP and YFP biases the measured response amplitude. We have now performed the measurement of dose-response curve to potassium for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. The pH effects on CFP and YFP were corrected. The dose-response curve and adaptation curve were recalculated and plotted in Fig. S5.
  
  (4b) It would help to measure a full response curve (at many concentrations) for the no-receptor strain as a control. This would help distinguish, as a function of concentration, how much response can be attributed to pH impacting the FRET signal versus the true chemotactic response.
  
  Response: We thank the reviewer for the suggestion. We have now performed the measurements for the no-receptor strain. The impact of pH on CFP and YFP has been corrected. The pH-corrected results, previously in Fig.3-5, are now presented in Fig. 3, Fig. S5 and Fig. 5, respectively.
  
  (5) The biphasic response of Tar is strange and warrants further discussion. Do the authors have any proposed mechanisms that lead to this behavior? For the 10mM and 30mM KCl measurements there is a repellent response followed by an attractant response for both adding and removing the stimuli, why is this?
  
  Response: We thank the reviewer for pointing this out. The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.
  
  (5a) The fact that Tar and Tsr are both attractant (after the initial repellant response in Tar) appears to be inconsistent with previous work on pH response (Ref 52, Yang and Sourjik Molecular Microbiology (2012) 86(6), 1482-1489). This study also didn't see any biphasic response.
  
  Response: We thank the reviewer for pointing this out. The Tar-only strain shows a repellent response to stepwise addition of low concentrations of potassium, specifically less than 10 mM. This is consistent with previous observations of the response of Tar to changes in intracellular pH (refs. 44,45) and also with the work of Yang and Sourjik (new ref. 53), although the work in ref. 53 dealt with the response to external pH change, and bacteria were known to maintain a relatively stable intracellular pH when external pH changes (Chen & Berg, Biophysical Journal (2000) 78:2280-2284). Interestingly, the Tar-only strain exhibits a biphasic response to high potassium concentrations of 10 mM and above. This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA (ref. 56), which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.
  
  (5b) The response of Tar to the removal of sodium benzoate (Fig. S2) seems to be triphasic, is there any explanation for this?
  
  Response: We thank the reviewer for pointing this out. We have now acknowledged in the legend of Fig. S2 that this response is interesting and warrants further exploration: “The response to the removal of sodium benzoate seems to be a superposition of an attractant and a repellent response, the reason for which deserves to be further explored.”
  
  (6) Fitting the MWC model leads to N=0.35<1. It is fine to use this as a phenomenological parameter, but can the authors comment on what might be causing such a small effective cluster size for potassium response?
  
  Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to the response to MeAsp (1.2) (ref. 46). We now refit the MWC model to the pH-corrected dose-response curve, obtaining N of 0.85. We think the small N is due partly to the fact that we are fitting the curve with four parameters: N, Kon, Koff, and fm, while only three features of the sigmoid dose-response curve are relevant (the vertical scale, the midpoint concentration, and the slope of the sigmoid). Future experiments may determine these parameters more accurately, but they should not significantly affect the simulation results as long as the wild-type dose-response curve is accurate.
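  The parameter redundancy mentioned above can be illustrated with a small sketch. The manuscript's Eq. 3 is not reproduced in this response, so the expression below is the commonly used two-state MWC form and should be read as an assumption; in particular, treating Koff and Kon as dissociation constants of the inactive and active states is assumed here, and all numerical values other than N = 0.85 are hypothetical.

```python
import math


def mwc_activity(conc, n, k_on, k_off, f_m):
    """Kinase activity of a two-state MWC cluster (assumed functional form).

    n     : effective cluster size
    k_on  : ligand dissociation constant of the active state (assumed meaning)
    k_off : ligand dissociation constant of the inactive state (assumed meaning)
    f_m   : methylation-dependent free-energy offset
    """
    free_energy = n * (f_m + math.log((1.0 + conc / k_off) / (1.0 + conc / k_on)))
    return 1.0 / (1.0 + math.exp(free_energy))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to produce a sigmoid.
    for conc in (0.0, 1.0, 10.0, 30.0):
        a = mwc_activity(conc, n=0.85, k_on=100.0, k_off=1.0, f_m=-1.0)
        print(f"{conc:5.1f} mM -> activity {a:.2f}")
```

  In this form the pre-stimulus activity depends only on the product of n and f_m, so different (n, f_m) pairs can give the same baseline. This is one way to see why a four-parameter fit to a single sigmoid leaves N weakly constrained.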
  
  (7) The results of the modeling are closely related to Zhu et. al. Phys. Rev. Lett. 108, 128101. Is the lag time for large T related to the adaptation time?
  
  Response: We thank the reviewer for pointing this out. We used a similar modeling framework to that of Zhu et al. The potassium response was also analogous to the chemotactic response to MeAsp. Thus, the results are closely related to Zhu et al. We have now cited Zhu et al. (Ref. 52) and noted this at line 366.
  
  The lag time for large T is related to the adaptation time. We have now simulated the chemotaxis to potassium for large T with different adaptation time by varying the methylation rate kR. The results are shown in Fig. S8. The simulated lag time decreases with the methylation rate kR, but levels off at high values of kR. Now this has been added at line 603.
  
  Minor issues:
  
  Fig. 1C: should the axis label be y?
  
  Response: Yes, thank you. Now corrected.
  
  Line 519: Koff given twice, the second should be Kon.
  
  Response: Thank you. Corrected.
  
  When fitting the MWC model (Eq. 3 and Fig. 6B) did you fix a particular value for m?
  
  Response: m was treated as a fitting parameter, grouped in the parameter fm.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor points: - I suggest explaining the acronyms when they first appear in the text (eg CMC, CW, CCW).
  
  Response: Thank you. Now they have been added.
  
  L144. L242. "decrease" is ambiguous since membrane potential is negative. I understand the authors meant less negative (which is an increase). I suggest to avoid this expression.
  
  Response: Thank you for the suggestion. Now they have been replaced by “The absolute value of the transmembrane electrical potential will decrease”.
  
  For Fig 1b - it says the shaded area is SEM in the text, but SD in the legend. Please clarify.
  
  Response: Thank you. The annotation in the legend has now been revised as SEM.
  
  Fig 1C label of x axis should be "y" instead of "x" to be consistent with Fig 1A.
  
  Response: Thank you. It has now been revised.
  
  In Figure 2, the number of independent experiments as well as the number of samples should be included.
  
  Response: Thank you. The response in Fig. 2C is the average of 83 motors from 5 samples for wild-type strain (JY26-pKAF131). The response in Fig. 2D is the average of 22 motors from 4 samples for the chemotaxis-defective strain (HCB901-pBES38). They have now been added to the legend.
  
  Regarding the attractant or repelling action of potassium and sucrose, it would be important to have a movie showing the cells' behaviours.
  
  Response: We thank the reviewer for the suggestion. We have now provided Movie S1 to show the cells’ behavior in response to potassium. As shown in Fig. 3B, the chemotactic response to 60 mM sucrose is very small compared to the response to 30 mM KCl. This implies that a noticeable response to sucrose requires higher stimulus concentrations. However, Rosko et al. [Rosko, J., Martinez, V. A., Poon, W. C. K. & Pilizota, T. Proc. Natl Acad. Sci. USA 114, E7969-E7976 (2017).] have shown that high concentrations of sucrose lead to a significant reduction in the speed of the flagellar motor. Thus, in a motility assay for sucrose, the osmolarity-induced motility effect may overwhelm the minor repellent-like response.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.29.555418v2
www.biorxiv.org www.biorxiv.org

Proactive distractor suppression in early visual cortex

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This well-written report uses functional neuroimaging in human observers to provide convincing evidence that activity in the early visual cortex is suppressed at locations that are frequently occupied by a task-irrelevant but salient item. This suppression appears to be general to any kind of stimulus, and also occurs in advance of any item actually appearing. The work in its present form will be valuable to those examining attention, perception, learning and prediction, but with a few additional analyses could more informatively rule out potential alternative hypotheses. Further discussion of the mechanistic implications could clarify further the broad extent of its significance.
  
  We thank the editor and the reviewers for the positive evaluation of our manuscript and the thoughtful comments. Below we provide a detailed point-by-point reply to the reviewers’ comments.
  
  In addition to addressing the reviewers' comments, we have improved the figure legends by explicitly describing the type of error bars depicted in the figures, information which was previously only listed in the Materials and Methods section. Specifically, the statement: “Error bars denote within-subject SEM” was added to several figures, as applicable. We believe that briefly reiterating this information in the figure legends enhances clarity and enables readers to interpret the results more accurately and efficiently. We also updated our code and data sharing statement, as well as opened the repository for the public: “Analysis and experiment code, as well as data required to replicate the results reported in this manuscript are available here: https://doi.org/10.17605/OSF.IO/G4RXV. Raw MRI data is available upon request.”
  
  Public Reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors investigated if/how distractor suppression derived from statistical learning may be implemented in early visual cortex. While in a scanner, participants conducted a standard additional singleton task in which one location more frequently contained a salient distractor. The results showed that activity in EVC was suppressed for the location of the salient distractor as well as for neighbouring neutral locations. This suppression was not stimulus specific - meaning it occurred equally for distractors, targets and neutral items - and it was even present in trials in which the search display was omitted. Generally, the paper was clear, the experiment was well-designed, and the data are interesting. Nevertheless, I do have several concerns mostly regarding the interpretation of the results.
  
  (1) My biggest concern with the study is regarding the interpretation of some of the results. Specifically, regarding the dynamics of the suppression. I appreciate that there are some limitations with what you might be able to say here given the method but I do feel as if you have committed to a single interpretation where others might still be at play. Below I've listed a few alternatives to consider.
  
  We agree with the reviewer that there are important alternatives to consider. Adequately addressing these alternatives will substantially increase the inferences we can draw from our data. Therefore, we address each alternative interpretation in detail below.
  
  (a) Sustained Suppression. I was wondering if there is anything in your results that would speak for or against the suppression being task specific. That is, is it possible that people are just suppressing the HPDL throughout the entire experiment (i.e., also through ITI, breaks, etc., rather than just before and during the search). Since the suppression does not seem volitional, I wonder if participants might apply a blanket suppression to HPDL until they learn otherwise. Since your localiser comes after the task you might be able to see hints of sustained suppression in the HPDL during these trials.
  
  It is indeed possible that participants suppressed the HPDL throughout the entire experiment, instead of proactively instantiating suppression on each trial. While possible, we believe that this account is less likely to explain the present results, given the utilized analysis approach, a voxel-wise GLM fit to the BOLD data per run (see Materials and Methods for details). Specifically, we derived parameter estimates from this GLM per location to estimate the relative suppression. Sustained suppression would modulate BOLD responses throughout the run, i.e. presumably also during the implicit baseline period used to estimate the contrast parameter estimates per location. Hence, sustained suppression should not result in a differential modulation between locations, as the BOLD response at the HPDL during the baseline period would be equally suppressed as during the trial. Inspired by the reviewer’s comment, we now clarify this critical point in the manuscript’s Discussion section:
  
  “Third, participants might have suppressed the HPDL consistently throughout the experiment. This sustained suppression account differs from the proactive suppression proposed here. While this alternative is plausible, we believe that it is less likely to account for the present results, given the analysis conducted. Specifically, we computed voxel-wise parameter estimates and contrasted the obtained betas between locations. Under a sustained suppression account, the HPDL would show suppression even during the implicit baseline period, which would obscure the observed BOLD suppression at and near the HPDL.”
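  The logic of this argument can be sketched with a toy additive model. Everything below is an illustrative assumption: the signal values are arbitrary units, the function names are hypothetical, and the real analysis is a voxel-wise GLM rather than a simple subtraction.

```python
def contrast_vs_baseline(trial_signal, baseline_signal):
    """Toy stand-in for a parameter estimate: trial signal relative to baseline."""
    return trial_signal - baseline_signal


def location_difference(sustained_offset, proactive_offset, evoked=1.0):
    """Difference in contrast between NL-far and HPDL in a toy additive model.

    sustained_offset lowers the HPDL signal at all times (baseline and trial);
    proactive_offset lowers the HPDL signal only during trials.
    """
    baseline = 0.0
    # NL-far: no suppression of any kind.
    far = contrast_vs_baseline(baseline + evoked, baseline)
    # HPDL: the sustained offset is present in both terms and cancels out.
    hpdl_baseline = baseline - sustained_offset
    hpdl_trial = hpdl_baseline + evoked - proactive_offset
    hpdl = contrast_vs_baseline(hpdl_trial, hpdl_baseline)
    return far - hpdl


if __name__ == "__main__":
    print(location_difference(sustained_offset=0.25, proactive_offset=0.0))  # 0.0
    print(location_difference(sustained_offset=0.0, proactive_offset=0.25))  # 0.25
```

  In this simplified picture only suppression that is specific to the trial period produces a difference between locations, which is the point made in the quoted Discussion text.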
  
  (b) Enhancement followed by suppression. Another alternative that wasn't discussed would be an initial transient enhancement of the HPDL which might be brought on by the placeholders followed by more sustained suppression through the search task. Of course, on the whole this would look like suppression, but this still seems like it would hold different implications compared to simply "proactive suppression". This would be something like search and destroy however could be on the location level before the actual onset of the search display.
  
  R1 correctly points out that BOLD data, given the poor temporal resolution, do not allow for the detection of potential transient enhancements at the HPDL followed by a later and more pronounced suppression (akin to “search and destroy”). We fully agree with this assessment. However, we also argue that a transient enhancement followed by sustained suppression before search display onset constitutes proactive suppression in line with our interpretation, because suppression would still arise proactively (i.e., before search, and hence distractor, onset). Whether transient enhancement precedes suppression cannot be elucidated by our data, but we believe that it constitutes an interesting avenue for future studies using time-resolved and spatially specific recording methods. We now clarify this important implementational variation in the updated manuscript.
  
  “Finally, due to the limited temporal resolution of BOLD data, the present data do not elucidate whether the present suppression is preceded by a brief attentional enhancement of the HPDL, as implied by some prior work (Huang et al., 2024). On this account the HPDL would see transient enhancement, followed by sustained suppression, akin to a ‘search and destroy’ mechanism. Critically, we believe that this variation would nonetheless constitute proactive distractor suppression as the suppression would still arise before search onset. Using temporally and spatially resolved methods to explore potential transient enhancements preceding suppression is a promising avenue for future research charting the neural mechanisms underlying distractor suppression.”
  
  (2) I was also considering whether your effects might be at least partially attributable to priming type effects. This would be on the spatial (not feature) level as it is clear that the distractors are switching colours. Basically, is it possible that on trial n participants see the HPDL with the distractor in it and then on trial n+1 they suppress that location. This would be something distinct from the statistical learning framework and from the repetition suppression discussion you have already included. To test for this, you could look at the trials that follow omission or trials. If there is no suppression or less suppression on these trials it would seem fair to conclude that the suppression is at least in part due to the previous trial.
  
  We agree with the reviewer that it is plausible that participants particularly suppress locations which on previous trials contained a distractor. To address this possibility, we conducted a new analysis and adjusted the manuscript accordingly:
  
  “Second, participants may have suppressed locations that contained the distractor on the previous trial, reflecting a spatial priming effect. This account constitutes a complementary but different perspective than statistical learning, which integrates implicit prior knowledge across many trials. We ruled out that spatial priming explains the present results by contrasting BOLD suppression magnitudes on trials with the distractor at the HPDL and trials where the distractor was not at the HPDL on the previous trial. Results, depicted in Supplementary Figure 4, showed that distractor suppression was statistically significant across both trial types, including trials without a distractor at the HPDL on the preceding trial. This indicates that the observed BOLD suppression is unlikely to be driven by priming and is instead more consistent with statistical learning. Moreover, results did not yield a statistically significant difference between trial types based on the distractor location in the preceding trial. However, these results should not be taken to suggest that spatial priming cannot contribute to distractor suppression; for details see: Supplementary Figure 4.” (p. 13).
  
  We note that this analysis approach slightly differs from the reviewer’s suggestion, which considered omission trials. However, we decided to exclude trials immediately following an omission to ensure that both conditions were matched as closely as possible. In particular, omission trials represent extended rest periods, which could alter participants’ state and especially modulate the visually evoked BOLD responses (e.g., potentially increasing the dynamic range) compared to trials that did not follow omissions. Our analysis approach avoids this difference while still addressing the hypothesis put forward by the reviewer. We now provide the full explanation and results figure of this priming analysis in the figure text of Supplementary Figure 4:
  
  Reviewer #2 (Public review):
  
  The authors of this work set out to test ideas about how observers learn to ignore irrelevant visual information. Specifically, they used fMRI to scan participants who performed a visual search task. The task was designed in such a way that highly salient but irrelevant search items were more likely to appear at a given spatial location. With a region-of-interest approach, the authors found that activity in visual cortex that selectively responds to that location was generally suppressed, in response to all stimuli (search targets, salient distractors, or neutral items), as well as in the absence of an anticipated stimulus.
  
  Strengths of the study include: A well-written and well-argued manuscript; clever application of a region of interest approach to fMRI design, which allows articulating clear tests of different hypotheses; careful application of follow-up analyses to rule out alternative, strategy-based accounts of the findings; tests of the robustness of the findings to detailed analysis parameters such as ROI size; and exclusion of the role of regional baseline differences in BOLD responses.
  
  We thank the reviewer for the positive evaluation of our manuscript.
  
  The report might be enhanced by analyses (perhaps in a surface space) that distinguish amongst the multiple "early" retinotopic visual areas that are analysed in the aggregate here.
  
  We agree with the reviewer that an exploratory analysis separating early visual cortex (EVC) into its retinotopic areas could be an interesting addition. Our reasoning to combine early visual areas into one mask in the original analyses was two-fold: First, we did not have an a priori reason to expect distinct neural suppression between these early ROIs. Therefore, we did not acquire retinotopy data to reliably separate early visual areas (e.g. V1, V2 and V3), instead opting to increase the number of search task trials. The lack of retinotopy data inherently limits the reliability of the resulting cortical segmentation. However, we have now performed an analysis separating early visual cortex into V1 and V2 and report the details as Supplementary Text 1:
  
  “In an exploratory analysis we investigated whether subdivisions of EVC exhibit different representations of priority signals. In brief, we used FreeSurfer to reconstruct brain surfaces (recon-all) from each subject’s anatomical scan. From these reconstructions we derived V1_exvivo and V2_exvivo labels, which were transformed into volume space using ‘mri_label2vol’ and merged into a bilateral mask for each ROI. We then selected the voxels within each ROI that were most responsive to the four stimulus locations, based on independent localizer data. This voxel selection followed the procedure outlined in the Materials and Methods: Region of Interest (ROI) Definition. To accommodate the subdivision into two ROIs (V1 and V2) compared to the single EVC ROI in the main analysis, we halved the number of voxels selected per location. Finally, we applied the same ROI analysis to investigate distractor suppression during search and omission trials, following the procedure described in Materials and Methods: Statistical Analysis.
  
  Results of these more fine-grained ROI analyses are depicted in Supplementary Figure 1. First, the results from V2 qualitatively mirrored our primary ROI analysis. BOLD responses in V2 differed significantly between stimulus types (main effect of stimulus type: F<sub>(2,54)</sub> = 31.11, p < 0.001, 𝜂 = 0.54). Targets elicited larger BOLD responses compared to distractors (t<sub>(27)</sub> = 3.05, p<sub>holm</sub> = 0.004, d = 0.06) and neutral stimuli (t<sub>(27)</sub> = 7.82, p<sub>holm</sub> < 0.001, d = 0.14). Distractors also evoked larger responses than neutral stimuli (t<sub>(27)</sub> = 4.78, p<sub>holm</sub> < 0.001, d = 0.09). These results likely reflect top-down modulation due to target relevance and bottom-up effects of distractor salience. Consistent with the primary ROI analysis, the manipulation of distractor predictability showed a distinct pattern of location-specific BOLD suppression in V2 (main effect of location: F<sub>(1.1,52.8)</sub> = 5.01, p = 0.030, 𝜂 = 0.16). Neural populations with receptive fields at the HPDL showed significantly reduced BOLD responses compared to the diagonally opposite neutral location (NL-far; post hoc test HPDL vs NL-far: t<sub>(27)</sub> = 2.69, p<sub>holm</sub> = 0.022, d = 0.62). Again, this suppression was not confined to the HPDL but also extended to nearby neutral locations (NL-near vs NL-far: t<sub>(27)</sub> = 2.79, p<sub>holm</sub> = 0.022, d = 0.65). BOLD responses did not differ between HPDL and NL-near locations (HPDL vs NL-near: t<sub>(27)</sub> = 0.11, p<sub>holm</sub> = 0.915, d = 0.03; BF<sub>10</sub> = 0.13). As in the EVC ROI analysis, this suppression pattern was consistent across distractor, target, and neutral stimuli presented at the HPDL and NL-near locations compared to NL-far.
In sum, neural responses in V2 were significantly modulated by the distractor contingencies, evident as reduced BOLD responses in neural populations with receptive fields at the HPDL and neutral locations near the location of the frequent distractor (NL-near), relative to the neutral location diagonally across the HPDL (NL-far).
  
  In V1, BOLD responses also differed significantly between stimulus types (main effect of stimulus type: F<sub>(1.3,35.6)</sub> = 6.69, p = 0.009, 𝜂 = 0.20). Targets elicited larger BOLD responses compared to neutral stimuli (t<sub>(27)</sub> = 3.52, p<sub>holm</sub> = 0.003, d = 0.12) and distractors evoked larger responses than neutral stimuli (t<sub>(27)</sub> = 2.62, p<sub>holm</sub> = 0.023, d = 0.09). However, no difference between targets and distractors was observed (t<sub>(27)</sub> = 0.90, p<sub>holm</sub> = 0.375, d = 0.03; BF<sub>10</sub> = 0.17), suggesting reduced sensitivity to task-related effects in V1. Indeed, analyzing the effect of distractor predictability for BOLD responses in V1 showed a different result than in V2 and the combined EVC ROI. There was no significant main effect of location (F<sub>(2,54)</sub> = 2.20, p = 0.120, 𝜂 = 0.08; BF<sub>10</sub> = 0.77). BOLD responses at NL-near and NL-far were similar (BF<sub>10</sub> = 0.171), with the only reliable difference found between target stimuli at the HPDL and NL-far locations (W = 94, p<sub>holm</sub> = 0.012, r = 0.54).”
  
  We include the new results figure as Supplementary Figure 5.
  
  We now include reference to these results in the manuscript’s Discussion section:
  
  “Are representations of priority signals uniform across EVC? A priori we did not have any hypotheses regarding distinct neural suppression profiles across different early visual areas, hence our primary analyses focused on stimulus responses of neural populations in EVC, irrespective of subdivision. However, an exploratory analysis suggests that distractor suppression may show different patterns in V1 compared to V2 (Supplementary Figure 5 and Supplementary Text 1). In brief, results in V2 mirrored those reported for the combined EVC ROI (Figure 4). In contrast, results in V1 appeared to be only partially modulated by distractor contingencies, and if so, the modulation was less robust and not as spatially broad as in V2. This suggests the possibility of different effects of distractor predictability across subdivisions of early visual areas. However, these results should be interpreted with caution. First, our design did not optimize the delineation of early visual areas (e.g., no functional retinotopy), limiting the accuracy of V1 and V2 segmentation. Additionally, analyses were conducted in volumetric space, which further reduces spatial precision. Future studies could improve this by including retinotopy runs to accurately delineate V1, V2, and V3, and by performing analyses in surface space. Higher-resolution functional and anatomical MRI sequences would also help elucidate how distractor suppression is implemented across EVC with greater precision.”
  
  Furthermore, the study could benefit from an analysis that tests the correlation over observers between the magnitude of their behavioural effects and their neural responses.
  
  R2 highlights that behavioral facilitation and neural suppression could be correlated across participants. The rationale is that if neural suppression in EVC is related to the facilitation of behavioral responses, we should expect a positive relationship between neural suppression at the HPDL and RTs across participants. In this analysis we focused on the contrast between HPDL and NL-far, as this contrast was statistically significant in both the RT (Figure 2) and the neural suppression analysis (Figure 4). First, we computed for each participant the behavioural benefit of distractor suppression as: RT<sub>facilitation</sub> = RT<sub>NL-far</sub> – RT<sub>HPDL</sub>. Thereby RT facilitation reflects the response speeding due to a distractor appearing at the high probability distractor location compared to the far neutral location. Next, we computed neural suppression as: BOLD<sub>suppression</sub> = BOLD<sub>NL-far</sub> – BOLD<sub>HPDL</sub>. Thus, positive values reflect the suppression of BOLD responses at the HPDL compared to the NL-far location. The BOLD suppression index was computed for each stimulus type separately, as in the main ROI analysis (i.e. for Targets, Neutrals and Distractors). Finally, we correlated RT<sub>facilitation</sub> with BOLD<sub>suppression</sub> across participants using Pearson correlation. Results showed a small, but not statistically significant correlation between RT facilitation and BOLD suppression for distractor (r<sub>(26)</sub> = 0.22, p = 0.257), target (r<sub>(26)</sub> = 0.10, p = 0.598) and neutral (r<sub>(26)</sub> = 0.13, p = 0.519) stimuli. Thus, while the direction of the correlation was in line with the speculation by the reviewer in the “Recommendations for the authors”, results were not statistically reliable and therefore inconclusive. As also noted in our preliminary reply to the reviewer comments, it was a priori unlikely that this analysis would yield a statistically significant correlation.
An a priori power analysis suggested that, to reach a power of 0.8 at a standard alpha of 0.05, given the present sample size of n=28, the effect size would need to exceed r > 0.75, which seemed unlikely for the correlation of behavioural and neural difference scores. Given the inconclusive nature of the results, we prefer to not include this additional analysis in the manuscript, as we believe that it does not add to the main message of the paper but have it accessible to the interested reader in the public “peer review process”.
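  The across-participant analysis described in this reply can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names and the four-participant values are hypothetical, and only the two difference scores and the use of Pearson correlation are taken from the text.

```python
from math import sqrt

def difference_scores(nl_far, hpdl):
    """Per-participant contrast as defined in the reply: NL-far minus HPDL."""
    return [f - h for f, h in zip(nl_far, hpdl)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson r from n participants."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Hypothetical values for four participants (NOT data from the study).
rt_far = [620.0, 655.0, 700.0, 640.0]    # mean RT, distractor at NL-far (ms)
rt_hpdl = [600.0, 640.0, 660.0, 635.0]   # mean RT, distractor at HPDL (ms)
bold_far = [1.20, 0.95, 1.40, 1.10]      # BOLD estimate at NL-far
bold_hpdl = [1.05, 0.90, 1.10, 1.08]     # BOLD estimate at HPDL

rt_facilitation = difference_scores(rt_far, rt_hpdl)
bold_suppression = difference_scores(bold_far, bold_hpdl)
r = pearson_r(rt_facilitation, bold_suppression)
```

  With 28 participants, `t_from_r(0.22, 28)` gives the t statistic on 26 degrees of freedom that corresponds to the distractor correlation reported above.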
  
  The study provides an advance over previous studies, which identified enhancement or suppression in visual cortex as a function of search target/distractor predictability, but in a less spatially specific way. It also speaks to open questions about whether such suppression/enhancement is observed only in response to the arrival of visual information, or instead is preparatory, favouring the latter view. The theoretical advance is moderate, in that it is largely congruent with previous frameworks, rather than strongly excluding an opposing view or providing a major step change in our understanding of how distractor suppression unfolds.
  
  We agree with the reviewer that our results are an advancement of prior work, particularly with respect to narrowing down the role of sensory areas and the proactive nature of distractor suppression. However, we argue that this represents a significant step forward for several reasons. First, to our knowledge, the literature on distractor suppression, and visual search in general, is by no means unanimous with respect to the conclusion that distractor suppression is instantiated proactively (Huang et al., 2021, 2022). Indeed, there are several studies suggesting the opposite account, reactive suppression (Chang et al., 2023), or contributions by both proactive and reactive mechanisms (Sauter et al., 2021; Wang et al., 2019). Moreover, studies in support of proactive distractor suppression did not investigate the involvement of (early) sensory areas during suppression. Conversely, to our knowledge most studies investigating the involvement of sensory cortex during distractor suppression did not address the question whether suppression arises proactively or reactively.
  
  Recommendations for the authors:
  
  Reviewer #1 ( Recommendations for the authors):
  
  Minor Points:
  
  (1) There are several disconnects between the behaviour and the MR results - i.e. not stimulus specific yet there are no deficits for targets appearing at the HPDL, also no behavioural suppression for the NL-near but neural suppression found. Nevertheless, the behaviour is used as a way to rule out potential attentional strategies when considering whether there is enhancement in the NL-Far condition. I realise you have a few other points here, but I think it's worth addressing what could be seen as a double standard.
  
  The reviewer points out an important concern, which we feel could have been better addressed in the manuscript. From our point of view a partial dissociation between neural modulations in EVC and eventual behavioural facilitation is not surprising, given the extensive neural processing beyond EVC required for behaviour. However, this assessment may differ if one stresses an explicit volitional attentional strategy over an implicit statistical learning account. That said, we clearly do not want to create the impression of using a double standard. The lack of behavioural facilitation for targets at NL-far is not a critical part of our argument against explicit attentional strategies. Therefore, we rephrased the relevant paragraph in the Discussion section to now emphasize the importance of the control analysis excluding participants who reported the correct HPDL in the questionnaire (Figure 5), but nonetheless yielded qualitatively identical results to the main ROI analysis (Figure 4). In our opinion, this control analysis provides more compelling evidence against a volitional attentional strategy account without the risk of creating the impression of applying a double standard in the interpretation of behavioural data. Additionally, we now acknowledge the limitation of relying on behavioral data in ruling out volitional attentional strategies in the updated manuscript:
  
  “It is well established that attention enhances BOLD responses in visual cortex (Maunsell, 2015; Reynolds & Chelazzi, 2004; Williford & Maunsell, 2006). If participants learned the underlying distractor contingencies, they could deploy an explicit strategy by directing their attention away from the HPDL, for example by focusing attention on the diagonally opposite neutral location. This account provides an alternative explanation for the observed EVC modulations. However, while credible, the current findings are not consistent with such an interpretation. First, there was no behavioral facilitation for target stimuli presented at the far neutral location, contrary to what one might expect if participants employed an explicit strategy. However, given the partial dissociation between neural suppression in EVC and behavioral facilitation, additional neural data analyses are required to rule out volitional attention strategies. Thus, we performed a control analysis that excluded all participants that indicated the correct HPDL location in the questionnaire, thereby possibly expressing explicit awareness of the contingencies. This control analysis yielded qualitatively identical results to the full sample, showing significant distractor suppression in EVC. Therefore, it is unlikely that explicit attentional strategies, and the enhancement of locations far from the HPDL, drive the results observed here. Instead, the current findings are consistent with an account emphasizing the automatic deployment of spatial priors (He et al., 2022) based on implicitly learned statistical regularities.”
  
  (2) Does the level of suppression change in any way through the experiment? I.e., does it get stronger in the second vs. first half of the experiment?
  
  The reviewer asks an interesting question, whether BOLD suppression may change across the experiment. To address this question, we performed an additional analysis testing BOLD suppression in EVC during the first compared to second half of the MRI experiment. Here we defined BOLD suppression as: BOLD<sub>suppression</sub> = ((BOLD<sub>NL-far</sub> – BOLD<sub>HPDL</sub>) + (BOLD<sub>NL-far</sub> – BOLD<sub>NL-near</sub>)) / 2. Thus, in this formulation of BOLD suppression we summarize the two primary BOLD suppression effects observed in our main results (Figure 4). Additionally, as we previously did not observe any significant differences in BOLD suppression magnitudes between different stimulus types (i.e. suppression was similar for target, distractor and neutral stimuli), we collapsed across stimulus types in this analysis.
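  As a small worked illustration of the combined index defined in this reply, the sketch below computes it exactly as written and also in an algebraically equivalent form, NL-far minus the mean of HPDL and NL-near. The function names and the input values are hypothetical and are not taken from the study.

```python
def bold_suppression(nl_far, hpdl, nl_near):
    """Combined index as written in the reply: mean of the two contrasts."""
    return ((nl_far - hpdl) + (nl_far - nl_near)) / 2

def bold_suppression_simplified(nl_far, hpdl, nl_near):
    """Equivalent form: NL-far response minus the mean of HPDL and NL-near."""
    return nl_far - (hpdl + nl_near) / 2

# Hypothetical BOLD estimates for one participant (NOT data from the study).
example = bold_suppression(nl_far=1.0, hpdl=0.6, nl_near=0.8)
example_simplified = bold_suppression_simplified(nl_far=1.0, hpdl=0.6, nl_near=0.8)
```

  Positive values of the index indicate that responses at the HPDL and NL-near locations are, on average, lower than at NL-far.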
  
  Results, depicted below, showed that during both the initial (Run 1+2) and later part (Run 4+5) of the MRI experiment BOLD suppression was statistically significant (BOLD suppression Run 1+2: W = 331, p = 0.003, r = 0.63; BOLD suppression Run 4+5: W = 320, p = 0.007, r = 0.58), confirming our main results of reliable distractor suppression even in this subset of trials. However, we did not observe any statistically significant differences between early and late runs of the experiment (t<sub>(27)</sub> = -0.21, p = 0.835, d = -0.04). In fact, a Bayesian paired t-test provided evidence for the absence of a difference in BOLD suppression between early compared to later runs (BF<sub>10</sub> = 0.205), suggesting that distractor suppression in EVC was stable throughout the experiment. A qualitatively similar pattern was evident during omission trials, with significant distractor suppression during early runs (t<sub>(27)</sub> = 2.70, p = 0.012, d = 0.51), but not quite a statistically significant modulation for later runs (t<sub>(27)</sub> = 1.97, p = 0.059, d = 0.37). Again, there was no evidence for a difference in suppression magnitudes across the experiment (W = 198, p = 0.920, d = -0.025) and support for the absence of a difference in BOLD suppression between early and late runs (BF<sub>10</sub> = 0.278).
  
  Author response image 1.
  
  Analysis of BOLD suppression magnitudes in EVC across the MRI experiment phases. BOLD suppression was comparable between early (Run 1+2) and late (Run 4+5) phases of the MRI experiment, suggesting consistent suppression in EVC following statistical learning. Error-bars denote within-subject SEM. * p < 0.05, ** p < 0.01, = BF<sub>10</sub> < 1/3.
  
  In sum, results suggest that distractor suppression in EVC was stable across runs and did not change significantly throughout the experiment. This result was a priori likely, given that participants already underwent behavioral training before entering the MRI. This enabled them to establish modified spatial priority maps, containing the high probability distractor location contingencies, already before the first MRI run. While speculative, it is possible that participants may still have consolidated the spatial priority maps during the initial runs, but that this additional consolidation is not evident in the data, as later runs may see less engagement by participants due to increasing fatigue towards the end of the MRI experiment. Indeed, rapid learning and stable suppression throughout the remainder of the experiment is also reported by prior work (Lin et al., 2021). We believe that it is highly interesting for future studies to investigate the development of distractor suppression across learning, with initial exposure to the contingencies inside the MRI. However, as the present results are inconclusive, we prefer to not include this analysis in the main manuscript, as it may not provide significant additional insight into the neural mechanisms underlying distractor suppression.
  
  (3) In the methods vs. results you have reported the probabilities slightly differently. In the methods you say the HPDL was 6x more likely to contain a distractor whereas in the results you say 4x. Based on the reported trial numbers I think it should be 4, but probably you want to double check that this is consistent and correct throughout.
  
  We thank the reviewer for bringing this inconsistency to our attention. We have corrected this oversight in the adjusted manuscript:
  
  “One of the four locations of interest was designated the high probability distractor location (HPDL), which contained distractor stimuli (unique color) four times more often than any of the remaining three locations of interest. In other words, if a distractor was present on a given trial (42 trials per run), the distractor appeared 57% (24 trials per run) at the HPDL and at one of the other three locations with equal probability (i.e., 14% or 6 trials per run per location).”
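  The trial counts quoted in the corrected passage can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated there (42 distractor-present trials per run, 24 at the HPDL, 6 at each of the three other locations); the variable names are illustrative.

```python
# Trial counts per run, as stated in the corrected manuscript passage.
distractor_trials_per_run = 42
hpdl_trials = 24
other_location_trials = 6
n_other_locations = 3

# The location counts should add up to all distractor-present trials.
total = hpdl_trials + n_other_locations * other_location_trials

p_hpdl = hpdl_trials / distractor_trials_per_run             # proportion at the HPDL
p_other = other_location_trials / distractor_trials_per_run  # proportion per other location
ratio = hpdl_trials / other_location_trials                  # HPDL relative to any other location

print(f"HPDL: {p_hpdl:.0%}, other: {p_other:.0%}, ratio: {ratio:g}x")
```

  The counts are therefore consistent with the "four times" wording rather than the earlier "6x".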
  
  Reviewer #2 ( Recommendations for the authors):
  
  The authors have performed their analyses in the volume rather than the surface, and have grouped together V1, V2, and V3 as "early visual cortex". As the authors' claims lean heavily on the idea that they are measuring "early" visual responses, the study would be improved by delineating the ROIs within these different retinotopic regions. Such an approach might be facilitated by analysing data on the reconstructed surface.
  
  Please refer to our reply to this analysis suggested in the Public review.
  
  The authors rightly tread carefully on the causal link between their neural findings and the behavioural outcomes. The picture might be clarified somewhat further by testing for a positive relationship between behavioural effect sizes and neural effect sizes across participants. e.g. to what extent is the search advantage when distractors are presented at the "HPDL" linked to greater suppression of BOLD at the HPDL region of early visual cortex?
  
  Please refer to our reply to this analysis suggested in the Public review.
  
  Some of the claims based on null hypotheses would be better supported by Bayesian tests e.g. page 6 "This pattern of results was the same regardless whether the distractor, target, or a neutral stimulus presented at the HPDL and NL-near locations compared to NL-far ..." and "BOLD responses between HPDL and NL-near locations did not reliably differ ..." This is similar to the approach that the authors adopted later in the section "Ruling out attentional modulation".
  
  We agree with the reviewer that our ROI analyses would benefit from providing evidence for the absence of a modulation. Accordingly, we updated our results by adding equivalent Bayesian tests. Bayes Factors were computed using JASP 0.18.2 (JASP Team, 2024; RRID:SCR_015823) with default settings; i.e. for Bayesian paired t-tests with a Cauchy prior width of 0.707. Qualitative interpretations of BFs were based on Lee and Wagenmakers (2014). We now report the obtained BF in the Results section.
  
  “BOLD responses between HPDL and NL-near locations did not reliably differ (HPDL vs NL-near: t<sub>(27)</sub> = 0.47, p<sub>holm</sub> = 0.643, d = 0.08; BF<sub>10</sub> = 0.19).”
  
  And:
  
  “Neural responses at HPDL and NL-near did not reliably differ (t<sub>(27)</sub> = 0.21, p<sub>holm</sub> = 0.835 d = 0.04; BF<sub>10</sub> = 0.21).”
  
  Moreover, we now denote any equivalent results (defined as BF<sub>10</sub><1/3) in Fig. 4 and Fig. 5, and included the description of the associated symbol in the figure text (“ = BF<sub>10</sub> < 1/3”).
  
  Additionally, we now also report the BF for all paired t-tests reported in Supplementary Table 1.
  
  Finally, we addressed the statement: “This pattern of results was the same regardless whether the distractor, target, or a neutral stimulus presented at the HPDL and NL-near locations compared to NL-far”. Our intention was to emphasize that the pattern of results reported in the sentence preceding it was evident for distractor, target, or neutral stimulus, and not to suggest that the magnitude of the effect is the same. Hence, to more accurately reflect the results, we changed this sentence to: “This pattern of results was present regardless whether the distractor, target, or a neutral stimulus presented at the HPDL and NL-near locations compared to NL-far”
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.03.587747v3
www.biorxiv.org www.biorxiv.org

MicroRNA-26b protects against MASH development in mice and can be efficiently targeted with lipid nanoparticles

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Based on previous publications suggesting a potential role for miR-26b in the pathogenesis of metabolic dysfunction-associated steatohepatitis (MASH), the researchers aim to clarify its function in hepatic health and explore the therapeutical potential of lipid nanoparticles (LNPs) to treat this condition. First, they employed both whole-body and myeloid cell-specific miR-26b KO mice and observed elevated hepatic steatosis features in these mice compared to WT controls when subjected to WTD. Moreover, livers from whole-body miR-26b KO mice also displayed increased levels of inflammation and fibrosis markers. Kinase activity profiling analyses revealed distinct alterations, particularly in kinases associated with inflammatory pathways, in these samples. Treatment with LNPs containing miR-26b mimics restored lipid metabolism and kinase activity in these animals. Finally, similar anti-inflammatory effects were observed in the livers of individuals with cirrhosis, whereas elevated miR-26b levels were found in the plasma of these patients in comparison with healthy control. Overall, the authors conclude that miR-26b plays a protective role in MASH and that its delivery via LNPs efficiently mitigates MASH development.
  
  The study has some strengths, most notably its use of a combination of animal models, analyses of potential underlying mechanisms, as well as innovative treatment delivery methods with significant promise. However, it also presents numerous weaknesses that leave the research work somewhat incomplete. The precise role of miR-26b in a human context remains elusive, hindering direct translation to clinical practice. Additionally, the evaluation of the kinase activity, although innovative, does not provide a clear mechanism-based molecular explanation for the protective role of this miRNA.
  
  Therefore, to fortify the solidity of their conclusions, these concerns require careful attention and resolution. Once these issues are comprehensively addressed, the study stands to make a significant impact on the field.
  
  We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. Regarding the weaknesses, we have addressed these as well as possible during the revision of our manuscript.
  
  We can already state that miR-26b has clear anti-inflammatory effects on human liver slices, which is in line with our results demonstrating that miR-26b plays a protective role in MASH development in mice. The observation that patients with liver cirrhosis have increased plasma levels of miR-26b seems contradictory at first glance. However, we believe that this increased miR-26b expression is a compensatory mechanism to counteract the MASH/cirrhotic effects. The exact source of this miR-26b remains to be elucidated in future studies.
  
  The kinase activity analysis revealed that miR-26b particularly affects kinases that play an important role in inflammation and angiogenesis. Strikingly, and in support of these data, these effects could be reversed by LNP treatment. Combined, these results already provide strong mechanistic insights at the molecular and intracellular signalling level. Although the exact target of miR-26b remains elusive and its identification is probably beyond the scope of the current manuscript due to its complexity, we believe that the kinase activity results already provide a solid mechanistic basis.
  
  Reviewer #1 (Recommendations For The Authors):
  
  A list of recommendations for the authors is presented below:
  
  (1) The title should emphasize that the majority of experiments were conducted in mice to accurately reflect the scope of the study.
  
  As suggested we have updated our title to include the statement that we primarily used a murine model:
  
  “MicroRNA-26b protects against MASH development in mice and can be efficiently targeted with lipid nanoparticles.”
  
  (2) It would be useful to know more about miR-26b function, including its target genes, tissue-specific expression, and tissue vs. circulating levels. Is it expected that the two strains of the miRNA (i.e., -3p and -5p) act this similarly? Also, miR-26b expression in the liver of individuals with cirrhosis should be determined.
  
  The function of miR-26b is still rather elusive, making functional studies using this miR very interesting. In a previous study describing the mouse model used here (Van der Vorst et al. BMC Genom Data, 2021), we alluded to several functions of miR-26b that have already been investigated. These have been described particularly in carcinogenesis and in the neurological field.
  
  Regarding target genes, several targets are already described in miRBase. However, we feel that determining the specific target genes in our experiments is beyond the scope of the current manuscript and is rather a focus of follow-up projects.
  
  Regarding the expression of miR-26b, the liver and blood have rather high and similar expressions of both miR-26b-3p and miR-26b-5p as shown in Author response image 1.
  
  Author response image 1.
  
  Expression of miR-26b-3p and -5p. Expression of miR-26b-3p (left) and miR-26b-5p (right), generated by using the miRNATissueAtlas 2025 (Rishik et al. Nucleic Acids Research, 2024).
  
  Unfortunately, due to restrictions in tissue availability and the lack of stored RNA samples, we are unable to measure miR-26b expression in the human livers. However, based on the potency of the miR-26b mimic loaded LNPs in the mice (Revised Supplemental Figure 2A-B), we are confident that these LNPs also resulted in an overexpression of miR-26b in the human livers.
  
  (3) Please explain the rationale behind primarily using whole-body miR-26b KO mice rather than the myeloid cell-specific KO model for the studies.
  
  The main goal of our study is to elucidate the general role of miR-26b in MASH formation. Therefore, we decided to primarily focus on the whole-body KO model. While we used the myeloid cell-specific KO model to highlight that myeloid cells play an important role in the observed phenotypes, we believe the whole-body KO model is more appropriate as the main focus, particularly in light of the LNP targeting used, which is also a whole-body approach. Furthermore, this focus on the whole-body model also reflects a more therapeutically relevant approach.
  
  (4) The authors claim that treatment with LNPs containing miR-26b "replenish the miR-26b level in the whole-body deficient mouse" but the results of this observation are not presented.
  
  This is indeed a valid point that we have now addressed. We have measured the miR-26b-3p and miR-26b-5p expression levels in livers from mice after 4-week WTD with simultaneous injection with either empty LNPs as vehicle control (eLNP) or LNPs containing miR-26b mimics (mLNP) every 3 days. As shown in Revised Supplemental Figure 2A-B, mLNP treatment clearly results in an overexpression of miR-26b in the livers of these mice. We have rephrased the text accordingly by stating that mLNP results in an “overexpression” rather than “replenishment”.
  
  (5) The number of 3 human donors for the precision-cut liver slices is clearly insufficient and clinical parameters need to be shown. Additionally, inconsistencies in individual values in Figures 8B-E need clarification.
  
  Unfortunately, due to restrictions in tissue availability, we are unable to increase our n-number for these experiments. Clinical parameters are not available, but the liver slices were from healthy tissue.
  
  We have performed these experiments in duplicates for each individual donor. We have now specified this also in the figure legend to explain the individual values in the graphs:
  
  “…(3 individual donors, cultured in duplicates).”
  
  (6) Figure 2D: Please include representative images.
  
  As suggested we have included representative images in our revised manuscript.
  
  (7) Address discrepancies in the findings across different experimental settings. For example, the expression levels of the lipid metabolism-related genes are not significantly modulated in whole-body miR-26b KO mice (except for Sra), but they are in the myeloid cell-specific model (but not Sra), and none of them are restored after LNPs injections.
  
  Although Cd36 is not significantly increased in the whole-body miR-26b KO mice, there is a clear tendency towards increased expression, which is now also validated at the protein level (Revised Figure 1K-L). In the myeloid cell-specific model we see a similar tendency, although the gene expression difference of Sra is not significantly changed. This could be due to the difference in the model, since only myeloid cells are affected, suggesting that the effects on Sra are to a large extent driven by non-myeloid cells. This would also fit with the tendency towards decreased Sra expression in the mimic-LNP treated mice. Due to the larger variation, this difference did not reach significance, which is rather a statistical issue due to relatively small n-numbers. At this moment, we cannot exclude that these receptors are differentially regulated by different cell-types. For this, future studies are needed focussing on cell-specific targeting of miR-26b in somatic cells, like hepatocytes.
  
  (8) Figure 4A the images are not representative of the quantification.
  
  We have selected another representative image that exactly reflects the average Sirius red positive area, so that it matches the quantification.
  
  (9) Figures 5 and 7: Are there not significantly decreased/increased kinases? A deeper analysis of these kinase alterations is necessary to understand how miR-26b exerts its role. A comparison analysis of these two datasets might clarify this regard.
  
  We indeed very often see in these kinome analyses that the general tendency of kinase activity is unidirectional. We believe that this is caused by the highly interconnected nature of kinases. Activation of one signalling cascade will also result in the activation of many other cascades. However, it is interesting to see which pathways are affected in our study, and we find it particularly interesting that the tendency is exactly opposite between both comparisons: KO vs. WT shows increased kinase activities, while KO-LNP vs. KO shows a decrease again. This further shows that the method reflects a true biological effect that is mediated by miR-26b.
  
  (10) Determinations of the effect of LNPs containing miR-26b in the KO mice are limited to only a few observations (that are not only significant). More extensive findings are needed to conclusively demonstrate the effectiveness of this treatment method. Similar to the experiments with human liver samples (Figures 8A-E).
  
  We have now extended our observations in the mouse model using LNPs by also analysing the effects on inflammation and fibrosis. However, it seems that the treatment time was not long enough to see pronounced changes in these later stages of disease development. Interestingly, the expression of Tgfb was significantly reduced, suggesting that the LNPs already affect fibrotic processes at least at the gene expression level. It can therefore be suggested that longer mLNP treatment may also result in more effects at the protein level, which remains to be determined in future studies.
  
  Unfortunately, due to restrictions in tissue availability, we are unable to increase our n-number or read-outs for these experiments at this moment.
  
  (11) In Figures 8F-H, the observed increase in circulating miR-26b levels in the plasma of cirrhotic individuals seems contradictory to its proposed protective role. This discrepancy requires clarification.
  
  In the revised discussion (second to last paragraph), we have now elaborated more on the findings in the plasma of cirrhotic individuals in comparison to our murine in-vivo results, to highlight and discuss this discrepancy.
  
  (12) Figures 8F-H legend mentions using 8-11 patients per group, but the methods section lacks corresponding information about these individuals.
  
  These patients, together with inclusion/exclusion criteria and definition of cirrhosis are described in the method section 2.14.
  
  (13) Figure 8G has 7 data points in the cirrhosis group, instead of 8. Any data exclusion should be justified in the methods section.
  
  As defined in method section 2.15, we have identified outliers using the ROUT = 1 method, which is the reason why Figure 8G only has 7 data points instead of 8.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This manuscript by Peters, Rakateli, et al. aims to characterize the contribution of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. In addition, the authors provide a rescue of the miR-26b using lipid nanoparticles (LNPs), with potential therapeutic implications. In addition, the authors provide useful insights into the role of macrophages and some validation of the effect of miR-26b LNPs on human liver samples.
  
  Strengths:
  
  The authors provide a well-designed mouse model, that aims to characterize the role of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. The rescue of the phenotypes associated with the model used using miR-26b using lipid nanoparticles (LNPs) provides an interesting avenue to novel potential therapeutic avenues.
  
  Weaknesses:
  
  Although the authors provide a new and interesting avenue to understand the role of miR-26b in MASH, the study needs some additional validations and mechanistic insights in order to strengthen the authors' conclusions.
  
  (1) Analysis of the expression of miRNAs based on miRNA-seq of human samples (see https://ccb-compute.cs.uni-saarland.de/isomirdb/mirnas) suggests that miR-26b-5p is highly abundant both on liver and blood. It seems hard to reconcile that despite miRNA abundance being similar in both tissues, the physiological effects claimed by the authors in Figure 2 come exclusively from the myeloid (macrophages).
  
  We agree with the reviewer that the effects observed in the whole-body KO model are most likely a combination of cellular effects, particularly since miR-26b is also highly expressed in the liver. However, with the LysM-model we merely want to demonstrate that the myeloid cells at least play an important, though not exclusive, role in the phenotype. In the discussion, we also further elaborate on the fact that the observed changes in the liver can be mediated by hepatic changes.
  
  To stress this, we have adjusted the conclusion of Figure 2:
  
  “Interestingly, mice that have a myeloid-specific lack of miR-26b also show increased hepatic cholesterol levels and lipid accumulation demonstrated by Oil-red-O staining, coinciding with an increased hepatic Cd36 expression (Figure 2), demonstrating that myeloid miR-26b plays a major, but not exclusive, role in the observed steatosis.”
  
  (2) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26a-5p is indeed 4-fold higher than miR-26b-5p both in the liver and blood. Since both miRNAs share the same seed sequence, and most of the supplemental regions (only 2 nt difference), their endogenous targets must be highly overlapped. It would be interesting to know whether deletion of miR-26b is somehow compensated by increased expression of miR-26a-5p loci. That would suggest that the model is rather a depletion of miR-26.
  
  UUCAAGUAAUUCAGGAUAGGU mmu-miR-26b-5p mature miRNA
  
  UUCAAGUAAUCCAGGAUAGGCU mmu-miR-26a-5p mature miRNA
  
  This is a very valid point raised by the reviewer, which we actually already explored in a previous study describing the mouse model used here (Van der Vorst et al. BMC Genom Data, 2021). In that manuscript, we could show that miR-26a is not affected by the deficiency of miR-26b (Figure 1G in: Van der Vorst et al. BMC Genom Data, 2021).
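  The reviewer's observation about the two mature sequences quoted in comment (2) can be checked directly. The sketch below compares them position by position; it assumes the common convention that the seed spans nucleotides 2-8 of the mature miRNA (the comment itself does not define the seed), and the helper names are illustrative.

```python
# Mature sequences exactly as quoted in the reviewer's comment.
MIR_26B_5P = "UUCAAGUAAUUCAGGAUAGGU"   # mmu-miR-26b-5p
MIR_26A_5P = "UUCAAGUAAUCCAGGAUAGGCU"  # mmu-miR-26a-5p

def seed(sequence):
    """Seed region, assumed here to be nucleotides 2-8 (1-based)."""
    return sequence[1:8]

def mismatch_positions(a, b):
    """1-based positions at which two sequences differ over their shared length."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

same_seed = seed(MIR_26B_5P) == seed(MIR_26A_5P)
mismatches = mismatch_positions(MIR_26B_5P, MIR_26A_5P)
length_difference = abs(len(MIR_26A_5P) - len(MIR_26B_5P))

print(same_seed, mismatches, length_difference)
```

  Under this convention the two miRNAs share an identical seed and differ at two aligned positions outside it, with miR-26a-5p carrying one additional 3' nucleotide.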
  
  (3) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26b-5p is indeed 50-fold higher than miR-26b-3p in the liver and blood. This difference in abundance of the two strands is usually regarded as one of them being the guide strand (in this case the 5p) and the other being the passenger (in this case the 3p). In some cases, passenger strands can be a byproduct of miRNA biogenesis, thus the rescue experiments using LNPs with both strands in equimolar amounts would not reflect the physiological abundance miR-26b-3p. The non-physiological overabundance of miR-26b-3p would constitute a source of undesired off-targets.
  
  We agree with the reviewer on this aspect and this is something we had to consider while generating the mimic LNPs. However, we believe that we do not observe any undesired off-target effects, as the effects of the mimic LNPs, at least on functional outcomes, are relatively mild and restricted to the expected effects on lipids. Furthermore, the effects on the kinase profile due to the mimic LNP treatment are in line with our expectations. Combined, these results suggest at least that potential off-target effects are minor.
  
  (4) It would also be valuable to check the miRNA levels on the liver upon LNP treatment, or at least the signatures of miR-26b-3p and miR-26b-5p activity using RNA-seq on the RNA samples already collected.
  
  This is indeed a valid point that we have now addressed. We have measured the miR-26b-3p and miR-26b-5p expression levels in livers from mice after 4-week WTD with simultaneous injection with either empty LNPs as vehicle control (eLNP) or LNPs containing miR-26b mimics (mLNP) every 3 days. As shown in Supplemental Figure 2A-B, mLNP treatment clearly results in overexpression of miR-26b in the livers of these mice. We have rephrased the text accordingly by stating that mLNP results in an “overexpression” rather than “replenishment”.
  
  (5) Some of the phenotypes described, such as the increase in cholesterol, overlap with the previous publication by van der Vorst et al. BMC Genom Data (2021), despite in this case the authors are doing their model in Apoe knock-out and Western-type diet. I would encourage the authors to investigate more or discuss why the initial phenotypes don't become more obvious despite the stressors added in the current manuscript.
  
  In our previous publication (BMC Genom Data; 2021), we actually did not see any changes in circulating lipid levels. However, in that study we did not evaluate the livers of the mice, so we do not have any information about the hepatic lipid levels.
  
  As mentioned by the reviewer, we believe that we see much more pronounced phenotypes in the current model because we use the combined stressor of Apoe-/- and Western-type diet, which cannot be compared to the wildtype and chow-fed mice used in the BMC Genom Data manuscript.
  
  (6) The authors have focused part of their analysis on a few gene markers that show relatively modest changes. Deeper characterization using RNA-seq might reveal other genes that are more profoundly impacted by miR-26 depletion. It would strengthen the conclusions proposed if the authors validated that changes in mRNA abundance (Sra, Cd36) do impact the protein abundance. These relatively small changes or trends in mRNA expression might not translate into changes in protein abundance.
  
  As suggested by the reviewer we have now also confirmed that the protein expression of CD36 and SRA is significantly increased upon miR-26b depletion, visualized as Figure 1K-L in the revised manuscript. Unfortunately, we do not have enough material left to perform similar analysis for the LysM-model or the LNP-model, although based on the whole-body effects we are confident that at least for CD36/SRA in this case the gene expression matches effects observed on protein level.
  
  (7) In Figures 5 and 7, the authors run a phosphorylation array (STK) to analyze the changes in the activity of the kinome. It seems that a relatively large number of signaling pathways are being altered, I think that should be strengthened by further validations by Western blot on the collected tissue samples. For quite a few of the kinases, there might be antibodies that recognise phosphorylation. The two figures lack a mechanistic connection to the rest of the manuscript.
  
  On this point we respectfully have to disagree with the reviewer. We have used a kinase activity profiling approach (PamGene) to analyse the real-time activity of kinases in our lysates. This approach is different from the classical Western blot approach in which only the presence or absence of a specific phosphorylation is detected. Thereby, Western blot analysis does not analyse phosphorylation in real-time, but rather determines whether there has been phosphorylation in the past. Our approach actually determines the real-time, current activity of the kinases, which we believe is a different and perhaps even more reliable read-out measurement. Therefore, validation by Western blot would not strengthen these observations.
  
  We have particularly tried to connect these observations to the rest of the manuscript by highlighting the observed signalling cascades that are affected, highlighting a role in inflammation and angiogenesis, thereby providing some mechanistic insights.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I would encourage the authors to follow-up on some of the more miRNA focused comments made above, which would strengthen the mechanistic part of the work presented.
  
  I suggest the authors tone down some of the claims made (eg. "clearly increased expression", "exacerbated hepatic fibrosis"), given that some of it might need further validation.
  
  Wherever needed we have toned down some claims, although we believe that most claims are already written carefully enough and in line with the observed results.
  
  Some of the panels that are supposed to have the same amount of animals have variable N, despite they come from the same exact number of RNA samples or tissue lysates (eg. 1G and 1H, vs 1I and 1J).
  
  This is indeed correct and caused by the fact that some analyses resulted in statistical outliers as identified using the ROUT = 1 method, as also specified in section 2.15 of the method section.
  
  It would be nice to have representative images of oil-red-o in all the figures where it is quantified (or at least in the supplementary figures).
  
  As suggested by the reviewer, we have now included representative images for the LysM-model (Revised Figure 2D) and the LNP-model (Revised Figure 6D) as well.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.18.580792v2
www.biorxiv.org www.biorxiv.org

The information bottleneck as a principle underlying multi-area cortical representations during decision-making

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  In this study, the authors aim to understand why decision formation during behavioural tasks is distributed across multiple brain areas. They hypothesize that multiple areas are used in order to implement an information bottleneck (IB). Using neural activity recorded from monkey DLPFC and PMd performing a 2-AFC task, they show that DLPFC represents various task variables (decision, color, target configuration), while downstream PMd primarily represents decision information. Since decision information is the only information needed to make a decision, the authors point out that PMd has a minimal sufficient representation (as expected from an IB). They then train 3-area RNNs on the same task and show that activity in the first and third areas resemble the neural representations of DLPFC and PMd, respectively. In order to propose a mechanism, they analyse the RNN and find that area 3 ends up with primarily decision information because feedforward connections between areas primarily propagate decision information.
  
  The paper addresses a deep, normative question, namely why task information is distributed across several areas.
  
  Overall, it reads well and the analysis is well done and mostly correct (see below for some comments). My major problem with the paper is that I do not see that it actually provides an answer to the question posed (why is information distributed across areas?). I find that the core problem is that the information bottleneck method, which is evoked throughout the paper, is simply a generic compression method.
  
  Being a generic compressor, the IB does not make any statements about how a particular compression should be distributed across brain areas - see major points (1) and (2).
  
  If I ignore the reference to the information bottleneck and the question of why pieces of information are distributed, I still see a more mechanistic study that proposes a neural mechanism of how decisions are formed, in the tradition of RNN-modelling of neural activity as in Mante et al 2013. Seen through this more limited sense, the present study succeeds at pointing out a good model-data match, and I could support a publication along those lines. I point out some suggestions for improvement below.
  
  We thank the reviewer for their comments, feedback and suggestions. We are glad to hear you support the good model-data match for this manuscript. With your helpful comments, we have clarified the connections to the information bottleneck principle and also contrasted it against the information maximization principle (the InfoMax principle), an alternative hypothesis. We elaborate on these issues in response to your points below, particularly major points (1) and (2). We also address all your other comments below.
  
  Major points
  
  (1) It seems to me that the authors' use of the IB is based on the reasoning that deep neural networks form decisions by passing task information through a series of transformations/layers/areas and that these deep nets have been shown to implement an IB. Furthermore, these transformations are also loosely motivated by the data processing inequality.
  
  On Major Point 1 and these following subpoints, we first want to make a high-level statement before delving into a detailed response to your points as it relates to the information bottleneck (IB). We hope this high-level statement will provide helpful context for the rest of our point-by-point responses.
  
  We want to be clear that we draw on the information bottleneck (IB) principle as a general principle to explain why cortical representations differ by brain area. The IB principle, as applied to cortex, is only stating that a minimal sufficient representation to perform the task is formed in cortex, not how it is formed. The alternative hypothesis to the IB is that brain areas do not form minimal sufficient representations. For example, the InfoMax principle states that each brain area stores information about all inputs (even if they’re not necessary to perform the task). InfoMax isn’t unreasonable: it’s possible that storing as much information about the inputs, even in downstream areas, can support flexible computation and InfoMax also supports redundancy in cortical areas. Indeed, many studies claim that action choice related signals are in many cortical areas, which may reflect evidence of an InfoMax principle in action for areas upstream of PMd.
  
  While we observe an IB in deep neural networks and cortex in our perceptual decision-making task, we stress that its emergence across multiple areas is an empirical result. At the same time, multiple areas producing an IB makes intuitive sense: due to the data processing inequality, successive transformations typically decrease the information in a representation (especially when, e.g., in neural networks, every activation passes through the ReLU function, which is not bijective). Multiple areas are therefore a sufficient and even ‘natural’ way to implement an IB, but multiple areas are not necessary for an IB. That the IB we observe in deep neural networks and cortex emerges through multi-area computation is an empirical finding and, in contrast to InfoMax, we believe it is an important result of this paper.
  
  Nevertheless, your incisive comments have helped us to update the manuscript so that, when we talk about the IB, we are clear that the alternative hypothesis is non-minimal representations, a prominent example of which is the InfoMax principle. We have now significantly revised our introduction to avoid this confusion. We hope this provides helpful context for our point-by-point replies, below.
  
  However, assuming as a given that deep neural networks implement an IB does not mean that an IB can only be implemented through a deep neural network. In fact, IBs could be performed with a single transformation just as well. More formally, a task associates stimuli (X) with required responses (Y), and the IB principle states that X should be mapped to a representation Z, such that I(X;Z) is minimal and I(Y;Z) is maximal. Importantly, the form of the map Z=f(X) is not constrained by the IB. In other words, the IB does not impose that there needs to be a series of transformations. I therefore do not see how the IB by itself makes any statement about the distribution of information across various brain areas.
  
  We agree with you that an IB can be implemented in a single transformation. We wish to be clear that we do not intend to argue necessity: that multiple areas are the only way to form minimal sufficient representations. Rather, multiple areas are sufficient to induce minimal sufficient representations, and moreover, they are a natural and reasonably simple way to do so. By ‘natural,’ we mean that minimal sufficient representations empirically arise in systems with multiple areas (more than 2), including deep neural networks and the cortex at least for our task and simulations. For example, we did not see minimal sufficient representations in 1- or 2-area RNNs, but we did see them emerge in RNNs with 3 areas or more. One potential reason for this result is that sequential transformations through multiple areas can never increase information about the input; it can only maintain or reduce information due to the data processing inequality.
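  The two information-theoretic statements used in this reply can be written compactly. The block below is a sketch in our own notation (Z_1, ..., Z_k for the successive area representations is an assumption introduced here, not notation from the manuscript): the first line is the data processing inequality for a feedforward chain, and the second is the minimal sufficient representation condition as it is phrased in this response.

```latex
% Data processing inequality along a feedforward chain of areas:
% information about the input X can be maintained or lost, never gained.
\[
X \rightarrow Z_1 \rightarrow Z_2 \rightarrow \dots \rightarrow Z_k
\quad\Longrightarrow\quad
I(X; Z_k) \le I(X; Z_{k-1}) \le \dots \le I(X; Z_1)
\]

% Minimal sufficient representation of X for the task output Y:
% keep all task information while retaining as little input information as possible.
\[
\min_{Z} \; I(X; Z)
\quad \text{subject to} \quad
I(Z; Y) = I(X; Y)
\]
```

  Neither expression says how many stages k are used or at which stage information is discarded, which is why the multi-area result is described as empirical.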
  
  Our finding that multiple areas facilitate IBs in the brain is therefore an empirical result: like in deep neural networks, we observe the brain has minimal sufficient representations that emerge in output areas (PMd), even as an area upstream (DLPFC) is not minimal. While the IB makes a statement that this minimal sufficient representation emerges, to your point, the fact that it emerges over multiple areas is not a part of the IB – as you have pointed out, the IB doesn’t state where or how the information is discarded, only that it is discarded. Our RNN modeling later proposes one potential mechanism for how it is discarded. We updated the manuscript introduction to make these points:
  
  “An empirical observation from Machine Learning is that deep neural networks tend to form minimal sufficient representations in the last layers. Although multi-layer computation is not necessary for an IB, they provide a sufficient and even “natural” way to form an IB. A representation z = f(x) cannot contain more information than the input x itself due to the data processing inequality[19]. Thus, adding additional layers typically results in representations that contain less information about the input.”
  
  And later in the introduction:
  
  “Consistent with these predictions of the IB principle, we found that DLPFC has information about the color, target configuration, and direction. In contrast, PMd had a minimal sufficient representation of the direction choice. Our recordings therefore identified a cortical IB. However, we emphasize the IB does not tell us where or how the minimal sufficient representation is formed. Instead, only our empirical results implicate DLPFC-PMd in an IB computation. Further, to propose a mechanism for how this IB is formed, we trained a multi-area RNN to perform this task. We found that the RNN faithfully reproduced DLPFC and PMd activity, enabling us to propose a mechanism for how cortex uses multiple areas to compute a minimal sufficient representation.”
  
  In the context of our work, we want to be clear the IB makes these predictions:
  
  Prediction 1: There exists a downstream area of cortex that has a minimal and sufficient representation to perform a task (i.e., I(X;Z) is minimal while preserving task information so that I(Z;Y) is approximately equal to I(X;Y)). We identify PMd as an area with a minimal sufficient representation in our perceptual-decision-making task.
  
  Prediction 2 (corollary if Prediction 1 is true): There exists an upstream brain area that contains more input information than the minimal sufficient area. We identify DLPFC as an upstream area relative to PMd, which indeed has more input information than downstream PMd in our perceptual decision-making task.
  
  Note: as you raise in other points, it could have been possible that the IB is implemented early on, e.g., in either the parietal cortex (dorsal stream) or inferotemporal cortex (ventral stream), so that DLPFC and PMd both contained minimal sufficient representations. The fact that it doesn’t is entirely an empirical result from our data. If DLPFC had minimal sufficient representations for the perceptual decision making task, we would have needed to record in other regions to identify brain areas that are consistent with Prediction 2. But, empirically, we found that DLPFC has more input information relative to PMd, and therefore the DLPFC-PMd connection is implicated in the IB process.
  
  What is the alternative hypothesis to the IB? We want to emphasize: it isn’t single-area computation. It’s that the cortex does not form minimal sufficient representations. For example, an alternative hypothesis (“InfoMax”) would be for all engaged brain areas to form representations that retain all input information. One reason this could be beneficial is because each brain area could support a variety of downstream tasks. In this scenario, PMd would not be minimal, invalidating Prediction 1. However, this is not supported by our empirical observations of the representations in PMd, which has a minimal sufficient representation of the task. We updated our introduction to make this clear:
  
  “But cortex may not necessarily implement an IB. The alternative hypothesis to IB is that the cortex does not form minimal sufficient representations. One manifestation of this alternative hypothesis is the “InfoMax” principle, where downstream representations are not minimal but rather contain maximal input information[22]. This means information about task inputs not required to perform the task are present in downstream output areas. Two potential benefits of an InfoMax principle are (1) to increase redundancy in cortical areas and thereby provide fault tolerance, and (2) for each area to support a wide variety of tasks and thereby improve the ability of brain areas to guide many different behaviors. In contrast to InfoMax, the IB principle makes two testable predictions about cortical representations. Prediction 1: there exists a downstream area of cortex that has a minimal and sufficient representation to perform a task (i.e., I(X; Z) is minimal while preserving task information so that I(Z; Y) ≈ I(X; Y)). Prediction 2 (corollary if Prediction 1 is true): there exists an upstream area of cortex that has more task information than the minimal sufficient area.”
  
  Your review helped us realize we should have been clearer in explaining that these are the key predictions of the IB principle tested in our paper. We also realized we should be much clearer that these predictions aren’t trivial or expected, and there is an alternative hypothesis. We have re-written the introduction of our paper to highlight that the key prediction of the IB is minimal sufficient representations for the task, in contrast to the alternative hypothesis of InfoMax.
  
  A related problem is that the authors really only evoke the IB to explain the representation in PMd: Fig 2 shows that PMd is almost only showing decision information, and thus one can call this a minimal sufficient representation of the decision (although ignoring substantial condition independent activity).
  
  However, there is no IB prediction about what the representation of DLPFC should look like.
  
  Consequently, there is no IB prediction about how information should be distributed across DLPFC and PMd.
  
  We agree: the IB doesn’t tell us how information is distributed, only that there is a transformation that eventually makes PMd minimal. The fact that we find input information in DLPFC reflects that this computation occurs across areas, and is an empirical characterization of this IB in that DLPFC has direction, color and context information while PMd has primarily direction information. To be clear: only our empirical recordings verified that the DLPFC-PMd circuit is involved in the IB. As described above, if not, we would have recorded even further upstream to identify an inter-areal connection implicated in the IB.
  
  We updated the text to clearly state that the IB predicts that an upstream area’s activity should contain more information about the task inputs. We now explicitly describe this in the introduction, copy and pasted again here for convenience.
  
  “In contrast to InfoMax, the IB principle makes two testable predictions about cortical representations. Prediction 1: there exists a downstream area of cortex that has a minimal and sufficient representation to perform a task (i.e., I(X; Z) is minimal while preserving task information so that I(Z; Y) ≈ I(X; Y)). Prediction 2 (corollary if Prediction 1 is true): there exists an upstream area of cortex that has more task information than the minimal sufficient area.
  
  Consistent with the predictions of the IB principle, we found that DLPFC has information about the color, target configuration, and direction. In contrast, PMd had a minimal sufficient representation of the direction choice. Our recordings therefore identified a cortical IB. However, we emphasize the IB does not tell us where or how the minimal sufficient representation is formed. Instead, only our empirical results implicate DLPFC-PMd in an IB computation. Further, to propose a mechanism for how this IB is formed, we trained a multi-area RNN to perform this task.”
  
  The only way we knew DLPFC was not minimal was through our experiments. Please also note that the IB principle does not describe how information could be lost between areas or layers, whereas our RNN simulations show that this may occur through preferential propagation of task-relevant information with respect to the inter-area connections.
  
  (2) Now the authors could change their argument and state that what is really needed is an IB with the additional assumption that transformations go through a feedforward network. However, even in this case, I am not sure I understand the need for distributing information in this task. In fact, in both the data and the network model, there is a nice linear readout of the decision information in DLPFC (data) or area 1 (network model). Accordingly, the decision readout could occur at this stage already, and there is absolutely no need to tag on another area (PMd, area 2+3).
  
  Similarly, I noticed that the authors consider 2,3, and 4-area models, but they do not consider a 1-area model. It is not clear why the 1-area model is not considered. Given that e.g. Mante et al, 2013, manage to fit a 1-area model to a task of similar complexity, I would a priori assume that a 1-area RNN would do just as well in solving this task.
  
  While decision information could indeed be read out in Area 1 in our multi-area model, we were interested in understanding how the network converged to a PMd-like representation (minimal sufficient) for solving this task. Empirically, we only observed a match between our model representations and animal cortical representations during this task when considering multiple areas. Given that we empirically observed that our downstream area had a minimal sufficient representation, our multi-area model allowed us to examine how this minimal sufficient representation emerged (through preferential propagation of task-relevant information).
  
  We also analyzed single-area networks in our initial manuscript, though we could have highlighted these analyses more clearly to be sure they were not overlooked. We are clearer in this revision that we did consider a 1-area network (results in our Fig 5). While a single-area RNN can indeed solve this task, the single area model had all task information present in the representation, and did not match the representations in DLPFC or PMd. It would therefore not allow us to understand how the network converged to a PMd-like representation (minimal sufficient) for solving this task. We updated the schematic in Fig 5 to add in the single-area network (its absence from the schematic may have caused the confusion).
  
  We have added an additional paragraph commenting on this in the discussion. We also added an additional supplementary figure with the PCs of the single area RNN (Fig S15). We highlight that single area RNNs do not resemble PMd activity because they contain strong color and context information.
  
  In the discussion:
  
  “We also found it was possible to solve this task with single area RNNs, although they did not resemble PMd (Figure S15) since it did not form a minimal sufficient representation. Rather, for our RNN simulations, we found that the following components were sufficient to induce minimal sufficient representations: (1) RNNs with at least 3 areas, following Dale’s law (independent of the ratio of feedforward to feedback connections).”
  
  I think there are two more general problems with the authors' approach. First, transformations or hierarchical representations are usually evoked to get information into the right format in a pure feedforward network. An RNN can be seen as an infinitely deep feedforward network, so even a single RNN has, at least in theory, and in contrast to feedforward layers, the power to do arbitrarily complex transformations. Second, the information coming into the network here (color + target) is a classical xor-task. While this task cannot be solved by a perceptron (=single neuron), it is not that complex either, at least compared to, e.g., the task of distinguishing cats from dogs based on an incoming image in pixel format.
  
  An RNN can be viewed as an infinitely deep feedforward network in time. However, we wish to clarify two things. First, our task runs for a fixed amount of time, and therefore this RNN in practice is not infinitely deep in time. Second, if it were to perform an IB operation in time, we would expect to see color discriminability decrease as a function of time. Indeed, we considered this as a mechanism (recurrent attenuation, Figure 4a), but as we show in Supplementary Figure S9, we do not observe that discriminability decreases through time. Such a decrease would be equivalent to a dynamical mechanism that removes color through successive transformations in time, which our analyses reject (Fig 4). We therefore rule out that an IB is implemented through time via an RNN’s recurrent computation (viewed as feedforward in time). Rather, as we show, the IB comes primarily through inter-areal connections between RNN areas. We clarified in the Results that rejecting our recurrent-dynamics hypothesis is equivalent to rejecting the feedforward-in-time filtering hypothesis:
  
  “We first tested the hypothesis that the RNN IB is implemented primarily by recurrent dynamics (left side of Fig. 4a). These recurrent dynamics can be equivalently interpreted as the RNN implementing a feedforward neural network in time.”
  
  The reviewer is correct that the task is a classical XOR task and not as complex as e.g., computer vision classification. That said, our related work has looked at IBs for computer vision tasks and found them in deep feedforward networks (Kleinman et al., ICLR 2021). Even though the task is relatively straightforward, we believe it is appropriate for our conclusions because it does not have a trivial minimal sufficient representation: a minimal sufficient representation for XOR must contain only target, but not color or target configuration information. This can only be solved via a nonlinear computation. In this manner, we favor this task because it is relatively simple, and the minimal sufficient representations are interpretable, while at the same time not being so trivially simple (the minimal sufficient representations require nonlinearity to compute).
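  The claim that the XOR structure makes the minimal sufficient representation non-trivial can be illustrated with a small calculation. The sketch below assumes a binary color, a binary target configuration, a direction equal to their XOR, and equally likely trial types; this encoding and all names are illustrative assumptions, not the task code used in the study.

```python
import itertools
import math
from collections import Counter


def mutual_information(pairs):
    """I(A;B) in bits, treating the listed (a, b) pairs as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in joint.items()
    )


# Assumed encoding: each trial type is (color, target_configuration).
trials = list(itertools.product([0, 1], repeat=2))
direction = {x: x[0] ^ x[1] for x in trials}  # XOR rule

mi_color = mutual_information([(x[0], direction[x]) for x in trials])
mi_config = mutual_information([(x[1], direction[x]) for x in trials])
mi_joint = mutual_information([(x, direction[x]) for x in trials])
mi_identity = mutual_information([(x, x) for x in trials])

print("I(color; direction)       =", mi_color)
print("I(config; direction)      =", mi_config)
print("I((color, config); dir.)  =", mi_joint)
print("I(X; Z) for Z = X         =", mi_identity)
```

  In this toy setting neither input alone carries information about the direction, while the pair determines it fully, so a representation holding only the direction keeps less input information than a copy of the inputs while remaining sufficient for the task.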
  
  Finally, we want to note that this decision-making task is a logical and straightforward way to add complexity to classical animal decision-making tasks, where stimulus evidence and the behavioral report are frequently correlated. In tasks such as these, it may be challenging to untangle stimulus and behavioral variables, making it impossible to determine if an area like premotor cortex represents only behavior rather than stimulus. However, our task decorrelates both the stimulus and the behaviors.
  
  (3) I am convinced of the authors' argument that the RNN reproduces key features of the neural data. However, there are some points where the analysis should be improved.
  
  (a) It seems that dPCA was applied without regularization. Since dPCA can overfit the data, proper regularization is important, so that one can judge, e.g., whether the components of Fig.2g,h are significant, or whether the differences between DLPFC and PMd are significant.
  
  We note that the dPCA codebase optimizes the regularization hyperparameter through cross-validation and requires single-trial firing rates for all neurons, i.e., data matrices of the form (n_Neurons x Color x Choice x Time x n_Trials), which are unavailable for our data. We recognized that you are fundamentally asking whether differences are significant or not. We therefore believe it is possible to address this through a statistical test, described further below.
  
  In order to test whether the differences of variance explained by task variables between DLPFC and PMd are significant, we performed a shuffle test. For this test, we randomly sampled 500 units from the DLPFC dataset and 500 units from the PMd dataset. We then used dPCA to measure the variance explained by target configuration, color choice, and reach direction (e.g., Var^{True}_{DLPFC,Color}, Var^{True}_{PMd,Color}).
  
  To test if this variance was significant, we performed the following shuffle test. We combined the PMd and DLPFC dataset into a pool of 1000 units and then randomly selected 500 units from this pool to create a surrogate PMd dataset and used the remaining 500 units as a surrogate DLPFC dataset. We then again performed dPCA on these surrogate datasets and estimated the variance for the various task variables (e.g., Var_{ShuffledDLPFC,Color}, Var_{ShuffledPMd,Color}).
  
  We repeated this process 100 times and estimated a sampling distribution for the true difference in variance between DLPFC and PMd for various task variables (e.g., Var^{True}_{DLPFC,Color} - Var^{True}_{PMd,Color}). At the same time, we estimated the distribution of the variance difference between the surrogate DLPFC and PMd datasets for various task variables (e.g., Var_{ShuffledDLPFC,Color} - Var_{ShuffledPMd,Color}).
  
  We defined a p-value as the number of shuffles in which the difference in variance was higher than the median of the true difference and divided it by 100. Note, for resampling and shuffle tests with n shuffles/bootstraps, the lowest theoretical p-value is given as 2/n, even in the case that no shuffle was higher than the median of the true distribution. Thus, the differences were statistically significant (p < 0.02) for color and target configuration but not for direction (p=0.72). These results are reported in Figure S6 and show both the true sampling distribution and the shuffled sampling distributions.
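  The resampling procedure described above can be sketched in a few lines. This is a minimal illustration, not the analysis code: `variance_statistic` is a hypothetical stand-in for the dPCA variance explained by one task variable, the synthetic arrays replace the recorded units, and the 2/n floor on the p-value mentioned above is left to the caller.

```python
import numpy as np


def variance_statistic(units):
    """Hypothetical stand-in for dPCA: mean over units of the variance
    of each unit's condition-averaged response (units x conditions)."""
    return float(np.mean(np.var(units, axis=1)))


def shuffle_test(area_a, area_b, statistic, n_units=500, n_repeats=100, seed=0):
    """Compare true and label-shuffled differences in a statistic between two areas."""
    rng = np.random.default_rng(seed)
    true_diffs = np.empty(n_repeats)
    shuffled_diffs = np.empty(n_repeats)
    for i in range(n_repeats):
        # Subsample the same number of units from each area.
        a = area_a[rng.choice(len(area_a), n_units, replace=False)]
        b = area_b[rng.choice(len(area_b), n_units, replace=False)]
        true_diffs[i] = statistic(a) - statistic(b)
        # Pool the units and split them at random into two surrogate areas.
        pool = np.concatenate([a, b])
        perm = rng.permutation(len(pool))
        surrogate_a, surrogate_b = pool[perm[:n_units]], pool[perm[n_units:]]
        shuffled_diffs[i] = statistic(surrogate_a) - statistic(surrogate_b)
    # Fraction of shuffles exceeding the median of the true differences.
    p_value = float(np.mean(shuffled_diffs > np.median(true_diffs)))
    return p_value, true_diffs, shuffled_diffs


# Synthetic example: area A is more strongly modulated across conditions than area B.
rng = np.random.default_rng(1)
area_a = rng.normal(0.0, 3.0, size=(600, 4))
area_b = rng.normal(0.0, 1.0, size=(600, 4))
p_value, true_diffs, shuffled_diffs = shuffle_test(area_a, area_b, variance_statistic)
print("p-value:", p_value)
```

  Swapping `variance_statistic` for a function that runs dPCA and returns the variance attributed to a given task variable would give the comparison described in this response, assuming trial-averaged data of the appropriate shape.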
  
  (b) I would have assumed that the analyses performed on the neural data were identical to the ones performed on the RNN data. However, it looked to me like that was not the case. For instance, dPCA of the neural data is done by restretching randomly timed trials to a median trial. It seemed that this restretching was not performed on the RNN. Maybe that is just an oversight, but it should be clarified. Moreover, the decoding analyses used SVC for the neural data, but a neural-net-based approach for the RNN data. Why the differences?
  
  Thanks for bringing up these points. We want to clarify that we did include SVM decoding for the multi-area network in the appendix (Fig. S4), and the conclusions are the same. Moreover, in previous work, we also found that training with a linear decoder led to analogous conclusions (Fig. 11 of Kleinman et al, NeurIPS 2021). As we had a larger number of trials for the RNN than for the monkey, we wanted to allow a more expressive decoder for the RNN, though this choice does not affect our conclusions. We clarified the text to reflect that we did use an SVM decoder.
  
  “We also found analogous conclusions when using an SVM decoder (Fig. S4).”
  
  dPCA analysis requires trials of equal length. For the RNN, this is straightforward to generate because we can set the delay lengths to be equal during inference (although the RNN was trained on various length trials and can perform various length trials). Animals must have varying delay periods, or else they will learn the timing of the task and anticipate epoch changes. Because animal trial lengths were therefore different, their trials had to be restretched. We clarified this in the Methods.
  
  “For analyses of the RNN, we fixed the timing of trials, obviating the need to restretch trial lengths. Note that while at inference, we generated RNN trials with equal length, the RNN was trained with varying delay periods.”
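
  To illustrate what restretching variable-length trials to a common length involves, here is a minimal sketch. It assumes simple linear interpolation of each unit's activity onto the median trial length; the exact restretching procedure is described in the Methods and may differ, and the function names are hypothetical.

```python
import numpy as np

def restretch_trial(trial, target_len):
    """Linearly resample one trial of shape (time, units) to target_len bins."""
    trial = np.asarray(trial, dtype=float)
    src = np.linspace(0.0, 1.0, trial.shape[0])   # original normalized time
    dst = np.linspace(0.0, 1.0, target_len)       # restretched normalized time
    # Interpolate each unit's time course independently.
    return np.column_stack(
        [np.interp(dst, src, trial[:, j]) for j in range(trial.shape[1])]
    )

def restretch_to_median(trials):
    """Restretch a list of (time, units) arrays to the median trial length."""
    target_len = int(np.median([t.shape[0] for t in trials]))
    return np.stack([restretch_trial(t, target_len) for t in trials])
```

  The result is a single (trials, time, units) array of equal-length trials, which is the input shape that dPCA requires.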
  
  (4) The RNN seems to fit the data quite nicely, so that is interesting. At the same time, the fit seems somewhat serendipitous, or at least, I did not get a good sense of what was needed to make the RNN fit the data. The authors did go to great lengths to fit various network models and turn several knobs on the fit. However, at least to me, there are a few (obvious) knobs that were not tested.
  
  First, as already mentioned above, why not try to fit a single-area model? I would expect that a single area model could also learn the task - after all, that is what Mante et al did in their 2013 paper and the authors' task does not seem any more complex than the task by Mante and colleagues.
  
  Thank you for bringing up this point. As mentioned in response to your prior point, we did analyze a single-area RNN (Fig. 5d). We updated the schematic to clarify that we analyzed a single area network. Moreover, we also added a supplementary figure to qualitatively visualize the PCs of the single area network (Fig. S15). While a single area network can solve the task, it does not allow us to study how representations change across areas, nor did it empirically resemble our neural recordings. Single-area networks contain significant color, context, and direction information. They therefore do not form minimal representations and do not resemble PMd activity.
  
  Second, I noticed that the networks fitted are always feedforward-dominated. What happens when feedforward and feedback connections are on an equal footing? Do we still find that only the decision information propagates to the next area? Quite generally, when it comes to attenuating information that is fed into the network (e.g. color), then that is much easier done through feedforward connections (where it can be done in a single pass, through proper alignment or misalignment of the feedforward synapses) than through recurrent connections (where you need to actively cancel the incoming information). So it seems to me that the reason the attenuation occurs in the inter-area connections could simply be because the odds are a priori stacked against recurrent connections. In the real brain, of course, there is no clear evidence that feedforward connections dominate over feedback connections anatomically.
  
  We want to clarify that we did pick feedforward and feedback connections based on the following macaque atlas, reference 27 in our manuscript:
  
  Markov, N. T., Ercsey-Ravasz, M. M., Ribeiro Gomes, A. R., Lamy, C., Magrou, L., Vezoli, J., Misery, P., Falchier, A., Quilodran, R., Gariel, M. A., Sallet, J., Gamanut, R., Huissoud, C., Clavagnier, S., Giroud, P., Sappey-Marinier, D., Barone, P., Dehay, C., Toroczkai, Z., … Kennedy, H. (2014). A weighted and directed interareal connectivity matrix for macaque cerebral cortex. Cerebral Cortex , 24(1), 17–36.
  
  We therefore believe there is evidence for more feedforward than feedback connections. Nevertheless, as stated in response to your next point below, we ran a simulation where feedback and feedforward connectivity were matched.
  
  More generally, it would be useful to clarify what exactly is sufficient:
  
  (a) the information distribution occurs in any RNN, i.e., also in one-area RNNs
  
  (b) the information distribution occurs when there are several, sparsely connected areas
  
  (c) the information distribution occurs when there are feedforward-dominated connections between areas
  
  We better clarify what exactly is sufficient.
  
  - We trained single-area RNNs and found that these RNNs contained color information; additionally two area RNNs also contained color information in the last area (Fig 5d).
  
  - We indeed found that the minimal sufficient representations emerged when we had several areas, with Dale’s law constraint on the connectivity. When we had even sparser connections, without Dale’s law, there was significantly more color information, even at 1% feedforward connections; Fig 5a.
  
  - When we matched the percentage of feedforward and feedback connections with Dale’s law constraint on the connectivity (10% feedforward and 10% feedback), we also observed minimal sufficient representations (Fig S9).
  
  Together, we found that minimal sufficient representations emerged when we had several areas (3 or greater), with Dale’s law constraint on the connectivity, independent of the ratio of feedforward/feedback connections. We thank the reviewer for raising this point about the space of constraints leading to minimal sufficient representations in the late area. We clarified this in the Discussion.
  
  “We also found it was possible to solve this task with single area RNNs, although they did not resemble PMd (Figure S15) since they did not form a minimal sufficient representation. Rather, for our RNN simulations, we found that the following components were sufficient to induce minimal sufficient representations: RNNs with at least 3 areas, following Dale’s law (independent of the ratio of feedforward to feedback connections).”
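
  As an illustration of the two architectural constraints discussed here (Dale’s law and sparse connectivity), the sketch below builds a random connectivity matrix that satisfies both. It is not our training code: the function name, the column-wise sign convention (`W[i, j]` is the weight from unit `j` to unit `i`), and the example fraction of excitatory units are assumptions made for illustration.

```python
import numpy as np

def dale_weights(n_units, frac_exc, density, rng):
    """Random sparse weight matrix obeying Dale's law.

    Each presynaptic unit (column) is either excitatory (all outgoing
    weights >= 0) or inhibitory (all outgoing weights <= 0).
    density is the probability that any given connection exists.
    """
    # Non-negative magnitudes for every potential connection.
    magnitudes = np.abs(rng.normal(size=(n_units, n_units)))

    # First n_exc units are excitatory, the remainder inhibitory.
    n_exc = int(round(frac_exc * n_units))
    signs = np.ones(n_units)
    signs[n_exc:] = -1.0

    # Sparse binary mask, e.g. density=0.1 for 10% connectivity.
    mask = rng.random((n_units, n_units)) < density

    # Broadcast the sign of each presynaptic unit across its column.
    return magnitudes * mask * signs[np.newaxis, :]

rng = np.random.default_rng(0)
W = dale_weights(n_units=50, frac_exc=0.8, density=0.1, rng=rng)
```

  During training, the same sign pattern and mask would have to be re-imposed after each update for the constraint to hold throughout.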
  
  Thank you for your helpful and constructive comments!
  
  Reviewer #2 (Public Review):
  
  Kleinman and colleagues conducted an analysis of two datasets, one recorded from DLPFC in one monkey and the other from PMD in two monkeys. They also performed similar analyses on trained RNNs with various architectures.
  
  The study revealed four main findings. (1) All task variables (color coherence, target configuration, and choice direction) were found to be encoded in DLPFC. (2) PMD, an area downstream of PFC, only encoded choice direction. (3) These empirical findings align with the celebrated 'information bottleneck principle,' which suggests that FF networks progressively filter out task-irrelevant information. (4) Moreover, similar results were observed in RNNs with three modules.
  
  We thank the reviewer for their comments, feedback and suggestions, which we address below.
  
  While the analyses supporting results 1 and 2 were convincing and robust, I have some concerns and recommendations regarding findings 3 and 4, which I will elaborate on below. It is important to note that findings 2 and 4 had already been reported in a previous publication by the same authors (ref. 43).
  
  Note the NeurIPS paper only had PMd data and did not contain any DLPFC data. That manuscript made predictions about representations and dynamics upstream of PMd, and subsequent experiments reported in this manuscript validated these predictions. Importantly, this manuscript observes an information bottleneck between DLPFC and PMd.
  
  Major recommendation/comments:
  
  The interpretation of the empirical findings regarding the communication subspace in relation to the information bottleneck theory is very interesting and novel. However, it may be a stretch to apply this interpretation directly to PFC-PMd, as was done with early vs. late areas of a FF neural network.
  
  In the RNN simulations, the main finding indicates that a network with three or more modules lacks information about the stimulus in the third or subsequent modules. The authors draw a direct analogy between monkey PFC and PMd and Modules 1 and 3 of the RNNs, respectively. However, considering the model's architecture, it seems more appropriate to map Area 1 to regions upstream of PFC, such as the visual cortex, since Area 1 receives visual stimuli. Moreover, both PFC and PMd are deep within the brain hierarchy, suggesting a more natural mapping to later areas. This contradicts the CCA analysis in Figure 3e. It is recommended to either remap the areas or provide further support for the current mapping choice.
  
  We updated the Introduction to better clarify the predictions of the information bottleneck (IB) principle. In particular, the IB principle predicts that later areas should have minimal sufficient representations of task information, whereas upstream areas should have more information. In PMd, we observed a minimal sufficient representation of task information during the decision-making task. In DLPFC, we observed more task information, particularly more information about the target colors and the target configuration.
  
  In terms of the exact map between areas, we do not believe or intend to claim the DLPFC is the first area implicated in the sensorimotor transformation during our perceptual decision-making task. Rather, DLPFC best matches Area 1 of our model. It is important to note that we abstracted our task so that the first area of our model received checkerboard coherence and target configuration as input (and hence did not need to transform task visual inputs). Indeed, in Figure 1d we hypothesize that the early visual areas should contain additional information, which we do not model directly in this work. Future work could model RNNs to take in an image or video input of the task stimulus. In this case, it would be interesting to assess if earlier areas resemble visual cortical areas. We updated the results, where we first present the RNN, to state the inputs explicitly and be clear the inputs are not images or videos of the checkerboard task.
  
  “The RNN input was 4D representing the target configuration and checkerboard signed coherence, while the RNN output was 2D, representing decision variables for a left and right reach (see Methods).”
  
  Another reason that we mapped Area 1 to DLPFC is because anatomical, physiological and lesion studies suggest that DLPFC receives inputs from both the dorsal and ventral stream (Romanski et al., 2007; Hoshi et al., 2006; Wilson et al., 1993). The dorsal stream originates from the occipital lobe, passes through the posterior parietal cortex, to DLPFC, which carries visuospatial information of the object. The ventral stream originates from the occipital lobe, passes through the inferior temporal cortex, ventrolateral prefrontal cortex to DLPFC, which encodes the identity of the object, including color and texture. In our RNN simulation, Area 1 receives processed inputs of the task: target configuration and the evidence for each color in the checkerboard. Target configuration contains information of the spatial location of the targets, which represents the inputs from the dorsal stream, while evidence for each color by analogy is the input from the ventral stream. Purely visual areas would not fit this dual input from both the dorsal and ventral stream. A potential alternative candidate would be the parietal cortex which is largely part of the dorsal stream and is thought to have modest color inputs (although there is some shape and color selectivity in areas such as LIP, e.g., work from Sereno et al.). On balance given the strong inputs from both the dorsal and ventral stream, we believe Area 1 maps better onto DLPFC than earlier visual areas.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) Line 35/36: Please specify the type of nuisance that the representation is robust to. I guess this refers to small changes in the inputs, not to changes in the representation itself.
  
  Indeed it refers to input variability unrelated to the task. We clarified the text.
  
  (2) For reference, it would be nice to have a tick for the event "Targets on" in Fig.2c.
  
  In this plot, the PSTHs are aligned to the checkerboard onset. Because there is a variable time between target and checkerboard onset, there is a trial-by-trial difference of when the target was turned on, so there is no single place on the x-axis where we could place a “Targets on” tick. In response to this point, we generated a plot with both targets on and check on alignment, with a break in the middle, shown in Supplementary Figure S5.
  
  (3) It would strengthen the comparison between neural data and RNN if the DPCA components of the RNN areas were shown, as they are shown in Fig.2g,h for the neural data.
  
  We include the PSTHs plotted onto the dPCA components here for Area 1 of the exemplar network. Dashed lines indicate a left reach, while solid lines indicate a right reach, and the color corresponds to the color of the selected target. As expected, we find that the dPCA components capture the separation between components. We emphasize that the trajectory paths along the decoder axes are not particularly meaningful to interpret, except to demonstrate whether variables can be decoded or not (as in Fig 2g,h, comparing DLPFC and PMd). The decoder axes of dPCA are not constrained in any way, in contrast to the readout (encoder) axis (see Methods). This is why our manuscript focuses on analyzing the readout axes. However, if the reviewer strongly prefers these plots to be put in the manuscript, we will add them.
  
  Author response image 1.
  
  (4) The session-by-session decode analysis presented in Fig.2i suggests that DLPFC has mostly direction information while in Area 1 target information is on top, as suggested by Fig.3g. An additional decoding analysis on trial averaged neural data, i.e. a figure for neural data analogous to Fig.3g,h, would allow for a more straightforward and direct comparison between RNN and neural data.
  
  We first clarify that we did not decode trial-averaged neural data for either recorded neural data or RNNs. In Fig 3g, h (for the RNN) all decoding was performed on single trial data and then averaged. We have revised the main manuscript to make this clear. The mean accuracies we reported for DLPFC and PMd in the text are therefore computed in the same way as the mean accuracies presented in Fig 3g, h. We believe this likely addresses your concern: i.e., the mean decode accuracies presented for both neural data and the RNN were computed the same way.
  
  If the above paragraph did not address your concern, we also wish to be clear that we presented the neural data as histograms, rather than a mean with standard error, because we found that accuracies were highly variable depending on electrode insertion location. For example, some insertions in DLPFC achieved chance-levels of decoding performance for color and target configuration. For this reason, we prefer to keep the histogram as it shows more information than reporting the mean, which we report in the main text. However, if the reviewer strongly prefers us to make a bar plot of these means, we will add them.
  
  (5) Line 129 mentions an analysis of single trials. But in Fig.2i,j sessions are analyzed. Please clarify.
  
  For each session, we decode from single trials and then average these decoding accuracies, leading to a per-session average decoding accuracy. Note that for each session, we record from different neurons. In the text, we also report the average over the sessions. We clarified this in the text and Methods.
  
  (6) Fig.4c,f show how color and direction axes align with the potent subspaces. We assume that the target axis was omitted here because it highly aligns with the color axis, yet we note that this was not pointed out explicitly.
  
  You are correct, and we revised the text to point this out explicitly.
  
  “We quantified how the color and direction axis were aligned with these potent and null spaces of the intra-areal recurrent dynamics matrix of Area 1 ($W^1_{rec}$). We did not include the target configuration axis for simplicity, since it highly aligns with the color axis for this network.”
  
  (7) The caption of Fig.4c reads: "Projections onto the potent space of the intra-areal dynamics for each area." Yet, they only show area 1 in Fig.4c, and the rest in a supplement figure. Please refer properly.
  
  Thank you for pointing this out. We updated the text to reference the supplementary figure.
  
  (8) Line 300: "We found the direction axis was more aligned with the potent space and the color axis was more aligned with the null space." They rather show that the color axis is as aligned to the potent space as a random vector, but nothing about the alignments with the null space. Contrarily, on line 379 they write "...with the important difference that color information isn't preferentially projected to a nullspace...". Please clarify.
  
  Thank you for pointing this out. We clarified the text to read: “We found the direction axis was more aligned with the potent space”. The text then describes that the color axis is aligned like a random vector: “In contrast, the color axis was aligned to a random vector.”
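
  The comparison against a random vector can be sketched as below. This is an illustrative example only: it assumes the potent space is taken to be the span of the top-k right singular vectors of the recurrent matrix (the precise definition is given in the Methods and may differ), and the function names are hypothetical.

```python
import numpy as np

def potent_alignment(W, axis, k):
    """Fraction of the squared norm of `axis` that lies in the span of
    the top-k right singular vectors of W (assumed potent space)."""
    _, _, Vt = np.linalg.svd(W)
    potent = Vt[:k]                       # (k, n), orthonormal rows
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)    # unit-normalize the axis
    return float(np.sum((potent @ axis) ** 2))

def random_baseline(W, k, n_samples, rng):
    """Mean alignment of random Gaussian directions with the same space."""
    n = W.shape[1]
    vals = [potent_alignment(W, rng.normal(size=n), k) for _ in range(n_samples)]
    return float(np.mean(vals))
```

  Under this definition a random direction in an n-dimensional space has an expected alignment of k/n, so an axis whose alignment is close to that value is aligned no better than chance, while a value near 1 indicates an axis lying almost entirely in the potent space.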
  
  (9) Line 313: 'unconstrained' networks are mentioned. What constraints are implied there, Dale's law? Please define and clarify.
  
  Indeed, the constraint refers to Dale’s law constraints. We clarified the text: “Further, we found that W<sub>21</sub> in unconstrained 3 area networks (i.e., without Dale's law constraints) had significantly reduced…”
  
  (10) Line 355 mentions a 'feedforward bottleneck'. What does this exactly mean? No E-I feedforward connections, or...? Please define and clarify.
  
  This refers to sparser connections between areas than within an area, as well as a smaller fraction of E-I connections. We clarified the text to read:
  
  “Together, these results suggest that a connection bottleneck in the form of neurophysiological architecture constraints (i.e., sparser connections between areas than within an area, as well as a smaller fraction of E-I connections) was the key design choice leading to RNNs with minimal color representations and consistent with the information bottleneck principle.”
  
  (11) Fig.5c is supposedly without feedforward connections, but it looks like the plot depicts these connections (i.e. identical to Fig.5b).
  
  In Figure 5, we are varying the E to I connectivity in panel B, and the E-E connectivity in panel C. We vary the feedback connections in Supp Fig. S12. We updated the caption accordingly.
  
  (12) For reference, it would be nice to have the parameters of the exemplar network indicated in the panels of Fig.5.
  
  We updated the caption to reference the parameter configuration in Table 1 of the Appendix.
  
  (13) Line 659: incomplete sentence
  
  Thank you for pointing this out. We removed this incomplete sentence.
  
  (14) In the methods section "Decoding and Mutual information for RNNs" a linear neural net decoder as well as a nonlinear neural net decoder are described, yet it was unclear which one was used in the end.
  
  We used the nonlinear network, and clarified the text accordingly. We obtained consistent conclusions using a linear network, but did not include these results in the text. (These are reported in Fig. 11 of Kleinman et al, 2021). Moreover, we also obtain consistent results by using an SVM decoder in Fig. S4 for our exemplar parameter configuration.
  
  (15) In the discussion, the paragraph starting from line 410 introduces a new set of results along with the benefits of minimal representations. This should go to the results section.
  
  We prefer to leave this as a discussion, since the task was potentially too simplistic to generate a clear conclusion on this matter. We believe this remains a discussion point for further investigation.
  
  (16) Fig S5: hard to parse. Show some arrows for trajectories (a) (d) is pretty mysterious: where do I see the slow dynamics?
  
  Slow points are denoted by crosses, which form an approximate line attractor. We clarified this in the caption.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor recommendations (not ordered by importance)
  
  (1) Be more explicit that the recordings come from different monkeys and are not simultaneously recorded. For instance, say 'recordings from PFC or PMD'. Say early on that PMD recordings come from two monkeys and that PFC recordings come from 1 of those monkeys. Furthermore, I would highlight which datasets are novel and which are not. For instance, I believe the PFC dataset is a previously unpublished dataset and should be highlighted as such.
  
  We added: “The PMd data was previously described in a study by Chandrasekaran and colleagues” to the main text which clarifies that the PMd data was previously recorded and has been analyzed in other studies.
  
  (2) I personally feel that talking about 'optimal', as is done in the abstract, is a bit of a stretch for this simple task.
  
  In using the terminology “optimal,” we are following the convention of IB literature that optimal representations are sufficient and minimal. The term “optimal” therefore is task-specific; every task will have its own optimal representation. We clarify in the text that this definition comes from Machine Learning and Information Theory, stating:
  
  “The IB principle defines an optimal representation as a representation that is minimal and sufficient for a task or set of tasks.”
  
  In this way, we take an information-theoretic view for describing multi-area representations. This view was satisfactory for explaining and reconciling the multi-area recordings and simulations for this task, and we think it is helpful to provide a normative perspective for explaining the differences in cortical representations by brain area. Even though the task is simple, it still allows us to study how sensory/perceptual information is represented, as well as how choice-related information is being represented.
  
  (3) It is mentioned (and even highlighted) in the abstract that we don't know why the brain distributes computations. I agree with that statement, but I don't think this manuscript answers that question. Relatedly, the introduction mentions robustness as one reason why the brain would distribute computations, but then raises the question of whether there is 'also a computational benefit for distributing computations across multiple areas'. Isn't the latter (robustness) a clear 'computational benefit'?
  
  We decided to keep the word “why” in the abstract, because this is a generally true statement (it is unclear why the brain distributes computation) that we wish to convey succinctly, pointing to the importance of studying this relatively grand question (which could only be fully answered by many studies over decades). We consider this the setting of our work. However, to avoid confusion that we are trying to give a full answer to this question, we are now more precise in the first paragraph of our introduction as to the particular questions we ask that will take a step towards this question. In particular, the first paragraph now asks these questions, which we answer in our study.
  
  “For example, is all stimuli and decision-related information present in all brain areas, or do the cortical representations differ depending on their processing stage? If the representations differ, are there general principles that can explain why the cortical representations differ by brain area?”
  
  We also removed the language on robustness, as we agree it was confusing. Thank you for these suggestions.
  
  (4) Figure 2e and Fig. 3d, left, do not look very similar. I suggest zooming in or rotating Figure 2 to highlight the similarities. Consider generating a baseline CCA correlation using some sort of data shuffle to highlight the differences.
  
  The main point of the trajectories is to demonstrate that both Area 1 and DLPFC represent both color and direction. We now clarify this in the manuscript. However, we do not intend for these two plots to be a rigorous comparison of similarity. Rather, we quantify similarity using CCA and our decoding analysis. We also better emphasize the relative values of the CCA, rather than the absolute values.
  
  (5) Line 152: 'For this analysis, we restricted it to sessions with significant decode accuracy with a session considered to have a significant decodability for a variable if the true accuracy was above the 99th percentile of the shuffled accuracy for a session.' Why? Sounds fishy, especially if one is building a case on 'non-decodability'. I would either not do it or better justify it.
  
  The reason to choose only sessions with significant decoding accuracy is that we consider those sessions to be the sessions containing information of task variables. In response to this comment, we also now generate a plot with all recording sessions in Supplementary Figure S7. We modified the manuscript accordingly.
  
  “For this analysis, we restricted it to sessions with significant decode accuracy with a session considered to have a significant decodability for a variable if the true accuracy was above the 99th percentile of the shuffled accuracy for a session. This is because these sessions contain information about task variables. However, we also present the same analyses using all sessions in Fig. S7.”
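
  The session-selection criterion quoted above can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes the decoding accuracies on true and label-shuffled data have already been computed for each session.

```python
import numpy as np

def significant_sessions(true_acc, shuffled_acc, percentile=99.0):
    """Flag sessions whose true decoding accuracy exceeds the given
    percentile of that session's shuffled-label accuracies.

    true_acc     : array of shape (sessions,)
    shuffled_acc : array of shape (sessions, shuffles)
    """
    true_acc = np.asarray(true_acc, dtype=float)
    shuffled_acc = np.asarray(shuffled_acc, dtype=float)

    # One threshold per session, taken from its own shuffle distribution.
    thresholds = np.percentile(shuffled_acc, percentile, axis=1)
    return true_acc > thresholds
```

  The returned boolean mask selects the sessions used in the restricted analysis; passing every session instead corresponds to the analysis in Fig. S7.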
  
  (6) Line 232: 'The RNN therefore models many aspects of our physiological data and is therefore'. Many seems a stretch?
  
  We changed “many” to “key.”
  
  (7) The illustration in Fig. 4B is very hard to understand, I recommend removing it.
  
  We are unsure what this refers to, as Figure 4B represents data of axis overlaps and is not an illustration.
  
  (8) At some point the authors use IB instead of information bottleneck (eg line 288), I would not do it.
  
  We now clearly write that IB is an abbreviation of Information Bottleneck the first time it is introduced.
  
  (9) Fig. 5 caption is insufficient to understand it. Text in the main document does not help. I would move most part of this figure, or at least F, to supplementary. Instead, I would move the results in S11 and S10 to the main document.
  
  We clarified the caption to summarize the key points. It now reads:
  
  “Overall, neurophysiological architecture constraints in the form of multiple areas, sparser connections between areas than within an area, as well as a smaller fraction of E-I connections lead to a minimal color representation in the last area.”
  
  (10) Line 355: 'Together, these results suggest that a connection bottleneck in the form of neurophysiological architecture constraints was the key design choice leading to RNNs with minimal color representations and consistent with the information bottleneck principle.' The authors show convincingly that increased sparsity leads to the removal of irrelevant information. There is an alternative model of the communication subspace hypothesis that uses low-rank matrices, instead of sparse, to implement said bottlenecks (https://www.biorxiv.org/content/10.1101/2022.07.21.500962v2)
  
  We thank the reviewer for pointing us to this very nice paper. Indeed, a low-rank connectivity matrix is another mechanism to limit the amount of information that is passed to subsequent areas. In fact, the low-rank matrix forms a hard version of our observations, as we found that task-relevant information was preferentially propagated along the top singular mode of the inter-areal connectivity matrix. In our paper, we observed that this tendency naturally emerges through training with neurophysiological architecture constraints. In that paper, the multi-area RNN was hand-engineered, whereas our network is trained. We added this reference to our discussion.
  
  Thank you for your helpful and constructive comments.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.12.548742v2
www.biorxiv.org www.biorxiv.org

Microbes with higher metabolic independence are enriched in human gut microbiomes under stress

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Response to Public Reviewer Comments:
  
  Reviewer 1:
  
  In this work, Veseli et al. present a computational framework to infer the functional diversity of microbiomes in relation to microbial diversity directly from metagenomic data. The framework reconstructs metabolic modules from metagenomes and calculates the per-population copy number of each module, resulting in the proportion of microbes in the sample carrying certain genes. They applied this framework to a dataset of gut microbiomes from 109 inflammatory bowel disease (IBD) patients, 78 patients with other gastrointestinal conditions, and 229 healthy controls. They found that the microbiomes of IBD patients were enriched in a high fraction of metabolic pathways, including biosynthesis pathways such as those for amino acids, vitamins, nucleotides, and lipids. Hence, they had higher metabolic independence compared with healthy controls. To an extent, the authors also found a pathway enrichment suggesting higher metabolic independence in patients with gastrointestinal conditions other than IBD indicating this could be a signal for a general loss in host health. Finally, a machine learning classifier using high metabolic independence in microbiomes could predict IBD with good accuracy. Overall, this is an interesting and well-written article and presents a novel workflow that enables a comprehensive characterization of microbiome cohorts.
  
  We thank the reviewer for their interest in our study, their summary of its findings, and their kind words about the manuscript quality.
  
  Reviewer 2:
  
  This study builds upon the team's recent discovery that antibiotic treatment and other disturbances favour the persistence of bacteria with genomes that encode complete modules for the synthesis of essential metabolites (Watson et al. 2023). Veseli and collaborators now provide an in-depth analysis of metabolic pathway completeness within microbiomes, finding strong evidence for an enrichment of bacteria with high metabolic independence in the microbiomes associated with IBD and other gastrointestinal disorders. Importantly, this study provides new open-source software to facilitate the reconstruction of metabolic pathways, estimate their completeness and normalize their results according to species diversity. Finally, this study also shows that the metabolic independence of microbial communities can be used as a marker of dysbiosis. The function-based health index proposed here is more robust to individuals' lifestyles and geographic origin than previously proposed methods based on bacterial taxonomy.
  
  The implications of this study have the potential to spur a paradigm shift in the field. It shows that certain bacterial taxa that have been consistently associated with disease might not be harmful to their host as previously thought. These bacteria seem to be the only species that are able to survive in a stressed gut environment. They might even be important to rebuild a healthy microbiome (although the authors are careful not to make this speculation).
  
  This paper provides an in-depth discussion of the results, and limitations are clearly addressed throughout the manuscript. Some of the potential limitations relate to the use of large publicly available datasets, where sample processing and the definition of healthy status varies between studies. The authors have recognised these issues and their results were robust to analyses performed on a per-cohort basis. These potential limitations, therefore, are unlikely to have affected the conclusions of this study.
  
  Overall, this manuscript is a magnificent contribution to the field, likely to inspire many other studies to come.
  
  We thank the reviewer for their endorsement of our study and their precision regarding the evaluation of its strengths. We also appreciate their high expectations for its impact in the field.
  
  Reviewer 3:
  
  The major strength of this manuscript is the "anvi-estimate-metabolism" tool, which is already accessible online, extensively documented, and potentially broadly useful to microbial ecologists.
  
  We thank the reviewer for their recognition of the computational advances in this study. We also thank the reviewer for their suggestions that we have addressed below, which allowed us to strengthen our manuscript.
  
  However, the context for this tool and its validation is lacking in the current version of the manuscript. It is unclear whether similar tools exist; if so, it would help to benchmark this new tool against prior methods.
  
  The reviewer brings up a very good point about the lack of context for the `anvi-estimate-metabolism` program. While the work that led to this software included detailed benchmarking, a formal assessment of its performance and accuracy was indeed lacking. We are thankful to the reviewer for pointing this out, which motivated us to perform additional analyses to address such concerns. Our revision contains a new, 34-page supplementary information file (Supplementary File 2) that includes a section titled “Comparison of anvi-estimate-metabolism to existing tools for metabolism reconstruction”. The text therein describes the landscape of currently available software for metabolism reconstruction and the features that make `anvi-estimate-metabolism` unique – namely, (1) its implementation of metrics that make it suitable for metagenome-level analyses (i.e., pathway copy number and stepwise interpretation of pathway definitions) and (2) its ability to process user-defined metabolic pathways rather than exclusively relying on KEGG. As described in that section, there is currently no other tool that can compute copy numbers of metabolic pathways from metagenomic data. Hence, it is not possible to benchmark the copy number methodology used in our study against prior methods; however, our benchmarking of this functionality with synthetic genomes and metagenomes (described later in this document) does provide the necessary quantitative insights into its accuracy and efficiency.
  
  While comparison of the copy number calculations to other tools was not possible due to the unique nature of this functionality, it was possible to benchmark our gene function annotation methodology against existing tools that also annotate genes with KEGG KOfams, which is a step commonly used by various tools that aim to estimate metabolic potential in genomes and metagenomes. In the anvi’o software ecosystem the annotation of genes for metabolic reconstruction is implemented in `anvi-run-kegg-kofams`, and represents a step that is required by `anvi-estimate-metabolism`. As our comparisons were quite extensive and involved additional researchers, we described them in another study which we titled “Adaptive adjustment of significance thresholds produces large gains in microbial gene annotations and metabolic insights” (doi:10.1101/2024.07.03.601779) that is now cited from within our revision in the appropriate context. Briefly, our comparison of anvi’o, Kofamscan, and MicrobeAnnotator using 396 publicly-available bacterial genomes from 11 families demonstrated that `anvi-run-kegg-kofams` is able to identify an average of 12.8% more KO annotations per genome than the other tools, especially in families commonly found in the gut environment (Figure 1). Furthermore, anvi’o recovered the highest proportion of annotations that were independently validated using eggNOG-mapper. Our comparisons also showed that annotations from anvi’o yield at least 11.6% more complete metabolic modules than Kofamscan or MicrobeAnnotator, including the identification of butyrate biosynthesis in Lachnospiraceae genomes at rates similar to manual identification of this pathway in this clade (Figure 2a). Overall, our findings that are now described extensively in DOI:10.1101/2024.07.03.601779 show that our method captures high-quality annotations for accurate downstream metabolism estimates.
  
  We hope these new data help increase the reviewer’s confidence in our results.
  
  Simulated datasets could be used to validate the approach and test its robustness to different levels of bacterial richness, genome sizes, and annotation level.
  
  We thank the reviewer for this suggestion. It was an extremely useful exercise that not only helped us elucidate the nuances of our approach, but also enabled us to further highlight its strengths in our manuscript. We created simulated datasets including a total of 409 synthetic metagenomes that we used to test the robustness of our approach to different genome sizes, community sizes, and levels of diversity. Overall, our tests with these synthetic metagenomes demonstrated that our approach of computing PPCN values to summarize the metabolic capacity within a metagenomic community is accurate and robust to differences in all three critical variables. Most of these variables were only weakly correlated with PPCN or PPCN accuracy, and the few correlations that were stronger in fact further supported our original hypothesis that we generated from our comparisons of healthy and IBD gut metagenomes. The methods and results of our validation efforts are explained in detail in our new Supplementary File 2 (see the section titled “Validation of per-population copy number (PPCN) approach on simulated metagenomic data”), but we copy here the subsection that summarizes our findings for the reviewer’s convenience:
  
  Overall impact on the comparison between healthy and IBD gut metagenomes
  
  “In summary, our validation strategy revealed good accuracy at estimating metagenome-level metabolic capacity relative to our genome-level knowledge in the simulated data. While it often underestimated average genomic completeness by ignoring partial copies of metabolic pathways and often overestimated average genomic copy number due to the effect of pathway complementarity between different community members, the magnitude of error was overall limited in range and the error distributions were centered at or near 0. Furthermore, we observed these broad error trends in all cases we tested, and therefore we expect that they would also apply to both sample groups in our comparative analysis. Thus, we next considered how the PPCN approach might have influenced our analyses that considered metagenomes from healthy individuals and from those who have IBD – two groups that differed from one another with respect to some of the variables considered in our tests.
  
  Most of the correlations between PPCN or PPCN accuracy and sample parameters were weak, yet significant (Table 1). They showed that community size and diversity level have limited influence on the PPCN calculation, while genome size does not influence its accuracy. The only exception was the moderate correlation between PPCN and genome size, particularly for the subset of IBD-enriched pathways. It was a negative correlation with the proportion of small genomes in a metagenome, indicating that PPCN values for these pathways are larger when there are more large genomes in the community and suggesting that these pathways tend to occur frequently in larger genomes. This is in line with our observation that IBD communities contain more large genomes and therefore confirms our interpretation that the populations surviving in the IBD gut microbiome are those with the genomic space to encode more metabolic capacities.
  
  If we consider even the weak correlations, two of those relationships indicate that our approach would be more accurate for IBD metagenomes than for healthy metagenomes. For instance, PPCN accuracy was slightly higher for smaller communities (as in IBD samples), with a weakly positive correlation between PPCN error and community size. It was also slightly more accurate for less diverse communities (as in IBD samples), with a weakly positive correlation between PPCN error and number of phyla. The only opposing trend was the weakly positive correlation between PPCN error and proportion of smaller genomes, which favors higher accuracy in communities with smaller genomes (as in healthy samples). Given that our analysis focuses on the pathways enriched in IBD samples, an overall higher accuracy in IBD samples would increase the confidence in our enrichment results.
  
  We also examined the accuracy of our method to predict the number of populations within a metagenome based on the distribution and frequency of single-copy core genes (i.e., the denominator in the calculation of PPCN). Our benchmarks show that the estimates are overall accurate, where most errors reflect a negligible amount of underestimations of the actual number of populations. Errors occurred more frequently for the realistic synthetic assemblies generated from simulated short read data than for the ideal synthetic assemblies generated from the combination of genomic contigs. The correlations between estimation accuracy and sample parameters indicated that the population estimates are more accurate for smaller communities and communities with more large genomes, as in IBD samples (Table 2). Thus, this method is more likely to underestimate the community size in healthy samples, and these errors could lead to overestimation of PPCN in healthy samples relative to IBD samples. Thus, the enrichment of a given pathway in the IBD samples would have to overcome its relative overestimation in the healthy sample group, making it more likely that we identified pathways that were truly enriched in the IBD communities.
  
  Overall, the consideration of our simulations in the context of healthy vs IBD metagenomes suggest that slight biases in our estimates as a function of unequal diversity with sample groups should have driven PPCN calculations towards a conclusion that is opposite of our observations under neutral conditions. Thus, clear differences between healthy vs IBD metagenomes that overcome these biases suggest that biology, and not potential bioinformatics artifacts, is the primary driver of our observations.”
  
  Accordingly, we have added the following sentence summarizing the validation results to our paper:
  
  “Our validation of this method on simulated metagenomic data demonstrated that it is accurate in capturing metagenome-level metabolic capacity relative to genome-level metabolic capacity estimated from the same data (Supplementary File 2, Supplementary Table 6).”
  
  Early in this process of validation, we identified and fixed two minor bugs in our codebase. The bugs did not affect the results of our paper and therefore did not warrant a re-analysis of our data. The first bug, which is detailed in the GitHub issue https://github.com/merenlab/anvio/issues/2231 and fixed in the pull request https://github.com/merenlab/anvio/pull/2235, led to the overestimation of the number of microbial populations in a metagenome when the metagenome contains both Bacteria and Archaea. None of the gut metagenomes analyzed in our paper contained archaeal populations, so this bug did not affect our community size estimates.
  
  The second bug, which is detailed in the GitHub issue https://github.com/merenlab/anvio/issues/2217 and fixed in the pull request https://github.com/merenlab/anvio/pull/2218, caused inflation of stepwise copy numbers for a specific type of metabolic pathway in which the definition contained an inner parenthetical clause. This bug affected only 3 pathways in the KEGG MODULE database we used for our analysis: M00083, M00144, and M00149. It is worth noting that one of those pathways, M00083, was identified as an IBD-enriched module in our analysis. However, the copy number inflation resulting from this bug would have occurred equivalently in both the healthy and IBD sample groups and thus should not have impacted our comparative analysis.
  
  Regardless, we are grateful for the suggestion to validate our approach since it enabled us to identify and eliminate these minor issues.
  
  The concept of metabolic independence was intriguing, although it also raises some concerns about the overinterpretation of metagenomic data. As mentioned by the authors, IBD is associated with taxonomic shifts that could confound the copy number estimates that are the primary focus of this analysis. It is unclear if the current results can be explained by IBD-associated shifts in taxonomic composition and/or average genome size. The level of prior knowledge varies a lot between taxa; especially for the IBD-associated gamma-Proteobacteria.
  
  The reviewer brings up an important point, and we are thankful for the opportunity to clarify the impact of taxonomy on our analysis. Though IBD has been associated with taxonomic shifts in the gut microbiome, a major problem with such associations is that the taxonomic signal is extremely variable, leading to inconsistency in the observed shifts across different studies (doi:10.3390/pathogens8030126). Indeed, one of the most comprehensive prior studies into this topic demonstrated that inter-individual variation is the largest contributor to all multi-omic measurements aiming to differentiate the gut microbiome of individuals with IBD from that of healthy individuals, including taxonomy (doi:10.1038/s41586-019-1237-9). We therefore took a different approach to study this question that is independent of taxonomy, by focusing on metabolic potential estimated directly from metagenomes to elucidate an ecological explanation behind the reduced diversity of the IBD gut microbiome, which studies of taxonomic composition alone are not able to provide. Furthermore, the variability inherent to taxonomic profiles of the gut microbiome makes it unlikely that taxonomic shifts could confound our analysis, especially given our large sample set encompassing a variety of individuals with different origins, ages, and genders.
  
  We agree with the reviewer that our level of prior knowledge varies substantially across taxa. Regardless, the only prior knowledge with any bearing on our ability to estimate metabolic capacity in a taxonomy-independent manner is the extent of sequence diversity captured by our annotation models for the enzymes used in metabolic pathways. During our analysis, we had observed that metagenomes in the healthy group had fewer gene annotations than those in the IBD group and we therefore shared the reviewer’s concern about potential annotation bias, whereby less-studied genomes are not always incorporated into the Hidden Markov Models for annotating KEGG Orthologs, perhaps making it more likely for us to miss annotations in these genomes (and leading to lower completeness scores for metabolic pathways in the healthy samples). Our annotation method partially addresses this limitation by taking a second look at any unannotated genes and mindfully relaxing the bit score similarity thresholds to capture annotations for any genes that are slightly too different from reference sequences for annotation with default thresholds. As mentioned previously, our recent preprint demonstrates the efficacy of this strategy (doi:10.1101/2024.07.03.601779). To further address this concern, we also investigated the extent of distant homology in these metagenomes using AGNOSTOS (doi:10.7554/eLife.67667), which showed a higher proportion of unknown genes in the healthy metagenomes and suggested that a substantial portion of the unannotated genes are not distant homologs of known enzymes that we failed to annotate due to lack of prior knowledge about them, but rather are completely novel functions. To describe these results, we added the following paragraph and two accompanying figures (Supplementary Figure 4g-h) to the section “Differential annotation efficiency between IBD and Healthy samples” in Supplementary File 1:
  
  “To understand the potential origins of the reduced annotation rate in healthy metagenomes, we ran AGNOSTOS (Vanni et al. 2022) to classify known and unknown genes within the healthy and IBD sample groups. AGNOSTOS clusters genes to contextualize them within an extensive reference dataset and then categorizes each gene as ‘known’ (has homology to genes annotated with Pfam domains of known function), ‘genomic unknown’ (has homology to genes in genomic reference databases that do not have known functional domains), or ‘environmental unknown’ (has homology to genes from metagenomes or MAGs that do not have known functional domains). The resulting classifications confirm that healthy metagenomes contain fewer ‘known’ genes than metagenomes in the IBD sample group – the proportion of ‘known’ genes classified by AGNOSTOS is about 3.0% less in the healthy metagenomes than in the IBD sample group, which is similar to the ~3.5% decrease in the proportion of ‘unannotated’ genes observed by simply counting the number of genes with at least one functional annotation (Supplementary Figure 4g-h, Supplementary Table 1e). Furthermore, the majority of the unannotated genes in either sample group were categorized by AGNOSTOS as ‘genomic unknown’ (Supplementary Figure 4g), suggesting that the unannotated sequences are genes without biochemically-characterized functions currently associated with them and are thus legitimately lacking a functional annotation in our analysis, rather than representing distant homologs of known protein families that we failed to annotate. Based upon the classifications, a systematic technical bias is unlikely driving the annotation discrepancy between the sample groups.”
  
  Furthermore, we have already discussed this limitation and its implications in our manuscript (see section “Key biosynthetic pathways are enriched in microbial populations from IBD samples”). To further clarify that our approach is independent of taxonomy, we have now also amended the following statement in our introduction:
  
  “Here we implemented a high-throughput, taxonomy-independent strategy to estimate metabolic capabilities of microbial communities directly from metagenomes and investigate whether the enrichment of populations with high metabolic independence predicts IBD in the human gut.”
  
  Finally, the reviewer is also correct that genome size is a part of the equation, as genome size and level of metabolic capacity are inextricable. In fact, we observed this in our analysis, as already stated in our paper:
  
  “HMI genomes were on average substantially larger (3.8 Mbp) than non-HMI genomes (2.9 Mbp) and encoded more genes (3,634 vs. 2,683 genes, respectively)”
  
  Since larger genomes have the space to encode more functional capacity, it follows that having higher metabolic independence would require a microbe to have a larger genome. The validation of our method on simulated metagenomic data supported this idea by demonstrating that the IBD-enriched metabolic pathways are commonly identified in large genomes. The validation also proved that genome size does not influence the accuracy of our approach (Supplementary File 2).
  
  It can be difficult to distinguish genes for biosynthesis and catabolism just from the KEGG module names and the new normalization tool proposed herein markedly affects the results relative to more traditional analyses.
  
  We agree with the reviewer that KEGG module names do not clearly indicate the presence of biosynthetic genes of interest. That said, KEGG is a commonly-used and extensively-curated resource, and many biologists (including ourselves) trust their categorization of genes into pathways. We hope that readers who are interested in specific genes within our results would make use of our publicly-available datasets (which include gene annotations) to conduct a targeted analysis based on their expertise and research question.
  
  However, we would like to respectfully note that the ability to distinguish the genes within each KEGG module may not be very useful to most readers, and is unlikely to have a meaningful impact on our findings. As the reviewer most likely appreciates, the presence of individual genes in isolation can be insufficient to indicate biosynthetic capacity, considering that 1) most biosynthetic pathways involve several biochemical conversions requiring a series of enzymes, 2) enzymes are often multi-functional rather than exclusive to one pathway, and 3) different organisms in a community may utilize enzymes encoded by different genes to perform the same or similar biochemical reaction in a pathway. We therefore made the choice to analyze metabolic capacity at the pathway level, because this would better reflect the biosynthetic abilities encoded by the multiple microbial populations within each metagenome.
  
  The reviewer also suggests that our novel normalization method affects our results, yet we believe that this normalization strategy is one of the strengths of our study in comparison to ‘more traditional analyses’ as it enables an appropriate comparison between metagenomes describing microbial communities of dramatically different degrees of richness. Indeed, we suspect that the lack of normalization in more traditional analyses may be one reason why prior analyses have so far failed to uncover any mechanistic explanation for the loss of diversity in the IBD gut microbiome. We hope that our validation efforts were sufficiently convincing in demonstrating the suitability of our approach, and copy here a particularly illuminating section of the validation results that we have added to Supplementary Information File 2:
  
  “As expected, we observed a significant positive correlation between metagenomic copy number (the numerator of PPCN) and community size in each group, likely driven by the increase in the copy number of core metabolic pathways in larger communities (Supplementary Figure 18). Interestingly, this correlation was much stronger for the subset of IBD-enriched pathways (0.49 <= R <= 0.67) than for all modules (0.12 <= R <= 0.13).
  
  “However, the correlation was much weaker and often nonsignificant for the normalized PPCN data in both groups of modules (all modules: 0.01 < R < 0.04, enriched modules: 0.04 < R < 0.09, Supplementary Table 6b, Supplementary Figure 19), which demonstrates the suitability of our normalization method to remove the effect of community size in comparisons of metagenome-level metabolic capacity.”
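  
  The effect described in this excerpt can be illustrated with a small sketch. It is not the analysis code behind Supplementary Table 6b: the numbers are toy values invented for illustration, the `pearson` helper is hypothetical, and the use of Pearson's coefficient is an assumption, since the excerpt reports R values without naming the method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy values (assumed): number of populations per metagenome and the raw
# metagenomic copy number of one module in each metagenome.
community_sizes = [10, 20, 30, 40]
raw_copy_numbers = [5, 8, 12, 20]

# PPCN: copy number divided by the number of populations in the same sample.
ppcn_values = [c / n for c, n in zip(raw_copy_numbers, community_sizes)]

r_raw = pearson(community_sizes, raw_copy_numbers)   # strongly positive
r_ppcn = pearson(community_sizes, ppcn_values)       # close to zero
```

  In this toy case the raw copy number tracks community size closely, while the normalized values do not, which is the qualitative pattern the excerpt reports for the simulated metagenomes.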
  
  As such, it seems safer to view the current analysis as hypothesis-generating, requiring additional data to assess the degree to which metabolic dependencies are linked to IBD.
  
  We certainly agree with the reviewer that our study, similar to the vast majority of studies published every year, is a hypothesis-generating work. Any idea proposed in any scientific study in life sciences will certainly benefit from additional data analyses, and therefore we respectfully do not accept this as a valid criticism of our work. The inception of this study is linked to an earlier work that hypothesized high metabolic independence as a determinant of microbial fitness in stressed gut communities (doi:10.1186/s13059-023-02924-x), which lacked validation on larger sets of data. Our study tests this original hypothesis using a large number of metagenomes, and lends further support for it with approaches that are now better validated. Furthermore, there are other studies that agree with our interpretation of the data (doi:10.1101/2023.02.17.528570, doi:10.1038/s41540-021-00178-6), and we look forward to more computational and/or experimental work in the future to generate more evidence to evaluate these insights further.
  
  Response to Recommendations for the Authors
  
  Reviewer 1:
  
  My main comments include:
  
  - From the results reported in lines 178-185, it seems that metabolic pathways in general were enriched in IBD microbiomes, not specifically biosynthetic pathways. Can we really say then that the signal is specific for biosynthesis capabilities?
  
  We apologize for the confusion here. When we read the text again, we ourselves were confused with our phrasing.
  
  The reviewer is correct that a similar proportion of both biosynthetic and non-biosynthetic pathways had elevated per-population copy number (PPCN) values in the IBD samples. However, the low microbial diversity associated with IBD and the larger average genome size of individual populations both contribute to this relative enrichment of the majority of metabolic modules. To remove this bias and identify specific modules whose enrichment was highly conserved across microbial populations associated with IBD, we implemented two criteria: 1) we selected modules that passed a high statistical significance threshold in our enrichment test (Wilcoxon Rank Sum Test, FDR-adjusted p-value < 2e-10), and 2) we accounted for effect size by ranking these modules according to the difference between their median PPCN in IBD samples and their median PPCN in healthy samples, and keeping only those in the top 50% (which translated to an effect size threshold of > 0.12).
  
  This analysis revealed a set of metabolic modules that were consistently and highly significantly enriched in microbial communities associated with IBD. The majority of these metabolic modules encode biosynthesis pathways. Our use of the terms “elevated”, “enriched”, and “significantly enriched” in the previous version of the text was confusing to the reader. We thank the reviewer for pointing this out, and we hope that our revision of the text clarifies the analysis strategy and observations:
  
  “To gain insight into potential metabolic determinants of microbial survival in the IBD gut environment, we assessed the distribution of metabolic modules within samples from each group (IBD and healthy) with and without using PPCN normalization. Without normalizing, module copy numbers were overall higher in healthy samples (Figure 2a) and modules exhibited weak differential occurrence between cohorts (Figure 2b, 2c, Supplementary Figure 3). The application of PPCN reversed this trend, and most metabolic modules were elevated in IBD (Supplementary Figure 5). This observation is influenced by two independent aspects of the healthy and IBD microbiota. The first one is the increased representation of microbial organisms with smaller genomes in healthy individuals (Watson et al. 2023), which increases the likelihood that the overall copy number of a given metabolic module is below the actual number of populations. In contrast, one of the hallmarks of the IBD microbiota is the generally increased representation of organisms with larger genomes (Watson et al. 2023). The second aspect is that the generally higher diversity of microbes in healthy individuals increases the denominator of the PPCN. This results in a greater reduction in the PPCN of metabolic modules that are not shared across all members of the diverse gut microbial populations in health.
  
  To go beyond this general trend and identify modules that were highly conserved in the IBD group, we first selected those that passed a relatively high statistical significance threshold in our enrichment test (Wilcoxon Rank Sum Test, FDR-adjusted p-value < 2e-10). We then accounted for effect size by ranking these modules according to the difference between their median PPCN in IBD samples and their median PPCN in healthy samples, and keeping only those in the top 50% (which translated to an effect size threshold of > 0.12). This stringent filtering revealed a set of 33 metabolic modules that were significantly enriched in metagenomes obtained from individuals diagnosed with IBD (Figure 2d, 2e), 17 of which matched the modules that were associated with high metabolic independence previously (Watson et al. 2023) (Figure 2f). This result suggests that the PPCN normalization is an important step in comparative analyses of metabolisms between samples with different levels of microbial diversity.”
  
  Lines 178-185 from our original submission have been removed to avoid further confusion. These results can be found in Supplementary File 1 (section “Module enrichment without consideration of effect size leads to nonspecific results”).
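  
  The two-step filter described above (significance threshold, then effect-size ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the raw p-values are taken as given rather than computed by a rank-sum test, the Benjamini-Hochberg procedure is assumed as the FDR adjustment, rounding up when taking the top 50% is an assumption, and all function names, module labels, and numbers are hypothetical.

```python
from math import ceil
from statistics import median

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_enriched(modules, alpha=2e-10, keep_fraction=0.5):
    """modules maps name -> (raw p-value, IBD PPCN values, healthy PPCN values)."""
    names = list(modules)
    adjusted = bh_adjust([modules[n][0] for n in names])
    # Step 1: keep modules passing the significance threshold.
    significant = [n for n, q in zip(names, adjusted) if q < alpha]
    # Step 2: effect size = median PPCN in IBD minus median PPCN in healthy.
    effect = {n: median(modules[n][1]) - median(modules[n][2]) for n in significant}
    ranked = sorted(significant, key=effect.get, reverse=True)
    return ranked[:ceil(len(ranked) * keep_fraction)]

# Toy input (assumed values, for illustration only).
toy_modules = {
    "module_A": (1e-15, [1.0, 0.9, 0.8], [0.5, 0.6, 0.7]),
    "module_B": (1e-14, [0.7, 0.8, 0.75], [0.7, 0.7, 0.72]),
    "module_C": (1e-12, [0.9, 0.95, 1.0], [0.7, 0.75, 0.8]),
    "module_D": (0.03, [0.9, 0.9, 0.9], [0.2, 0.2, 0.2]),
}
selected = select_enriched(toy_modules)
```

  In the toy input, `module_D` has a large median difference but fails the significance threshold, and `module_B` is significant but falls in the bottom half by effect size, so neither is retained.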
  
  It is not entirely clear to me what is meant by PPCN normalization. Normalize the number of copy numbers to the overall number of genes?
  
  The idea behind using per-population copy number (PPCN) is to normalize the prevalence of each metabolic module found in an environment with the number of microbial populations within the same sample. PPCN achieves this by dividing the pathway copy numbers by the number of microbial populations in a given metagenome, which we estimate from the frequency of bacterial single-copy core genes. We have updated the description of the per-population copy number (PPCN) calculation to clarify its use:
  
  “Briefly, the PPCN estimates the proportion of microbes in a community with a particular metabolic capacity (Figure 1, Supplementary Figure 2) by normalizing observed metabolic module copy numbers with the ‘number of microbial populations in a given metagenome’, which we estimate using the single-copy core genes (SCGs) without relying on the reconstruction of individual genomes.”
  
  We also note that the equation for PPCN is shown in Figure 1.
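  
  As an illustrative sketch of this normalization (not the anvi'o implementation), the snippet below divides toy module copy numbers by an estimated number of populations. The estimate is taken here as the most frequent per-SCG hit count, which is an assumption made for illustration; the function names, SCG labels, module labels, and numbers are all hypothetical.

```python
from collections import Counter

def estimate_num_populations(scg_hit_counts):
    """Estimate population count as the mode of per-SCG hit counts.

    Assumption for this sketch: each single-copy core gene (SCG) should occur
    once per population, so the most common hit count approximates the number
    of populations. Ties are resolved toward the smaller count.
    """
    frequency = Counter(scg_hit_counts.values())
    top = max(frequency.values())
    return min(count for count, freq in frequency.items() if freq == top)

def ppcn(module_copy_numbers, num_populations):
    """Per-population copy number: module copy number / number of populations."""
    if num_populations <= 0:
        raise ValueError("number of populations must be positive")
    return {m: c / num_populations for m, c in module_copy_numbers.items()}

# Toy metagenome (assumed values): hits per SCG and copy numbers per module.
scg_hits = {"scg_1": 4, "scg_2": 4, "scg_3": 5, "scg_4": 4}
module_copies = {"module_A": 4, "module_B": 2}

n_populations = estimate_num_populations(scg_hits)
ppcn_by_module = ppcn(module_copies, n_populations)
```

  Under these toy numbers, a PPCN of 1.0 would mean that on average every population carries one copy of the module, while 0.5 would mean that about half of the populations do.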
  
  It is also not clear to me how the classifier predicts stress on microbiomes rather than dysbiosis.
  
  The reviewer asks an interesting question, since it is true that we could also use the term “dysbiosis” rather than “stress”. Yet we refrained from the use of dysbiosis as it is considered a poorly-defined term to describe an altered microbiome often associated with a specific disease (doi:10.3390/microorganisms10030578), such as IBD, relative to another poorly-defined state, “healthy microbiome” (doi:10.1002/phar.2731). We acknowledge that stress is not necessarily a less vague term than dysbiosis, yet it has the advantage of being more common in studies of ecology. Our relatively neutral stance towards which term to use has shifted dramatically due to one critical observation in our study: the identical patterns of enrichment of HMI microbes in individuals diagnosed with IBD as well as in healthy individuals treated with antibiotics. We appreciate that the observed changes in the antibiotics case can also fulfill the definition of “dysbiosis”, but in our opinion the term “stress response” more accurately describes what the classifier identifies.
  
  What is the advantage of using the estimate-metabolism pipeline presented in this article over workflows such as those using genome-scale models, which are repeatedly cited and discussed?
  
  Genome-scale models are often appropriate for a big-picture view of metabolism, and especially when the capability to perform quantitative simulations like flux-balance analysis is needed. For our investigation, we wanted a more specific and descriptive summary of metabolic capacity, so we focused on individual KEGG modules, which qualitatively describe subsets of the vast metabolic network with pathway names that all readers can understand, rather than working with an abstract model of the entire network. Furthermore, genome-scale models would have prevented us from assessing the redundancy (copy number) of metabolic pathways, as these networks usually focus on the presence-absence of gene annotations for enzymes in the network rather than the copy number of these annotations. The copy number metric has been critical for our analyses, considering that we are focusing on metabolic capacity at the community level and require the ability to normalize this metabolic capacity by the size of the community described by each metagenome. Finally, assessing a discrete set of metabolic pathways yielded a corresponding set of features that we used to create the machine learning classifier, whereas data from genome-scale models would not be as easily transferable into classifier features.
  
  Minor comments:
  
  Figure 2d and e are mentioned in the text before Figure 2a.
  
  We thank the reviewer for catching this. We have rewritten the section as follows to put the figure references in numerical order:
  
  “To gain insight into potential metabolic determinants of microbial survival in the IBD gut environment, we assessed the distribution of metabolic modules within samples from each group (IBD and healthy) with and without using PPCN normalization. Without normalizing, module copy numbers were overall higher in healthy samples (Figure 2a) and modules exhibited weak differential occurrence between cohorts (Figure 2b, 2c, Supplementary Figure 3). After the application of PPCN, most metabolic modules were elevated in IBD (Supplementary Figure 5). This observation is a product of two independent aspects of the healthy and IBD microbiota. The first one is the increased representation of microbial organisms with smaller genomes in healthy individuals (Watson et al. 2023), which increases the likelihood that the overall copy number of a given metabolic module is below the actual number of populations. In contrast, one of the hallmarks of the IBD microbiota is the generally increased representation of organisms with larger genomes (Watson et al. 2023). The second aspect is that the generally higher diversity of microbes in healthy individuals increases the denominator of the PPCN due to the higher number of populations detected in these samples. This results in a greater reduction in the PPCN of metabolic modules that are not shared across all members of the diverse gut microbial populations in health. To go beyond this general trend and identify modules that were highly conserved in the IBD group, we first selected those that passed a relatively high statistical significance threshold in our enrichment test (Wilcoxon Rank Sum Test, FDR-adjusted p-value < 2e-10). We then accounted for effect size by ranking these modules according to the difference between their median PPCN in IBD samples and their median PPCN in healthy samples, and keeping only those in the top 50% (which translated to an effect size threshold of > 0.12). This stringent filtering revealed a set of 33 metabolic modules that were significantly enriched in metagenomes obtained from individuals diagnosed with IBD (Figure 2d, 2e), 17 of which matched the modules that were associated with high metabolic independence previously (Watson et al. 2023) (Figure 2f). This result suggests that the PPCN normalization is an important step in comparative analyses of metabolisms between samples with different levels of microbial diversity.”
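  The normalization described in the passage above can be illustrated with a minimal Python sketch. It assumes that the number of populations in each sample has already been estimated by some upstream step and is simply passed in; the function name and the toy numbers are hypothetical and are not taken from the study.

```python
def per_population_copy_number(module_copy_numbers, n_populations):
    """Divide each module's raw copy number by the number of populations.

    module_copy_numbers: dict mapping module ID -> raw copy number in one sample.
    n_populations: estimated number of microbial populations in that sample
                   (assumed to be computed elsewhere).
    """
    if n_populations <= 0:
        raise ValueError("n_populations must be positive")
    return {module: copies / n_populations
            for module, copies in module_copy_numbers.items()}


# Hypothetical toy samples: the healthy sample has higher raw copy numbers
# but also many more populations, so its normalized values end up lower.
healthy = per_population_copy_number({"M00010": 60, "M00121": 30}, n_populations=100)
ibd = per_population_copy_number({"M00010": 24, "M00121": 21}, n_populations=30)

for module in healthy:
    print(module, round(healthy[module], 2), round(ibd[module], 2))
```

  In this toy case the raw copy numbers are higher in the healthy sample, while the per-population values are higher in the IBD sample, which is the kind of reversal the quoted passage describes.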
  
  How much preparation is needed for users that want to apply the estimate-metabolism pipeline to their own datasets? From the documentation at anvi'o, it still seems like a significant effort.
  
  We thank the reviewer for this important question. The use of anvi-estimate-metabolism is simple, but the concept it makes available and the means it offers its users to interact with their data are not basic, thus its use requires some effort. Anvi’o provides users with the ability to directly interact with their data at each step of the analysis to have full control over the analysis and to make informed decisions on the way. In comparison to pre-defined analysis pipelines that often require no additional input from the user, this approach requires some level of involvement of the user throughout the process – namely, they must run a few programs in series rather than running just one pipeline command that quietly handles everything on their behalf. The most basic workflow for using `anvi-estimate-metabolism` is quite straightforward and requires four simple steps following the installation of anvi’o: 1. Run the program `anvi-setup-kegg-data` to download the KEGG data. 2. Convert the assembly FASTA file into an anvi’o-compatible database format with gene calls by running `anvi-gen-contigs-database`. 3. Annotate genes with KOs with the program `anvi-run-kegg-kofams`. 4. Get module completeness scores and copy numbers by running `anvi-estimate-metabolism`. In addition, we provide simple tutorials (such as the one at https://anvio.org/tutorials/fmt-mag-metabolism/) and reproducible bioinformatics workflows online (including for this study at https://merenlab.org/data/ibd-gut-metabolism/), which help early-career researchers to apply similar strategies to their own datasets. We are happy to report that we have been using this tool in our undergraduate education, and observed that students with no background in computation were able to apply it to their questions without any trouble.
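  As a rough illustration of the four steps listed above, the following Python sketch only assembles the commands in order; it does not run them. The file names are placeholders, and the options shown are limited to the input FASTA, output database, and contigs database arguments. Any further options (for example, those controlling copy-number output or metagenome handling) are deliberately left out and should be taken from each program's own help output.

```python
import shlex


def basic_metabolism_workflow(fasta_path, contigs_db_path):
    """Return the four anvi'o commands of the basic workflow, in order.

    Both paths are placeholders chosen by the caller; nothing is executed here.
    """
    return [
        # 1. Download the KEGG data (needed once per installation).
        ["anvi-setup-kegg-data"],
        # 2. Build a contigs database (with gene calls) from the assembly FASTA.
        ["anvi-gen-contigs-database", "-f", fasta_path, "-o", contigs_db_path],
        # 3. Annotate genes with KOs.
        ["anvi-run-kegg-kofams", "-c", contigs_db_path],
        # 4. Estimate module completeness; additional output options
        #    are not shown in this sketch.
        ["anvi-estimate-metabolism", "-c", contigs_db_path],
    ]


for argv in basic_metabolism_workflow("assembly.fa", "CONTIGS.db"):
    print(shlex.join(argv))
```

  Each printed line could be pasted into a shell where anvi’o is installed, or each list could be handed to `subprocess.run(argv, check=True)` to execute the steps in series.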
  
  Reviewer 2:
  
  Congratulations on this great work, the manuscript is a pleasure to read. Minor questions that the authors might want to clarify:
  
  L 275: Why use reference genomes from the GTDB (for only 3 phyla) instead of using MAGs reconstructed from the data? I understand that assemblies based on individual samples would probably not yield enough complete MAGs, but I would expect that co-binning the assemblies for the entire dataset would.
  
  We thank the reviewer for their kind words. We certainly agree that metagenome assembled genomes (MAGs) reconstructed directly from the assemblies would by nature represent the populations in these communities better than reference genomes. However, one of our aims in this study was to avoid the often error-prone and time-consuming step of reconstructing MAGs. Most automatic binning algorithms inevitably make mistakes, and especially for metabolism estimation, low quality MAGs can introduce a bias in the analysis. At the same time the manual curation of each bin to remove any contamination would require a substantial effort and make the workflow less accessible for others to use. As an example, in our previous work (doi:10.1186/s13059-023-02924-x), careful refinement of MAGs from just two co-assemblies took two months. Here, we developed the PPCN workflow as a more scalable, assembly-level analysis to avoid the need for binning in the first place.
  
  To supplement and confirm the metagenome-level results, we decided to run a genome-level analysis. We used the GTDB since it represents the most comprehensive, dereplicated collection of reference genomes across the tree of life. We chose those 3 phyla in particular because of their ecological relevance in the human gut environment. Bacteroidetes and Firmicutes together represent the majority (up to ~90%) of the populations in healthy individuals (doi:10.1038/nature07540), and Proteobacteria represent the next most abundant phylum on average (2% ± 10%) (doi:10.1371/journal.pone.0206484).
  
  L 403: Should the Franzosa and Papa papers be referenced as numbers?
  
  Thanks for pointing this out. The rogue numerical citation was actually an artifact of the submission and was corrected to a long-format citation in the online version of the manuscript on the eLife website.
  
  Reviewer 3:
  
  The lack of any experimental validation contributes to the tentative nature of the conclusions that can be drawn at this time. Numerous studies have looked at the metabolism of gut bacterial species during in vitro growth, which could be mined to test if the in silico predictions of metabolism can be supported. Alternatively, the authors could isolate key strains of interest and study them in culture or in mouse models of IBD.
  
  We appreciate these suggestions and agree with the reviewer that experimental validation is important. However, we do not agree that either the use of mouse models or the isolation of individual microbial strains would be an appropriate experimental test in this case. The use of humanized gnotobiotic mice has critical limitations (see doi:10.1016/j.cell.2019.12.025 and references within the section on “human microbiota-associated murine models”). As it is not possible to establish a mouse model whose gut microbiota fully reflect the human gut microbiome, such an approach would neither be appropriate to validate our findings, nor would it have been possible to produce the insights we have gained based on environmental data. We are not sure how exactly a mouse model, even when ignoring the well-established limitations, could improve or validate a comprehensive analysis of large “environmental” datasets that resulted in highly significant signals.
  
  We are also not sure that we understand how the reviewer believes that the isolation of individual strains would aid in validating our findings. While we appreciate that not all relevant genes are captured by the available annotation routines and that some genes may be misannotated, the large dataset used here renders these concerns negligible. Isolating a small subset of bacterial populations would hardly lead to a representative sample and testing their metabolic capacities in vitro would not improve the reliability of our analysis.
  
  Boilerplate suggestions as vague as “isolate key strains of interest” or “experiment in mouse models of IBD” neither add to nor detract from our findings. Our findings and hypotheses are well supported by our data and extensive analyses.
  
  Line 9 - not sure this approach is hypothesis testing in the traditional sense, you might reword.
  
  Hypothesis testing occurs when one makes an observation, develops a hypothesis that explains the observation, and then gathers and analyzes data to investigate whether additional data support or disprove the hypothesis. We are not convinced a reword is necessary.
  
  Line 40 - the lack of consistent differences in IBD and healthy individuals does not mean that the microbiome doesn't impact disease. It's important to consider all the mechanistic studies in animal models and other systems.
  
  Our study does not claim that the microbiome has no impact on the course of disease.
  
  Line 50 - this seemed out of place and undercuts the current findings. Upon checking Ref. 31, the analysis seems distinct enough to not mention in the introduction.
  
  We disagree. Ref 31 uses genome-scale metabolic models to identify the loss of cross-feeding interactions in the gut microbiome of individuals with IBD, which is another way of saying that the microbes in IBD no longer rely on their community for metabolic exchange – in other words, they are metabolically independent. This is an independent observation that is parallel to our results and confirms our analysis; hence, it is important to keep in our introduction.
  
  Line 55 - Ref. 32 looked at FMT, which should be explicitly stated here.
  
  The reviewer’s suggestion is not helpful. Ref 32 has a significant focus on IBD as it compares a total of 300 MAGs generated from individuals with IBD to 264 MAGs from healthy individuals and shows differences in metabolic enrichment between healthy and IBD samples independent of taxonomy, thus setting the stage for our current work. What model has been used to generate the initial insights that led to the IBD-related conclusion in Ref 32 has no significance in this context.
  
  Lines 92-107 - this text is out of place in the Results section and reads more like a review article. Please trim it down and move it to the introduction.
  
  We would like to draw the reviewer’s attention to the fact that this is a “Results and Discussion” section. In this specific case it is important for readers to appreciate the context for our new tool, as the reviewer commented in the public review. We kindly disagree with the reviewer’s suggestion to remove this text as that would diminish the context.
  
  Line 107 - is "selection" the word you meant to use?
  
  If the frequency of a given metabolic module remains the same or increases despite the decreasing diversity of the microbial community, it is conceivable to assume that its enrichment indicates the presence of a selective process to which the module responds. It is indeed the word we meant to use.
  
  Line 110 - this is the first mention of this new method, need to add it to the abstract and introduction.
  
  The reviewer must have overlooked the text passages in which we mention the strategy we developed within the abstract:
  
  “Here, we tested this hypothesis on a large scale, by developing a software framework to quantify the enrichment of microbial metabolisms in complex metagenomes as a function of microbial diversity.”
  
  And in the last paragraph of the introduction:
  
  “Here we implemented a high-throughput, taxonomy-independent strategy to estimate metabolic capabilities of microbial communities directly from metagenomes…”
  
  Figure 1 - a nice summary, but no data is shown to support the validity of this model. Consider shrinking the cartoon and adding validation with simulated datasets.
  
  We hope we have addressed this recommendation with the extensive validation efforts summarized above.
  
  Line 134 - need to state the FDR and effect size cutoffs used.
  
  We have reworded this sentence as follows to clarify which thresholds were used:
  
  “We identified significantly enriched modules using an FDR-adjusted p-value threshold of p < 2e-10 and an effect size threshold of > 0.12 from a Wilcoxon Rank Sum Test comparing IBD and healthy samples.”
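  The two-stage filter described in this sentence (an FDR-adjusted p-value cutoff, followed by an effect-size cutoff on the difference in median PPCN) can be sketched as follows. This is only an illustration under stated assumptions: the raw p-values are assumed to come from a rank-sum test run elsewhere, the Benjamini-Hochberg procedure is used as the FDR adjustment (the text does not name the procedure), the "top 50%" rule is rounded up for odd counts, and all function names are hypothetical.

```python
from statistics import median


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched(modules, p_values, ibd_ppcn, healthy_ppcn, alpha=2e-10):
    """Keep modules passing the FDR cutoff, then the top half by effect size.

    modules: list of module IDs; p_values: raw p-values in the same order.
    ibd_ppcn / healthy_ppcn: dict of module ID -> list of per-sample PPCN values.
    Effect size = median PPCN in IBD minus median PPCN in healthy samples.
    """
    adjusted = benjamini_hochberg(p_values)
    significant = [m for m, q in zip(modules, adjusted) if q < alpha]
    effect = {m: median(ibd_ppcn[m]) - median(healthy_ppcn[m]) for m in significant}
    ranked = sorted(significant, key=effect.get, reverse=True)
    n_keep = (len(ranked) + 1) // 2  # assumption: round up for odd counts
    return ranked[:n_keep], effect
```

  The smallest effect size among the returned modules plays the role of the effect-size threshold quoted in the text; in this sketch it is derived from the ranking rather than fixed in advance.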
  
  I'm also concerned about the simple comparison of IBD to healthy without adjusting for confounders like study, geographical location, age, sex, drug use, diet, etc. More text is needed to explain the nature of these data, how much metadata is available, and which other variables distinguish IBD from healthy.
  
  The reviewer is correct that there is a large amount of interindividual variation between samples due to host and environmental factors. However, the lack of adjusting for confounders was intentional, and in fact one of the critical strengths of our study. We observe a clear signal between healthy individuals and individuals diagnosed with IBD, despite the amount of interindividual variation in our diverse set of samples from 13 different studies (details of which are summarized in Supplementary Table 1). The clear increase in predicted metabolic capacity that we consistently observe in IBD patients using both metagenomes and genomes across diverse cohorts points to metabolic independence as a high-level trend that is predictive of microbial prevalence in stressed gut environments irrespective of host factors.
  
  Line 145 - calling PPCN normalization an "essential step" is a huge claim and requires a lot more data to back it up. Might be best to qualify this statement.
  
  We hope we have addressed this recommendation with our validation efforts. Supplementary Figures 18 and 19 in particular show evidence for the necessity of the normalization step. It is indeed an essential step if the purpose is to compare metabolic enrichment between cohorts of highly different microbial diversity.
  
  Figure 2a - the use of a 1:1 trend line seems potentially misleading. I would replace it with a best-fit line.
  
  Our purpose here was not to show the best fit. Instead, the 1:1 trend line separates the modules based on their relative abundance distribution between healthy individuals and individuals diagnosed with IBD. If the module is to the left of the line, it has a higher median copy number in healthy individuals and if the module is to the right, it has a higher median copy number in individuals with IBD. The line also helps to demonstrate the shift that occurs between the unnormalized data in Figure 2a and the normalized data in Figure 2d. Without the normalization, more modules occur to the left of the 1:1 line as a result of the higher raw copy numbers in healthy metagenomes, which simply contain more microbial populations. With the normalization (Figure 2d), more modules fall on the right side of the 1:1 line due to higher PPCN values. A best-fit line would not serve well for these purposes.
  
  The text should be revised to state that this analysis actually did find many significant differences and to discuss whether they were the same modules identified in Figure 2d.
  
  We apologize for the confusion and thank the reviewer for bringing this issue to our attention. As mentioned above, the disparate levels of microbial diversity between healthy individuals and individuals with IBD resulted in much larger copy numbers of metabolic modules in healthy samples reflecting the often much larger communities. Hence, we ran statistical tests only on normalized (PPCN) data. The p-values associated with each module in Figure 2a, as well as the colors of each point, are based on the PPCN data in Figure 2d. We aimed to improve the clarity of the visual comparison between normalized and unnormalized results by identifying the same set of IBD-enriched modules in plots a-c and plots d-f.
  
  That being said, the reviewer’s comment made us realize the potential for confusion when using the normalized data’s statistical results in Figure 2a that otherwise shows results from unnormalized data. We have now run the same statistical test on the unnormalized (raw copy number) data and re-generated Figure 2a with the new FDR-adjusted p-values and points colored based on the statistical tests using unnormalized data. We’ve also removed the arrow connecting to Figure 2b (since we no longer show the same set of IBD-enriched modules in Figures 2a and 2b), and added a dashed line to indicate the effect size threshold (similar to the one in Figure 2d). We have updated the legend for Figure 2a-d to reflect these changes:
  
  When we used the same p-value threshold (p < 2e-10) as before and also filtered for an effect size larger than the mean (the same strategy used to set our effect size threshold for the normalized data), there were 10 modules that were significantly enriched based on the unnormalized data. Of course, it is difficult to gauge the relevance of these 10 modules to microbial fitness in the IBD gut environment since their raw copy numbers do not tell us anything about the relative proportion of community members that harbor these modules. Therefore, we are reluctant to add these modules to the results text. For the record, only 3 of those modules were also significantly enriched based on the normalized PPCN values: M00010 (Citrate cycle, first carbon oxidation), M00053 (Pyrimidine deoxyribonucleotide biosynthesis), and M00121 (Heme biosynthesis).
  
  Figure 2c,f - these panels raise a lot of concerns given that the choice of method inverts the trend. Without additional data/validation, it's hard to know which method is right.
  
  We hope we have addressed this recommendation with the extensive validation efforts summarized above. Inversion of the trend is an expected outcome, because the raw copy numbers of most metabolic modules are much lower in the IBD sample group due to lower community sizes.
  
  Line 167 - Need to take the KEGG names with a grain of salt, just because it says "biosynthesis" doesn't mean that the pathway goes in that direction in your bacterium of interest.
  
  We believe the reviewer is under a misapprehension regarding the general reversibility of KEGG metabolic modules, or indeed of metabolic pathways. Most metabolic pathways have one or several (practically) irreversible reactions. To demonstrate this for the 33 IBD-enriched modules, we evaluated their reversibility based upon their corresponding KEGG Pathway Maps, which indicate reaction reversibility via double-sided arrows. Aside from the signature modules M00705 and M00627, in 26 out of 31 pathway modules one or more irreversible reactions render these pathways one-directional. Indeed, on average the majority (54%) of the reactions in a given module are irreversible. When focusing on the 23 “biosynthesis” modules, 22 out of 23 (96%) modules have at least one irreversible reaction, and on average 64% of a given module’s reactions are irreversible. These data (which can be accessed at doi:10.6084/m9.figshare.27203226 for the reviewer’s convenience) challenge the reviewer’s notion that pathway directionality is free to change arbitrarily, since the presence of even one irreversible reaction effectively blocks the flux in the opposing direction. Thus, “biosynthesis” is indeed a meaningful term in KEGG module names.
  
  That said, KEGG Pathway Maps, though highly curated, are likely not the final word on whether a given reaction in a metabolic pathway can be considered reversible or irreversible in each microbial population and under all conditions. And our analysis, like many others that rely on metagenomic data, does not consider the environmental conditions in the gut such as temperature or metabolite concentrations that might influence the Gibbs free energy and thus the directionality of these reactions in vivo. However, even assuming general reversibility of metabolic pathways, this would not invalidate the fact that these microbes have the metabolic capacity to synthesize the respective molecules. In other words, the potential reversibility of pathways is irrelevant to our analysis since we are describing metabolic potential. The lac operon in E. coli might only be expressed in the absence of glucose, but E. coli always has the capability to degrade lactose regardless of whether that pathway is active. Thus, our overall conclusion that gut microbes associated with IBD are metabolically self-sufficient (encoding the enzymatic capability to synthesize certain key metabolites) remains valid irrespective of fixed or flexible pathway directionality.
  
  It's also important to be careful not to conflate KEGG modules (small subsets of a pathway) with the actual metabolic pathway. It's possible to have a module change in abundance while not altering the full pathway. Inspection of the individual genes could help in this respect - are they rate-limiting steps for biosynthesis or catabolism?
  
  The reviewer is absolutely correct that KEGG modules do not necessarily represent full pathways. We have updated the language in our manuscript to explicitly refer to “modules” rather than “pathways” whenever appropriate, to restrict the scope of the analysis to metabolic modules rather than full pathways.
  
  That said, we do not see how “inspection of individual genes” would improve our analysis. The strength of looking at complete modules rather than individual genes is that we can gain conclusive insights into a certain metabolic capacity. Of course, no pathway or module stands alone. However, the enrichment of metabolic modules does conclusively indicate that these modules are beneficial under the given conditions, such as stress caused by inflammation or antibiotic use. Whether a certain step in a module or pathway is rate limiting is completely irrelevant for this analysis.
  
  Line 177 - I'm not a big fan of the HMI acronym. Is there a LMI group? It seems simplistic to lump all of metabolism into dependent or independent, which in reality will differ depending on the specific substrate, the growth condition, and the strain.
  
  While we are sorry that our study failed to provide the reviewer with a term they could be a fan of, their input did not change our view that HMI, an acronym we have adapted from a previously peer-reviewed study (doi:10.1186/s13059-023-02924-x), is a powerfully simplistic means to describe a phenomenon we observe and demonstrate in multiple different ways with our extensive analyses. The argument that HMI or LMI status will differ given the growth condition, substrate availability, or strain differences is not helping this case either: our analyses cut across a large number of humans and naturally occurring microbial systems in their guts that are exposed to largely variable ‘growth conditions’ and ‘substrates’ and composed of many strain variants of similar populations. Yet, we observe a clear role for HMI despite all these differences. Perhaps it is because HMI simply describes a higher metabolic capacity based on a defined subset of largely biosynthetic pathways that we observe to be consistently enriched in a large dataset covering a large variety of host, environmental and diet factors and indicates that a population has a higher metabolic capacity to not rely on ecosystem services. We show in our analysis that in the inflamed gut these capacities are indeed required, which is why HMI populations are enriched in IBD samples. HMI has no relation to any of the constraints mentioned by the reviewer, which is one of the major strengths of this metric.
  
  Line 198 - It seems like a big assumption to state that efflux and drug resistance are unrelated to biosynthesis, as they could be genetically or even phenotypically linked.
  
  We agree with the reviewer and are thankful for their input. We have weakened the assertion in this statement.
  
  “These capacities may provide an advantage since antibiotics are a common treatment for IBDs (Nitzan et al. 2016), but are not necessarily related to the systematic enrichment of biosynthesis modules that likely provide resilience to general environmental stress rather than to a specific stressor such as antibiotics.”
  
  Lines 202-218 - I'd suggest removing this paragraph. The "non-IBD" data introduces even more complications to the meta-analysis and seems irrelevant to the current study.
  
  We thank the reviewer for this suggestion. Non-IBD data is important, but its relevance to the primary aims of the study is indeed negligible. We now have moved this paragraph to Supplementary File 1 (under the section “‘Non-IBD’ samples are intermediate to IBD and healthy samples”).
  
  The health gradient is particularly problematic, putting cancer closer to healthy than IBD.
  
  We took the reviewer’s advice and have swapped the order of the studies in Supplementary Figure 6 to place the cancer samples from Feng et al. closer to the IBD samples, on the other side of the non-IBD samples from the IBD studies.
  
  Lines 235-257 - should trim this down and move to the discussion.
  
  As mentioned above, we have opted for a “Results and Discussion” format for our manuscript, so we believe this discussion is in the correct place. We find it important to clearly highlight the limitations and potential biases of our work, and trimming this text would take away from that goal.
  
  Figure 3 - panels are out of order. Need to put the current panel D below current panel C. Also, relabel panel letters to go top to bottom (the bottom panel should be D). Could change current panel 3D to a violin plot to match current 3C.
  
  We have updated Figure 3 by converting panel A into a new supplementary figure (Supplementary Figure 8), moving panels C and D below panel B, and relabeling the panels accordingly.
  
  Figure 3B - this panel was incredibly useful and quite surprising to me in many respects. I would have assumed that the Bacteroides would be in the "HMI" bin. Is this a function of the specific strains included here? Was B. theta or B. fragilis included?
  
  The reviewer makes an excellent observation that has been keeping us awake at night, yet somehow was not appropriately discussed in the text until their input. We are very thankful for their attention to detail here.
  
  It is indeed true that Bacteroides genomes are often detected with increased abundance in individuals with IBD and likely have a survival advantage in the IBD gut environment, Bacteroides fragilis and Bacteroides thetaiotaomicron being some of the most dominant residents of the IBD gut. Their non-HMI status is not a function of which strains were included, since all taxa here are represented by the representative genomes available in the publicly available Genome Taxonomy Database. Their non-HMI status comes from the fact that they have HMI scores of around 24 to 26, which fall slightly below the threshold score of 26.4 that we used to classify genomes as HMI. This threshold is back-calculated from the metabolic completion requirement of at least 80% average completion of all 33 metabolic modules that are significantly enriched in IBD. So these genomes are right there at the edge, but not quite over it.
  
  Thanks to this comment by our reviewer, we started wondering whether we should follow a more ‘literature-driven’ approach to set the threshold for HMI, rather than the 80% cutoff, and in fact attempted to lower the HMI score threshold to see if we could include more of the IBD-associated Bacteroides in the HMI bin. Author response table 1 below shows the relevant subset of our new Supplementary Table 3h, which describes the data from our tests on different thresholds.
  
  Author response table 1.
  
  Number and proportion of Bacteroides genomes classified as HMI at each HMI score threshold. There were 20 total Bacteroides genomes in the set of 338 gut microbes identified from the GTDB. The HMI score is computed by adding the percent completeness of all 33 IBD-enriched KEGG modules. The full table can be viewed in Supplementary Table 3h.
  
  Lowering the threshold to 24.75, which corresponds to an average of 75% completeness in the 33 IBD-enriched modules, enabled the classification of 6 Bacteroides genomes as HMI, including B. fragilis, B. intestinalis, B. theta, and B. faecis. However, it also identified several microbes that are not IBD-associated as HMI, including 75 genomes from the Lachnospiraceae family and 18 genomes from the Ruminococcaceae family. In the latter family, several Faecalibacterium genomes, including 10 representatives of Faecalibacterium prausnitzii, were considered HMI using this threshold. These microbes are empirically known to decrease in abundance during inflammatory gastrointestinal conditions (doi:10.3390/microorganisms8040573, doi:10.1093/femsre/fuad039), and therefore these genomes should not be considered HMI – at least not under the working definition of HMI used in our study. To avoid including such a large number of obvious false positives in the HMI bin, we decided to maintain a higher threshold despite the exclusion of Bacteroides genomes.
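  The relationship between the average-completeness requirement and the HMI score threshold discussed above can be written out as a short Python sketch. It assumes completeness is expressed as a fraction between 0 and 1 for each of the 33 modules and that a genome is called HMI when its score is at or above the threshold; the function names and the example genome are hypothetical.

```python
N_MODULES = 33  # number of IBD-enriched modules used for the score


def hmi_threshold(mean_completeness, n_modules=N_MODULES):
    """Score threshold implied by a required average module completeness."""
    return mean_completeness * n_modules


def hmi_score(completeness):
    """Sum of fractional completeness values (0 to 1) over all modules."""
    return sum(completeness)


def is_hmi(completeness, threshold):
    """Assumption: a genome is HMI if its score is at or above the threshold."""
    return hmi_score(completeness) >= threshold


strict = hmi_threshold(0.80)   # about 26.4
relaxed = hmi_threshold(0.75)  # 24.75

# Hypothetical genome with every module 76% complete (score of about 25.08).
genome = [0.76] * N_MODULES
print(round(hmi_score(genome), 2), is_hmi(genome, strict), is_hmi(genome, relaxed))
```

  A genome like this one sits between the two thresholds, which mirrors the situation described for the Bacteroides genomes: it is excluded at the 80% requirement and included at the 75% requirement.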
  
  This outcome demonstrates that our reductionist approach does not successfully capture every microbial population that is associated with IBD. Nevertheless, and in our opinion very surprisingly, the metric does capture a very large proportion of genomes with increased detection and abundance in IBD samples, as demonstrated by the peaks of detection/abundance that match HMI status (Author response image 1).
  
  Author response image 1.
  
  Screenshots of Figure 3 that demonstrate the overlapping signal between HMI status and genome detection/abundance in IBD.
  
  Furthermore, the violin plots in Figure 3B (formerly Figure 3C) clearly reflect the increased representation of HMI populations in IBD metagenomes. Although our classification method is imperfect, it still demonstrates the predictive power of metabolic competencies in identifying which microbes will survive in stressful gut environments. To ensure that readers recognize the crude nature of this classification strategy and the possibility that high metabolic independence can be achieved in different ways, we have added the following sentences to the relevant section of our manuscript:
  
  “Given the number of ways a genome can pass or fail this threshold, this arbitrary cut-off has significant shortcomings, which was demonstrated by the fact that several species in the Bacteroides group were not classified as HMI despite their frequent dominance of the gut microbiome of individuals with IBD (Saitoh et al. 2002; Wexler 2007; Vineis et al. 2016) (Supplementary File 1). That said, the genomes that were classified as HMI by this approach were consistently higher in their detection and abundance in IBD samples (Figure 3a). It is likely that there are multiple ways to have high metabolic independence which are not fully captured by the 33 IBD-enriched metabolic modules identified in this study.”
  
  We have also included a discussion of these findings in Supplementary Information File 1 (see section “Examining the impact of different HMI score thresholds on genome-level results”).
  
  This panel also makes it clear that many of these modules are widespread in all genomes and thus unlikely to meaningfully differ in the microbiome. It would be interesting to use this type of analysis to identify a subset of KEGG modules with high variability between strains.
  
  The figure makes it ‘look like’ many of these modules are widespread in all genomes and thus unlikely to meaningfully differ in the microbiome, but our quantitative analyses clearly demonstrate that these modules indeed differ meaningfully between microbiomes of healthy individuals and those diagnosed with IBD. For instance, the classifier that we built relying exclusively upon these modules’ PPCN values was able to reliably distinguish between the healthy and IBD sample groups in our dataset. The fact that the differentiating signal does not rely on rare metabolic or signature modules is what makes the classifier powerful enough to differentiate between “healthy” and “stressed” microbiomes in 86% of cases. Modules that are by nature less common could not serve this purpose. That said, we do agree with the reviewer that it might be interesting to study variability of KEGG modules as a function of variability between strains. This does not fall into the scope of this work, but we hope to assist others with the technical aspects of such work.
  
  Considering the entirety of the exchange in this section, perhaps there is a broader discussion to be had around this topic. In retrospect, not being able to perfectly split microbes into two groups that completely recapitulate their enrichment in healthy or IBD samples by a crude metric and an arbitrary threshold is not surprising at all. What is surprising is that such a crude metric in fact works for the vast majority of microbes and predicts their increased presence in the IBD gut by only considering their genetic makeup. In some respects, we believe that the inability of this cutoff to produce a perfect classifier is similar to the limited power of the metabolic independence concept and the classes of HMI or LMI to capture and fully explain microbial fitness in health and disease. What is again surprising here is that these almost offensively simple classes do capture more than what one would expect. We can envision a few ways to implement a more sophisticated HMI/LMI classifier, and it is certainly an important task that is achievable. However, we are hopeful that this technical work can also be done better by others in our field, and that step forward, along with further scrutinizing the relevance of HMI/LMI classes to understand metabolic factors that contribute to the biodiversity of stressful environments, will have to remain as future work.
  
  We thank the reviewer again for their comment here and pushing us to think more carefully and address the oddity regarding the poor representation of Bacteroides as HMI by our cutoff.
  
  Given that a lot of the gaps are in the Firmicutes, this panel also makes me more concerned about annotation bias. How many of these gaps are real?
  
  Analyses relying on gene annotations all suffer equally from the potential for misannotation or missing annotations, which primarily result from limitations in our reference databases for functional data. For instance, the hidden Markov models for microbial genes in the KEGG Ortholog database are generated from a curated set of gene sequences primarily originating from cultivable microorganisms and particularly from commonly-used model organisms; hence, they do not capture the full extent of sequence diversity observed in populations that are less well-represented in reference databases – a category which includes several Firmicutes, as the reviewer points out. For KEGG KOfams in particular, the precomputed bit score thresholds for distinguishing between ‘good’ and ‘bad’ matches to a given model are often too stringent to enable annotation of genes that are just slightly too divergent from the set of known sequences, thus resulting in missing annotations. Based on our experience with these sorts of issues, we implemented a heuristic that reduces the number of missing annotations for KOs and captures significantly more homologs than other state-of-the-art approaches, as described in doi:10.1101/2024.07.03.601779. We refer the reviewer to our response to the related public comment about annotation bias above, which includes additional details about our investigations of annotation bias in our data. In comparison to the current standard, the heuristic we implemented improves functional annotation results. However, neither our study nor any other bioinformatic study that relies on functional gene annotation can exclude the potential for annotation bias.
  
  Figure 3B plotting issues - need to use the full names of the modules; for example, M00844 is "arginine biosynthesis, ornithine => arginine", which changes the interpretation. Need a key for the heatmap on the figure. The tree is difficult to see, needs a darker font.
  
  We have darkened the lines of the tree and dendrogram, and added a legend for the heatmap gradient (see new version of Figure 3 above). Unfortunately, we could not fit the full names of the modules into the figure due to space constraints. However, the full module name and other relevant information can be found in Supplementary Table 2a, and the matrix of pathway completeness scores in these genomes (e.g., the values plotted in the heatmap) can be found in Supplementary Table 3b. We are not sure what the reviewer refers to when stating that “for example, M00844 is "arginine biosynthesis, ornithine => arginine", which changes the interpretation”. There is no ambiguity regarding the identity of KEGG module M00844, which is arginine biosynthesis from ornithine.
  
  Line 321 - more justification for the 80% cutoff is needed along with a sensitivity analysis to see if this choice matters for the key results.
  
  Inspired by this comment, and the one above regarding the classification of Bacteroides genomes, we tested several HMI score thresholds ranging from 75% to 85% average completeness of the 33 IBD-enriched modules. For each threshold, we computed all the key statistics reported in this section of our paper, including the statistical tests. We found that the choice of HMI score threshold does not influence the overall conclusions drawn in this section of our manuscript. Author response table 2 below shows the relevant subset of our new Supplementary Table 3h, which describes the results for each threshold:
  
  Author response table 2.
  
  Key genome-level results at each HMI score threshold. The HMI score is computed by adding the percent completeness of all 33 IBD-enriched KEGG modules. WRS – Wilcoxon Rank Sum test; KW – Kruskal-Wallis test. The full table can be viewed in Supplementary Table 3h.
  
  We’ve summarized these findings in a new section of Supplementary File 1 entitled “Examining the impact of different HMI score thresholds on genome-level results”. We copy below the relevant text for the reviewer’s convenience:
  
  “Determining the HMI status of a given genome required us to set a threshold for the HMI score above which a genome would be considered to have high metabolic independence. We tested several different thresholds by varying the average percent completeness of the 33 IBD-enriched metabolic modules that we expected from the ‘HMI’ genomes from ≥ 75% (corresponding to an HMI score of ≥ 24.75) to ≥ 85% (corresponding to an HMI score of ≥ 28.05). For each threshold, we computed the same statistics and ran the same statistical tests as those reported in our main manuscript to assess the impact of these thresholds on the results (Supplementary Table 3h). At the highest threshold we tested (HMI score ≥ 28.05), a small proportion of the reference genomes (7%, or n = 24) were classified as HMI, so we did not test higher thresholds.
  
  We found that the results from comparing HMI genomes to non-HMI genomes are similar regardless of which HMI score threshold is used to classify genomes into either group. No matter which HMI score threshold was used, the mean genome size and mean number of genes were higher for HMI genomes than for non-HMI genomes. On average, the HMI genomes were about 1 Mb larger and had 1,032 more gene calls than non-HMI genomes. We ran two Wilcoxon Rank Sum statistical tests to assess the following null hypotheses: (1) HMI genomes do not have higher detection in IBD samples than non-HMI genomes, and (2) HMI genomes do not have higher detection in healthy samples than non-HMI genomes. For both tests, the p-values decreased (grew more significant) as the HMI score threshold decreased due to the inclusion of more genomes in the HMI bin. The first test for higher detection of HMI genomes than non-HMI genomes in IBD samples yielded p-values less than α = 0.05 at all HMI score thresholds. The second test for higher detection of HMI genomes than non-HMI genomes in healthy samples yielded p-values less than α = 0.05 for the three lowest HMI score thresholds (HMI score ≥ 24.75, ≥ 25.08, or ≥ 25.41). However, irrespective of significance threshold and HMI score threshold, there was always far stronger evidence to reject the first null hypothesis than the second, given that the p-value for the first test in IBD samples was 1 to 5 orders of magnitude lower (more significant) than the p-value for the second test in healthy samples.
  
  IBD samples harbored a significantly higher fraction of genomes classified as HMI than healthy or non-IBD samples, regardless of HMI score threshold (p < 1e-15, Kruskal-Wallis Rank Sum test). The p-values for this test increased (grew less significant) as the HMI score threshold decreased. This suggests that, at higher thresholds, relatively more genomes drop out of the HMI fraction in healthy/non-IBD samples than in IBD samples, thereby leading to larger differences and more significant p-values. Consequently, the HMI scores of genomes detected in IBD samples must be higher than the HMI scores of genomes detected in the other sample groups – indeed, the average HMI score of genomes detected within at least one IBD sample is 24.75, while the average score of genomes detected within at least one healthy sample is 22.78. Within a given sample, the mean HMI score of genomes detected within that sample is higher for the IBD group than in the healthy group: the average per-sample mean HMI score is 25.14 across IBD samples compared to the average of 23.00 across healthy samples.”
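
  The relationship between the percent-completeness cutoffs and the HMI score thresholds quoted above can be checked with a short sketch. It assumes, based on the quoted values (75% of 33 modules giving 24.75), that the score sums fractional completeness (0 to 1) across the 33 modules. The function names and the example genomes are hypothetical and are not taken from the authors' code.

```python
N_MODULES = 33  # number of IBD-enriched KEGG modules stated in the response


def hmi_score(completeness):
    """Sum fractional completeness (0.0 to 1.0) over the 33 modules."""
    if len(completeness) != N_MODULES:
        raise ValueError(f"expected {N_MODULES} completeness values")
    return sum(completeness)


def score_threshold(mean_percent):
    """HMI score cutoff implied by a mean percent completeness."""
    return mean_percent * N_MODULES / 100


def is_hmi(completeness, mean_percent=80):
    """Classify a genome as HMI if its score meets the cutoff (assumed rule)."""
    return hmi_score(completeness) >= score_threshold(mean_percent)


# Cutoffs implied by a few mean-completeness levels.
thresholds = {pct: score_threshold(pct) for pct in (75, 76, 77, 80, 85)}

# Hypothetical genomes: one with every module complete, one half complete.
complete_genome = [1.0] * N_MODULES
partial_genome = [0.5] * N_MODULES
```

  Under this reading, the cutoffs 24.75, 25.08 and 25.41 correspond to 75%, 76% and 77% mean completeness, and the 80% cutoff raised by the reviewer corresponds to a score of 26.4.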
  
  Lines 357 and 454 - I would remove the discussion of the "gut environment" which isn't really addressed here. The observed trends could just as easily relate to microbial interactions or the effects of diet and pharmaceuticals. Perhaps the issue is the vague nature of this term, which I read to imply changes in the mammalian host. Given the level of evidence, I'd opt to keep the options open and discuss what additional data would help resolve these questions.
  
  We are in complete agreement with the reviewer that microbial interactions are likely an important driver of our observations. In healthy communities, microbial cross-feeding enables microbes with lower metabolic independence to establish and increase microbial diversity. This is exactly why we state that “Community-level signal translates to individual microbial populations and provides insights into the microbial ecology of stressed gut environments”.
  
  Diet or usage of prescription drugs on the other hand, as discussed previously, likely varies substantially over the various cohorts investigated, and is thus not a driver of the observed trends. Instead, HMI works as a high level indicator that is not influenced by these variable host habits.
  
  Lines 354-394 - Could remove or dramatically trim down this text. Too much discussion for a results section.
  
  We kindly remind the reviewer that our manuscript is written following a “Results and Discussion” format. This section provides necessary context and justification for our classifier implementation, so we have left it as-is.
  
  Lines 395-441 - This section raised a lot of issues and could be qualified or even removed. The model was trained on modules that were IBD-associated in the same dataset, so it's not surprising that it worked. An independent test set would be required to see if this model has any broader utility.
  
  The point that we selected the IBD-enriched modules as features should not raise any concerns, as these modules would have emerged as the most important (i.e., most highly weighted) features in our model even if we had included all modules in our training data. This is because machine learning classifiers by design pick out the features that best distinguish between classes, and the 33 IBD-associated modules are a selective subset of these (if they were not, they would not have been significantly enriched in the IBD sample group). That said, a carefully conducted feature selection process prior to model training is a standard best practice in machine learning; thus, if anything, this should be interpreted as a point of confidence rather than a concern. Furthermore, we evaluated our model using cross-validation, a standard practice in the machine learning field that assesses the stability of model performance by training and testing the model on different subsets of the data. This effort established that the model is robust across different inputs as demonstrated by the per-fold confusion matrix and the ROC curve. These are all standard approaches in machine learning to quantify the model tradeoff between bias and variance. As for the independent test set, we went above and beyond and applied our model to the antibiotic time-series dataset described later in this section, which, in our opinion, and likely also in the opinion of many experts, serves as one of the most convincing ways to test the utility of any model. Classification results here show that our hypothesis concerning the relevance of metabolic independence to microbial survival in stressed gut environments applies beyond the IBD case and includes antibiotic use, which is indeed a stronger validation for this hypothesis than any test we could have done on other IBD-related datasets.
  Regardless, we agree that any ‘broader’ utility of our model, such as its applications in clinical settings for diagnostic purposes, is something we certainly cannot make strong claims about without more data. We have therefore qualified this section by adding the following sentence:
  
  “Determining whether such a model has broader utility as a diagnostic tool requires further research and validation; however, these results demonstrate the potential of HMI as an accessible diagnostic marker of IBD.”
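
  The cross-validation procedure mentioned in the response above can be illustrated with a minimal sketch of how samples might be split into folds while keeping the healthy/IBD proportions similar in each fold. This is not the authors' implementation: the number of folds, the sample labels, and the function name are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives a similar share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 20 'healthy' and 10 'IBD' samples (not the study's data).
labels = ["healthy"] * 20 + ["IBD"] * 10
splits = list(stratified_k_fold(labels, k=5))
```

  Each sample appears in exactly one test fold, so a model trained on the remaining folds is always evaluated on samples it has not seen; a per-fold confusion matrix would then be computed from each test fold separately.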
  
  The application to the antibiotic intervention data raises additional concerns, as the model will predict IBD (labeled "stress" in Figure 5) where none exists.
  
  We apologize for this misunderstanding. The label “stress” actually means stress, not IBD. The figure the reviewer is referring to demonstrates that metabolic modules enriched in the gut microbiome of IBD patients are also temporarily enriched in the gut microbiome of healthy individuals treated with antibiotics for the duration of the treatment. While the classifier uses PPCN values for 33 metabolic modules enriched in microbiomes of IBD patients, it does not mean that this enrichment is exclusive to IBD. The classifier will distinguish between metagenomes in which the PPCN values for those 33 metabolic modules are higher and metagenomes in which the PPCN values are lower. Hence, our analysis demonstrates that during antibiotic usage in healthy individuals, the PPCN values of these 33 metabolic modules spike in a similar fashion to how they would in the gut community of a person with IBD. This points to a more general trend of high metabolic independence as a factor supporting microbial survival in conditions of stress; that is, the increase in metabolic independence is not specific to the IBD condition but rather a more generic ecological response to perturbations in the gut microbial community. We have clarified this point with the following addition to the paragraph summarizing these results:
  
  “All pre-treatment samples were classified as ‘healthy’ followed by a decline in the proportion of ‘healthy’ samples to a minimum 8 days post-treatment, and a gradual increase until 180 days post treatment, when over 90% of samples were classified as ‘healthy’ (Figure 5, Supplementary Table 4b). In other words, the increase in the HMI metric serves as an indicator of stress in the gut microbiome, regardless of whether that stress arises from the IBD condition or the application of antibiotics. These observations support the role of HMI as an ecological driver of microbial resilience during gut stress caused by a variety of environmental perturbations and demonstrate its diagnostic power in reflecting gut microbiome state.”
  
  We’ve also added the following sentence to the end of the legend for Figure 5:
  
  “Samples classified as ‘healthy’ by the model were considered to have ‘no stress’ (blue), while samples classified as ‘IBD’ were considered to be under ‘stress’ (red).”
  
  Figure S5A - should probably split this into 2 graphs since different data is analyzed.
  
  It is true that different sets of modules are used in either half of the figure; however, there is a significant amount of overlap between the sets (17 modules), which is why there are lines connecting the points for the same module as described in the figure legend. We are using this figure to make the point that the median PPCN value of each module increases, in both sets of modules, from the healthy sample group to the IBD sample group. Therefore, we believe the current presentation is appropriate.
  
  Figure S6A – this shows a substantial study effect and raises concerns about reproducibility.
  
  We examined potential batch effects in Supplementary Information File 1 (see section “Considerations of Batch Effect”), and found that any study effect was minor and overcome by the signal between groups:
  
  “The similar distribution of the median normalized copy number for each of the 33 IBD-enriched metabolic modules (summarized across all samples within a given study), across all studies within a given sample group (Supplementary Figure 6b), confirms that the sample group explains more of the trend than the study of origin.”
  
  Furthermore, within Supplementary Figure 6a, there is a clear increase between the non-IBD controls from Franzosa et al. 2018 and the IBD samples from the same study, as well as between the non-IBD controls from Schirmer et al. 2018 and the IBD samples from that study. As there is no study effect influencing those two comparisons, this reinforces the evidence that there is a true increase in the normalized copy numbers of these modules when comparing samples from more healthy individuals to those from less healthy individuals.
  
  Figure S7B - check numbers, which I think should sum to 33.
  
  The numbers should not sum to 33. In this test to determine whether the two largest studies had excessive influence on the identity of the IBD-enriched modules, we repeated our strategy to obtain 33 IBD-enriched modules (those with the 33 smallest p-values from the statistical test) from each set of samples – either (1) samples from Le Chatelier et al. 2013 and Vineis et al. 2016, or (2) samples that are not from those two studies. The 2 sets, containing 33 modules each, gives us a total of 66 IBD-enriched modules. By comparing those two sets, we found that 20 modules were present in both sets – hence the value of 20 in the center of the Venn Diagram. In each set, 13 modules were unique – hence the value of 13 on either side. 13 + 13 + 2*20 = 66 total modules.
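
  The counts in the Venn diagram follow from basic set arithmetic. A minimal sketch, using placeholder module names (not real KEGG identifiers) chosen only so that two sets of 33 overlap by 20:

```python
# Placeholder names standing in for the two top-33 module lists.
set_a = {f"module_{n:02d}" for n in range(1, 34)}   # modules 01..33
set_b = {f"module_{n:02d}" for n in range(14, 47)}  # modules 14..46

shared = set_a & set_b   # centre of the Venn diagram
only_a = set_a - set_b   # unique to the first set
only_b = set_b - set_a   # unique to the second set

# Each shared module is counted once per set, hence the factor of 2.
total_listed = len(only_a) + len(only_b) + 2 * len(shared)
distinct_modules = len(set_a | set_b)
```

  Note that the number of distinct modules across both sets is 13 + 13 + 20 = 46; the total of 66 counts each shared module twice, once for each set of 33.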
  
  We again thank our reviewers for their time and interest, and invaluable input.
  
URL

biorxiv.org/content/10.1101/2023.05.10.540289v3
www.biorxiv.org www.biorxiv.org

Synthesis and Biological Assessment of Chalcone and Pyrazoline Derivatives as novel inhibitor for ELF3-MED23 Interaction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Weaknesses:
  
  While the novel compound showed promising potency in HER2-positive gastric cancer cells and a xenograft model, it would be great for it to also be evaluated in HER2-positive breast cancer cell models. The authors did not compare the current compounds with other therapeutic strategies targeting HER2 expression at the genetic level. It is unclear why the EGFR inhibitors gefitinib and canertinib, but not HER2-specific inhibitors (e.g., tucatinib), were used as controls in the manuscript.
  
  We appreciate the reviewer’s insightful comments. Evaluating compound 10 on HER2-positive breast cancer cells is indeed crucial, especially given the established HER2-targeting therapies for breast cancer. In response to this concern, we conducted additional experiments to investigate the impact of compound 10 on HER2-positive breast cancer cell lines AU565 and BT474, specifically assessing its HER2 downregulating activity (Author response image 1).
  
  Author response image 1.
  
  HER2 downregulatory effect of compound 10 in HER2-positive breast cancer cell lines, AU565 and BT474.
  
  The selection of gefitinib (an EGFR tyrosine kinase inhibitor) and canertinib (a pan-HER inhibitor) as positive controls in our manuscript is based on their demonstrated ability to inhibit the protein-protein interaction (PPI) between ELF3 and MED23, as previously reported (J Adv Res. 47, (2023) 173-87. 10.1016/j.jare.2022.08.003; Cancer letters. 325, (2012) 72-9. 10.1016/j.canlet.2012.06.004). In the referenced studies, a SEAP reporter gene assay was used to screen compounds for their capacity to disrupt the ELF3-MED23 PPI. This assay involves GAL4-ELF3 binding to a GAL4 binding site in the SEAP reporter gene, followed by interaction with MED23, leading to RNA polymerase II recruitment and SEAP expression in cells (J Am Chem Soc. 2004, 126(49), 15940. doi: 10.1021/ja0445140). Canertinib exhibited stronger inhibitory activity against the ELF3-MED23 PPI than gefitinib, but also showed non-specific cytotoxicity. YK1 was subsequently developed based on structural analysis of the interfaces between gefitinib and MED23, and between ELF3 and MED23. Considering the previously validated inhibitory activities of gefitinib and canertinib, these drugs were selected as positive controls in the current study to compare the ELF3-MED23 inhibitory efficacy of novel compounds.
  
  Reviewer #1 (Recommendations For the Authors):
  
  (1) It is unclear how compound 5 did not inhibit HER2 overexpression at the mRNA level but did at the protein level, as compounds 3 and 10 did. Could the authors further explain the potential mechanism for compound 5?
  
  While the exact mechanism remains unclear, the results indicated that compound 5 likely affects the protein level of HER2 through somewhat non-specific mechanisms rather than by inhibiting the ELF3-MED23 PPI. Based on this assessment, compound 5 was excluded from further investigation.
  
  (2) The HER2 expression and its downstream signaling pathway assay are unclear about the approach. It needs to be included in the methods or supplementary.
  
  We investigated the ELF3-MED23 PPI inhibitory activity and its subsequent effect on HER2 downregulation using a comprehensive approach involving multiple techniques to ensure precise and unbiased experimental results.
  
  To assess PPI inhibition, we employed the following assays:
  
  · SEAP reporter gene assay
  
  · Fluorescence polarization (FP)
  
  · Split-luciferase complementation assay
  
  · GST-pulldown
  
  · Immunoprecipitation (IP)
  
  HER2 expression levels were evaluated through:
  
  · SEAP reporter gene assay
  
  · Luciferase promoter assay
  
  · Quantification of HER2 mRNA using qPCR
  
  · Measurement of HER2 protein levels via western blot analysis
  
  To evaluate downstream signaling of HER2, we analyzed:
  
  · Phosphorylation levels of MAPK (pMAPK) and AKT (pAKT)
  
  These methods were systematically applied to elucidate the mechanism of action of compound 10 in inhibiting ELF3-MED23 interaction and subsequently downregulating HER2.
  
  For clarity, we have revised the manuscript to provide a detailed description of the experimental methods to assess PPI, as described below.
  
  “SEAP assay was performed as previously described to measure ELF3-MED23 PPI-dependent HER2 transcription [29]. In this assay, the GAL4-ELF3 fusion protein binds to one of the five GAL4 binding sites on the reporter gene (pG4IL2SX). The interaction between the GAL4-ELF3 fusion protein and endogenous MED23 induces the expression of the SEAP. Once expressed, SEAP acts as a phosphatase on the substrate 4-MUP (4-methyl umbelliferyl phosphate), resulting in increased fluorescence. The mammalian expression vector, …”
  
  “FP assay was conducted following a previously described method to evaluate the molecular interaction between ELF3 and MED23 [29]. The FP assay operates on the principle of the molecular rotation dynamics. When a fluorescently labeled small molecule is excited by polarized light, the emitted fluorescence can be polarized or depolarized depending on the molecular status. Free small molecules rotate rapidly, altering the orientation of their fluorescence dipole and emitting depolarized light. However, when these small molecules bind to large molecules, such as proteins, the resulting complex rotates more slowly, and the emitted light retains much of its original polarization. In this study, different concentrations of (His)6-MED23(391–582), as the large molecule, and 10 nM of FITC-labeled ELF3(129–145) peptide, as the fluorescence-labeled small molecule, were combined in …”
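
  The polarization principle described in the quoted text is commonly quantified from the emission intensities measured parallel and perpendicular to the excitation plane. The sketch below uses the standard textbook definitions of polarization and anisotropy. The intensity values are invented for illustration, instrument corrections such as the G-factor are omitted, and nothing here is taken from the authors' analysis.

```python
def polarization(i_parallel, i_perpendicular):
    """Fluorescence polarization P = (I_par - I_perp) / (I_par + I_perp)."""
    total = i_parallel + i_perpendicular
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - i_perpendicular) / total


def anisotropy(i_parallel, i_perpendicular):
    """Fluorescence anisotropy r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    total = i_parallel + 2 * i_perpendicular
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - i_perpendicular) / total


# Invented intensities (arbitrary units) for a labeled peptide.
free_peptide = polarization(110.0, 90.0)   # fast rotation, mostly depolarized
bound_peptide = polarization(150.0, 50.0)  # slow rotation, polarization retained

# Polarization is often reported in millipolarization units (mP).
free_mp = 1000 * free_peptide
bound_mp = 1000 * bound_peptide
```

  In a competition format, an inhibitor that displaces the labeled peptide from the protein would be expected to shift the reading from the bound value back toward the free value.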
  
  (3) It is confusing to me about the order of the experiments, in which the SAR work came after the synthesis and a series of biochemical studies for the characterization of the synthetic compounds. What is the specific reason for this order?
  
  We concluded that the current approach is appropriate because the analysis was not intended for structural modification and optimization through SAR (Structure-Activity Relationship) analysis. Instead, the primary objective was to elucidate the structural basis underlying the efficacy of PPI inhibition among compounds sharing the same scaffold. We believe this will provide valuable insights for future design and synthesis of new compounds.
  
  (4) The yield for each step of the general synthesis needs to be included in the scheme 1.
  
  Scheme 1 has been updated to include the yield of each step of the synthesis process.
  
  (5) In line 532, the authors stated 28 compounds, should it be 26?
  
  ‘Twenty-eight compounds’ includes 26 newly synthesized compounds and 2 positive controls, gefitinib and canertinib.
  
  (6) Introduction part, lines 74 to 75, "While HER2 gene amplification is the primary mechanism responsible for HER2 overexpression" may not be confirmed in lung cancers.
  
  HER2 overexpression is usually a direct consequence of gene amplification, although overexpression can occur by other mechanisms [Nat Rev Cancer. 2009;9:463–475. doi: 10.1038/nrc2656.; Cell. 2007;129:1275–1286. doi: 10.1016/j.cell.2007.04.034.]. The levels of HER2 protein expression and gene amplification are linearly associated and highly concordant in breast cancer, colorectal cancer, ovarian cancer, and esophageal adenocarcinoma [World J Gastrointest Oncol. 2019, 11(4): 335–347. doi: 10.4251/wjgo.v11.i4.335; J Clin Oncol. 2002;20:719–26. doi.org/10.1200/JCO.2002.20.3.71; Oncology. 2001;61(Suppl 2):14–21. doi.org/10.1159/000055397; Science. 1989, 244(4905):707-12. doi: 10.1126/science.2470152; Cancer. 2014 Feb 1; 120(3): 415–424. doi: 10.1002/cncr.28435]. As the reviewer mentioned, the linear association between HER2 protein expression and gene amplification has not been fully established for NSCLC [ESMO Open. 2022, 100395. doi: 10.1016/j.esmoop.2022.100395].
  
  Therefore, we changed the sentence as described below.
  
  “While HER2 gene amplification is the primary mechanism responsible for HER2 overexpression in most HER2-positive cancers, except in lung cancer [16], high transcription rates of HER2 per gene copy have also been observed to contribute.”
  
  (7) The abstract part, lines 31 and 32, the detailed experimental data for SEAP needs to be expressed in another way.
  
  SEAP is a type of reporter gene assay. We revised the manuscript as follows and additionally described the assay in the Methods section.
  
  “Upon systematic analysis, candidate compound 10 was selected due to its potency in downregulating reporter gene activity of HER2 promoter confirmed by SEAP activity and its effect on HER2 protein and mRNA levels.”
  
  (8) The author should combine the box for Chalcone, pyrazoline, Licochalcone E, and YK-1, Figures 1 and 2 into a new single Figure.
  
  We revised the manuscript following the reviewer's comments.
  
  (9) Provide the list of antibodies and sources for the cell-based and western blot assays.
  
  Table S1 presents detailed information about the antibodies and dilution ratios used in the cell-based and western blot assays.
  
  Reviewer #2 (Public Reviews):
  
  Weaknesses:
  
  The rationale behind the proposed structural modifications for the three groups of compounds is not clear.
  
  Reviewer #2 (Recommendations For the Authors):
  
  (1) Based on previous work experience, it would be interesting to evaluate the in silico mode of interaction of compound 10.
  
  As suggested by the reviewers, we additionally performed an in silico docking study to identify the mode of interaction of compound 10 (Author response image 2). As shown below, the results indicate that compound 10 shares a similar binding orientation with YK1, forming an H-bond with the H449 residue. Although it does not interact with the D400 residue, it was predicted to create an additional H-bond with S450, which is right next to H449, thereby reinforcing the overall binding of compound 10 to MED23. Moreover, compound 10 was additionally predicted to form a pi-pi interaction with F399, which has been previously identified as an important interaction for compounds to demonstrate an outstanding PPI inhibitory effect against ELF3 and MED23.
  
  Author response image 2.
  
  Docking analysis of compound 10.
  
  (2) The chalcones presented in this study are structurally similar to those previously presented by the group (ref 29). In said work, most of the compounds exhibited activities with IC50 values between 1.3 and 3 μM, with inhibition values at 10 μM ranging between 80 and 90% in the SEAP assay. These results are similar to those observed in this paper for the same assay. Can an explanation be found?
  
  Chalcones are inherently flexible molecules, giving them a high chance of occupying critical hotspot residues within the binding interface of ELF3-MED23, irrespective of the side chains introduced to this moiety. However, depending on the type of side chains introduced, the overall drug-like properties of compounds can be significantly altered, while still maintaining their PPI inhibitory effect. The significance of this study lies in our effort to enhance metabolic stability through extensive introduction of methoxy groups and other hydrophobic side chains to the chalcone skeleton, while preserving high PPI inhibitory activity.
  
  (3) Is the replacement of H and OH by OMe necessary? Does it improve any property (activity, selectivity, bioavailability, solubility, etc.)? Regarding the derivatives of group 2, why did they decide to replace the O-H, which in silico demonstrated favorable hydrogen bond interactions with Asp400? How do these molecules look in the binding site? Perhaps this is a point to discuss since the substitution of OH led to the obtaining of inactive molecules, or is the effect due to substitution with the terminal aromatic ring with 3 OMe?
  
  We modified the hydroxyl group moiety of YK-1 into a methoxy group to reduce the polarity of the compound, thereby enhancing its cell membrane permeability (Author response image 3) and reducing the likelihood of rapid elimination through phase II metabolic pathways in vivo. Additionally, we considered the potential conversion of the methoxy group back to a hydroxyl group via phase I metabolism in vivo.
  
  Author response image 3.
  
  Impact of methoxy group introduction on the TPSA (topological polar surface area) of each molecule. The TPSA of each molecule containing the chalcone structure was calculated using the Molinspiration web server.
  
  (4) Lines 134 and 134: "Only compounds are in red."
  
  We revised the manuscript following the reviewer's comments.
  
  (5) Line 171: "Chalcone skeleton, shown in red."
  
  We revised the manuscript following the reviewer's comments.
  
  (6) Line 350: "N-1-acetyl-4,5-dihydropyrazoline."
  
  We revised the manuscript following the reviewer's comments.
  
  (7) Scheme 1. Replace "h" with "hr".
  
  We revised the manuscript following the reviewer's comments. Scheme 1 has been replaced by a new version.
  
  (8) Where is "Table S1" in SI?
  
  Tables S1 and S2 are supposed to be included in SI. We will ensure that Tables S1 and S2 are properly uploaded to the SI section.
  
  (9) In Figure 6, Graph D, to enhance comprehension, please incorporate red arrows indicating drug administration.
  
  We revised Figure 6 (D) following the reviewer's comments. Red arrows indicating drug administration have been incorporated, along with a descriptive comment "Drug administration" next to each arrow. Additionally, the figure legend now includes a clear description of these additions.
  
  Reviewer #3 (Public review):
  
  Weaknesses:
  
  The potency of compound 10 as a PPI inhibitor has been shown in only one cell line, NCI-N87.
  
  Reviewer #3 (Recommendations For the Authors):
  
  (1) The authors should show this compound 10 is effective in other gastric cancer cells like KATOIII, SNU1.
  
  We evaluated the HER2-downregulating activity of compound 10 in the gastric cancer cell line SNU216, which is confirmed to express a high level of HER2 protein (Author response image 4).
  
  Author response image 4.
  
  HER2 downregulatory effect of compound 10 in HER2-positive gastric cancer cell line, SNU216. (A) Expression levels of HER2 and ELF3 in various gastric cancer cell lines. (B) HER2 downregulation in the SNU216 cell line following treatment with compound 10.
  

URL

biorxiv.org/content/10.1101/2024.03.01.583029v2
www.biorxiv.org www.biorxiv.org

The nanoscale organization of the Nipah virus fusion protein informs new membrane fusion mechanisms

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this work by Wang et al., the authors use single-molecule super-resolution microscopy together with biochemical assays to quantify the organization of Nipah virus fusion protein F (NiV-F) on cell and viral membranes. They find that these proteins form nanoscale clusters that favor membrane fusion activation, and that the physical parameters of these clusters are unaffected by protein expression level and endosomal cleavage. Furthermore, they find that the cluster organization is affected by mutations in the trimer interface on the NiV-F ectodomain and the putative oligomerization motif on the transmembrane domain, and that the clusters are stabilized by interactions among NiV-F, the AP2-complex, and the clathrin coat assembly. This work improves our understanding of the NiV fusion machinery, which may have implications also for our understanding of the function of other viruses.
  
  Strengths:
  
  The conclusions of this paper are well-supported by the presented data. This study sheds light on the activation mechanisms underlying the NiV fusion machinery.
  
  Weaknesses:
  
  The authors provide limited details of the convolutional neural network they developed in this work. Even though custom-codes are made available, a description of the network and specifications of how it was used in this work would aid the readers in assessing its performance and applicability. The same holds for the custom-written OPTICS algorithm. Furthermore, limited details are provided for the imaging setup, oxygen scavenging buffer, and analysis for the single-molecule data, which limits reproducibility in other laboratories. The claim of 10 nm resolution is not backed up by data and seems low given the imaging conditions and fluorophores used. Fourier Ring Correlation analysis would have validated this claim. If the authors refer to localization precision rather than resolution, then this should be specified and appropriate data provided to support this claim.
  
  We thank reviewer 1 for these suggestions. We described key steps in the imaging setup, single-molecule data reconstruction, the OPTICS algorithm in cluster identification, and the 1D CNN in classification of the OPTICS data in the Materials and Methods section. We also provided a recipe for the imaging buffer. We refer to 10 nm localization precision rather than resolution. The localization precision achieved by our SMLM system is shown in Author response image 1.
  
  Author response image 1.
  
  The localization precision of the custom-built SMLM. The plot shows the distribution of localization errors (in nanometers) in the x (dX), y (dY), and z (dZ) directions for blinks generated from Alexa Fluor 647 labeling NiV-F expressed on the plasma membrane of PK13 cells. The lateral precision is < 10 nm and the axial precision is < 20 nm.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this manuscript, Wang and co-workers employ single-molecule localization microscopy (SMLM) to detect NiV fusion protein (NiV-F) on the surface of cells. They corroborate that these glycoproteins form microclusters (previously seen and characterized together with NiV-G and the Nipah matrix protein by Liu and co-workers (2018), also with super-resolution light microscopy). As also seen by Liu and co-workers, the authors show that neither the level of expression of NiV-F nor endosomal cleavage alters the identity of these microclusters. Moreover, mutations in the transmembrane domain or the hexamer-of-trimer interface seem to have a mild effect on the size of the clusters that the authors quantified.
  
  Importantly, it has also been shown that these particles tend to cluster in Nipah VLPs.
  
  We thank reviewer #2 for the comments and suggestions. This paper builds on Liu et al. 1 to further characterize the nanoclusters formed by NiV-F and their role in membrane fusion activation. While Liu et al. studied the NiV glycoprotein distribution at the NiV assembly sites to inform mechanisms in NiV assembly and release, Wang et al. analyzed the nano-organization and distribution of NiV-F in the prefusion conformation, providing insights into the membrane fusion activation mechanisms.
  
  Strengths:
  
  The authors have tried to perform SMLM in single VLPs and have shown partially the importance of NiV-F clustering.
  
  Weaknesses:
  
  The labelling strategy for the NiV-F is not sufficiently explained. The use of a FLAG tag in the extracellular domain should be validated and compared with the unlabelled WT NiV-F when expressed in functional pseudoviruses (for example HIV-1 based particles decorated with NiV-F). This experiment should also be carried out for both infection and fusion (including BlaM-Vpr as a readout for fusion). I would also suggest running a time-of-addition BlaM experiment to understand how this particular labelling strategy affects single virion fusion as compared to the WT.
  
  We thank reviewer #2 for this suggestion. We have made various efforts to validate the expression and function of FLAG-tagged NiV-F. The NiV-F-FLAG shows cell surface expression levels comparable to those of untagged NiV-F and induces similar cell-cell fusion levels in 293T cells 1. The NiV-F-FLAG also showed similar levels of virus entry as untagged NiV-F when both were pseudotyped on a recombinant Vesicular Stomatitis Virus (VSV) with the VSV glycoprotein replaced by a Renilla luciferase reporter gene (VSV-ΔG-rLuc; Fig. S1D). We also performed a virus entry kinetics assay using NiV VLPs expressing NiV-M-β-lactamase (NiV-M-Bla), NiV-G-HA, and NiV-F-FLAG, NiV-F-AU1 or untagged NiV-F. The intracellular AU1 tag is located at the C-terminus of NiV-F (Genbank accession no. AY816748.1). However, we detected different levels of NiV-M-Bla in equal volumes of VLPs, suggesting that the tags in NiV-F affect the budding of the VLPs (Author response image 2A). Therefore, we performed the fusion kinetics assay using VLPs expressing the same levels of NiV-M-Bla. Among them, the NiV-F-FLAG on VLPs shows the most efficient fusion between VLP and HEK293T cell membranes (Author response image 2B), significantly more efficient than that of untagged NiV-F and NiV-F-AU1. However, we cannot attribute the enhanced fusion activity to the FLAG tag, because the readout of this assay relies on both the levels of β-lactamase (introduced by NiV-M-Bla in VLPs) and the NiV-F constructs. The tags in NiV-F could affect both the budding of VLPs and the stoichiometry of F and M in individual VLPs. We did not use the HIV-based pseudovirus system because the incorporation of NiV-F into HIV pseudoviruses requires a C-terminal deletion 2,3.
  
  In summary, the FLAG tag does not affect cell-cell fusion 1 or virus entry when pseudotyped onto the recombinant VSV-ΔG-rLuc viruses (Fig. S1D). Given that we do not observe any difference in clustering between HA- and FLAG-tagged NiV-F constructs on the PK13 cell surface (Fig. S1A-C), we conclude that the FLAG tag has minimal effect on both the fusion activity and the nanoscale distribution of NiV-F.
  
  Author response image 2.
  
  Viral entry is not affected by labeling of NiV-F. A) Western blot analysis of NiV-M-Bla in NiV-VLPs generated by HEK293T cells expressing NiV-M-Bla, NiV-G-HA and NiV-F-FLAG, untagged NiV-F, or NiV-F-AU1. Equal volumes of VLPs were separated by denaturing 10% SDS–PAGE and probed against β-lactamase (SANTA CRUZ, sc-66062). B) NiV-VLPs generated from NiV-M-Bla, NiV-G-HA, and NiV-F-FLAG, untagged NiV-F or NiV-F-AU1 expression plasmids were bound to target HEK293T cells loaded with CCF2-AM dye at 4°C. The Blue/Green (B/G) ratio was measured at 37°C for 4 hrs at 3-min intervals. Results were normalized to the maximal B/G ratio of the NiV-F-FLAG NiV-VLPs. Results from one representative experiment out of three independent experiments are shown.
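
The normalization described in this legend can be sketched as follows. This is only an illustration, not the authors' analysis code: the function names `time_points` and `normalize_bg` and all numeric values are hypothetical, and the sketch assumes each condition is stored as a list of B/G ratios sampled at the same time points.

```python
def time_points(duration_min=240, interval_min=3):
    """Sampling times in minutes for a 4 h read at 3-min intervals (0, 3, ..., 240)."""
    return list(range(0, duration_min + 1, interval_min))


def normalize_bg(traces, reference="NiV-F-FLAG"):
    """Divide every B/G trace by the maximal B/G ratio of the reference trace."""
    ref_max = max(traces[reference])
    if ref_max <= 0:
        raise ValueError("reference trace must contain a positive B/G ratio")
    return {name: [v / ref_max for v in values] for name, values in traces.items()}


# Hypothetical B/G ratios (made-up numbers, for illustration only).
example = {
    "NiV-F-FLAG": [0.5, 1.0, 2.0],
    "untagged NiV-F": [0.5, 0.75, 1.0],
}
normalized = normalize_bg(example)
```

After this scaling the reference condition peaks at 1.0, so every other trace reads as a fraction of the maximal NiV-F-FLAG signal.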
  
  It would also be very important to compare the FLAG labelling approach with recent advances in the field (for instance incorporating noncanonical amino acids (ncAAs) into NiV-F by amber stop-codon suppression, followed by click chemistry).
  
  We are very thankful for this comment from reviewer #2. Labeling noncanonical amino acids (ncAAs) with bioorthogonal click chemistry is indeed a more precise labeling strategy than the traditional epitope labeling approach used in this paper. We will explore the applications of ncAA labeling in single-molecule localization imaging and virus-host interactions in future projects.
  
  In this paper, the FLAG tag inserted in NiV-F protein seems to have minimal effect on the NiV-F-induced virus entry and cell-cell fusion 1 (Fig. S1). Although the FLAG tag labeling approach may increase the detectable size of NiV-F nanoclusters due to the use of the antibody complex, it should not affect our conclusions drawn from the relative comparisons between wt and mutant NiV-F or control and drug-treated cells.
  
  The correlation between the existence of microclusters of a particular size and their functionality is missing. Only cell-cell fusion assays are shown in supplementary figures and clearly, single virus entry and fusion cannot be compared with the biophysics of cell-cell fusion. Not only the environment is completely different, membrane curvature and the number of NiV-F drastically varies also. Therefore, specific fusion assays (either single virus tracking and/or time-of-addition BlaM kinetics with functional pseudoviruses) are needed to substantiate this claim.
  
  We thank Reviewer 2 for the suggestion. To support the link between F clustering and virus-cell membrane fusion, we conducted pseudotyped virus entry and VLP fusion kinetics assays, as shown in revised Figure S4. The viral entry results (Fig. S4E and F) corroborate those of the cell-cell fusion assay (Fig. S4A and B) and previously published data 4. The assay confirmed that the real-time fusion kinetics were affected by mutations at the hexameric interface, with the hypo-fusogenic mutants L53D and V108D exhibiting reduced entry efficiency and the hyper-fusogenic mutant Q393L showing increased efficiency (Fig. S4G and H). The results are described in detail in the revised manuscript.
  
  Additionally, we performed a pseudotyped virus entry assay on the LI4A (Fig. S6F and G) and YA (Fig. S7F and G) mutants to verify the function of these mutants on viruses in the revised Supplemental Figures. Neither LI4A nor YA was incorporated into the VSV/NiV pseudotyped viruses, as shown by the Western blot analyses of the pseudovirions (Fig. S6F and S7F), and thus neither induced virus entry, consistent with the cell-cell fusion results (Fig. S6C, D and Fig. S7C, D). We did not perform the entry kinetics assay on these two mutants as they do not incorporate into VLPs or pseudovirions.
  
  The authors also claim they could not characterize the number of NiV-F particles per cluster. Another technique such as number and brightness (Digman et al., 2008) could support current SMLM data and identify the number of single molecules per cluster. Also, this technology does not require complex microscopy apparatus. I suggest they perform either confocal fluorescence fluctuation spectroscopy or TIRF-based nandb to validate the clusters and identify how many molecules are present in these clusters.
  
  We thank reviewer 2 for this suggestion. Determining the true copy number of NiV-F in individual clusters could verify whether the F clusters on the plasma membrane are hexamer-of-trimer assemblies. Regardless, it does not affect our conclusion that the organization of NiV-F into nanoclusters affects the membrane fusion triggering ability. Confocal fluorescence fluctuation spectroscopy (FFS) and TIRF-based analyses are accessible tools for quantifying fluorophore copy numbers and/or stoichiometry based on fluorescence fluctuation or photobleaching. However, these methods are unable to quantify the number of proteins in individual clusters because they analyze fluorophores either in the entire cell (as in wide-field epifluorescence microscopy coupled with FFS and TIRF-coupled photobleaching) 5–7 or within a large excitation volume (confocal laser scanning microscopy-coupled FFS) 8. Both of these volumes are significantly larger than a single NiV-F cluster, which has an average diameter of 24-26 nm (Fig. 1F).
  
  The current SMLM setup is useful for characterizing protein distribution and organization. However, quantifying the true protein copy number within a nanocluster is challenging because of the stochasticity of fluorophore blinking and the unknown labeling stoichiometry 9–11. To address the challenge of fluorophore blinking, quantitative DNA-PAINT (qDNA-PAINT) may be used because the on-off frequency of the fluorophores is tied to the well-defined kinetic constants of DNA binding and the influx rate of the imager strands, rather than the stochasticity of fluorophore blinking. Thus, the frequency of blinks can be translated into protein counts 12. To address the challenge of unknown labeling stoichiometry, DNA origami can be used as a calibration standard 11. DNA origami supports handles at regular spacings of several to tens of nanometers, and the handles can be conjugated with a defined number of proteins of interest. The copy number of the protein of interest in the experimental group can be determined by comparing the SMLM localization distribution of the sample to that of the DNA origami calibration standard. Given the requirement for a more sophisticated SMLM setup and a high-precision calibration tool, we will explore the quantification of NiV-F copy numbers in nanoclusters in a future project.
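
The two counting ideas in this paragraph can be illustrated with a short sketch. It is not part of the authors' pipeline. The first function assumes the commonly used qPAINT-style relation in which the mean dark time between binding events scales inversely with the number of docking sites, N = 1 / (k_on × c × mean dark time); the second reduces the DNA origami calibration to a simple ratio of localization counts, whereas a real analysis would compare full distributions. All function names and numeric values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sites_from_dark_times(dark_times_s, k_on_per_molar_s, imager_conc_molar):
    """Estimate docking sites from dark times: N = 1 / (k_on * c * mean dark time)."""
    return 1.0 / (k_on_per_molar_s * imager_conc_molar * mean(dark_times_s))


def copies_from_calibration(cluster_localizations, calib_localizations_per_protein):
    """Estimate copy number as cluster localizations / mean localizations per protein.

    The calibration values would come from a standard carrying a known number
    of proteins (for example a DNA origami); this ratio is a simplification.
    """
    return cluster_localizations / mean(calib_localizations_per_protein)


# Hypothetical inputs: k_on = 1e6 /M/s, 5 nM imager, dark times around 20 s.
n_sites = sites_from_dark_times([18.0, 20.0, 22.0], 1e6, 5e-9)
n_copies = copies_from_calibration(120, [11, 12, 13])
```

With these made-up inputs both estimates come out at roughly ten copies per cluster; the point is only that the dark-time route depends on known binding kinetics while the calibration route depends on a reference standard.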
  
  Also, it is not clear how many cells the authors employ for their statistics (at least 30-50 cells should be employed, rather than considering the number of blinking events). I hope the authors are not considering only a single cell to run their stats... The differences between the mutants and the NiV-F are minor even if their statistical analyses give a difference (they should average the number and size of the clusters per cell for a total of 30-50 cells with experiments performed at least in three different cells following the same protocol). Overall, it seems that the authors have only evaluated a very low number of cells.
  
  We disagree with this comment from Reviewer #2. The sample size for cluster analysis in SMLM images was chosen by considering the target of the study (cells and VLPs) and the data acquisition and analysis standards in the SMLM imaging field. We also noted the sample size (# of ROI and cells) in the figure legend.
  
  Below, we compare the sample sizes in our study to those in similar studies from 2015 to 2024 that used comparable imaging and cluster analysis methods. The classical clustering analysis methods are categorized into global clustering (e.g. nearest neighbor analysis, Ripley’s K function, and pair correlation function) and complete clustering, such as density-based analysis (e.g. DBSCAN, Superstructure, FOCAL, ToMATo) and tessellation-based analysis (e.g. Delaunay triangulation, Voronoi tessellation). The global clustering analysis method provides spatial statistics for global protein clustering or organization (e.g. clustering extent), while the complete clustering approach extracts information at the single-cluster level, such as the morphology and localization density of individual clusters. We used the density-based analyses, DBSCAN and OPTICS, for cluster analysis on cell plasma membranes and VLP membranes.
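
To make the density-based approach concrete, here is a minimal pure-Python sketch of DBSCAN applied to 2D localization coordinates. It is a textbook-style illustration, not the authors' custom code, and the `eps` and `min_pts` values and the coordinates (in nm) are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_neighbors)
    return labels


# Hypothetical localizations (nm): two tight groups and one isolated blink.
locs = [(0, 0), (5, 0), (0, 5), (5, 5),
        (100, 100), (105, 100), (100, 105), (105, 105),
        (300, 300)]
labels = dbscan(locs, eps=10, min_pts=3)
```

Per-cluster quantities such as localization count or diameter can then be computed from the points sharing a label, which is what distinguishes this single-cluster view from global statistics like Ripley's K function.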
  
  Author response table 1.
  
  The comparison of imaging methods, analysis methods, and sample size in the current study to other studies conducted from 2015 to 2024.
  
  They should also compare the level of expression (with the number of molecules per cell provided by number and brightness) with the total number of clusters.
  
  We thank reviewer 2 for this suggestion. We compared the level of expression with the total number of clusters for F-WT in Figure 1I in the main text.
  
  The same applies to the VLP assay. I assume the authors have only taken VLPs expressing both NiV-M and NiV-F (and NiV-G). But even if this is not clearly stated I would urge the authors to show how many viruses were compared per condition (normally I would expect 300 particles per condition coming from three independent experiments). As a negative control to evaluate the cluster effect I would mix the different conditions. Clearly you have clusters with all conditions and the differences in clustering depending on each condition are minimal. Therefore you need to increase the n for all experiments.
  
  We thank reviewer 2 for this comment. We acquired and analyzed more images of NiV VLPs bearing F-WT, Q393L, L53D, and V108D. Results are shown in the revised Figure 4 and the number of VLPs (>300) used for analysis is specified in the figure legend. An increased number of VLP images does not affect the classification result in Figure 4C.
  
  As for the suggestion on “evaluating the cluster effect at different mixed conditions”, we assume that reviewer 2 would like to see how the presence of different viral structural proteins (F, M, and G) on VLPs could affect F clustering. We showed by direct visualization that the organization of NiV envelope proteins on the VLP membrane is similar in the presence or absence of NiV-M 27, suggesting that the effect of NiV-M on F-WT clustering on VLPs is minimal. We also show comparable incorporation of NiV-F among the NiV-F hexamer-of-trimer mutants (Fig. 4A). Therefore, we did not test F clustering at different F, M, and G combinations in this paper. However, this could be an interesting question to pursue in a paper focusing on NiV VLP production.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The manuscript by Wang and colleagues describes single molecule localization microscopy to quantify the distribution and organization of Nipah virus F expressed on cells and on virus-like particles. Notably the crystal structure of F indicated hexameric assemblies of F trimers. The authors propose that F clustering favors membrane fusion.
  
  Strengths:
  
  The manuscript provides solid data on imaging of F clustering with the main findings of:
  
  - F clusters are independent of expression levels
  
  - Proteolytic cleavage does not affect F clustering
  
  - Mutations that have been reported to affect the hexamer interface reduce clustering on cells and its distribution on VLPs
  
  - F nanoclusters are stabilized by AP
  
  Weaknesses:
  
  The relationship between F clustering and fusion is per se interesting, but looking at F clusters on the plasma membrane does not exclude that F clustering occurs for budding. Many viral glycoproteins cluster at the plasma membrane to generate micro domains for budding.
  
  This does not exclude that these clusters include hexamer assemblies or clustering requires hexamer assemblies.
  
  We thank reviewer #3 for this question. We did not focus on the role of NiV-F clusters in budding in the current manuscript, although this is an interesting topic to pursue. In this manuscript, we observed that NiV VLP budding is decreased for some cluster-disrupting mutants, such as F-YA and F-LI4A. However, F-V108D showed increased budding compared to F-WT (Fig. 4A). We also observed that VLPs and VSV/NiV pseudoviruses expressing L53D have little NiV-G (Fig. 4A, Fig. S4F and S4H), although the incorporation level of L53D is comparable to that of wt F in both VLPs and pseudovirions (Fig. 4A and Fig. S4F). L53D is a hypofusogenic mutant with decreased clustering ability. Therefore, our current data do not show a clear link between F clustering and NiV VLP budding or glycoprotein incorporation.
  
  We reported that both NiV-F and -M form clusters at the plasma membrane, although NiV-F clusters are not enriched at the NiV-M-positive membrane domains 1. This result indicates that NiV-M is the major driving force for assembly and budding, while NiV-F is passively incorporated into the assembly sites. The central role of NiV-M in budding is also supported by a recent study showing that NiV-M induces membrane curvature by binding to PI(4,5)P2 in the inner leaflet of the plasma membrane 28. However, the expression of NiV-F alone induces the production of vesicles bearing NiV-F 29, and NiV-F recruits vesicular trafficking and actin cytoskeleton factors to VLPs either alone or in combination with NiV-G and -M, indicating a potential autonomous role in budding 30. Additionally, several electron microscopy studies show that the paramyxovirus F forms a 2D lattice interspersed above the M lattice, suggesting the participation of F in virus assembly and budding. In short, the evidence above suggests that NiV-F may play a role in budding, but our data cannot correlate NiV-F clustering with budding.
  
  Assuming that the clusters are important for entry, hexameric clusters are not unique to Nipah virus F. Similar hexameric clusters have been described for the HEF on influenza virus C particles (Halldorsson et al 2021) and env organization on Foamy virus particles (Effantin et al 2016), both with specific interactions between trimers. What is the organization of F on Nipah virus particles? If F requires to be hexameric for entry, this should be easily imaged by EM on infectious or inactivated virus particles.
  
  We thank reviewer #3 for this suggestion. The hexamer-of-trimer NiV-F is observed on the VLP surface by electron tomography 4. The NiV-F hexamer-of-trimers are arranged into a soccer ball-like structure, with one trimer being part of multiple hexamer-of-trimers. The implication of NiV-F clusters in virus entry and the potential mechanism for NiV-F higher-order structure formation are discussed in the revised manuscript.
  
  AP stabilization of the F clusters is curious if the clusters are solely required for entry? Virus entry does not recruit the clathrin machinery. Is it possible that F clusters are endocytosed in the absence of budding?
  
  We thank reviewer #3 for this question. The evidence from the current study does not exclude a role for NiV-F clustering in virus budding. NiV-F is known to be endocytosed in virus-producing cells for cleavage by cathepsin B or L in endocytic compartments in a pH-dependent manner 31–33, in the absence of budding. However, given that both cleaved and uncleaved NiV-F have an endocytosis signal sequence in the cytoplasmic tail and are able to interact with AP-2 for endosome assembly, and that cleaved and uncleaved F may have similar clustering patterns (Fig. 2), we do not think NiV-F clustering is specifically regulated for the cleavage of NiV-F. A plausible hypothesis is that NiV-F clusters are stabilized by multiple intrinsic factors (e.g. trimer interface) and host factors (e.g. AP-2) on the cell membrane for cell-cell fusion and virus budding. We linked the clustering to the fusion ability of NiV-F in this study, but NiV-F clustering may also be important in facilitating virus budding. Once in the viruses, the higher-order assembly of the clusters (e.g. lattice) may form due to protein enrichment, and the cell factors may not be the major maintenance force.
  
  Clusters are required for budding.
  
  Other points:
  
  Fig. 3: Some of the V108D and L53D clusters look similar in size to wt clusters. It seems that the interaction is important but not absolutely essential. Would a double mutant abrogate clustering completely?
  
  We thank Reviewer #3 for the suggestion. We generated a double mutant of NiV-F with L53D and V108D (NiV-F-LV) and assessed its expression and processing. Although the mutant retained processing capability, it exhibited minimal surface expression, making it unfeasible to analyze its nano-organization on the cell or viral membrane.
  
  Author response image 4.
  
  The expression and fusion activity of FLAG-tagged NiV-F and NiV-F L53D-V108D (LV). (A) Representative western blot analysis of NiV-F-WT and LV in the cell lysate of 293T cells. 293T cells were transfected with NiV-F-WT or the LV mutant. The empty vector was used as a negative control. The cell lysates were analyzed by SDS-PAGE followed by western blotting at 28 hrs post-transfection. F0 and F2 were probed with the M2 monoclonal mouse anti-FLAG antibody. GAPDH was probed with monoclonal mouse anti-GAPDH. (B) Representative images of 293T cell-cell fusion induced by NiV-G and NiV-F-WT or NiV-F-LV. 293T cells were co-transfected with plasmids coding for NiV-G and empty vector (NC) or NiV-F constructs. Cells were fixed at 18 hrs post-transfection. Arrows point to syncytia. Scale bar: 10 μm. (C) Relative cell-cell fusion levels in 293T cells in (B). Five fields per experiment were counted from three independent experiments. Data are presented as mean ± SEM. (D) The cell surface expression levels of NiV-F-WT and NiV-F-LV in 293T cells measured by flow cytometry. Mean fluorescence intensity (MFI) values were calculated with FlowJo and normalized to that of F-WT. Data are presented as mean ± SEM of three independent experiments. Statistical significance was determined by the unpaired t-test with Welch’s correction (*P<0.05, **P<0.01, ***P<0.001, ****P<0.0001). Values were compared to that of the NiV-F-WT.
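
For readers unfamiliar with the statistics named in this legend, the sketch below computes a mean ± SEM and the Welch t statistic with its Welch–Satterthwaite degrees of freedom. It is a generic illustration with made-up numbers, not the authors' data or software; the function names are hypothetical. The P value would then be read from a t distribution with the returned degrees of freedom, which is what `scipy.stats.ttest_ind(a, b, equal_var=False)` does.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def welch_t(a, b):
    """Return (t, df) for an unpaired t-test that does not assume equal variances."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of group a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical triplicate measurements for two conditions.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(group_a, group_b)
```

Because the two groups here have different variances, the degrees of freedom fall below the n1 + n2 - 2 = 4 that a pooled-variance Student's t-test would use.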
  
  Fig. 4: The distribution of F on VLPs should be confirmed by cryoEM analyses. This would also confirm the symmetry of the clusters. The manuscript by Chernomordik et al. JBC 2004 showed that influenza HA outside the direct contact zone affects fusion, which could be further elaborated in the context of F clusters and the fusion mechanism.
  
  We thank reviewer 3 for this suggestion. The distribution of F on VLPs was resolved by electron tomography, which showed that the NiV-F hexamer-of-trimers are arranged into a soccer ball-like structure 4. The role of influenza HA outside of the contact zone in fusion activation is an interesting phenomenon. It may address the energy transmission within and among clusters. We will pursue this topic in a future project.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  • Please define all used abbreviations throughout the manuscript and in the SI.
  
  We defined the abbreviations at their first usage.
  
  • The sentence starting with "Additionally, ..." on line 155 appears to be incomplete.
  
  We corrected this sentence.
  
  • The statement starting with "As reported, ..." on line 181 should be supported by a reference.
  
  We added a reference.
  
  • In Fig. 4C, it is unclear what the x and y axes represent.
  
  Fig. 4C is a t-SNE plot for visualizing high-dimensional data in a low-dimensional space. It maintains the local data structure but does not represent exact quantitative relationships. In other words, points that are close together in Fig. 4C are also close in the high-dimensional space, meaning the OPTICS plots, which reflect the clustering patterns, are similar for two points that are positioned near each other in Fig. 4C. Therefore, the x and y axes do not represent the original, quantitative data, and thus the axis titles are meaningless.
  
  • The reference on line 306 appears to be unformatted.
  
  We reformatted the reference.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The authors need to include the overall statistics for each experiment (at least 30 to 50 cells with three independent experiments are needed).
  
  We highlighted the sample size (number of ROI and number of cells) used for analysis in the figure legend. The determination of the sample size is justified in Table 1 in the response letter.
  
  The authors need to generate a functional pseudovirus system (for example HIVpp/NiV F) to run both infectivity and fusion experiments (including Vpr-BlaM assay).
  
  We tested viral entry using a VSV/NiV pseudovirus system and the viral entry kinetics using VLPs expressing NiV-M-β-lactamase. The results are presented in Fig. S1, S4, S6, and S7.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Even low resolution EM data on VLPs or viruses would strengthen the conclusions.
  
  We thank this reviewer for the suggestion. We cited the NiV VLP images acquired by electron tomography 4, but we currently have limited resources to perform cryoEM on NiV VLPs.
  
  References.
  
  (1) Liu, Q., Chen, L., Aguilar, H. C. & Chou, K. C. A stochastic assembly model for Nipah virus revealed by super-resolution microscopy. Nature Communications 9, 3050 (2018).
  
  (2) Khetawat, D. & Broder, C. C. A Functional Henipavirus Envelope Glycoprotein Pseudotyped Lentivirus Assay System. Virology Journal 7, 312 (2010).
  
  (3) Palomares, K. et al. Nipah Virus Envelope-Pseudotyped Lentiviruses Efficiently Target ephrinB2-Positive Stem Cell Populations In Vitro and Bypass the Liver Sink When Administered In Vivo. J Virol 87, 2094–2108 (2013).
  
  (4) Xu, K. et al. Crystal Structure of the Pre-fusion Nipah Virus Fusion Glycoprotein Reveals a Novel Hexamer-of-Trimers Assembly. PLoS Pathog 11, e1005322 (2015).
  
  (5) Bakker, E. & Swain, P. S. Estimating numbers of intracellular molecules through analysing fluctuations in photobleaching. Sci Rep 9, 15238 (2019).
  
  (6) Nayak, C. R. & Rutenberg, A. D. Quantification of Fluorophore Copy Number from Intrinsic Fluctuations during Fluorescence Photobleaching. Biophys J 101, 2284–2293 (2011).
  
  (7) Salavessa, L. & Sauvonnet, N. Stoichiometry of Receptors at the Plasma Membrane During Their Endocytosis Using Total Internal Reflection Fluorescent (TIRF) Microscopy Live Imaging and Single-Molecule Tracking. in Exocytosis and Endocytosis: Methods and Protocols (eds. Niedergang, F., Vitale, N. & Gasman, S.) 3–17 (Springer US, New York, NY, 2021). doi:10.1007/978-1-0716-1044-2_1.
  
  (8) Slenders, E. et al. Confocal-based fluorescence fluctuation spectroscopy with a SPAD array detector. Light Sci Appl 10, 31 (2021).
  
  (9) Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Identification of clustering artifacts in photoactivated localization microscopy. Nat Methods 8, 527–528 (2011).
  
  (10) Baumgart, F. et al. Varying label density allows artifact-free analysis of membrane-protein nanoclusters. Nat Methods 13, 661–664 (2016).
  
  (11) Zanacchi, F. C. et al. A DNA origami platform for quantifying protein copy number in super-resolution. Nat Methods 14, 789–792 (2017).
  
  (12) Jungmann, R. et al. Multiplexed 3D cellular super-resolution imaging with DNA-PAINT and Exchange-PAINT. Nature Methods 11, 313–318 (2014).
  
  (13) Rubin-Delanchy, P. et al. Bayesian cluster identification in single-molecule localization microscopy data. Nat Methods 12, 1072–1076 (2015).
  
  (14) Griffié, J. et al. 3D Bayesian cluster analysis of super-resolution data reveals LAT recruitment to the T cell synapse. Sci Rep 7, 4077 (2017).
  
  (15) Griffié, J. et al. Dynamic Bayesian Cluster Analysis of Live-Cell Single Molecule Localization Microscopy Datasets. Small Methods (2018). https://onlinelibrary.wiley.com/doi/full/10.1002/smtd.201800008.
  
  (16) Caetano, F. A. et al. MIiSR: Molecular Interactions in Super-Resolution Imaging Enables the Analysis of Protein Interactions, Dynamics and Formation of Multi-protein Structures. PLOS Computational Biology 11, e1004634 (2015).
  
  (17) Malkusch, S. & Heilemann, M. Extracting quantitative information from single-molecule super-resolution imaging data with LAMA – LocAlization Microscopy Analyzer. Sci Rep 6, 34486 (2016).
  
  (18) Zhang, Y., Lara-Tejero, M., Bewersdorf, J. & Galán, J. E. Visualization and characterization of individual type III protein secretion machines in live bacteria. Proceedings of the National Academy of Sciences 114, 6098–6103 (2017).
  
  (19) Tobin, S. J. et al. Single molecule localization microscopy coupled with touch preparation for the quantification of trastuzumab-bound HER2. Sci Rep 8, 15154 (2018).
  
  (20) Levet, F. et al. SR-Tesseler: a method to segment and quantify localization-based super-resolution microscopy data. Nature Methods 12, 1065–1071 (2015).
  
  (21) Peters, R., Griffié, J., Burn, G. L., Williamson, D. J. & Owen, D. M. Quantitative fibre analysis of single-molecule localization microscopy data. Sci Rep 8, 10418 (2018).
  
  (22) Levet, F. et al. A tessellation-based colocalization analysis approach for single-molecule localization microscopy. Nat Commun 10, (2019).
  
  (23) Banerjee, C. et al. ULK1 forms distinct oligomeric states and nanoscopic structures during autophagy initiation. Science Advances 9, eadh4094 (2023).
  
  (24) Pageon, S. V. et al. Functional role of T-cell receptor nanoclusters in signal initiation and antigen discrimination. Proceedings of the National Academy of Sciences 113, E5454–E5463 (2016).
  
  (25) Cresens, C. et al. Flat clathrin lattices are linked to metastatic potential in colorectal cancer. iScience 26, 107327 (2023).
  
  (26) Seeling, M. et al. Immunoglobulin G-dependent inhibition of inflammatory bone remodeling requires pattern recognition receptor Dectin-1. Immunity 56, 1046-1063.e7 (2023).
  
  (27) Liu, Q. T. et al. The nanoscale organization of Nipah virus matrix protein revealed by super-resolution microscopy. Biophysical Journal 121, 2290–2296 (2022).
  
  (28) Norris, M. J. et al. Measles and Nipah virus assembly: Specific lipid binding drives matrix polymerization. Science Advances 8, eabn1440 (2022).
  
  (29) Patch, J. R. et al. The YPLGVG sequence of the Nipah virus matrix protein is required for budding. Virol. J. 5, 137 (2008).
  
  (30) Johnston, G. P. et al. Nipah Virus-Like Particle Egress Is Modulated by Cytoskeletal and Vesicular Trafficking Pathways: a Validated Particle Proteomics Analysis. mSystems 4, e00194-19 (2019).
  
  (31) Diederich, S. et al. Activation of the Nipah Virus Fusion Protein in MDCK Cells Is Mediated by Cathepsin B within the Endosome-Recycling Compartment. J Virol 86, 3736–3745 (2012).
  
  (32) Diederich, S., Thiel, L. & Maisner, A. Role of endocytosis and cathepsin-mediated activation in Nipah virus entry. Virology 375, 391–400 (2008).
  
  (33) Pager, C. T., Craft, W. W., Patch, J. & Dutch, R. E. A mature and fusogenic form of the Nipah virus fusion protein requires proteolytic processing by cathepsin L. Virology 346, 251–257 (2006).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.07.579372v2
www.biorxiv.org www.biorxiv.org

New submission 30/11/2023, 07:52:15

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The regulation of motor autoinhibition and activation is essential for efficient intracellular transport. This manuscript used biochemical approaches to explore two members in the kinesin-3 family. They found that releasing UNC-104 autoinhibition triggered its dimerization whereas unlocking KLP-6 autoinhibition is insufficient to activate its processive movement, which suggests that KLP-6 requires additional factors for activation, highlighting the common and diverse mechanisms underlying motor activation. They also identified a coiled-coil domain crucial for the dimerization and processive movement of UNC-104. Overall, these biochemical and single-molecule assays were well performed, and their data support their statements. The manuscript is also clearly written, and these results will be valuable to the field.
  
  Thank you very much!
  
  Ideally, the authors can add some in vivo studies to test the physiological relevance of their in vitro findings, given that the lab is very good at worm genetic manipulations. Otherwise, the authors should speculate the in vivo phenotypes in their Discussion, including E412K mutation in UNC-104, CC2 deletion of UNC-104, D458A in KLP-6.
  
  We have previously shown the phenotype of the unc-104(E412K) mutation in C. elegans (Niwa et al., Cell Rep, 2016) and described it in the Discussion (p.14 line 3-4). The mutant worm showed overactivation of UNC-104-dependent axonal transport, which is consistent with our biochemical data showing that UNC-104(1-653)(E412K) is prone to form a dimer and is more active than wild type.
  
  It has been shown that the L640F mutation induces a loss-of-function phenotype in C. elegans (Cong et al., 2021). The amount of axonal transport is reduced in unc-104(L640F) mutant worms. L640 is located within the CC2 domain. To show the importance of CC2-dependent dimerization for axonal transport in vivo, we biochemically investigated the impact of the L640F mutation.
  
  We introduced L640F into UNC-104(1-653)(E412K) and performed SEC analysis. The result shows that UNC-104(1-653)(E412K,L640F) failed to form stable dimers despite the release of autoinhibition (new Figure S8). This result strongly suggests the importance of the CC2 domain for axonal transport in vivo. We discuss this result in the revised manuscript (p.13 line 6-8).
  
  Regarding KLP-6(D458A), a genetic analysis using genome editing is needed, and we would like to reserve it for a future study. We speculate that the D458A mutation could lead to an increase in transport activity in vivo, similar to unc-104(E412K). This is because a previous study has shown that wild-type KLP-6 was largely localized in the cell body, while KLP-6(D458A) was enriched at the cell periphery in N2A cells (Wang et al., 2022). We described this in the Discussion (p.14 line 13-14).
  
  While beyond the scope of this study, can the author speculate on the candidate for an additional regulator to activate KLP-6 in C. elegans?
  
  The heterodimeric mechanoreceptor complex, comprising LOV-1 and PKD-2, is a potential candidate for regulating KLP-6 dimerization. We speculate that its heterodimeric nature is suited to enhancing KLP-6 dimerization. On the other hand, it is noteworthy that KLP-6 can be activated in Neuro 2a cells upon the release of autoinhibition (Wang et al., 2022). This observation implies that additional factors, which are not present in Sf9 cells, may be able to induce dimerization. Post-translational modifications would be one of the candidates. We discussed this on p.14, lines 7-14.
  
  The authors discussed the differences between their porcine brain MTs and Chlamydomonas axonemes in UNC-104 assays. However, the authors did not really retest UNC-104 on axonemes after more than two decades, thereby not excluding other possibilities.
  
  We think that comparing the different conditions used in different studies is essential for the advancement of the field of molecular motors. Therefore, we performed new single-molecule assays using Chlamydomonas axonemes and compared the results with brain MTs (Fig. S6). Just as observed in the study by Tomishige et al., we were also unable to observe processive runs of UNC-104(1-653) on Chlamydomonas axonemes (Fig. S6A). Furthermore, we found that the landing rate of UNC-104(1-653) on Chlamydomonas axonemes was markedly lower than that on purified porcine microtubules (Fig. S6B).
  
  Reviewer #1 (Recommendations For The Authors):
  
  More discussion as suggested above would improve the manuscript.
  
  We have improved our manuscript as described above.
  
  Reviewer #2 (Public Review):
  
  The Kinesin superfamily motors mediate the transport of a wide variety of cargos which are crucial for cells to develop into unique shapes and polarities. Kinesin-3 subfamily motors are among the most conserved and critical classes of kinesin motors which were shown to be self-inhibited in a monomeric state and dimerized to activate motility along microtubules. Recent studies have shown that different members of this family are uniquely activated to undergo a transition from monomers to dimers.
  
  Niwa and colleagues study two well-described members of the kinesin-3 superfamily, unc104 and KLP6, to uncover the mechanism of monomer to dimer transition upon activation. Their studies reveal that although Unc104 and KLP6 are both self-inhibited monomers, their propensities for forming dimers are quite different. The authors relate this difference to a region in the molecules called CC2 which has a higher propensity for forming homodimers. Unc104 readily forms homodimers if its self-inhibited state is disabled while KLP6 does not.
  
  The work suggests that although mechanisms for self-inhibited monomeric states are similar, variations in the kinesin-3 dimerization may present a unique form of kinesin-3 motor regulation with implications on the forms of motility functions carried out by these unique kinesin-3 motors.
  
  Thank you very much!
  
  Reviewer #2 (Recommendations For The Authors):
  
  The work is interesting but the process of making constructs and following the transition from monomers to dimers seems to be less than logical and haphazard. Recent crystallographic studies for kinesin-3 have shown the fold and interactions for all domains of the motor leading to the self-inhibited state. The mutations described in the manuscript leading to disabling of the monomeric self-inhibited state are referenced but not logically explained in relation to the structures. Many of the deletion constructs could also present other defects that are not presented in the mutations. The above issues prevent wide audience access to understanding the studies carried out by the authors.
  
  We appreciate this comment. We improved it as described below.
  
  Suggestions: Authors should present schematic, or structural models for the self-inhibited and dimerized states. The conclusions of the papers should be related to those models. The mutations should be explained with regard to these models and that would allow the readers easier access. Improving access to the readers in and outside the motor field would truly improve the impact of the manuscript on the field.
  
  The structural models illustrating the autoinhibited state have been included in new Figure S4, accompanied by an explanation of the correlation between the mutations and these structures in the figure legend. Additionally, schematic models outlining the dimerization process of both UNC-104 and KLP-6 have been provided in Figure S9 to enhance reader comprehension of the process.
  
  Reviewer #3 (Public Review):
  
  In this work, Kita et al., aim to understand the activation mechanisms of the kinesin-3 motors KLP-6 and UNC-104 from C. elegans. As with many other motor proteins involved in intracellular transport processes, KLP-6 and UNC-104 motors suppress their ATPase activities in the absence of cargo molecules. Relieving the autoinhibition is thus a crucial step that initiates the directional transport of intracellular cargo. To investigate the activation mechanisms, the authors make use of mass photometry to determine the oligomeric states of the full-length KLP-6 and the truncated UNC-104(1-653) motors at sub-micromolar concentrations. While full-length KLP-6 remains monomeric, the truncated UNC-104(1-653) displays a sub-population of dimeric motors that is much more pronounced at high concentrations, suggesting a monomer-to-dimer conversion. The authors push this equilibrium towards dimeric UNC-104(1-653) motors solely by introducing a point mutation into the coiled-coil domain and ultimately unleashing a robust processivity of the UNC-104 dimer. The authors find that the same mechanistic concept does not apply to the KLP-6 kinesin-3 motor, suggesting an alternative activation mechanism of the KLP-6 that remains to be resolved. The present study encourages further dissection of the kinesin-3 motors with the goal of uncovering the main factors needed to overcome the 'self-inflicted' deactivation.
  
  Thank you very much!
  
  Reviewer #3 (Recommendations For The Authors):
  
  126-128: It is surprising that surface-attachment does not really activate the full-length KLP6 motor (v = 48 ± 42 nm/s). Can the authors provide an example movie of the gliding assay for the FL KLP6 construct? Gliding assays are done by attaching motors via their sfGFP to the surface using anti-GFP antibodies. Did the authors try to attach the full-length KLP-6 motor directly to the surface? If the KLP-6 motor sticks to the surface via its (inhibitory) C-terminus, this attachment would be expected to activate the motor in the gliding assay, ideally approaching the in vivo velocities of the activated motor.
  
  We have included an example kymograph showing the gliding assay of KLP-6FL (Fig. S1A). When we directly attached KLP-6FL to the surface, the velocity was 0.15 ± 0.02 µm/sec (Fig. S1B), which is similar to the velocity of KLP-6(1-390). While the velocity observed in the direct-attachment condition is much higher than that observed in the GFP-mediated condition, it remains considerably slower than in vivo velocities. Firstly, we think this is because dimerization of KLP-6 is not induced by surface attachment. Previous studies have shown that monomeric proteins are generally slower than dimeric proteins in the gliding assay (Tomishige et al., 2002). This is consistent with our observation that KLP-6 remains monomeric even when autoinhibition is released. Secondly, the in vitro velocity of motors is generally slower than the in vivo velocity.
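
  To put the two KLP-6FL gliding velocities on the same scale, the values quoted above can be converted to common units and compared. The following is a minimal sketch rather than part of the authors' analysis; the function names are illustrative, and it assumes that the 48 nm/s value quoted by the reviewer refers to the GFP-mediated attachment condition.

```python
def nm_per_s_to_um_per_s(v_nm_s: float) -> float:
    """Convert a velocity from nm/s to µm/s."""
    return v_nm_s / 1000.0


def fold_change(v_new: float, v_ref: float) -> float:
    """Return v_new / v_ref; both velocities must share the same units."""
    if v_ref <= 0:
        raise ValueError("reference velocity must be positive")
    return v_new / v_ref


# Mean velocities quoted in the exchange above (errors are not propagated here).
gfp_mediated_um_s = nm_per_s_to_um_per_s(48.0)  # reviewer-quoted 48 nm/s (assumed GFP-mediated)
direct_attach_um_s = 0.15                       # direct attachment, Fig. S1B, in µm/s

fold = fold_change(direct_attach_um_s, gfp_mediated_um_s)
print(f"direct attachment is about {fold:.1f}-fold faster")
```

  Because the GFP-mediated value carries a large spread (± 42 nm/s), the fold difference should be read as an order-of-magnitude comparison only.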
  
  156-157: It seems that the GCN4-mediated dimerization induces aggregation of the KLP6 motor domains as seen in the fractions under the void volume in Figure 3B (not seen with the Sf9 expressed full-length constructs, see Figure 1B). Also, the artificially dimerized motor construct does not fully recapitulate the in vivo velocity of UNC-104. Did the authors analyze the KLP-6(1-390)LZ with mass photometry and is it the only construct that is expressed in E. coli?
  
  The KLP-6::LZ protein is not aggregating. We have noticed that DNA and RNA from E. coli exist in the void fraction and occasionally trap recombinant kinesin-3 proteins there. To effectively remove these nucleic acids from our protein samples, we used streptomycin sulfate during purification (Liang et al., Electrophoresis, 2009). Please see "Purification of recombinant proteins" in Methods. In the size exclusion chromatography analysis, we observed that KLP-6(1-393)LZ predominantly eluted in the dimer fraction (new Figure 3). Subsequently, we reanalyzed the motor's motility using a total internal reflection fluorescence (TIRF) assay, as shown in the revised Figure 3. Even after these efforts, the velocity did not change significantly. The velocity of KLP-6LZ is about 0.3 µm/sec, while that of cellular KLP-6::GFP is 0.7 µm/sec (Morsci and Barr, 2011). A similar phenomenon, slower velocity in vitro than in vivo, has been observed for other motor proteins.
  
  169: In Wang et al., (2022) the microtubule-activated ATPase activities of the mutants were measured in vitro as well, with the relative activities of the motor domain and the D458A mutant being very similar. The D458A mutation is introduced into the full-length motor in Wang et al., while in the present work, the mutation is introduced into the truncated KLP-6(1-587) construct. Can the authors explain their reasoning for the latter?
  
  (1) Kinesins are microtubule-stimulated ATPases; i.e., the ATPase activity is induced by binding to a microtubule.
  
  (2) Previous studies have shown that the one-dimensional movement of the monomeric motor domain of kinesin-3 depends on the ATPase activity even when the movement does not show clear plus-end directionality (Okada et al., Science, 1998).
  
  (3) While KLP-6(1-587) does not bind to microtubules, both KLP-6(1-390) (= the monomeric motor domain) and KLP-6(1-587)(D458A) similarly bind to microtubules and show one dimensional diffusion on microtubules (Fig. 4E and S2B).
  
  Therefore, the similar ATPase activities of the motor domain (= KLP-6(1-390)) and KLP-6(D458A) observed by Wang et al. arise because both proteins similarly associate with microtubules and hydrolyze ATP on them, which is consistent with our observation. On the other hand, because wild-type KLP-6 cannot efficiently bind to microtubules, its ATPase activity is low.
  
  Can the authors compare the gliding velocities of the KLP-6(1-390)LZ vs KLP-6(1-587) vs KLP-6(1-587)(D458A) constructs to make sure that the motors are similarly active?
  
  We conducted a comparative analysis of gliding velocities involving KLP-6(1-390), KLP-6(1-587), and KLP-6(1-587)(D458A) (Fig. S1C). We used KLP-6(1-390) instead of KLP-6(1-390)LZ, in line with the protein used by Wang et al. We demonstrated that both KLP-6(1-587) and KLP-6(1-587)(D458A) exhibited activity levels comparable to that of KLP-6(1-390). The data suggest that the motor domains of all recombinant proteins are similarly active.
  
  Please note that, unlike in the full-length condition (Fig. 1D, S1A and S1B), attachment to the surface using the anti-GFP antibody can activate KLP-6(1-587). The data suggest that, because it is not covered by the MBS and MATH domains (Wang et al., Nat. Commun., 2022), the motor domain of KLP-6(1-587) can to some extent bind directly to microtubules under gliding assay conditions.
  
  Are the monomeric and dimeric UNC-104(1-653) fractions in Figure 5B in equilibrium? Did the authors do a re-run of the second peak of UNC-104(1-653) (i.e. the monomeric fraction with ~100 kDa) to assess if the monomeric fraction re-equilibrates into a dimer-monomer distribution?
  
  We conducted a re-run of the second peak of UNC-104(1-653) and verified its re-equilibration into a distribution of dimers and monomers after being incubated for 72 hours at 4°C (Fig. S5).
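
  The concentration dependence behind this re-equilibration can be illustrated with a simple mass-action model. The sketch below assumes an ideal two-state equilibrium 2M ⇌ D with dissociation constant Kd = [M]²/[D] and total protein C = [M] + 2[D] (in monomer equivalents). Both the model and the Kd value are illustrative assumptions, not values reported in the manuscript, and the function name is hypothetical.

```python
import math


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of protein (in monomer equivalents) found in dimers.

    Solves 2[M]^2/Kd + [M] - C = 0 for the free monomer concentration [M],
    then returns 1 - [M]/C. Concentrations and Kd must share the same units.
    """
    if total_conc < 0 or kd <= 0:
        raise ValueError("total_conc must be >= 0 and kd must be > 0")
    if total_conc == 0:
        return 0.0
    monomer = (kd / 4.0) * (math.sqrt(1.0 + 8.0 * total_conc / kd) - 1.0)
    return 1.0 - monomer / total_conc


# Hypothetical Kd of 1 (arbitrary units); the real value is not given in the text.
HYPOTHETICAL_KD = 1.0
for conc in (0.1, 1.0, 10.0):
    print(conc, round(dimer_fraction(conc, HYPOTHETICAL_KD), 3))
```

  Under these assumptions the dimer fraction rises with total concentration, so an isolated monomer peak is expected to repopulate the dimer state over time, in line with the qualitative behaviour described above.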
  
  UNC-104 appears to have another predicted coiled-coil region around ~800 aa (e.g. by NCoils) that would correspond to the CC3 in the mammalian homolog KIF1A. This raises the question if the elongated UNC-104(1-800) would dimerize more efficiently than UNC-104(1-653) (authors highlight the sub-population of dimerized UNC-104(1-653) at low concentrations in Figure 5C) and if this dimerization alone would suffice to 'match' the UNC-104(1-653)E412K mutant (Figure 5D). Did the authors explore this possibility? This would mean that dimerization does not necessarily require the release of autoinhibition.
  
  We have tried to purify UNC-104(1-800) and full-length UNC-104 using the baculovirus system. Unfortunately, the expression levels of UNC-104(1-800) and full-length UNC-104 were too low to perform in vitro assays, even though codon-optimized vectors were used. Instead, we have analyzed full-length human KIF1A. We found that full-length KIF1A is mostly monomeric, not dimeric (please see Author response image 1). This property is similar to that of UNC-104(1-653) (Figure 5A-C). Therefore, we think CC3 does not strongly affect dimerization of KIF1A, and probably not that of its ortholog UNC-104. Moreover, a recent study has shown that the CC2 domain, but not other CC domains, forms a stable dimer in the case of KIF1A (Hummel and Hoogenraad, JCB, 2021). Given the similarity in the sequences of KIF1A and UNC-104, we anticipate that the CC2 domain of UNC-104 significantly contributes to dimerization, potentially more than other CC domains. We explicitly describe this in the Discussion in the revised manuscript.
  
  Author response image 1.
  
  Upper left, a representative result of size exclusion chromatography obtained from the analysis of full-length human KIF1A fused with sfGFP. Upper right, a schematic drawing showing the structure of KIF1A fused with sfGFP and a result of SDS-PAGE of fractions recovered from SEC analysis. Presumable dimer and monomer peaks are indicated. Lower left, presumable dimer fractions in SEC were collected and analyzed by mass photometry. The result confirms that the fraction contains a considerable amount of dimeric KIF1A. Lower right, presumable monomer fractions were collected and analyzed by mass photometry. The result confirms that the fraction mainly consists of monomeric KIF1A. Note that these results obtained from full-length KIF1A protein are similar to those of UNC-104(1-653) protein shown in Figure 5A-C.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.18.537280v3
www.biorxiv.org www.biorxiv.org

Steady-state neuron-predominant LINE-1 encoded ORF1p protein and LINE-1 RNA increase with aging in the mouse and human brain

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this study, Bonnifet et al. profile the presence of L1 ORF1p in the mouse and human brain. They claim that ORF1p is expressed in the human and mouse brain at a steady state and that there is an age-dependent increase in expression. This is a timely report as two recent papers have extensively documented the presence of full-length L1 transcripts in the mouse and human brain (PMID: 38773348 & PMID: 37910626). Thus, the finding that L1 ORF1p is consistently expressed in the brain is not surprising, but important to document.
  
  Thank you for recognizing the importance of this study. The two cited papers have indeed reported the presence of full-length transcripts in the mouse and human brain. However, the first report (PMID: 38773348) showed evidence of full-length LINE-1 RNA and ORF1 protein expression in the mouse hippocampus (but not elsewhere), and the second (PMID: 37910626) showed full-length LINE-1 RNA expression and H3K4me3-ChIP data in the frontal and temporal lobe of the human brain, but not protein expression.
  
  Strengths:
  
  Several parts of this manuscript appear to be well done and include the necessary controls. In particular, the evidence for steady-state expression of ORF1p in the mouse brain appears robust.
  
  Weaknesses:
  
  Several parts of the manuscript appear to be more preliminary and need further experiments to validate their claims. In particular, the data suggesting expression of L1 ORF1p in the human brain and the data suggesting increased expression in the aged brain need further validation. Detailed comments:
  
  (1) The expression of ORF1p in the human brain shown in Figure 1j is not convincing. Why are there two strong bands in the WB? How can the authors be sure that this signal represents ORF1p expression and not nonspecific labelling? Additional validations and controls are needed to verify the specificity of this signal.
  
  We have validated the antibody against human ORF1p (Abcam 245249-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr22227-6-ab245249), which we use for Western blotting experiments (please see Fig1J and new Suppl Fig.2A,B and C), by several means.
  
  (1) We have done immunoprecipitations and co-immunoprecipitations followed by quantitative mass spectrometry (LC-MS/MS; data not shown as they are part of a different study). We efficiently detect ORF1p in IPs (Western blot now added in Suppl Fig2B) and by quantitative mass spectrometry (5 independent samples per IP-ORF1p and IP-IgG: ORF1p/IgG ratio: 40.86; adj p-value 8.7e-07; human neurons in culture; data not shown as they are part of a different study). We also did co-IPs followed by Western blot using two different antibodies, either the Millipore clone 4H1 (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) or the Abcam antibody to immunoprecipitate and the Abcam antibody for Western blotting on human brain samples. Indeed, the Millipore antibody does not work well on Western Blots in our hands. We consistently revealed a double band indicating that both bands are ORF1p-derived. We have added an ORF1p IP-Western blot as Suppl Fig. 2B which clearly shows the immunoprecipitation of both bands by the Abcam antibody. Abcam also reports a double band, and they suspect that the lower band is a truncated form (see the link to their website above). ORF1p Western blots done by other labs with different antibodies have detected a second band in human samples
  
  Sato, S. et al. LINE-1 ORF1p as a candidate biomarker in high grade serous ovarian carcinoma. Sci Rep 13, 1537 (2023), in Figure 1D.
  
  McKerrow, W. et al. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl. Acad. Sci. U.S.A. 119, e2115999119 (2022), showing a Western blot of an inducible LINE-1 (ORFeus) detected by the MABC1152 ORF1p antibody from Millipore Sigma, in Figure 7.
  
  Walter et al. eLife 2016;5:e11418 (DOI: 10.7554/eLife.11418), in mouse ES cells with an antibody made in-house (gift from another lab), in Figure 2B.
  
  The lower band might thus be a truncated form of ORF1p or a degradation product which appears to be shared by mouse and human ORF1p. We have now mentioned this in the revised version of the paper (lines 183-189).
  
  (2) We have used the very well characterized antibody from Millipore (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NFMABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) for immunostainings and detect ORF1p staining in human neurons in the very same brain regions (Fig 2H, new Suppl Fig. 2E), including the cerebellum in the human brain. We added a secondary-antibody-only control (Suppl Fig. 2E).
  
  (3) We also did antibody validation by siRNA knock-down. However, it is important to note that these experiments were done in LUHMES cells, a neuronal cell line which we differentiated into human dopaminergic neurons. In these cells, we only occasionally detect a double band on Western blots, and mostly reveal only the upper band at ≈ 40 kD. The results of the knock-down are now added as Suppl Fig. 2C.
  
  Altogether, based on our experimental validations and evidence from the literature, we are very confident that it is indeed ORF1p that we detect on the blots and by immunostainings in the human brain.
  
  (2) The data shown in Figure 2g are not convincing. How can the authors be sure that this signal represents ORF1p expression and not non-specific labelling? Extensive additional validations and controls are needed to verify the specificity of this signal.
  
  In lines 117-123 of the manuscript, we had specified “Importantly, the specificity of the ORF1p antibody, a widely used, commercially available antibody [18,34–38], was confirmed by blocking the ORF1p antibody with purified mouse ORF1p protein resulting in the complete absence of immunofluorescence staining (Suppl Fig. 1A), by using an in-house antibody against mouse ORF1p[17] which colocalized with the anti-ORF1p antibody used (Suppl Fig. 1B, quantified in Suppl Fig. 1C), and by immunoprecipitation and mass spectrometry used in this study (see Author response image 1)”.
  
  Figure 2G shows a Western blot using an extensively used and well characterized ORF1p antibody from Abcam (mouse ORF1p, Rabbit Recombinant Monoclonal LINE-1 ORF1p antibody-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr21844-108-ab216324; cited in at least 11 publications) after FACS-sorting of neurons (NeuN+) of the mouse brain. We have validated this ORF1p antibody ourselves in IPs (please see Fig 6A) and co-IP followed by mass spectrometry (LC-MS/MS; see Fig 6, where we detect ORF1p exclusively in the 5 independent ORF1p-IP samples and not at all in 5 independent IgG-IP control samples; please also see Suppl Table 2). In this analysis, we detect ORF1p with a ratio and log2 fold change of ∞, indicating that this protein is only found in IP-ORF1p samples (5/5) and not in the IP-control samples (which does not allow the calculation of a ratio with a p-value; please see Suppl Table 2).
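
  The infinite ratio mentioned above follows directly from how an enrichment ratio is defined when the control signal is zero. The following is a minimal sketch of one common convention, not the authors' analysis pipeline: the function name and the handling of edge cases are assumptions, and the only number taken from this response is the human ORF1p/IgG ratio of 40.86 quoted earlier.

```python
import math


def enrichment(ip_signal: float, control_signal: float) -> tuple[float, float]:
    """Return (ratio, log2 fold change) of IP signal over control signal.

    A protein detected only in the IP (control == 0) gets an infinite ratio,
    for which no finite fold change (and hence no p-value) can be computed.
    """
    if ip_signal < 0 or control_signal < 0:
        raise ValueError("signals must be non-negative")
    if control_signal == 0:
        if ip_signal == 0:
            return (math.nan, math.nan)  # not detected in either sample
        return (math.inf, math.inf)      # detected only in the IP
    ratio = ip_signal / control_signal
    if ratio == 0:
        return (0.0, -math.inf)          # detected only in the control
    return (ratio, math.log2(ratio))


# Ratio quoted for human ORF1p (IP-ORF1p over IP-IgG), expressed with control = 1.
human_ratio, human_log2fc = enrichment(40.86, 1.0)

# Hypothetical intensities for a protein absent from every control sample.
mouse_ratio, mouse_log2fc = enrichment(5.0, 0.0)
print(human_ratio, human_log2fc, mouse_ratio, mouse_log2fc)
```

  Under this convention the infinite value signals absence from the controls rather than an especially large measured enrichment.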
  
  Author response image 1.
  
  In addition, we have added new data showing the entire membrane of the Western blot in Fig1H (now Suppl Fig.1E) and a knock-down experiment using siRNA against ORF1p or control siRNA in mouse dopaminergic neurons in culture (MN9D; new Suppl Fig.1D). Together, these data make us very confident that we are looking at a specific ORF1p signal. The band in Figure 2G is at the same height as the input and there are no other bands visible (except the heavy chain of the NeuN antibody, which at the same time is a control for the sorting). We added some explanatory text to the revised version of the manuscript (lines 120-124 and lines 253-256).
  
  Please note that in the IP of ORF1p shown in Fig6A, there is a double band as well, strongly suggesting that the lower band might be a truncated or processed form of ORF1p. As stated above, this double band has been detected in other studies (Walter et al. eLife 2016;5:e11418. DOI: 10.7554/eLife.11418) in mouse ES cells using an in-house generated antibody against mouse ORF1p. Thus, with either commercial or in-house generated antibodies in some mouse and human samples, there is a double band corresponding to full-length ORF1p and a truncated or processed version of it.
  
  We noticed that we had not included the references of the primary antibodies used in Western blot experiments in the manuscript; this has now been corrected in the revised version.
  
  (3) The data showing a reduction in ORF1p expression in the aged mouse brain is confusing and maybe even misleading. Although there is an increase in the intensity of the ORF1p signal in ORF1p+ cells, the data clearly shows that fewer cells express ORF1p in the aged brain. Whether these changes indicate an overall loss or gain of ORF1p expression in the aged brain is not resolved. Thus, conclusions should be more carefully phrased in this section. It is important to show the quantification of NeuN+ and NeuN- cells in young vs aged (not only the proportions as shown in Figure 3b) to determine if the difference in the number of ORF1p+ cells is due to loss of neurons or perhaps a sampling issue. More so, it would be essential to perform WB and/or proteomics experiments to complement the IHC data for the aged mouse samples.
  
  We thank the reviewer for this comment and we agree that the representation has been confusing, which is why we added data to Suppl Fig.5 (F-K) using a different representation. As suggested by the reviewer, in new Suppl Fig. 5F-K, we now show the number of ORF1p+, NeuN+ or NeuN- cells per mm2. These graphs indicate that the number of ORF1p+ cells per mm2 does not decrease significantly overall (with the dorsal striatum as an exception, possibly due to technical limitations which we now discuss in the results section, line 332-335). Globally, there is thus no loss of ORF1p-expressing cells. There is also neither a global nor a region-specific decrease in the number of neuronal cells (NeuN+ per mm2), although proportions change (Suppl Fig 2E, confocal acquisitions), most likely due to a gain of non-neuronal cells in this region. Concerning Western blots on mouse brain tissues from young and aged individuals, we unfortunately ran into limits regarding tissue availability of aged mice.
  
  (4) The transcriptomic data presented in Figure 4 and Figure 5 are not convincing. Quantification of transposon expression on short read sequencing has important limitations. Longer reads and complementary approaches are needed to study the expression of evolutionarily young L1s (see PMID: 38773348 & PMID: 37910626 for examples of the current state of the art). Given the read length and the unstranded sequencing approach, I would at least ask the authors to add genome browser tracks of the upregulated loci so that we can properly assess the clarity of the results. I would also suggest adding the mappability profile of the elements in question. In addition, since this manuscript focuses on ORF1p, it would be essential to document changes in protein levels (and not just transcripts) in the ageing human brain.
  
  We agree that there are limitations to the analysis of TEs with short read sequencing and we have added more text on this aspect in the revised version (results section) and highlighted the problem of limited and unbalanced sample sizes in the discussion (line 638-644). The approaches shown in PMID: 38773348 & PMID: 37910626, or even a combination of them, would of course be ideal. However, here we re-analyzed a unique preexisting dataset (Dong et al, Nature Neuroscience, 2018; http://dx.doi.org/10.1038/s41593-018-0223-0), which contains RNA-seq data of human post-mortem dopaminergic neurons in a relatively high number of brain-healthy individuals of a wide age range, including some “young” individuals, which is rare in post-mortem studies. Such data is unfortunately not yet available with long read sequencing or any other more appropriate approach. Limitations are evident, but all limitations will apply equally to both groups of individuals that we compare. The general mappability profile of the full-length LINE-1 “UIDs” was shown in old Suppl Fig 6A. We have now color-highlighted the specific elements in this graph in new Suppl Fig 8C. Most importantly, we have now used, as a synthesis of the suggestions of all reviewers, a combination of mappability score, post-hoc power calculation, visualization and correlation with adjacent gene expression to decide whether a specific locus can be retained with confidence. Using these criteria, we retained UID-68 (Fig 5D), which has a relatively high mappability score (Suppl Fig.8C) plus an overlap of umap 50 mappability peaks and read mapping when visualizing the locus in IGV (new Fig. 5E), very high post-hoc power (96.6%; continuous endpoint, two independent samples, alpha 0.05) and no correlation with adjacent gene expression per individual (Fig. 5F, G). 
Based on these criteria, we had to exclude UID-129, UID-37, UID-127 and UID-137, reinforcing the notion that a combination of quality control criteria might be crucial to retain a specific locus with confidence. This is now mentioned in the discussion of the manuscript (lines 427-430).
  
  We will not be able to document changes in protein levels in aged human dopaminergic neurons as we do not have access to this material. We have tried to obtain human substantia nigra tissues but were not able to get sufficient amounts to do laser-capture microdissection or FACS analyses, especially of young individuals. There are still important limitations to tissue availability, especially of young individuals, and even more so of specific regions of interest like the substantia nigra pars compacta affected in Parkinson disease.
  
  (5) More information is needed on RNAseq of microdissections of dopaminergic neurons from 'healthy' postmortem samples of different ages. No further information on these samples is provided. I would suggest adding a table with the clinical information of these samples (especially age, sex, and cause of death). The authors should also discuss whether this experiment has sufficient power. The human ageing cohort seems very small to me.
  
  This is a re-analysis of a published dataset (Dong et al, Nat Neurosci, 2018; doi:10.1038/s41593-018-0223-0), available through dbGaP (phs001556.v1.p1). In this original article, the criteria for inclusion as a brain-healthy control were as follows:
  
  “…Subjects… were without clinicopathological diagnosis of a neurodegenerative disease meeting the following stringent inclusion and exclusion criteria. Inclusion criteria: (i) absence of clinical or neuropathological diagnosis of a neurodegenerative disease, for example, PD according to the UKPDBB criteria[47], Alzheimer’s disease according to NIA-Reagan criteria[48], or dementia with Lewy bodies by revised consensus criteria[49]; for the purpose of this analysis incidental Lewy body cases (not meeting clinicopathological diagnostic criteria for PD or other neurodegenerative disease) were accepted for inclusion; (ii) PMI ≤ 48 h; (iii) RIN[50] ≥ 6.0 by Agilent Bioanalyzer (good RNA integrity); and (iv) visible ribosomal peaks on the electropherogram. Exclusion criteria were: (i) a primary intracerebral event as the cause of death; (2) brain tumor (except incidental meningiomas); (3) systemic disorders likely to cause chronic brain damage.”
  
  We do not have access to the cause of death, but we have added available metadata as Suppl_Table 5 to the manuscript.
  
  We have performed a post-hoc power analysis (using the “Post-hoc Power Calculator”, https://clincalc.com/stats/Power.aspx, which evaluates the statistical power of an existing study) and added the results to the revision. As a result of this analysis, we have taken out Suppl Fig 7 as a whole, which had shown data of three full-length LINE-1 loci (UID-37, UID-127 and UID-137) with low power (between 17-66% power). The locus shown in Fig. 5D (UID-68) had a post-hoc power score of 96.6%, which increases our confidence in this full-length LINE-1 element being upregulated in aged dopaminergic neurons. UID-129 had a post-hoc power score of 97%. However, visualization and mappability analysis of the UID-129 locus led us to exclude this UID.
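
  As a minimal sketch of what a post-hoc power calculation for a continuous endpoint with two independent samples involves, the following uses a two-sided normal approximation. This is an assumption for illustration: the function name and the input values are hypothetical, and the online calculator cited above may use a different formula internally.

```python
from math import sqrt
from statistics import NormalDist


def posthoc_power(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Approximate post-hoc power for comparing two independent group means.

    Normal approximation, two-sided test (illustrative assumption; not
    necessarily the formula used by any specific online calculator).
    """
    # Standard error of the difference between the two group means
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # Critical value for a two-sided test at level alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Observed effect expressed in standard-error units
    z_effect = abs(mean1 - mean2) / se
    # Probability of exceeding the critical value in the direction of the effect
    return NormalDist().cdf(z_effect - z_crit)


# Hypothetical example values (not taken from the dataset):
# a small "young" group and a larger "aged" group
power = posthoc_power(mean1=1.0, sd1=0.5, n1=6, mean2=1.8, sd2=0.6, n2=36)
print(f"approximate power: {power:.3f}")
```

  With strongly unbalanced groups, the standard error is dominated by the smaller group, which is one reason why modest effects in such a design tend to be underpowered.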
  
  The post-hoc power analysis for L1HS and L1PA2 revealed a low power (28.4% and 32.8% respectively). We have added these results to the manuscript (line 359-362), but decided to keep the data in as this will hopefully be a motivation for future confirmation studies knowing that the availability of similar data from brain-healthy human dopaminergic neurons especially of young individuals will be low.
  
  (6) The findings in this manuscript apply to both human and mouse brains. However, the landscape of the evolutionarily young L1 subfamilies between these two species is very different and should be part of the discussion. For example, the regulatory sequences that drive L1 expression are quite different in human and mouse L1s. This should be discussed.
  
  Indeed, they are different. We have added a paragraph to the discussion (lines 539-548).
  
  (7) On page 3 the authors write: "generally accepted that TE activation can be both, a cause and consequence of aging". This statement does not reflect the current state of the field. On the contrary, this is still an area of extensive investigation and many of the findings supporting this hypothesis need to be confirmed in independent studies. This statement should be revised to reflect this reality.
  
  We agree that this is overstated and have changed this sentence to:
  
  “It is now, 31 years after the initial proposition of the “transposon theory of aging” by Driver and McKechnie [14], still a matter of debate whether TE activation can be both, a cause and a consequence of aging [15,16].”
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Bonnifet et al. sought to characterize the expression pattern of L1 ORF1p expression across the entire mouse brain, in young and aged animals, and to corroborate their characterization with Western blotting for L1 ORF1p and L1 RNA expression data from human samples. They also queried L1 ORF1p interacting partners in the mouse brain by IP-MS.
  
  Strengths:
  
  A major strength of the study is the use of two approaches: a deep-learning detection method to distinguish neuronal vs. non-neuronal cells and ORF1p+ cells vs. ORF1p- cells across large-scale images encompassing multiple brain regions mapped by comparison to the Allen Brain Atlas, and confocal imaging to give higher resolution on specific brain regions. These results are also corroborated by Western blotting on six mouse brain regions. Extension of their analysis to post-mortem human samples, to the extent possible, is another strength of the paper. The identification of novel ORF1p interactors in the brain is also a strength in that it provides a novel dataset for future studies.
  
  Thank you for highlighting the strengths of our study.
  
  Weaknesses:
  
  The main weakness of the study is that cell type specificity of ORF1p expression was not examined beyond neuron (NeuN+) vs non-neuron (NeuN-). Indeed, a recent study (Bodea et al. 2024, Nature Neuroscience) found that ORF1p expression is characteristic of parvalbumin-positive interneurons, and it would be very interesting to query whether other neuronal subtypes in different brain regions are distinguished by ORF1p expression.
  
  We agree that this point is important to address. We have mentioned in the manuscript our previous work, which showed that in the mouse ventral midbrain, dopaminergic neurons (TH+/NeuN+) express ORF1p and that these neurons express higher levels of ORF1p than adjacent non-dopaminergic neurons (TH-/NeuN+; Blaudin de Thé et al, EMBO J, 2018). Others have shown evidence of full-length L1 RNA expression in both excitatory and inhibitory neurons but much less expression in non-neuronal cells (Garza et al, SciAdv, 2023). Further, ORF1p expression was documented in excitatory (CamKIIa-positive) and CamKIIa-negative neurons in the mouse frontal cortex (Zhang et al, Cell Res, 2022, doi.org/10.1038/s41422-022-00719-6). We do detect ORF1p staining in mouse (Fig. 1B, panel 10) and human Purkinje cells (based on morphology and in accordance with data from Takahashi et al, Neuron, 2022; DOI: 10.1016/j.neuron.2022.08.011) and most probably basket cells (based on anatomical location in the molecular layer near Purkinje cells) of the cerebellum (Suppl Fig.4). Some Purkinje cells express PV in mice (https://doi.org/10.1016/j.mcn.2021.103650 and 10.1523/JNEUROSCI.22-16-07055.2002), as do stellate and basket cells of the molecular layer (10.1523/JNEUROSCI.22-16-07055.2002). While ORF1p is expressed in PV cells of the hippocampus (Bodea et al, Nat Neurosci, 2024) and in PV-expressing neurons of the human and mouse cerebellum, ORF1p expression does not seem to be restricted to PV cells overall. To address this question experimentally, we have now performed ORF1p stainings in different brain regions (hippocampus, cortex, hindbrain, thalamus, ventral midbrain and cerebellum) together with parvalbumin (PV) stainings, in some cases including the lectin WFA (Wisteria floribunda agglutinin, which specifically stains glycoproteins surrounding PV+ neurons). We have added this data to the manuscript as Suppl Fig.4. 
While PV-positive neurons often co-stain with ORF1p, not all ORF1p positive cells are PV-positive. We have also deepened the discussion of this aspect in the revised manuscript (line 579-599).
  
  The data suggesting that ORF1p expression is increased in aged mouse brains is intriguing, although it seems to be based upon modestly (up to 27%, dependent on brain region) higher intensity of ORF1p staining rather than a higher proportion of ORF1+ neurons. Indeed, the proportion of NeuN+/Orf1p+ cells actually decreased in aged animals. It is difficult to interpret the significance and validity of the increase in intensity, as Hoechst staining of DNA, rather than immunostaining for a protein known to be stably expressed in young and aged neurons, was used as a control for staining intensity.
  
  We have now separated the analysis of NeuN+, ORF1p+ and NeuN- cells (please see new Suppl Fig5F-K), which highlights the fact that there is indeed no change in the number of ORF1p+ cells in the young compared to the aged mouse brain. However, while neuronal cell numbers throughout the brain do not change significantly (new Suppl Fig.5F), cell proportions in the ventral midbrain (confocal microscopy based quantifications) do change, possibly due to a combination of a slight loss in neurons and a gain in non-neuronal cell numbers (Suppl Fig3E). Please also keep in mind that the ventral midbrain region on images taken on a confocal microscope is a much smaller region than the midbrain motor region as specified by ABBA on images taken by the slide scanner. A different marker than DNA as a control requires the use of a protein that is stably expressed throughout the brain and throughout age. We are not aware of a protein for which this has been established. To nevertheless try to address this issue, we used whole-brain imaging intensity data for the protein Rbfox3 (NeuN), which we originally used as a marker for cell identity. We have now added the quantifications of the protein Rbfox3 (NeuN) to Fig3 (new Fig3B). As shown in this figure, NeuN intensity is not stable from one individual to another, neither in control mice nor in the aged control group. Most importantly, NeuN staining intensity does not increase in aged mice. As we did not use NeuN intensity but presence or absence of NeuN as a marker for cell identity, the instability of NeuN intensity from one individual mouse to another does not have an influence on the data presented in this manuscript. It does indicate, however, that the overall increase of ORF1p in aged mice is not a mere reflection of a general decrease in protein turnover. As stated above, the DNA staining with Hoechst controls for technical artefacts. 
Using Hoechst and NeuN as control, we have thus provided evidence for the fact that the increase in ORF1p intensity per cell is indeed specific for ORF1p. This is now added to the results section (line 299-301).
  
  The main weakness of the IP-MS portion of the study is that none of the interactors were individually validated or subjected to follow-up analyses. The list of interactors was compared to previously published datasets, but not to ORF1p interactors in any other mouse tissue.
  
  As stated in the manuscript, the list of previously published datasets does include a mouse dataset with ORF1p interacting proteins in mouse spermatocytes (De Luca, C., Gupta, A. & Bortvin, A. Retrotransposon LINE-1 bodies in the cytoplasm of piRNA-deficient mouse spermatocytes: Ribonucleoproteins overcoming the integrated stress response. PLoS Genet 19, e1010797 (2023)); please see line 479-480: “ORF1p interactors found in mouse spermatocytes were also present in our analysis including CNOT10, CNOT11, PRKRA and FXR2 among others (Suppl_Table4).” We indeed did not validate any interactors, for economic reasons and because of time constraints (post-doc leaving). However, we feel that the significant overlap with previously published interactors highlights the validity of our data and we anticipate that this list of ORF1p protein interactors in the mouse brain will be of further use for the community.
  
  The authors achieved the goals of broadly characterizing ORF1p expression across different regions of the mouse brain, and identifying putative ORF1p interactors in the mouse brain. However, findings from both parts of the study are somewhat superficial in depth.
  
  This provides a useful dataset to the field, which likely will be used to justify and support numerous future studies into L1 activity in the aging mammalian brain and in neurodegenerative disease. Similarly, the list of ORF1p interacting proteins in the brain will likely be taken up and studied in greater depth.
  
  Reviewer #3 (Public Review):
  
  The question about whether L1 exhibits normal/homeostatic expression in the brain (and in general) is interesting and important. L1 is thought to be repressed in most somatic cells (with the exception of some stem/progenitor compartments). However, to our knowledge, this has not been authoritatively / systematically examined and the literature is still developing with respect to this topic. The full gamut of biological and pathobiological roles of L1 remains to be shown and elucidated and this area has garnered rapidly increasing interest, year-by-year. With respect to the brain, L1 (and repeat sequences in general) have been linked with neurodegeneration, and this is thought to be an aging-related consequence or contributor (or both) of inflammation. This study provides an impressive and apparently comprehensive imaging analysis of differential L1 ORF1p expression in mouse brain (with some supporting analysis of the human brain), compatible with a narrative of non-pathological expression of retrotransposition-competent L1 sequences. We believe this will encourage and support further research into the functional roles of L1 in normal brain function and how this may give way to pathological consequences in concert with aging. However, we have concerns with conclusions drawn, in some cases regardless of the lack of statistical support from the data. We note a lack of clarity about how the 3rd party pre-trained machine learning models perform on the authors' imaging data (validation/monitoring tests are not reported), as well as issues (among others) with the particular implementation of co-immunoprecipitation (ORF1p is not among the highly enriched proteins and apparently does not reach statistical significance for the comparison) - neither of which may be sufficiently rigorous.
  
  Thank you for your comments on our manuscript.
  
  We have addressed the concerns about the machine learning paradigm (see Author response image 1). Concerning the co-IP-MS, we can confirm that ORF1p is among the highly enriched proteins, as it was not found in the IgG control (in 5 independent samples), only in the ORF1p-IP (in 5 out of 5 independent samples). This is what the infinity sign in Suppl Table 2 indicates, and this is why there is no p-value assigned: a ratio with zero signal in the control is infinite and does not allow a p-value to be calculated. We have made this clearer in the revised version of the manuscript and added a legend to Suppl Table 2.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  I would recommend the authors remove the human data and expand the analysis of the aged mice. This would most likely result in a much stronger manuscript.
  
  We do think that the imaging data and the Western blots are convincing (please also see our detailed response above to the criticism concerning the antibody we used and the newly added data) and very much reflect what we find in the mouse brain, i.e. concerning the percentage of neurons expressing ORF1p and the percentage of ORF1p+ cells being neuronal. When it comes to the transcriptomic data on aged dopaminergic neurons, we have further discussed the limitations of this study in the revised manuscript and hope that the findings inspire others in the field to redo these types of analyses using the now state-of-the-art NGS technologies to address the question and validate what we have found.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The characterization of ORF1p expression across the mouse brain would be vastly more informative if cell identity was established beyond NeuN+/NeuN---the neuronal predominance of L1 activity in the brain has long been observed. Indeed, even corroboration of the PV+ interneuron signature previously reported would both lend credence to the present study and provide valuable confirmation to the field.
  
  We agree. Please see our response above as well as the new experimental data we added (Suppl Fig5.F-K).
  
  The increased intensity (but not prevalence in terms of % of Orf1p positive cells) of Orf1p expression in aged mouse brains would be more convincing with further context and perhaps better controls. Is overall protein turnover in aged neurons simply slower than in neurons from younger brains? Immunostaining with another protein marker, rather than Hoechst staining of DNA, to demonstrate that increased staining intensity is unique to Orf1p, would make this result more compelling.
  
  To address this question, we have now added the quantifications of the protein Rbfox3 (NeuN) to Fig3 (Fig. 3B). As shown in this figure, NeuN intensity is not stable from one individual to another, neither in control mice nor in the aged control group. As we did not use NeuN intensity but presence or absence of NeuN as a marker for cell identity, this does not have any influence on the data presented in this manuscript. It does indicate however, that the overall increase of ORF1p in aged mice is not a mere reflection of a general decrease in protein turnover. As stated above, the DNA staining with Hoechst controls for technical artefacts. Using Hoechst and NeuN as control, we have thus provided evidence for the fact that the increase in ORF1p intensity per cell is indeed specific for ORF1p.
  
  Western blotting on cell lysates from aged vs young NeuN+ sorted cells would also strengthen this conclusion, although I appreciate the technical challenge of physically isolating whole mature neuronal cells.
  
  Indeed, this would be feasible but only after FACS sorting, which is technically challenging on whole brain cells (less so on nuclei). We unfortunately do not have the possibility to embark on this right now.
  
  Concerning data presentation, Figure 3A would be much more informative if the graph was broken down to show the proportion of ORF1p+ and ORF1p- cells, regardless of NeuN status, and the proportion of NeuN+ and NeuN- cells shown independently of Orf1p status. It is difficult to ascertain the relationship of either of these variables to age, as the graph is presented now.
  
  We followed the suggestions of the reviewer, agreeing that breaking down this figure into either ORF1p+ or NeuN+ or NeuN- cells without double attribution is easier to interpret. However, we also chose to use cell densities (cell numbers per mm2) to represent the data (new Suppl Fig.5F-K), which is even more precise, while proportions are now shown in Suppl Fig.3A-E. Indeed, while it is important to realize that the variables ORF1p+/- or NeuN+/- are not completely independent of each other (as shown in proportions of old Fig4A and B, new Suppl Fig3A and B) as they form four categories (NeuN+/ORF1p+, NeuN+/ORF1p-, NeuN-/ORF1p+, NeuN-/ORF1p-), we can see from the data that there is no overall change in neuron number in the mouse brain between 3 months and 16 months of age. There is no overall change in the density of ORF1p+ cells or of NeuN- cells in the mouse brain, with the exception of a decrease in cell density of ORF1p-positive cells in the dorsal striatum accompanied by an increase in non-neuronal cell density (but as discussed above and in the manuscript (line 332-337), this might be due to technical limitations). Thus, while ORF1p intensities per cell increase significantly in older mice, there is no significant change in ORF1p+ cell number.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) According to the description in Materials and Methods on the analysis of the confocal images (lines 731-743) the authors used Cell-Pose for both the nuclei and cell segmentation tasks, using model=cyto and diameter=30 for the first (nuclei) and model=cyto2 and diameter=40 for the second (cell). Description of analysis of sagittal brain regions (lines 746-764) indicates the pre-trained model DSB2018 from StarDist 2D was used for nuclei detection, and Cell-Pose using model cyto2 and diameter=30 for cell segmentation. Detected nuclei were then matched to segmented cell areas based on overlap criteria and each nucleus was labeled as 'positive' or 'negative' for either ORF1p or NeuN.
  
  As described in its three publications (1, 2, 3), Cell-Pose as a segmentation tool is trained on different datasets, with cyto2 being trained on a more varied dataset than cyto. In their library they also offer a model specific for nuclei (2). Some description and explanation is needed of why two different models were used for nuclei detection, and why the specific pre-trained nuclei model offered by Cell-Pose was not chosen in either case.
  
  According to the cellpose library documentation "Changing the diameter will change the results that the algorithm outputs. When the diameter is set smaller than the true size then cellpose may over-split cells. Similarly, if the diameter is set too big then cellpose may over-merge cells.". It would be useful to offer the justification of the pixels chosen for the analysis (possibly average pixel counts in a subsample of Hoechst images).
  
  Answers to questions 1-5:
  
  Regarding ABBA, slices were first positioned and oriented manually along the Z-axis, without using DeepSlice. Automated affine registration was then applied in the XY plane, followed by manual refinement. 1 slice per mouse brain, 4 mouse brains per condition.
  
  Regarding the gradient heatmap, as stated in the figure legend of Fig3F: represented is the fold-change in percent (aged vs young) of the “mean of the mean” ORF1p expression per ORF1p+ cell, quantified and mapped onto the nine different regions analyzed. More precisely, the heatmap shows the percentage increase in the mean of all mean cell intensities in the aged condition, normalized to the mean of all mean cell intensities in the young condition. The pre-trained models and hyperparameters were selected based on their optimal performance across our image datasets. For slide scanner images, the StarDist DSB 2018 model was chosen over a Cellpose model because it more effectively avoided detecting out-of-focus nuclei, which were common in slide scanner images due to the lack of optical sectioning. This issue was not present in confocal images, where the Cellpose cyto model was used instead. To assess the performance of each model and diameter setting, we computed the average precision (AP) metric, which is defined as AP = TP/(TP+FP+FN), where TP = true positives, FP = false positives, and FN = false negatives. The AP was calculated at the commonly used Intersection over Union (IoU) threshold of 0.5. For confocal images, Cellpose models and hyperparameters were evaluated on eight images per channel, capturing intensity variability across different mouse ages and brain regions. A total of approximately 2,000 nuclei and 1,000 NeuN and ORF1p cells were manually annotated. The AP values at an IoU threshold of 0.5 were: 0.995 for nuclei, 0.960 for NeuN, and 0.974 for ORF1p cells. These high AP values confirm that the selected models and diameter settings were well-suited for analyzing the entire dataset. For slide scanner images, nuclei and cell detection were evaluated on 14 images per channel, with approximately 800 nuclei and 400 NeuN and ORF1p cells manually annotated. 
The AP values were lower compared to confocal images, mainly due to a lower signal-to-noise ratio, which led to an increased number of false positives and false negatives: 0.806 for nuclei, 0.675 for NeuN, and 0.695 for ORF1p cells. This decline in performance was expected given the challenges posed by slide scanner images, including background noise and out-of-focus objects. Notably, the observed false positives primarily correspond to small-sized nuclei/cells or those with low intensity, which evade the stringent filters that were applied. While fine-tuning the models could further enhance detection robustness, we considered that the selected models and diameter settings were suitable for processing the entire dataset.
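
  The AP metric defined above, AP = TP/(TP+FP+FN) at an IoU threshold of 0.5, can be sketched as follows. This is an illustrative simplification under stated assumptions: objects are represented as sets of pixel coordinates, matching is greedy and one-to-one, and the function names are hypothetical. It is not the evaluation code used for the manuscript.

```python
def iou(a, b):
    """Intersection over Union of two objects given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(gt_objects, pred_objects, threshold=0.5):
    """AP = TP / (TP + FP + FN) with greedy one-to-one matching (assumed scheme)."""
    unmatched_gt = list(range(len(gt_objects)))
    tp = 0
    for pred in pred_objects:
        # Find the best still-unmatched annotated object for this prediction
        best_j, best_iou = None, 0.0
        for j in unmatched_gt:
            value = iou(gt_objects[j], pred)
            if value > best_iou:
                best_j, best_iou = j, value
        # Count a true positive only if the overlap reaches the threshold
        if best_j is not None and best_iou >= threshold:
            unmatched_gt.remove(best_j)
            tp += 1
    fp = len(pred_objects) - tp  # predictions without an annotated partner
    fn = len(gt_objects) - tp    # annotated objects that were missed
    denom = tp + fp + fn
    # AP is undefined when there are no objects at all; 0.0 is returned here
    return tp / denom if denom else 0.0


# Toy example: one good match (IoU 0.75), one spurious prediction, one missed object
gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
ap = average_precision(gt, pred)
print(f"AP at IoU 0.5: {ap:.3f}")
```

  In this toy case TP = 1, FP = 1 and FN = 1, so both false positives and false negatives lower the score, which is consistent with the lower AP values reported for the noisier slide scanner images.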
  
  We added a paragraph to the materials & methods section with this new information; for confocal images (line 847-855), slide scanner images (line 878-885).
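
  The heatmap quantity described in the answer above (percentage increase of the mean of all mean cell intensities in aged animals, normalized to young animals) can be illustrated with a short sketch. The function name and the intensity values are hypothetical, and the sketch assumes that each input value is the mean pixel intensity of one ORF1p+ cell in a given region.

```python
from statistics import mean


def percent_increase(young_cell_means, aged_cell_means):
    """Percent change of the mean per-cell intensity, aged relative to young."""
    young = mean(young_cell_means)  # mean of all mean cell intensities, young
    aged = mean(aged_cell_means)    # mean of all mean cell intensities, aged
    return 100.0 * (aged - young) / young


# Hypothetical per-cell mean intensities for one brain region
young_region = [10.0, 20.0]
aged_region = [15.0, 30.0]
change = percent_increase(young_region, aged_region)
print(f"change in mean ORF1p intensity per cell: {change:.1f}%")
```

  Repeating this per region would give one value per region, which is the kind of number a gradient heatmap can display.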
  
  Author response table 1.
  
  (2) Next to no information is offered regarding the brain segment registration and how the results were analyzed: The ABBA plug-in has two modules manual and automatic, via a DL pre-trained model called DeepSlice. The authors should report which mode of ABBA they used, how many slices per mouse brain, and how many brains. Moreover, there is no explanation of how the gradient heatmap of the brain regions (Figure 3G) was calculated.
  
  Please see above
  
  (3) Even the best algorithms produce some False predictions. In this application of the (3rd party) cellpose, StarDist, and ABBA pre-trained models, such cases of wrong predictions would have amplified downstream effects on the analysis e.g., wrongly characterizing certain cells as 'negative' (falsely not detected cell, falsely detected nucleus), or worse, biasing against certain cell subgroups (falsely not detected 'type' of nuclei). This is even more troubling with the variety of models used for the nuclei segmentation task, and the parameters in each. It is possible the authors performed optimizations and reported exactly such optimized values for their dataset, they should however still explicitly offer these detailed validation and optimization processes. The low statistical significance throughout the quantified results from these IF experiments (Figures 1-3) is also a cause for needing an explicit description of how these algorithms perform on the authors' data.
  
  It is good practice that a pre-trained model, when applied to a new dataset like the one that the authors produced for this work, would require basic monitoring for how it performs in the new, previously unseen dataset, even when the model's generalizability has been reported previously as great. It would be best if the authors had hand-annotated a few images as the validation set and produced some model performance metrics as a supplemental table for all pre-trained models they used, in the datasets they used them at. Alternatively, the authors are offered the ability by the cellpose team to fine-tune the model for their data, and this could be used to perform the experiments for this work instead if the performance metrics of the used cellpose (cyto and cyto2) models prove to be poor.
  
  Please see above
  
  (4) The legend for Figure 1A indicates that Cell-Pose was used for cell detection and StarDist for nuclei detection in the confocal images (line 960). This needs clarification and correction.
  
  Please see above
  
  (5) Some explanation of why the models used were changed when using confocal or the slide scanner microscope would be nice.
  
  Please see above
  
  (6) The legend title of Figure 3 (line 1040) "Fig. 3: ORF1p expression is increased throughout the whole mouse brain in the context of aging" is misleading as half the panels in the figure demonstrate a decrease in ORF1p-expressing cells. The two can be both true, but in a more nuanced relationship. A more modest representation of the data in the title is also warranted by the unimpressive statistical significance achieved (notably with no correction for multiple testing, which would further inflate the p-values).
  
  We have toned down the title of Fig. 3 to “ORF1p expression is increased in some regions of the aged mouse brain” while keeping its global meaning. There is indeed no significant loss of ORF1p-expressing cells (Suppl Fig. 5F), except in the dorsal striatum (Suppl Fig. 5I; please see also the discussion above), but there is a significant increase in ORF1p intensity per cell overall (Fig. 3A,C,F) and in several regions of the mouse brain (Fig E, G and H).
  
  (7) Figure 4 suffers for significance. For example in panel A, the few genes with the highest -log10P value, ie above 1.3 (p-value of ~0.05) have a log2-fold change of 0.2-0.3 (fold change 1.14-1.23). There are no hits with even the modest log2-fold change of 0.5 (fold-change 1.4). The big imbalance between young/old samples for these RNA seq experiments (6 vs 36 mice) could be an issue here too.
  
  The reviewer refers to mouse samples (“6 to 36 mice”), but these data are from human post-mortem dopaminergic neurons from brain-healthy individuals, which were laser-captured and sequenced as reported by Dong et al, Nat Neurosci, 2018. There is indeed a big imbalance between young and old samples, which is linked to the limited availability of brain-healthy post-mortem tissues from young individuals, which are obviously much rarer than those from older people. We agree that the fold-enrichments are modest and the p-values rather high, but we argue for keeping these data in, as they are based on rare post-mortem human brain tissues which were difficult to obtain and will be very difficult to obtain in sufficient number in future studies. We hope, however, that these results will encourage such studies in the future and motivate researchers to look further into the expression of TEs in aging brain tissues with higher sample sizes and more suitable sequencing techniques. In the revised version we have now toned down some sentences (i.e. line 359: modest, but significant increase in several young…) and have also added a post-hoc power analysis (results section line 359-362: “There was a modest but significant increase in several younger LINE-1 elements including L1HS and L1PA2 at the “name” level (Fig. 4A, B), an analysis which was however underpowered (post-hoc power calculation; L1HS: 28.4%; L1PA2: 32.8%) and thus awaits further confirmation in independent studies.”)
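  
  To illustrate why a strongly unbalanced design such as 6 versus 36 samples tends to be underpowered, the following is a minimal sketch of a power calculation for a two-sided two-sample comparison. It uses a normal approximation and a hypothetical standardized effect size (Cohen's d); both are assumptions made for illustration. The response does not state which method or effect sizes produced the reported 28.4% and 32.8%, so this sketch is not expected to reproduce those values.

```python
from statistics import NormalDist


def approx_power_two_sample(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation).

    effect_size: hypothetical standardized mean difference (Cohen's d).
    n1, n2:      group sizes (may be unequal).
    alpha:       two-sided significance level.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Non-centrality: d scaled by the square root of the "effective" sample size.
    ncp = effect_size * (n1 * n2 / (n1 + n2)) ** 0.5
    # Probability of exceeding the critical value in either tail.
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)


if __name__ == "__main__":
    d = 0.5  # hypothetical effect size, chosen only for illustration
    print("unbalanced 6 vs 36 :", approx_power_two_sample(d, 6, 36))
    print("balanced  21 vs 21 :", approx_power_two_sample(d, 21, 21))
```

  With the same total number of samples, the balanced design has the larger effective sample size, which is why the imbalance itself, and not only the total sample count, limits power.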
  
  (8) Figure legend 4C (line 1088) should offer more explanation on what is compared for these correlations: the young vs old results, all intensities of all experiments, and intensities separately for each sample.
  
  We have added the missing information to Figure legend 4C (line 1209-1215): “Correlation of the RNA expression levels of LINE-1 elements with known transposable element regulators in human dopaminergic neurons (all ages included). What was compared are the expression levels of LINE-1 elements with known regulators of TEs for each individual sample, all ages included.”
  
  (9) Figure 5, panel D. The regressions are all driven by 1-2 outliers. Should be removed as they don't add anything.
  
  We agree and have therefore performed an outlier test (ROUT, Q=1%); the identified outliers (1 in each graph) have been removed from the analysis. We argue that the information of a non-correlation of UID-68 and adjacent gene expression is important, as it rules out a dependency of the expression of the full-length LINE-1 on neighboring gene expression (see new Fig5E-G).
  
  (10) Figure 6 panel B. It is unexpected that the GO terms with the highest enrichment also show weak significance and vice-versa. Fold enrichment in the PANTHER tool is defined as the % of GO-term genes in the sample divided by the %GO-term genes in the background (organism).
  
  This is not unexpected as GO terms contain different numbers of proteins. Indeed, the significance can be different if the GO term contains for example 3 or 300 proteins. A GO term containing only few proteins with a high fold change between the conditions (here: ORF1p-IP vs whole mouse genome) will lead to a rather low significance for example. If you look at the last 6 categories in Fig 6B, you can appreciate that they have very similar values for enrichment but very different significance levels (FDR).
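  
  The point made in this reply, that terms with the same fold enrichment can differ widely in significance depending on their size, can be sketched as follows. The fold enrichment follows the definition quoted by the reviewer; the p-value is a one-sided hypergeometric tail, which is an assumption for illustration and not necessarily the test applied in the PANTHER analysis. All counts are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """(k/n) / (K/N): fraction of term genes in the sample over the background fraction."""
    return (k * N) / (n * K)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


if __name__ == "__main__":
    N, n = 2000, 100  # hypothetical background size and sample size
    # Two hypothetical terms with identical fold enrichment but different sizes.
    for label, K, k in [("small term", 20, 5), ("large term", 200, 50)]:
        print(label, fold_enrichment(k, n, K, N), hypergeom_upper_tail(k, n, K, N))
```

  Both hypothetical terms are enriched five-fold, but the larger term yields a much smaller p-value because more observations support the same ratio.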
  
  (11) Many citations in the References sections are referred to by doi and "Published online" date. These should be corrected to include the citation in standard format (journal name, volume, issue, pages, etc).
  
  We apologize for this and have corrected this in the revised version.
  
  (12) (line 970) Legend of Figure 1 is missing label referencing panel C (ie (C) Bar plot showing the total....).
  
  Thank you for pointing this out, this has been corrected.
  
  (13) The bottom violin plot in Figure 1C lacks sufficient explanation (what are the M1-4 categories?). The same problem with panel G (same Figure 1).
  
  This has now been better explained. The M1-M4 categories denote individual mice numbered from 1 to 4 (results are shown per individual).
  
  -> specified in line 1098-1099 (Fig.1C) and new text (1117-1118: Fig.1G): Four three-month-old Swiss/ OF1 mice (labeled as M1 to M4) are represented each by a different color, the scattered line represents the median. ****p<0.0001, nested one-way ANOVA. Total cells analyzed = 4645
  
  (14) Figure 1B; confocal image 2 (Hippocampus) does not seem to tell the same story as the main slide scanner image. Overall, more explicit phrasing is needed to make clear that the images in Figure 1B are not blow-outs of the bigger one but different, confocal images of the same regions.
  
  We have changed the sentence to: “Representative images acquired on a confocal microscope of immunostainings showing ORF1p expression (orange) in 10 different regions of the mouse brain.”, which hopefully helps to indicate that these images are indeed not blow-outs of the slide scanner image.
  
  (15) Young are defined as 3 months and 'old' as 16 months mice. 16-month group name would be better as "adults". Example of age range considered 'old': "Young (3-6-month-old) and aged (18-27-month-old) male mice were age- and source-matched for each experiment." https://www.cell.com/cell-metabolism/fulltext/S1550-4131(23)00462X?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS15504131230 0462X%3Fshowall%3Dtrue
  
  This is true, but the 16-month age group does not have a designation when looking at Mouse Life history stages in C57Bl/6 mice from the Jackson laboratory (see https://www.jax.org/news-and-insights/jax-blog/2017/november/when-are-mice-considered-old#), they are neither middle-aged nor old. We therefore believe that the designation as “aged” still holds true.
  
  (16) Lines 63-65 > To our understanding, both ORF1 and ORF2 proteins are thought to exhibit cis preference.
  
  Yes, that is true, but the sentence as it is does not make a claim about ORF2p not having cis-preference.
  
  (17) Figure 1I is only referred to as "Figure I". Twice. Page 8, line 173 & 176.
  
  Thank you, has been corrected.
  
  (18) Lines 178-182 >To investigate intra-individual expression patterns of ORF1p in the post-mortem human brain, we analyzed three brain regions of a neurologically healthy individual (Figure 1J) by Western blotting. ORF1p was expressed at different levels in the cingulate gyrus, the frontal cortex, and the cerebellum underscoring a widespread expression of human ORF1p across the human brain." > It is difficult for us to gauge how believable the blots are without knowing the amount of protein loaded.
  
  We loaded 10 µg of tissue lysate per lane (tissue pulverized with a Covaris Cryoprep; the amount is now mentioned in the materials & methods section). We have added some more information on the antibody in the revised manuscript (line 183-194).
  
  We say this from our experience conducting similar blots of anti-ORF1p IPs from human brain tissues using the same antibody (4H1) without successful detection of enriched protein by western blot (of course there can be many reasons for that, but knowing the amount of protein loaded is important for reproducibility). In addition, we find the "double" ORF1p bands they see in almost every blot atypical.
  
  In our hands, the 4H1 antibody does not work well on Western blots, but it immunoprecipitates well and works very well on immunostainings. However, the abcam AB 245249 works well for Western blotting (and IPs) which is why we used this antibody for these applications, respectively. As described above, there is evidence that the double band is not atypical, but rather frequent, which we now also mention in the revised manuscript (lines 183-191): “To investigate intra-individual expression patterns of ORF1p in the post-mortem human brain, we analyzed three brain regions of a neurologically-healthy individual (Fig. 1J, entire Western blot membrane in Suppl Fig. 2A) by Western blotting using a commercial and well characterized antibody which we further validated by several means. The double band pattern in Western blots has been observed in other studies for human ORF1p outside of the brain (Sato et al, SciRep, 2023, McKerrow et al, PNAS, 2022) as well as for mouse ORF1p (Walter et al, eLife, 2016). We also validated the antibody by immunoprecipitation and siRNA knock-down in human dopaminergic neurons in culture (differentiated LUHMES cells, Suppl Fig. 2B and 2C) where we detect however in most cases the upper band only. The nature of the lower band is unknown, but might be due to truncation, specific proteolysis or degradation. ORF1p was expressed at different levels in the human post-mortem cingulate gyrus, the frontal cortex and the cerebellum underscoring a widespread expression of human ORF1p across the human brain. This was in accordance with ORF1p immunostainings of the human post mortem cingulate gyrus (Fig. 2H and Suppl Fig. 2E) and frontal cortex (Suppl Fig. 2E), with an absence of ORF1p staining when using the secondary antibody only (Suppl Fig. 2E).”
  
  In some images a band is labeled as IgG heavy chain (e.g. presumably from the FACS, Figure 2G, and IP, Figure 6A - which could contain residual antibody) - however, this is avoidable by using a different antibody for capture than detection - which also helps reduce false positive results.
  
  Unfortunately, we have only an antibody raised in rabbit available to perform IPs and Western blots on mouse tissues and therefore cannot avoid the detection of the IgG heavy chain.
  
  Aside from these, there seem to be persistent 'double bands' in the region of ORF1p. Generally, we are unaccustomed to seeing such 'double bands' in human anti-ORF1p western blots and IP-western blots, and since, in this study, this is seen in both mouse and human blots, it raises some doubts. Having the molecular mass ladder on each blot would at least allow for the assessment of migration consistency and would therefore be very helpful.
  
  We have added the molecular weights on the Western blots (Fig.1H, Fig. 2G and Suppl Fig.1D and E). As also discussed above, there is accumulating evidence that in some tissues persistent double bands are detected using ORF1p antibodies in both mouse and human tissues.
  
  Human ORF1p detection:
  
  We have validated the antibody against human ORF1p (Abcam 245249-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr22227-6-ab245249), which we use for Western blotting experiments (please see Fig1J and new Suppl Fig.2A,B and C), by several means.
  
  (1) We have done immunoprecipitations and co-immunoprecipitations followed by quantitative mass spectrometry (LC-MS/MS; data not shown as they are part of a different study). We efficiently detect ORF1p in IPs (Western blot now added in Suppl Fig2B) and by quantitative mass spectrometry (5 independent samples per IP-ORF1p and IP-IgG: ORF1p/IgG ratio: 40.86; adj p-value 8.7e-07; human neurons in culture; data not shown as they are part of a different study). We also did co-IPs followed by Western blot using two different antibodies, either the Millipore clone 4H1 (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone- 4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) or the Abcam antibody to immunoprecipitate and the Abcam antibody for Western blotting on human brain samples. Indeed, the Millipore antibody does not work well on Western Blots in our hands. We consistently revealed a double band indicating that both bands are ORF1p-derived. We have added an ORF1p IP-Western blot as Suppl Fig. 2B which clearly shows the immunoprecipitation of both bands by the Abcam antibody. Abcam also reports a double band, and they suspect that the lower band is a truncated form (see the link to their website above). ORF1p Western blots done by other labs with different antibodies have detected a second band in human samples
  
  Sato, S. et al. LINE-1 ORF1p as a candidate biomarker in high grade serous ovarian carcinoma. Sci Rep 13, 1537 (2023) in Figure 1D
  
  McKerrow, W. et al. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl. Acad. Sci. U.S.A. 119, e2115999119 (2022), showing a Western blot of an inducible LINE-1 (ORFeus) detected by the MABC1152 ORF1p antibody from Millipore Sigma in Figure 7
  
  Walter et al. eLife 2016;5:e11418 (DOI: 10.7554/eLife.11418), in mouse ES cells with an antibody made in-house (gift from another lab; in Figure 2B)
  
  The lower band might thus be a truncated form of ORF1p or a degradation product which appears to be shared by mouse and human ORF1p. We have now mentioned this in the revised version of the paper (lines 183-189).
  
  (2) We have used the very well characterized antibody from Millipore (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) for immunostainings and detect ORF1p staining in human neurons in the very same brain regions (Fig 2H, new Suppl Fig. 2E) including the cerebellum in the human brain. We added a secondary antibody-only control (Suppl Fig. 2E).
  
  (3) We also did antibody validation by siRNA knock-down. However, it is important to note, that these experiments were done in LUHMES cells, a neuronal cell line which we differentiated into human dopaminergic neurons. In these cells, we only occasionally detect a double band on Western blots, but mostly only reveal the upper band at ≈ 40kD. The results of the knockdown are now added as Suppl Fig. 2C.
  
  Altogether, based on our experimental validations and evidence from the literature, we are very confident that it is indeed ORF1p that we detect on the blots and by immunostainings in the human brain.
  
  Mouse ORF1p detection: In line 117-123 of the manuscript, we had specified “Importantly, the specificity of the ORF1p antibody, a widely used, commercially available antibody [18,34–38], was confirmed by blocking the ORF1p antibody with purified mouse ORF1p protein resulting in the complete absence of immunofluorescence staining (Suppl Fig. 1A), by using an in-house antibody against mouse ORF1p [17] which colocalized with the anti-ORF1p antibody used (Suppl Fig. 1B, quantified in Suppl Fig. 1C), and by immunoprecipitation and mass spectrometry used in this study (see Author response image 1)”.
  
  Figure 2G shows a Western blot using an extensively used and well characterized ORF1p antibody from abcam (mouse ORF1p, Rabbit Recombinant Monoclonal LINE-1 ORF1p antibody-> https://www.abcam.com/enus/products/primary-antibodies/line-1-orf1p-antibody-epr21844-108-ab216324; cited in at least 11 publications) after FACS-sorting of neurons (NeuN+) of the mouse brain. We have validated this ORF1p antibody ourselves in IPs (please see Fig 6A) and co-IP followed by mass spectrometry (LC-MS/MS; see Fig 6, where we detect ORF1p exclusively in the 5 independent ORF1p-IP samples and not at all in 5 independent IgG-IP control samples, please also see Suppl Table 2). In this analysis, we detect ORF1p with a ratio and log2fold of ∞, indicating that this protein is found only in the IP-ORF1p samples (5/5) and not in the IP-control samples (which does not allow for the calculation of a ratio with a p-value; please see Suppl Table 2).
  
  In addition, we have added new data showing the entire membrane of the Western blot in Fig1H (now Suppl Fig.1E) and a knock-down experiment using siRNA against ORF1p or control siRNA in mouse dopaminergic neurons in culture (MN9D; new Suppl Fig.1D). This together makes us very confident that we are looking at a specific ORF1p signal. The band in Figure 2G is at the same height as the input and there are no other bands visible (except the heavy chain of the NeuN antibody, which at the same time is a control for the sorting). We added some explanatory text to the revised version of the manuscript in lines 120-124 and lines 253-256.
  
  Please note that in the IP of ORF1p shown in Fig6A, there is a double band as well, strongly suggesting that the lower band might be a truncated or processed form of ORF1p. As stated above, this double band has been detected in other studies (Walter et al. eLife 2016;5:e11418. DOI: 10.7554/eLife.11418) in mouse ES cells using an in-house generated antibody against mouse ORF1p. Thus, with either commercial or in-house generated antibodies in some mouse and human samples, there is a double band corresponding to full-length ORF1p and a truncated or processed version of it.
  
  We noticed that we have not added the references of the primary antibodies used in Western blot experiments in the manuscript, which was now corrected in the revised version.
  
  (19) Figure 1H, 1J, 6A: Show/indicate molecular weight marker.
  
  The molecular weight markers were added (please see Fig.1H, Fig. 2G and Suppl Fig.1D and E).
  
  (20) Page 10, line 223. " ...expressing ORF1p and ORF1p"?
  
  Thank you, this was corrected.
  
  (21) Lines 279-280 "An increase of ORF1p expression was also observed in three other regions albeit not significant." > This means it is not distinguishable as a change under the assumptions and framework of the analysis; please remove this statement.
  
  We agree, we removed this sentence.
  
  (22) Page 13, line 301. Labeling the group with a mean age of 57.5 as "young" might be a bit misleading.
  
  This is why we put the “young” in quotation marks.
  
  (23) Lines 309-311 "however there was a significant increase in several younger LINE-1 elements including L1HS and L1PA2 at the "name" level (Figure 4A, B)". > Effect size is tiny; is this really viable as biologically significant? Maybe just remove the volcano plot? Does panel A add anything not covered by B?
  
  We would like to keep the Volcano plot, even though effect sizes are small, which we acknowledge in the manuscript (line 359-362): “There was a modest but significant increase in several younger LINE-1 elements including L1HS and L1PA2 at the “name” level (Fig. 4A, B), an analysis which was however underpowered (post-hoc power calculation; L1HS: 28.4%; L1PA2: 32.8%) and thus awaits further confirmation in independent studies.” The reason for this decision is to illustrate a general increase in expression (even with a small effect size) of several LINE-1 elements at the name level, with the youngest LINE-1 elements being amongst those with the highest effect.
  
  (24) Lines 327-328 "The transcripts of these genes showed, although not statistically significant, a trend for decreased expression in the elderly (Supplementary Figure 5D-G)." > I do not recommend doing this.
  
  We agree and have removed this sentence.
  
  (25) Lines 339-342 "While several tools using expectation maximization algorithms in assigning multi-mapping reads have been developed and successfully tested in simulations 48,54, we used a different approach in mapping unique reads to the L1Base annotation of full-length LINE-1" > Generally, this section is not clear - what is the rationale for the approach (compared to the stated norms)? Ideally, justify this analytical choice and provide a basic comparison to other more standard approaches (even if briefly in a supplement).
  
  We thank the reviewer for this comment. Indeed, randomly assigning multi-mapping reads is usually a good strategy to quantify the expression of repeats at the family level (Teissandier et al. 2019), which we did in the first part of the analysis (class, family and name level). However, our main goal was to focus on specific single full-length LINE elements which can encode ORF1p. We therefore decided to use only uniquely mapped reads, which is by definition the only way to be sure that a sequencing read really comes from a specific genomic location, and which will not over-estimate their expression level. In this sense, we have added some explanatory text to this specific section. We also added a section to the discussion (line 638-644): “This analysis has technical limitations inherent to transcriptomic analysis of repeat elements especially as it is based on short-read sequences and on a limited and disequilibrated number of individuals in both groups. Nevertheless, we tried to rule out several biases by demonstrating that mappability did not correlate with expression overall and used a combination of visualization, post-hoc power analysis and analysis of the mappability profile of each differentially expressed full-length LINE-1 locus.”
  
  (26) Page 16, line 389. The age span covered is 59 years although the difference in mean age between the two groups is only 25.5 years - please indicate both metrics.
  
  We have added this additional metric in line 432.
  
  (27) Lines 394-397 "Further, correlation analyses suggest that L1HS expression might possibly be controlled by the homeoprotein EN1, a protein specifically expressed in dopaminergic neurons in the ventral midbrain 50, the heterochromatin binding protein HP1, two known regulators of LINE-1, and the DNA repair proteins XRCC5/6." > This reads like a drastic reach unless framed explicitly as a 'tempting speculation' (or similar). I don't think this claim should be made as it is without further validation.
  
  We believe we have used careful language (“correlation analysis suggests”, “might possibly be controlled”) in the results section as well as in the discussion (line 660-671): “Matrix correlation analysis of several known LINE-1 regulators, both positive and negative, revealed possible regulators of young LINE-1 sequences in human dopaminergic neurons. Despite known and most probable cell-type unspecific regulatory factors like the heterochromatin binding protein CBX5/HP1 [51] or the DNA repair proteins XRCC5 and XRCC6 [49], we identified the homeoprotein EN1 as negatively correlated with young LINE-1 elements including L1HS and L1PA2. EN1 is an essential protein for mouse dopaminergic neuronal survival [50] and binds, in its properties as a transcription factor, to the promoter of LINE-1 in mouse dopaminergic neurons [17]. As EN1 is specifically expressed in dopaminergic neurons in the ventral midbrain, our findings suggests that EN1 controls LINE-1 expression in human dopaminergic neurons as well and serves as an example for a neuronal sub-type specific regulation of LINE-1.” To this we added: “Although these proteins are known regulators of LINE-1, this correlative relationship awaits experimental validation.”
  
  (28) Mouse protein/gene names are all capital letters on page 17/18. Changes on page 18/19. This should be consistent.
  
  Thank you, this has been corrected (all capital).
  
  (29) Page 23, line 559. The estimated ORF1p/ORF2p ratio referenced is based on an overexpression of L1 from a plasmid (ref87). > It should be made clear to the reader that it is still unknown whether such a ratio is representative of native conditions.
  
  OK, this is indeed true. Thank you for pointing this out. (line 621-622)
  
  (30) Lines 613-616 "Further, GO term analysis contained expected categories like "P-body", mRNA metabolism related categories, and "ribonucleoprotein granule". We also identified NXF1 as a protein partner of ORF1p, a protein found to interact with LINE-1 RNA related to its nuclear export 89." > There is no reason to speculate that the proteins in the pulldown are specific to L1 RNAs.
  
  We did not speculate that the proteins in the pulldown are specific to LINE-1 RNA. We just mentioned that NXF1 was an ORF1p protein partner and that it had been found previously as a LINE-1 RNA interactor.
  
  ORF1p is present in large heterogeneous assemblies - not every protein should be assigned an L1-related function and many proteins will be participating in general RNA-granule functions (given L1 ORFs are known to accumulate in such structures). Moreover, the granules are not the same in every cell type. IP is done in low salt and overnight incubation (poorly controlled for non-specific accumulation).
  
  We state that these key interactors are “probably” essential for completing or repressing the LINE-1 life cycle. It is true that we cannot affirm this. We therefore added a sentence to the discussion (line 679): “This supports the validity of the list of ORF1p partners identified, although we cannot rule out the possibility that unspecific protein partners might be pulled down due to colocalization in the same subcellular compartment.”
  
  (31) Lines 629-631" These results complete the picture of the post-transcriptional and translational control of ORF1p and suggest that these mechanisms, despite a steady-state expression, are operational in neurons." > Stating that these results complete the picture, which is still very much open for completion (granted, these results add to the picture), is an unneeded over-reach.
  
  We agree. We changed “complete” to “add to” the picture.
  
  (32) Lines 641-644 "Finally, we found components of RNA polymerase II and the SWI/SNF complex as partners of ORF1p. This further indicates that ORF1p has access to the nucleus in mouse brain neurons as described for other cells 95,96, implying that ORF1p potentially has access to chromatin." > There is no way to know if this is a post-lysis effect - we have no real specificity information. The mock IP control is insufficient for this conclusion without further validation.
  
  We added: “however a bias due to a post-lysis effect cannot be excluded.” Line 711
  
  (33) ab216324 for IF and ab245122 for IP - why? What is the difference? Both are rated equally for IF and IP - please provide a rationale for reagent selection and use.
  
  These two antibodies are the same except for their storage buffer. ab245122 is azide- and BSA-free, while ab216324 contains the preservative sodium azide (0.01%) and the following constituents: PBS, 40% Glycerol (glycerin, glycerine), 0.05% BSA. As azide and BSA can affect the coupling of antibodies to beads, antibodies which do not contain these components in their buffer are preferred for IPs (but cannot be stored for as long).
  
  (34) Page 35, line 862. "1.3 x 105" should be "1.3 x 105".
  
  We added a regular x, but we are not sure whether this is what the reviewer was referring to.
  
  (35) MS comparison in Figure 6. Why is the comparison not being made between young vs. old brain/neurons? This would be more informative instead of just showing what they IP over a mock IgG control and the comparison would track better with other experiments in the rest of the paper.
  
  Yes, that is true. However, we did not do this at the time as we did not have old mouse brain tissue available. Services from official animal providers in France have unfortunately only recently expanded their offer with regard to the availability of aged animals.
  
  (36) Supplementary Table 2 (MS data) is lacking information. How many peptides (unique/total) were discovered for each protein? Why are all ratios and p-values not listed for every protein in the table? LFQ protein intensity values should also be listed. Each supplementary table should have a legend as a separate tab in the document.
  
  As stated in SupplTable2, and now made clearer in a separate tab of SupplTable2 which contains a legend to the table, some proteins do not have associated p-values and ratios because these proteins are found only in the ORF1p IP and not in the IgG control. This is why these proteins have an infinity sign instead of a fold-enrichment and no p-value assigned: we cannot calculate a ratio of the form X/0, which in turn makes it impossible to obtain a p-value. Concerning the absence of LFQ protein intensity values, as stated in the materials & methods section, we did not use these values (linear model) but instead the intensity values of the peptides: “The label free quantification was performed by peptide Extracted Ion Chromatograms (XICs), re-extracted by conditions and computed with MassChroQ version 2.2.21 109. For protein quantification, XICs from proteotypic peptides shared between compared conditions (TopN matching) with missed cleavages were used. Median and scale normalization at peptide level was applied on the total signal to correct the XICs for each biological replicate (n=5). To estimate the significance of the change in protein abundance, a linear model (adjusted on peptides and biological replicates) was performed, and p-values were adjusted using the Benjamini–Hochberg FDR procedure.”
  
  The number of peptides unique/total for each protein has been added to Suppl_Table2 along other available information.
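  
  The two points in this reply, an undefined ratio for proteins absent from the control and p-value adjustment by the Benjamini–Hochberg procedure, can be sketched as follows. This is a minimal illustration with hypothetical helper names and made-up numbers; it is not the authors' pipeline, which relied on MassChroQ and a linear model.

```python
import math


def ip_over_control(ip_signal, control_signal):
    """Ratio of IP to control signal; infinite when the control signal is zero."""
    if control_signal == 0:
        # X/0 is undefined: report infinity (or NaN if both are zero).
        return math.inf if ip_signal > 0 else math.nan
    return ip_signal / control_signal


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(ip_over_control(1200.0, 0.0))    # protein found only in the IP
    print(ip_over_control(1200.0, 300.0))  # protein found in both conditions
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # hypothetical p-values
```

  Proteins with an infinite ratio carry no p-value in such a scheme, so they would simply be left out of the list passed to the adjustment step.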
  
  (37) Poor overlap in 6C could in part be explained by the use of different sample/tissue types, but more likely the big difference could come from the very different conditions at which the IPs were performed (buffers and incubation times etc.).
  
  The overlap seems poor, but is nevertheless greater than expected by chance (representation factor 2.6, p<5.4e-08). We agree that this can be explained in part by different experimental conditions, which we have now added to the discussion (line 478: “However, differences in experimental conditions could also influence this overlap.”).
  
  (38) Figure 6D is a very uninspiring representation of the data. What is the point of showing several binary interactions? Was the IgG control proteome also analyzed? Have proteins displayed in Figure 6 been corrected for that?
  
  The point of showing these interactions is that ORF1p interacts with clustered proteins. ORF1p interacts with proteins that belong to specific GO terms (Fig6B), but these proteins also interact with each other more than expected (Fig6C). This is the benefit of showing a STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) representation, which draws on a database of known and predicted protein–protein interactions. Indeed, the proteins in Fig6 have been corrected for the IgG proteome. We only show proteins that were enriched or uniquely present in the ORF1p IP condition compared to the IgG control (please see Suppl_Table2).
  
URL

biorxiv.org/content/10.1101/2023.12.12.571308v3
www.biorxiv.org www.biorxiv.org

Intermittent fasting promotes ILC3s secreting IL-22 contributing to the beigeing of white adipose tissue

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Comment 1: The authors showed increased plasma IL-22 and its expression in the intestine. Are intestinal ILC3s the main source of plasma IL-22?
  
  Reply: ILC3s are the main source of IL-22 as reported previously (PMID: 30700914). In the small intestine, ILC3s account for about 62% of IL22+ cells. Other IL22+ cells include γδ T, Foxp3+T and CD4+T cells.
  
  Comment 2: The authors transplanted intestinal ILC3s from NCD mice to DIO mice and showed significant metabolic improvements. However, in Fig. 1, intermittent fasting increased the proportion of IL-22-positive ILC3s rather than changing the total number. Please clarify whether the effect of this transplantation is due to increasing ILC3 number or to introducing more IL-22-positive ILC3s (which are decreased in DIO). Are these transplanted ILC3s by default homing to the intestine rather than to other tissues?
  
  Reply: We believe that the transplantation increases ILC3s number, leading to the increment in IL22 levels. The transplanted ILC3s by default are homing to the intestine rather than to other tissues because ILC3s express several homing receptors such as CCR7, CCR9, and α4β7, which modulate their capacity to migrate to the gut (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). Our observation that ILC3s in adipose tissue remained unchanged by ILC3 cell transplantation (Supplementary Figure 5F) also supports this concept.
  
  Comment 3: Thermogenesis in this acute cold challenge is mainly by brown adipose tissue. Beiging is a chronic and adaptive response. Based on the data in WAT, there is a beiging phenotype, but the core body temperature in acute cold challenge is not an accurate readout. It would be a missed opportunity by not evaluating thermogenic activity in BAT. More browning genes should be included to strengthen the beiging phenotype of WAT. Moreover, inflammation in WAT can be examined to provide a whole picture of adipose tissue remodeling through this pathway.
  
  Reply: Per suggestion, we performed additional experiments to measure levels of inflammation genes such as Il4, Il1b, Il6, Il22, Il23, Il17a. As shown in supplemental figure 2D, these inflammation relevant genes were not altered.
  
  Comment 4: For the SVF beige adipocyte differentiation, 100 ng/mL IL-22 was used. This is highly above the physiological concentration at ~5 pg/mL. Please justify this high concentration used.
  
  Reply: We agree with the reviewer that the dose of IL-22 used is high. However, the effective dose of 100 ng/ml used in our studies is consistent with the literature. Previous reports have shown that IL-22 directly activates Stat3 in adipose tissue and primary adipocytes, and promotes the expression of genes involved in triglyceride lipolysis (Lipe and Pnpla2) and fatty-acid β-oxidation (Acox1) at the dose of 100 ng/ml (Wang X, Ota N, et al. Nature. 2014). Consistently, other studies have reported that IL-22 at 100 ng/ml significantly reversed the enhanced expression of CCL2, CCL20 and IL1B mRNAs in granulosa cells in vitro (Qi X, et al. Nat Med. 2019).
  
  Comment 5: The authors showed increased Ucp1 and Cidea expression by IL-22 treatment in SVFs. Please be aware that these increases are likely due to boosted adipogenesis as told by the morphology. Please examine more adipogenic markers to confirm. Is this higher adipogenesis caused by the high concentration of IL-22?
  
  Reply: Per suggestion, we examined the expression of adipogenic marker genes such as Pparγ and Fabp4. We found that IL-22 did not increase the levels of these adipogenic marker genes relative to the PBS control, as shown in supplemental figure 6F.
  
  Author response image 1.
  
  Comment 6: In line 201, the authors drew the conclusion that IL-22 increased SVF beige differentiation. To fully support this conclusion, the authors should assure adipogenesis at the same baseline and then compare beiging, or examine the effect of IL-22 on normal adipogenesis to compare with beige differentiation.
  
  Reply: We examined the expression of adipogenic marker genes such as Pparγ and Fabp4 and found that IL-22 did not increase the expression of these adipogenic marker genes relative to the PBS control.
  
  Reviewer #2:
  
  This study aims to investigate the mediatory role of intestinal ILC3-derived IL-22 in intermittent fasting-elicited metabolic benefits.
  
  Strengths:
  
  The observation of induction of IL-22 production by intestinal ILC3 is significant, and the scRNAseq provides new information into intestine-resident immune cell profiling in response to repeated fasting and refeeding.
  
  Weaknesses:
  
  The experimental design for some studies needs to be improved to enhance the rigor of the overall study. There is a lack of direct evidence showing that the metabolically beneficial effects of IF are mediated by intestinal ILC3 and their derived IL-22. The mechanism by which IL-22 induces a thermogenic program is unknown. The browning effect induced by IF may involve constitutive activation of lipolysis, which was not considered.
  
  Comment 1: Lack of direct evidence showing that IL-22-expressing ILC3s in intestine is the key contributor to intermittent fasting (IF)-mediated elevation of circulating IL-22 levels. The fraction of IL-22-expressing cells was increased threefold by IF but the increase in circulating IL-22 is moderate (Figs. 1J and 1K).
  
  Reply: IL-22 in circulation is subject to clearance, degradation, binding with plasma proteins, etc. Thus, circulating levels of IL-22 may be much lower than the amount secreted by the intestinal IL-22 positive ILC3s.
  
  Comment 2: The loss of fat mass by IF suggests that the active lipolysis may explain the white fat browning which was not considered. This may apply to the observations in IL-22 treated mice as well as IL-22R KO mice.
  
  Reply: We analyzed the expression of genes related to lipolysis in NCD and NCD-IF mice and found that IF did not alter the levels of these genes in white adipose tissues (Supplementary figure 2D). We have addressed this concern in line 119, page 6.
  
  Author response image 2.
  
  Comment 3: IL-22 administration and adoptive transfer of ILC3 had no significant effect on body weight. Not clear how IL-22 improves insulin sensitivity in this case.
  
  Reply: Our results are consistent with a previous report showing that IL-22 administration improves insulin sensitivity without a change in body weight (Qi X, et al. Nat Med. 2019). In addition, previous studies have demonstrated that IL-22 can increase Akt phosphorylation in muscle, liver and adipose tissues, leading to improvement in insulin sensitivity (Wang X, et al. Nature. 2014). We have addressed this potential mechanism in lines 192-195, page 9.
  
  Comment 4: The energy expenditure data look unusual given that there was little increase in oxygen consumption during dark cycle compared to light cycle (Fig.3).
  
  Reply: The modest difference in oxygen consumption between the dark cycle and the light cycle may be due to a technical problem with the system.
  
  Comment 5: The thermogenic capacity for the whole fat pad needs to consider the expression of UCP1 in certain amount of tissue and the total mass for each individual animal because the mRNA level itself does not reflect the whole tissue capacity.
  
  Reply: We used the whole subcutaneous adipose tissue from one side for qPCR to reflect the whole tissue capacity.
  
  Comment 6: The design of the studies for the adoptive transfer of ILC3 is a concern. PBS is not a good control for the group with ILC3 cells (Figs. 2A-2H). A similar issue applies to the co-culture study, in which beige only is not an ideal control for Beige+ILC3 (Figs. 2I-2J).
  
  Reply: We agree with the reviewer that PBS is not a good control. Because we cannot find a similar immune cell without any effect on adipocytes, we designed this experiment based on other studies in which saline or PBS is used as the control in ILC transfer experiments (Sasaki T, et al. Cell Rep. 2019; Wang H, et al. Nat Commun. 2019).
  
  Comment 7: The induction of thermogenesis by IL-22 treatment may be related to enhanced differentiation rather than direct activation of thermogenic genes (Figs. 4G and 4H).
  
  Reply: Our observation that IL-22 did not alter the levels of genes related to adipogenesis (Supplemental figure 6F) indicates that IL-22 may not alter the differentiation of adipocytes. We addressed this concern in Lines 211-212, page 10.
  
  Reviewer #3:
  
  Chen et al. investigated how intermittent fasting causes metabolic benefits in obese mice and found that intestinal ILC3 and IL-22-IL-22R signaling contribute to the beiging of white adipose tissue (WAT) and consequent metabolic benefits including improved glucose and lipid metabolism in diet-induced obese mice. They demonstrate that intermittent fasting causes increased IL22+ILC3 in small intestines of mice. Adoptive transfer of purified intestinal ILC3 or administration of exogenous IL-22 can lead to increases in UCP1 gene expression and energy expenditure as well as improved glucose metabolism. Importantly, the above metabolic benefits caused by intermittent fasting are abolished in IL-22R-/- mice. Using an in vitro experiment, the authors show that ILC3-derived IL-22 may directly act on adipocytes to promote SVF beige differentiation. Finally, by performing sc-RNA-seq analysis of intestinal immune cells from mice with different treatments, the authors indicate a possible way of intestinal ILC3 being activated by intermittent fasting. Overall, this study provides a new mechanistic explanation for the metabolic benefits of intermittent fasting and reveals the role of intestinal ILC3 in the enhancement of the whole-body energy expenditure and glucose metabolism likely via IL-22-induced beige adipogenesis.
  
  Although this study presents some interesting findings, particularly IL-22 derived from intestinal ILC3 could induce beiging of WAT by directly acting on adipocytes, the experimental data are not sufficient to support the key claims in the manuscript.
  
  Comment 1: Only increased UCP1 expression on mRNA level is not enough to support the beiging of WAT. More methods such as western blotting and immunostaining of UCP1 in WAT are needed to confirm the enhanced beige adipogenesis.
  
  Reply: Additional experiments have been performed to measure the UCP1 protein by Western blot. The data is included in Figure 4I and Supplementary Figure 2E.
  
  Comment 2: IL-22 is known to modulate metabolic pathways via multiple downstream functions. The use of whole-body knockout of IL-22R could not exclude the indirect effect on the promotion of beiging of WAT. Specific deletion of IL-22R in adipose tissues is therefore needed to confirm the direct effect of IL-22 on adipocytes which is suggested by the in vitro study.
  
  Reply: We agree with the reviewer that specific deletion of IL-22R in adipose tissues is critical to confirm the direct effect of IL-22 on adipocytes. We will generate AdipoQ-IL-22R-/- mice to test this concept further in vivo.
  
  Comment 3: The authors failed to show the cellular distribution of IL-22R in adipose tissues. This is important because the mechanism that explains the increased beige adipogenesis could be different based on the expression of IL-22R in adipose progenitor cells or mature adipocytes. So it is not appropriate to conclude that "IL-22 then directly activates IL-22R on adipocytes, leading to subsequent induction of beiging of white adipose tissue" in line 407. Additionally, Oil red O staining is needed for Fig 4G and Fig 5J, and protein levels of UCP1 and adipogenesis-related markers are needed to evaluate beige fat differentiation and the whole adipogenesis.
  
  Reply: Per suggestion, we have added the expression of IL-22R in adipose progenitor cells or mature adipocytes (Supplementary Figure 6E). In addition, protein levels of UCP1 and adipogenesis-related markers to evaluate the whole adipogenesis (Figure 4I, Supplementary figure 6F) are now included. We have also addressed this issue in lines 207-215, page 10.
  
  Comment 4: Although the authors provided some hypothesis about how intermittent fasting increases IL-22+ILC3 in small intestines by sc-RNA-seq analysis, some functional assays are needed to identify the factors, for example, how about the levels of macrophage-derived IL-23 or AHR ligands in small intestines and whether they contribute to increased percentages of intestinal IL-22+ILC3 following intermittent fasting.
  
  Reply: We used flow cytometry sorting of macrophages combined with qPCR experiments to show, in a preliminary way, that intermittent fasting increases the expression of molecules such as Cd44 and Ccl4 (Supplementary Figure 10B), which may contribute to the increase in the proportion of IL-22+ ILC3s in the intestine under intermittent fasting. Our observation that IL-23 mRNA levels were not changed indicates that this molecule may not be the major contributor to the communication between macrophages and ILC3s. Other potential molecules such as AHR ligands remain to be explored.
  
  Comment 5: What are the differences between adipose ILC3 and intestinal ILC3? Why do transferred ILC3 only migrate to the small intestine but not WAT of recipient mice? It would be better to examine or at least discuss whether other factors from intestinal ILC3 may also contribute to beiging of WAT following intermittent fasting.
  
  Reply: Intestinal ILC3s specifically express the gut homing receptors CCR7, CCR9, and α4β7 (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). This may explain why transplanted intestinal ILC3s migrate mainly to the intestine instead of adipose tissue (PMID: 34625492). The proportion of ILC3s in adipose tissue of mice is very small, and their functions have not been clarified yet. We have addressed this issue in lines 156-158, page 8.
  
  There are some other factors from intestinal ILC3 which may also contribute to beiging of WAT following intermittent fasting. By secreting IL-22, ILC3 enhanced the intestinal mucosal barrier, leading to reduction of the influx of LPS and PGN into the bloodstream under high-fat diet conditions, and subsequent increase in the beiging of white adipose tissue (Chen H, et al. Acta Pharm Sin B. 2022). We have addressed this potential mechanism in lines 344-347, page 16.
  
  Comment 6: The sensitivity of the IL-22 ELISA kit used in the manuscript was 8.2 pg/mL, according to the information from the methods, however, in Fig. 1J and Fig. 2B, the IL-22 levels in mouse plasma were lower than 6 pg/mL, which was below the sensitivity of the ELISA kit and also the assay range. Please explain.
  
  Reply: We have double-checked the original data and found that we have made a mistake in calculating the concentration of IL-22. We have corrected this error (Fig. 1J, Fig. 2B).
  
  Comment 7: In Fig 7A, the significance of the Hypothesis testing should be marked. In Fig 7F and 7G, the contrast between the two groups is not apparent, other comparing ways could be used to enhance the readability.
  
  Reply: Per suggestion, we have marked the significance of the hypothesis testing between HFD vs NCD and HFD-IF vs HFD in Fig7A. Shown in Fig 7F and 7G are the top 20 enriched interacting proteins between different cell types. The dot plot displays the average expression level and significance of protein interactions in cell types.
  
  Comment 8: The total food intake of fasting mice fed with NCD or HFD was less than those without fasting, and the food intake rate the author showed in Fig S1 represents the value that was normalized to body weight. So the author should describe it precisely in line 114.
  
  Reply: We have revised the statement accordingly in lines 114-115.
  
  Comment 9: Western blotting analysis has been described in methods, however, there is no corresponding experimental data in the result part.
  
  Reply: The Western blotting results are now included.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.08.29.555436v2
www.biorxiv.org www.biorxiv.org

New submission 30/06/2023, 08:47:48

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  After thoroughly reviewing the comments and suggestions provided by the reviewers, we have revised our manuscript. We sincerely appreciate the reviewers' constructive approach and valuable feedback. We believe that the edited version of the manuscript is now more comprehensible and reader-friendly. Please find our responses to the comments below.
  
  Reviewer #1 (Public Review):
  
  This EEG study probes the prediction of a mechanistic account of P300 generation through the presence of underlying (alpha) oscillations with a non-zero mean. In this model, the P300 can be explained by a baseline shift mechanism. That is, the non-zero mean alpha oscillations induce asymmetries in the trial-averaged amplitudes of the EEG signal, and the associated baseline shifts can lead to apparent positive (or negative) deflections as alpha becomes desynchronized at around P300 latency. The present paper examines the predictions of this model in a substantial data set (using the typical P300-generating oddball paradigm and careful analyses). The results show that all predictions are fulfilled: the two electrophysiological events (P300, alpha desynchronization) share a common time course, anatomical sources (from inverse solutions), and covariations with behaviour; plus relate (negatively) in amplitude, while the direction of this relationship is determined by the non-zero-mean deviation of alpha oscillations pre-stimulus (baseline shift index, BSI). This is indicative of a tight link of the P300 with underlying alpha oscillations through a baseline shift account, at least in older adults, and hence that the P300 can be explained in large parts by non-zero mean brain oscillations as they undergo post-stimulus changes.
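  
  The baseline-shift account summarised above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' analysis: the sampling rate, the 10 Hz frequency, the mean offset of -0.3, and the step-like envelope are all hypothetical. Each trial is an oscillation with random phase and a negative mean that scales with its amplitude; averaging across trials cancels the oscillation but leaves the amplitude-dependent mean.

```python
import math
import random


def simulate_average(n_trials=500, fs=250, duration=1.0, freq=10.0,
                     mean_offset=-0.3, seed=0):
    """Trial average of a non-zero-mean oscillation whose amplitude drops."""
    rng = random.Random(seed)
    n = int(fs * duration)
    times = [i / fs for i in range(n)]
    # Hypothetical envelope: full amplitude before 0.5 s, attenuated after.
    envelope = [1.0 if t < 0.5 else 0.4 for t in times]
    average = [0.0] * n
    for _ in range(n_trials):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for i, t in enumerate(times):
            # The mean offset scales with the envelope (non-zero-mean rhythm).
            sample = envelope[i] * (math.cos(2 * math.pi * freq * t + phase)
                                    + mean_offset)
            average[i] += sample / n_trials
    return times, envelope, average


times, envelope, average = simulate_average()
half = len(average) // 2
pre_mean = sum(average[:half]) / half
post_mean = sum(average[half:]) / (len(average) - half)
```

  With a negative mean offset, the drop in amplitude moves the averaged signal towards zero, that is, in the positive direction, which is the sign relationship the model predicts for a posterior positivity accompanying alpha attenuation.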
  
  Specific comments
  
  1) The baseline shift model predicts an inverse temporal similarity between alpha envelope changes and P300, confirmed over posterior regions (negative maxima over Pz, Fig 2B). It is therefore intriguing to see in this Figure a very high (positive) correlation in left frontal electrodes. I acknowledge that this is covered in the discussion, but given that this is somewhat unexpected at this point, I suggest providing the readers with a pointer in the Figure legend to this observation and the discussion. Also, I would recommend being more careful with the discussion of this left frontal positive correlation, where a "negative P300" over these areas is mentioned. Given the use of average-referenced sensor data (as opposed to source localized data) and the clear posterior localization of the P300 (Fig 4A), it is likely that what is picked up as "negative ERP potential" over left frontal sites is the posterior P300 forward-projected and inverted through the calculation of the average reference. Accordingly, the interpretation in terms of polarity (positive) of the correlation is likely misleading but what this observation seems to suggest is that other oscillatory processes (than posterior alpha) (e.g. of motor preparation during evidence accumulation) do substantially correlate with the posterior P300 build-up.
  
  We agree that the name P300 should rather be used for the positive potential over posterior sites. We edited the text, replacing mentions of “negative P300” with “negative ER”. Also, the following text has been added to the legend of Figure 2:
  
  “Note the positive correlation between the low-frequency signal and the alpha amplitude envelope over central sites. Due to the negative polarity of ER over the fronto-central sites, such correlation may still indicate a temporal relationship between the P300 process and oscillatory amplitude envelope dynamics (due to the use of a common average reference). However, it cannot be entirely excluded that additional lateralized response-related activity contributes to this positive correlation (Salisbury et al., 2001).”
  
  2) Parts of the conclusions are based on a relationship between alpha-amplitude modulation and size of P300-amplitude (amplitude-amplitude) using data binning (illustrated in Fig 3) and the bins seem to include different participants, rather than trials. As this is an analysis of EEG data, I wonder how much of this relationship can be explained by a confound of skull thickness (or other individual differences in anatomy picked up with the scalp measures such as gyral folding patterns and current source orientations etc). E.g. those with thicker/thinner skulls are expected to show less/more of a modulation in all signals. This could be ruled out by relating the bins in alpha modulation not to the P300 but to another event that does not coincide in time with the alpha changes (e.g. P100), where no changes across bins would be expected.
  
  We are grateful for the suggestions on confound estimation. We repeated the analysis of binning of alpha rhythm amplitude normalised change in relation to early ER, which in our auditory paradigm was N100. The largest change in the alpha amplitude occurs later in the poststimulus window, but that does not necessarily mean that the activity in the window right after the stimulus onset is unaffected. As can be seen in Figure 3 (t-statistics between alpha bins), there is already a significant difference around 100 ms over the central regions of the scalp. For this plot, the broadband data was filtered from 0.1 to 3 Hz, thus assessing only changes in low-frequency signals. We repeated the same analysis for broadband data (0.1–45 Hz) and also observed a significant difference between two extreme bins around 100 ms over the central region (Figure S5A). However, if we filter the signal from 4 to 45 Hz, these significant differences almost completely disappear (only electrode TP9 was significant; Figure S5B). Importantly, this range (4–45 Hz) includes the frequency of N100, which is typically in the alpha range. It means that the differences in N100 are riding on top of the baseline shift created by an unfolding alpha amplitude decrease. When this low-frequency baseline shift was removed, significant differences were no longer visible. This is an indication that differences in P300 amplitude between alpha bins are restricted to the low-frequency range and are not propagated to other ERs with higher frequency content.
  
  We added Figure S5 to the Supplementary material and introduced it in the main text, the Results section, as follows:
  
  “The cluster within the earlier window (100–200 ms) over central regions (Figure 3C) possibly reflects the previously shown effect of prestimulus alpha amplitude on earlier ERs (Brandt et al., 1991, Babiloni et al., 2008) but may also be a manifestation of BSM. We tested this assumption for early ER, which in our auditory task was N100. We repeated the binning analysis for broadband data (0.1–45 Hz) and also observed a significant difference between two extreme bins around 100 ms over the central region (Figure S5A). However, if we filter the signal from 4 to 45 Hz (the range that includes the frequency of N100 but not low-frequency baseline shifts), these significant differences almost completely disappear (only electrode TP9 was significant; Figure S5B). It means that the difference in N100 amplitudes over frontal sites is driven by the baseline shift created by an unfolding alpha amplitude decrease. The significant difference at the TP9 electrode possibly reflects a genuine physiological effect of alpha rhythm amplitude on the excitability of a neuronal network and, as a consequence, on the amplitude of ER (as opposed to the baseline-shift mechanism, where the alpha rhythm doesn’t affect the amplitude of ER but creates an additional component of ER; Iemi et al. 2019).”
  
  3) Related to the above: I assume it can be ruled out that the relationship between baseline-shift index and P300 amplitude (also determined through binning, Fig 6) could be influenced by the above-mentioned confounds, given the inverse relationship?
  
  As alpha rhythm power was found in a previous study to depend on the size of the head (Candelaria-Cook et al., Cerebral Cortex, 2022), we agree that the contribution of this confounding factor should be estimated (and we did estimate it). However, we would like to point out that we looked into dependencies based on ratios, which eliminates absolute units potentially being affected by head size, skull thickness, etc. For instance, the baseline-shift index is estimated as the Pearson correlation coefficient between the alpha rhythm envelope and low-frequency signal during the resting state. Therefore, multiplying the alpha amplitude envelope by an arbitrary scale would not cause the correlation to change. Nonetheless, for a subset of participants (1034 participants, mean age 69.8 years, 496 female), we had MRI data, from which we extracted total intracranial volume. For each electrode, we computed the Pearson correlation between the variable of interest and total intracranial volume. Variables of interest were the peak amplitude of P300, the attenuation-peak amplitude of alpha rhythm, alpha rhythm normalised amplitude (computed as ), and the magnitude of the baseline shift index (BSI). The significance threshold was set at a Bonferroni-corrected p-value of 0.05. For P300, only one electrode, namely C4, demonstrated a significant correlation of –0.10. However, the C4 electrode is outside of the typical electrode range for P300. For alpha envelope amplitude, significant correlations were observed all over the head (19 out of 31 electrodes, maximum at Cz), and a larger total intracranial volume was related to a higher amplitude of alpha rhythm.
  
  Candelaria-Cook et al. (Cerebral Cortex, 2022) showed a similar association in longitudinal data from children and adolescents, but the increase in alpha rhythm power in that study might have been due to additional factors beyond a growing head. Conversely, normalised alpha amplitude showed no significant correlations. Similarly, the absolute value of BSI did not correlate significantly with total intracranial volume at any electrode. Overall, only alpha amplitude shows a prominent correlation to total brain volume, thus reducing the concern that head size may be a confound.
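  
  The scale-invariance argument made in this reply can be checked numerically. The sketch below uses invented values for the alpha envelope and the low-frequency signal, and it assumes that the Bonferroni correction is taken over the 31 electrodes mentioned above; neither the data nor that choice of family size is confirmed by the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical resting-state values: alpha envelope and low-frequency signal.
envelope = [1.0, 0.9, 0.7, 0.4, 0.5, 0.8]
low_freq = [-0.30, -0.28, -0.20, -0.10, -0.16, -0.25]

r_original = pearson(envelope, low_freq)
# Rescaling the envelope (e.g. a thicker skull attenuating all amplitudes)
# by a positive constant leaves the correlation unchanged.
r_scaled = pearson([3.7 * v for v in envelope], low_freq)

# Assumed family size: one test per electrode.
n_electrodes = 31
bonferroni_alpha = 0.05 / n_electrodes
```

  The invariance holds only for multiplication by a positive constant; anatomical factors that change the shape of the signals, rather than their overall scale, would not be removed by this normalisation.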
  
  4) This study is based on a sample of older participants. One wonders to what extent this is needed to reveal the alpha-P300 relationships (e.g. more variability in this population than in younger controls), and/or whether other mechanisms may be at play across the lifespan.
  
  Our study is indeed based on a sample of older participants. However, in our previous study (Studenova et al., PLOS Comp Bio, 2022), we compared young and elderly participants using resting-state data. There, we measured the baseline-shift index (BSI) at rest, and BSI serves as a proxy for baseline shifts present in the task-based data (under the assumptions of the baseline-shift mechanism, ER is in essence a baseline shift). We found that BSIs for elderly participants were smaller in comparison to those for young participants. Yet, the distribution of BSI values across the scalp (as in Figure 6A) was similar between the two age groups.
  
  Additionally, we observed that larger alpha rhythm power was positively correlated with the magnitude of BSI, but only for younger participants, which points out possible difficulties arising from the fact that elderly people have reduced alpha power. Therefore, we believe that for a sample of young participants, the results should not be different.
  
  5) Legend to Figure 6: sentence under A: "A positive deflection of P300 at posterior sites coincides with a decrease in alpha amplitude, a case that corresponds to negative mean oscillations." I find this sentence at this place in the legend confusing, as Fig 6A seems to illustrate the BSI only (not yet any relationship?).
  
  We expanded the text in the legend with this paragraph:
  
  “BSI serves as a proxy for the relation between ER polarity and the direction of alpha amplitude change (Nikulin et al., 2010). Here, we observe predominantly negative BSIs (and thus negative mean oscillations) at posterior sites, which indicates the inverted relation between P300 and alpha amplitude change. Indeed, in the task data, a positive deflection of P300 at posterior sites coincides with a decrease in alpha amplitude.”
  
  6) Page 4: repetition of "has been" "has been" one after each other in the text
  
  We are thankful for this catch. We removed the repetition.
  
  Reviewer #2 (Public Review):
  
  The authors attempt to show that event-related changes in the alpha band, namely a decrease in alpha power over parieto/occipital areas, explain the P300 during an auditory target detection task. The proposed mechanism by which this happens is a baseline-shift, where ongoing oscillations which have a non-zero mean undergo an event-related modulation in amplitude which then mimics a low frequency event-related potential. In this specific case, it is a negative-mean alpha-band oscillation that decreases in power post-stimulus and thus mimics a positivity over parieto-occipital areas, i.e. the P300. The authors lay out 4 criteria that should hold if indeed alpha modulation generates the P300, which they then go about providing evidence for.
  
  Strengths:
  
  The authors do go about showing evidence for each prediction rigorously, which is very clearly laid out. In particular, I found the 3rd section connecting resting-state alpha BSI to the P300 quite compelling.
  
  The study is obviously very well-powered.
  
  Very well-written and clearly laid out. Also, the EEG analysis is thorough overall, with sensible analysis choices made.
  
  I also enjoyed the discussion of the literature, albeit with certain strands of P300 research missing.
  
  Weaknesses:
  
  In general, if one were to be trying to show the potential overlap and confound of alpha-related baseline shift and the P300, as something for future researchers to consider in their experimental design and analysis choices, the four predictions hold well enough. However, if one were to assert that the P300 is "generated" via alpha baseline shift, even partially, then the predictions either do not hold, or if they do, they are not sufficient to support that hypothesis. This general issue is to be found throughout the review. I will briefly go through each of the predictions in turn:
  
  1) The matching temporal course of alpha and P300 is not as clear as it could be. Really, for such a strong statement as the P300 being generated by alpha modulation, one would need to show a very tight link between the signals temporally. There are many neural and ocular signals which occur over the course of target detection paradigms: P300, alpha decrease, motor-related beta decrease, the LRP, the CNV, microsaccade rate suppression etc. To specifically go above and beyond this general set of signals and show a tighter link between alpha and P300 requires a deeper comparison. To start, it would be a good idea to show the signals overlapping on the same plot to really get an idea of temporal similarity. Also, with the P300-alpha correlation, how much of this correlation is down to EEG-related issues such as skull thickness, cortical folding, or cognitive issues such as task engagement? One could perhaps find another slow wave ERP, e.g. the Lateralised Readiness Potential, and see if there is a similar strength correlation. If there is not, that would make the P300 relationship stand out.
  
  Thank you for this comment. In our study, we outline the prerequisites for the baseline-shift mechanism (BSM) and show how they hold for the obtained data. Overall, evidence in favour of BSM was found for all the prerequisites. However, as is the case for all EEG/MEG data, the non-invasive nature of the data puts constraints on the interpretation of the results. To specifically address the points raised by the reviewer about the results, we provide additional information about the overlap (Figure 2) and non-specific anatomical parameters.
  
  The baseline-shift mechanism makes a general prediction about the generation of some ERs (those that coincide with a change in oscillatory amplitudes). The fact that neuronal oscillations (especially alpha oscillations) are modulated in almost any task indicates that other ERs can also contain a contribution from the baseline-shift mechanism. In our study, it is plausible that several sources of alpha oscillations orchestrated several ER components that appeared on the scalp after the presentation of a target stimulus. Due to the substantial spatial mixing and temporal overlap, it is difficult to disentangle the processes indexing perceptual, memory, or motor functions. However, currently, we are working on showing that the readiness potential (movement related potential) in the classical Libet’s paradigm also complies with the baseline-shift mechanism.
  
  Concerns about confounds such as skull thickness are valid; therefore, we performed additional analysis. For a subset of participants (1034 participants, mean age 69.8 years, 496 female), we had MRI data, from which we extracted total intracranial volume. We tested the correlation between total intracranial volume and several variables of interest: the peak amplitude of P300, the attenuation-peak amplitude of alpha rhythm, alpha rhythm normalised change, and the magnitude of the baseline shift index (BSI). For P300 amplitude, only the C4 electrode showed a significant correlation of –0.10. For alpha envelope amplitude, there were significant correlations all over the head (19 out of 31 electrodes, maximum at Cz). The correlations showed that a larger total intracranial volume was related to a higher amplitude of alpha rhythm. For a normalised change in alpha amplitude, we observed no significant correlations. Similarly, the absolute value of BSI did not correlate significantly with total intracranial volume at any electrode. Overall, alpha amplitude indeed shows a prominent correlation to total brain volume, but none of the relational variables (normalised amplitude change, BSI) show any correlation.
  
  In Figure 3, it is clear that alpha binning does not account for even 50% of the variance of P300 amplitude. Again, if there is such a tight link between the two signals, one would expect the majority of P300 variance to be accounted for by alpha binning. As an aside, the alpha binning clearly creates the discrepancy in the baseline period, with all alpha hitting an amplitude baseline at approx. 500 ms. I wonder if you could NOT, in fact, baseline your slow wave ERP signal, instead using an appropriate high pass filter (see "EEG is better left alone", Arnaud Delorme, 2023) and show that the alpha binning creates the difference in ERP at the baseline which then is reinterpreted as a P300 peak difference after baselining.
  
  The difference in the baseline window for alpha rhythm amplitude is indeed prominent (Figure R1A,B), so we proceed with the suggested analysis. Before anything else, we would like to reiterate that the baseline correction per se does not generate ER; it just moves the whole curve (in the pre- and poststimulus intervals) up and down. Firstly, we repeated the analysis without baseline correction (filter 0.1–3 Hz) and still observed the difference in P300 amplitude across bins (Figure R1D). Moreover, based on cluster-based permutation testing, ERs in the two most extreme bins were not significantly different in the prestimulus window. However, when we opt for no baseline correction, there will still be a baseline, namely, the average of the signal will be zero within a filtering window (e.g., 10 sec for a high-pass filter at 0.1 Hz). Thus, secondly, we computed an ER but with the baseline in the poststimulus window (400–600 ms; Figure R1E). In this case, the difference between bin 1 and bin 5 (for the prestimulus interval) in the window before 0 ms was significant in the posterior regions. The differences in the baseline are perceived as being smaller than the differences in alpha amplitude. This can be attributed to the fact that there are other low-frequency processes in the EEG signal that are different from alpha baseline shifts. Additionally, P300 in bin 1 in comparison with P300 in bin 5 is significantly different in shape (Figure R1C). This can be an indication of overlapping components; namely, for bin 5 (where alpha amplitude change is the highest), associated baseline shift dominates, and for bin 1 (where alpha amplitude change is the smallest), associated baseline shift is hidden behind other components. We believe that this proposed analysis demonstrates the intuition behind the baseline-shift mechanism: the baseline shift is generated due to a change in the oscillatory amplitude; and the change is simply the difference between two time points.
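
  The point that baseline correction only moves the whole curve up or down can be illustrated with a minimal Python sketch. It is not the analysis code behind Figure R1: the waveform is synthetic, the helper `baseline_correct` and the prestimulus window (–200 to –50 ms) are assumptions made for illustration, and only the 400–600 ms poststimulus window is taken from the text above.

```python
import numpy as np

def baseline_correct(times, signal, window):
    """Subtract the mean of `signal` inside `window` (start, end in seconds)."""
    start, end = window
    mask = (times >= start) & (times <= end)
    return signal - signal[mask].mean()

# Synthetic, illustrative ER: a slow positive deflection peaking near 0.5 s.
times = np.linspace(-0.4, 1.2, 801)
er = 5.0 * np.exp(-((times - 0.5) ** 2) / (2 * 0.15 ** 2))

# Same waveform, two different baseline windows.
pre = baseline_correct(times, er, (-0.2, -0.05))  # prestimulus window (assumed)
post = baseline_correct(times, er, (0.4, 0.6))    # poststimulus window, 400-600 ms

# The two versions differ by a single constant at every time point, so the
# shape of the curve and its peak-to-trough range are unchanged.
offset = pre - post
offset_spread = offset.max() - offset.min()
range_pre = pre.max() - pre.min()
range_post = post.max() - post.min()
print(offset_spread, range_pre, range_post)
```

  Because the offset is constant over time, choosing a poststimulus baseline merely relocates an existing difference between conditions from the peak window to the prestimulus window.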
  
  Author response image 1.
  
  The difference in the strength of alpha amplitude modulation correlates with the difference in P300 amplitude. A. The alpha rhythm amplitude was binned according to the percentage of change. The bins were the following: (66, –25), (–25, –37), (–37, –47), (–47, –58), (–58, –89) % change. Panel A is identical to Figure 3A, main text. B. The alpha rhythm amplitude is multiplied by –1 and evened within the prestimulus window. This may be an approximation for baseline shifts in the low-frequency signal. C. P300 responses are sorted into the corresponding bins. Panel C is identical to Figure 3B, main text. D. P300 responses are obtained without applying a baseline correction and are sorted into the corresponding bins. The difference in peak amplitude of P300 remains visible and significant. E. P300 is baselined at 400–600 ms. As a consequence, there are significant differences in the prestimulus window.
  
  2) The topographies are somewhat similar in Figure 4, but not overwhelmingly so. There is a parieto-occipital focus in both, but to support the main thesis, I feel one would want to show an exact focus on the same electrode. Showing a general overlap in spatial distribution is not enough for the main thesis of the paper, referring to the point I make in the first paragraph re Weaknesses. Obviously, the low density montage here is a limitation. Nevertheless, one could use a CSD transform to get more focused topographies (see https://psychophysiology.cpmc.columbia.edu/software/csdtoolbox/), which apparently does still work for lower-density electrode setups (see Kayser and Tenke, 2006).
  
  As we mentioned in our provisional response, we believe that we would not benefit from using CSD. First, the CSD transform is a spatial high-pass filter, and, hence, it is commonly used for spatially localised activities. In our case, we have two activities (P300 and alpha amplitude decrease) that are widespread with low spatial frequency, and we believe that applying CSD is not helpful. Second, CSD is more sensitive to surface sources that emanate from the crowns of gyri. For activity in the P300 window, there is a possibility that sources are localised within the longitudinal fissure. Third, because we completely agree that the low-density montage is a limitation, we used source reconstruction with eLORETA (Figure 5) to clarify the spatial localisation of the potential source of P300 and alpha amplitude change, which indeed shows a considerable spatial overlap.
  
  3) Very nice analysis in Figure 6, probably the most convincing result comparing BSI in steady state to P300, thus at least eliminating task-related confounds.
  
  4) Also a good analysis here, wherein there seem to be similar correlation profiles across P300 and alpha modulation. One analysis that would really nail this down would be a mediation analysis (Baron and Kenny, 1986; https://davidakenny.net/cm/mediate.htm), where one could investigate if e.g. the relationship between P300 amplitude and CERAD score is either entirely or partially mediated by alpha amplitude. One could do this for each of the relationships. To show complete mediation of P300 relationship with a cog task via alpha would be quite strong.
  
  We agree that mediation analysis better suits the purpose of our claim. We added this analysis to the edited version of the manuscript. Additionally, we became concerned that the total alpha power effect may be driving the correlation. Therefore, we used alpha amplitude change in percentage instead of the absolute values of the amplitude. Significant mediation was present only for attention and executive scores.
  
  In the updated version of the manuscript, the Methods section reads as follows:
  
  “The correlation between cognitive scores (see Methods/Cognitive tests) and the amplitude and latency of P300 and alpha oscillations was calculated with linear regression using age as a covariate (R lme4, Bates et al., 2015). To estimate what proportion of the correlation between P300 and cognitive score is mediated by alpha oscillations, we used mediation analysis (Baron and Kenny, 1986; R mediation, Tingley et al., 2014). First, we estimated the effect of P300 on the cognitive variable of interest (total effect, cogscore ~ P300+age). Second, we computed the association between P300 and alpha oscillations (the effect on the mediator, alpha ~ P300). Third, we ran the full model (the effect of the mediator on the variable of interest, cogscore ~ P300+alpha+age). Lastly, we estimated the proportion mediated.”
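
  The three regression steps quoted above can be sketched in Python as follows. This is only an illustration on simulated data, not the R code used for the manuscript: the helper `ols`, the variable names, and all simulated effect sizes are assumptions, and the sketch returns a point estimate of the proportion mediated (indirect effect divided by the sum of indirect and direct effects) without the confidence intervals that a dedicated mediation package would add.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # beta[0] is the intercept, then one slope per predictor

# Simulated data with invented effect sizes (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 1500
age = rng.normal(70, 5, n)
p300 = rng.normal(0, 1, n)
alpha = 0.5 * p300 + rng.normal(0, 1, n)
cogscore = 0.3 * p300 + 0.2 * alpha - 0.05 * age + rng.normal(0, 1, n)

# Step 1: total effect, cogscore ~ P300 + age
total_effect = ols(cogscore, p300, age)[1]

# Step 2: effect on the mediator, alpha ~ P300
a = ols(alpha, p300)[1]

# Step 3: full model, cogscore ~ P300 + alpha + age
full = ols(cogscore, p300, alpha, age)
direct_effect, b = full[1], full[2]

# Indirect (mediated) effect and proportion mediated
indirect_effect = a * b
proportion_mediated = indirect_effect / (indirect_effect + direct_effect)
print(f"total={total_effect:.3f} direct={direct_effect:.3f} "
      f"indirect={indirect_effect:.3f} proportion={proportion_mediated:.2f}")
```

  With the simulated coefficients chosen here, roughly a quarter of the association is mediated; values such as 0.12 or 0.14 would correspond to a weaker indirect path relative to the direct one.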
  
  The Results section reads as follows:
  
  “Stimulus-based changes in brain signals are thought to reflect cognitive processes that are involved in the task. A simultaneous and congruent correlation of P300 and alpha rhythm to a particular cognitive score would be further evidence in favour of the relation between P300 and alpha oscillations. Moreover, if found, the correlation directions should correspond to the predictions of BSM. Along with the EEG data, in the LIFE data set, a variety of cognitive tests were collected, including the Trail-making Test (TMT) A&B, Stroop test, and CERADplus neuropsychological test battery (Loeffler et al., 2015). From the cognitive tests, we extracted composite scores for attention, memory, and executive functions (Liem et al., 2017, see Methods/Cognitive tests) and tested the correlation between composite cognitive scores vs. P300 and vs. alpha amplitude modulation. The scores were available for a subset of 1549 participants (out of 2230), age range 60.03–80.01 years old. Cognitive scores correlated significantly with age (age and attention: −0.25, age and memory: −0.20, age and executive function: −0.23). Therefore, correlations between cognitive scores and electrophysiological variables were evaluated, regressing out the effect of age. To rule out the possibility of an absolute alpha power association with cognitive scores, for this analysis, we used alpha amplitude normalised change computed as $(A_{post} - A_{pre}) / A_{pre}$, where $A_{post}$ is the amplitude at the latency of strongest amplitude decrease. Computed this way, negative alpha amplitude change would correspond to a more pronounced decrease, i.e., stronger oscillatory response.
  
  To increase the signal-to-noise ratio of both P300 and alpha rhythm, we performed spatial filtering (see Methods/Spatial filtering, Figures 7B,C). Following this procedure, both P300 and alpha latency, but not amplitude, significantly correlated with attention scores (Figure 7A, left column). Larger latencies were related to lower attentional scores, which corresponded to a longer time-to-complete of TMT and Stroop tests and hence poorer performance. The proportion of correlation between P300 latency and attention, mediated by alpha attenuation peak latency, is 0.12. Memory scores were positively related to P300 amplitude and negatively to P300 latency (Figure 7A, middle column). The direction of correlation is such that higher memory scores, which reflected more recalled items, corresponded to a higher P300 amplitude and an earlier P300 peak. The association between alpha rhythm parameters and memory scores is not significant, but it goes in the same direction as the association for P300. Executive function scores (Figure 7A, right column) were significantly related to both P300 and alpha amplitude latencies. The proportion of correlation between P300 latency and executive function, mediated by alpha attenuation peak latency, is 0.14. Overall, the direction of correlation is similar for P300 and alpha oscillations, as expected for BSM. Moreover, the direction of correlation is consistent across cognitive functions.”
  
  And an additional paragraph in the Discussion:
  
  “The mediation analysis showed that the modulation of alpha oscillations only partially explained the correlation between P300 and cognitive variables. This, in general, corresponds to the idea that not the whole P300 but only its fraction can be explained by the changes in the alpha amplitudes. Figure 5 shows that alpha oscillations change not only in the cortical areas where P300 is generated; therefore, we cannot expect a complete correspondence between the two processes. Moreover, since cognitive tests and EEG recordings were performed at different time points, the associations between the cognitive variables and EEG markers are expected to be rather weak and to reflect only some neuronal processes common to P300, alpha rhythm, and tasks. For these reasons, a complete mediation of one EEG variable through another EEG variable in the context of a separate cognitive assessment cannot be expected.”
  
  One last point, from the methods it appears that the task was done with eyes closed? That is an extremely important point when considering the potential impact of alpha amplitude modulation on any other EEG component due to the well-known substantial increase in alpha amplitude with eyes closed versus open. I wonder, would we see any of these effects with eyes opened?
  
  The task was auditory and was indeed conducted in an eyes-closed state. In an eyes-closed state, alpha rhythm amplitude in the occipital regions shows a prominent increase. However, we believe that in our case, it was neither an advantage nor a disadvantage. First, occipital sources of alpha rhythm that demonstrate an increase in amplitude are not likely to be those sources that attenuate as a reaction to a target tone. The source reconstruction of alpha rhythm amplitude change (although with a limited number of channels) displayed widespread regions with a prominent decrease on the posterior midline, including the precuneus and posterior cingulate cortex (which contain polymodal association areas; Leech et al., Brain, 2014; Al-Ramadhani et al., Epileptic Disord, 2021). Second, in our previous study, we tested resting-state data with both eyes-closed and eyes-open conditions. There, we computed the baseline-shift index (BSI), which serves as an approximation for estimating if oscillations have a non-zero mean. We found no significant difference between the eyes-open and eyes-closed states in terms of the absolute value of the BSI. Moreover, the average distribution of BSIs on the scalp was the same for both conditions.
  
  Overall, there is a mix here of strengths of claims throughout the paper. For example, the first paragraph of the discussion starts out with "In the current study, we provided comprehensive evidence for the hypothesis that the baseline-shift mechanism (BSM) is accountable for the generation of P300 via the modulation of alpha oscillations." and ends with "Therefore, P300, at least to a certain extent, is generated as a consequence of stimulus-triggered modulation of alpha oscillations with a non-zero mean." In the limitations section, it says the current study speaks for a partial rather than exhausting explanation of the P300's origin. I would agree with the first part of that statement, that it is only partial. I do not agree, however, that it speaks to the ORIGIN of the P300, unless by origin one simply means the set of signals that go to make up the ERP component at the scalp-level (as opposed to neural origin).
  
  We have edited parts of the manuscript that have overly exuberant claims. However, we would argue further that alpha rhythm amplitude change does partially explain P300 origin. When a stimulus is being processed by the neuronal network, some part of this network presumably breaks from synchronous oscillation mode. Hence, on the scalp, we observe a decrease in oscillatory amplitude. According to the baseline-shift mechanism (BSM), this stimulus-related decrease in the amplitude generates the baseline shift in the frequency range of modulation (under 3 Hz for alpha rhythm). The P300 component that is explained by alpha rhythm amplitude modulation is, in essence, a baseline shift. Therefore, the origin of a part of P300 is the oscillating network that was pushed out of its synchronous oscillating regime.
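
  The mechanism described above can be made concrete with a small simulation. It is a sketch under stated assumptions, not the authors' model: the sampling rate, the shape of the amplitude envelope, and the value of the non-zero mean (`mean_offset`, chosen negative so that an amplitude decrease yields a positive deflection) are all illustrative choices.

```python
import numpy as np

fs = 1000                              # sampling rate in Hz (assumed)
times = np.arange(-fs, 2 * fs) / fs    # -1 s to 2 s, stimulus at 0 s
mean_offset = -0.2                     # non-zero mean of the oscillation (assumed)

# Amplitude envelope: 1 before the stimulus, attenuating smoothly to 0.5 after it.
envelope = 1.0 - 0.5 / (1.0 + np.exp(-(times - 0.3) / 0.05))

# 10 Hz oscillation with a non-zero mean; the envelope scales both parts.
signal = envelope * (np.cos(2 * np.pi * 10 * times) + mean_offset)

# Moving average over exactly one alpha cycle (100 ms) as a crude low-pass filter:
# it cancels the 10 Hz rhythm and keeps the slowly varying mean.
kernel = np.ones(fs // 10) / (fs // 10)
slow = np.convolve(signal, kernel, mode="same")

def value_at(t):
    """Low-frequency signal at the sample closest to time t (seconds)."""
    return slow[np.argmin(np.abs(times - t))]

# Difference between a late poststimulus and a prestimulus time point.
baseline_shift = value_at(1.0) - value_at(-0.5)
print(f"baseline shift: {baseline_shift:+.3f}")
```

  The low-frequency component follows `mean_offset * envelope`, so halving the oscillatory amplitude produces a shift of about `mean_offset * (0.5 - 1.0)`, i.e. a positive deflection of 0.1 in these arbitrary units. With a zero mean, the same amplitude decrease would leave no low-frequency trace.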
  
  Again, I can only make these hopefully helpful criticisms and suggestions because the paper is very clearly written and well analysed. Also, the fact that alpha amplitude modulation potentially confounds with P300 amplitude via baseline shift is a valuable finding.
  
  Specific comments:
  
  Perhaps give a brief overview of the task involved at the start. I know it is not particularly relevant, but I think necessary for those unfamiliar with cog tasks.
  
  We added a short description of a task in the Introduction section.
  
  “In this data set, the experimental task was an auditory oddball paradigm. Participants would hear tones, one type of which, the target tone, would occur in only 12% of trials. Target tones elicit both P300 and the modulation of the alpha amplitude.”
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.02.20.529191v3
www.biorxiv.org www.biorxiv.org

Pinging the Hidden Attentional Priority Map: Suppression Needs Attention

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors tested whether learning to suppress (ignore) salient distractors (e.g., a lone colored nontarget item) via statistical regularities (e.g., the distractor is more likely to appear in one location than any other) was proactive (prior to paying attention to the distractor) or reactive (only after first attending the distractor) in nature. To test between proactive and reactive suppression the authors relied on a recently developed and novel technique designed to "ping" the brain's hidden priority map using EEG inverted encoding models. Essentially, a neutral stimulus is presented to stimulate the brain, resulting in activity on a priority map which can be decoded and used to argue when this stimulation occurred (prior to or after attending to a distracting item). The authors found evidence that despite learning to suppress the high probability distractor location, the suppression was reactive, not proactive in nature.
  
  Overall, the manuscript is well-written, tests a timely question, and provides novel insight into a long-standing debate concerning distractor suppression.
  
  Strengths (in no particular order):
  
  (1) The manuscript is well-written, clear, and concise (especially given the complexities of the method and analyses).
  
  (2) The presentation of the logic and results is mostly clear and relatively easy to digest.
  
  (3) This question concerning whether location-based distractor suppression is proactive or reactive in nature is a timely question.
  
  (4) The use of the novel "pinging" technique is interesting and provides new insight into this particularly thorny debate over the mechanisms of distractor suppression.
  
  Weaknesses (in no particular order):
  
  (1) The authors tend to make overly bold claims without either A) mentioning the opposing claim(s) or B) citing the opposing theoretical positions. Further, the authors have neglected relevant findings regarding this specific debate between proactive and reactive suppression.
  
  (2) The authors should be more careful in setting up the debate by clearly defining the terms, especially proactive and reactive suppression which have recently been defined and were more ambiguously defined here.
  
  (3) There were some methodological choices that should be further justified, such as the choice of stimuli (e.g., sizes, colors, etc.).
  
  (4) The figures are often difficult to process. For example, the time courses are so far zoomed out (i.e., 0, 500, 100 ms with no other tick marks) that it makes it difficult to assess the timing of many of the patterns of data. Also, there is a lot of baseline period noise which complicates the interpretations of the data of interest.
  
  (5) Sometimes the authors fail to connect to the extant literature (e.g., by connecting to the ERP components, such as the N2pc and PD components, used to argue for or against proactive suppression) or when they do, overreach with claims (e.g., arguing suppression is reactive or feature-blind more generally).
  
  We thank the reviewer for their insightful feedback and have made several adjustments to address the concerns raised. To provide a balanced discussion, we tempered our claims about suppression mechanisms and incorporated additional references to opposing theoretical positions, including the signal suppression hypothesis, while clarifying the definitions of proactive and reactive suppression based on recent terminology (Liesefeld et al., 2024). We justified methodological choices, such as the slight size differences between stimuli to achieve perceptual equivalence and the randomization of target and distractor colors to mitigate potential luminance biases. We have also revised our figure to improve its clarity. Lastly, while our counterbalanced design precluded reliable ERP assessments (e.g., N2pc, PD), we discussed their potential relevance for future research and ensured consistency with the broader literature on suppression mechanisms.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors investigate the mechanisms supporting learning to suppress distractors at predictable locations, focusing on proactive suppression mechanisms manifesting before the onset of a distractor. They used EEG and inverted encoding models (IEM). The experimental paradigm alternates between a visual search task and a spatial memory task, followed by a placeholder screen acting as a 'ping' stimulus -i.e., a stimulus to reveal how learned distractor suppression affects hidden priority maps. Behaviorally, their results align with the effects of statistical learning on distractor suppression. Contrary to the proactive suppression hypothesis, which predicts reduced memory-specific tuning of neural representations at the expected distractor location, their IEM results indicate increased tuning at the high-probability distractor location following the placeholder and prior to the onset of the search display.
  
  Strengths:
  
  Overall, the manuscript is well-written and clear, and the research question is relevant and timely, given the ongoing debate on the roles of proactive and reactive components in distractor processing. The use of a secondary task and EEG/IEM to provide a direct assessment of hidden priority maps in anticipation of a distractor is, in principle, a clever approach. The study also provides behavioral results supporting prior literature on distractor suppression at high-probability locations.
  
  Weaknesses:
  
  (1) At a conceptual level, I understand the debate and opposing views, but I wonder whether it might be more comprehensive to present also the possibility that both proactive and reactive stages contribute to distractor suppression. For instance, anticipatory mechanisms (proactive) may involve expectations and signals that anticipate the expected distractor features, whereas reactive mechanisms contribute to the suppression and disengagement of attention.
  
  This is an excellent point. Indeed, while many studies, including our own, have tried to dissociate between proactive and reactive mechanisms, as if it is one or the other, the overall picture is arguably more nuanced. We have added a paragraph to the discussion on page 19 to address this. At the same time, (for more details see our responses to your comments 3 and 5), we have added a paragraph where we provide an alternative explanation of the current data in the light of the dual-task nature of our experiment.
  
  (2) The authors focus on hidden priority maps in pre-distractor time windows, arguing that the results challenge a simple proactive view of distractor suppression. However, they do not provide evidence that reactive mechanisms are at play or related to the pinging effects found in the present paradigm. Is there a relationship between the tuning strength of CTF at the high-probability distractor location and the actual ability to suppress the distractor (e.g., behavioral performance)? Is there a relationship between CTF tuning and post-distractor ERP measures of distractor processing? While these may not be the original research questions, they emerge naturally and I believe should be discussed or noted as limitations.
  
  Thank you for raising these important points. While CTF slopes have been shown to provide spatially and temporally resolved tracking of covert spatial attention and memory representations at the group level, to the best of our knowledge, no study to date has found a reliable correlation between CTFs and behavior. Moreover, the predictive value of the learned suppression effect, while also highly reliable at the group level, has proven to be limited when it comes to individual-level performance (Ivanov et al., 2024; Hedge et al., 2018). Nevertheless, based on your suggestion, we explored whether there was a correlation between the averaged gradient slope within the time window where the placeholder revived the memory representation and the average distance slope in reaction times for the learned suppression effect. This correlation was not significant (r = .236, p = 0.267), which, considering our sample size and the reasons mentioned earlier, is not particularly surprising. Given that our sample size was chosen to measure group-level effects, we decided not to include the individual-differences analysis in the manuscript.
  
  Regarding the potential link between the CTF tuning profile and post-distractor ERP measures like N2pc and Pd, our experimental design presented a specific challenge. To reliably assess lateralized ERP components like N2pc or Pd the high probability location must be restricted to static lateralized positions (e.g., on the horizontal midline). Our counterbalanced design (see also our response to comment 9 by reviewer 1), which was crucial to avoid bias in spatial encoding models, precluded such a targeted ERP analysis.
  
  (3) How do the authors ensure that the increased tuning (which appears more as a half-split or hemifield effect rather than gradual fine-grained tuning, as shown in Figure 5) is not a byproduct of the dual-task paradigm used, rather than a general characteristic of learned attentional suppression? For example, the additional memory task and the repeated experience with the high-probability distractor at the specific location might have led to longer-lasting and more finely-tuned traces for memory items at that location compared to others.
  
  Thank you for raising these important points. Indeed, a unique aspect of our study that sets it apart from other studies is that the effects of learned suppression were not measured directly via an index of distractor processing, but rather inferred indirectly via tuning towards a location in memory. The critical assumption here, which we now make explicit on page 18, is that various sources of attentional control jointly determine the priority landscape, and that this priority landscape can be read out by neutral ping displays. An alternative, however, as suggested by the reviewer, is that memory representations may have been sharper when the remembered location was at the high probability distractor location. We believe this is unlikely for various reasons. First, at the behavioral level there was no evidence that memory performance differed for positions overlapping high and low probability distractor locations (also see our response to reviewer 3 minor comment 4). Second, there was no hint whatsoever that the memory representation already differed during encoding or maintenance (this is now explicitly indicated in the revised manuscript on page 14), which would have been expected if the spatial distractor imbalance modulated the spatial memory representations.
  
  Nevertheless, as discussed in more detail in response to comment 5, there is an alternative explanation for the observed gradient modulation that may be specific to the dual nature of our experiment.
  
  (4) It is unclear how IEM was performed on total vs. evoked power, compared to typical approaches of running it on single trials or pseudo-trials.
  
  Thank you for pointing out that our methods were not clear. We did not run our analysis on single trials because we were interested in separately examining the spatial selectivity of both evoked alpha power (phase-locked activity aligned with stimulus onset) and total alpha power (all activity regardless of signal phase). It is only possible to calculate evoked and total power when averaging across trials. Thus, when we partitioned the data into sets for the IEM analysis, we averaged trials for each condition/stimulus location to obtain a measurement of evoked and total power for each condition in each set. This is the same approach used in previous work (e.g., Foster et al., 2016; van Moorselaar et al., 2018).
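
  The distinction between evoked and total power can be sketched as follows, assuming the common convention that evoked power is the squared magnitude of the trial-averaged analytic signal and total power is the trial average of the squared magnitudes. The simulated trials, the sampling rate, and the helper `analytic_signal` are assumptions for illustration; the data are treated as if they had already been band-pass filtered to the alpha band.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal along the last axis (Hilbert-transform method)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain, axis=-1)

rng = np.random.default_rng(1)
fs, n_trials = 250, 60                       # assumed sampling rate and trial count
t = np.arange(fs) / fs                       # one second of data per trial

# Each simulated trial contains a phase-locked 10 Hz component plus a
# 10 Hz component whose phase is random from trial to trial.
phase_locked = np.cos(2 * np.pi * 10 * t)
random_phase = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
trials = phase_locked + np.cos(2 * np.pi * 10 * t + random_phase)

z = analytic_signal(trials)                  # shape: (n_trials, n_times)

# Evoked power: average the complex signal over trials first, then square.
evoked_power = np.abs(z.mean(axis=0)) ** 2
# Total power: square each trial first, then average over trials.
total_power = (np.abs(z) ** 2).mean(axis=0)

print(evoked_power.mean(), total_power.mean())
```

  Both quantities require an average over trials, which is why they cannot be defined for a single trial. Total power is never smaller than evoked power, because activity that is not phase-locked cancels in the complex average but not in the average of magnitudes.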
  
  We reviewed our method section and can see why this was unclear. In places, we had incorrectly described the dimensions of training and test data as electrodes x trials. To address this, we’ve rewritten the “Time frequency analysis”, “Inverted encoding model” sections, and added a new “Training and test data” section. We hope that these sections are easier to follow.
  
  (5) Following on point 1. What is the rationale for relating decreased (but not increased) tuning of CTF to proactive suppression? Could it be that proactive suppression requires anticipatory tuning towards the expected feature to implement suppression? In other terms, better 'tuning' does not necessarily imply a higher signal amplitude and could be observable even under signal suppression. The authors should comment on this and clarify.
  
  We appreciate your highlighting of these highly relevant alternative explanations. In response, we have revised a paragraph in the General Discussion on page 18 to explicitly outline our rationale for associating decreased tuning with proactive suppression. However, in doing so, we now also consider the alternative perspective that proactive suppression might actually require enhanced tuning towards the expected feature to implement suppression effectively.
  
  It's important to note that both of these interpretations – decreased tuning as a sign of suppression and increased tuning as a preparatory mechanism for suppression – diverge significantly from the commonly held model (including our own initial assumptions) wherein weights at the to-be-suppressed location are simply downregulated.
  
  Minor:
  
  (1) In the Word file I reviewed, there are minor formatting issues, such as missing spaces, which should be double-checked.
  
  Thank you! We have now reviewed the text thoroughly and tried our best to avoid formatting issues.
  
  (2) Would the authors predict that proactive mechanisms are not involved in other forms of attention learning involving distractor suppression, such as habituation?
  
  Habituation is a form of non-associative learning where the response to a repetitive stimulus decreases over time. As such, we would not characterize these changes as “proactive”, as they only occur following (repeated) exposure to the stimulus.
  
  (3) A clear description in the Methods section of how individual CTFs for each location were derived would help in understanding the procedure.
  
  Thank you. We have now added several sentences on page 27 to clarify how individual CTFs in Figure 3 and distance CTFs in Figure 5 are calculated.
  
  “The derived channel responses (8 channels × 8 location bins) were then used for the following analyses: (a) calculating individual Channel Tuning Functions (CTFs) based on each of the eight physical location bins (e.g., Figure 3C and 3D); (b) grouping responses according to the distance between each physical location and the high-probability distractor location to calculate distance CTFs (e.g., Figure 5); and (c) averaging across location bins to represent the general strength of spatial selectivity in tracking the memory cue, irrespective of its specific location (e.g., Figure 3A and 3B).”
  
  (4) Why specifically 1024 resampling iterations?
  
  Thank you for your question. The statistical analysis was conducted using the permutation_cluster_1samp_test function within the MNE package in Python. We have clarified this on page 25. The choice of 1024 permutations reflects the default setting of the function, which is generally considered sufficient for robust non-parametric statistical testing. This number provides a balance between computational efficiency and the precision of p-value estimation in the context of our analyses.
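
  The link between the number of permutations and p-value precision can be sketched as follows. This is a plain sign-flip permutation test on one value per participant, written with the standard library only; it is not MNE's cluster-based procedure, and the function name and toy data are hypothetical.

```python
import random

def sign_flip_p_value(values, n_permutations=1024, seed=0):
    """Two-sided one-sample permutation p-value obtained by randomly flipping signs."""
    rng = random.Random(seed)
    observed = abs(sum(values) / len(values))
    exceed = 0
    for _ in range(n_permutations):
        # Under the null hypothesis each participant's value is equally likely to be +v or -v.
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(sum(flipped) / len(flipped)) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value strictly above zero.
    return (exceed + 1) / (n_permutations + 1)

# Hypothetical per-participant slopes (24 participants).
consistent_effect = [1.0] * 24
no_effect = [1.0, -1.0] * 12

print(sign_flip_p_value(consistent_effect))
print(sign_flip_p_value(no_effect))
```

  Under this counting scheme the smallest attainable p-value with 1024 permutations is 1/1025, roughly 0.001, which sets the resolution of the test.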
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this experiment, the authors use a probe method along with time-frequency analyses to ascertain the attentional priority map prior to a visual search display in which one location is more likely to contain a salient distractor. The main finding is that neural responses to the probe indicate that the high probability location is attended, rather than suppressed, prior to the search display onset. The authors conclude that suppression of distractors at high-probability locations is a result of reactive, rather than proactive, suppression.
  
  Strengths:
  
  This was a creative approach to a difficult and important question about attention. The use of this "pinging" method to assess the attentional priority map has a lot of potential value for a number of questions related to attention and visual search. Here as well, the authors have used it to address a question about distractor suppression that has been the subject of competing theories for many years in the field. The paper is well-written, and the authors have done a good job placing their data in the larger context of recent findings in the field.
  
  Weaknesses:
  
  The link between the memory task and the search task could be explored in greater detail. For example, how might attentional priority maps change because of the need to hold a location in working memory? This might limit the generalizability of these findings. There could be more analysis of behavioral data to address this question. In addition, the authors could explore the role that intertrial repetition plays in the attentional priority map as these factors necessarily differ between conditions in the current design. Finally, the explanation of the CTF analyses in the results could be written more clearly for readers who are less familiar with this specific approach (which has not been used in this field much previously).
  
  We appreciate the reviewer's valuable feedback and have made significant revisions to address the concerns raised. To clarify the connection between the memory and search tasks, we conducted additional analyses to explore the effects of spatial distance between the memory cue location and the high-probability distractor location on behavioral performance. We also investigated the potential influence of intertrial repetition effects on the observed results by removing trials with location repetitions. To enhance clarity, we revised the explanation of the CTF analyses in the Results section and improved figure annotations to ensure accessibility for readers unfamiliar with this approach. Collectively, these updates further discuss how the pattern of CTF slopes reflects the interplay between memory and search tasks while addressing key methodological and interpretative considerations.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Suggestions/Critiques (in no particular order)
  
  (1) The authors discuss the tripartite model (bottom-up, top-down, and selection history) but neglect recent and important discussions of why this trichotomy might be unnecessarily complicated (e.g., Anderson, 2024: Trichotomy revisited: A monolithic theory of attentional control). Simply put, one of the 3 pillars (i.e., selection history) likely does not fall into a unitary construct or "box"; instead, it likely contains many subcomponents (e.g., reward associations, stimulus-response habit learning, statistical learning, etc.). Since the focus of the current study is learned distractor suppression based on the statistical regularities of the distractor, the authors should comment on which aspects of selection history are relevant, perhaps by using this monolithic framework.
  
  We appreciate the reviewer's insightful suggestion regarding theoretical frameworks of attentional control. While Anderson (2024) proposes a monolithic theory that challenges the traditional tripartite model, our study deliberately maintains a pragmatic approach. The main purpose of our experiment is to investigate empirically the mechanisms of learned distractor suppression, rather than to adjudicate between competing theoretical models.
  
  We agree that selection history is not a unitary construct but comprises multiple subcomponents, including reward associations, stimulus-response habit learning, and statistical learning. In this context, our study specifically focuses on statistical learning as a key mechanism of distractor suppression. By explicitly acknowledging the multifaceted nature of selection history and referencing Anderson's monolithic perspective, we invite readers to consider the theoretical implications while maintaining our research's primary focus on empirical investigation. To this end, we have modified the manuscript to read (see page 3):
  
  "The present study investigates the mechanisms underlying statistical learning, specifically learned distractor suppression, which represents one critical subcomponent of selection history. While theoretical models like the tripartite framework and the recent monolithic theory (Anderson, 2024) offer complementary perspectives on attentional control, our investigation focuses on empirically characterizing the statistical learning mechanisms underlying learned distractor suppression."
  
  (2) The authors discuss previous demonstrations of location-based and feature-based learned distractor suppression. The authors admit that there have been a large number of studies but seem to mainly cite those that were conducted by the authors themselves (with the exception being Vatterott & Vecera, 2012). For example, there are other studies investigating location-based suppression (Feldmann-Wüstefeld et al., 2021; Sauter et al., 2021), feature-based suppression (Gaspelin & Luck, 2018a; Stilwell et al., 2022; Stilwell & Gaspelin, 2021; Vatterott et al., 2018), or both (Stilwell et al., 2019). The authors do not cite Gaspelin and colleagues at all in the manuscript, despite claiming that singleton-based suppression is not proactive.
  
  We appreciate your pointing out the need for a more comprehensive citation of the literature on learned distractor suppression, particularly with respect to location-based and feature-based suppression. In response to your comment, we have now expanded the reference list on page 4 to include relevant studies that further support our discussion of both location-based and feature-based suppression mechanisms.
  
  (3) The authors use the terms "proactive" and "reactive" suppression without taking into consideration the recent terminology paper, which one of the current authors, Theeuwes, helped to write (Liesefeld et al., 2024, see Figure 8). The terms proactive and reactive suppression need to be defined relative to a time point. The authors need to be careful in defining proactive suppression as prior to the first shift of attention, but after the stimuli appear and reactive suppression as after the first shift of attention and after the stimuli appear. Thus, the critical time point is the first shift of attention. Does suppression occur before or after the first shift of attention? The authors could alleviate this by using the term "stimulus-triggered suppression" to refer to "suppression that occurs after the distractor appears and before it captures attention" (Liesefeld et al., 2024).
  
  Thank you for pointing out that this was insufficiently clear in the previous version. In the revised version we specifically refer to the recent terminology paper on page 5 to make clear that suppression could theoretically occur at three distinct moments in time, and that the present paper was designed to dissociate between suppression before or after the first shift of attention.
  
  (4) Could the authors justify why the circle stimulus (2° in diameter) was smaller than the diamonds (2.3° x 2.3°)? Are the stimuli equated for the area? Or, for width and height? Doesn't this create a size singleton target on half of all trials (whenever the target is a circle) in addition to the lone circle being a shape singleton? Along these lines, could the authors justify why the colors were used and not equiluminant? This version of red is much brighter than this version of green if assessed by a spectrophotometer. Thus, there are sensory imbalances between the colors. Further, the grey used as the ping is likely not equiluminant to both colors. Thus, the grey "ping" is likely dimmer for red items but brighter for green items. Is this a fair "ping"?
  
  Thank you for raising these important points. We chose, as is customary in this experimental paradigm (e.g., Huang et al., 2023; Duncan et al., 2023), to make the diamond slightly larger (2.3° x 2.3°) than the circle (2° in diameter) to ensure a better visual match in overall size appearance. If the circle and diamond stimuli were equated strictly in terms of size (both at 2°), the diamond would appear visually smaller due to the differences in geometric shape. By adjusting the dimensions slightly, we aimed to minimize any unintentional differences in perceptual salience.
  
  As for the colors used in the experiment, the reviewer is right that there might be sensory imbalances between the red and green stimuli, with red appearing brighter than green based on measurements such as spectrophotometry. To ensure that any effects couldn’t be explained by sensory imbalance in the displays, we randomized target and distractor colors across trials, meaning that roughly half the trials had a red distractor and half had a green distractor. This randomization should have mitigated any systematic biases caused by color differences.
  
  We appreciate your feedback and have clarified these points in the Methods section of the revised manuscript on page 22:
  
  "Please note that although the colors were not equiluminant, the target and distractor colors were randomized across trials such that roughly half the trials had a red distractor, and half had a green distractor. This randomization process should help mitigate any systematic biases this may cause."
  
  (5) For the eye movement artifact rejection, the authors use a relatively liberal rejection routine (i.e., allowing for eye movements up to 1.2° visual angle and a threshold of 15 μV). Given that every 3.2 μV deviation in HEOG corresponds to ~ ± 0.1° of visual angle (Lins, et al., 1993), the current oculomotor rejection allows for eye movements between 0.5° and 1.2° visual angle to remain which might allow for microsaccades (e.g., Poletti, 2023) to contaminate the EEG signal (e.g., Woodman & Luck, 2003).
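
  The reviewer's arithmetic can be reproduced with a short sketch. It assumes a linear relation between HEOG amplitude and gaze deviation at the cited rate of 3.2 μV per 0.1° (Lins et al., 1993); the constant and function names are hypothetical.

```python
# Conversion rate cited by the reviewer (Lins et al., 1993): 3.2 microvolts per 0.1 degree.
MICROVOLTS_PER_TENTH_DEGREE = 3.2

def heog_to_degrees(microvolts):
    """Approximate horizontal gaze deviation (degrees) for a given HEOG amplitude."""
    return microvolts / MICROVOLTS_PER_TENTH_DEGREE * 0.1

# The 15 microvolt rejection threshold expressed in degrees of visual angle.
threshold_deg = heog_to_degrees(15.0)
print(round(threshold_deg, 2))
```

  Under this assumption the 15 μV threshold corresponds to just under 0.5°, which is the lower bound of the range the reviewer describes.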
  
  The reviewer correctly points out that our eye rejection procedure, which is the same as in our previous work (e.g., Duncan et al., 2023), still allows for small but systematic biases in eye position towards the remembered location and potentially towards or away from the high probability distractor location. While we cannot definitively exclude this possibility, we believe this is unlikely for the following reasons. First, although there is a link between microsaccades and covert attention, it has been demonstrated that subtle biases in eye position cannot explain the link between alpha activity and the content of spatial WM (Foster et al., 2016, 2017). Specifically, Foster et al. (2017) found no evidence for a gaze-position-related CTF, while an analysis on that same data yielded clear target-related CTFs. Similarly, within the present data set there was no evidence that the observed revival induced by the ping display could be attributed to systematic changes in gaze position, as a multivariate cross-session decoding analysis with x,y positions from the tracker did not yield reliable above-chance decoding of the location in memory.
  
  Author response image 1.
  
  (6) The authors claim that "If the statistically learned suppression was spatial-based and feature-blind, one would also expect impaired target processing at the high-probability location." (p. 7, lines 194-195). Why is it important that suppression is feature-blind here? Further, is this a fair test of whether suppression is feature-blind? What about inter-trial priming of the previous trial? If the previous trial's singleton color repeated RTs might be faster than if it switched. In other words, the more catastrophic the interference (the target shape, target color, distractor shape, distractor color) change between trials, the more RTs might slow (compared with consistencies between trials, such that the target and distractor shapes repeat and the target and distractor colors repeat). Lastly, given the variability across both the shape and color dimensions, the claim that this type of suppression is feature-blind might be an artifact of the design promoting location-based instead of feature-based suppression.
  
  Thank you for raising this point. In the past we have used the finding that learned suppression was not specific to distractors, but also generalized to targets, to argue in favor of proactive (or stimulus-triggered) suppression. However, we agree that given the current experimental parameters it may be an oversimplification to conclude that the effect was feature-blind based on the impaired target processing as observed here. As this argument is also not relevant to our main findings, we have removed this interpretation and simply report that the effect was observed for both distractors and targets. Nevertheless, we would like to point out that while inter-trial priming could influence reaction times, the features of both target and distractors (shape and color) were randomly assigned on each trial. This should mitigate consistent feature repetition effects. Additionally, previous research has demonstrated that suppression effects persist even when immediate feature repetitions are controlled for or statistically accounted for (e.g., Wang & Theeuwes 2018 JEP:HPP; Huang et al., 2021 PB&R).
  
  (7) The authors should temper claims such as "suppression occurs only following attentional enhancement, indicating a reactive suppression mechanism rather than proactive suppression." (p. 15, lines 353-353). Perhaps this claim may be true in the current context, but this claim is too generalized and not supported, at least yet. Further, "Within the realm of learned distractor suppression, an ongoing debate centers around the question of whether, and precisely when, visual distractors can be proactively suppressed. As noted, the idea that learned spatial distractor suppression is applied proactively is largely based on the finding that the behavioral benefit observed when distractors appear with a higher probability at a given location is accompanied by a probe detection cost (measured via dot offset detection) at the high probability distractor location (Huang et al., 2022, 2023; Huang, Vilotijević, et al., 2021)." (p. 15, lines 355-361). Again, the authors should either cite more of the opposing side of the debate (e.g., the signal suppression hypothesis, Gaspelin & Luck, 2019 or Luck et al., 2021, and the many lines of converging evidence of proactive suppression) or temper the claims.
  
  Thank you for your constructive feedback regarding our statements on suppression mechanisms. We acknowledge that our original claim was intended to reflect our specific findings within the context of this study and was not meant to generalize across all research in the field. To prevent any misunderstanding, we have tempered our claims to avoid overgeneralization by clarifying that our findings suggest a tendency toward reactive suppression within the specific experimental conditions we investigated (see page 17).
  
  Furthermore, learned distractor suppression is multifaceted, encompassing both feature-based suppression (as proposed by the signal suppression hypothesis) and spatial-based suppression (as examined in the current study). The signal suppression hypothesis provides proactive evidence related to the suppression of specific feature values (Gaspelin et al., 2019; Gaspelin & Luck, 2018b; Stilwell et al., 2019). We have incorporated references to these studies to offer a more comprehensive perspective on the ongoing debate at a broader level (see page 17).
  
  (8) "These studies however, mainly failed to find evidence in support of active preparatory inhibition (van Moorselaar et al., 2020, 2021; van Moorselaar & Slagter, 2019), with only one study observing increased preparatory alpha contralateral to the high probability distractor location (Wang et al., 2019)." (p. 15, lines 367-370). This is an odd phrasing to say "many studies" have shown one pattern (citing 3 studies) and "only" one showing the opposite, especially given these were all from the current authors' labs.
  
  Agreed. We have rewritten this text on page 17.
  
  “These studies however, failed to find evidence in support of active preparatory inhibition as indexed via increased alpha power contralateral to the high probability distractor location (van Moorselaar et al., 2020, 2021; van Moorselaar & Slagter, 2019; but see Wang et al., 2019).”
  
  (9) Could the authors comment on why total power was significantly above baseline immediately (without clearer timing marks, ~10-50 ms) after the onset of the cue (Figure 3)? Is this an artifact of smearing? Further, it appears that there is significant activity (as strong as the evoked power of interest) in the baseline period of the evoked power when the memory item is presented on the vertical midline in the upper visual field (this is also true, albeit weaker, for the memory cue item presented on the horizontal midline to the right). This concern again appears in Figure 4 where the Alpha CTF slope was significantly below or above the baseline prior to the onset of the memory cue. Evoked Alpha was already significantly higher than baseline in the baseline period. In Figure 5, evoked power is already higher and different for the hpl than the lpls even at the memory cue (and before the memory cue onsets). There are often periods of differential overlap during the baseline period, or significant activity in the baseline period or at the onset of the critical, time-locked stimulus array. The authors should explain why this might be (e.g., smearing).
  
  Thank you for pointing this out. As suggested by the reviewer, this ‘unexpected’ pre-stimulus decoding is indeed the result of temporal smearing induced by our 5th order Butterworth filter. The immediate onset of reliable tuning (sometimes even before stimulus onset) is then also a typical aspect of studies that track tuning profiles across time in the lower frequency bands such as alpha (van Moorselaar & Slagter 2019; van Moorselaar et al., 2020; Foster et al., 2016).
  
  Indeed, visual inspection also suggests that evoked activity tracked items at the top of the screen, an effect that is unlikely to result from temporal smearing as it is temporally interrupted around display onset. However, it is important to note that CTFs by location are based on far fewer trials, making them inherently noisier. The by-location plots primarily serve to show that the observed pattern is generally consistent across locations. In any case, given that the high probability distractor location was counterbalanced across participants it did not systematically influence our results.
  
  (10) Given that EEG was measured, perhaps the authors could show data to connect with the extant literature. For example, by showing the ERP N2pc and PD components. A strong prediction here is that there should be an N2pc component followed by a PD component if there is the first selection of the singleton before it is suppressed.
  
  Thank you for your great suggestion regarding the analysis of ERP components such as N2pc and Pd. To reliably assess lateralized ERP components like N2pc or Pd, the high probability location must be restricted to static lateralized positions (e.g., on the horizontal midline, as in Wang et al., 2019). In contrast, our study was designed to utilize an inverted encoding model to investigate the mechanisms underlying spatial suppression. To avoid bias in training the spatial model toward specific spatial locations (see also the previous comment), we counterbalanced the high-probability location across participants, ensuring an equal distribution of high-probability locations within the sample. Given this counterbalanced design, it was not feasible to reliably assess these components within the scope of the current study. Yet, we agree with the reviewer that it would be of theoretical interest to examine Pd and N2pc evoked by the search display, particularly in this scenario where suppression has been triggered prior to search onset.
  
  (11) Figure 2 (behavioral results) is difficult to see (especially the light grey and white bars). A simple fix might be to outline all the bars in black.
  
  Thank you! We have incorporated your suggestion by outlining all the bars on page 10.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) I'm wondering about the link between the memory task and the search task. I think the interpretation of the data should include more discussion of the fact that much of the search literature doesn't involve simultaneously holding an unrelated location in memory. How might that change the results?
  
  For example - what happens behaviorally on the subset of trials in which the location to be held in memory is near the high probability distractor location? All the behavioral data is more or less compartmentalized, but I think some behavioral analysis of this and related questions might be quite useful. I know there are comparisons of behavior in single vs. dual-task cases (for the memory task at least), but I think the analyses could go deeper.
  
  Thank you for your great suggestion. To investigate the potential interactions between the spatial memory task and the visual search task, we conducted additional analyses on the behavioral data. First, we examined whether memory recall was influenced by the spatial distance (dist0 to dist4) between the memory cue location and the high-probability distractor location. As shown in the figure below, memory recall is not systematically biased either toward or away from the high-probability distractor location (p = .562, ηp² = .011).
  
  We also assessed how the memory task might affect search performance. Specifically, we plotted reaction times as a function of the spatial overlap between the memory cue location and any of the search items, separating trials by distractor-present (match-target, match-distractor, match-neutral) and distractor-absent (match-target, match-neutral) conditions. Although visually the result pattern seems to suggest that search performance was facilitated when the memory cue spatially overlapped with the target and interfered with when it overlapped with the distractor, this pattern did not reach statistical significance (distractor-present: p = .249, ηp² = .002; distractor-absent: p = .335, ηp² = .002). We have now included these analyses in our supplemental material.
  
  Beyond additional data analyses, there are also theoretical questions to be asked. For example, one could argue that in order to maintain a location near or at the high probability distractor location in working memory, the priority map would have to shift substantially. This doesn't necessarily mean that proactive suppression always occurs in search when there is a high probability location. Instead, one could argue that when you need to maintain a high probability location in memory but also know that this location might contain a distractor, the representation necessarily looks quite different than if there were no memory tasks. Maybe there are reasons against this kind of interpretation but more discussion could be devoted to it in the manuscript. I guess another way to think of this question is - how much is the ping showing us about attentional priority for search vs. attentional priority for memory, or is it simply a combination of those things, and if so, how might that change if we could ping the attentional priority map without a simultaneous memory task?
  
  Thank you for this valuable suggestion. The aim of our study was to explore how the CTFs elicited by the memory cue were influenced by the search task. We employed a simultaneous memory task because directly measuring CTFs in relation to the search task was not feasible, as the HPL typically does not vary within individual participants. Consequently, CTFs locked to placeholder onsets could reflect arbitrary differences between (subgroups of) participants rather than true differences in the HPL. To address this, we combined the search task with a VWM task, leveraging the fact that location-specific CTFs can reliably be elicited by a memory cue and that the location of this cue relative to the HPL can be systematically varied within participants (Foster et al., 2016, 2017; van Moorselaar et al., 2018). This approach allowed us to examine the CTFs elicited by the memory cue and how these were modulated by their distance from the HPL.
  
  While it is theoretically possible that the observed changes resulted from alterations in how the memory cue was maintained in memory only, this explanation seems unlikely, because memory performance (recall) did not vary as a function of the cue's distance from the HPL, suggesting that the distance-related changes in the CTFs are reflections of both tasks. Moreover, distractor learning typically occurs without awareness (Gao & Theeuwes 2022; Wang & Theeuwes 2018). It is difficult to understand how such unconscious processes could lead to anticipations in the memory task and subsequently modulate the representation of the consciously remembered memory cue only. We therefore believe that had we pinged the attentional priority map without a simultaneous memory task, the results would have been similar to those obtained in the present experiment, indicating stronger tuning at the HPL. Yet, this work still needs to be done.
  
  To address this comment, we have added a paragraph on p. 18:
  
  “However, two alternative explanations warrant consideration. First, one could argue that observed modulations in the revived CTFs do not provide insight into the mechanisms underlying distractor suppression but instead reflect changes in the memory representation itself, potentially triggered by the anticipation of the HPL in the search task. According to this view, the changes in the revived CTFs would be unrelated to how search performance (in particular distractor suppression) was achieved. While this is theoretically possible, we believe it to be unlikely. Memory performance (recall) did not vary as a function of the cue's distance from the HPL, whereas the revived CTFs did, indicating that these changes likely reflect contributions from both tasks. Additionally, distractor learning typically occurs without conscious awareness (Gao & Theeuwes 2022; Wang & Theeuwes 2018). It is difficult to conceive how such unconscious processes could produce anticipatory effects in the memory task and selectively modulate the representation of the consciously remembered memory cue. Second, the apparent lack of suppression and the presence of a pronounced tuning at the high-probability distractor location could actually reflect a proactive mechanism that manifests in a way that seems reactive due to the dual-task nature of our experiment.”
  
  (2) When the distractor appears at a particular location with a high probability it necessarily means that intertrial effects differ between high and low probability distractor locations. Consecutive trials with a distractor at the same location are far more frequent in the high probability condition. You may not have enough power to look at this, and I know this group has analyzed this behaviorally in the past, but I do wonder how much that influences the EEG data reported here. Are CTFs also sensitive to distractors/targets from the most recent trial? And does that contribute to the overall patterns observed here?
  
  Thank you for your thoughtful comment. Indeed, statistical distractor learning studies naturally involve a higher proportion of intertrial effects for high-probability distractors compared to low-probability ones. Previous research, including the present study, has demonstrated that while distractor location improves performance—shown by faster response times (t(23) = 6.32, p < .001, d = 0.33) and increased accuracy (t(23) = 4.21, p < .001, d = 0.86)—intertrial effects alone cannot fully account for the learned suppression effects induced by spatial distractor imbalances. This analysis is now reflected in the revised manuscript on page 9.
  
  However, as noted by the reviewer, this leaves uncertain to what extent the neural indices of statistical learning, in this case the modulation of channel tuning functions, capture the effects of interest beyond the contributions of intertrial priming. To address this issue, one possible approach is to rerun the CTF analysis after excluding trials with location repetitions. Since the distractor location is unknown to participants at the time the CTF is revived by the placeholder, we removed trials where the memory cue location repeated the distractor location from the preceding trial, rather than trials with distractor location repetitions between consecutive trials. Our analyses indicate that after trial removal (~ 9% of overall trials), the spatial gradient pattern in the CTF slopes remains similar. However, the cluster-based permutation analysis fails to reveal any significant findings, and a one-sample t-test on the slopes averaged within the 100 ms time window of interest yields a p-value of 0.106. While this could suggest that the current pattern is influenced by distractor-cue repetition, it is more likely that the trial removal resulted in an underpowered analysis. To investigate this, we randomly removed an equivalent number of trials (9%), which similarly resulted in non-significant findings, although the overall result pattern remained comparable (p = 0.066 for the one-sample t-test on the slopes averaged within the 100 ms time window of interest).
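
  The two exclusion schemes described above can be sketched as follows. The sketch only selects which trials to drop; it does not rerun the CTF analysis. The function names, the use of `None` for distractor-absent trials, and the toy trial sequence are hypothetical.

```python
import random

def cue_repeats_previous_distractor(cue_locs, distractor_locs):
    """Indices of trials whose memory cue location equals the preceding trial's distractor location."""
    return [
        i
        for i in range(1, len(cue_locs))
        if distractor_locs[i - 1] is not None and cue_locs[i] == distractor_locs[i - 1]
    ]

def random_control_indices(n_trials, n_remove, seed=0):
    """Randomly chosen trial indices, matched in number, for the power-control analysis."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_trials), n_remove))

# Hypothetical location bins (0 to 7); None marks a distractor-absent trial.
cue_locs = [0, 3, 5, 5, 2, 7]
distractor_locs = [3, None, 5, 1, 7, 2]

repeats = cue_repeats_previous_distractor(cue_locs, distractor_locs)
control = random_control_indices(len(cue_locs), len(repeats))
print(repeats, control)
```

  Comparing the result after dropping `repeats` with the result after dropping the size-matched `control` set is what separates an effect of cue-distractor repetition from a simple loss of statistical power.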
  
  Author response image 2.
  
  Also, in our previous pinging study we observed that, despite the trial imbalance, decoding was approximately equal between high-probability trailing (i.e., location intertrial priming) and non-trailing trials, suggesting that the ping is able to retrieve the priority landscape that builds up across longer timescales.
  
  (3) Maybe there is too much noise in the data for this, but one could look at individual differences in the magnitude of the high probability distractor suppression and the magnitude of the alpha CTF slope. If there were a correlation here it would bolster the argument about the relationship between priority to the distractor location and subsequent behavior reduction of interference from that distractor.
  
  Thank you for this valuable suggestion. We investigated whether there was a correlation between the average gradient slope during the time window in which the placeholder revived the memory representation and the average distance slope in reaction times for the learned suppression effect. This correlation was not significant (r = .236, p = 0.267), which is perhaps expected given the potential noise levels, as noted by the reviewer. Furthermore, while the learned suppression effect is robust at the group level, its predictive value for individual-level performance has been shown to be limited (Ivanov et al., 2024; Hedge et al., 2018). Consequently, we chose not to include this analysis in the manuscript (see also our response to comment 2 by reviewer 2).
  
  (4) The results sections are a bit dense in places, especially starting at the bottom of page 11. For readers who are familiar with the general questions being asked but less so with the particular time-frequency analyses and CTF approaches being used (like myself), I think a bit more time could be spent setting up these analyses within the results section to make extra clear what's going on.
  
  Thank you for your feedback regarding the clarity of our Results section. We have revised this section to make it more understandable and easier to follow, especially for readers who may be less familiar with the specific time-frequency analyses and modeling approaches used in our study. Specifically, we have provided additional interpretations alongside the reported results from page 10 to page 13 to aid comprehension and ensure that the methodology and findings are accessible to a broader audience. Additionally, we have revised the figure notes to further enhance clarity and understanding.
  
  Other comments:
  
  Abstract: "a neutral placeholder display was presented to probe how hidden priority map is reconfigured..." I think the word "the" is missing before "priority map".
  
  Thank you. We have added the word “the” before “hidden priority map”.
  
  p. 4, Müller's group also has a number of papers that demonstrate how learned distractor regularities impact search (From the ~2008-2012 range, probably others as well), it might be worth citing a few here.
  
  Thank you for your suggestion. In the revised manuscript, we have added citations on page 4 to several key papers from Müller’s group as well as from other research groups.
  
  p.5 - Chang et al. (2023) seems highly relevant to the current study (and consistent with its results) - depending on word limits, it might make sense to expand the description of this in the introduction to make clear how the present study builds upon it
  
  Thank you! We have expanded the discussion of Chang et al. (2023) on page 5 to provide more detailed elaboration of their study and its relevance to our work.
  
  p. 7 - maybe not for the current study, but I do wonder whether the distortion of spatial memory by the presence of the search task occurs only when there is a relevant regularity in the search task. In other words, if the additional singleton task had completely unpredictable target and distractor locations, would there be memory distortions? Possibly for the current dataset, the authors could explore whether the behavioral distortion is systematically towards or away from the high probability distractor location.
  
  Thank you for your insightful suggestion. Following your recommendation, we conducted an additional analysis to examine memory recall as a function of the distance between the memory cue location and the high-probability distractor location. Figure S1A illustrates the results, depicting memory recall deviation across various distances (dist0 to dist4) from the high-probability distractor location.
  
  Our statistical analysis indicates that memory recall is not systematically biased either towards or away from the high-probability distractor location (p = .562, η<sub>p</sub><sup>2</sup> = .011). This finding suggests that spatial memory recall remains relatively stable and is not heavily influenced by the presence of regularities in the distractor locations.
  
  p. 7 - in addition to stats it would be helpful to report descriptive statistics for the high probability vs. other distractor location comparisons
  
  Thank you! We have added descriptive statistics on page 8 and page 9.
  
  p. 19, "64%" repeated unnecessarily - also, shouldn't it be 65% if it's 5% at each of the other seven locations?
  
  Thank you. This is now corrected in the revised manuscript.
  
  p. 20 "This process continued until participants demonstrated a thorough understanding of the assigned tasks" Were there objective criteria to measure this?
  
  Thank you for pointing out this issue. To clarify, objective criteria were indeed used to assess participants’ readiness to proceed. Specifically:
  
  For the training phase practice trials, participants were required to achieve an average memory recall deviation of less than 13°.
  
  For the test phase practice trials, participants needed to demonstrate a minimum of 65% accuracy in the search task. In addition, participants were asked to verbally confirm their understanding of the task goals with the experimenter before proceeding.
  
  We have revised the manuscript to clearly indicate these criteria on p. 23.
  
  p. 21 "P-values were Greenhouse-Geiser corrected in case where the..." I think "case" should be "cases"
  
  Thank you. We have corrected this in the revised manuscript.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.17.589855v2
www.biorxiv.org www.biorxiv.org

New submission 08/09/2023, 13:41:17

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer 1
  
  We thank the reviewer for their thoughtful comments. We have addressed them below, and we believe that they have significantly strengthened the clarity of the manuscript.
  
  Main Comments:
  
  In Fig. 2C-D, I am not sure I understand why ≈ 100 mutations fix with β = 0. In the absence of epistasis, and since the coefficients hi are sampled from a symmetric distribution centered at zero, it is to be expected that roughly half of the mutations will have positive fitness effects and thus will eventually fix in the population. With L = 250, I would have expected to see the number of fixed mutations approach ≈ 125 for β = 0. Perhaps I am missing something?
  
  • In our simulations, we initialize all populations from a state where there are only 100 available beneficial mutations (i.e., the initial rank is always 100). Without epistasis, these initial beneficial mutations are the only beneficial mutations that will be present throughout the entire trajectory. Hence, for β = 0, only 100 beneficial mutations can fix. Previously, this information could be found in the “Materials and methods” section of the SI. To make this aspect of our simulation more clear in the revision, we have added a discussion of the initial rank to the “Landscape structure” subsection of the model definition section. In addition, we have merged “Materials and methods” with “Further simulation details” in the SI into one section, and have listed the values for the simulation parameters in the model definition section.
  
  Along these lines, the authors show that increasing β leads to a higher number of fixed mutations. I am not sure I understand their explanation for this. In line 209 they write that as β increases, “mutations are needed to cease adaptation”. The way I see it, in the absence of epistasis the fitness peak should correspond to a genotype with ≈ L/2 mutations (the genotype carrying all mutations with hi > 0). Increasing the magnitude of microscopic epistasis (i.e., increasing β ), and assuming that there is no bias towards positive epistasis (which there shouldn’t be based on the model formulation, i.e., section "Disorder statistics" on page 4), can change the “location” of the fitness peak, such that it now corresponds to a different genotype. Statistically speaking, however, there are more genotypes with L/2 mutations than with any other number of mutations, so I would have expected that, on average, the number of mutations fixed in the population would still have been ≈ L/2 (naturally with somewhat large variation across replicates, as seems to be the case).
  
  • With epistasis, the situation becomes more complex. The structure of our model imposes significant sign epistasis in general (i.e. mutations can be beneficial on one background genotype and deleterious on another). This means that in the presence of epistasis, more than 100 mutations can be required to reach a local optimum even when the initial rank was 100. Intuitively, this occurs because mutations that were deleterious on the ancestral background genotype can become beneficial on future genotypes. We find that this occurs consistently throughout adaptation, leading to the accumulation of more mutations with increasing epistasis.
  
  • Please note that we use the value L = 1000 in our simulations. We have also made the fact that we use L = 1000 more clear by moving the description of the simulation parameters to the main text.
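
  The two points above (a fixed budget of beneficial mutations without epistasis, and newly opened beneficial mutations with sign epistasis) can be illustrated with a small toy model. This is a sketch under stated assumptions, not the authors' simulation: it assumes a generic pairwise fitness function F = Σ<sub>i</sub> h<sub>i</sub>α<sub>i</sub> + Σ<sub>i&lt;j</sub> J<sub>ij</sub>α<sub>i</sub>α<sub>j</sub> with spins α<sub>i</sub> = ±1, Gaussian couplings scaled by β/√L, a small L for speed, a random initial genotype rather than one prepared at rank 100, and a walk that fixes a uniformly chosen beneficial mutation at each step. All function names are hypothetical.

```python
import random


def flip_effects(h, J, alpha):
    """Fitness change from flipping each spin, for
    F = sum_i h_i a_i + sum_{i<j} J_ij a_i a_j with symmetric J."""
    L = len(alpha)
    return [
        -2.0 * alpha[i] * (h[i] + sum(J[i][j] * alpha[j] for j in range(L) if j != i))
        for i in range(L)
    ]


def rank(h, J, alpha):
    """Number of currently available beneficial single mutations."""
    return sum(1 for d in flip_effects(h, J, alpha) if d > 0)


def adaptive_walk(h, J, alpha, rng):
    """Fix uniformly chosen beneficial mutations until none remain.

    Returns the number of fixed mutations and the final genotype.
    """
    alpha = list(alpha)
    fixed = 0
    while True:
        beneficial = [i for i, d in enumerate(flip_effects(h, J, alpha)) if d > 0]
        if not beneficial:
            return fixed, alpha
        alpha[rng.choice(beneficial)] *= -1
        fixed += 1


def make_landscape(L, beta, rng):
    """Random fields h and symmetric couplings J (assumed scaling beta / sqrt(L))."""
    h = [rng.gauss(0.0, 1.0) for _ in range(L)]
    J = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            J[i][j] = J[j][i] = beta * rng.gauss(0.0, 1.0) / L ** 0.5
    return h, J


rng = random.Random(0)
L = 40
alpha0 = [rng.choice((-1, 1)) for _ in range(L)]

# No epistasis: every fixation consumes exactly one beneficial mutation.
h, J = make_landscape(L, beta=0.0, rng=rng)
rank_additive = rank(h, J, alpha0)
fixed_additive, _ = adaptive_walk(h, J, alpha0, rng)

# With epistasis: each fixation can change the sign of other mutations' effects.
h, J = make_landscape(L, beta=1.0, rng=rng)
rank_epistatic = rank(h, J, alpha0)
fixed_epistatic, final_epistatic = adaptive_walk(h, J, alpha0, rng)
```

  With β = 0 the number of fixed mutations equals the initial rank by construction, which mirrors the explanation given for Fig. 2C-D. With β > 0 the count is no longer tied to the initial rank, because a fixation can turn previously deleterious mutations into beneficial ones and vice versa.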
  
  I do see how, in the clonal interference regime, there can be multiple genotypes in the population at a given time (each with a different mutational load), thus making the number of fixed mutations larger than L/2 when aggregating over all genotypes in the population. But this observation makes less intuitive sense to me in the SSWM regime. In lines 207-208, the authors state that “as beta increases, a greater number of new available beneficial mutations are generated per each typical fixation event”. While this is true, it is also the case that a greater number of mutations that would have been beneficial in the absence of epistasis are now deleterious due to negative epistasis (if I am understanding what the authors mean correctly).
  
  • The reviewer is correct to note that in the strong clonal interference regime, there will be more accumulated mutations across the entire population than in any single strain. However, we report the number of mutations that have fixed, i.e., become present in the entire population.
  
  • We find that the typical decrease in rank (per fixation event) of the population decreases with increasing epistasis — i.e., the number of available beneficial mutations that are “consumed” when a mutation fixes is typically lower in systems with stronger epistasis.
  
  Similarly, I am not sure I understand how one goes from equation (6) to equation (7). In particular, it would seem to me that the term 4α<sub>i</sub>α<sub>j</sub>J<sub>ij</sub> in equation (6) should be equally likely to be positive or negative (again assuming no bias towards positive J<sub>ij</sub>). I thus do not see why η<sub>ij</sub> in equation (7) is sampled from a normal distribution with mean µβ instead of just mean zero.
  
  • The reviewer is correct that, for a uniformly random initial state, α<sub>i</sub>, α<sub>j</sub>, and J<sub>ij</sub> will be uncorrelated so that the distribution of 4α<sub>i</sub>α<sub>j</sub>J<sub>ij</sub> can be computed exactly (and has mean zero). However, we initialize from a state with rank 100, so that we need to compute the distribution of the random variable E[α<sub>i</sub>α<sub>j</sub>J<sub>ij</sub> | α<sub>i</sub>α<sub>j</sub>J<sub>ij</sub> > 0, R = 100]. This is mathematically very challenging, because there are nontrivial correlations between spins even at initialization. For these reasons, we found the uniformly random approximation insufficient. This is described in the paragraph following Equation (7) in the resubmission.
  
  Minor Comments:
  
  The authors use a model including terms up to second-order epistasis. To be clear, I think this choice is entirely justified: as they mention in their manuscript, this structure allows to approximate any fitness model defined on a Boolean hypercube. As I understand it, the reason for not incorporating higher-order terms (as in e.g. Reddy and Desai, eLife 2021) has to do with computational efficiency, i.e., accommodating higher-order terms in equation (10) may lead to a substantial increase in computation time. Is this the case?
  
  • The reviewer is correct that the incorporation of higher-order terms leads to significantly more expensive computation. It is an interesting direction for future inquiry to see whether our adaptive fast fitness computation method can be extended to higher-order interactions.
  
  Reviewer 2
  
  We would like to thank the reviewer for their careful reading and their useful comments connecting our work to spin glass physics. We believe the resulting additions to the paper have made our contributions stronger, and that they reveal some novel connections between the substitution trajectory and correlation functions in spin glasses. A summary of our investigation is provided below, and we have added two paragraphs to the discussion section under the heading “Connections to spin glass physics”.
  
  Main Comments:
  
  In spin glasses, slowdown of dynamics could have contributions from stretched exponential relaxation of spin correlations as well as aging, each of which are associated with their own exponents. In the present model, these processes could be quantified by computing two-point correlations associated with genomic overlap, as a function of lag time as well as waiting time (generation number). The population dynamics of competing strains makes the analysis more complicated. But it should be possible to define these correlations by separately averaging over lineages starting from a single parent genome, and over distinct parent genomes. It would be interesting to see how exponents associated with these correlations relate to the exponent c associated with asymptotic fitness growth.
  
  • To investigate this point, we first considered the two-point correlation function 〈αi (tw)αi (tw+ ∆t)〉 for waiting time tw and lag time ∆t. Because all spins are statistically identical, it is natural to average this over the spin index i, leading to the quantity
  
  Viewed as a function of ∆t for any fixed tw, it is clear that . If m mutations with respect to α(tw) have fixed at time tw + ∆t, a similar calculation shows that . Surprisingly, this simple derivation reveals that the two-spin correlation function commonly studied in spin glass physics is an affine transformation of the substitution trajectory commonly studied in population genetics. Moreover, it shows that the effect of tw is to change the definition of the ancestral strain, so that we may set tw = 0 without loss of generality and study the correlation function χ2(t) = 1 − 2m(t) where m(t) is the mean substitution trajectory of the population. Much of our analysis proceeds by analyzing the effect of epistasis on the accumulation of mutations. This relation provides a novel connection between this analysis and the analysis of correlation functions in the spin glass literature.
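
  The steps of this argument can be sketched as follows. This is a reconstruction from the surrounding text rather than the authors' own displayed equations, and it assumes spins α<sub>i</sub> = ±1, a genome of L loci, and m loci that differ between α(t<sub>w</sub>) and α(t<sub>w</sub> + Δt); the last line is written for a single realization.

```latex
% Assumptions: \alpha_i \in \{-1,+1\}, genome length L, and m loci that
% differ between \alpha(t_w) and \alpha(t_w + \Delta t).
\begin{align}
\chi_2(t_w,\Delta t) &= \frac{1}{L}\sum_{i=1}^{L}
    \big\langle \alpha_i(t_w)\,\alpha_i(t_w+\Delta t) \big\rangle, \\
\chi_2(t_w,0) &= \frac{1}{L}\sum_{i=1}^{L} \alpha_i(t_w)^2 = 1, \\
\chi_2(t_w,\Delta t) &= \frac{(L-m) - m}{L} = 1 - \frac{2m}{L}.
\end{align}
```

  Each unchanged locus contributes +1 to the sum and each substituted locus contributes −1. Reading m(t) as the fraction m/L of substituted loci, the last line agrees with the relation χ2(t) = 1 − 2m(t) stated in the response.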
  
  • It is well known that in the SSWM limit without epistasis, the substitution trajectory follows a power law similar to the fitness trajectory with relaxation exponent 1.0 [1]. Informed by this identity, we performed simulations in the SSWM limit and fit power laws to the correlation function χ2 as a function of time. We have verified that χ2(t) obeys a power-law relaxation with exponent roughly 1.0 for β = 0; moreover, as anticipated by the reviewer, the corresponding exponent decreases with increasing β. Nevertheless, we find that these relaxation exponents are distinct from those found for the fitness trajectory, despite following the same qualitative trend. This point is particularly interesting, as it highlights that the dynamics of fixation induce a distinct functional form at the level of the correlation functions when compared to, for example, the Glauber dynamics in statistical physics.
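
  For readers unfamiliar with how such an exponent is extracted, a generic sketch is given below. It is not the fitting procedure used in the study: it assumes the fitted quantity is positive and follows a pure power law y(t) = A·t<sup>−c</sup>, so that an ordinary least-squares line in log-log coordinates recovers c. The function name `fit_power_law` and the synthetic data are hypothetical.

```python
import math


def fit_power_law(times, values):
    """Least-squares fit of log(values) = log(A) - c * log(times); returns (A, c)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic, noise-free relaxation with exponent 1.0, sampled at log-spaced times.
times = [10.0 * 2 ** k for k in range(10)]
values = [3.0 * t ** -1.0 for t in times]
amplitude, exponent = fit_power_law(times, values)
```

  On real trajectories the fitted exponent depends on the chosen time window and on any offset subtracted before taking logarithms, so both choices should be reported alongside the exponent.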
  
  The strength of dynamic correlations in spin glasses can be characterized by the four-point susceptibility, which contains information about correlated spin flips. These correlations are maximized over characteristic timescales. In the context of evolution, such analysis may provide insights on the correlated accumulation of mutations on different sets of loci over different timescales. It would be interesting to see how these correlations change as a function of the mutation rate as well as the strength of epistasis.
  
  • To study this point, we considered the four-point correlation function
  
  Because spins are statistically identical, we found numerically that the genotype average is roughly equivalent to the angular average over trajectories. Inter-changing the order of the summation and the angular averaging, we then find that
  
  so that the information contained in the four-point correlation function is the same as the information contained in the two-point correlation function.
  
  Fig. 2E and Fig. 5 together suggest an intriguing possibility when interpreted in the spin glass context. It is clear that in the absence of epistasis, clonal interference accelerates fitness growth. Fig. 2E additionally suggests that this scenario will continue to hold even in the presence of weak, but finite epistasis, but disappears for sufficiently strong epistasis. I wonder if the two regimes are separated by a phase transition at some non-trivial strength of epistasis. Indeed, the qualitative behavior appears to change from that of a random field Ising spin glass for small β, to that of a zero field Sherrington-Kirkpatrick spin glass for sufficiently large β. While the foregoing comments are somewhat speculative, perhaps a discussion along these lines, and what it means in the context of evolution, could be a useful addition to the discussion section of the paper.
  
  • We thank the reviewer for this interesting suggestion, and we have added a discussion of this point to the text in the future directions section, lines 483–489.
  
  Minor Comments:
  
  In the abstract (line 17-18), I recommend use of the phrase "a simulated evolving population" to avoid a possible misinterpretation of the work as experimental as opposed to numerical.
  
  • We have added the word “simulated”.
  
  In line 70, the word "the" before "statistical physics" is redundant.
  
  • We have removed “the”.
  
  To make the message in lines 294-295 visually clear, I recommend keeping the Y-axis scale bars constant across Fig. 4A and Fig. 4B.
  
  • We appreciate the suggestion. However, we found that when putting the two figures on the same scale, because the agreement is only qualitative and not quantitative (as emphasized in the text), it becomes difficult to view the trend in both systems. For this reason, we have chosen to keep the figure as-is.
  
  Fig. 6 caption states: "Without epistasis, the rank decreases with increasing µ". It should be "rank increases".
  
  • We have fixed this.
  
  In the last sentence in the caption to Fig. 8, the labels "(A, β =0)" and "(B, β =0.25)" need to be swapped.
  
  • We have fixed this.
  
  Editor Comments
  
  We thank the editor for drawing our attention to these three interesting references, in particular the second, which appears most relevant to our work. We have added a discussion of reference 2 in the future directions section (lines 471–482), commenting on how to determine the contribution of within-path clonal interference to the fitness dynamics in our model. We have also added a reference to article 3 in the model description, commenting on the importance of sign epistasis and the prevalence of sign epistasis in our model with β > 0.
  
  References:
  
  [1] Good BH, Desai MM. The impact of macroscopic epistasis on long-term evolutionary dynamics. Genetics. 2015.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.01.16.524306v2
www.biorxiv.org www.biorxiv.org

New submission 04/10/2023, 09:07:34

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The enteroviruses comprise a medically important genus in the large and diverse picornavirus family, and are known to be released without lysis from infected cells in large vesicles containing numerous RNA genome-containing capsids - a feature allowing for en bloc transmission of multiple viral genomes to newly infected cells that engulf these vesicles. SIRT-1 is an NAD-dependent protein deacetylase that has numerous and wide ranging effects on cellular physiology and homeostasis, and it is known to be engaged in cellular responses to stress and autophagy.
  
  Jassey et al. show that RNAi depletion of SIRT-1 impairs the release of enterovirus D-68 (EVD68) in EVs recovered from the supernatant fluids of infected cells using a commercial exosome isolation kit. The many functions attributed to SIRT-1 in the literature reflect its capacity to deacetylate various cell proteins engaged in transcription, DNA repair, and regulation of metabolism, apoptosis and autophagy. However, Jassey et al. make the surprising claim that the proviral role of SIRT-1 in promoting enterovirus release is not dependent on its deacetylase activity. Fig. S1C is crucial to this suggestion, as it is said to show that reconstituting expression with a catalytically-inactive mutant can rescue virus release from SIRT-1 depleted cells. However, no information is provided concerning the levels of endogenous and ectopically expressed SIRT-1 proteins in this experiment, making it very difficult to interpret the results. Is the mutant SIRT-1 protein expressed at a higher level than the non-mutant protein? Is there a 'sponging' effect with these transfections that lessens the siRNA efficiency and reduces knockdown of the endogenous protein? Fig. S1B and Fig. 4C convincingly show that EX527, a small molecule inhibitor of the deacetylase activity of SIRT-1, inhibits extracellular release of the virus. This suggests that the deacetylase activity of SIRT-1 is in fact required for the proviral effect of SIRT-1. This is a fundamentally important question that will require more investigation.
  
  We have included western blot data (Fig. S1D), which shows comparable levels of expression between the wild-type and mutant SIRT-1 constructs as well as the endogenous SIRT-1. While both constructs partially rescued EV-D68 titers in SIRT-1 knockdown cells, only the wild-type construct rescued SERCA2A protein levels, indicating that SIRT-1 deacetylase activity is required for SERCA2A expression but not for EV-D68 infection.
  
  Fig. 6 shows how SIRT-1 knockdown impacts the release of enterovirus D68 in EVs recovered from cell culture supernatant using a commercial 'Total Exosome Isolation Kit'. The authors should describe the principle this kit exploits to isolate 'exosomes' (affinity isolation?) and specify which antibodies it involves (anti-phosphatidylserine, anti-CD63, others?). This could impact the outcome of these experiments, and moreover is important to include in the long-term scientific record. The authors are appropriately cautious in describing the vesicles they presume to be isolated by the kit as simply 'extracellular vesicles', since there are multiple types of EVs with very different mechanisms of biogenesis, of which 'exosomes' are but one specific type. It would have been more elegant had the authors shown that SIRT-1 is required for EVD68 release in detergent-sensitive vesicles with low buoyant density in isopycnic gradients, and to characterize the size and number of viral capsids in these vesicles by electron microscopy.
  
  We have added a description of the Total Exosome Isolation Kit principle to the materials and methods. The reagent, in brief, ties up water molecules and forces less soluble components, such as vesicles, out of the culture media, which can then be pelleted by centrifugation. The purity and size distribution of exosomes isolated with this kit is comparable to ultracentrifugation.
  
  Fig. 6 shows that SIRT-1 depletion upregulates CD63 expression, but has no apparent impact on the release of CD63-positive 'EVs' from uninfected cells. EV-D68 infection also upregulates CD63 expression in SIRT-1 replete cells, and in this case, increases the release of CD63-positive EVs. The combination of infection and SIRT-1 depletion massively upregulates CD63 expression, but appears to eliminate the enhanced release of CD63-positive EVs resulting from infection alone. These are interesting results, from which the authors infer CD63 is associated with EVs containing EV-D68. But, do we know this? Can a CD63 pulldown immunoprecipitate EV-D68 capsid proteins or viral RNA? CD63 is strongly associated with exosomes released from cells through the multi-vesicular body pathway, which are distinct from the LC3-positive EVs released by secretory autophagy that have previously been associated with enteroviruses. The authors suggest that 'knockdown of SIRT-1 may prevent the exocytosis of CD63-positive EVs', but this is a very broad claim (and not really demonstrated by Fig. 6): it requires a clearer definition of what the authors mean by 'exocytosis' and a much more detailed analysis of the size and buoyant density of EVs released in a SIRT-1-dependent process.
  
  We have toned down this suggestion. It sets up our logic for what is now Figure 7, but we agree that it does not establish the specific nature of these vesicles.
  
  The authors suggest that almost all EV-D68 released from infected cells is released without cell lysis in EVs. However, they generally show data from only a single time point following infection (5 or 6 hrs post-infection). It would have been interesting to see a more complete temporal analysis, and to know whether a high proportion of virus continues to be released in EVs, or if it is swamped out ultimately by lytic release of nonenveloped virus.
  
  In these cells, very little virus is released at earlier timepoints, and after 6 hpi it is difficult to analyze virus release because of cell detachment and lysis. In a future publication we will use less susceptible cells to analyze a time course of release.
  
  Fig. 1D indicates that a small fraction of SIRT-1 leaks from the nucleus in EV-D68 infected cells. The authors suggest this is due to targeted nuclear export, rather than simply leaky nuclear pores which are well known to exist in enterovirus-infected cells. The authors present similar fluorescent microscopy data showing inhibition of TFEB export in leptomycin-B treated cells in Fig. S2A in support of their claim that this is specific SIRT-1 export, but these data are far from convincing - there is equivalent residual TFEB and SIRT-1 in the cytoplasm of the treated cells. Quantitative immunoblots of cytoplasmic and nuclear cell fractions might prove more compelling.
  
  We have changed the text to remove the word “block” and instead suggest that there is inhibition, given the difference we observe with and without leptomycin-B.
  
  Finally, the authors should be more specific in describing the viruses they have studied (EV-D68 and PV). It would be preferable to describe these as 'enteroviruses' (including in the title of the manuscript), rather than more broadly as 'picornaviruses'. There is no certainty that the requirement for SIRT-1 in non-lytic release of virus extends to hepatoviruses or other picornaviral genera, for which mechanisms of nonlytic release may be quite different.
  
  We have made this change and thank the reviewer for pointing this out.
  
  Reviewer #2 (Public Review):
  
  The authors aimed to connect SIRT-1 to EV-D68 virus release through mediating ER stress. They are successful in robustly connecting these pathways experimentally and show a new role for SIRT-1 in EV-D68 infection. These results extend to additional viruses, suggesting role(s) for SIRT-1 in diverse virus infection.
  
  The authors note that EV-D68 does not significantly impact SIRT-1 protein levels (Fig 1E and F), though this has been described for other picornaviruses (Xander et al., J Immunol 2019; Han et al., J Cell Sci 2016; Kanda et al Biochem Biophys Res Commun 2015). This may be of interest to note in the manuscript.
  
  We have cited the above papers in the manuscript and thank the reviewer for these suggestions.
  
  The data regarding CVB3 (Fig S4) are especially interesting because they show no discernable impact on infection. The manuscript should describe this further and perhaps speculate on potential reasons. Could it be due to inefficient knockdown?
  
  We have shown that both genetic and pharmacological inhibition of SIRT-1 does not significantly alter CVB3 titers. We do not think this is due to inefficient knockdown since the CVB3 and PV experiments were done concurrently. We are currently investigating why CVB3 responds differently from EV-D68 and PV.
  
  SIRT-1 (and other sirtuins) have been linked to an innate interferon response. Are any of the phenotypes observed here due to IFN responses? The use of H1HeLa cells would suggest this is not the case.
  
  We think this is unlikely because H1HeLa cells are not IFN-competent and the knockdown of SIRT-1 did not significantly alter viral RNA replication.
  
  Reviewer #1 (Recommendations For The Authors):
  
  In Fig. 1, it would be informative to show an immunoblot of the protein in knockdown vs control cells (this is shown in different experiments in Fig. 2A and 3C, with variable degrees of knockdown efficiency, but ideally should be shown here also).
  
  The knockdown efficiency of SIRT-1 is now shown in Fig. S1D. We thank the reviewer for this suggestion.
  
  Why is the extracellular virus titer in the control cells in Fig. 1C so much lower (over a 1.5 logs) than in Fig. 1B? Has the plasmid transfection induced an innate immune response, and could this be confounding the experiment?
  
  We think this is due to stress induced by transfection and not an innate immune response, since H1HeLa cells are not interferon-competent.
  
  SIRT-1 is recognized to have a regulatory role in autophagy, but the authors' claim that it is "essential for stress induced and basal autophagy" would be strengthened by including in Fig. 2B control images of starved and CCCP-treated cells.
  
  LC3 lipidation and p62 degradation are the hallmarks of autophagy initiation and flux, which are shown in Fig. 2A. The goal of Fig. 2B was to verify the impact of SIRT-1 knockdown in restricting basal autophagic degradation. We will examine the effect of starvation and CCCP treatment in future studies. We thank the reviewer for understanding.
  
  The BiP immunoblot shown in Fig. 4B does not support the claim that 'TG [thapsigargin] treatment induced BiP protein levels' whereas 'EV-D68 infection reduced BiP levels...suggesting that EV-D68 blocks ER stress.' The apparent differences in BiP expression are minimal and of questionable biological significance.
  
  We have consistently observed a reduction in BiP levels during EV-D68 infection in both hSABCi-NS1.1 as indicated in Fig. 4B and H1HeLa (see Author response image 1), which is consistent with an ER stress blockade during EV-D68 infection.
  
  Author response image 1.
  
  Minor comments:
  
  1) The variable and wide-ranging scale of the y-axis in Figs. 1A-C and S1 is distracting, exaggerates small differences, and makes it difficult to assess the magnitude of differences in virus titers. The scale should be standardized and held constant in graphs showing results from similar types of experiments.
  
  Our graphs are plotted based on the viral titers from experiments, mostly done on different days. We are confident that the variabilities in the y-axis do not affect the statistical analyses.
  
  2) The number and type (technical or biological?) of experimental replicates should be indicated in the figure legends. Ideally, each replicate should be individually plotted in graphs.
  
  All experiments are repeated at least three times unless otherwise indicated. We have added this information to the figure legends.
  
  3) Fig. S5C - how many replicates were done, and is there a statistically significant difference in viral RNA abundance at the last time point?
  
  The experiment was done three times, twice with a low MOI (0.1) and once with a high MOI (30). There is no statistical difference at the last time point as shown in the graphs in Author response image 2.
  
  Author response image 2.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Figure 1D would benefit from staining for viral replication compartments (J2, for instance) to correlate the amount of viral dsRNA with nuclear egress of SIRT-1. Similar data would benefit Figure 5A. The data in Figure S5 suggests that most, but not all cells, are infected, so having this control seems important for their IFA experiments.
  
  SIRT-1 dsRNA staining for EV-D68 infection is shown in Fig. S5A and all cells appear to be infected. The IFA data (Author response image 3) shows dsRNA staining of CVB3-infected cells.
  
  Author response image 3.
  
  Are EVs not released as efficiently with SIRT-1 knockdown? The authors show that knockdown reduces CD63 levels in purified EVs, but this could be explained if exosomes are not generated as robustly with SIRT-1 knockdown.
  
  We don’t want to use the word “exosomes” since their definition is very specific, and only use it once in our manuscript, to describe known membrane associations of CD63. We do not think SIRT-1 knockdown affects the intracellular generation of EVs, since depleting SIRT-1 leads to the buildup of CD63 positive signals in the whole cell lysates compared to the scramble control (Fig. 7B and C). Instead, our data suggest that SIRT-1 regulates the release of EVs during EV-D68 infection.
  
  Labels of graphs for "Infection" versus treatment ("TG" or "EX527") are unclear. All samples are presumably infected, so perhaps the authors meant to label these diagrams as untreated.
  
  We have made the changes in the labels and thank the reviewer for helping make these graphs clearer.
  
  The induction of ER stress with TG and repression of stress with EV-D68 infection is clear from BiP western blots. Are BiP levels reduced in SIRT-1 knockdown cells? Their data with TG treatment and knockdown suggests this may be possible.
  
  We have not examined the impact of SIRT-1 knockdown on BiP protein levels. But since SIRT1 KD increases ER stress, as evidenced by a reduction in SERCA2A levels (Fig. 3C and E), we would expect an increase in BiP levels in SIRT-1 depleted cells.
  
  Would the authors expect TG to reduce EVs with EV-D68 as well? Presumably, combination of TG with SIRT-1 would reduce EVs similar to the results shown in Figure 6C. They mention in the discussion that TG and SIRT-1 "share common cellular targets" so it would be interesting to determine if TG acts similar to SIRT-1 knockdown with regard to EVs.
  
  We think TG will similarly reduce EVs in EV-D68-infected cells, and we are currently testing this hypothesis.
  
  Because of the inclusion of the SARS-CoV-2 data and mention in the abstract, it may be appropriate to include that data (Fig S7) in the main figures. The authors mention SIRT-1 as important to MERS-CoV infection in the introduction, but SIRT-1 has been implicated in RNA virus infection, including picornaviruses (noted above). The expansion of this section to provide additional context would benefit the introduction and discussion.
  
  We have moved the former Fig. S7 to the main manuscript as Fig. 6.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.08.12.503821v3
www.biorxiv.org www.biorxiv.org

Perceptual error based on Bayesian cue combination drives implicit motor adaptation

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  eLife assessment
  
  This study presents an important finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants, thereby linking perception and action during implicit adaptation. The evidence supporting the claims of the authors is convincing. The normative approach of the proposed PEA model, which combines ideas from separate lines of research, including vision research and motor learning, opens avenues for future developments. This work will be of interest to researchers in sensory cue integration and motor learning.
  
  Thank you for the updated assessment. We are also grateful for the insightful and constructive comments from the reviewers, which have helped us improve the manuscript again. We made necessary changes following their comments (trimmed tests, new analysis results, etc) and responded to the comments in a point-by-point fashion below. We hope to publish these responses alongside the public review. Thank you again for fostering the fruitful discussion here.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  I appreciate the normative approach of the PEA model and am eager to examine this model in the future. However, two minor issues remain:
  
  (1) Clarification on the PReMo Model:
  
  The authors state, "The PReMo model proposes that this drift comprises two phases: initial proprioceptive recalibration and subsequent visual recalibration." This description could misinterpret the intent of PReMo. According to PReMo, the time course of the reported hand position is merely a read-out of the *perceived hand position* (x_hat in your paper). Early in adaptation, the perceived hand position is biased by the visual cursor (x_hat in the direction of the cursor); towards the end, due to implicit adaptation, x_hat reduces to zero. This is the same as PEA. I recommend that the authors clarify PReMo's intent to avoid confusion.
  
  Note, however, the observed overshoot of 1 degree in the reported hand position. In the PReMo paper, we hypothesized that this effect is due to the recalibration of the perceived visual target location (inspired by studies showing that vision is also recalibrated by proprioception, but in the opposite direction). If the goal of implicit adaptation is to align the perceived hand position (x_hat) with the perceived target position (t_hat), then there would be an overshoot of x_hat over the actual target position.
  
  PEA posits a different account for the overshoot. It currently suggests that the reported hand position combines x_hat (which takes x_p as input) with x_p itself. What is the reasoning underlying the *double occurrence* of x_p?
  
  There are three alternatives that seem more plausible (and could lead to the same overshooting): 1) increasing x_p's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase), 2) decreasing sigma_p (assuming that participants pay more attention to the hand during the report phase), 3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration. All these options, at least to me, seem equally plausible and testable in the future.
  
  For clarification of the PReMo model’s take on Fig4A, we now write:
  
  “The PReMo model proposes that the initial negative drift reflects a misperceived hand location, which gradually reduces to zero, and the late positive drift reflects the influence of visual calibration of the target (Tsay, Kim, Saxena, et al., 2022). ”
  
  However, we would like to point out that the PEA model does not predict a zero (perceived hand location) even at the late phase of adaptation: it remains negative, though not as large as during initial adaptation (see Figure 4A, red line). Furthermore, we have not seen any plausible way to use a visually biased target to explain the overshoot of the judged hand location (see below when we address the three alternative hypotheses the reviewer raised).
  
  We don’t think the “double” use of xp is a problem, simply because there are TWO tasks under investigation when the proprioceptive changes are measured along with adaptation. The first is the reaching adaptation task itself: moving under the influence of the clamped cursor. This task is accompanied by a covert estimation of hand location after the movement (x̂). Given the robustness of implicit adaptation, this estimation appears mandatory and automatic. The second task is the hand localization task, during which the subject is explicitly asked to judge where the hand is. Here, the perceived hand is based on the two available cues: one is the actual hand location xp, and the other is the influence from the just-finished reaching movement (i.e., x̂). For Bayesian modeling from a normative perspective, sensory integration is based on the available cues to fulfill the task. For the second task of reporting the hand location, the two cues are xp and x̂ (with a possible effect of the visual target, which is unbiased since it is defined as 0 in model simulation; thus, its presence does not induce any shift effect). xp is used sequentially in this sense. Thus, its dual use is well justified.
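
  The reliability-weighted combination of two cues invoked here can be illustrated with a minimal sketch. It assumes two independent Gaussian cues and a sign convention in which the cursor-biased estimate x̂ is negative and the actual hand angle xp is positive. The function name `combine_cues` and all numerical values are hypothetical, chosen for illustration only and not taken from the manuscript.

```python
import math


def combine_cues(means, sigmas):
    """Inverse-variance (reliability-weighted) average of independent Gaussian cues.

    Returns the combined mean and the standard deviation of the combined estimate.
    """
    precisions = [1.0 / s ** 2 for s in sigmas]
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical values in degrees; the uncertainties are illustrative assumptions.
SIGMA_HAT = 4.0   # uncertainty of x_hat carried over from the reaching movement
SIGMA_P = 12.0    # proprioceptive uncertainty about the current hand location

# Early adaptation: the hand has barely moved, x_hat is pulled toward the cursor.
early_report, _ = combine_cues([-3.0, 2.0], [SIGMA_HAT, SIGMA_P])

# Late adaptation: x_hat is still slightly negative, but x_p is now large.
late_report, _ = combine_cues([-1.0, 15.0], [SIGMA_HAT, SIGMA_P])
```

  With these illustrative numbers the combined report is negative early and slightly positive late, which is the qualitative pattern that the sequential use of xp is meant to produce.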
  
  Our hypothesis is that the reported hand position results from a combination of x̂ from the previous movement and the current hand position xp. However, specifically for the overshoot of the judged hand location in the late part of the adaptation (Fig4A), the reviewer raised three alternative explanations by assuming that the PReMo model is correct. Under the PReMo model, the estimated hand location is only determined by x̂, and xp is not used in the hand location report phase. In addition, x̂ (with xp used once) and a visual recalibration of the target can explain away the gradual shift from negative to positive (overshoot).
  
  We don’t think any of them can parsimoniously explain our findings here, and we go through these three hypotheses one by one:
  
  (1) increasing xp's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase)
  
  (2) decreasing σp (assuming that participants pay more attention to the hand during the report phase)
  
  The first two alternative explanations basically assume that xp has a larger contribution (weighting in Bayesian terms) in the hand location report phase than in the adaptation movement phase, whether due to an increase in visual uncertainty (alternative explanation 1) or a reduction in proprioceptive uncertainty (alternative explanation 2). Thus, we assume that the reviewer suggests that a larger weight for xp can explain why the perceived hand location changes gradually from negative to positive. However, per the PReMo model, a larger weight for xp will only affect x̂, which is already assumed to change from negative to zero. More weight in x̂ in the hand report phase (compared to the adaptation movement phase) would not explain the shift of the reported hand location from negative to positive. This is because no matter how much weight xp has, the PReMo model assumes a saturation for the influence of xp on x̂. Thus x̂ would not exceed zero in the late adaptation. Then, the PReMo model would rely on the so-called visual shift of the target to explain the overshoot. This leads us to the third alternative the reviewer raised:
  
  (3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration.
  
  The PReMo model originally assumed that the perceived target location was biased in order to explain away the positive overshoot of the reported hand location. We assume that the reviewer suggests that the perceived target position, which is shifted to the positive direction, also “biases” the perceived hand position. We also assume that the reviewer suggests that the perceived hand location after a clamp trial (x̂) is zero, and somehow the shifted perceived target position “biases” the reported hand location after a clamp trial. Unfortunately, we did not see any mathematical formulation of this biasing effect in the original paper (Tsay, Kim, Haith, et al., 2022). We are not able to come up with any formulation of this hypothesized biasing effect based on Bayesian cue integration principles. Target and hand are two separate perceived items; how one relates to another needs justification from a normative perspective when discussing Bayesian models. Note this is not a problem for our PEA models, in which both cues used are about hand localization: one is x̂ and the other is xp.
  
  We believe that mathematically formulating the biasing effect (Figure 4A) is non-trivial since the reported hand location changes continuously from negative to positive. Thus, quantitative model predictions, like the ones our PEA model presents here, are needed.
  
  To rigorously test the possible effect of visual recalibration of the target, there are two things to do: 1) use the psychometric method to measure the biased perception of the target, and 2) redo the Tsay et al. 2020 experiment without the target. For 2), compared to the case with the target, the PEA model would predict a larger overshoot, while the PReMo model would predict a smaller overshoot or even zero overshoot. This can be left for future studies.
  
  (2) Effect of Visual Uncertainty on Error Size:
  
  I appreciate the authors' response about methodological differences between the cursor cloud used in previous studies and the Gaussian blob used in the current study. However, it is still not clear to me how the authors reconcile previous studies showing that visual uncertainty reduced implicit adaptation for small but not large errors (Tsay et al, 2021; Makino, et al 2023) with the current findings, where visual uncertainty reduced implicit adaptation for large but not small errors.
  
  Could the authors connect the dots here: I could see that the cursor cloud increases potential overlap with the visual target when the visual error is small, resulting in intrinsic reward-like mechanisms (Kim et al, 2019), which could potentially explain attenuated implicit adaptation for small visual errors. However, why would implicit adaptation in response to large visual errors remain unaffected by the cursor cloud? Note that we did verify that sigma_v is increased in (Tsay et al. 2021), so it is unlikely due to the cloud simply failing as a manipulation of visual uncertainty.
  
  In addition, we also reasoned that testing individuals with low vision could offer a different test of visual uncertainty (Tsay et al, 2023). The advantage here is that both controls and patients with low vision are provided with the same visual input: a single cursor. Our findings suggest that uncertainty due to low vision also shows reduced implicit adaptation in response to small but not large errors, contrary to the findings in the current paper. Missing in the manuscript is a discussion of why the authors' current findings contradict those of previous results.
  
  To connect the dots for the two previous studies (Tsay et al., 2021, 2023; note that Makino et al., 2023 is not in this discussion since it investigated the weights of multiple cursors, as opposed to the visual uncertainty associated with a cursor cloud):
  
  First, we want to re-emphasize that using the cursor cloud to manipulate visual uncertainty brings some confounds, making it not ideal for studying visuomotor adaptation. For example, in the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning a range of angles, which affects both the sensory uncertainty (the intended outcome) and the sensory estimate of angles (the error estimate, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (σv in our model), but it additionally affects the mean of the distribution (µ). This unnecessary confound is neatly avoided by using cursor blurring, which is still a cursor with its center (µ) unchanged from a single cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2020, the cursor cloud often overlaps with the visual target; this "target hit" would affect adaptation, possibly via a reward learning mechanism (Kim et al., 2019). This is a second confound that accompanies the cursor cloud. Yes, the cursor cloud was verified as associated with high visual uncertainty (Tsay et al., 2021); however, this verification was done with a psychophysical method against a clean background, not in the relevant context of a hand reaching toward a target. Thus, despite the cursor cloud having a sizeable visual uncertainty, our criticisms of it still hold when it is used in error-clamp adaptation.
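
  One way to picture the first confound is the following minimal sketch. It assumes, purely for illustration, that each dot of a cursor cloud is drawn independently from a Gaussian centered on the clamp angle, so the cloud's centroid (a stand-in for the perceived mean µ) jitters from trial to trial, whereas a blurred cursor keeps its center fixed. The number of dots, the spread, the clamp angle, and the function name `cloud_centroids` are hypothetical and are not taken from any of the cited studies.

```python
import random
import statistics


def cloud_centroids(clamp_deg, n_dots, spread_sd_deg, n_trials, seed=0):
    """Return the centroid angle (deg) of a simulated dot cloud on each trial."""
    rng = random.Random(seed)  # seeded for reproducibility
    centroids = []
    for _ in range(n_trials):
        dots = [rng.gauss(clamp_deg, spread_sd_deg) for _ in range(n_dots)]
        centroids.append(statistics.fmean(dots))
    return centroids


# Hypothetical settings: 30 deg clamp, 25 dots per cloud, 10 deg spread.
centroids = cloud_centroids(clamp_deg=30.0, n_dots=25, spread_sd_deg=10.0, n_trials=2000)

# The centroid varies across trials with SD close to spread / sqrt(n_dots) = 2 deg.
centroid_sd = statistics.stdev(centroids)

# A blurred cursor, by contrast, has the same center on every trial.
blurred_centers = [30.0] * 2000
blurred_sd = statistics.pstdev(blurred_centers)
```

  Under these assumptions the cloud changes both the spread and, on any given trial, the apparent center of the feedback, whereas blurring changes only the spread.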
  
  Second, bearing these confounds of the cursor cloud in mind, we postulate one important factor that has not been considered in any models thus far that might underlie the lack of difference between the single-cursor clamp and the cloud-cursor clamp when the clamp size is large: the cursor cloud might be harder to ignore than a single cursor. For Bayesian sensory integration, the naive model is to consider the relative reliability of cues only. Yes, the cloud is more uncertain in terms of indicating the movement direction than a single cursor. However, given its large spread, it is probably harder to ignore during error-clamp movements. Note that ignoring the clamped cursor is the task instruction, but the large scatter of the cursor cloud is more salient and thus plausibly harder to ignore. This might increase the weighting of the visual cue despite its higher visual uncertainty. This extra confound is arguably minimized by using the blurred cursor as in our Exp4 since the blurred cursor did not increase the visual angle much (Figure 5D; blurred vs single cursor: 3.4mm vs 2.5mm in radius, 3.90° vs 2.87° in spread). In contrast, the visual angle of the dot cloud is at least an order of magnitude larger (cursor cloud vs. single cursor: at least 25° vs. 2.15° in the spread, given a 10° standard deviation of random sampling).
  
  Third, for the low-vision study (Tsay et al., 2023), the patients indeed show reduced implicit adaptation for a 3° clamp (consistent with our PEA model) but intact adaptation for a 30° clamp (not consistent). Though this pattern appears similar to what happens for normal people whose visual uncertainty is upregulated by a cursor cloud (Tsay et al., 2021), we are not completely convinced that the same underlying mechanism governs these two datasets. Low-vision patients indeed have higher visual uncertainty about color, brightness, and object location, but their visual uncertainty about visual motion is still unknown. Due to the difference in impairment among low-vision people (e.g., peripheral or central vision affected) and the different roles of peripheral and central vision in movement planning and control (Sivak & Mackenzie, 1992), the overall effect of visual uncertainty in low-vision people is unclear. The direction of cursor movement that matters for visuomotor rotation here is likely related to visual motion perception. Unfortunately, the original study did not measure this uncertainty in low-vision patients. We believe our Exp1 offers a valid method for this purpose for future studies. More importantly, we should not expect low-vision patients to integrate visual cues in the same way as normal people, given their long-term adaptation to their vision difficulties. Thus, we are conservative about interpreting the seemingly similar findings across the two studies (Tsay et al., 2021, 2023) as revealing the same mechanism.
  
  A side note: these two previous studies proposed a so-called mis-localization hypothesis, i.e., the cursor cloud was mislocalized for small clamp sizes (given its overlap with the target) but not for large clamp sizes. They suggested that the lack of uncertainty effect at small clamp sizes is due to mislocalization, while the lack of uncertainty effect at large clamp sizes is because implicit adaptation is not sensitive to uncertainty at large angles. Thus, these two studies admit that the cursor cloud not only upregulates uncertainty but also generates an unwanted effect of so-called “mis-localization” (overlapping with the target). Interestingly, their hypothesis about less sensitivity to visual uncertainty for large clamps is not supported by a model or theory but is merely a re-wording of the experimental results.
  
  In sum, our current study cannot offer an easy answer to "connect the dots" in the aforementioned two studies due to methodological issues and the special nature of the population. However, for resolving conflicting findings, our study suggests solutions that include using a psychometric test to quantify visual uncertainty for cursor motion (Exp1), a better uncertainty-manipulation method to avoid a couple of confounds (Exp4, blurred cursor), and a falsifiable model. Future endeavors can resolve the differences between studies based on the new insights from the current study.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.
  
  Strengths:
  
  In the past decade, numerous perplexing results in visuomotor rotation tasks have questioned their underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation's influence by visual eccentricity, a well-established vision science concept, and sensory estimation through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.
  
  The authors discussed in the revised version that the proposed model can capture the general implicit motor learning process in addition to the visuomotor rotation task. In the discussion, they emphasize two main principles: the automatic tracking of effector position and the combination of movement cues using Bayesian integration. These principles are suggested as key to understanding and modeling various motor adaptations and skill learning. The proposed model could potentially become a basis for creating new computational models for skill acquisition, especially where current models fall short.
  
  Weaknesses:
  
  The proposed model is described as elegant. In this paper, the authors test the model within a limited example condition, demonstrating its relevance to the sensorimotor adaptation mechanisms of the human brain. However, the scope of the model's applicability remains unclear. It has shown the capacity to explain prior data, thereby surpassing previous models that rely on elementary mathematics. To solidify its credibility in the field, the authors must gather more supporting evidence.
  
  Indeed, our model here is based on one particular experimental paradigm, i.e., the error-clamp adaptation. We used it simply because 1) this paradigm is a rare example in which implicit motor learning can be isolated in a clean way, and 2) there are a few conflicting findings in the literature for us to explain away by using a unified model.
  
  For our model’s broad impact, we believe that as long as people need to locate their effectors during motor learning, the general principle laid out here will be applicable. In other words, repetitive movements with a Bayesian cue combination of movement-related cues can underlie the implicit process of various forms of motor learning. To showcase its broad impact, in upcoming studies, we will extend this model to other motor learning paradigms, starting from motor adaptation paradigms that involve both explicit and implicit processes.
  
  Reviewer #3 (Public Review):
  
  (2.1) Summary
  
  In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.
  
  In the introduction, the authors notice that implicit adaptation (as measured in error-clamp based paradigms) does not saturate at larger perturbations, but decreases again (e.g. Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease importance assigned to visual feedback which could explain lower asymptotes.
  
  The authors characterize visual uncertainty for 3 rotation sizes in a first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. Then the authors present a second experiment where adaptation to 7 clamped errors are tested in different groups of participants. The models' visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than both the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, or a causal inference model (Wei & Körding, 2007) which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately compare their data with an unsuitable other data set (Tsay et al. 2020, instead of Tsay et al. 2021). Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again better in line with their model (predicting overall lower adaptation), than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular the model fits for experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.
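
  The structure summarized above (a visual uncertainty that grows linearly with clamp size, plus proprioceptive uncertainty, prediction uncertainty, a learning rate, and a retention rate) can be sketched as follows. This is a simplified illustration and not the authors' implementation: the update rule, the sign convention (cursor clamped at a positive angle, target at 0), the optional `blur_deg` term, and every parameter value are assumptions made here for demonstration.

```python
def visual_sigma(clamp_deg, slope=0.3, intercept=1.8, blur_deg=0.0):
    """Assumed linear growth of visual uncertainty (deg) with clamp size."""
    return slope * abs(clamp_deg) + intercept + blur_deg


def simulate_adaptation(clamp_deg, n_trials=500, sigma_p=11.0, sigma_u=5.0,
                        retention=0.97, learning_rate=0.2, blur_deg=0.0):
    """Return the asymptotic adaptation magnitude (deg) for one clamp size.

    The perceived hand location is a precision-weighted average of the visual
    cue (cursor at clamp_deg) and the proprioceptive and predictive cues (both
    taken to sit at the actual hand angle). The hand angle is then updated with
    a state-space rule driven by the perceived hand location.
    """
    sigma_v = visual_sigma(clamp_deg, blur_deg=blur_deg)
    prec_v, prec_p, prec_u = sigma_v ** -2, sigma_p ** -2, sigma_u ** -2
    w_v = prec_v / (prec_v + prec_p + prec_u)  # weight given to the cursor

    hand = 0.0
    for _ in range(n_trials):
        perceived = w_v * clamp_deg + (1.0 - w_v) * hand
        hand = retention * hand - learning_rate * perceived
    return -hand  # adaptation is directed away from the cursor


# Hypothetical clamp sizes (deg), used only to show the shape of the curve.
clamp_sizes = [2, 4, 8, 16, 32, 64, 95]
asymptotes = [simulate_adaptation(c) for c in clamp_sizes]
blurred = [simulate_adaptation(c, blur_deg=3.0) for c in clamp_sizes]
```

  With these assumed values the asymptote rises for small clamps and falls again for large ones, and adding uncertainty through `blur_deg` lowers it at every clamp size, which mirrors the qualitative predictions described in this summary.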
  
  (2.2) Strengths
  
  In this study the authors propose a Perceptual Error Adaptation model ("PEA") and the work combines various ideas from the field of cue combination, Bayesian methods and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in a first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are: 1) a learning and 2) a retention rate, as used in popular state space models and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from, and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.
  
  (2.3) Weaknesses
  
  Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.
  
  For a proper citation of the “concave” adaptation function: we assume the reviewer is referring to the study by Morehead et al., 2017, which tested large clamp sizes up to 135° and 175°. Unsurprisingly, the 135° and 175° conditions lead to nearly zero adaptation, possibly due to the trivial fact that people cannot even see the moving cursor. We have quoted this seminal study from the very beginning. All other error-clamp studies with a block design emphasized an invariant or saturated implicit adaptation with large rotations (e.g., Kim, et al., 2019).
  
  The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between movement endpoint and disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.
  
  For the issues related to visual uncertainty measurement in Exp1:
  
  First, our visual uncertainty is about cursor motion direction in the display plane, and the measurement in Exp1 has never been done before. Thus, we do not think our data are comparable to any findings in vision science about foveal/peripheral comparison. We cited the work of Klein and others (Klein & Levi, 1987; Levi et al., 1987) in vision science since their studies showed that deviation from fixation is associated with an increase in visual uncertainty. Their studies thus inspired us to conduct Exp1 to probe how the visual uncertainty of interest (specifically for visual motion direction) changes with increasing deviation from fixation. Any model and its parameters should be specifically tailored to the task or context it tries to emulate. In our case, motion direction in a center-out reaching setting is the modeled context, and all the relevant model parameters should be specified in movement angles. This is particularly important since we need to estimate parameters from one experiment to predict behaviors in another experiment.
  
  Second, the 1 s delay of the reference cursor has minimal impact on the estimate of visual uncertainty, based on previous vision studies. Our Exp1 used a visual paradigm similar to that of White et al. (1992), who showed that delay does not lead to an increase in visual uncertainty over a broad range of values (from 0.2 s to >1 s; see their Figures 5-6).
  
  These two problems have been addressed in the revised manuscript, with proper citations listed.
  
  The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions - as a measure of proprioceptive sense of hand position - are given and plotted relative to an ever present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20° which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.
  
  Reviewer 3 thinks the Tsay 2020 dataset is not appropriate for our theorization, but we respectfully disagree. We elaborate on the three points raised here:
  
  (1) As we addressed in the previous response, the reported hand location in Figure 4A (Tsay et al., 2020) is not from a test of proprioceptive recalibration as conventionally defined. In the revision, we explicitly state that this dataset is not about proprioceptive recalibration and also delete texts that might mislead people to think so (see Results section). Instead, proprioceptive recalibration is measured by passive movement, as in our Exp3 (Figure 4E). For error-clamp adaptation here, "the remembered position of the target" is the target. Clearly, the participants did not report the target position, which is ever-present. Instead, their reported hand location shows an interestingly continuous change with ongoing adaptation.
  
  (2) Since the Tsay 2020 dataset is not a so-called proprioceptive recalibration, we need not take the difference between the reported location and the actual hand location. Indeed, the difference would be ~20 degrees, but comparing it to the previously reported proprioceptive recalibration is like comparing apples to oranges. In fact, throughout the paper, we refer to the results in Fig 4A as “reported hand location”, not proprioceptive recalibration. The target direction is defined as zero degrees; thus, its presence will not bias the reported hand location in the Bayesian cue combination (as this visual cue has a mean value of 0). Using the target as the reference also simplifies our modeling.
  
  (3) Exp3 is crucial for our study since it shows our model and its simple Bayesian cue combination principle are applicable not only to implicit adaptation but also to proprioceptive measures during adaptation. Furthermore, it reproduced the so-called proprioceptive recalibration and explained it away with the same Bayesian cue combination as the adaptation. We noticed that this field has accumulated an array of findings on proprioceptive changes induced by visuomotor adaptation. However, currently, there is a lack of a computational model to quantitatively explain them. Our study at least made an initial endeavor to model these changes.
  
  Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real life situations.
  
  The largest caveat raised by the reviewer appears to be directed at the error-clamp paradigm in general, not only at our particular study. In essence, this paradigm indeed requires participants to ignore the clamped error; thus, its induced adaptive response can be attributed to implicit adaptation. The original paper that proposed this paradigm (Morehead et al., 2017) has been cited 220 times (according to Google Scholar at the time of writing, 06/2024), indicating that the field has viewed this paradigm favorably.
  
  Furthermore, we agree that this kind of instruction and feedback (invariant clamp) differs from daily life experience, but this does not prevent us from gaining theoretical insights by studying human behavior under this kind of "artificial" task setting. Consider saccadic adaptation (Deubel, 1987; Kojima et al., 2004): the target jumps while the eye moves towards it, and this somewhat artificial manipulation again makes people adapt implicitly, even though the adaptation itself is a "disastrous" strategy for real-life situations. However, scientists have gained an enormous understanding of motor adaptation from this adaptation, which seems counterproductive in real life. Also, consider perceptual learning of task-irrelevant stimuli (Seitz & Watanabe, 2005, 2009): when participants are required to learn to discriminate one type of visual stimulus, the background shows another type of stimulus, which people gradually learn even though they do not even notice its presence. This "implicit" learning can be detrimental in real life, too, but the paradigm itself has advanced our understanding of the inner workings of the cognitive system.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations For The Authors):
  
  L101: There is a typo: (Tsay et al., 2020), 2020) should be corrected to (Tsay et al., 2020).
  
  Thanks for pointing this out; we have corrected the typo.
  
  L224-228: It would be beneficial to evaluate the validity of the estimated sigma_u and sigma_p based on previous reports.
  
  We can roughly estimate σu by evaluating the variability of reaching angles during the baseline phase when no perturbation is applied. The standard deviation of the reaching angle in Exp 2 is 5.128° ± 0.190°, which is close to the σu estimated by the model (5.048°). We also used a separate perceptual experiment to test the proprioceptive uncertainty (n = 13, see Figure S6); σp from this experiment is 9.737° ± 5.598°, also close to the σp extracted by the model (11.119°). We added these new analysis results to the final version of the paper.
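  
  The baseline-variability check described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that the standard deviation is computed per participant and then averaged across participants, with the standard error of the mean as the spread. The function name `baseline_sigma` and the reaching angles are hypothetical, made-up values.

```python
import math
import statistics


def baseline_sigma(angles_by_participant):
    """Return (group mean, SEM) of per-participant SDs of baseline reach angles (deg)."""
    sds = [statistics.stdev(angles) for angles in angles_by_participant.values()]
    mean_sd = statistics.fmean(sds)
    sem = statistics.stdev(sds) / math.sqrt(len(sds))
    return mean_sd, sem


# Made-up baseline reaching angles (deg) for three hypothetical participants.
baseline = {
    "p1": [-6.0, -2.0, 0.0, 2.0, 6.0],
    "p2": [-3.0, -1.0, 0.0, 1.0, 3.0],
    "p3": [-9.0, -3.0, 0.0, 3.0, 9.0],
}
sigma_u_hat, sigma_u_sem = baseline_sigma(baseline)
```

  The resulting `sigma_u_hat` would then be compared with the σu obtained from model fitting, as done in the response.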
  
  L289-298: I found it difficult to understand the update equations of the proprioceptive calibration based on the PEA model. Providing references to the equations or better explanations would be helpful.
  
  We expanded the process of proprioceptive calibration in Supplementary Text 1 with step-by-step equations and more explanations.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Suggestions (or clarification of previous suggestions) for revisions
  
  The authors persist on using the Tsay et al 2020 paper despite its many drawbacks which the authors attempt to address in their reply. But the main drawback is that the results in the 2020 paper is NOT relative to the unseen hand but to the visual target the participants were supposed to move their hand to. If the results were converted so to be relative to the unseen hand, the localization biases would be over 20 deg in magnitude.
  
  The PEA simulations are plotted relative to the unseen hand which makes sense. If the authors want to persist using the Tsay 2020 dataset despite any issues, they at least need to make sure that the simulations are mimicking the same change. That is, the data from Tsay 2020 needs to be converted to the same variable used in the current paper.
  
  If the main objection for using the Tsay 2021 is that the design would lead to forgetting, we found that active localization (or any intervening active movements like no-cursor reach) does lead to some interference or forgetting (a small reduction in overall magnitude of adaptation) this is not the case for passive localization, see Ruttle et al, 2021 (data on osf). This was also just a suggestion, there may of course also be other, more suitable data sets.
  
  As stated above, changing the reference system is not necessary, nor does it affect our results. The Tsay et al. 2020 dataset is unique since it shows the gradual change of reported hand location along with error-clamp adaptation. The forgetting (or reduction in proprioceptive bias), even if it exists, would not affect the fitting quality of our model for the Tsay 2020 dataset: if we assume that forgetting is invariant over the adaptation process, the forgetting would only reduce the proprioceptive bias uniformly across trials. This can be accounted for by a smaller weight on . The critical fact is that the model can explain the gradual drift of the proprioceptive judgment of the hand location.
  
  By the way, Ruttle et al.'s 2021 dataset is not for error-clamp adaptation, and thus we will leave it to test our model extension in the future (after incorporating an explicit process in the model).
  
  References
  
  Deubel, H. (1987). Adaptivity of gain and direction in oblique saccades. Eye Movements from Physiology to Cognition. https://www.sciencedirect.com/science/article/pii/B9780444701138500308
  
  Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882
  
  Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.
  
  Kojima, Y., Iwamoto, Y., & Yoshida, K. (2004). Memory of learning facilitates saccadic adaptation in the monkey. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 24(34), 7531–7539.
  
  Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.
  
  Morehead, J. R., Taylor, J. A., Parvin, D. E., & Ivry, R. B. (2017). Characteristics of implicit sensorimotor adaptation revealed by task-irrelevant clamped feedback. Journal of Cognitive Neuroscience, 29(6), 1061–1074.
  
  Seitz, & Watanabe. (2005). A unified model for perceptual learning. Trends in Cognitive Sciences, 9(7), 329–334.
  
  Seitz, & Watanabe. (2009). The phenomenon of task-irrelevant perceptual learning. Vision Research, 49(21), 2604–2610.
  
  Sivak, B., & Mackenzie, C. L. (1992). Chapter 10 The Contributions of Peripheral Vision and Central Vision to Prehension. In L. Proteau & D. Elliott (Eds.), Advances in Psychology (Vol. 85, pp. 233–259). North-Holland.
  
  Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.
  
  Tsay, J. S., Kim, H. E., Saxena, A., Parvin, D. E., Verstynen, T., & Ivry, R. B. (2022). Dissociable use-dependent processes for volitional goal-directed reaching. Proceedings. Biological Sciences / The Royal Society, 289(1973), 20220415.
  
  Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.
  
  Tsay, J. S., Parvin, D. E., & Ivry, R. B. (2020). Continuous reports of sensed hand position during sensorimotor adaptation. Journal of Neurophysiology, 124(4), 1122–1130.
  
  Tsay, J. S., Tan, S., Chu, M. A., Ivry, R. B., & Cooper, E. A. (2023). Low Vision Impairs Implicit Sensorimotor Adaptation in Response to Small Errors, But Not Large Errors. Journal of Cognitive Neuroscience, 35(4), 736–748.
  
  White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This study presents a valuable finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants. The evidence supporting the claims of the authors is solid, although a better discussion of the link between the model variables and the outcomes of related behavioral experiments would strengthen the conclusions. The work will be of interest to researchers in sensory cue integration and motor learning.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  This valuable study demonstrates a novel mechanism by which implicit motor adaptation saturates for large visual errors in a principled normative Bayesian manner. Additionally, the study revealed two notable empirical findings: visual uncertainty increases for larger visual errors in the periphery, and proprioceptive shifts/implicit motor adaptation are non-monotonic, rather than ramp-like. This study is highly relevant for researchers in sensory cue integration and motor learning. However, I find some areas where statistical quantification is incomplete, and the contextualization of previous studies to be puzzling.
  
  Thank you for your feedback and the positive highlights of our study. We appreciate your insights and will address the concerns in our revisions.
  
  Issue #1: Contextualization of past studies.
  
  While I agree that previous studies have focused on how sensory errors drive motor adaptation (e.g., Burge et al., 2008; Wei and Kording, 2009), I don't think the PReMo model was contextualized properly. Indeed, while PReMo should have adopted clearer language - given that proprioception (sensory) and kinaesthesia (perception) have been used interchangeably, something we now make clear in our new study (Tsay, Chandy, et al. 2023) - PReMo's central contribution is that a perceptual error drives implicit adaptation (see Abstract): the mismatch between the felt (perceived) and desired hand position. The current paper overlooks this contribution. I encourage the authors to contextualize PReMo's contribution more clearly throughout. Not mentioned in the current study, for example, PReMo accounts for the continuous changes in perceived hand position in Figure 4 (Figure 7 in the PReMo study).
  
  There is no doubt that the current study provides important additional constraints on what determines perceived hand position: Firstly, it offers a normative Bayesian perspective in determining perceived hand position. PReMo suggests that perceived hand position is determined by integrating motor predictions with proprioception, then adding a proprioceptive shift; PEA formulates this as the optimal integration of these three inputs. Secondly, PReMo assumed visual uncertainty to remain constant for different visual errors; PEA suggests that visual uncertainty ought to increase (but see Issue #2).
  
  Thank you for the comments and suggestions. We have now incorporated the citation for (Tsay et al., 2024), to acknowledge their clarification on the terms of perceptual error. We also agree that our model differs in two fundamental ways. One is to ditch the concept of proprioceptive shift and its contribution to the perceived hand location; instead, we resort to a “one-shot” integration of three types of cues with Bayesian rules. This is a more elegant and probably more ecological way of processing hand location per Occam's Razor. The second essential change is to incorporate the dependency of visual uncertainty on perturbation size into the model, as opposed to resorting to a ramp function of proprioceptive changes relative to perturbation size. The ramp function is not well grounded in perception studies. Yes, we acknowledged that PReMo is the first to recognize the importance of perceptual error, but highlighted the model differences in our Discussion.
  
  We also think the PReMo model has the potential to explain Fig 4A. But the Tsay et al., 2022 paper assumes that “a generic shift in visual space” explains the gradual proprioceptive changes from negative to positive (see page 17 in Tsay et al., 2022). We do not think that invoking this visual mechanism is necessary to explain Fig 4A; instead, the proprioceptive change is a natural result of hand deviations during implicit adaptation. As the hand moves away from the target (in the positive direction) during adaptation, the estimated hand location goes along with it. We believe this is the correct way of explaining the Fig 4A results. As we played around with the PReMo model, we found it hard to use visual shift to explain this part of the data without additional assumptions (at least not with the ones published in Tsay et al., 2022). Furthermore, our PEA model also parsimoniously explains away the proprioceptive shift observed in a completely different setting, i.e., the proprioceptive changes measured by the passive method as a function of perturbation size in Exp 3.
  
  We expanded the discussion about the comparison between the two models, especially about their different views for explaining Fig4A.
  
  Issue #2: Failed replication of previous results on the effect of visual uncertainty.
  
  (2a) A key finding of this paper is that visual uncertainty linearly increases in the periphery; a constraint crucial for explaining the non-monotonicity in implicit adaptation. One notable methodological deviation from previous studies is the requirement to fixate on the target: Notably, in the current experiments, participants were asked to fixate on the target, a constraint not imposed in previous studies. In a free-viewing environment, visual uncertainty may not attenuate as fast, and hence, implicit adaptation does not attenuate as quickly as that revealed in the current design with larger visual errors. Seems like this current fixation design, while important, needs to be properly contextualized considering how it may not represent most implicit adaptation experiments.
  
  First, we don’t think there is any previous study that examined visual uncertainty as a function of perturbation size. Thus, we do not have a replication problem here. Secondly, our data indicate that even without asking people to fixate on the target, people still predominantly fixate on the target during error-clamp adaptation (when they are “free” viewing). For our Exp 1, the fixation on the straight line between the starting position and the target is 86%-95% (as now shown in Figure S1; also see below). We also collected eye-tracking data in Exp 4, which is a typical error-clamp experiment. More than 95% fall within +/- 50 pixels around the center of the screen, even slightly higher than in Exp 1. This is well understandable: the typical error-clamp adaptation requires people to ignore the cursor and move the hand towards the target. To minimize the interference of the concurrently moving cursor, people depend on fixating the target, the sole task-relevant visual marker in the workspace, to achieve the task goal.
  
  In sum, we did not force the participants to fixate on the target in order to manufacture the linear dependency of visual uncertainty; we required them to do so to mimic the eye-tracking pattern in typical error-clamp learning, which was revealed in our pilot experiment. The visual uncertainty effect is sound, and our study is the first to clearly demonstrate it.
  
  Author response image 1.
  
  On a side note (but an important one), the high percentage of fixation on the aiming target is also true for conventional visuomotor rotation, which involves strategic re-aiming (shown in Bromberg et al., 2019; de Brouwer et al., 2018; we also have an upcoming paper showing this). This is one reason that our new theory would also be applicable to other types of motor adaptation.
  
  (2b) Moreover, the current results - visual uncertainty attenuates implicit adaptation in response to large, but not small, visual errors - deviates from several past studies that have shown that visual uncertainty attenuates implicit adaptation to small, but not large, visual errors (Tsay, Avraham, et al. 2021; Makino, Hayashi, and Nozaki, n.d.; Shyr and Joshi 2023). What do the authors attribute this empirical difference to? Would this free-viewing environment also result in the opposite pattern in the effect of visual uncertainty on implicit adaptation for small and large visual errors?
  
  We don’t think all of the previous studies mentioned manipulated visual uncertainty in a parametric way, and none of them provided quantitative measures of visual uncertainty. As we detailed in our Exp4 and in our Discussion, we don’t think the manipulation of visual uncertainty in Tsay et al. (2021) is appropriate (see below for 2d). The Makino et al. (2023) study used multiple clamped cursors to perturb people, and its effect is not easy to account for since additional processes might be invoked given this kind of complex visual feedback. More importantly, we do not think this is a direct way of modulating visual uncertainty, nor did they provide any evidence for it.
  
  (2c) In the current study, the measure of visual uncertainty might be inflated by brief presentation times of comparison and referent visual stimuli (only 150 ms; our previous study allowed for a 500 ms viewing time to make sure participants see the comparison stimuli). Relatedly, there are some individuals whose visual uncertainty is greater than 20 degrees standard deviation. This seems very large, and less likely in a free-viewing environment.
  
  For our 2AFC, the reference stimulus is the actual clamped cursor, which lasts for 800 ms. The comparison stimulus is a 150-ms dot representation appearing near the reference. For measuring perception of visual motion, this duration is sufficient, as previous studies used similar durations (Egly & Homa, 1984; Owsley et al., 1995). We think the 20-degree standard deviation is reasonable given that people fixate on the target, with only peripheral vision to process the fast-moving cursor. The steep linear increase in visual uncertainty about visual motion is well documented. The last author of this paper has shown that the uncertainty of visual motion speed (though not about angles) follows the same steep trend (Wei et al., 2010). It is noteworthy that, without using our measured visual uncertainty from Exp1, if we fit the adaptation data in Exp2 to “estimate” the visual uncertainty, the two estimates are in fact well aligned with each other (see Figure S7 and Supplementary Text 2). This strongly supports the validity and accuracy of our estimation. We think this high visual uncertainty is an important message to the field. Thus, we now highlight its magnitude in our Discussion.
  
  (2d) One important confound between clear and uncertain (blurred) visual conditions is the number of cursors on the screen. The number of cursors may have an attenuating effect on implicit adaptation simply due to task-irrelevant attentional demands (Parvin et al. 2022), rather than that of visual uncertainty. Could the authors provide a figure showing these blurred stimuli (gaussian clouds) in the context of the experimental paradigm? Note that we addressed this confound in the past by comparing participants with and without low vision, where only one visual cursor is provided for both groups (Tsay, Tan, et al. 2023).
  
  Thank you for raising this important point about types of visual stimuli for manipulating uncertainty. We used Gaussian blur of a single cursor (similar to Burge et al., 2008) instead of a cloud of dots. We now added a figure inset to show how this blur looks.
  
  Using a cursor cloud (Makino et al., 2023; Tsay et al., 2021) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for visuomotor adaptation. For the error-clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning a range of angles, which affects both the sensory uncertainty (the intended outcome) and the sensory estimate of angles (the error estimate, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it additionally affects the mean of the distribution (mu). This unnecessary confound is avoided by using cursor blurring, which is still a cursor with its center (mu) unchanged from a single cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2021, the cursor cloud often overlaps with the visual target; this “target hit” would affect adaptation, possibly via a reward learning mechanism (see Kim et al., 2019). This is a second confound that accompanies the cursor cloud.
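  
  The first confound described above can be illustrated with a small sketch. It rests on assumptions that are not from the manuscript: the dots of a cloud are drawn from a Gaussian centered on the clamp angle, and the observer's angular estimate is taken to be the cloud centroid. The function name `cloud_centroid` and all numerical values are hypothetical.

```python
import random
import statistics


def cloud_centroid(clamp_deg, spread_deg, n_dots, rng):
    """Draw dot angles (deg) around the clamp angle; return (centroid, dots)."""
    dots = [rng.gauss(clamp_deg, spread_deg) for _ in range(n_dots)]
    return statistics.fmean(dots), dots


rng = random.Random(0)  # fixed seed so the illustration is repeatable
clamp = 15.0            # nominal clamp angle (deg), illustrative

# Cursor cloud: the centroid of a finite sample deviates from the nominal angle,
# so the manipulation changes mu as well as sigma.
centroid, dots = cloud_centroid(clamp, spread_deg=10.0, n_dots=10, rng=rng)
cloud_shift = centroid - clamp

# Blurred single cursor: the blur widens the distribution but its center stays put.
blurred_center = clamp
blur_shift = blurred_center - clamp
```

  Under these assumptions, `cloud_shift` is nonzero on any given trial while `blur_shift` is zero by construction, which is the sense in which blurring changes only the sigma of the visual cue.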
  
  Issue #3: More methodological details are needed.
  
  (3a) It's unclear why, in Figure 4, PEA predicts an overshoot in terms of perceived hand position from the target. In PReMo, we specified a visual shift in the perceived target position, shifted towards the adapted hand position, which may result in overshooting of the perceived hand position with this target position. This visual shift phenomenon has been discovered in previous studies (e.g., (Simani, McGuire, and Sabes 2007)).
  
  Visual shift, as it is called in Simani et al., 2007, is irrelevant for our task here. The data we are modeling are motor adaptation (hand position changes) and so-called proprioceptive changes (hand localization changes), both of which are measured and referenced in extrinsic coordinates, not referenced to a visual target. For instance, the proprioceptive changes are either relative to the actual hand location (Exp 3) or relative to the goal (Fig 4A). We also don’t think visual shift is necessary for explaining the perceptual judgment of an unseen hand (the target shown during the judgment indeed has an effect of reducing the biasing effect of PE; see below for responses to reviewer 3).
  
  In the PEA model, the reported hand angle is the result of integrating cues from the actual hand position and the estimated hand position (x_hand_hat) from previous movements. This integration process leads to the combined reported hand position potentially overshooting or undershooting, depending on the degree of adaptation. It is the changed proprioceptive cue (because the actively moved hand slowly adapted to the error clamp) leading to the overshoot of the perceived hand position.
  
  In Results, we now explain these value changes with parentheses. Model details about the mechanisms of cue combination and model predictions can be found in Supplementary Text 1. We believe these detailed explanations can make this apparent.
  
  (3b) The extent of implicit adaptation in Experiment 2, especially with smaller errors, is unclear. The implicit adaptation function seems to be still increasing, at least by visual inspection. Can the authors comment on this trend, and relatedly, show individual data points that help the reader appreciate the variability inherent to these data?
  
  Indeed, the adaptation for small errors appears not completely saturated with our designated number of trials. However, this will not affect our model analysis. Our model fitting for PEA and other competing models is done on the time series of adaptation, not on the saturated adaptation extent (see Fig 3A). Thus, even though some conditions might not produce the full range of adaptation, the data are sufficient to constrain the models. We now mention this concern in Results; we also emphasize that the model explains not only the adaptation magnitude (operationally defined as adaptation extent measured at the same time, i.e., the end of the adaptation phase) but also the full learning process.
  
  In response, we have included individual data points in the revised Figure 3B-D to provide a clear illustration of the extent of implicit adaptation, particularly for small perturbations.
  
  (3c) The same participants were asked to return for multiple days/experiments. Given that the authors acknowledge potential session effects, with attenuation upon re-exposure to the same rotation (Avraham et al. 2021), how does re-exposure affect the current results? Could the authors provide clarity, perhaps a table, to show shared participants between experiments and provide evidence showing how session order may not be impacting results?
  
  Thank you for raising the issue of session and re-exposure effects. First, we don’t think Exp1 has an effect on Exp4. Exp1 is a perceptual task and Exp4 is a motor adaptation task. Furthermore, Exp1 used random visual stimuli on both sides, so it did not lead to any adaptation effect on its own. Second, Exp4 indeed had three sessions performed on three days, but the session effect does not change our main conclusion about the visual uncertainty. A 3-way repeated-measures ANOVA (3 day x 3 perturbation x 2 visual uncertainty) revealed a significant main effect of day (F(2,36) = 17.693, p<0.001), indicating changes in performance across sessions (see Figure below). Importantly, the effects of perturbation and visual uncertainty (including their interactions) remain the same. The day factor did not interact with them. The main effect of day shows that the overall adaptation effect is reduced across days. Post-hoc pairwise comparisons showed that single-trial learning (STL) performance on Day 1 was significantly higher than on Day 2 (p = 0.004) and Day 3 (p < 0.001), with no significant difference between Day 2 and Day 3 (p = 0.106). Other ANOVA details: significant main effects for perturbation (F(1,36) = 8.872, p<0.001) and visual uncertainty (F(1,18) = 49.164, p<0.001), as well as a significant interaction between perturbation size and visual uncertainty (F(2,36) = 5.160, p = 0.013). There were no significant interactions involving the day factor with any other factors (all p > 0.182). Thus, the overall adaptation decreases over the days, but the day does not affect the interaction of interest between visual uncertainty and perturbation. The fact that this interaction was preserved over different sessions strengthens our conclusion about how visual uncertainty systematically affects implicit adaptation.
  
  Author response image 2.
  
  (3d) The number of trials per experiment should be detailed more clearly in the Methods section (e.g., Exp 4). Moreover, could the authors please provide relevant code on how they implemented their computational models? This would aid in future implementation of these models in future work. I, for one, am enthusiastic to build on PEA.
  
  We have clarified the number of trials conducted in each experiment, with detailed information now readily available in the Methods section of the main text. In addition, we have made the code for data analysis and modeling publicly accessible. These resources can be found in the updated "Data Availability" section of our paper.
  
  (3f) In addition to predicting a correlation between proprioceptive shift and implicit adaptation on a group level, both PReMo and PEA (but not causal inference) predict a correlation between individual differences in proprioceptive shift and proprioceptive uncertainty with the extent of implicit adaptation (Tsay, Kim, et al. 2021). Interestingly, shift and uncertainty are independent (see Figures 4F and 6C in Tsay et al, 2021). Does PEA also predict independence between shift and uncertainty? It seems like PEA does predict a correlation.
  
  Thank you for raising this insightful question. Our PEA model indeed predicts a positive (although not linear) correlation between the proprioceptive uncertainty and the amplitude of the estimated hand position (x_hand_hat). This prediction is borne out by simulations that use the same parameters that generated the results depicted in Figure 4B of our manuscript (there is a sign flip as x_hand_hat is negative).
  
  Author response image 3.
  
  Regarding the absence of a correlation observed in Tsay et al., 2021, we offer several potential explanations for this discrepancy. First, the variability observed in passive hand localization during motor adaptation (as in Tsay et al., 2021) does not directly equal proprioceptive uncertainty, which typically requires psychophysical testing to assess accurately. Second, our study showed that the proprioceptive bias attenuates over repeated measurements; in our Exp3, it decreased within a block of three trials. We noticed that the Tsay et al., 2021 study used 36 measurements in a row without interleaving adaptation trials. Thus, the “averaged” proprioceptive bias in that study might not reflect the actual bias during adaptation. We also noticed that the study showed large individual differences in both proprioceptive bias and proprioceptive variability (not uncertainty); detecting a correlation, if one really exists, would therefore require a large number of participants, probably more than their sample of about 30. These putative explanations are not included in the revision, which already has a long discussion and no space for discussing a null result.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.
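
  The Bayesian integration summarized above can be illustrated with a minimal Python sketch of reliability-weighted (inverse-variance) cue combination. This is an illustration under stated assumptions, not the authors' implementation: the function names, cue locations, and standard deviations are hypothetical, chosen only to show that a noisier visual cue pulls the perceived hand position less.

```python
def cue_weights(sigma_u, sigma_p, sigma_v):
    """Return inverse-variance weights for the predictive (u),
    proprioceptive (p), and visual (v) cues. The weights sum to 1."""
    precisions = [1.0 / sigma ** 2 for sigma in (sigma_u, sigma_p, sigma_v)]
    total = sum(precisions)
    return [p / total for p in precisions]


def perceived_hand(x_u, x_p, x_v, sigma_u, sigma_p, sigma_v):
    """Combine the three cue locations (degrees of movement angle)
    into a single perceived hand position."""
    w_u, w_p, w_v = cue_weights(sigma_u, sigma_p, sigma_v)
    return w_u * x_u + w_p * x_p + w_v * x_v


# Hypothetical values: the prediction and the hand sit at the target (0 deg),
# and the clamped cursor sits at 30 deg.
sharp_cursor = perceived_hand(0.0, 0.0, 30.0, sigma_u=5.0, sigma_p=10.0, sigma_v=10.0)
blurred_cursor = perceived_hand(0.0, 0.0, 30.0, sigma_u=5.0, sigma_p=10.0, sigma_v=20.0)
print(sharp_cursor, blurred_cursor)
```

  In this sketch, doubling the assumed visual standard deviation shrinks the shift of the perceived hand toward the cursor, which is the qualitative effect the summary attributes to visual uncertainty.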
  
  Strengths:
  
  In the past decade, numerous perplexing results in visuomotor rotation tasks have questioned their underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation is influenced by visual eccentricity, a well-established vision science concept, and sensory estimation occurs through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.
  
  Weaknesses:
  
  The paper provides a comprehensive account of visuomotor rotation paradigms, addressing incongruent behavioral results with a solid Bayesian integration model. However, its focus is narrowly confined to visuomotor rotation, leaving its applicability to broader motor learning paradigms, such as force field adaptation, saccadic adaptation, and de novo learning paradigms, uncertain. The paper's impact on the broader fields of neuroscience and cognitive science may be limited due to this specificity. While the paper excellently demonstrates that specific behavioral results in visuomotor rotation can be explained by Bayesian integration, a general computational principle, its contributions to other motor learning paradigms remain to be explored. The paper would benefit from a discussion on the model's generality and its limitations, particularly in relation to the undercompensating effects in other motor learning paradigms.
  
  Thank you for your thoughtful review and recognition of the contributions our work makes towards understanding implicit motor adaptation through the Perceptual Error Adaptation (PEA) model. We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.
  
  Our model is not limited to the error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is one rare example that implicit motor learning can be isolated in a nearly idealistic way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.
  
  We will propose that the same two principles should be applied to various kinds of motor adaptation and motor skill learning, which constitutes motor learning in general. Most of our knowledge about motor adaptation is from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also have implicit learning. We believe they can be modeled by variants of our model, or at least should consider using the two principles we laid out above to think of their computational nature. For skill learning, especially for de novo learning, the area still lacks a fundamental computational model that accounts for skill acquisition process on the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.
  
  We added more discussion on the possible broad implications of our model in the revision.
  
  Reviewer #3 (Public Review):
  
  Summary
  
  In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim, et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.
  
  In the introduction, the authors note that implicit adaptation (as measured in error-clamp-based paradigms) does not saturate at larger perturbations, but decreases again (e.g. Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease the importance assigned to visual feedback, which could explain lower asymptotes.
  
  The authors characterize visual uncertainty for 3 rotation sizes in the first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. Then the authors present a second experiment where adaptation to 7 clamped errors is tested in different groups of participants. The models' visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate, and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than both the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, or a causal inference model (Wei & Körding, 2007) which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately, compare their data with an unsuitable other data set. Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again better in line with their model (predicting overall lower adaptation) than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular, the model fits experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.
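
  To make the structure described above concrete, the following Python sketch combines a state-space update (retention and learning rate) with a perceived-hand error whose visual weight falls as the clamp grows. It is a sketch under stated assumptions rather than the authors' code: the update rule and its sign convention are one reading of the description above, and every parameter value and clamp size is hypothetical.

```python
def visual_sigma(theta, slope, intercept):
    """Visual uncertainty (SD), assumed to grow linearly with clamp size."""
    return slope * abs(theta) + intercept


def simulate_clamp(theta, n_trials, sigma_u, sigma_p, slope, intercept,
                   retention, learning_rate):
    """Simulate hand angle over trials for an error clamp of size theta (deg).

    The target is at 0 deg, so the predictive cue is at 0, the visual cue
    is at theta, and the proprioceptive cue is at the current hand angle.
    """
    sigma_v = visual_sigma(theta, slope, intercept)
    precisions = [1.0 / s ** 2 for s in (sigma_u, sigma_p, sigma_v)]
    total = sum(precisions)
    _, w_p, w_v = [p / total for p in precisions]

    hand = 0.0
    trajectory = []
    for _ in range(n_trials):
        # Perceived hand position; the predictive cue at 0 contributes nothing.
        perceived = w_p * hand + w_v * theta
        # State-space update driven by the perceived (not the visual) error.
        hand = retention * hand - learning_rate * perceived
        trajectory.append(hand)
    return trajectory


# Hypothetical parameter values, for illustration only.
params = dict(sigma_u=5.0, sigma_p=10.0, slope=0.3, intercept=1.5,
              retention=0.97, learning_rate=0.2)
asymptotes = {theta: abs(simulate_clamp(theta, 500, **params)[-1])
              for theta in (2, 16, 95)}
print(asymptotes)
```

  With these illustrative values the simulated asymptote is larger for the intermediate clamp than for either the small or the very large clamp, which is the non-saturating, concave pattern discussed in this review.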
  
  Strengths
  
  In this study, the authors propose a Perceptual Error Adaptation model ("PEA") and the work combines various ideas from the field of cue combination, Bayesian methods, and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in the first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are 1) learning and 2) retention rate, as used in popular state space models, and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.
  
  Weaknesses
  
  Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.
  
  The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between the movement endpoint and the disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.
  
  The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions - as a measure of proprioceptive sense of hand position - are given and plotted relative to an ever-present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20° which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.
  
  Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real-life situations.
  
  Specific comments:
  
  A small part of the manuscript relies on replicating or modeling the proprioceptive recalibration in a study we think does NOT measure proprioceptive recalibration (Tsay, Parvin & Ivry, JNP, 2020). In this study, participants reached for a visual target with a clamped cursor, and at the end of the reach were asked to indicate where they thought their hand was. The responses fell very close to the visual target both before and after the perturbation was introduced. This means that the difference between the actual hand position, and the reported/felt hand position gets very large as soon as the perturbation is introduced. That is, proprioceptive recalibration would necessarily have roughly the same magnitude as the adaptation displayed by participants. That would be several times larger than those found in studies where proprioceptive recalibration is measured without a visual anchor. The data is plotted in a way that makes it seem like the proprioceptive recalibration is very small, as they plot the responses relative to the visual target, and not the discrepancy between the actual and reported hand position. It seems to us that this study mostly measures short-term visual memory (of the target location). What is astounding about this study is that the responses change over time to begin with, even if only by a tiny amount. Perhaps this indicates some malleability of the visual system, but it is hard to say for sure.
  
  Regardless, the results of that study do not form a solid basis for the current work and they should be removed. We would recommend making use of the dataset from the same authors, who improved their methods for measuring proprioception shifts just a year later (Tsay, Kim, Parvin, Stover, and Ivry, JNP, 2021). Although here the proprioceptive shifts during error-clamp adaptation (Exp 2) were tiny, and not quite significant (p<0.08), the reports are relative to the actual location of the passively placed unseen hand, measured in trials separate from those with reach adaptation and therefore there is no visual target to anchor their estimates to.
  
  Experiment 1 measures visual uncertainty with increased rotation size. The authors cite relevant work on this topic (Levi & Klein etc) which has found a linear increase in uncertainty of the position of more and more eccentrically displayed stimuli.
  
  First, this is a question where the reported stimuli and effects could greatly benefit from comparisons with the literature in vision science, and the results might even inform it. In order for that to happen, the units for the reported stimuli and effects should (also) be degrees of visual angle (dva).
  
  As far as we know, all previous work has investigated static stimuli, where with moving stimuli, position information from several parts of the visual field are likely integrated over time in a final estimate of position at the end of the trajectory (a Kalman filter type process perhaps). As far as we know, there are no studies in vision science on the uncertainty of the endpoint of moving stimuli. So we think that the experiment is necessary for this study, but there are some areas where it could be improved.
  
  Then, the linear fit is done in the space of the rotation size, but not in the space of eccentricity relative to fixation, and these do not necessarily map onto each other linearly. If we assume that the eye-tracker and the screen were at the closest distance the manufacturer reports it to work accurately at (45 cm), we would get the largest distances the endpoints are away from fixation in dva. Based on that assumed distance between the participant and monitor, we converted the rotation angles to distances between fixation and the cursor endpoint in degrees of visual angle: 0.88, 3.5, and 13.25 dva (ignoring screen curvature, or the absence of it). The ratio between the perturbation angle and retinal distance to the endpoint is roughly 0.221, 0.221, and 0.207 if the minimum distance is indeed used - which is probably fine in this case. But still, it would be better to do the fit in the relevant perceptual coordinate system.
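
  The conversion described above can be sketched in Python as follows. This is a hedged illustration, not the reviewers' actual calculation: the 45 cm viewing distance and the 4° rotation come from the comments, the 16° and 64° rotations are inferred from the reported ratios, and the 10 cm reach radius and the flat, head-on screen geometry are assumptions chosen because they approximately reproduce the reported values.

```python
import math


def rotation_to_dva(rotation_deg, reach_radius_cm, viewing_distance_cm):
    """Convert a cursor rotation about the start position into the visual
    angle (dva) between the fixated target and the cursor endpoint.

    Assumes a flat screen viewed head-on, with the target and the cursor
    endpoint both at reach_radius_cm from the start position.
    """
    # Straight-line (chord) distance between target and cursor endpoint.
    chord_cm = 2.0 * reach_radius_cm * math.sin(math.radians(rotation_deg) / 2.0)
    return math.degrees(math.atan(chord_cm / viewing_distance_cm))


# Assumed rotations (deg), reach radius (cm), and viewing distance (cm).
for rotation in (4, 16, 64):
    dva = rotation_to_dva(rotation, reach_radius_cm=10.0, viewing_distance_cm=45.0)
    print(rotation, round(dva, 2), round(dva / rotation, 3))
```

  Because the chord grows with the sine of half the rotation and the visual angle with the arctangent of the chord, the ratio of dva to rotation falls slightly at the largest rotation, which is why a fit in rotation space and a fit in eccentricity space need not agree exactly.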
  
  The first distance (4 deg rotation; 0.88 dva offset between fixation and stimulus) is so close to fixation (even at the assumed shortest distance between eye and screen) that it can be considered foveal and falls within the range of noise of eye-trackers + that of the eye for fixating. There should be no uncertainty on or that close to the fovea. The variability in the data is likely just measurement noise. This also means that a linear fit will almost always go through this point, somewhat skewing the results toward linearity. The advantage is that the estimate of the intercept (measurement noise) is going to be very good. Unfortunately, there are only 2 other points measured, which (if used without the closest point) will always support a linear fit. Therefore, the experiment does not seem suitable to test linearity, only to characterize it, which might be sufficient for the current purposes. We'd understand if the effort to do a test of linearity using many more rotations requires too much effort. But then it should be made much clearer that the experiment assumes linearity and only serves to characterize the assumed linearity.
  
  Final comment after the consultation session:
  
  There were a lot of discussions about the actual interpretation of the behavioral data from this paper with regards to past papers (Tsay et al. 2020 or 2021), and how it matches the different variables of the model. The data from Tsay 2020 combined both proprioceptive information (Xp) and prediction about hand position (Xu) because it involves active movements. On the other hand, Tsay et al. 2021 is based on passive movements and could provide a better measure of Xp alone. We would encourage you to clarify how each of the variables used in the model is mapped onto the outcomes of the cited behavioral experiments.
  
  The reviewers discussed this point extensively during the consultation process. The results reported in the Tsay 2020 study reflect both proprioception and prediction. However, having a visual target contributes more than just prediction: it is likely an anchor in the workspace that draws the response to it, such that the report is dominated by short-term visual memory of the target (which is not part of the model). However, in the current Exp 3, as in most other work investigating proprioception, this is calculated relative to the actual direction.
  
  The solution is fairly simple. In Experiment 3 in the current study, Xp is measured relative to the hand without any visual anchors drawing responses, and this is also consistent with the reference used in the Tsay et al 2021 study and from many studies in the lab of D. Henriques (none of which also have any visual reach target when measuring proprioceptive estimates). So we suggest using a different data set that also measures Xp without any other influences, such as the data from Tsay et al 2021 instead.
  
  These issues with the data are not superficial and cannot be solved within the model. Data with correctly measured biases (relative to the hand) that are not dominated by irrelevant visual attractors would actually be informative about the validity of the PEA model. Dr. Tsay has so much other data that we recommend using a more to-the-point data set that could actually validate the PEA model.
  
  As the comments are repetitive in some places, we summarize them into three questions and address them one by one below:
  
  (1) Methodological concerns about visual uncertainty estimation in Experiment 1: a) the visual uncertainty is measured in movement angles (degrees), while the unit in vision science is degrees of visual angle (dva). This mismatch of units hinders direct comparison between the measured visual uncertainty and values reported in the literature; b) a 1-second delay between the movement endpoint and the reference marker presentation may cause an overestimate of visual uncertainty due to potential degradation of visual memory; c) the linear function of visual uncertainty is a result of having only three perturbation sizes.
  
  a) As noted by the reviewer, our visual uncertainty concerns cursor motion direction in the display plane, which has never been measured before. We do not think our data are comparable to any findings in vision science about foveal/peripheral comparisons. We cited the work of Klein and others in vision science (Klein & Levi, 1987; Levi et al., 1987) because their studies showed that deviation from fixation is associated with an increase in visual uncertainty. Their work thus inspired our Exp1, which probes how the visual uncertainty of interest here (specifically for visual motion direction) changes with increasing deviation from fixation. We believe that any model and its parameters should be specifically tailored to the task or context it tries to emulate. In our case, motion direction in a center-out reaching setting is the modeled context, and all the relevant model parameters should be specified in movement angles.
  
  b) The 1s delay of the reference cursor appears to have minimal impact on the estimate of visual uncertainty, based on previous vision studies. Our Exp1 used a visual paradigm similar to that of White et al., 1992, which shows that delay does not lead to an increase in visual uncertainty over a broad range of values (from 0.2s to >1s, see their Figure 5-6). We will add more methodological justification in our revision.
  
  c) We agree that testing more angles would make us more confident about the linearity of visual uncertainty. However, the linear function is a good approximation of visual uncertainty (as shown in Figure 2C). More importantly, our model performance does not hinge on a strictly linear function. If, say, it were a power function with an increasing slope, our model would still predict the major findings presented in the paper, as correctly pointed out by the reviewer. It is the increasing trend of visual uncertainty, completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Lastly, without assuming a linear function, we fitted the large dataset of motor adaptation from Exp2 to numerically estimate the visual uncertainty. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p<0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this analysis in the revision. See details in Supplementary text 2 and Figure S7.
  
  (2) Experiment 3: the reviewer argues that the Tsay et al., 2020 data do not accurately measure proprioceptive recalibration, and are thus not suitable for showing our model’s capacity to explain proprioceptive changes during adaptation.
  
  Response: We agree that the data from Tsay et al., 2020 are not from passive localization, which is the widely accepted method for measuring proprioceptive recalibration, a recalibration effect in the sensory domain. Active localization, as used in Tsay et al., 2020, is hypothesized to be closely related to people’s forward prediction (where people want to go, as the reviewer put it in the comments). However, we want to emphasize that we never equated Tsay’s findings with proprioceptive recalibration: throughout the paper we call them “reported hand location”. We reserved “proprioceptive recalibration” for our own Exp3, which used a passive localization method. Thus, we have not misused this term. Secondly, as far as we know, localization bias or changes, whether measured by passive or active methods, have not been formally modeled quantitatively. We believe our model can explain both, at least in the error-clamp adaptation setting here. Exp3 concerns passive localization: the proprioceptive bias is caused by the biasing effect of the just-perceived hand location (X_hand_hat) from the adaptation trial. The Tsay et al., 2020 data concern active localization, whose bias shows a characteristic change from negative to positive. This can be explained by the just-perceived hand location (X_hand_hat again) and a gradually adapting hand (X_p). We think this is a significant advance in the realm of proprioceptive changes in adaptation. Of course, our idea can be further tested in other task conditions, e.g., conventional visuomotor rotation or even gain adaptation, which should be left for future studies.
  
  As for technical concerns, the Tsay et al., 2020 data set is not ideal: when reporting hand location, the participants viewed the reporting wheel as well as the original target. As correctly pointed out by the reviewer, the presence of the target might provide an anchoring cue for perceptual judgment, which acts as an attractor for localization. If that were the case, our cue combination would predict that this extra attractor effect leads to a smaller proprioceptive effect than that currently reported in their paper. The initial negative bias would be closer to the target (zero), and the later positive bias would be closer to the target too. However, the main trend would remain, i.e., the reported hand location would still show the characteristic negative-to-positive change. The attractor effect of the target can be readily modeled by giving less weight to the just-perceived hand location (X_hand_hat). Thus, we would like to keep the Tsay et al., 2020 data in our paper but add some explanation of the limitations of this dataset as well as how the model would fare with these limitations.
  
  That being said, our model can explain both passive and active localization during implicit adaptation elicited by error clamp. The dataset from the Tsay et al., 2021 paper is not a good substitute for their 2020 paper in terms of modeling, since that study interleaved blocks of passive localization trials with adaptation trials. This kind of block design would lead to forgetting of both the adaptation (Xp in our model) and the perceived hand (X_hand_hat in our model); the latter is not yet considered in our model. As our Exp3, which also used passive localization, shows, the influence of the perceived hand on proprioceptive bias is short-lived, lasting up to three trials without adaptation trials. Of course, it would be of great interest to design future studies to examine how the proprioceptive bias changes over time, and how its temporal changes relate to the perceptual error. Our model provides a testbed to move forward in this direction.
  
  (3) The reviewer raises concerns about the study's assumption that participants ignore error feedback, questioning the model's applicability to broader contexts and real-world scenarios where ignoring errors might not be viable or common.
  
  Reviewer 2 raised the same question above. We moved our responses here. “We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.
  
  Our model is not limited to the error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is one rare example that implicit motor learning can be isolated in a nearly idealistic way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.
  
  We will propose that the same two principles should be applied to various kinds of motor adaptation and motor skill learning, which constitutes motor learning in general. Most of our knowledge about motor adaptation is from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also have implicit learning. We believe they can be modeled by variants of our model, or at least should consider using the two principles we laid out above to think of their computational nature. For skill learning, especially for de novo learning, the area still lacks a fundamental computational model that accounts for skill acquisition process on the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.”
  
  We also add one more important implication of our model: as stated above, our model also explains that the proprioceptive changes, revealed by active or passive localization methods, are brought about by (mis)perceived hand localization via Bayesian cue combination. This new insight, though only tested here using the error-clamp paradigm, can be further utilized in other domains, e.g., conventional visuomotor rotation or force field adaptation. We hope this serves as an initial endeavor in developing computational models for proprioception studies. Please see the extended discussion on this matter in the revision.
  
  Recommendations for the authors:
  
  Revisions:
  
  All three reviewers were positive about the work and have provided a set of concrete and well-aligned suggestions, which the authors should address in a revised version of the article. These are listed below.
  
  A few points of particular note:
  
  (1) There are a lot of discussions about the actual interpretation of behavioral data from this paper or past papers (Tsay et al. 2020 or 2021) and how it matches the different variables of the model.
  
  (2) There are some discussions on the results of the first experiment, both in terms of how it is reported (providing degrees of visual angle) and how it differs from previous results (importance of the point of fixation). We suggest also discussing a few papers on eye movements during motor adaptation from recent years (work of Anouk de Brouwer and Opher Donchin). Could the authors also discuss why their results are opposite to those of previous visual uncertainty studies: here, visual uncertainty attenuates learning with large, but not small, visual errors, whereas in Burge et al., Tsay et al. 2021, and Makino Nozaki 2023 visual uncertainty attenuates learning with small, but not large, visual errors.
  
  (3) It is recommended by several reviewers to discuss the applicability of the model to other areas/perturbations.
  
  (4) Several reviewers and I believe that the impact of the paper would be much higher if the code to reproduce all the simulations of the model were made available to the readers. In addition, while I am very positive about the fact that the authors shared the data of their experiments, the metadata seem to be missing; they are highly important because the data are otherwise useless.
  
  Thank you for the concise summary of the reviewers’ comments. We have addressed their concerns point by point.
  
  Reviewer #2 (Recommendations For The Authors):
  
  L142: The linear increase in visual uncertainty should be substantiated by previous research in vision science. Please cite relevant papers and discuss why the linear model is considered reasonable.
  
  We cited relevant studies in vision science. Their focus is on how eccentricity inflates visual uncertainty, similar to our finding that deviations from the fixation direction inflate visual uncertainty about motion direction.
  
  We also want to add that our model performance does not hinge on a strictly linear function of visual uncertainty. If, say, it were a power function with an increasing slope, our model would still predict the major findings presented in the paper. It is the increasing trend of visual uncertainty, which has been completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Furthermore, without assuming a linear function, we fitted the large dataset of motor adaptation from Exp2 to numerically estimate the visual uncertainty. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p<0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this new analysis in the revision. See details in Supplementary text 2 and Figure S7.
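The argument above can be illustrated with a minimal sketch. It assumes the standard inverse-variance weighting of three independent Gaussian cues, with the motor prediction and proprioceptive cues placed at the target direction (0°) and the visual cue at the clamp angle. All function names and parameter values (`a`, `b`, `k`, `sigma_u`, `sigma_p`) are hypothetical and are not the fitted values from the paper. The sketch only shows that a linear function and a power function of visual uncertainty both yield a perceived hand shift that first grows and then shrinks with perturbation size.

```python
def sigma_v_linear(theta, a=1.0, b=0.3):
    """Hypothetical linear visual uncertainty (deg) for a clamp of size theta (deg)."""
    return a + b * theta


def sigma_v_power(theta, a=1.0, b=0.1, k=1.3):
    """Hypothetical power-function alternative with an increasing slope (k > 1)."""
    return a + b * theta ** k


def perceived_hand_shift(theta, sigma_v, sigma_u=5.0, sigma_p=10.0):
    """Shift of the combined hand estimate toward a visual cue located at theta.

    The motor prediction cue (sigma_u) and the proprioceptive cue (sigma_p)
    are both assumed to sit at 0 deg, so only the visual cue pulls the estimate.
    Each cue is weighted by its inverse variance (its precision).
    """
    precision_v = 1.0 / sigma_v ** 2
    precision_u = 1.0 / sigma_u ** 2
    precision_p = 1.0 / sigma_p ** 2
    w_v = precision_v / (precision_v + precision_u + precision_p)
    return w_v * theta


thetas = [2, 4, 8, 16, 32, 64, 95]  # clamp sizes in degrees (illustrative)
linear_shift = [perceived_hand_shift(t, sigma_v_linear(t)) for t in thetas]
power_shift = [perceived_hand_shift(t, sigma_v_power(t)) for t in thetas]

for t, lin, pw in zip(thetas, linear_shift, power_shift):
    print(f"theta={t:3d} deg  linear: {lin:5.2f}  power: {pw:5.2f}")
```

With these made-up parameters, the shift peaks at an intermediate clamp size under either function, which is the qualitative point of the response: the increasing trend matters, not the exact functional form.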
  
  L300: I found it challenging to understand the basis for this conclusion. Additional explanatory support is required.
  
  We unpacked this concluding sentence as follows:
  
  “The observed proprioceptive bias is formally modeled as a result of the biasing effect of the perceived hand estimate x_hand_hat. In our mini-block of passive localization, the participants neither actively moved nor received any cursor perturbations for three trials in a row. Thus, the fact that the measured proprioceptive bias is reduced to nearly zero at the third trial suggests that the effect of perceived hand estimate x_hand_hat decays rather rapidly.”
  
  L331: For the general reader, a visual representation of what the blurring mask looks like would be beneficial.
  
  Thanks for the nice suggestion. We added pictures of a clear and a blurred cursor in Figure 5D.
  
  L390: This speculation is intriguing. It would be helpful if the authors explained why they consider causal inference to operate at an explicit process level, as the reasoning is not clear here, although the idea seems plausible.
  
  Indeed, our tentative conclusion is based only on the model comparison results. It is still possible that causal inference also works for implicit adaptation, besides explicit adaptation. We draw a more modest conclusion in the revision:
  
  “The causal inference model is also based on the Bayesian principle, so why does it fail to account for implicit adaptation? We postulate that the failure of the causal inference model is due to its neglect of visual uncertainty as a function of perturbation size, as we revealed in Experiment 1. In fact, previous studies advocating the Bayesian principle in motor adaptation have largely focused on experimentally manipulating sensory cue uncertainty to observe its effects on adaptation (Burge et al., 2008; He et al., 2016; Körding & Wolpert, 2004; Wei & Körding, 2010), similar to our Experiment 4. Our findings suggest that causal inference of perturbation alone, without incorporating visual uncertainty, cannot fully account for the diverse findings in implicit adaptation. The increase in visual uncertainty with perturbation size is substantial: our Experiment 1 yielded an approximate seven-fold increase from a 4° perturbation to a 64° perturbation. We have attributed this to the fact that people fixate in the desired movement direction during movements. Interestingly, even in the conventional visuomotor rotation paradigm, where people are required to “control” the perturbed cursor, their fixation is also on the desired direction, not on the cursor itself (de Brouwer, Albaghdadi, et al., 2018; de Brouwer, Gallivan, et al., 2018). Thus, we postulate a similar hike in visual uncertainty in other “free-viewing” perturbation paradigms. Future studies are warranted to extend our PEA model to account for implicit adaptation in other perturbation paradigms.”
  
  L789: The method of estimating Sigma_hand in the brain was unclear. Since Bayesian computation relies on the magnitude of noise, the cognitive system must have estimates of this noise. While vision and proprioception noise might be directly inferred from signals, the noise of the hand could be deduced from the integration of these observations or an internal model estimate. This process of estimating noise magnitude is theorized in recursive Bayesian integration models (or Kalman filtering), where the size estimate of the state noise (sigma_hand) is updated concurrently with the state estimate (x_hand hat). The equation in L789 and the subsequent explanation appear to assume a static model of noise estimation. However, in practice, the noise parameters, including Sigma_hand, are likely dynamic and updated with each new observation. A more detailed explanation of how Sigma_hand is estimated and of its role in the cognitive process is needed.
  
  This is a great comment. Indeed, if a Kalman filter is used, the learning rate and the state noise should both be dynamically updated on each trial, under the influence of the observed x_v. Most adaptation models, including ours, assume a constant learning rate, although a dynamic learning rate (B in our model) is worth trying. However, in our error-clamp setting, x_v is a constant, so this observation variable cannot dynamically update the Kalman filter; that’s why we opt to use a “static” Bayesian model to explain our datasets. Sigma_hand can thus be estimated using Bayesian principles as a function of the three available cues, i.e., the proprioceptive cue, the visual cue, and the motor prediction cue. We added a detailed derivation of sigma_hand to the revision in Supplementary text 1.
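As a rough illustration of how a static estimate of sigma_hand can follow from three cues, the sketch below applies the textbook rule for combining independent Gaussian cues: precisions (inverse variances) add, and the combined mean is the precision-weighted average. This is an assumption about the general form only; the actual derivation is the one in Supplementary text 1, and the function name and cue values here are hypothetical.

```python
import math


def combine_cues(means, sigmas):
    """Combine independent Gaussian cues by inverse-variance weighting.

    Returns (combined mean, combined standard deviation).
    """
    precisions = [1.0 / s ** 2 for s in sigmas]
    total_precision = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    sigma = math.sqrt(1.0 / total_precision)
    return mean, sigma


# Hypothetical cue values in degrees, in the order:
# motor prediction cue, proprioceptive cue, visual (clamped cursor) cue.
cue_means = [0.0, 0.0, 16.0]
cue_sigmas = [5.0, 10.0, 6.0]

x_hand_hat, sigma_hand = combine_cues(cue_means, cue_sigmas)
print(f"x_hand_hat = {x_hand_hat:.2f} deg, sigma_hand = {sigma_hand:.2f} deg")
```

Under this rule the combined uncertainty is always smaller than that of the most reliable single cue, and it depends only on the cue uncertainties, not on trial-by-trial observations, which is why a constant x_v is compatible with a static estimate.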
  
  Reviewer #3 (Recommendations For The Authors):
  
  We observed values in Fig 2C for the 64-degree perturbation that seem to be outliers, i.e., greater than 50 degrees. It is unclear how a psychometric curve could have a "slope" or JNP of over 60, especially considering that the tested range was only 60. Since the data plotted in panel C is a collapse of the signed data in panel B, it is perplexing how such large data points were derived, particularly when the signed uncertainty values do not appear to exceed 30.
  
  Related to the previous point, we would also recommend connecting individual data points: if the uncertainty increases (linearly or otherwise), then people with low uncertainty at the middle distance should also have low uncertainty at the high distance, and people with high uncertainty at one point, should also have that at other distances. Or perhaps the best way to go about this is to use the uncertainty at the two smaller perturbations to predict uncertainty at the largest perturbation for each participant individually?
  
  Thank you for your suggestion to examine the consistency of individual levels of visual uncertainty across perturbation sizes. First, a sigma_v of 60 degrees is entirely possible and falls naturally out of the experimental data. It shows that some individuals indeed have large visual uncertainty. Given these potential outliers (which should not be readily removed, as we have no reason to do so), we estimated the linear function of sigma_v with a robust method, i.e., a GLM with a gamma distribution, which suits right-skewed data and can accommodate positive outliers. Furthermore, we added to our revision a verification test of our estimates of sigma_v: we used Exp2’s adaptation data to estimate sigma_v without assuming its linear dependency. As shown, the model-fitted sigma_v closely matched the estimates from Exp1 (see Supplementary text 2 and Figure S7).
  
  We re-plotted sigma_v with connected data points, and the data clearly indicate that individuals exhibit consistent levels of visual uncertainty across different perturbation sizes, i.e., those with relatively lower uncertainty at middle distances (in fact, angles) tend to exhibit relatively lower uncertainty at higher distances too, and similarly, those with higher uncertainty at one distance maintain that level of uncertainty at other distances. This is confirmed by a Spearman correlation analysis assessing the consistency of uncertainties across perturbation sizes among individuals. We observed significant correlations between perturbation angles, indicating good individual consistency (4 and 16 degrees, rho = 0.759, p<0.001; 16 and 64 degrees, rho = 0.527, p = 0.026).
  
  Author response image 4.
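The consistency check described above can be sketched in a few lines. Spearman's rho is the Pearson correlation of the ranks, with tied values given their average rank. The implementation below uses only the standard library; the per-participant values are invented for illustration and are not the data from Exp1.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical sigma_v estimates (deg) for six made-up participants.
sigma_v_16deg = [3.1, 4.0, 5.2, 6.8, 7.5, 9.0]
sigma_v_64deg = [12.0, 15.5, 14.0, 22.0, 30.0, 27.5]

rho = spearman_rho(sigma_v_16deg, sigma_v_64deg)
print(f"Spearman rho = {rho:.3f}")
```

Because the statistic depends only on ranks, a few participants with very large sigma_v values do not dominate it, which makes it a reasonable choice for data with positive outliers.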
  
  The illustration in Fig 2A does not seem to show a stimulus that is actually used in the experiment (looks like about -30° perturbation). It would be good to show all possible endpoints with all other visual elements to scale - including the start-points of the PEST procedure.
  
  Thanks for the suggestion. We updated Fig 2A to show a stimulus of +16 degrees and added a panel showing all the possible endpoints.
  
  Finally (related to the previous point), in lines 589-591 it says the target is a blue cross. Then in lines 614-616, it says participants are to fixate the blue cross or the start position. The start position was supposed to have disappeared, so perhaps the blue plus moved to the start position (which could be the case, when looking at the bottom panel in Fig 2A, although in the illustration the plus did not move fully to the start position, just toward it to some degree). Perhaps the descriptions need to be clarified, or it should be explained why people had to make an eye movement before giving their judgments. And if people could have made either 1) no eye movement, but stayed at fixation, 2) moved to the blue plus as shown in the last panel in Fig 2A, or 3) fixated on the home position, we'd be curious to know if this affected participants' judgments.
  
  Thanks for pointing that out. The blue cross serves as the target in the movement task, then disappears with the cursor after 800 ms of frozen time. The blue cross then reappears in the discrimination task at the center of the screen, i.e., the start location. Subjects were asked to fixate the blue cross during the visual discrimination task. Note that this return of fixation to the home position is exactly what we see in typical error-clamp adaptation: once the movement is over, people guide their hand back to the home position. We performed a pilot study to record the typical fixation pattern during error-clamp adaptation, and Exp1 was intentionally designed to mimic its fixation sequence. We have now updated the description of Figure 2A, emphasizing the stimulus sequence.
  
  In Figure 4A, the label "bias" is confusing as that is used for recalibrated proprioceptive sense of hand position as well as other kinds of biases elsewhere in the paper. What seems to be meant is the integrated hand position (x-hat_hand?) where all three signals are apparently combined. The label should be changed and/or it should be clarified in the caption.
  
  Thanks for pointing that out, it should be x_hand_hat, and we have corrected this in the revised version of Figure 4.
  
  In the introduction, it is claimed that larger perturbations have not been tested with "implicit adaptation" paradigms, but in the same sentence, a paper is cited (Moorehead et al., 2017) that tests a rotation on the same order of magnitude as the largest one tested here (95°), as well as much larger rotations (135° and 175°), with error-clamps. Interestingly, there is no adaptation in those conditions, which seems more in line with the sensory cue integration model. Can the PEA model explain these results as well? If so, this should be included in the paper, and if not, it should be discussed as a limitation.
  
  First, we double checked our manuscript and found that we never claimed that larger perturbations had not been tested.
  
  We agree that it is always good to have as many conditions as possible. However, the 135 and 175 degree conditions would lead to minimal adaptation, which would not help much in terms of model testing. We postulate that this lack of adaptation is simply due to the fact that people cannot see the moving cursor, or to some other unknown reason. Our simple model is not designed to cover such extreme cases.
  
  Specify the size of the arc used for the proprioceptive tests in Exp 3 and describe the starting location of the indicator (controlled by the left hand). Ideally, the starting location should have varied across trials to avoid systematic bias.
  
  Thank you for the comments. As detailed in the Methods section of our paper, the arc used during these tests lies on a ring with a 10 cm radius centered at the start position. This setup is visually represented as a red arc in Figure 7B.
  
  After completing each proprioceptive test trial, participants were instructed to position the indicator at approximately -180° on the arc and then relax their left arm. Although the starting location for the subsequent trial remained at about -180°, it was not identical for every trial, thereby introducing slight variability.
  
  Please confirm that the proprioceptive biases plotted in Fig 4E are relative to the baseline.
  
  Thank you for bringing this to our attention. Yes, the proprioceptive biases illustrated in Figure 4E are indeed calculated relative to the baseline measurements. We have added this to the Methods section.
  
  Data availability: the data are available online, but there are some ways this can be improved. First, it would be better to use an open data format, instead of the closed, proprietary format currently used. Second, there is no explanation for what's in the data, other than the labels. (What are the units? What preprocessing was done?) Third, no code is made available, which would be useful for a computational model. Although rewriting the analyses in a non-proprietary language (to increase accessibility) is not a reasonable request at this point in the project, I'd encourage it for future projects. But perhaps Python, R, or Julia code that implements the model could be made available as a notebook of sorts so that other labs could look at (build on) the model starting with correct code - increasing the potential impact of this work.
  
  Great suggestions. We are also fully supportive of open data and open science. We have made the following updates:
  
  (1) We updated our data and code repository to include the experimental data in an open data format (.csv) for broader accessibility.
  
  (2) We now accompany the data with detailed descriptions to clarify their contents.
  
  (3) We have made the original MATLAB (.m) code for data analysis, model fitting and simulation available online.
  
  (4) We also provide the code in Jupyter Notebook (.ipynb) format.
  
  These updates can be found in the revised “Data Availability” section of our manuscript.
  
  References
  
  Bromberg, Z., Donchin, O., & Haar, S. (2019). Eye Movements during Visuomotor Adaptation Represent Only Part of the Explicit Learning. eNeuro, 6(6). https://doi.org/10.1523/ENEURO.0308-19.2019
  
  Burge, J., Ernst, M. O., & Banks, M. S. (2008). The statistical determinants of adaptation rate in human reaching. Journal of Vision, 8(4), 1–19.
  
  de Brouwer, A. J., Gallivan, J. P., & Flanagan, J. R. (2018). Visuomotor feedback gains are modulated by gaze position. Journal of Neurophysiology, 120(5), 2522–2531.
  
  Egly, R., & Homa, D. (1984). Sensitization of the visual field. Journal of Experimental Psychology. Human Perception and Performance, 10(6), 778–793.
  
  Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882
  
  Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.
  
  Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.
  
  Makino, Y., Hayashi, T., & Nozaki, D. (2023). Divisively normalized neuronal processing of uncertain visual feedback for visuomotor learning. Communications Biology, 6(1), 1286.
  
  Owsley, C., Ball, K., & Keeton, D. M. (1995). Relationship between visual sensitivity and target localization in older adults. Vision Research, 35(4), 579–587.
  
  Simani, M. C., McGuire, L. M. M., & Sabes, P. N. (2007). Visual-shift adaptation is composed of separable sensory and task-dependent effects. Journal of Neurophysiology, 98(5), 2827–2841.
  
  Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.
  
  Tsay, J. S., Chandy, A. M., Chua, R., Miall, R. C., Cole, J., Farnè, A., Ivry, R. B., & Sarlegna, F. R. (2024). Minimal impact of proprioceptive loss on implicit sensorimotor adaptation and perceived movement outcome. bioRxiv : The Preprint Server for Biology. https://doi.org/10.1101/2023.01.19.524726
  
  Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.
  
  Wei, K., Stevenson, I. H., & Körding, K. P. (2010). The uncertainty associated with visual flow fields and their influence on postural sway: Weber’s law suffices to explain the nonlinearity of vection. Journal of Vision, 10(14), 4.
  
  White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.
  
URL

biorxiv.org/content/10.1101/2023.11.23.568442v2
www.biorxiv.org www.biorxiv.org

Paradoxical dominant negative activity of an immunodeficiency-associated activating PIK3R1 variant

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  The authors identify new mechanisms that link a PIK3R1 mutant to cellular signaling and division in Activated PI3 Kinase Delta Syndrome 1 and 2 (APDS1/2). The conclusion that this mutant serves as a dominant negative form of the protein, impacting PI3K complex assembly and IRS/AKT signaling, is important, and the evidence from constitutive and inducible systems in cultured cells is convincing. Nevertheless, there are several limitations relating to differences between cell lines and expression systems, as well as more global characterization of the protein interaction landscape, which would further enhance the work.
  
  We are pleased by this fair assessment, while noting that this work relates to APDS2 (PIK3R1-related) rather than APDS1 (PIK3CD-related). We believe our findings are clear, but the observation that studies including more global proteomics/phosphoproteomics in cells expressing mutants at endogenous levels would add further insight is well made. We hope that this report may motivate such studies by laboratories with wider access to primary cells from patients and knock-in mice.
  
  Public Reviews
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This study provides convincing data showing that expression of the PIK3R1(delta Exon11) dominant negative mutation in Activated PI3K Delta Syndrome 1/2 (APDS1/2) patient-derived cells reduces AKT activation and p110δ protein levels. Using a 3T3-L1 model cell system, the authors show that overexpressed p85α(delta Exon 11) displays reduced association with the p110α catalytic subunit but strongly interacts with Irs1/2. Overexpression of PIK3R1 dominant negative mutants inhibits AKT phosphorylation and reduces cellular differentiation of preadipocytes. The strength of this article is the clear results derived from Western blot analysis of cell signaling markers (e.g. pAKT1), and co-immunoprecipitation of PI3K holoenzyme complexes and associated regulatory factors (e.g. Irs1/2). The experimental design, interpretation, and quantification broadly support the authors' conclusions.
  
  Strengths:
  
  The authors analyze a variety of PIK3R1 mutants (i.e. delta Exon11, E489K, R649W, and Y657X), which reveals a range of phenotypes that support the proposed model for dominant negative activity. The use of clonal cell lines with doxycycline-induced expression of the PIK3R1 mutants (ΔExon 11, R649W, and Y657X) provides convincing experimental data concerning the relationship between p85α mutant expression and AKT phosphorylation in vivo. The authors convincingly show that p85α (delta Exon11, R649W, or Y657X) is unable to associate with p110α but instead more strongly associates with Irs1/2 compared to wild type p85α. This helps explain why the authors were unable to purify the recombinant p110α/p85α(delta Exon 11) heterodimeric complex from insect cells.
  
  Weaknesses:
  
  Future experimentation will be needed to reconcile the cell type specific differences (e.g. APDS2 patient-derived cells vs. the 3T3-L1 cell model system) in PIK3R1 mutant behavior reported by the authors.
  
  This is a fair comment. It has been established for many years that relative protein levels even of wild type PIK3CA and PIK3R1 gene products influence sensitivity of PI3K to growth factor stimulation. Such issues of stoichiometry become exponentially more complicated when the numerous potential interactions among the full repertoire of Class 1 PI3K regulatory subunits (3 splice variants of PIK3R1, and also PIK3R2 and PIK3R3) and corresponding catalytic subunits (PIK3CA, PIK3CB, PIK3CD) are considered, and when different activities and stabilities of PIK3R1 mutants are added to the mix. It thus seems obvious to us that different levels of expression of different mutants in different cellular contexts will have different signalling consequences. We establish a paradigm in this paper using an overexpression system, and we strongly agree that this merits further investigation in a wider variety of primary cells (or cells with knock in at the endogenous locus), where available.
  
  An unbiased proteomic study that broadly evaluates the cell signaling landscape could provide a more holistic understanding of the APDS2 and SHORT mutants compared to a candidate-based approach.
  
  We agree. This would be highly informative, but we think it would best be carried out in both “metabolic” and “immune” cells with endogenous levels of expression of SHORT or APDS2 PIK3R1 mutants. These are not all currently available to us, and this will require follow-up studies.
  
  Additional biochemical analysis of p110α/p85α delta Exon 11 complex is needed to explain why this mutant regulatory subunit does not strongly associate with the p110 catalytic subunit.
  
  We agree. We present this observation in our overexpression system, which is clear and reproducible, even though somewhat surprising. The failure to bind p110α is likely not absolute, as sufficient p110α-p85α(ΔEx11) was synthesised in vitro in a prior study to permit structural and biochemical studies, although a series of technical workarounds were required to generate enough heterodimeric PI3K to study in vitro given the manifest instability of the complex, particularly when concentrated (PMID 28167755). We already note in the discussion that p85α can homodimerize and bind PTEN, likely among other partners, and it may be that the APDS2 deletion strongly favours binding to proteins that effectively compete with p110α. However, this requires further study of the wider interactome of the mutant PIK3R1, which, as noted above, is beyond the scope of the current study.
  
  It remains unclear why p85α delta Exon 11 expression reduces p110δ protein levels in APDS2 patient-derived dermal fibroblasts.
  
  We caution that we only had the opportunity to study dermal fibroblasts cultured from a single APDS2 patient, as noted in the paper, and so replication of this finding in future will be of interest. Nevertheless, the observation is robust and reproducible in these cells, and we agree that this apparently selective effect on p110δ is not fully explained. Having said that, it has been observed previously that heterodimers of the ΔEx11 p85α variant with either p110α or p110δ are unstable, and when the unstable complexes were eventually synthesised, p110α and p110δ were demonstrated to show differences in engagement with the mutant p85, with greater disruption of inhibitory interactions observed for p110δ (PMID 28167755). It is thus not a great stretch to imagine that, as well as disinhibiting p110δ more, the ΔEx11 p85α variant also destabilises the p85α-p110δ complex more, potentially explaining its near disappearance in cells with low baseline p110δ expression. Following on from the preceding question and response, however, there is an alternative explanation, based on the 3T3-L1 overexpression studies in this paper, wherein we were unable to demonstrate binding of p110α by ΔEx11 p85α. If, in any given cellular context, the mutant p85 could bind p110δ but not p110α, then the destabilising effect would be observed only for p110δ. In summary, we believe the selective effect on p110δ is explained by differences in binding kinetics and heterodimer stability for different ΔEx11 p85α-containing complexes. The net effect of these differences may vary among cell types depending on relative levels of subunit expression.
  
  This study would benefit from a more comprehensive biochemical analysis of the described p110α/p85α, p110β/p85α, and p110δ/p85α mutant protein complexes. The study is currently limited to the use of a single endpoint assay to measure PI3K lipid kinase activity in the presence of a single regulatory input (i.e. RTK-derived pY peptide). A broader biochemical analysis of the mutant PI3K complexes across the canonical signaling landscape will be important for establishing how competition between wild-type and mutant regulatory subunits is regulated in different cell signaling pathways.
  
  We agree that a wider analysis of upstream inputs and the downstream network would be of interest, though as noted above the ultimate functional consequences of mutants will be an amalgam of any differential signalling effects of complexes that are stable enough to function, and differential effects of mutant p85α on the kinetics of distinct heterodimer assembly and stability. In this paper we seek to suggest a paradigm worthy of further, deeper assessment. We note that the search space here is large indeed (A. different cell types with differing profiles of PI3K subunit expression; B. multiple upstream stimuli; and C. multiple downstream outputs, with the timecourse of responses an additional important factor to consider). These studies are realistically beyond the scope of the current work, but we hope that further studies, as suggested by the reviewer, will follow.
  
  Reviewer #2 (Public Review)
  
  Summary:
  
  Patsy R. Tomlinson et al. investigated the impact of different p85alpha variants associated with SHORT syndrome or APDS2 on insulin-mediated signaling in dermal fibroblasts and preadipocytes. They find no evidence of hyperactive PI3K signalling monitored by pAKT in APDS2 patient-derived dermal fibroblast cells. In these cells p110alpha protein levels were comparable to levels in control cells; however, the p110delta protein levels were strongly reduced. Remarkably, the truncated APDS2-causal p85alpha variant was less abundant in these cells than p85alpha wildtype. Afterwards, they studied the impact of ectopically expressed p85alpha variants on insulin-mediated PI3K signaling in 3T3-L1 preadipocytes. Interestingly, they found that the truncated APDS2-causal p85alpha variant impaired insulin-induced signaling. Using immunoprecipitation of p110alpha, they did not find the truncated APDS2-causal p85alpha variant in p110alpha precipitates. Furthermore, by immunoprecipitating IRS1 and IRS2, they observed that the truncated APDS2-causal p85alpha variant was very abundant in IRS1 and IRS2 precipitates, even in the absence of insulin stimulation. These important findings add, in an interesting way, a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.
  
  Strengths:
  
  Based on state-of-the-art functional investigation, the authors propose a loss-of-function activity of the APDS2-causing p85alpha variant in preadipocytes, providing a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.
  
  Weaknesses:
  
  Related to Figure 1: PIK3R1 expression not only by Western blotting but also by quantifying the RNA transcripts, e.g. mutant and wildtype transcripts, was not performed. RNA expression analysis would further strengthen the suggested impaired stabilization/binding.
  
  It is not completely clear to us how further PIK3R1 mRNA analysis would enhance the points we seek to make. Perhaps the reviewer’s point is that changes in protein expression could be explained by reduced transcription rather than having anything to do with altered protein turnover? As shown in Figure 1 supplemental figure 1, sequencing cDNA from each of the primary cell lines studied indicates that both mutant and WT alleles are expressed at or close to 50% of the total mRNA for PIK3CA or PIK3R1 as relevant. While this is not strictly quantitative, it is allied to prior evidence that these are dominant alleles which must be expressed to exert their effect, with no evidence for altered mRNA expression of these variants in prior studies, and so we don’t believe any further quantification of mRNA expression would add value.
  
  Related to Figure 2
  
  As mentioned by the authors in the manuscript the expression of p110delta but also p110beta in 3T3-L1 preadipocytes ectopically expressing p85alpha variants has not been analyzed.
  
  We agree that such determination would have been a useful addition to the study, but regretfully it was not undertaken in these modified 3T3-L1 cells at the time of study. However independent bulk RNAseq studies of the founder 3T3-L1 cells from which the stably transduced cells were generated, undertaken as part of an unrelated study, revealed the following relative levels of endogenous expression of PI3K subunit mRNA:
  
  Author response table 1.
  
  We have not determined endogenous protein expression, and so have left the text of the discussion unchanged, simply noting that we have not formally assessed protein expression of p110d/p110b. However these transcriptomic findings suggest that p110d protein is likely either undetectable, or else present at extremely low levels compared to endogenous p110a. p110b also appears to be expressed at a much lower level than p110a. In our studies overexpressing mutant PIK3R1 and assessing insulin action, we believe we are largely or perhaps entirely assaying the effect of the mutants on p110a, in keeping with the fact that genetic and pharmacological studies have firmly established that it is p110a that is responsible for mediating the metabolic actions of insulin in adipose tissue and preadipocytes including 3T3-L1 (e.g. PMID 16647110). Indeed, to quote from this study, in 3T3-L1 “… inhibitors of p110b (TGX-115 and TGX-286) and p110d (IC87114 and PIK-23) had no effect on the insulin-stimulated phosphorylation of any protein in the PI3-K pathway.”
  
  We have added the following sentence to the discussion:
  
  “The current study has limitations. We have studied primary cells from only a single APDS2 patient, and in the 3T3-L1 cell model, we did not determine whether p110d protein could be detected. If not, this could explain the lack of detectable AKT phosphorylation with induction of Pik3r1 DEx11. Indeed, previous pharmacological studies in 3T3-L1 adipocytes have shown that selective inhibition of p110d or p110b does not alter insulin-induced phosphorylation of any protein studied in the PI3-K pathway, attesting to the dominance of p110a in insulin action in this cell model (Knight et al, 2006).”
  
  Furthermore, a direct comparison of the truncated APDS2-causal p85alpha variant with SHORT syndrome-causal p85alpha variants in regard to pAKT level, and p85alpha expression level has not been performed.
  
  These investigations would further strengthen the data.
  
  The cell lines conditionally expressing SHORT syndrome variants have been reported already, as cited (PMID: 27766312). Remarkably, the degree of inhibition of insulin-stimulated signalling is actually less pronounced for the SHORT syndrome variants than for the overexpressed APDS2 variant, as seen in the excerpt from the prior paper below. In this prior paper the maximum insulin concentration used, 100nM, was the concentration used in the current study. While overexpression of the APDS2 p85a variant ablated the response to insulin entirely, it is still seen in the prior study, albeit at a clearly reduced level.
  
  Related to Figure 3
  
  The E489K and Y657X p85alpha variants should be also tested in combination with p110delta in the PI3K activity in vitro assay. This would help to further decipher the overall impact, especially of the E489K variant.
  
  We agree that this would make our data more complete, but for logistical reasons (primarily available personnel) we were compelled to constrain the number of p85-p110 combinations we studied. We elected to prioritise the PIK3R1 R649W variant as by far the most common causal SHORT syndrome variant, and as the variant showing the “cleanest” functional perturbation, namely severely impaired or absent ability to dock to phosphotyrosines in cognate proteins. The paradox that we sought to explain in this paper, namely the phenotypic combination of gain-of-function APDS2 with loss-of-function SHORT syndrome features holds only for APDS2 PIK3R1 variants, and so while it is interesting to document that the canonical SHORT syndrome variant also inhibits PI3Kb and PI3Kd activation in vitro, this was not the main purpose of our study.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Points of clarification and suggestions for improving the manuscript:
  
  (1) Explain whether there are any PIK3R1-independent genetic alterations in the APDS2 and PROS-derived cell lines. For example, are there differences in the karyotype of mutant cell lines compared to wild-type cells?
  
  Karyotypic abnormalities are not an established feature of either PROS or APDS2, and the patients from whom cells were derived were documented to be of normal karyotype. Karyotypic abnormalities acquired during cell culture would not be unprecedented, but confirming normal karyotypes in primary cell lines where there is no specific reason to suppose any alteration exceeds normal expectations for primary cell studies, and so this has not been undertaken.
  
  (2) When introducing the APDS2-associated PIK3R1 mutation (lines 126-128), the authors describe both the exon 11 skipping and in-frame deletions. I recommend rewording this sentence to say exon 11 skipping results in an in-frame deletion of PIK3R1. The current wording makes it seem like APDS2-derived cells contain two genetic perturbations: (1) exon 11 skipping and (2) in-frame deletion. Include a diagram in Figure 1 to help explain the location of the mutations being studied in relationship to the PIK3R1 gene sequence and domains (i.e. nSH2, iSH2, cSH2). The description of the exon 11 skipping and in-frame deletions (lines 126-128) would benefit from having a complementary figure that diagrams the location of these mutations in the PIK3R1 gene.
  
  On review we agree that clarity of description could be enhanced. We have now edited these lines as follows:
  
  “We began by assessing dermal fibroblasts cultured from a previously described woman with APDS2 due to the common causal PIK3R1 mutation. This affects a splice donor site and causes skipping of exon 11, leading to an in-frame deletion of 42 amino acids (434-475 inclusive) in the inter-SH2 domain, which is shared by all PIK3R1 isoforms (Patient A.1 in (Lucas et al., 2014b))(Figure 1 figure supplement 1).”
  
  We have moreover introduced a further figure element including a schematic of all PIK3R1 mutations reported in the current study (new Figure 1 figure supplement 1)
  
  (3) For Figure 2, I recommend including a cartoon that illustrates the experimental design showing the induced expression of PIK3R1 mutants, R649W and Y657X, in the background of the wild-type endogenous gene expression.
  
  Such a figure element has now been generated and included as Figure 2 figure supplement 1, duly called out in the results section where appropriate.
  
  (4) For the data plotted in Figure 1B-1C, please clarify whether the experiments represent a single patient or all 3-4 patients shown in Figure 1A.
  
  Each datapoint shown represents one of the patients in the immunoblots, with all patients included. Each point in turn is the mean from 3 independent experiments. We have added the following to the Figure legend:
  
  “(B)-(E) quantification of immunoblot bands from 3 independent experiments shown for phosphoAKT-S473, phosphoAKT-T308, p110d and p110a respectively. Each point represents data from one of the patient cell lines in the immunoblots. Paired datapoints +/- insulin are shown in (B) and (C), and dotted lines mark means.”
  
  (5) I recommend rewording the following sentence: "Given this evidence that APDS2-associated PIK3R1 delta Exon 11 potently inhibits PI3Kα when overexpressed in 3T3-L1 preadipocytes," to say "... potently inhibits PI3Kα signaling when overexpressed in 3T3-L1 preadipocytes." The data shown in Figures 1 and 2 do not support a direct biochemical inhibition of PI3Kα lipid kinase activity by p85α (delta Exon 11).
  
  This edit has been made.
  
  (6) Provide more discussion concerning the percentage of humans with APDS2 or SHORT syndrome that contain the mutations discussed in this paper. How strong is the genotype-phenotype link for these diseases? Are these diseases inherited or acquired through environmental stresses?
  
  Both APDS2 and SHORT syndrome are very well established, highly penetrant and stereotyped monogenic diseases. APDS2 is defined by the presence of activating PIK3R1 mutations such as the one studied here (by far the commonest causal mutation). SHORT syndrome clinically has some superficial resemblance to other human genetic syndromes that include short stature, but when careful attention is paid to characteristic features it is nearly universally attributable to loss-of-function PIK3R1 mutations, with the single exception of one case in which a putatively pathogenic PKCE mutation was described (PMID: 28934384). Although both syndromes are monogenic it is often not accurate to refer to them as inherited, as, particularly in SHORT syndrome, de novo mutations (i.e. not found in either parent) are common. Environmental modifiers of phenotypes have not been described. A comment that both conditions are highly penetrant and monogenic has now been added to the introduction.
  
  (7) The data presented in Figure 5 would benefit from additional discussion and citations that describe the molecular basis of the interaction between PI3K and Irs1/2. What studies have previously established that this is a direct protein-protein interaction? Are there PI3K mutants that don't interact with Irs1/2 that can be included as a negative control? Alternatively, the authors can simply reference other papers to support the mechanism of interaction.
  
  There is a voluminous literature dating back to the early 1990s documenting the mode of interaction of PI3K with Irs1/2. Relevant papers have now been cited as requested:
  
  p85-Irs1 binding: PMID 1332046 (White lab, PNAS 1992)
  
  p85-Irs2 binding: PMID 7675087 (White lab, Nature 1995)
  
  “This may be important, as p85a mediates recruitment of PI3K to activated tyrosine kinase receptors and their tyrosine phosphorylated substrates, including the insulin-receptor substrate proteins Irs1 (PMID 1332046) and Irs2 (PMID 7675087).”
  
  Regarding PI3K mutants that don't interact with Irs1/2, the SHORT syndrome mutant R649W which we include in this study is perhaps the best example of this, so it is both disease-causing and functions as such a negative control.
  
  (8) To see the effect of the dominant negative delta Exon 11, the truncated p85α needs to be super stoichiometric to the full-length p85α (Figure 2 - Supplemental Figure 2). This is distinct from the results in Figure 1 showing that the APDS2-derived dermal fibroblasts express 5-10x lower levels of p85α delta Exon 11 compared to full-length p85α (Figure 1A), yet pAKT S473 and T308 are still strongly inhibited (Figure 1B-1C). The manuscript would benefit from more discussion concerning the cell type specific differences in phenotypes. Alternatively, do the APDS2-derived dermal fibroblasts have other genetic perturbations that are not accounted for that potentially modulate cell signaling differently compared to 3T3-L1 preadipocytes?
  
  The reviewer is astute to point out this apparent contrast. First of all, we have no reason to suppose there is any specific, PI3K-modifying genetic perturbation present in the primary dermal fibroblasts studied, although of course the genetic background of these cells is very distinct to that of 3T3-L1 mouse embryo fibroblasts. Related to such background differences, however, substantial variability is usually apparent in insulin-responsiveness even of healthy control dermal fibroblasts. This means that caution should be exercised in extrapolating from studies of the primary cells of a single individual. To illustrate this, we point the reviewer to our 2016 study in which we extensively studied the dermal fibroblasts of a proband with SHORT syndrome due to PIK3R1 Y657X:
  
  From this study we conclude that A. WT controls show quite substantial variation in insulin-stimulated AKT phosphorylation and B. even the SHORT syndrome p85a Y657X variant, expressed at higher levels than WT p85a in dermal fibroblasts, does not produce an obvious decrease in insulin-stimulated AKT phosphorylation, despite extensive evidence from other human cell studies and knock-in mice that it does indeed impair insulin action in metabolic tissues. For both these reasons we are not convinced that the lower insulin-induced AKT phosphorylation we described in Figure 1 should be overinterpreted until reproduced in other studies with primary cells from further APDS2 patients. This is why we did not comment more extensively on this. We now add the following qualifier in results:
  
  “Despite this, no increase in basal or insulin-stimulated AKT phosphorylation was seen in APDS2 cells compared to cells from wild-type volunteers or from people with PROS and activating PIK3CA mutations H1047L or H1047R (Fig 1A-C, Fig 1 figure supplement 3A,B). Although insulin-induced AKT phosphorylation was lower in fibroblasts from the one APDS2 patient studied compared to controls, we have previously reported extensive variability in insulin-responsiveness of primary dermal fibroblasts from WT controls. Moreover even primary cells from a patient expressing high levels of the SHORT syndrome-associated p85a Y657X did not show attenuated insulin action, so we do not believe the reduced insulin action in APDS2 cells in the current study should be overinterpreted until reproduced in further APDS2 cells.”
  
  Nevertheless we remind the reviewer that the main purpose of our primary cell experiment was to determine if there were any INCREASE in basal PI3K activity, or any difference in p110a or p110d protein levels, and we regard our findings in these regards to be clear.
  
  (9) A. The manuscript would benefit from additional explanation concerning why E489K, R649W, and Y657X are equivalent substitutes for the characterization of p110α/p85α (delta Exon 11). Perhaps a more explicit description of these mutations in relation to the location of the p85α (delta Exon 11) mutation would help. I recommend including a diagram in Figure 3 showing the position of the delta Exon 11, E489K, R649W, and Y657X mutations in the PIK3R1 coding sequence. B. Also, please clarify whether all three holoenzyme complexes were biochemically unstable (i.e. p110α/p85α, p110β/p85α, p110δ/p85α) when p85α (delta Exon 11) was expressed in insect cells.
  
  A. Whether or not E489K, R649W and Y657X are “equivalent” to the APDS2 mutant is not really a meaningful issue here. These mutants are being studied because they cause SHORT syndrome without immunodeficiency, while the APDS2 mutant causes APDS2 often with features of SHORT syndrome. That is, it is naturally occurring mutations and the associated genotype-phenotype correlation that we seek to understand. Of the 3 SHORT syndrome causal mutations chosen, R649W is by far the commonest, effectively preventing phosphotyrosine binding, Y657X has the interesting attribute that it can be discriminated from full length p85 on immunoblots due to its truncation, and is moreover a variant that we have studied in cells and mice before, while the rarer E489K is an interesting SHORT syndrome variant as it is positioned more proximally in the p85a protein than most SHORT syndrome causal variants. All variants studied are now illustrated in the new Figure 1 figure supplement 1. B. Regarding stability of PI3K heterodimers containing the APDS2 p85a variant, we tried extensively to purify p110a and p110d complexes without success despite several approaches to optimise production. We did not try to synthesise the p110b-containing complex.
  
  (10) I recommend presenting the results in Figure 4 before Figure 3 because it provides a good rationale for why it's difficult to purify the p110α/p85α (delta Exon 11) holoenzyme from insect cells.
  
  This would be true if p110d were studied in Figure 4, but it is not. Figure 4 looks instead at effects on p110a of heterologous overexpression of mutant p85 and is a natural lead-in to the ensuing Figures 5 and 6, so we do not agree that swapping Figures 3 and 4 would add value or enhance flow.
  
  (11) The authors show that overexpression of the p85α (delta Exon 11) did not result in p110α/p85α (delta Exon 11) complex formation based on co-immunoprecipitation. Do the authors get the same result when they co-immunoprecipitate p110α/p85α and p110δ/p85α in the APDS2-derived dermal fibroblasts used in Figure 1A?
  
  This is an interesting question but not an experiment we have done. It is not unfeasible, but generating enough cells to undertake IP experiments of this nature in dermal fibroblasts is a significant undertaking, and with finite resources available and only one primary cell line to study we elected not to pursue this.
  
  Details in Methods section:
  
  (1) Include catalog numbers and vendors for reagents (e.g. lipids, PhosSTOP, G-Dynabeads, etc.). There is not enough information provided to reproduce this work.
  
  We have now added all vendors and catalogue numbers where relevant.
  
  (2) Concerning the stated lipid composition (5/10/15/45/20/5 %) in the liposome preparation protocol. Please clarify whether these numbers represent molar percentages or mg/mL percentages.
  
  We have now added that this is expressed as “(wt/vol)”
  
  (3) What is the amino acid sequence of the PDGFR (pY2) peptide used for the PI3K activity assay?
  
  This assay has been published and references with detailed methods are cited. For clarity, however we now say:
  
  “PI(3,4,5)P3 production was measured by modified PI3-Kinase activity fluorescence polarisation assay (Echelon Biosciences, Salt Lake City, UT, USA). 10μL reactions in 384-well black microtitre plates used 1mM liposomes containing 50μM PI(4,5)P2, optimised concentrations of purified PI3K proteins, 100μM ATP, 2mM MgCl2, with or without 1μM tyrosine bisphosphorylated 33-mer peptide derived from mouse PDGFRβ residues 735-767, including phosphotyrosine at positions 740 and 751 (“pY2”; 735-ESDGGYMDMSKDESIDYVPMLDMKGDIKYADIE-767; Cambridge peptides).”
  
  (4) Include a Supplemental file containing a comprehensive description of the plasmids and coding sequencing used in this study.
  
  Such a supplemental file has been created and is included as Table 2
  
  Minor points of clarification, citations, and typos:
  
  (1) Clarify why Activated PI3K Delta Syndrome 1 (APDS1) is thus named APDS2. See lines 71-72 of the introduction. Also see line 89: "...is common in APDS2, but not in APDS1." Briefly describe the difference between APDS1 and APDS2?
  
  This is described in the introduction, but we apologise if our wording was not sufficiently clear. We have tried now to remove any ambiguity:
  
  “Some PIK3R1 mutations reduce basal inhibition of catalytic subunits, usually due to disruption of the inhibitory inter-SH2 domain, and are found in cancers (Philp et al, 2001) and vascular malformations with overgrowth (Cottrell et al, 2021). In both diseases, hyperactivated PI3Ka, composed of heterodimers of PIK3R1 products and p110a, drives pathological growth. Distinct inter-SH2 domain PIK3R1 mutations, mostly causing skipping of exon 11 and deletion of residues 434-475, hyperactivate PI3Kd in immune cells, causing highly penetrant monogenic immunodeficiency (Deau et al, 2014; Lucas et al, 2014b). This phenocopies the immunodeficiency caused by genetic activation of p110d itself, which is named Activated PI3K Delta Syndrome 1 (APDS1) (Angulo et al, 2013; Lucas et al, 2014a). The PIK3R1-related syndrome, discovered shortly afterwards, is thus named APDS2.”
  
  (2) Figure legend 1. Clarify reference to "Figure EV2".
  
  (3) Figure legend 2. Clarify reference to "Figure EV3".
  
  (4) Figure legend 3. Clarify reference to "Figure EV5".
  
  Thank you for pointing out this oversight, arising from failure to update nomenclature fully between versions. “EV” figures actually are the figure supplements in the submission. All labels have now been updated.
  
  (5) For Figure 1 - supplemental figure 1C, indicate experimental conditions on the blot (e.g. -/+ insulin).
  
  This is now added
  
  (6) Figure 4B, y-axis. Clarify how data was quantified. Perhaps reword "(% WT without DOX)" for clarity.
  
  We have left the Y axis label as it is, but have added the following to the figure legend:
  
  “(B) Quantification of immunoblot bands from immunoprecipitates from 3 independent experiments, expressed as a percentage relative to the intensity of the band in WT cells without doxycycline exposure.”
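
  As an illustration of the normalisation described in this legend text, the following minimal Python sketch expresses band intensities as a percentage of a reference lane. The lane labels, the intensity values, and the function names are hypothetical, and the choice to normalise within each experiment before averaging is an assumption that the response does not state.

```python
def percent_of_reference(intensities, reference):
    """Express each band intensity as a percentage of the reference band."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference band intensity must be positive")
    return {lane: 100.0 * value / ref for lane, value in intensities.items()}


def mean_percent(experiments, reference):
    """Normalise each experiment to its own reference lane, then average per lane.

    Assumption: every experiment contains the same set of lanes.
    """
    normalised = [percent_of_reference(exp, reference) for exp in experiments]
    return {
        lane: sum(n[lane] for n in normalised) / len(normalised)
        for lane in normalised[0]
    }


# Hypothetical densitometry values (arbitrary units) from 3 experiments.
experiments = [
    {"WT -DOX": 200.0, "WT +DOX": 100.0},
    {"WT -DOX": 400.0, "WT +DOX": 300.0},
    {"WT -DOX": 100.0, "WT +DOX": 25.0},
]
print(mean_percent(experiments, "WT -DOX"))
```

  Normalising within each experiment removes blot-to-blot differences in exposure before the replicates are combined; averaging raw intensities first would be an equally plausible reading of the legend and would give different numbers.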
  
  (7) In the results section (lines 117-124), please explicitly state whether the described mutations are homo- or heterozygous.
  
  All mutations are heterozygous, as now explicitly stated
  
  (8) I recommend spelling out the SHORT and APDS2 acronyms in the abstract to make this study more accessible.
  
  We respectfully disagree that such spelling out in the abstract would improve accessibility. Both acronyms are clunky and wordy and are more likely to obscure meaning by squeezing out other words in the abstract. APDS is already spelled out in the introduction, and we now add the following for SHORT syndrome:
  
  “More surprisingly, phenotypic overlap is reported between APDS2 and SHORT syndrome. SHORT syndrome, named for the characteristic developmental features (Short stature, Hyperextensibility, Hernia, Ocular depression, Rieger anomaly, and Teething delay) is caused by loss of PI3Ka function due to disruption of the phosphotyrosine-binding C-terminal SH2 domain (Chudasama et al, 2013; Dyment et al, 2013; Thauvin-Robinet et al, 2013).”
  
  (9) I recommend explaining in more detail or rewording the following jargon/terms to make the writing more accessible to a broad audience: "reduced linear growth" (line 83) and "larger series" (line 86). I assume "reduced linear growth" is height.
  
  Edited as follows:
  
  “It features short stature, insulin resistance, and dysmorphic features (Avila et al, 2016). In recent years, both individual case reports (Bravo Garcia-Morato et al, 2017; Petrovski et al, 2016; Ramirez et al, 2020; Szczawinska-Poplonyk et al, 2022) and larger case series (Elkaim et al, 2016; Jamee et al, 2020; Maccari et al, 2023; Nguyen et al, 2023; Olbrich et al, 2016; Petrovski et al., 2016) have established that many people with APDS2 have overt features of SHORT syndrome, while, more generally, linear growth impairment is common in APDS2, but not in APDS1.”
  
  (10) For clarity, reword lines 214-215 to read, "No increase in p110α levels was seen on conditional overexpression of wild-type or R649W p85α."
  
  Change made, thank you
  
  (11) Figure 6A - Western blot label says, "657X" instead of "Y657X."
  
  Now corrected
  
  (12) Lines 214-215: For clarity, reword the sentence to say, "No increase in p110α was seen on conditional overexpression...".
  
  REPEAT OF POINT 10 ABOVE
  
  (13) Clarify what interactions are being competed for in the following statement: "... delta Ex11 may exert its inhibitory action by competing with PI3K holoenzyme" (lines 237-238). Are you referring to the interaction between p110α and p85α or the interaction between p110α/p85α and another protein?
  
  We have endeavoured to clarify by editing as follows:
  
  “As APDS2 p85a DEx11 does not appear to displace wild-type p85a from p110a despite strong overexpression, it is likely that there are high levels of truncated p85a unbound to p110a in the cell. This may be important, as p85a mediates recruitment of PI3K to activated tyrosine kinase receptors and their tyrosine phosphorylated substrates, including the insulin-receptor substrate proteins Irs1 and Irs2. Excess free regulatory subunits compete with heterodimeric PI3K holoenzyme for binding to these phosphotyrosines (Ueki et al., 2002), raising the possibility that excess free, truncated APDS2 p85a DEx11 may exert its inhibitory action similarly by outcompeting PI3K holoenzyme for phosphotyrosine binding.”
  
  (14) Provide more information about the following statement and how it relates to the mutations in this study: "Homozygous truncating PIK3R1 mutations abolishing p85α expression while preserving p55α and p50α produce agammaglobulinaemia" (lines 271-272). The manuscript would benefit from a more explicit description of the nature of these mutations.
  
  This wording seems to us to be explicit, however we agree that a schematic of PIK3R1 genotype-phenotype correlation, as requested elsewhere, would help readers. Such a schematic is now included as Figure 1 figure supplement 1.
  
  (15) Typo on line 299: "unclike".
  
  Corrected.
  
  (16) The data presented in this study support a model in which p85α (DExon 11) expression functions as a dominant negative. Please clarify why in the discussion section you explain that p85α (DExon 11) activates PI3K. For example, "...skipping of exon 11, were shown in 2014 to activate PI3K..." (lines 290-291), "...activate PI3Kδ on one hand..." (line 309); "...APDS2 mutations in PIK3R1 has mixed consequences, producing greater hyperactivation of p110δ than p110α" (lines 354-355).
  
  We do not entirely understand the reviewer’s question and thus request here. p85α (DExon 11) activates PI3Kd in immune cells and in vitro, and this is accepted, based on numerous reports, to be the mechanism underlying immunodeficiency. We do not challenge this, and cite evidence for any such claims in our report. The dominant negative activity we describe here towards PI3Ka activation is based not on inhibition of mutant-containing heterodimer, but rather on destabilisation of and/or competition with heterodimeric WT holoenzyme. This is the basis of the model we present; that is, a finely balanced competition between enzymic activation and mutant holoenzyme destabilisation and competition of mutant free p85a with WT holoenzyme, whose net effect likely differs among cells and tissues, most likely based on the repertoire and proportions of PI3K subunit expression. If the reviewer has specific suggestions for us that will make this point clearer still we should be happy to consider them.
  
  (17) Provide references for the statements in lines 349-353 of the discussion.
  
  This brief closing paragraph is a succinct recap and summary of the key points made throughout the manuscript and thoroughly referenced therein. We prefer to keep this section clean to maximise clarity, but are happy to copy references from the various other places in the manuscript to back up these assertions if this is preferred by the editorial team. Current text:
  
  “In summary, it is already established that: A. genetic activation of PIK3CD causes immunodeficiency without disordered growth, while B. inhibition of PIK3R1 recruitment to RTKs and their substrates impairs growth and insulin action, without immunodeficiency, despite all catalytic subunits being affected and C. loss of p85 alone causes immunodeficiency.”
  
  Reviewer #2 (Recommendations For The Authors):
  
  In the abstract, line 42, I would rather speak of SHORT syndrome-like features.
  
  Some patients do indeed meet the criteria for SHORT syndrome, but there is a spectrum. We have thus added this qualification and removed “short stature” to maintain the word count, as this is itself a SHORT syndrome-like feature.
  
  Line 74 It would be helpful for the reader to give the amino-acid exchange and affected position of this single case.
  
  We agree. Now added.
  
  Furthermore, an illustration indicating the location of the different PIK3R1 variants on the p85 alpha level would be helpful for the reader.
  
  As noted above such a figure element is now included as Figure 1 figure supplement 1 and duly called out in the text
  
  The sentence in lines 298-300 makes no sense to me. Do you mean, unlike APDS1 murine models?
  
  We agree, on review, that this paragraph is convoluted and makes a simple observation complex. We have rewritten now in what we hope is a more accessible style:
  
  “Thus, study of distinct PIK3R1-related syndromes shows that established loss-of-function PIK3R1 mutations produce phenotypes attributable selectively to impaired PI3Ka hypofunction, while activating mutations produce phenotypes attributable to selectively increased PI3Kd signalling. Indeed, not only do such activating mutations not produce phenotypes attributable to PI3Ka activation, but they surprisingly have features characteristic of impaired PI3Ka function.”
  
  Line 321 I propose including the notion of different cells: “The balance between expression and signalling in different cells may be a fine one ...”
  
  This change has been made
  
  Line 352 C. loss replace with complete loss.
  
  “C.” actually denotes the last in a list after “A.” and “B.”. We have now used bold to emphasise this, but we imagine house style may dictate how we approach this.
  
URL

biorxiv.org/content/10.1101/2023.11.02.565250v3
www.biorxiv.org

An image segmentation method based on the spatial correlation coefficient of Local Moran’s I - identification of A-type potassium channel clusters in the thalamus

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The study describes a new computational method for unsupervised (i.e., non-artificial intelligence) segmentation of objects in grayscale images that contain substantial noise, to differentiate object, no object, and noise. Such problems are important in biology because they are commonly encountered in the analysis of microscope images of biological samples, and they have recently been addressed by artificial intelligence, especially by deep neural networks. However, training artificial intelligence for specific sample images is a difficult task and not every biological laboratory can handle it. Therefore, the proposed method is particularly appealing to laboratories with little computational background. The method was shown to achieve better performance than a threshold-based method for artificial and natural test images. To demonstrate the usability, the authors applied the method to high-power confocal images of the thalamus for the identification and quantification of immunostained potassium ion channel clusters formed in the proximity of large axons in the thalamic neuropil and verified the results in comparison to electron micrographs.
  
  Strengths:
  
  The authors claim that the proposed method has higher pixel-wise accuracy than the threshold-based method when applied to gray-scale images with substantial noise.
  
  Since the method does not use artificial intelligence, training and testing are not necessary, which would be appealing to biologists who are not familiar with machine learning technology.
  
  The method does not require extensive tuning of adjustable parameters (trying different values of "Moran's order") given that the size of the object in question can be estimated in advance.
  
  We appreciate the positive assessment of our approach.
  
  Weaknesses:
  
  It is understood that the strength of the method is that it does not depend on artificial intelligence and therefore the authors wanted to compare the performance with another non-AI method (i.e. the threshold-based method; TBM). However, the TBM used in this work seems too naive to be fairly compared to the expensive computation of "Moran's I" used for the proposed method. To provide convincing evidence that the proposed method advances object segmentation technology and can be used practically in various fields, it should be compared to other advanced methods, including AI-based ones, as well.
  
  Protein localization studies revealed that protein distributions are frequently inhomogeneous in a cell. This is very common in neurons which are highly polarized cell types with distinct axo-somato-dendritic functions. Moreover, due to the nature of the cell-to-cell interactions among neurons (e.g. electrical and chemical synapses) the cell membrane includes highly variable microdomains with unique protein assemblies (i.e. clusters). Protein clusters are defined as membrane segments with higher protein densities compared to neighboring membrane regions. However, protein density can continuously change between “clusters” and “non-clusters”. As a consequence, differentiating proteins involved vs not involved in clusters is a challenging task. Indeed, our analysis showed that the boundaries of protein clusters varied remarkably when 23 human experts delineated them.
  
  Although protein clusters can only be vaguely defined, numerous studies have demonstrated the functional relevance of inhomogeneous protein distribution. Thus, there is a clear need for an observer-independent, “operative” segmentation method that can be applied and compared among different conditions and specimens. The strength of the Moran’s I analysis we propose here, as pointed out by our reviewers and editors, is that it can extract the relevant signals from images generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.
  
  In AI-based analysis the ground truth is known to an observer, and the AI learns from a large training set to extract the relevant information for image segmentation. As outlined above, however, the “ground truth” cannot be unequivocally defined for protein clusters. There is no doubt that, with sufficient investment of resources, an AI-based analysis of the same problem could be developed. In our view, however, generating a training set from hundreds of images examined by many experts may not be feasible in an average laboratory setting. Moreover, generalization from one training set to another set of clusters, and robustness to noise or to different levels of background, could not be guaranteed either.
  
  This method was claimed to be better than the TBM when the noise level was high. Related to the above, TBMs can be used in association with various denoising methods as a preprocess. It is questionable whether the claim is still valid when compared to the methods with adequate complexity used together with denoising. Consider for example, Weigert et al. (2018) https://doi.org/10.1038/s41592-018-0216-7; or Lehtinen et al (2018) https://doi.org/10.48550/arXiv.1803.04189.
  
  In Weigert et al., the AI was trained with high-quality images of the same object obtained with extreme photon exposure in a confocal microscope. As delineated above, AI systems cannot be used for such purposes without training. The Lehtinen paper is unfortunately no longer available at this doi.
  
  We must emphasize that in our work we did not intend to compare the image segmentation method based on local Moran’s I with all other available segmentation techniques. Rather, we wanted to demonstrate a straightforward method of grouping pixels that have similar intensities and lie in spatial proximity, which does not require a priori knowledge of the objects. We used TBM to benchmark the method. We agree that with more advanced TBM methods the difference between Moran’s and TBM might have been smaller. The critical point, however, is that even the most advanced TBM requires an artificial threshold to be defined. The optimal threshold may change from sample to sample depending on the experimental conditions, which makes quantification questionable. Moran’s method overcomes this problem and allows more objective segmentation of images even if the exact conditions (background labeling, noise, intensity, etc.) are not identical among the samples.
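
The grouping of pixels by intensity and spatial proximity described above can be illustrated with a short sketch. This is a minimal plain-Python illustration under stated assumptions, not the authors' MATLAB script: the function names `local_morans_i` and `segment`, the equal-weight square neighbourhood of half-width `order` (standing in for "Moran's order"), and the rule "positive local I and above-mean intensity" are all choices made here for illustration.

```python
def local_morans_i(image, order=1):
    """Return a grid of local Moran's I values for a 2D list of intensities.

    Assumptions (illustrative): neighbours are all pixels within a square
    window of half-width `order` (centre excluded), weights are equal and
    row-standardized, and deviations are taken from the global image mean.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(sum(r) for r in image) / n
    # Second moment (population variance) of the whole image.
    m2 = sum((v - mean) ** 2 for r in image for v in r) / n
    if m2 == 0:
        # A perfectly flat image has no spatial structure to measure.
        return [[0.0] * cols for _ in range(rows)]

    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dev_sum, count = 0.0, 0
            for di in range(-order, order + 1):
                for dj in range(-order, order + 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                        dev_sum += image[ni][nj] - mean
                        count += 1
            # Average deviation of the neighbourhood (the "spatial lag").
            lag = dev_sum / count if count else 0.0
            result[i][j] = (image[i][j] - mean) / m2 * lag
    return result


def segment(image, order=1):
    """Keep pixels that are brighter than the mean AND sit among bright neighbours."""
    rows, cols = len(image), len(image[0])
    mean = sum(sum(r) for r in image) / (rows * cols)
    moran = local_morans_i(image, order)
    return [[1 if moran[i][j] > 0 and image[i][j] > mean else 0
             for j in range(cols)] for i in range(rows)]
```

In this sketch an isolated bright pixel receives a negative local I, because its neighbours are darker than the image mean, and is discarded without any intensity threshold being chosen by the user. The nested loops also make the cost visible: the work per pixel grows with the number of neighbours in the window.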
  
  The computational complexity of the method, determined by the convolution matrix size (Moran's order), linearly increases as the object size increases (Fig. S2b). Given that the convolution must be run separately for each pixel, the computation seems quite demanding for scale-up, e.g. when the method is applied for 3D image volumes. It will be helpful if the requirement for computer resources and time is provided.
  
  Here we provide the required data concerning the hardware and the computational time:
  
  Hardware used for performing the analysis:
  
  Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4-core CPU, 64GB RAM, NVIDIA GeForce GTX 1080 graphics card.
  
  MATLAB R2021b software was used for implementation.
  
  Author response table 1.
  
  Computation times:
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The manuscript by David et al. describes a novel image segmentation method, implementing Local Moran's method, which determines whether the value of a datapoint or a pixel is randomly distributed among all values, in differentiating pixel clusters from the background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that implementation of Local Moran's method in image segmentation is superior to threshold-based segmentation methods commonly used in analyzing confocal images in neuroanatomical studies.
  
  Strengths:
  
  Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.
  
  Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy.
  
  The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.
  
  We are grateful for the constructive assessment of our results.
  
  Weaknesses:
  
  While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computation tool that could facilitate easy implementation of the method by other researchers. Without a user-friendly tool, such as an ImageJ plugin or a code, the use of the method developed by David et al by other investigators may remain limited.
  
  The code for the analysis is now available online as a user-friendly MATLAB script at: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m
  
  Recommendations for the authors:
  
  Summary of reviews:
  
  Both reviewers acknowledge the potential significance and practicality of the newly proposed image segmentation method. This method uses Local Moran's principles, offering an advantage over traditional intensity thresholding approaches by providing more sensitivity, particularly in reducing background noise and preserving biologically relevant pixels.
  
  Strengths Highlighted:
  
  • The proposed method can provide more accurate results, especially for grayscale images with significant noise.
  
  • The method is not dependent on artificial intelligence, making it appealing for researchers with minimal computational background.
  
  • The approach can operate without the need for extensive tuning, given that the size of the object is known.
  
  • Several proof-of-concept experiments were carried out, revealing the effectiveness of the method in comparison with the threshold-based segmentation methods.
  
  • The manuscript is clear in terms of methodology, and the results are supported by effective illustrations and references.
  
  Weaknesses Noted:
  
  • The study lacked a comparative analysis with advanced segmentation methods, especially those that employ artificial intelligence.
  
  See our response above to the same question of Reviewer 1.
  
  • There are concerns about computational complexity, especially when dealing with larger data sets or 3D image volumes.
  
  See our response about the calculations of computation times above to the similar question of Reviewer 1.
  
  • Both reviewers noted the absence of a data/code availability statement in the manuscript, which might restrict the method's adoption by other researchers.
  
  The code availability is provided now.
  
  • Reviewer 2 suggested that some results, particularly related to Kv4.2 in the thalamus, might be better presented in a separate study due to their significance.
  
  We thank our reviewers for this suggestion. We carefully evaluated the pros and cons of publishing the Kv4.2 data separately. We finally decided to keep the segmentation and experimental data together for the following reason. We believe that the ultrastructural localization provides strong experimental proof for the relevance of our novel segmentation method. In order to make the potassium channel data more visible, we added a subtitle to the title. In this manner, we think scientists interested in the imaging method as well as those interested in the neurobiology will both find and cite the paper. The new title now reads:
  
  “An image segmentation method based on the spatial correlation coefficient of Local Moran’s I - identification of A-type potassium channel clusters in the thalamus.”
  
  Reviewer Recommendations:
  
  (1) Provide details about the data and program code availability.
  
  See our response above
  
  (2) Offer practical recommendations and provide clarity on software packages and coding for the proposed method to enhance its adoption.
  
  Done.
  
  (3) Consider presenting the findings about Kv4.2 in the thalamus separately as they hold significant importance on their own.
  
  See our response above
  
  Given the reviews, the proposed image segmentation method presents a promising advancement in the domain of image analysis. The technique offers tangible benefits, especially for researchers dealing with biological microscopy data. However, for this method to see a broader application, it's imperative to provide clearer practical guidance and make data or code easily accessible. Additionally, while the findings regarding Kv4.2 in the thalamus are intriguing, they might achieve more impact if detailed in a dedicated paper.
  
  Reviewer #1 (Recommendations For The Authors):
  
  The availability of data or program code was not stated in the manuscript.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) While the principles of the method are explained clearly in a step-by-step fashion in the Methods section, the practical aspects of running sequential computations over a large matrix of pixel values are not well described. It would be very useful if the authors could provide recommendations on how to set the data structure and clarify which software and programming package for Local Moran's analysis they used. In addition, providing the code for the sequential implementation described in the Methods section would facilitate the adoption of the method by other researchers, and thus, the impact of the study. Currently, there is no data or code availability statement included in the manuscript.
  
  See our response above.
  
  (2) Figure 4 illustrates an experiment in which transmission electron microscopy and freeze-fracture replica labeling approaches were used to demonstrate that a potassium channel marker, Kv4.2, was selective to synapses forming on larger caliber dendrites in the thalamus. As impressive as the EM approaches utilized in this figure are, the results of this experiment have a somewhat tangential bearing on the segmentation method that is the focus of this study. In fact, the experiments illustrated in Figure 5, dual immuno-EM, are more than sufficient to confirm what the dual-confocal imaging coupled with Local Moran's segmentation analysis reveals. Furthermore, the authors' findings about the localization and selectivity of Kv4.2 in the thalamus are too important and exciting to bury in a paper focusing on the methodology. Those results may have a wider impact if they are presented and discussed in a separate experimental paper.
  
  See our response above
  
biorxiv.org/content/10.1101/2023.05.02.539063v2

New submission 31/10/2023, 10:01:49

1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1
  
  The study provides a complete comparative interactome analysis of α-arrestins in both humans and Drosophila. The authors have presented interactomes of six human and twelve Drosophila α-arrestins using affinity purification/mass spectrometry (AP/MS). The constructed interactomes helped to find α-arrestin binding partners through common protein motifs. The authors have used bioinformatic tools and experimental data in human cells to identify the roles of TXNIP and ARRDC5: the TXNIP-HDAC2 interaction and the ARRDC5-V-type ATPase interaction. The study reveals the PPI network for α-arrestins and examines the functions of α-arrestins in both humans and Drosophila.
  
  Comments
  
  I would like to congratulate the authors and the corresponding authors of this manuscript for bringing together such an elaborate study on α-arrestin and conducting a comparative study in Drosophila and humans.
  
  Introduction:
  
  The introduction provides a rationale behind why the comparison between humans and Drosophila is carried out.
  
  • Even though this is a research manuscript, including existing literature on similar comparison of α-arrestin from other articles will invite a wide readership.
  
  Results:
  
  The results cover all the necessary points concluded from the experiments and computational analysis.
  
  1) The authors could point out the similarity of the α-arrestins in humans and Drosophila. When comparing α-arrestins between the two species, the percentage homology between the human and Drosophila α-arrestins needs to be calculated.
  
  Thank you for your insightful feedback. As suggested by the reviewer, we determined the percentage homology of α-arrestin protein sequences from human and Drosophila using Clustal Omega. This homology is now illustrated as a heatmap in revised Figure S5. Please note that only values with a percentage homology of 40% or higher are labeled.
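
As a rough illustration of what a pairwise percentage value of this kind measures, the sketch below computes percent identity from two rows of an existing alignment and assembles a name-by-name matrix of the sort a heatmap would display. It rests on assumptions made here: "homology" is treated as percent identity, the denominator is the number of columns where neither sequence has a gap (tools differ on this choice), and the function names are hypothetical. It does not reproduce Clustal Omega's own calculation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a symmetric name-by-name matrix of pairwise percent identities.

    `aligned` maps a sequence name to its row in one multiple alignment.
    """
    names = list(aligned)
    return {p: {q: percent_identity(aligned[p], aligned[q]) for q in names}
            for p in names}
```

Given such a matrix, labeling only the cells at or above a cutoff (40% in the response above) is a simple filter on the values.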
  
  • Citing the direct connecting genes from the network in the text will invite citations and a wider readership.
  
  Figures:
  
  The images are elaborate and well-made.
  
  2) The authors could use a directly connected gene-gene network that shows the interactions. This could be used by other readers working on the same topic and would ensure reproducibility and citations.
  
  We appreciate your valuable comment. Based on the reviewer’s suggestion, we have developed a new website in which one can navigate the gene-gene networks of α-arrestins. These directly connected gene-gene networks are housed in the Network Data Exchange (NDEx) project. Additionally, we have included gene ontology and protein class details for α-arrestins’ interactors in this set of networks, offering a more comprehensive view of α-arrestins’ interactomes.
  
  On page 24 lines 15-18, we have revised the manuscript to introduce the newly developed website, as follows.
  
  “Lastly, to assist the research community, we have made comprehensive α-arrestin interactome maps on our website (big.hanyang.ac.kr/alphaArrestin_PPIN). Researchers can search and download their interactomes of interest as well as access information on potential cellular functions and protein class associated with these interactomes.”
  
  3-1) The co-expression interactions represented as figures should reveal the interactions between α-arrestin and other genes. Which genes within each sub-network does α-arrestin interact with? The arrows only point at the sub-networks, so the figures do not reveal these interactions. Kindly show the interactions in the figure with the proper nodes.
  
  3-2) Figure 2: the networks shown for both human and Drosophila are well represented. The green lines from α-arrestin indicate the strength of the interaction. Several smaller expression networks are seen. But "α-arrestin" in both organisms seems highly disconnected from all the genes. Connected genes have edges, not arrows. Showing α-arrestin connected to these gene-gene networks would help in identifying which genes connect with which gene through α-arrestin. This could be used by other readers working on the same topic and would ensure reproducibility and citations.
  
  Thank you for your valuable comment. In response to the reviewer’s recommendation, we’ve added a supplementary figure, Figure S4, which illustrates direct interactions between α-arrestins and protein components of clustered complexes (or sub-networks), in addition to the associations between α-arrestins and the clustered complexes shown in Figure 2. We believe that this newly incorporated information regarding direct protein interactions will invite citations and a wider readership, as the reviewer pointed out.
  
  On page 12 line 27 to page 13 line 5, we have revised the manuscript to cite the direct interactions between ARRDC3 and proteins involved in ubiquitination-dependent proteolysis, as follows.
  
  “While the association of ARRDC3 with these ubiquitination-dependent proteolysis complexes is statistically insignificant, ARRDC3 does interact with individual components of these complexes such as NEDD4, NEDD4L, WWP1, and ITCH (Figure S4A). This suggest their functional relevance in this context, as previously reported in both literatures and databases (Nabhan et al., 2010; Shea et al., 2012; Szklarczyk et al., 2015; Warde-Farley et al., 2010) (Puca & Brou, 2014; Xiao et al., 2018).”
  
  Direct interactions between α-arrestins and protein components of clustered complexes are illustrated in the newly added figure, Figure S4.
  
  4-1) Figure 4. The Protein blot image was blurred. Kindly provide a higher-resolution image.
  
  4-2) Figure 5. B. - The authors could provide higher-resolution blot images. The bands were not visible.
  
  We appreciate your valuable comment. Unfortunately, the protein blot image was scanned from the original film, and the images we provided in the figure represent the highest resolution that we have obtained to date. Raw, uncropped images are shown in Author response image 1 and 2.
  
  Author response image 1.
  
  Raw image of Figure 4B
  
  Author response image 2.
  
  Raw image of Figure 5B
  
  5) Figure: 5. A. - I see non-specific amplifications in the gel images. Are these blotting images? or the gel images that were changed to "Grayscale"? Non-specific amplification may imply that the experiment was not repeated and standardized. Was it gel images or blot images?
  
  We appreciate your insightful comment. The images in Figure 5A represent western blot bands from a co-immunoprecipitation assay analyzing the interaction between TXNIP and HDAC2 proteins. Since immunoblotting of immunoprecipitates can usually detect some non-specific bands from the heavy (~50 kDa) and light (~25 kDa) chains of the target antibody or from multiple co-immunoprecipitated proteins, we assume that the vague non-specific bands in Figure 5A might be the heavy chain of the TXNIP or HDAC2 antibody or another non-specific band. Because the target bands showed strong intensity and a very clear pattern compared to the non-specific bands in the co-immunoprecipitation assay, we believe that this data is sufficient to support the interaction of TXNIP with HDAC2. Finally, in the revised Figure 5A, we’ve modified the labeling for the different experimental conditions, namely siCon and siTXNIP treatments, and added the expected sizes of the proteins (kDa), as shown below.
  
  6) Figure 5. A. RT-PCR analysis: What was your expected size of the amplifications? The ladder indicated is in kDa. Is that right?
  
  We appreciate your insightful questions. As mentioned above, Figure 5A shows the blotting images of co-immunoprecipitation analysis, and the ladder indicates the molecular weight (kDa) of protein markers. For clearer interpretation, the expected size of target proteins has been added in Figure 5A in the revised manuscript.
  
  7) How were the band intensities determined?
  
  Thank you for your question. For quantification of immunoblot results, the densities of target protein bands were analyzed with ImageJ, as we described in the Materials and Methods.
  
  Discussion:
  
  The authors have utilized and discussed the conclusion they draw from their study. But could highlight more on ARRDCs and why it was selected out of the other arrestins. The authors have provided future work directions associated with their work.
  
  8) Why were only ARRDCs presented amongst all the arrestins in the main part of the manuscript?
  
  We’re grateful for your valuable feedback. We focused on α-arrestins because they were discovered relatively recently compared with the more established visual/β-arrestin proteins of the same arrestin family, and the biological functions of many α-arrestins remain largely unexplored, with notable exceptions in the budding yeast model and a few α-arrestins in mammals and invertebrate species. Most importantly, a comparative study highlighting the shared or unique features of α-arrestins is yet to be undertaken. To gain a more comprehensive understanding of these unexplored α-arrestins across multiple species, we’ve centered our research on the ARRDCs within the arrestin protein family.
  
  On page 21 lines 8-17, we’ve edited the manuscript to emphasize the importance of a comparative study on α-arrestins, as detailed below.
  
  “According to a phylogenetic analysis of arrestin family proteins, α-arrestins were shown to be ubiquitously conserved from yeast to human (Alvarez, 2008). However, compared to the more established visual/ β-arrestin proteins, α-arrestins have been discovered more recently and much of their molecular mechanisms and functions remain mostly unexplored except for budding yeast model (Zbieralski & Wawrzycka, 2022). Based on the high-confidence interactomes of α-arrestins from human and Drosophila, we identified conserved and specific functions of these α-arrestins. Furthermore, we uncovered molecular functions of newly discovered function of human specific α-arrestins, TXNIP and ARRDC5. We anticipate that the discovery made here will enhance current understanding of α-arrestins.”
  
  9) The discussion could be elaborated more by utilizing the data.
  
  We appreciate your insightful feedback. Based on the reviewer’s suggestion, we’ve enhanced the discussion in the manuscript to provide a clearer interpretation of our results. First, we’ve added a description of the conserved protein complexes significantly associated with α-arrestins, stated on page 22 lines 5-12 and lines 23-26.
  
  Page 22 lines 5-12: “The integrative map of protein complexes also highlighted both conserved and unique relationships between α-arrestins and diverse functional protein complexes. For instance, protein complexes involved in ubiquitination-dependent proteolysis, proteasome, RNA splicing, and intracellular transport (motor proteins) were prevalently linked with α-arrestins in both human and Drosophila. To more precisely identify conserved PPIs associated with α-arrestins, we undertook ortholog predictions within the α-arrestins’ interactomes. This revealed 58 orthologous interaction groups that were observed to be conserved between human and Drosophila (Figure 3).”
  
  Page 22 lines 23-26: “Additionally, interaction between α-arrestins and entities like motor proteins, small GTPase, ATP binding proteins, and endosomal trafficking components were identified to be conserved. Further validation of these interactions could unveil molecular mechanisms consistently associated with these cellular functions.”
  
  Secondly, we’ve added a description of the role of ARRDC5 in osteoclast maturation, as stated on page 23 lines 22-24.
  
  “Conversely, depletion of ARRDC5 reduces osteoclast maturation, underscoring the pivotal role of ARRDC5 in osteoclast development and function (Figure S9A and B).”
  
  Lastly, we examined the association between α-arrestins’ interactomes and human diseases, incorporating our findings into the discussion. The newly introduced figure based on the result is Figure S10.
  
  On page 24 lines 10-14, we’ve added discussion on Figure S10 as follows.
  
  “We further explored association between α-arrestins’ interactomes and disease pathways (Figure S10). Notably, the interactomes of α-arrestins in human showed clear links to specific diseases. For instance, ARRDC5 is closely associated with disease resulting from viral infection and cardiovascular conditions. ARRDC2, ARRDC4, and TXNIP share common association with certain neurodegenerative diseases, while ARRDC1 is implicated in cancer.”
  
  Supplementary figures:
  
  The authors have a rigorous amount of work added together for the success of this manuscript.
  
  10) The reference section needs editing before publication. Maybe the arrangement was disturbed during compiling.
  
  Thank you for your valuable comment. Based on the reviewer’s suggestion, we have rearranged the reference section to enhance its clarity. Below are excerpts from the updated reference section in the manuscript.
  
  “Adenuga, D., & Rahman, I. (2010). Protein kinase CK2-mediated phosphorylation of HDAC2 regulates co-repressor formation, deacetylase activity and acetylation of HDAC2 by cigarette smoke and aldehydes. Arch Biochem Biophys, 498(1), 62-73. doi:10.1016/j.abb.2010.04.002
  
  Adenuga, D., Yao, H., March, T. H., Seagrave, J., & Rahman, I. (2009). Histone Deacetylase 2 Is Phosphorylated, Ubiquitinated, and Degraded by Cigarette Smoke. American Journal of Respiratory Cell and Molecular Biology, 40(4), 464-473. doi:10.1165/rcmb.2008-0255OC
  
  Akalin, A., Franke, V., Vlahovicek, K., Mason, C. E., & Schubeler, D. (2015). Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics, 31(7), 1127-1129. doi:10.1093/bioinformatics/btu775
  
  Alvarez, C. E. (2008). On the origins of arrestin and rhodopsin. BMC Evol Biol, 8, 222. doi:10.1186/1471-2148-8-222”
  
  11) Many important references were missing.
  
  We appreciate and agree with the reviewer’s comment. In response to the reviewer’s recommendation, we’ve thoroughly reviewed the manuscript and below are sections of the manuscript where around 20 new references have been added.
  
  On page 8 lines 12-14:
  
  “Utilizing the known affinities between short linear motifs in α-arrestins and protein domains in interactomes (El-Gebali et al., 2019; UniProt Consortium, 2018)”
  
  On page 8 lines 19-22:
  
  “One of the most well-known short-linear motifs in α-arrestin is PPxY, which is reported to bind with high affinity to the WW domain found in various proteins, including ubiquitin ligases (Ingham, Gish, & Pawson, 2004; Macias et al., 1996; Sudol, Chen, Bougeret, Einbond, & Bork, 1995)”
  
  On page 9 lines 3-6:
  
  “Next, we conducted enrichment analyses of Pfam proteins domains (El-Gebali et al., 2019; Huang da, Sherman, & Lempicki, 2009b) among interactome of each α-arrestin to investigate known and novel protein domains commonly or specifically associated (Figure S3A; Table S5).”
  
  On page 9 lines 7-10:
  
  “HECT and C2 domains are well known to be embedded in the E3 ubiquitin ligases such as NEDD4, HECW2, and ITCH along with WW domains (Ingham et al., 2004; Melino et al., 2008; Rotin & Kumar, 2009; Scheffner, Nuber, & Huibregtse, 1995; Weber, Polo, & Maspero, 2019)”
  
  On page 10 lines 12-16:
  
  “In fact, the known binding partners, NEDD4, WWP2, WWP1, and ITCH in human and CG42797, Su(dx), Nedd4, Yki, Smurf, and HERC2 in Drosophila, that were detected in our data are related to ubiquitin ligases and protein degradation (C. Chen & Matesic, 2007; Ingham et al., 2004; Y. Kwon et al., 2013; Marin, 2010; Melino et al., 2008; Rotin & Kumar, 2009) (Figure 1E; Figure S2F).”
  
  On page 13 lines 20-21:
  
  “Given that α-arrestins are widely conserved in metazoans (Alvarez, 2008; DeWire, Ahn, Lefkowitz, & Shenoy, 2007),”
  
  On page 14 lines 12-17:
  
  “The most prominent functional modules shared across both species were the ubiquitin-dependent proteolysis, endosomal trafficking, and small GTPase binding modules, which are in agreement with the well-described functions of α-arrestins in membrane receptor degradation through ubiquitination and vesicle trafficking (Dores et al., 2015; S. O. Han et al., 2013; Y. Kwon et al., 2013; Nabhan et al., 2012; Puca & Brou, 2014; Puca et al., 2013; Shea et al., 2012; Xiao et al., 2018; Zbieralski & Wawrzycka, 2022) (Figure 3).”
  
  Reviewer #2
  
  In this manuscript, the authors present a novel interactome focused on human and fly alpha-arrestin family proteins and demonstrate its application in understanding the functions of these proteins. Initially, the authors employed AP/MS analysis, a popular method for mapping protein-protein interactions (PPIs) by isolating protein complexes. Through rigorous statistical and manual quality control procedures, they established two robust interactomes, consisting of 6 baits and 307 prey proteins for humans, and 12 baits and 467 prey proteins for flies. To gain insights into the gene function, the authors investigated the interactors of alpha-arrestin proteins through various functional analyses, such as gene set enrichment. Furthermore, by comparing the interactors between humans and flies, the authors described both conserved and species-specific functions of the alpha-arrestin proteins. To validate their findings, the authors performed several experimental validations for TXNIP and ARRDC5 using ATAC-seq, siRNA knockdown, and tissue staining assays. The experimental results strongly support the predicted functions of the alpha-arrestin proteins and underscore their importance.
  
  I would like to suggest the following analyses to further enhance the study:
  
  1) It would be valuable if the authors could present a side-by-side comparison of the interactomes of alpha-arrestin proteins, both before and after this study. This visual summary network would demonstrate the extent to which this work expanded the existing interactome, emphasizing the overall contribution of this study to the investigation of the alpha-arrestin protein family.
  
  We greatly appreciate your insightful feedback. In response to the reviewer’s suggestion, we’ve depicted a network of known PPIs associated with α-arrestins (Figure S2C and D). Furthermore, by comparing our high-confidence PPIs to these known sets, we found that the overlaps are statistically significant and the high-confidence PPIs of α-arrestins broaden the existing interactome (Figure S2E).
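
One common way to ask whether such an overlap exceeds chance is a one-sided hypergeometric test. The response does not say which test was used, so the sketch below is only an illustration of the general idea: the function name `overlap_p_value` and its arguments are hypothetical, and no counts from the study are used.

```python
from math import comb


def overlap_p_value(universe, known, found, overlap):
    """One-sided hypergeometric tail probability P(X >= overlap).

    Models `found` interactions drawn without replacement from `universe`
    candidate interactions, of which `known` were previously reported.
    """
    upper = min(known, found)
    total = comb(universe, found)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(comb(known, k) * comb(universe - known, found - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

A small tail probability indicates that recovering that many previously reported interactions would be unlikely under random sampling from the chosen universe; the result depends strongly on how that universe of candidate interactions is defined.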
  
  From page 7 line 26 to page 8 line 8, we’ve detailed this side-by-side comparison of the existing interactome and the newly discovered high-confidence PPIs of α-arrestins, as outlined below.
  
  “As a result, we successfully identified many known interaction partners of α-arrestins such as NEDD4, WWP2, WWP1, ITCH and TSG101, previously documented in both literatures and PPI databases (Figure S2C-F) (Colland et al., 2004; Dotimas et al., 2016; Draheim et al., 2010; Mellacheruvu et al., 2013; Nabhan et al., 2012; Nishinaka et al., 2004; Puca & Brou, 2014; Szklarczyk et al., 2015; Warde-Farley et al., 2010; Wu et al., 2013). Additionally, we greatly expanded repertoire of PPIs associated with α-arrestins in human and Drosophila, resulting in 390 PPIs between six α-arrestins and 307 prey proteins in human, and 740 PPIs between twelve α-arrestins and 467 prey proteins in Drosophila (Figure S2E). These are subsequently referred to as ‘high-confidence PPIs’ (Table S3).”
  
  2) While the authors conducted several analyses exploring protein function, there is a need to further explore the implications of the interactome in human diseases. For instance, it would be beneficial to investigate the association of the newly identified interactome members with specific human diseases. Including such investigations would strengthen the link between the interactome and human disease contexts.
  
  Thank you for your valuable comment. As suggested by the reviewer, we examined the association between α-arrestins’ interactomes and human diseases, incorporating our findings into the discussion. The newly introduced figure based on the result is Figure S10.
  
  On page 24 lines 10-14, we’ve added discussion on Figure S10 as follows.
  
  “We further explored association between α-arrestins’ interactomes and disease pathways (Figure S10). Notably, the interactomes of α-arrestins in human showed clear links to specific diseases. For instance, ARRDC5 is closely associated with disease resulting from viral infection and cardiovascular conditions. ARRDC2, ARRDC4, and TXNIP share common association with certain neurodegenerative diseases, while ARRDC1 is implicated in cancer.”
  
  Reviewer #3:
  
  Lee, Kyungtae and colleagues have discovered and mapped out alpha-arrestin interactomes in both human and Drosophila through affinity purification/mass spectrometry and the SAINTexpress method. They found high-confidence interactomes, consisting of 390 protein-protein interactions (PPIs) between six human alpha-arrestins and 307 prey proteins, as well as 740 PPIs between twelve Drosophila alpha-arrestins and 467 prey proteins. To define and characterize these identified alpha-arrestin interactomes, the team employed a variety of widely recognized bioinformatics tools. These included protein domain enrichment analysis, PANTHER for protein class enrichment, DAVID for subcellular localization analysis, COMPLEAT for the identification of functional complexes, and DIOPT to identify evolutionarily conserved interactomes. Through these analyses, they confirmed known alpha-arrestin interactors' role and associated functions such as ubiquitin ligase and protease. Furthermore, they found unexpected biological functions in the newly discovered interactomes, including RNA splicing and helicase, GTPase-activating proteins, and ATP synthase. The authors carried out further study into the role of human TXNIP in transcription and epigenetic regulation, as well as the role of ARRDC5 in osteoclast differentiation. This study holds important value as the newly identified alpha-arrestin interactomes are likely to aid functional studies of this group of proteins. Despite the overall support from data for the paper's conclusions, certain elements related to data quantification, interpretation, and presentation demand more detailed explanation and clarification.
  
  1) In Figure 1B, it is shown that human alpha-arrestins were N-GFP tagged (N-terminal) and Drosophila alpha-arrestins were C-GFP (C-terminal). However, the rationale of why the authors used different tags for human and fly proteins was not explained in the main text and methods.
  
  We appreciate your valuable comment. Both N- and C-terminally tagged α-arrestins have been used previously. Given that our study aims to increase the repertoire of α-arrestin interacting proteins, where GFP is added might not be a concern. We note that GFP is a relatively bulky tag, and tagging a protein with GFP can potentially abolish the interaction with some of the binding proteins. Follow-up studies utilizing different approaches for detecting protein-protein interactions, such as BioID and yeast two-hybrid, will allow us to build more comprehensive α-arrestin interactomes.
  
  2) In Figure 2A, there seems to be an error for labeling the GAL4p/GAL80p complex that includes NOTCH2, NOTCH1 and TSC2.
  
  Thank you for your comment. We double-checked the COMPLEAT (protein COMPLex Enrichment Analysis Tool) database for the name of the protein complex consisting of NOTCH1, NOTCH2, and TSC2. The database indeed labels this complex as the “GAL4p/GAL80p complex”. However, given the potential for mis-annotation (we could not ascertain the relevance of these proteins to the “GAL4p/GAL80p complex”), we chose to exclude this protein complex from the network. The updated protein complex network is illustrated in the revised Figure 2A.
  
  3) In Figure 5, given that knockdown of TXNIP did not affect the levels and nuclear localization of HDAC2, the authors suggest that TXNIP might modulate HDAC2 activity. However, the ChIP assay suggests a different model - TXNIP-HDAC2 interaction might inhibit the chromatin occupancy of HDAC2, reducing histone deacetylation and increasing global chromatin accessibility. The authors need to propose a model consistent with all of these data.
  
  We greatly appreciate your detailed feedback. Our data indicate a global decrease in chromatin accessibility (Figure 4C-G) and a diminished interaction between TXNIP and HDAC2 under depletion of TXNIP (Figure 5A). Additionally, we observed an increased occupancy of HDAC2 and subsequent histone deacetylation at TXNIP-target promoter regions (Figure 5C) without any changes in the HDAC2 expression level (Figure 5A) in TXNIP-knockdown cells. From these observations, we infer that the interaction between TXNIP and HDAC2 might suppress the function of HDAC2, a major gene silencer that affects the formation of condensed or accessible chromatin through its deacetylating activity. Although we checked whether TXNIP could induce cytosolic retention of HDAC2 to inhibit the nuclear function of HDAC2, TXNIP knockdown did not alter its subcellular localization (Figure 5B).
  
  To elucidate the mechanism by which TXNIP inhibits the function of HDAC2, we further investigated the effect of TXNIP on the levels of HDAC2 phosphorylation, which is known to be crucial for its deacetylase activity and the formation of transcriptional repressive complex. However, as shown in the Figure S8C and D, the knockdown of TXNIP did not affect the HDAC2 phosphorylation status, as well as the interaction between HDAC2 and other components in NuRD complex in the immunoblotting and co-IP assays, respectively. The results suggest that TXNIP may inhibit the function of HDAC2 independently of these factors.
  
  Following the reviewer’s suggestion, we carefully provided a proposed model describing the possible role of TXNIP in transcriptional regulation through interaction with HDAC2 and co-repressor complex in Figure S8E.
  
  Description of these newly added figures can be found in the revised manuscript from page 18 line 7 to 27, as outlined below.
  
  “HDAC2 typically operates within the mammalian nucleus as part of co-repressor complexes as it lacks ability to bind to DNA directly (Hassig, Fleischer, Billin, Schreiber, & Ayer, 1997). The nucleosome remodeling and deacetylation (NuRD) complex is one of the well-recognized co-repressor complexes that contains HDAC2 (Kelly & Cowley, 2013; Seto & Yoshida, 2014) and we sought to determine if depletion of TXNIP affects interaction between HDAC2 and other components in this NuRD complex. While HDAC2 interacted with MBD3 and MTA1 under normal condition, the interaction between HDAC2 and MBD3 or MTA1 was not affected upon TXNIP depletion (Figure S8C). Next, given that HDAC2 phosphorylation is known to influence its enzymatic activity and stability (Adenuga & Rahman, 2010; Adenuga, Yao, March, Seagrave, & Rahman, 2009; Bahl & Seto, 2021; Tsai & Seto, 2002), we tested if TXNIP depletion alters phosphorylation status of HDAC2. The result indicated, however, that phosphorylation status of HDAC2 does not change upon TXNIP depletion (Figure S8D). In summary, our findings suggest a model where TXNIP plays a role in transcriptional regulation independent of these factors (Figure S8E). When TXNIP is present, it directly interacts with HDAC2, a key component of transcriptional co-repressor complex. This interaction suppresses the HDAC2 ‘s recruitment to target genomic regions, leading to the histone acetylation of target loci possibly through active complex including histone acetyltransferase (HAT). As a result, transcriptional activation of target gene occurs. In contrast, when TXNIP expression is diminished, the interaction between TXNIP and HDAC2 weakens. This restores histone deacetylating activity of HDAC2 in the co-repressor complex, leading to subsequent repression of target gene transcription.”
  
  4) The authors showed that ectopic expression of ARRDC5 increased osteoclast differentiation and function. Does loss of ARRDC5 lead to defects in osteoclast function and fate determination?
  
  We appreciate your valuable comment. We have confirmed the endogenous expression of ARRDC5 in osteoclasts and conducted a loss-of-function study using shARRDC5. As determined by qPCR, endogenous ARRDC5 expression was very low in osteoclasts. Even during RANKL-induced osteoclast differentiation, the CT value (29-31) for ARRDC5 expression was high in osteoclasts compared to the CT value (17-24) for the expression of the marker genes Cathepsin K, TRAP, and NFATc1. Even though its endogenous expression was very low, we generated ARRDC5 knockdown cells by infecting BMMs with lentivirus expressing shRNA of ARRDC5 and subsequently differentiated the cells into mature osteoclasts. After five days of differentiation, we observed a significant decrease in the total number of TRAP-positive multinucleated cells (No. of TRAP+ MNCs) in shARRDC5 cells compared to that in the control cells. This result indicates that the loss of ARRDC5 leads to defects in osteoclast differentiation. The result of this loss-of-function study using shARRDC5 is depicted in Figure S9A and B.
  
  In the revised manuscript, the following sentence explaining Figure S9A and B was added on page 19 lines 15-17.
  
  “Depletion of ARRDC5 using short hairpin RNA (shRNA) impaired osteoclast differentiation, further affirming its crucial role in this differentiation process (Figure S9A and B).”
  
  5) From Figure 6D, the authors argued that ARRDC5 overexpression resulted in more V-ATPase signals: however, there is no quantification. Quantification of the confocal images will foster the conclusion. Also, western blots for V-ATPase proteins will provide an alternative way to determine the effects of ARRDC5.
  
  We appreciate your insightful feedback. As suggested by the reviewer, we quantified V-type ATPase signals using confocal images, which were shown in Figure 6D. The ImageJ program was employed for integrated density measurements, and the integrated density of GFP-GFP overexpressing osteoclasts was set to 1 for relative comparison. The result in the revised Figure 6D revealed a significant increase in V-type ATPase signals in GFP-ARRDC5 overexpressing osteoclasts compared to that in GFP-GFP overexpressing osteoclasts, as outlined below.
  
  We also agree with the reviewer’s comment that a western blot for V-ATPase proteins is an alternative way to determine the effects of ARRDC5 in osteoclast differentiation. Using qPCR and western blot analysis, we confirmed that V-type ATPase expression did not differ between GFP-GFP and GFP-ARRDC5 overexpressing osteoclasts. The corresponding western blot result is shown in the revised Figure S9C.
  
  In addition, the corresponding qPCR that measures the expression level of V-type ATPase between GFP-GFP and GFP-ARRDC5 overexpressing osteoclasts is shown in Author response image 3.
  
  Author response image 3.
  
  Moreover, based on the references, the V-type ATPase is localized at the plasma membrane during osteoclast differentiation (Toyomura et al., 2003). Although mRNA and protein expression levels were similar in both cells, localization of V-ATPase in plasma membrane was significantly increased in GFP-ARRDC5 overexpressing osteoclasts compared to that in GFP-GFP osteoclasts, as shown in the revised Figure 6D above.
  
  6) The results from Figure 6D did not support the authors' argument that ARRDC5 might control the membrane localization of the V-ATPase, as bafilomycin is the V-ATPase inhibitor. ARRDC5 knockdown experiments will help to determine whether ARRDC5 can control the membrane localization of the V-ATPase in osteoclast.
  
  Thank you for your insightful comment. V-type ATPase has been reported to play an important role in the differentiation and function of osteoclasts (Feng et al., 2009; Qin et al., 2012). Given that various subunits of the V-type ATPase interact with ARRDC5 (Figure 6A), we speculated that ARRDC5 might be involved in the function of this complex and play a role in osteoclast differentiation and function. As answered above, GFP-ARRDC5 overexpressing osteoclasts showed a similar expression level of V-type ATPase to GFP-GFP cells but exhibited increased V-type ATPase signals at the cell membrane compared to those in GFP-GFP cells (Figure 6D). Additionally, co-localization of ARRDC5 and V-type ATPase was observed in the osteoclast membrane (Figure 6D), as predicted by the human ARRDC5-centric PPI network. On the other side, bafilomycin A1, a V-type ATPase inhibitor, not only blocked localization of V-type ATPase to plasma membrane in GFP-ARRDC5 overexpressing osteoclasts, but also reduced ARRDC5 signals (Figure 6D). These results indicate that ARRDC5 plays a role in osteoclast differentiation and function by interacting with V-type ATPase and promoting the localization of V-type ATPase to plasma membrane in osteoclasts.
  
  V-type ATPase present in osteoclast membrane is important to cell fusion, maturation, and function during osteoclast differentiation (Feng et al., 2009; Qin et al., 2012). GFP-ARRDC5 overexpressing osteoclasts showed a significant increase of V-type ATPase signals in the cell membrane compared to GFP-GFP cells (Figure 6D), and also significantly increased cell fusion (No. of TRAP+ MNCs in Figure 6B) and resorption activity (resorption pit formation in Figure 6C). However, ARRDC5 knockdown in osteoclasts (shARRDC5 cells) showed a significant decrease in No. of TRAP+ MNCs compared to that in the control cells, indicating that the loss of ARRDC5 leads to defects in cell fusion during osteoclast differentiation (Figure S9A and B). As described above, the endogenous expression of ARRDC5 was very low in osteoclasts and could be specifically expressed in a certain timepoint during the differentiation. Therefore, to better understand the interaction with V-type ATPase of ARRDC5 in osteoclasts, ARRDC5 overexpression is more suitable than its knockdown.
  
  Part of the manuscript on page 19 line 21 to page 20 line 6 was edited to support our statement, as outlined below.
  
  “The V-type ATPase is localized at the osteoclast plasma membrane (Toyomura et al., 2003) and its localization is important for cell fusion, maturation, and function during osteoclast differentiation (Feng et al., 2009; Qin et al., 2012). Furthermore, its localization is disrupted by bafilomycin A1, which is shown to attenuate the transport of the V-type ATPase to the membrane (Matsumoto & Nakanishi-Matsui, 2019). We analyzed changes in the expression level and localization of V-type ATPase, especially V-type ATPase V1 domain subunit (ATP6V1), in GFP-GFP and GFP-ARRDC5 overexpressing osteoclasts. The level of V-type ATPase expression did not change in osteoclasts regardless of ARRDC5 expression levels (Figure S9C). GFP signals were detected at the cell membrane when GFP-ARRDC5 was overexpressed, indicating that ARRDC5 might also localize to the osteoclast plasma membrane (Figure 6D; Figure S9D). In addition, we detected more V-type ATPase signals at the cell membrane in the GFP-ARRDC5 overexpressing osteoclasts, and ARRDC5 and V-type ATPase were co-localized at the osteoclast membrane (Figure 6D; Figure S9D).”
  
  7) The tables (excel files) do not have proper names for each table S numbers. Please correct the name of excel files for readers.
  
  We appreciate your valuable comments. In response to the reviewer’s suggestion, we’ve renamed the Excel files with more appropriate titles for easier readability. The renamed tables (Excel files) are listed below.
  
  Table S1. List of α-arrestins from human and Drosophila
  Table S2. Evaluation sets of α-arrestins PPIs
  Table S3. Summary tables of SAINTexpress results
  Table S4. Protein domains and short linear motifs in the α-arrestin interactomes
  Table S5. Enriched Pfam domains in the α-arrestin interactomes
  Table S6. Subcellular localizations of α-arrestin interactomes
  Table S7. Summary of protein complexes and cellular components associated with α-arrestin
  Table S8. Orthologous relationship of α-arrestin interactomes between human and Drosophila
  Table S9. Summary of ATAC- and RNA-seq read counts before and after processing
  Table S10. Differential accessibility of ACRs and gene expression
  Table S11. Summary of ATAC-seq peaks located in promoters and gene expression level
  Table S12. List of primer sequences used in this study
  
  8) http://big.hanyang.ac.kr/alphaArrestin_Fly link does not work. Please fix the link.
  
  We appreciate your comment. In response to the reviewer’s comment, we have made comprehensive α-arrestin interactome maps on our new website (big.hanyang.ac.kr/alphaArrestin_PPIN) and confirmed that users can be re-directed to networks housed in NDEx.
  
  Author response image 4.
  
  Screen shot of the first page of the newly developed website.
  
  Website address: big.hanyang.ac.kr/alphaArrestin_PPIN
  
  Author response image 5.
  
  Screen shot of the gene-gene network involving α-arrestin in human.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.11.544486v3
www.biorxiv.org www.biorxiv.org

Species -Shared and -Unique Gyral Peaks on Human and Macaque Brains

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Public review
  
  Reviewer 1
  
  Zhang et al. tackle the important topic of primate-specific structural features of the brain and the link with functional specialization. The authors explore and compare gyral peaks of the human and macaque cortex through non-invasive neuroimagery, using convincing techniques that have been previously validated elsewhere. They show that nearly 60% of the macaque peaks are shared with humans, and use a multi-modal parcellation scheme to describe the spatial distribution of shared and unique gyral peaks in both species.
  
  We thank the reviewer for his/her summary and affirmation of our work.
  
  The claim is made that shared peaks are mainly located in lower-order cortical areas whereas unique peaks are located in higher-order regions, however, no systematic comparison is made. The authors then show that shared peaks are more consistently found across individuals than unique peaks, and show a positive but small and non-significant correlation between cross-individual counts of the shared peaks of the human and the macaque i.e. the authors show a non-significant trend for shared peaks that are more consistently found across humans to be those that are also more found across macaques.
  
  Answer: We thank the reviewer for raising these questions about our work. To provide a more systematic comparison supporting the conclusion that ‘shared peaks are mainly located in lower-order cortical areas whereas unique peaks are located in higher-order regions’, we conducted two additional experiments. Following the reviewer’s suggestions, we statistically analyzed the ratio of shared to unique peaks within different brain networks (as depicted in Figure 2 (b)), and we report the numbers of the two types of peaks in low- and high-order brain networks (as detailed in the corresponding Table 1). Combined with the original analysis, these experiments support a more systematic and comprehensive conclusion that ‘shared peaks are more distributed in lower-order networks, while unique peaks are more in higher-order networks’.
  
  In order to identify if unique and shared peaks could be identified based on the structural features of the cortical regions containing them, the authors compared them with t-tests. A correction for multiple comparisons should be applied and t-values reported. Graph-theoretical measures were applied to functional connectivity datasets (resting-state fMRI) and compared between unique and shared peak regions for each species separately. Again the absence of multiple comparison correction and t-values make the results hard to interpret. The same comment applies to the analysis reporting that shared peaks are surrounded by a larger number of brain regions than unique peaks. Finally, the potentially extremely interesting results about differential human gene expression of shared and unique peaks regions are not systematically reported e.g. the 28 genes identified are not listed and the selection procedure of 7 genes is not fully reported.
  
  Answer: We thank the reviewer for the suggestions on the statistical analysis in our manuscript. First, we applied False Discovery Rate (FDR) correction to all experiments involving multiple comparisons throughout the manuscript, and the corrected t-values are reported (Tables 2-5 and A5-A6). Additionally, in response to the reviewer’s guidance on the gene analysis section, we provide a list of the 28 genes (Table A7) selected by lasso, along with the t-values obtained from Welch’s t-test comparing expression between the two types of peaks. The functions of the seven genes with final p-values below 0.05 are reported in Table 6.
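
As a rough illustration of the FDR step described above, the sketch below applies the Benjamini-Hochberg procedure to a list of raw p-values. The response does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values are hypothetical placeholders rather than values from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from five comparisons (not from the manuscript).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p)
print(significant)
```

In this toy input, two comparisons with raw p-values just under 0.05 no longer pass the threshold after adjustment, which is the kind of change a reader would want to see reported alongside the test statistics.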
  
  The paper is well written and the methods used for data processing are very compelling i.e. the peak cluster extraction pipeline and cross-species registration. However, the analysis and especially the reporting of statistics, as they stand now, constitutes the main weakness of the paper. Some aspects of the statistical analysis need to be clarified.
  
  Reviewer 2
  
  The authors compared the cortical folding of human brains with folding in macaque monkey brains to reveal shared and unique locations of gyral peaks. The shared gyral peaks were located in cortical regions that are functionally similar and less changed in humans from those in macaques, while the locations of unique peaks in humans are in regions that have changed or expanded functions. These findings are important in that they suggest where human brains have changed more than macaque brains in their subsequent evolution from a common ancestor. The massive analysis of comparative results provides evidence of where humans and macaques are similar or different in cortical markers, as well as noting some of the variations within each of the two primates.
  
  Answer: We thank the reviewer for the summary and appreciation of our cross-species work.
  
  Strengths:
  
  The study includes massive detail.
  
  Weaknesses:
  
  The manuscript is too long and there is not enough focus on the main points.
  
  Answer: We thank the reviewer for pointing out the shortcomings in our manuscript. First, because the manuscript was too long, we have retained only the core experiments and relevant analyses in the main text. Relatively minor conclusions have been moved to the Supplementary Information; for example, the original Table 1 (locations of all shared clusters) is now Table A1. Additionally, some non-essential expressions in the original manuscript have been removed.
  
  Our experiments primarily revealed the existence of partially shared cortical landmarks, known as gyral peaks, in both humans and macaques. We found that these shared and unique peaks are mainly distributed across low- and high-order brain networks, respectively. To emphasize this main point, we added two experiments on top of the existing ones to provide a more systematic explanation of this conclusion. We conducted a statistical analysis of the ratio of shared and unique peaks within different brain networks (as depicted in Figure 2 (b)), and also presented the specific distribution quantities of the two types of peaks in both low- and high-order brain networks (as detailed in the corresponding Table 1). By combining the results of these two experiments with the original manuscript’s statistical findings on the proportions of the two types of peaks in different brain networks, the conclusion that ‘shared and unique peaks are predominantly located in low-order and high-order brain networks’ becomes more prominent.
  
  A brief listing of previous views on why fissures form and what factors are important would be helpful.
  
  Answer: In response to this suggestion from the reviewer, we have incorporated some previous views on why fissures form and what factors are important into the ‘Introduction’ section.
  
  ‘Cortical folds are important features of primate brains. The primary driver of cortical folding is the differential growth between cortical and subcortical layers. During the gyrification process in the cortex, areas with high-density stiff axonal fiber bundles towards gyri. The folding patterns in the brain, formed through a series of complex processes, are found to play a crucial role in various cognitive and behavioral processes, including perception, action, and cognition (Fornito et al. 2004; Cachia et al. 2018; Yang et al. 2019; Whittle et al. 2009).’
  
  Reviewer 1 (Recommendations For The Authors):
  
  (1) Figure 3b shows a non-significant trend for shared peaks that are more consistently found across humans to be those that are also more found across macaques. In the discussion, lines 218-219, the fact that the correlation is not significant should be reported more clearly.
  
  Answers: We thank the reviewer for this question. We revised Lines 218-219 (now Lines 257-259) as follows: ‘2. Consistency: The inter-individual consistency of shared peaks within each species was greater than that of unique peaks. The consistency of shared peaks in the human and macaque brains exhibits a positive correlation (non-significant though).’
  
  (2) It is not fully clear how much shared peaks are mostly distributed in the higher-order cortex, especially in the macaque. It is reported in the results lines 132-133 that ‘In the macaque brain, shared peak cluster centers most distributed in the V2, DMN, and CON (Figure.2 (d)), while unique peak cluster centers most distributed in the DMN, Language (Lan), and Dorsal-attention (DAN)’ but not further discussed. Please develop this point in the discussion. Further, the results presented in Figures 2 and A1 are actually quite different and this shall be better described in the results. Given that shared and unique peaks can be found in the same region, this analysis would gain importance by applying a comparison test for the selection of regions where the most shared or unique peaks are found. The sentence lines 306-308 should be accordingly revised.
  
  It is hard to understand what the 0-3% corresponds to in Figures 2 and A1?
  
  Please also correct in both legends and in the text the labeling of panels and add in the legends a brief description of panel (c). In the legend of Figure 2, ‘shared peaks’ in the second sentence shall be replaced by ‘unique peaks’.
  
  Answers: We thank the reviewer for these questions and suggestions. Our responses to them are itemized as follows:
  
  A1: In general, to clarify the distribution of shared and unique peaks in the higher-order and lower-order networks, we divided the 12 brain networks in the Cole-Anticevic atlas into lower-order networks (visual 1 (V1), visual 2 (V2), auditory (Aud), somatomotor (SMN), posterior multimodal (PMN), ventral multimodal (VMN), and orbito-affective networks (OAN)) and higher-order networks (cingulo-opercular (CON), dorsal attention (DAN), language (Lan), frontoparietal (FPN), and default mode network (DMN)) based on previous research (Golesorkhi et al. 2022; Ito, Hearne, and Cole 2020). Based on this lower-/higher-order division, we report the number of shared and unique peaks in both species in Author response table 1. We find that, in both humans and macaques, shared peaks are more distributed in lower-order networks, while unique peaks are more in higher-order networks. This observation is particularly pronounced in humans.
  
  Author response table 1.
  
  The number of shared and unique peaks in lower- and higher-order brain networks of the two species. Lower-order networks include visual 1 (V1), visual 2 (V2), auditory (Aud), somatomotor (SMN), posterior multimodal (PMN), ventral multimodal (VMN), and orbito-affective networks (OAN), higher-order networks include cingulo-opercular (CON), dorsal attention (DAN), language (Lan), frontoparietal (FPN), default-mode network (DMN).
  
  In the main text, Figure 2 (shown below as Author response image 1) illustrates the proportions of shared and unique peaks across 12 brain networks in both species. In each pie chart, we have specifically highlighted the three top-ranked brain networks. Although the pie chart also generally supports the above results, two brain networks deserve further discussion. They are the DMN and CON, two higher-order networks that rank high in terms of shared peak count (second and third for macaque shared peaks; fourth and fifth for human shared peaks).
  
  The cingulo-opercular network (CON) is a brain network associated with action, goal, arousal, and pain. However, a study found three newly discovered areas of the primary motor cortex that exhibit strong functional connectivity with the CON region, forming a novel network known as the somato-cognitive action network (SCAN) (Gordon et al. 2023). The SCAN integrates body control (motor and autonomic) and action planning, consistent with the findings that aspects of higher-level executive control might derive from movement coordination (Llinás 2002; Gordon et al. 2023). The CON may be shared in the form of the SCAN across these two species. This could explain in part the result in Author response image 1 that shared peaks are more common in the CON.
  
  Author response image 1.
  
  Pie chart shows the count of shared and unique peaks across different brain networks for both human and macaque. Right panel shows the Cole-Anticevic (CA) networks (Ji et al. 2019) on human surface as a reference.
  
  The default-mode network (DMN) is an ensemble of brain regions that are active in passive tasks, including the anterior and posterior cingulate cortex, medial and lateral parietal cortex, and medial prefrontal cortex (Buckner, Andrews-Hanna, and Schacter 2008). Although the DMN is considered a higher-order brain network, numerous studies have provided evidence of its homologous presence in both humans and macaques. Many existing studies have confirmed the similarity between the DMN regions in humans and macaques from various perspectives, including cytoarchitectonic (Parvizi et al. 2006; Buckner, Andrews-Hanna, and Schacter 2008; Caminiti et al. 2010) and anatomical tracing (Vincent et al. 2007). These studies all support the notion that some elements of the DMN may be conserved across primate species (Mantini et al. 2011). In general, the higher occurrence of shared peaks within the DMN may be attributed to the partial sharing of the DMN between humans and macaques.
  
  These results have been added to Table 2 along with corresponding text and discussion section.
  
  A2: The difference between the results of Figure 2 and Figure A1 (now Figure A2) is whether the peak count is normalized by cortical area, which varies hugely across networks. For example, among the 12 brain networks, the three networks with the largest surface areas are the DMN, SMN and CON, and the three networks with the smallest areas are the OAN, PMN and VMN. The area difference between networks can be as large as 18-fold. Therefore, although the DMN ranks high in both shared and unique peak counts (Figure 2 (a)), its proportion is relatively small in Figure A2 after area normalization. In contrast, the VMN ranks lower in peak count but takes up a substantial proportion after area normalization (for example, 38% of macaque shared peaks are distributed in the VMN region, but there are actually only four peaks). However, the two pie charts deliver the same message: there are more shared peaks in lower-order networks, while unique peaks are more common in higher-order networks (except for macaques, where shared peaks are also distributed substantially in the DMN and CON).
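
The effect of area normalization described above can be illustrated with a minimal sketch. The counts and surface areas below are hypothetical values chosen only so that the largest and smallest areas differ 18-fold; they are not taken from the manuscript, and the function names are illustrative.

```python
# Hypothetical peak counts and surface areas (arbitrary units) for three networks.
# These numbers are assumptions for illustration, not data from the study.
peak_counts = {"DMN": 40, "SMN": 30, "VMN": 4}
surface_area = {"DMN": 180.0, "SMN": 150.0, "VMN": 10.0}


def share(values):
    """Convert a dict of non-negative values into fractions that sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


def area_normalized(counts, areas):
    """Return peaks per unit area for each network."""
    return {name: counts[name] / areas[name] for name in counts}


# Share of raw peak counts (analogous to a pie chart of counts).
raw_share = share(peak_counts)
# Share of peak densities (analogous to a pie chart after area normalization).
density_share = share(area_normalized(peak_counts, surface_area))
```

In this toy case the DMN holds the largest share of raw counts, while the small VMN holds the largest share after normalization even though it contains only four peaks.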
  
  Following the suggestion from the reviewer, we adopted a new approach to present the ratio between shared peak count and unique peak count for each network (see Author response figure 2), such that the networks where the most shared or unique peaks are found can be easily highlighted. To mitigate potential imbalances in proportions caused by differences in the absolute numbers of each category (shared or unique) of peak, the proportions of peaks within their respective categories were used in the calculations. In Author response figure 2, the pink and green color bins represent ratios of shared and unique peaks, respectively. The dark blue dashed line represents the 50% reference line. In general, from left to right in the figure, the ratio of shared peaks decreases gradually while the ratio of unique peaks increases, suggesting that shared peaks predominate (>0.5, above the dashed line) in lower-order networks (orange font), while unique peaks generally predominate in higher-order networks (blue font). Specifically, in human brains, the networks with a higher abundance of shared peaks are Aud, VMN, V1, SMN, and V2, whereas in macaques, they are CON, VMN, V1, V2, FPN, and SMN. Again, in the human brains, the disparity between shared and unique peaks tends to be more pronounced (further away from the reference line) for both lower-order and higher-order networks. In contrast, in the macaque brains, the disparity between shared and unique peaks is less pronounced (closer to the reference line): the ratio of shared and unique peaks is around 0.5 for 6 out of all 10 networks (including both lower- and higher-order ones).
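
One plausible reading of the ratio calculation described above is sketched below: each count is first converted to a proportion within its own category, and the two proportions are then rescaled so that they sum to 1 for each network. The exact formula used by the authors is not given in the response, so this is an assumption, and the counts are hypothetical.

```python
def shared_unique_ratio(shared, unique):
    """For each network, return (shared_ratio, unique_ratio) summing to 1.

    Counts are first divided by the total of their own category so that a
    category with more peaks overall does not dominate every network.
    """
    total_shared = sum(shared.values())
    total_unique = sum(unique.values())
    ratios = {}
    for network in shared:
        p_shared = shared[network] / total_shared
        p_unique = unique[network] / total_unique
        denom = p_shared + p_unique
        # A network with no peaks in either category has no defined ratio.
        ratios[network] = None if denom == 0 else (p_shared / denom, p_unique / denom)
    return ratios


# Hypothetical counts: unique peaks outnumber shared peaks overall.
shared = {"V1": 12, "SMN": 20, "DMN": 8}
unique = {"V1": 10, "SMN": 30, "DMN": 60}
ratios = shared_unique_ratio(shared, unique)
```

With these toy counts, V1 and SMN sit above the 0.5 line for shared peaks, while the DMN sits below it.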
  
  Author response image 2.
  
  The ratio of shared and unique peaks in each brain network in the Cole-Anticevic (CA) atlas. The pink and green color bins represent ratios of shared and unique peaks, respectively. The dark blue dashed line represents the 50% reference line. For each brain region, the sum of the ratios of shared and unique peaks is equal to 1.
  
  Based on these analyses, the sentence at lines 306-308 (now lines 368-370) has been revised as follows: ‘In the human brain, the more shared peaks (about 65%) are located in lower-order brain regions, while unique peaks are mainly (about 74%) located in higher-order regions. However, this trend is relatively less pronounced in the macaque brain.’
  
  These results have been added to Figure 2 (b) along with corresponding text and discussion section.
  
  A3: In response to the third suggestion from the reviewer, we have clearly labeled the brain region names corresponding to 0% to 3% in Figure 2 (now Figure 2 (a)) and Figure A1 (now Figure A2).
  
  Author response image 3.
  
  Pie chart shows the count of shared and unique peaks across different brain networks for both human and macaque. Right panel shows the Cole-Anticevic (CA) networks (Ji et al. 2019) on human surface as a reference.
  
  A4: Finally, we would like to express our gratitude to the reviewer for pointing out our mistakes.
  
  We have made improvements to Figure 2 and revised the figure captions accordingly.
  
  (3) The conclusions regarding the spatial relationship between peaks and functional regions shall be revised (Lines 187-188, 228-229, and 329-330). In the macaque, the results are opposite in the two atlases used. Further, in the human, it is not clear how multiple comparison corrections will impact statistics and some atlases show opposite results, although conclusions hold true in the majority of human atlases.
  
  Answers: We thank the reviewer very much for this suggestion. We have added the results of the Cole-Anticevic atlas for macaques to the main text, which also show shared>unique (Author response table 2, corresponding to Table 5 in the main text); that is, there are more diverse brain regions around shared peaks than around unique peaks. Therefore, of the three commonly used macaque atlases, two (Markov91 and Cole-Anticevic) conform to this observation, while BA05 does not. We used false discovery rate (FDR) correction for multiple comparisons, and the corrected p-values are reported in the tables (in the revised main text and shown below). Results on atlases with multiple resolutions are reported in Author response table 4 (Table A6 in the Supplementary Information). The observation that there are more diverse brain regions around shared peaks than around unique peaks holds for the human atlases in Author response table 3 (Table 4 in the main text), where the atlas resolution ranges from 7 parcels to 300 parcels, demonstrating the robustness of the conclusion. Note that the observation is not consistent on atlases with relatively lower resolutions (e.g., BA05 for macaque, n=30, and Yeo2011 for human, n=7) or, in particular, higher resolutions (e.g., Schaefer-500 and Vosdewael-400, n>300). This inconsistency is to be expected, since the resolution of the parcellation itself largely determines the chance that a cortical region appears in a peak’s neighborhood when the parcellation is too coarse or too fine. For example, if n=1 (the entire cortex is a single region) or n=30k (each vertex is a region), every peak has the same number of neighboring regions (one brain region per peak for n=1; around 30 vertices per peak for n=30k).
  
  In conclusion, we observed that there are more diverse brain regions around shared peaks than around unique peaks for multiple brain atlases with a medium parcellation resolution. These results have been added to Tables 4, 5, and A6 along with corresponding text and discussion section.
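
The dependence on parcellation resolution can be sketched as follows. The function counts distinct parcel labels within a fixed number of edge steps ("rings") of a peak vertex. The toy mesh (a chain of eight vertices), the label maps, and the choice to include the peak vertex itself are all assumptions made for illustration; the authors' implementation is not shown in the response.

```python
def regions_within_rings(adjacency, labels, peak, rings=3):
    """Count distinct parcel labels among vertices within `rings` edges of `peak`.

    The peak vertex itself is included in the neighborhood (an assumption).
    """
    visited = {peak}
    frontier = {peak}
    for _ in range(rings):
        # Expand by one ring: all unvisited neighbors of the current frontier.
        frontier = {n for v in frontier for n in adjacency[v]} - visited
        visited |= frontier
    return len({labels[v] for v in visited})


# Toy "mesh": a chain of 8 vertices, each linked to its immediate neighbors.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

coarse = {i: 0 for i in range(8)}        # n=1: the whole mesh is one region
medium = {i: i // 3 for i in range(8)}   # three parcels
fine = {i: i for i in range(8)}          # every vertex is its own region
```

With the coarse labeling every peak sees exactly one region, and with the fine labeling the count only reflects how many vertices fall in the neighborhood; only the intermediate labeling lets the count vary with where the peak sits relative to parcel boundaries.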
  
  Author response table 2.
  
  The mean values (±SD) of brain regions that appeared within a 3-ring neighborhood for shared and unique peaks in 3 common macaque atlases. For both the Markov91 and Cole-Anticevic atlases, the shared peaks have a greater variety of functional regions around them than the unique peaks. For the BA05 atlas, however, the conclusion is reversed. Bold font marks the larger value between shared and unique peaks. All p<0.001 after false discovery rate (FDR) correction.
  
  (4) For Tables 2-4, A4, and Figure 3a, please indicate in all the legends if values correspond to Mean plus minus Standard Deviation, report t-value, and n in the legend or in the text.
  
  Answers: We thank the reviewer very much for this suggestion. We added the ‘mean (±SD)’ in the notes of Tables 2-4, A4 (now A6), and Figure 3 (a). All the t and n values of t-test are reported in tables or in the main text.
  
  (5) Please create a statistical section in the Methods to describe more precisely the tests used e.g. for t-tests, if datasets follow a normal distribution with unknown variance. In the case of multiple comparisons like in e.g. Table 2-4, A4, please report what multiple comparisons correction was used to adjust the significance level.
  
  Author response table 3.
  
  The mean values (±SD) of brain regions that appeared within a 3-ring neighborhood for shared and unique peaks in 10 common human atlases. All the shared peaks in the table have a greater number of neighboring brain regions compared to the unique peaks. All p<0.001, false discovery rate (FDR) corrected.
  
  Author response table 4.
  
  The mean values (±SD) of brain regions where shared and unique peaks appeared within a 3-ring neighborhood in 21 common human atlases. The p-values were corrected by FDR.
  
  Answers: We thank the reviewer for the suggestion. We added a ‘Statistic Analysis’ section in the ‘Materials and Methods’ part:
  
  ‘All variables used in the two-samples t-test follow a normal distribution check and all p-values were corrected for multiple comparisons using the false discovery rate (FDR) method. Moreover, in order to identify differently expressed genes between shared and unique peaks, we employed the Welch’s t-test, given the unequal sample sizes for shared and unique peaks. For all tests, a p-value <0.05 was considered significant (FDR corrected).’
  
  For the experiments involving multiple comparisons, such as Tables 2-4 and A4 (now A6), we have added explanations in the main text: p-values were corrected for multiple comparisons using the false discovery rate (FDR), and a corrected p-value <0.05 is considered significant.
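
The response does not name the specific FDR procedure. A minimal sketch, assuming the widely used Benjamini-Hochberg procedure, is given below; the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five comparisons.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

In this example the third and fourth tests are below 0.05 before correction but not after it, which is why the corrected values are the ones to report.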
  
  (6) It would be of great interest to provide the full list of the 28 genes that significantly contributed to the classification of shared and unique peaks. Please provide a description of the Welch’s t-test results. From the 7 genes selected, only two are discussed. Could the authors please describe briefly the function of the other genes? Although we understand that they are not associated with neuronal activity and brain function.
  
  Answers: We thank the reviewer for these suggestions. We have provided a complete list of the 28 genes selected by LASSO in Author response table 5. Additionally, Welch’s t-test was employed to calculate p-values for the expression differences of each gene between shared and unique peak clusters, and the results are also reported in Author response table 5.
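
For reference, Welch's t-test does not assume equal variances or equal sample sizes in the two groups. The sketch below computes only the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library; obtaining a p-value additionally requires the t distribution, which is left out here. The expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical expression values for one gene in two groups of unequal size.
shared_expr = [2.0, 4.0, 6.0, 8.0]
unique_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(shared_expr, unique_expr)
```

The degrees of freedom are generally non-integer and lie between the smaller group's n-1 and the pooled n1+n2-2.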
  
  Author response table 5.
  
  The 28 genes selected by LASSO and their corresponding p-values from Welch’s t-test.
  
  Seven genes showed significant differential expression between shared and unique peaks in Welch’s t-test. These genes were PECAM1, TLR1, SNAP29, DHRS4, BHMT2, PLBD1, KCNH5. Brief descriptions of their functions are listed in Author response table 6. All gene function descriptions were derived from the NCBI website (https://www.ncbi.nlm.nih.gov/).
  
  These results have been added to Tables 6 and A7 along with corresponding text.
  
  (6) For comparison, could the authors provide a supplementary figure of shared peak clusters like in Figure 1b but displayed on the surface of the macaque brain template?
  
  Answers: We thank the reviewer very much for this suggestion, and we have incorporated a display of shared peak clusters on the macaque brain template surface (Author response figure 4, corresponding to Figure A1 of the Supplementary Information).
  
  (7) Could the author develop or rephrase the sentence lines 69-72 which remains unclear?
  
  Answers: We appreciate the reviewer’s feedback and have revised this sentence to ensure clarity. The sentences from line 69 to 72 have been revised to ‘In the study of macaques, it has been observed that the peak consistently present across individuals is located on more curved gyri (S. Zhang, Chavoshnejad, et al. 2022). Similar conclusions have been drawn in human brain research (S. Zhang, T. Zhang, et al. 2023).’ Now, this sentence corresponds to lines 74-77 in the main text.
  
  (8) Line 99: please indicate which section.
  
  Author response table 6.
  
  Seven genes were selected using LASSO that showed significant differential expression in shared and unique peaks.
  
  Answers: We thank the reviewer very much for this suggestion and we revised this sentence to ‘The definition of peaks and the method for extracting peak clusters within each species are described in the Materials and Methods section’.
  
  (9) In Figure 3b, please report R2 and p-value. A semi-log might be more appropriate given the overdispersion of Human Peak Counts.
  
  Answers: We thank the reviewer very much for this suggestion. Linear regression analysis was conducted on the average counts of all corresponding shared peak clusters of human and macaque. The horizontal and vertical axes of Author response figure 5 (b) represent the average count of shared peaks in the macaque and human brains, respectively. The Pearson correlation coefficient (PCC) of the interspecies consistency for the left and right brain is 0.20 and 0.26 (p>0.05 for both), respectively. The linear regression shows a positive correlation in the inter-individual consistency of shared peaks between macaque and human brains, but it is not statistically significant (R2 for the left and right brain is 0.07 and 0.01, respectively).
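
As a point of reference for the quantities reported above, the sketch below computes the Pearson coefficient and the R2 of a simple least-squares line, with the human counts log-transformed as in a semi-log plot. The data are hypothetical, and whether the authors fitted the regression on raw or log-transformed counts is not stated in the response. For a simple linear regression with an intercept, R2 equals the square of the Pearson coefficient.

```python
from math import log10, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit_r2(x, y):
    """Least-squares slope, intercept, and R2 for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical mean peak counts per shared cluster in each species.
macaque = [1.0, 2.0, 3.0, 4.0, 5.0]
human = [10.0, 100.0, 30.0, 1000.0, 300.0]
log_human = [log10(v) for v in human]  # semi-log: only the human axis is logged

r = pearson_r(macaque, log_human)
slope, intercept, r2 = linear_fit_r2(macaque, log_human)
```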
  
  Author response image 4.
  
  Shared peak clusters of the macaque, shown on the macaque brain template.
  
  The goodness of fit (R2), Pearson correlation coefficient (PCC), and their respective p-values are indicated in Author response figure 5 (b). To avoid overdispersion, the peak count of the human brain is displayed in a semi-log format.
  
  The updated Figure and results are presented in Figure 3 of the main text.
  
  (10) Line 177: please indicate where in the Supplementary Information.
  
  Answers: We thank the reviewer for the reminder. We have incorporated the results of the human brain structural connectivity matrix into Table A5 in the Supplementary Information and provided corresponding indications in the main text.
  
  (11) Line 226: please correct ‘(except for betweeness [and efficiency] of the’.
  
  Answers: We thank the reviewer very much for this suggestion and we added ‘and efficiency’ in original Line 173 and 226 (now Line 206 and 267) after ‘betweeness’.
  
  (12) The gene expression dataset used is from the Allen Human Brain Atlas (AHBA). Reference to Hawrylycz et al., 2012 Nature. 2012 Sep 20;489(7416):391-399. doi: 10.1038/nature11405 shall be made and abbreviation defined at first use in the text.
  
  Answers: We added the full name ‘Allen Human Brain Atlas’ when AHBA is first mentioned, along with the reference suggested by the reviewer.
  
  Author response image 5.
  
  (a) Mean peak count (±SD) covered by shared and unique peak clusters in two species. ***indicates p<0.001. The t-values for the t-tests in humans and macaques are 4.74 and 2.67, respectively. (b) Linear regression results of the consistency of peak clusters shared between macaque and human brains. The pink and blue colors represent the left and right hemispheres, respectively. The results of the linear regression are depicted in the figure. While there was a positive correlation observed in the consistency of gyral peaks between macaque and human, the obtained p-value for the fitted results exceeded the significance threshold of 0.05.
  
  (13) Line 17: remove ‘are’.
  
  Answers: We thank the reviewer very much for this suggestion and we removed ‘are’ in Line 17 (now Line 18).
  
  (14) Line 201: remove ‘is used’.
  
  Answers: We thank the reviewer very much for this suggestion and we removed ‘is used’ in Line 201 (now Line 237).
  
  References
  
  Buckner, Randy L, Jessica R Andrews-Hanna, and Daniel L Schacter (2008). “The brain’s default network: anatomy, function, and relevance to disease”. In: Annals of the New York Academy of Sciences 1124.1, pp. 1–38.
  
  Cachia, Arnaud et al. (2018). “How interindividual differences in brain anatomy shape reading accuracy”. In: Brain Structure and Function 223, pp. 701–712.
  
  Caminiti, Roberto et al. (2010). “Understanding the parietal lobe syndrome from a neurophysiological and evolutionary perspective”. In: European Journal of Neuroscience 31.12, pp. 2320–2340.
  
  Fornito, Alexander et al. (2004). “Individual differences in anterior cingulate/paracingulate morphology are related to executive functions in healthy males”. In: Cerebral cortex 14.4, pp. 424–431.
  
  Golesorkhi, Mehrshad et al. (2022). “From temporal to spatial topography: hierarchy of neural dynamics in higher-and lower-order networks shapes their complexity”. In: Cerebral Cortex 32.24, pp. 5637–5653.
  
  Gordon, Evan M et al. (2023). “A somato-cognitive action network alternates with effector regions in motor cortex”. In: Nature, pp. 1–9.
  
  Ito, Takuya, Luke J Hearne, and Michael W Cole (2020). “A cortical hierarchy of localized and distributed processes revealed via dissociation of task activations, connectivity changes, and intrinsic timescales”. In: NeuroImage 221, p. 117141.
  
  Ji, Jie Lisa et al. (2019). “Mapping the human brain’s cortical-subcortical functional network organization”. In: Neuroimage 185, pp. 35–57.
  
  Llinás, Rodolfo R (2002). I of the vortex: From neurons to self. MIT press.
  
  Mantini, Dante et al. (2011). “Default mode of brain function in monkeys”. In: Journal of Neuroscience 31.36, pp. 12954–12962.
  
  Parvizi, Josef et al. (2006). “Neural connections of the posteromedial cortex in the macaque”. In: Proceedings of the National Academy of Sciences 103.5, pp. 1563–1568.
  
  Vincent, Justin L et al. (2007). “Intrinsic functional architecture in the anaesthetized monkey brain”. In: Nature 447.7140, pp. 83–86.
  
  Whittle, Sarah et al. (2009). “Variations in cortical folding patterns are related to individual differences in temperament”. In: Psychiatry Research: Neuroimaging 172.1, pp. 68–74.
  
  Yang, Shimin et al. (2019). “Temporal variability of cortical gyral-sulcal resting state functional activity correlates with fluid intelligence”. In: Frontiers in neural circuits 13, p. 36.
  
  Zhang, Songyao, Poorya Chavoshnejad, et al. (2022). “Gyral peaks: Novel gyral landmarks in developing macaque brains”. In: Human Brain Mapping 43.15, pp. 4540–4555.
  
  Zhang, Songyao, Tuo Zhang, et al. (2023). “Gyral peaks and patterns in human brains”. In: Cerebral Cortex.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.26.550760v3
www.biorxiv.org www.biorxiv.org

The ubiquitin-conjugating enzyme UBE2D/eff maintains a youthful proteome and ensures protein quality control during aging

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  In this study, Hunt et al. investigated the role of the ubiquitin-conjugating enzyme UBE2D/effete (eff) in maintaining proteostasis during aging. Utilizing Drosophila as a model, the researchers observed diverse roles of E2 ubiquitin-conjugating enzymes in handling the aggregation-prone protein huntingtin-polyQ in the retina. While some E2s facilitated aggregate assembly, UBE2D/eff and other E2s were crucial for degradation of htt-polyQ. The study also highlights the significance of UBE2D/eff in skeletal muscle, showing that declining levels of eff during aging correlate with proteostasis disruptions. Knockdown of eff in muscle led to accelerated accumulation of poly-ubiquitinated proteins, shortened lifespan, and mirrored proteomic changes observed in aged muscles. The introduction of human UBE2D2, analogous to eff, partially rescued the deficits in lifespan and proteostasis caused by eff-RNAi expression in muscles.
  
  The conclusions of this paper are mostly well supported by data, although a more precise mechanistic explanation of phenotypes associated with UBE2D/eff deficiency would have strengthened the study. Additionally, some aspects of image quantification and data analysis need to be clarified and/or extended.
  
  We thank reviewer #1 for the thoughtful assessment of our work. We have amended the discussion to better explain the phenotypes associated with UBE2D/eff deficiency. We have also improved the methods describing the procedures for image quantification and data analysis.
  
  Reviewer #2 (Public Review):
  
  Important findings:
  
  - Knockdown of UBE2D increases HTT aggregation.
  
  - Knockdown of UBE2D leads to an accumulation of ubiquitinated proteins and reduces the lifespan of Drosophila, which is rescued by an ectopic expression of the human homolog.
  
  - UBE2D protein levels decline with aging.
  
  - UBE2D knockdown is associated with an up- and downregulation of several different cellular pathways, including proteostasis components.
  
  Thank you for reviewing our manuscript.
  
  Caveats:
  
  - The readout of HTT aggregation (with methods that are not suitable) as a proxy for the role of UBE2D in proteostasis is not convincing. It would probably improve the manuscript to start with the proteomic analysis of UBE2D to demonstrate that its protein levels decrease with aging. The authors could then induce UBE2D in aged animals to assess the role of UBE2D in the proteome with aging.
  
  While presenting the data in a different order would be possible, we prefer to keep the current order, in which we proceed from a general screen with a proteostasis readout (HTT aggregates; see the answer below for a discussion on the methods) to the identification of a candidate (UBE2D), which is then studied in more detail with additional focused analyses in the retina and skeletal muscle during aging. Concerning the induction of UBE2D in aged animals, our analyses in Figure 4E demonstrate that muscle-specific induction of UBE2D2 throughout life does not increase lifespan alone: this could be explained by UBE2D2 only partially recapitulating the function and substrate diversity of Drosophila eff/UBE2D due to divergence from a single Drosophila UBE2D enzyme (eff) to multiple UBE2D enzymes in humans (UBE2D1/2/3/4).
  
  - UBE2D knockdown increases the number of HTT foci (Figure 1A), but the quantification is less convincing as depicted in Figure 1B, and other E2 enzymes show a stronger effect (e.g. Ubc6 that is only studied in Figures 1 and 2 without an explanation and Ubc84D). The graph is hard to interpret. What is the sample size and which genetic conditions show a significant change? P values and statistical analyses are missing.
  
  The full data underlying this genetic screen is reported in Supplementary Table 1. The role of UBC6/UBE2A/B is thoroughly examined in Hunt et al 2021 (PMID: 33658508). We agree that Ubc84D has an important effect and that it should be considered for future studies. We have amended the legend of Figure 1 to indicate that each data point in the graph represents a single RNAi line targeting the corresponding gene. The mean of 5 biological replicates is shown for each RNAi, with each biological replicate representing a single eye imaged from a distinct fly. Therefore, the data points that do not show large magnitude changes may indicate RNAi lines that were not effective at knocking down the target protein (or that did not affect HTT aggregates). The E2s worth pursuing were identified because of multiple RNAi lines scoring consistently: this is the case of UBC6 (studied previously in PMID: 33658508) and eff/UBE2D (pursued in this study). This screen was therefore utilized to identify and select candidate genes (i.e. eff/UBE2D) for more in-depth studies on proteostasis.
  
  - The quantification of the HTT fluorescence cannot be used as a proxy for HTT aggregation. The authors should assess HTT aggregation by e.g. SDD-AGE, FRAP, filter retardation, etc. The quantification of the higher MW species of HTT in the SDS-PAGE is not ideal either as this simply reflects material that is stuck in the wells that could not enter the gel. Aggregation and hence high MW size could be one reason, but it can also be HTT trapped in cell debris, etc.
  
  We agree that the use of multiple methods is a good way to assess the impact of E2 enzymes on HTT protein aggregation. In this regard, we estimated HTT aggregates by fluorescence microscopy and by western blot. Microscopy-based analyses demonstrate both the accumulation of the HTT-GFP pathogenic protein into aggregates (HTT polyQ polypeptides aggregating into one spatial region; Fig. 1 and Fig. 2B) as well as their potential cytotoxicity, resulting in the disruption of the ommatidial ultrastructure and cellular degeneration (Fig. 2A). Similar to native gels and filter retardation, we have utilized SDS-PAGE and western blotting of cellular samples isolated with strong chaotropic and denaturing reagents (8M urea plus detergents and reducing reagents used in the lysis). These experimental conditions maintain the higher-order organization of HTT into high-molecular-weight aggregates that are not broken down into individual polypeptides and that therefore do not readily travel through a gel or filter. Therefore, the biochemical methods we have used are equivalent to those proposed by the reviewer. In addition to combining microscopy-based and biochemical approaches to examine the impact of eff/UBE2D on the HTT aggregates, we have analyzed eff/UBE2D during skeletal muscle aging and found phenotypes consistent with those observed in the HTT model: RNAi for eff/UBE2D leads to the accumulation of detergent-insoluble ubiquitinated proteins that associate with protein aggregates.
  
  - Does UBE2D ubiquitinate HTT? And thus, is HTT accumulation a suitable readout for the functional assessment of the E2 enzyme UBE2D?
  
  We propose that the accumulation of HTT in response to eff/UBE2D RNAi may be due to a generalized loss of protein quality control rather than to a direct decline in the ubiquitination of HTT by eff/UBE2D. In a previous study that examined the UBE2D interactome (Hunt et al. 2023; PMID: 37963875), we did not find an interaction between UBE2D and HTT, suggesting that HTT may not be directly modulated by eff/UBE2D via ubiquitination.
  
  - The proteomic analyses could help to identify potential substrates for UBE2D.
  
  The proteomic analyses in Figure 5 identify several proteins that are modulated by RNAi for eff and by its human homolog, UBE2D2. Such eff/UBE2D2-modulated proteins may indeed be potential substrates for UBE2D-mediated ubiquitination. For example, this is the case for Pex11 and Pex13, which were found to be upregulated upon UBE2D RNAi also in human cells, where they are ubiquitinated in a UBE2D-dependent manner (Hunt et al. 2023; PMID: 37963875).
  
  - Are there mutants available for UBE2D or conditional mutants? One caveat of RNAi is: first not complete knockdown and second, variable knockdown efficiencies that increase variability.
  
  There are potential hypomorphic alleles of eff/UBE2D that may be available, but they would present the same caveats of incomplete loss of eff/UBE2D function as RNAi. Given the strong phenotype that we find with partial eff knockdown, a caveat of full eff/UBE2D knockout is that this could be lethal.
  
  - The analysis of the E3 enzymes does not add anything to this manuscript.
  
  The analysis of E3 enzymes relates to our recent publication (Hunt et al. 2023; PMID: 37963875) that reports the physical interactions between E2 and E3 enzymes. Analysis of these E2-E3 pairs in the genetic screen in Fig.1 therefore follows this IP-MS study to provide insight into the functional interaction between these E2-E3 pairs in proteostasis.
  
  - Figure 2B: the fluorescence intensities in images 2 and 4 are rather similar, yet the quantification shows significant differences.
  
  Please note that some of the GFP fluorescence in image 4 is not punctate, but rather diffuse fluorescence that is not related to HTT-GFP aggregates. Our image quantitation methods utilized thresholding to identify GFP-positive puncta while eliminating background fluorescence not corresponding to HTT-GFP puncta.
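
The idea of separating punctate signal from diffuse fluorescence by thresholding can be sketched as below. This is a generic illustration, not the authors' image-analysis pipeline: the threshold value, the 4-connectivity rule, the minimum punctum size, and the pixel values are all assumptions.

```python
def count_puncta(image, threshold, min_pixels=2):
    """Count 4-connected groups of pixels brighter than `threshold`.

    Groups smaller than `min_pixels` are ignored as noise (an assumption).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    puncta = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill the connected bright region starting at (r, c).
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and image[ny][nx] > threshold
                    ):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if size >= min_pixels:
                puncta += 1
    return puncta


# Toy image: diffuse background of 20 with two bright puncta and one stray pixel.
image = [
    [20, 20, 20, 20, 20, 20],
    [20, 90, 95, 20, 20, 20],
    [20, 92, 20, 20, 85, 20],
    [20, 20, 20, 20, 88, 20],
    [20, 20, 60, 20, 20, 20],
]
n_puncta = count_puncta(image, threshold=50)
```

With a threshold above the diffuse background, two puncta are counted; if the threshold is set below the background level, the whole image merges into a single region, so two images with similar overall intensity can yield different puncta counts.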
  
  - The proteomic analyses could provide insights into the functional spectrum of UBE2D or even the identification of substrates. Yet apart from a DAVID analysis, none of the hits were followed up. In addition, only a few hits were labelled in the volcano plots (Figure 5). On what basis did the authors select those?
  
  Please see the previous answer above regarding the identification of eff/UBE2D protein substrates from our proteomic analysis in Fig. 5. Only some of the top-regulated hits could be labeled in Fig.5 to avoid overcrowding.
  
  - The manuscript remains at this stage rather descriptive.
  
  Our study has demonstrated a key role for the eff/UBE2D ubiquitin-conjugating enzyme in regulating protein quality control during aging in the Drosophila retina and skeletal muscle. Our study has identified key proteins that are modulated by eff/UBE2D RNAi in Drosophila muscle, that are rescued by expression of human UBE2D2, and that may underlie the accelerated decline in proteostasis that occurs upon eff/UBE2D RNAi. While more could be known about the regulation of these eff/UBE2D-modulated proteins in Drosophila, we have previously demonstrated that some of the proteins that are upregulated by UBE2D RNAi in human cells (e.g. some peroxins) are indeed direct ubiquitination targets of UBE2D via associated E3 ubiquitin ligases (Hunt et al. 2023; PMID: 37963875).
  
  Reviewer #3 (Public Review):
  
  This is a potentially quite interesting paper that defines E2 and E3 genes in Drosophila that can impact the accumulation of the Q72-GFP protein in the fly eye. The authors then focus on the eff gene, showing which human homolog can rescue fly knockdown. They extend to skeletal muscle, from the htt protein, to show that eff by TMT mass spec decreases with age normally in the fly muscle and that there is a significant overlap of proteins that are disrupted with eff knockdown in young animals in muscle vs aged animals normally in muscle.
  
  Overall these data suggest eff decrease with age may contribute to the increase in ubiquitinated proteins in muscle with age, and that upregulation of eff activity might be of interest to extending lifespan. Because eff function can be performed by a human homologue, the findings may also apply to human situations of aging.
  
  These data are overall interesting and are of relevance for those interested in neurodegenerative disease and aging, although a number of points from the figures seem confusing and need more explanation or clarity.
  
  Thank you for reviewing our manuscript, we have improved the explanations and clarity of the manuscript.
  
  Recommendations for the authors:
  
  We would like to keep the manuscript title as it is currently to report the partial overlap in the proteomic changes induced by aging and effRNAi (Fig. 6).
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) A significant concern arises from the unexpected outcome observed in the UBE2D/eff loss-of-function experiments. Despite its role as a ubiquitin-conjugating enzyme (E2), the reduction in UBE2D/eff levels paradoxically increased polyubiquitinated proteins and p62 accumulation, presenting a more intricate and seemingly unrelated phenotype to its anticipated function.
  
  eff/UBE2D represents one out of 21 different Drosophila E2 ubiquitin-conjugating enzymes and therefore eff RNAi alone is unlikely to reduce the total pool of ubiquitinated proteins. The generalized increase in insoluble polyubiquitinated proteins results from an overall derangement of protein quality control caused by effRNAi. In agreement with this scenario, the protein categories that were found to be modulated by effRNAi (Fig. 5) include proteins associated with protein quality control such as proteasome components and chaperones. Therefore, derangement in the levels of a wide range of regulators of proteostasis may lead to a generalized loss of protein quality control upon effRNAi.
  
  I believe elucidating the mechanisms underlying the impact of UBE2D/eff deficiency on the observed phenotypes would contribute to a more comprehensive understanding of the study's implications. For instance, investigating whether the loss of UBE2D/eff influences muscle proteostasis by impeding proteasome assembly or function, modulating autophagy, etc.
  
  We have previously utilized luciferase assays to measure the proteolytic activity of the proteasome in human cells treated with siRNAs targeting UBE2D1/2/3/4 but found no effect of UBE2D knockdown compared to control nontargeting siRNAs (Hunt et al. 2023; PMID: 37963875). In Drosophila muscles, we have examined the levels of GFP-CL1 (a GFP fused with a proteasomal degron) and found that effRNAi does not impact GFP-CL1 levels (data shown in author response image 1). Overall, these results suggest that effRNAi reduces protein quality control without affecting proteasome activity.
  
  Author response image 1.
  
  (2) Related to Figures 1B-C: It is not clear to this reviewer the quantification methodology used in the experiment. Does each point represent the Average +/- SD for each replicate? If so, it appears that not all cases align with the n=5 as indicated in the figure legend. Additionally, how many animals per replicate were quantified?
  
  We have amended the legend of Figure 1 to indicate that each data point in the graph represents a single RNAi line targeting the corresponding gene. The mean of 5 biological replicates is shown for each RNAi line, with each biological replicate representing a single eye imaged from a distinct fly. Therefore, the data points that do not show large magnitude changes may indicate RNAi lines that were not effective at knocking down the target protein (or that had no effect on HTT aggregates).
  
  (3) Related to the previous point: The analysis of pathogenic Huntingtin aggregation in the Materials and Methods section lacks information regarding the number of individuals, replicates, etc.
  
  Please see the response above.
  
  (4) Related to Figure 1 B: In the case of eff/UBE2D, it appears that 3 out of 9 replicates demonstrate a significant increase in HL-polyQ aggregates. Considering the strength of this result, it raises questions about whether it justifies using eff for future analyses.
  
  Please see the response to point (2) above. These results indicate that 3 distinct UAS-RNAi lines targeting eff/UBE2D produced the same effect whereas 6 other effRNAi lines did not, possibly because they are less efficacious in knocking down eff/UBE2D. We have now amended the legend of Fig. 1B to better explain these results.
  
  (5) Related to Figure 1 D-E: Could the authors provide clarification regarding the tissue type and animal age utilized in these experiments?
  
  Whole flies were utilized at 1 week of age.
  
  (6) Related to Figure 3: Incorporating the normal accumulation of poly-ubiquitinated proteins during aging could provide context to better interpret the effect of eff/UBE2D KD at 3 weeks of age.
  
  Several papers from us and others have previously demonstrated a progressive increase in the insoluble levels of poly-ubiquitinated proteins during aging in Drosophila skeletal muscle (PMID: 36640359; PMID: 31249065; PMID: 33773104; PMID: 33658508; PMID: 24092876; PMID: 21111239; PMID: 24244197; PMID: 25199830; PMID: 28878259; PMID: 36213625). Our analyses now indicate that such age-related loss of protein quality control is accelerated by eff/UBE2D knockdown.
  
  (7) Related to Figure 3: Would it be possible for the authors to include a list or table detailing the specific E2, deubiquitinating enzymes, and E3s identified in the comparative analysis of the old vs young proteome? This would provide a clear reference for the identified regulatory proteins involved in the age-related proteomic changes.
  
  We have added a tab to Supplementary Table 2 to report the list of age-regulated deubiquitinating enzymes (DUBs) and E1, E2, and E3 enzymes.
  
  (8) Related to Figures 3 and 4: Given that the comparative analysis of the old versus young proteome identified 10 out of 21 E2 ubiquitin-conjugating enzymes, exploring the impact of eff/UBE2D overexpression becomes pivotal to understanding its role in age-related changes in proteostasis and lifespan. Conducting an experiment involving eff overexpression could provide valuable insights into whether restoring eff levels mitigates aging-related phenotypes.
  
  Although we have not done this experiment with eff overexpression, Fig. 4E reports that the overexpression of human UBE2D2 in skeletal muscle does not appear to influence lifespan by itself (green line in Fig. 4E), although it can partially rescue the short lifespan of flies with muscle-specific effRNAi (purple line in Fig. 4E).
  
  (9) Providing a more detailed description of the Supplementary Tables would significantly enhance the reader's comprehension of their content.
  
  A description has been added at the end of the methods.
  
  Reviewer #2 (Recommendations For The Authors):
  
  In addition to the points listed above:
  
  - The title does not reflect the content of the manuscript and should be changed. There is no evidence that UBE2D maintains a "youthful" (needs to be changed as well) proteome. Rather, its expression declines with aging and its depletion leads to an increase of ubiquitinated proteins. This is true for essentially the entire proteostasis network.
  
  While proteostasis generally declines with aging, it is incompletely understood what specific components of the proteostasis network are dysregulated with aging. Our study now identifies the E2 ubiquitin-conjugating enzyme eff/UBE2D as a key regulator of proteostasis that is transcriptionally downregulated with aging. Comparison of the proteomic changes induced by aging versus those induced by effRNAi in young age indicates a partial overlap (Fig. 6), indicating that eff/UBE2D is, at least in part, necessary to maintain the proteome composition that is found in young age (“youthful”). On this basis, we would like to keep the current title but have amended the manuscript to indicate that such regulation of the proteome composition is only in part dependent on eff/UBE2D.
  
  - Molecular weight markers are missing for the gels/western blot depicted in Fig 1E, 2C, 3E, and 4A.
  
  Thank you for pointing this out. These have been added.
  
  - Fig. 4A, the Ponceau staining for the detergent insoluble samples shows almost no signal for lane 7 and the data should hence not be analyzed.
  
  The western blot membrane in Fig. 4A shows a reliable signal in all lanes (including lane 7) when probed with antibodies for ubiquitin, Ref(2)P, and tubulin. Therefore, there is no reason for excluding lane 7 from the analysis. Ponceau S staining is provided as an additional loading control but was not used to normalize the data.
  
  Reviewer #3 (Recommendations For The Authors):
  
  There are a number of confusing or not sufficiently explained points in the figures that require clarity.
  
  In Figure 1, panels B and C, one assumes the gray broad line across means no difference from control. For the genes, many have points that are scattered both above and below that control line. What do the dots and range represent for each gene, and why are the data so scattered. How do the authors explain data ranging from no effect, to a negative effect to a positive effect, all for the same gene? Akt1 and Hsp83 are controls but are not quantitated to appreciate how variable the assay is. Can they explain the figure better, and also why the data for any one gene are so variable?
  
  We have amended the legend of Figure 1 to indicate that each data point in the graph represents a single RNAi line targeting the corresponding gene. The mean of 5 biological replicates is shown for each RNAi line, with each biological replicate representing a single eye imaged from a distinct fly. Therefore, the data points that do not show large magnitude changes may indicate RNAi lines that were not effective at knocking down the target protein (or that did not affect HTT aggregates). Therefore, the variability in the analysis of a single gene arises because different RNAi lines targeting that gene may have different efficacy. RNAi lines for Akt1 and Hsp83 are merely used as controls (these have been quantified in Jiao et al. 2023; PMID: 36640359).
  
  In Figure 2A, it is not clear which animals have the hL-Q72-GFP (which eyes are "rough eyes"?). Also, do ubc6-RNAi and eff-RNAi have an impact on the normal eye? That is, can they explain the images and genotypes more clearly.
  
  UBC6 and eff RNAi produce these rough eye phenotypes in the absence of HTT-polyQ and these are rescued by the expression of their human homologs. The panel images indicated in bold here below are those that have “rough eye” phenotypes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (a green R has been added to these panels in Fig. 2A).
  
  In Figure 2B, panel 3 looks very different from 1 and 4 and yet is not different from them by quantitation. Can they replace it with a more representative panel or is 3 lower (but not significantly so)?
  
  Please note that some of the GFP fluorescence in image 4 is not punctate, but rather diffuse fluorescence that is not related to HTT-GFP aggregates. Our image quantitation methods utilized thresholding to identify GFP-positive puncta while eliminating background fluorescence not corresponding to HTT-GFP puncta.
  
  In Figures 3E and F, it would be helpful in F to put the detergent soluble bar graphs all on the left so that those data are on the left in both E and F, and then detergent-insoluble in E and F to the right. This would make the figure and quantitation easier to follow.
  
  Done.
  
  The same point as above for Figures 4 A and B.
  
  Done.
  
  In Figure 3A, CG7656 is nearly as reduced with age as eff. One wonders if that gene would give a different or similarly overlapping proteome with age as eff. Was CG7656 not focused on because not conserved?
  
  As indicated in Figure 1B, CG7656 is orthologous to UBE2R1 (also called CDC34) and UBE2R2 in humans. In this screen, however, RNAi targeting CG7656 did not appear to influence HTT aggregates and therefore was not selected for further analyses. However, it may play a role in skeletal muscle proteostasis during aging.
  
  In Figure 6, the R2 value correlating age with eff-RNAi is weak. Although they discuss this in the text, it might also be helpful to include Venn diagrams for gene overlaps and the significance to make the argument more clear that there is a significant correlation in proteins up and down to indicate that eff largely recapitulates the changes of aging. Correlating this with proteins that are restored with UBE2D in muscle in a more clear manner may also be helpful for readers interested in aging.
  
  We have amended the text to indicate that this relatively low correlation (R2 = ~0.2, but corresponding to a consistent regulation of 70% of proteins by aging and effRNAi) could indicate that eff/UBE2D is only in part responsible for maintaining a youthful composition of the muscle proteome during aging. Other changes that occur with aging likely account for non-correlated alterations in protein levels. We have also added Venn diagrams (Fig. 6E) to further display the overlap in protein regulation by aging vs. effRNAi.
  
  In Figure 7, they might indicate that the accumulated insoluble protein is ubiquitinated. That is left out of the figure, although indicated in the legend.
  
  Done.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.12.12.571303v2

Unveiling the influence of tumor and immune signatures on immune checkpoint therapy in advanced lung cancer

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The authors study the variability of patient response of NSCLC patients on immune checkpoint inhibitors using single-cell RNA sequencing in a cohort of 26 patients and 33 samples (primary and metastatic sites), mainly focusing on 11 patients and 14 samples for association analyses, to understand the variability of patient response based on immune cell fractions and tumor cell expression patterns. The authors find immune cell fraction, clonal expansion differences, and tumor expression differences between responders and non-responders. Integrating immune and tumor sources of signal the authors claim to improve prediction of response markedly, albeit in a small cohort.
  
  Strengths:
  
  The problem of studying the tumor microenvironment, as well as the interplay between tumor and immune features is important and interesting and needed to explain the heterogeneity of patient response and be able to predict it.
  
  Extensive analysis of the scRNAseq data with respect to immune and tumor features on different axes of hypothesis relating to immune response and tumor immune evasion using state-of-the-art methods.
  
  The authors provide an interesting scRNAseq data set linked to outcomes data.
  
  Integration of TCRseq to confirm subtype of T-cell annotation and clonality analysis.
  
  Interesting analysis of cell programs/states of the (predicted) tumor cells and characterization thereof.
  
  Weaknesses:
  
  Generally, a very heterogeneous and small cohort where adjustments for confounding are hard. Additionally, there are many tests for association with outcome, where necessary multiple testing adjustments would negate signal and confirmation bias likely, so biological takeaways have to be questioned.
  
  Thank you for your comment. We made multiple testing adjustments as suggested in “Recommendations for Authors.”
  
  RNAseq is heavily influenced by the tissue of origin (both cell type and expression), so the association with the outcome can be confounded. The authors try to argue that lymph node T-cell and NK content are similar, but a quantitative test on that would be helpful.
  
  Following the reviewer’s suggestion, we performed principal component analysis (PCA) to assess the influence of tissue of origin on immune and stromal cell populations. In the revised Figure S1g, we quantified the similarity using Euclidean distances of centroids between sample groups based on their tissue of origin in the PC1 and PC3 plot.
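
  The centroid comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a samples-by-cell-type proportion matrix, computes PCA through a plain SVD, and measures Euclidean distances between tissue-group centroids in the PC1/PC3 plane. The function names and the toy data are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Project the rows of X (samples) onto the top principal components."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # sample scores on PC1..PCk

def centroid_distances(scores, labels, pcs=(0, 2)):
    """Euclidean distance between group centroids in the chosen PC plane.

    pcs=(0, 2) selects PC1 and PC3 (zero-based column indices).
    """
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    centroids = {
        g: scores[labels == g][:, list(pcs)].mean(axis=0) for g in groups
    }
    return {
        (a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
        for i, a in enumerate(groups)
        for b in groups[i + 1:]
    }

# Toy example: 9 samples x 5 cell-type proportions, three tissue groups.
rng = np.random.default_rng(0)
X = rng.random((9, 5))
tissue = ["mLN"] * 3 + ["tLung"] * 3 + ["tL/B"] * 3
for pair, dist in centroid_distances(pca_scores(X), tissue).items():
    print(pair, round(dist, 3))
```

  Smaller distances indicate more similar composition between tissue groups, but with a handful of samples per group the distances are descriptive only and carry no significance estimate.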
  
  The authors claim a very high "accuracy" performance, however, given the small cohort and lack of information on the exact evaluation it is not clear if this just amounts to overfitting the data.
  
  We acknowledge the concern about the high “accuracy” potentially indicating overfitting. To address this, we revised the manuscript to clarify the use of 'accuracy,' 'AUC,' and 'performance' with clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.
  
  Especially for tumor cell program/state analysis the specificity to the setting of ICIs is not clear and could be prognostic.
  
  Thank you for your comments. As outlined in Table 2 of the revised manuscript, we conducted a multivariate survival analysis of tumor signature candidates using the TCGA lung adenocarcinoma (LUAD, n = 533) and squamous cell carcinoma (LUSC, n = 502) cohorts to evaluate their prognostic potential. No tumor cell programs or states were found to be associated with overall survival in either LUAD or LUSC. We added descriptions related to Table 2 in the Results (Lines 249-251) and Methods (Lines 530-542) sections.
  
  Due to the small cohort with a lot of variability, more external validation is needed to be convincingly reproducible, especially when talking about AUC/accuracy of a predictor.
  
  Expanding the cohort size was difficult due to limited resources. We recognize the challenges posed by the small and heterogeneous cohort. We have acknowledged these limitations and applied statistical corrections to address them.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors have utilised deep profiling methods to generate deeper insights into the features of the TME that drive responsiveness to PD-1 therapy in NSCLC.
  
  Strengths:
  
  The main strengths of this work lie in the methodology of integrating single-cell sequencing, genetic data, and TCRseq data to generate hypotheses regarding determinants of IO responsiveness.
  
  Some of the findings in this study are not surprising and well precedented eg. association of Treg, STAT3, and NFkB with ICI resistance and CD8+ activation in ICI responders and thus act as an additional dataset to add weight to this prior body of evidence. Whilst the role of Th17 in PD-1 resistance has been previously reported (eg. Cancer Immunol Immunother 2023 Apr;72(4):1047-1058, Cancer Immunol Immunother 2024 Feb 13;73(3):47, Nat Commun. 2021; 12: 2606 ) these studies have used non-clinical models or peripheral blood readouts. Here the authors have supplemented current knowledge by characterization of the TME of the tumor itself.
  
  Weaknesses:
  
  Unfortunately, the study is hampered by the small sample size and heterogeneous population and whilst the authors have attempted to bring in an additional dataset to demonstrate the robustness of their approach, the small sample size has limited their ability to draw statistically supported conclusions. There is also limited validation of signatures/methods in independent cohorts, no functional characterization of the findings, and the discussion section does not include discussion around the relevance/interpretation of key findings that were highlighted in the abstract (eg. role of Th17, TRM, STAT3, and NFKb). Because of these factors, this work (as it stands) does have value to the field but will likely have a relatively low overall impact.
  
  We acknowledge the challenges posed by the small and heterogeneous cohort. To address this, we tempered our claims related to accuracy by applying statistical testing corrections. We also appreciate the feedback on functional characterization and have expanded the discussion in the revised manuscript to include an overview of specific cell populations and genes.
  
  Related to the absence of discussion around prior TRM findings, the association between TRM involvement in response to IO therapy in this manuscript is counter to what has been previously demonstrated (Cell Rep Med. 2020;1(7):100127, Nat Immunol. 2017;18(8):940-950., J Immunol. 2015;194(7):3475-3486.). However, it should be noted that the authors in this manuscript chose to employ alternative markers of TRM characterisation when defining their clusters and this could indicate a potential rationale for differences in these findings. TRM population is generally characterised through the inclusion of the classical TRM markers CD69 (tissue retention marker) and CD103 (TCR experienced integrin that supports epithelial adhesion), which are both absent from the TRM definition in this study. Additional markers often used are CD44, CXCR6, and CD49a, of which only CXCR6 has been included by the authors. Conversely, the majority of markers used by the authors in the cell type clustering are not specific to TRM (eg. CD6, which is included in the TRM cluster but is expressed at its lowest in cluster 3 which the authors have highlighted as the CD8+ TRM population). Therefore, whilst there is an interesting finding of this particular cell cluster being associated with resistance to ICI, its annotation as a TRM cluster should be interpreted with caution.
  
  Single-cell RNA sequencing (scRNA-seq) can sometimes fail to detect the expression of classical cell type markers due to incomplete capture of a cell’s transcriptome. To determine cell identity, we utilized cell type markers established in previous scRNA-seq studies. In response to your comments, we have added the expression levels of classical TRM markers, including CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Although these markers were not exclusively expressed in TRM clusters, TRM clusters exhibited relatively high levels of these genes while lacking other clusters’ specific marker genes.
  
  Reviewer #1 (Recommendations For The Authors):
  
  General suggestions:
  
  When analyzing the association of cell type proportions with outcomes, some adjustment for multiple testing should be considered (either sampling-based, e.g. permutation test, or adjustment based on assumptions of independence of tests, e.g. Bonferroni).
  
  Thank you for your comments. As suggested, we calculated adjusted p-values using the False Discovery Rate for the association of cell type proportions with outcomes in Figure 3a. The heatmap in Reviewer's ONLY Figure 1, which uses the adjusted p-values, consistently showed the expected grouping of cell types and outcomes. However, the significance did not meet the conventional statistical cutoff criteria. We acknowledge this limitation, which results from statistical testing based on ratio values.
  
  Author response image 1.
  
  Heat map with unsupervised hierarchical clustering of proportional changes in cell subtypes within total immune cells. Proportional changes were compared across multiple ICI response groups. The color represents the adjusted -log (p-value) calculated using the False Discovery Rate.
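
  For readers unfamiliar with False Discovery Rate adjustment, the sketch below shows one common procedure, Benjamini-Hochberg. The response does not state which FDR method was used, so the choice of Benjamini-Hochberg is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: a smaller raw p never gets a larger adjusted p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four cell-type comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

  With many cell subtypes and few samples, adjusted values can easily exceed 0.05 even when the raw values look promising, which matches the limitation acknowledged above.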
  
  A formal test of clonotype differences (normalized to cell type fraction) would be great as the shown plot 2e could be confounded by cell number and type differences between responders and non-responders.
  
  Thank you for your suggestion. We have revised Figure 2e to display the relative clonotype differences versus CD4+ and CD8+ T cell fractions in each sample. The relative clone size of each cell was calculated by dividing the size of each clone by the total number of CD4+ or CD8+ T cells, respectively.
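
  The normalization described above can be written out as a short sketch. It assumes each T cell in a sample is recorded as a (lineage, clonotype) pair; the function name and the toy clonotype labels are hypothetical.

```python
from collections import Counter

def relative_clone_sizes(cells):
    """Relative clone size per (lineage, clonotype) within one sample.

    cells: iterable of (lineage, clonotype_id) pairs, one per T cell,
    where lineage is "CD4" or "CD8". Each clone's size is divided by the
    total number of T cells of the same lineage in the sample.
    """
    cells = list(cells)
    clone_counts = Counter(cells)
    lineage_totals = Counter(lineage for lineage, _ in cells)
    return {
        key: count / lineage_totals[key[0]]
        for key, count in clone_counts.items()
    }

# Toy sample: four CD8+ cells in two clones, two CD4+ cells in one clone.
sample = [("CD8", "c1")] * 3 + [("CD8", "c2")] + [("CD4", "c3")] * 2
print(relative_clone_sizes(sample))
# {('CD8', 'c1'): 0.75, ('CD8', 'c2'): 0.25, ('CD4', 'c3'): 1.0}
```

  Dividing by the lineage total makes samples with different T cell yields comparable, which addresses the reviewer's concern about confounding by cell number.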
  
  It could be made a bit more clear when the core group of patients was used (only when associating with outcomes?) and when all other patients were used as well (only cell type annotation?).
  
  As the reviewer correctly noted, we performed scRNA-seq analysis on all specimens, but only the core group of patients was used for the comparative analysis between the responder and non-responder groups. This information has been detailed in the manuscript (Lines 103-105).
  
  For immune cells, it would be interesting to look at expression patterns (NMF, scINSIGHT) as well, not just immune cell fractions and expansion.
  
  In contrast to tumor signatures, immune cell programs are more directly tied to their functional characteristics. Therefore, we focused on annotating immune cells based on their functional properties and conducted comparative analyses between responders and non-responders.
  
  Multiple testing is necessary for the univariate association analysis. Some adjustments for confounders in a multivariate model (despite the size) could be informative.
  
  As shown in ‘Reviewer's ONLY Table 1’, we conducted a multivariate regression analysis of immune and tumor signatures for ICI response, adjusting for clinical variables such as tissue origin, cancer subtype, pathological stage, and smoking status. However, the results were not significant, likely due to the heterogeneity and small size of the cohort.
  
  Author response table 1.
  
  P-values from univariate and multivariate regression analysis of immune and tumor signatures for ICI response.
  
  It is not clear from the manuscript how "accuracy" is measured. The terms "accuracy" and "AUC", as well as "performance" are used interchangeably, a section in the methods with the precise definition is needed.
  
  We have revised the manuscript to clarify the terms 'accuracy,' 'AUC,' and 'performance' by using clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.
  
  Furthermore, it has to be clear if this is in-sample performance or if there was some train/test split or cross-validation used. Given the small cohort size and wealth of features finding some combination of predictors that could overfit on responders/non-responders would not be surprising.
  
  As the reviewer has noted, we acknowledge the statistical limitations due to the small cohort size. We have revised the sentence on Lines 545-547 “Classification models of responders and non-responders for PC signatures and combinatorial indexes between tumor and/or immune cells were generated based on in-sample performance…”.
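
  The distinction the reviewer draws between in-sample performance and held-out performance can be illustrated with a small sketch. It uses a deliberately simple, hypothetical classifier (a midpoint threshold on a one-dimensional signature score) and leave-one-out cross-validation; neither is taken from the manuscript, and the scores are invented.

```python
def fit_threshold(scores, labels):
    """Fit a midpoint threshold between the two class means."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    # Second element records whether high scores indicate class 1.
    return (mean_pos + mean_neg) / 2, mean_pos >= mean_neg

def predict(model, score):
    threshold, high_is_positive = model
    return int((score > threshold) == high_is_positive)

def in_sample_accuracy(scores, labels):
    """Accuracy when the model is fit and evaluated on the same samples."""
    model = fit_threshold(scores, labels)
    hits = sum(predict(model, s) == y for s, y in zip(scores, labels))
    return hits / len(labels)

def leave_one_out_accuracy(scores, labels):
    """Accuracy when each sample is predicted by a model fit without it.

    Requires at least two samples per class so every fold has both classes.
    """
    hits = 0
    for i in range(len(scores)):
        model = fit_threshold(scores[:i] + scores[i + 1:],
                              labels[:i] + labels[i + 1:])
        hits += predict(model, scores[i]) == labels[i]
    return hits / len(scores)

# Hypothetical signature scores: 1 = responder, 0 = non-responder.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
print(in_sample_accuracy(scores, labels), leave_one_out_accuracy(scores, labels))
```

  When many candidate signatures are screened on a small cohort, the in-sample figure tends to be optimistic, so a held-out estimate of this kind is the more conservative number to report.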
  
  Suggestions to improve readability:
  
  Line 84: The sentence should be reformulated to improve understanding.
  
  We have revised sentences in lines 81-93.
  
  Line 86: missing a "the".
  
  We have revised the sentences in lines 81-93.
  
  Reviewer #2 (Recommendations For The Authors):
  
  "Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells" Please look to rephrase this sentence as this is not entirely accurate: PD-1 is upregulated in tumor-experienced T cells as a consequence of antigen recognition ie those cells that recognise tumor will increase PD-1, whereas the sentence as it's currently written indicates that PD1+ cells have an intrinsically increased capacity to kill tumors, which is incorrect.
  
  We have revised the sentence “Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells” in lines 86-88 of the revised manuscript to read: “More specifically, PD-1 expression is upregulated upon antigen recognition (PMID29296515), indicating that certain T cells in the tumor microenvironment are actively engaged as tumor-specific T cells.”
  
  Cancer subtype abbreviations (eg. SQ, ADC, NUT) are used in figures in the main article and so should be defined in the main text (they are currently only explained in the legend for the supplementary table).
  
  As per the reviewer’s suggestion, the manuscript has been revised to include definitions of cancer type abbreviations in lines 108-110.
  
  Figure S1d-f does not appear to corroborate the statement that "Although there were differences in tissue-specific resident populations, we found that the immune cell profiles, especially T/NK cells of mLN were similar to those of primary tumor tissues indicating the activation of immune responses were consistently observed at metastatic sites (Figure S1d-f)." The diagrams are complex (please explain all abbreviations) and it is not clear how the authors have come to this conclusion. Additionally, cell quantity does not indicate that the 'activation of immune responses' is consistently observed at metastatic sites as these cells could be dysfunctional/bystander.
  
  In the revision, we have quantified the diagrams (Figure S1f) to more clearly highlight the differences in tissue-specific resident populations. We performed principal component analysis (PCA) to evaluate the impact of tissue origin on immune and stromal cell populations. In the revised Figure S1g, we illustrated the quantitative similarity between sample groups using Euclidean distances in the PC plot based on their tissue of origin. Additionally, the legends for Figures S1d and S1e have been updated to include definitions for all abbreviations.
  
  We agree with the reviewer's comment that cell quantity alone may not fully reflect activation of antigen-specific immune responses, even though we annotated the functional T cell subtypes. To better focus on the comparisons of cellular profiles between metastatic sites (mLN) and primary tumors (tLung and tL/B), we removed the sentence “…indicating the activation of immune responses were consistently observed at metastatic sites (Fig. S1d-f).” from the revised manuscript.
  
  In Figure 2c, classical markers for TRM (CD103, CD69) should be included in the description for the definition of the TRM clusters, or their exclusion appropriately explained. The findings regarding the negative correlation between follicular B cells and ICI response are surprising. Figure S3, the cluster identified as Follicular B cells contains MS4A1 (CD20) and HLA-DRA. Classical markers are CD20 (pan-B cell), CD21 (CR2), CD23, and IgD/IgM (double positive), and as such it is not clear if the authors have appropriately annotated this cluster as representing follicular B cells. These classical markers should be included in the interpretation of the cell clustering or their exclusion appropriately explained.
  
  We appreciate your comments. In response, we have added the expression levels of classical TRM markers such as CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Additionally, we revised the dot plot showing the mean expression of marker genes in each cell cluster for B/Plasma cells (revised Figure S3b) by incorporating classical markers for Follicular B cells, such as CD21 (CR2), CD23 (FCER2), IgD (IGHD), IgM (IGHM).
  
  Figure 2f is rather confusing for the reader. I would recommend changing to an alternative plot that shows logP and response in a different way. If keeping to this plot type please clarify why plotting response vs PD, and whether the lower left quadrant indicates patients with progressive disease and the top right indicates responders as the interpretation is not clear currently.
  
  Thank you for your feedback. To address the concerns raised, we have updated the figure legend for Figure 2f to clarify the interpretation of the quadrants: “The lower left quadrant shows cell types overrepresented in the poor responder groups, while the upper right quadrant indicates cell types overrepresented in the better responder groups”. This clarification aims to help readers understand that the lower left quadrant reflects cell types associated with worse treatment outcomes, while the upper right quadrant reflects cell types associated with improved therapeutic responses.
  
  The terms "PC7.neg, INT.down, and UNION.down" are included in the results with no explanation to the reader of what they are or how to interpret them. The methods description "We constructed DEGs with intersections (INT) and union (UNION) of up- or down-regulated genes for comparisons" does not sufficiently describe how they were generated/calculated and, therefore, this is difficult for the reader to interpret in the final results section. Please add an additional explanation for the reader in the final section of the results/Figure 5 and in the methods.
  
  Following the reviewer’s suggestion, we added an explanation in the Results section (lines 258-261): “PC7.neg denotes genes negatively correlated with PC7, a principal component extracted from PCA that distinguishes tumor cells in poor response groups. INT.down and UNION.down represent the intersection and union of down-regulated genes in the responder group, respectively.” We also explained the details in the Methods section (lines 489-495): “We reconstructed DEGs as four groups: INT.up, INT.down, UNION.up, and UNION.down, based on the intersection (INT) and union (UNION) of up- or down-regulated genes for pairwise comparisons between responder versus non-responder, PR versus PD, and PR versus SD. INT.up and INT.down represent the intersection of up- and down-regulated genes in the responder group, respectively. UNION.up and UNION.down represent the union of up- and down-regulated genes in the responder group, respectively.”
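
  The four gene groups defined above amount to set intersections and unions across the pairwise comparisons. The sketch below shows that construction under the assumption that each comparison yields one set of up-regulated and one set of down-regulated genes in the responder group; the function name and the placeholder gene symbols are hypothetical.

```python
def combine_degs(comparisons):
    """Build INT/UNION gene groups from per-comparison DEG sets.

    comparisons: dict mapping a comparison name to
    {"up": genes up-regulated, "down": genes down-regulated} in responders.
    """
    ups = [set(c["up"]) for c in comparisons.values()]
    downs = [set(c["down"]) for c in comparisons.values()]
    return {
        "INT.up": set.intersection(*ups),      # up in every comparison
        "INT.down": set.intersection(*downs),  # down in every comparison
        "UNION.up": set.union(*ups),           # up in at least one comparison
        "UNION.down": set.union(*downs),       # down in at least one comparison
    }

# Placeholder gene symbols, for illustration only.
comparisons = {
    "R_vs_NR": {"up": {"G1", "G2", "G3"}, "down": {"G7", "G8"}},
    "PR_vs_PD": {"up": {"G1", "G2"}, "down": {"G7", "G9"}},
    "PR_vs_SD": {"up": {"G2", "G4"}, "down": {"G7"}},
}
groups = combine_degs(comparisons)
print(sorted(groups["INT.up"]), sorted(groups["UNION.down"]))
# ['G2'] ['G7', 'G8', 'G9']
```

  The intersection groups are the stricter signatures, since a gene must change in the same direction in all three comparisons, while the union groups are more inclusive.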
  
  The TRM and Th17+ T cell populations are highlighted in the abstract as being related to ICI resistance, but these populations of cells are not even mentioned in the discussion. Likewise, STAT3 and NFkb pathways are also highlighted in the abstract but absent in the discussion section. Please discuss the relevance of these findings, particularly given the prior studies demonstrating the opposite impact of TRM populations in NSCLC.
  
  We have expanded the discussion in the revised manuscript (Lines 295-313) to address the roles of TRM and Th17+ T cell, as well as the STAT3 and NF-κB pathways, in association with ICI resistance in NSCLC.
  
  “The identification of an abundance of CD4+ TRM cells as a negative predictor of ICI response is an unexpected finding, considering that higher frequencies of TRM cells in lung tumor tissues are generally associated with better clinical outcomes in NSCLC (PMID28628092). This is largely due to their role in sustaining high densities of tumor-infiltrating lymphocytes and promoting anti-tumor responses. Additionally, previous studies have demonstrated that TRM cell subsets coexpressing PD-1 and TIM-3 are relatively enriched in patients who respond to PD-1 inhibitors (PMID31227543). However, recent findings suggest that pre-existing TRM-like cells in lung cancer may promote immune evasion mechanisms, contributing to resistance to immune checkpoint blockade therapies (PMID37086716). These observations suggest that the roles of TRM subsets in tumor immunity are highly context-dependent.
  
  Similarly, CD4+ TH17 cells, which were overrepresented in the non-responder groups, exhibit context-dependent roles in tumor immunity and may be associated with both unfavorable and favorable outcomes (PMID34733609; PMID30941641). In exploring tumor cell signatures linked to ICI response, non-responder attributes were regulated by STAT3 and NFKB1. The STAT3 and NF-κB pathways are crucial for Th17 cell differentiation and T cell activation (PMID24605076; PMID32697822). Notably, STAT3 activation in lung cancer orchestrates immunosuppressive characteristics by inhibiting T-cell mediated cytotoxicity (PMID31848193). The combined influence of the Th17/STAT3 axis and TRM cell activity in predicting ICI response underscores the complexity of these pathways and suggests that their roles in tumor immunity and therapy response warrant further investigation.”
  
URL

biorxiv.org/content/10.1101/2024.04.15.589544v2
www.biorxiv.org www.biorxiv.org

Structure and evolution of Alanine/Serine Decarboxylases and the engineering of theanine production

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Response to reviewer’s comments
  
  Reviewer #1 (Public Review):
  
  In this study, the structural characteristics of plant AlaDC and SerDC were analyzed to understand the mechanism of functional differentiation, deepen the understanding of substrate specificity and catalytic activity evolution, and explore effective ways to improve the initial efficiency of theanine synthesis.
  
  On the basis of previous solid work, the authors successfully obtained the X-ray crystal structures of CsAlaDC and AtSerDC, key proteins related to the synthesis of ethylamine (a precursor of theanine), and found a unique zinc finger structure in these two crystal structures that is not found in other Group II PLP-dependent amino acid decarboxylases. Through a series of experiments, it is pointed out that this characteristic zinc finger motif may be the key to the folding of CsAlaDC and AtSerDC proteins, and this discovery is novel and forward-looking in the study of theanine synthesis.
  
  In addition, the authors identified Phe106 of CsAlaDC and Tyr111 of AtSerDC as key sites of substrate specificity by comparing substrate binding regions and identified amino acids that inhibit catalytic activity through mutation screening based on protein structure. It was found that the catalytic activity of the CsAlaDC L110F/P114A mutant was 2.3 times higher than that of CsAlaDC. At the same time, CsAlaDC and AtSerDC substrate recognition key motifs were used to carry out evolutionary analysis of the protein sequences that are highly homologous to CsAlaDC in embryophytes, and 13 potential alanine decarboxylases were found, which laid a solid foundation for subsequent studies related to theanine synthesis.
  
  In general, this study has a solid foundation, the whole research idea is clear, the experimental design is reasonable, and the experimental results provide strong evidence for the authors' point of view. Through a large number of experiments, the key links in the theanine synthesis pathway are studied in depth, an effective way to improve the initial efficiency of theanine synthesis is found, and the molecular mechanism of this approach is explained. The whole study is novel and forward-looking, and sheds light on a new direction for the efficient industrial synthesis of theanine.
  
  Response: Thank you very much for taking time to review this manuscript. We appreciate all your insightful comments and constructive suggestions.
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) If some test methods are not original, references or method basis should be indicated.
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have added references for the enzymatic activity experiments performed to measure the synthesis of theanine in the revised manuscript.
  
  (2) The conclusion is a little lengthy, and the summary of the whole study is not well condensed.
  
  Response: Thank you very much for your valuable suggestions. We have refined the conclusion in the revised manuscript, and it is as follows:
  
  In conclusion, our structural and functional analyses have significantly advanced understanding of the substrate-specific activities of alanine and serine decarboxylases, typified by CsAlaDC and AtSerDC. Critical amino acid residues responsible for substrate selection were identified (Tyr111 in AtSerDC and Phe106 in CsAlaDC), highlighting pivotal roles in enzyme specificity. The engineered CsAlaDC mutant (L110F/P114A) not only displayed enhanced catalytic efficiency but also substantially improved L-theanine yield in a synthetic biosynthesis setup with PsGS or GMAS. Our research expanded the repertoire of potential alanine decarboxylases through the discovery of 13 homologous enzyme candidates across embryophytic species and uncovered a special motif present in serine protease-like proteins within Fabales, suggesting a potential divergence in substrate specificity and catalytic functions. These insights lay the groundwork for the development of industrial biocatalytic processes, promising to elevate the production of L-theanine and supporting innovation within the tea industry.
  
  Reviewer #2 (Public Review)
  
  Summary:
  
  The manuscript focuses on the comparison of two PLP-dependent enzyme classes that perform amino acyl decarboxylations. The goal of the work is to understand the substrate specificity and factors that influence the catalytic rate in an enzyme linked to theanine production in tea plants.
  
  Strengths:
  
  The work includes x-ray crystal structures of modest resolution of the enzymes of interest. These structures provide the basis for the design of mutagenesis experiments to test hypotheses about substrate specificity and the factors that control catalytic rate. These ideas are tested via mutagenesis and activity assays, in some cases both in vitro and in plants.
  
  Weaknesses:
  
  The manuscript could be more clear in explaining the contents of the x-ray structures and how the complexes studied relate to the reactant and product complexes. The structure and mechanism section would also be strengthened by including a diagram of the reaction mechanism and including context about reactivity. As it stands, much of the structural results section consists of lists of amino acids interacting with certain ligands without any explanation of why these interactions are important or the role they play in catalysis. The experiments testing the function of a novel Zn(II)-binding domain also have serious flaws. I don't think anything can be said at this point about the function of the Zn(II) due to a lack of key controls and problems with experimental design.
  
  Response: Thank you very much for your thoughtful comments and feedback on our manuscript. We are pleased to hear that the work's strengths, such as the X-ray crystal structures and the mutagenesis experiments tied to the catalytic rate and substrate specificity, align with the goals of our research.
  
  We recognize the areas identified for improvement and appreciate the suggestions provided. We have emphasized how we use the structural information obtained to infer the roles of key amino acid residues in the reaction. Additionally, we have added a diagram of the reaction mechanism in the Supplementary figure to provide clearer context on reactivity and improve the overall understanding of the catalytic process. Regarding the structural results section, we have included a discussion that contextualizes the list of amino acids and their interactions with the ligands by explaining their significance and roles in catalysis. We acknowledge the weaknesses you've pointed out in the experiments concerning the novel Zn(II)-binding domain, but we would like to clarify that the focus of our study was not primarily on the zinc structure. While we agree that there may be limitations in the experimental design and controls for the zinc binding domain, we believe that these flaws do not significantly impact the overall findings of the study. The experiment served as a preliminary exploration of the potential functionality of the domain, and further studies are required to fully understand its role and mechanism.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) In addition to the points raised in the public review, it would be ideal to provide some context for the enzymatic characterization. Why are the differences in kinetic parameters for AlaDC and SerDC significant?
  
  Response: Thank you for your comments and suggestions. The Km values for CsAlaDC and SerDCs are comparable, suggesting similar substrate affinities. However, CsAlaDC exhibits a significantly lower Vmax compared to AtSerDC and CsSerDC. This discrepancy implies that CsAlaDC and SerDCs may differ in the rates at which they convert substrate to product when saturated with substrate. SerDCs may have a faster turnover rate, meaning they convert substrate to product and release it more quickly, resulting in a higher Vmax. Differences in the stability or correct folding of the enzymes under assay conditions can also affect their Vmax. If SerDCs are more stable, they might maintain their catalytic activity better at higher substrate concentrations, contributing to a higher Vmax. We have added these points to the section “Enzymatic properties of CsAlaDC, AtSerDC, and CsSerDC” in our revised manuscript.
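  The argument above can be written out explicitly. The following is a sketch that assumes both enzymes follow simple Michaelis-Menten kinetics (an assumption here, not something the response states); the subscripts 1 and 2 are illustrative labels for the two enzymes being compared.

```latex
% Michaelis-Menten rate law and turnover number
v = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad
k_{\mathrm{cat}} = \frac{V_{\max}}{[E]_T}

% With equal K_m and the same substrate concentration [S],
% the ratio of rates reduces to the ratio of V_max values:
\frac{v_1}{v_2}
  = \frac{V_{\max,1}\,[S]/(K_m + [S])}{V_{\max,2}\,[S]/(K_m + [S])}
  = \frac{V_{\max,1}}{V_{\max,2}}

% With equal total enzyme [E]_T as well, this is the ratio of turnover numbers:
\frac{V_{\max,1}}{V_{\max,2}} = \frac{k_{\mathrm{cat},1}}{k_{\mathrm{cat},2}}
```

  Under these assumptions, comparable Km values with a lower Vmax point to a difference in turnover rather than in substrate affinity, which is the reading the response gives.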
  
  (2) Why is Phe106/Tyr111 pair critical for substrate specificity? Does the amino acid contact the side chain? It might be helpful to a reader to formulate a hypothesis for this interaction.
  
  Response: Thank you for the question and comments. We conducted a comparison between the active sites of CsAlaDC and AtSerDC and observed a distinct difference in only two amino acids: F106 in CsAlaDC and Y111 in AtSerDC. The remaining amino acids were found to be identical. Expanding on previous research concerning Group II PLP-dependent amino acid decarboxylases, it was postulated and subsequently confirmed that these specific amino acids play a crucial role in substrate recognition. However, since we lack the structure of the enzyme-substrate complex, we are unable to elucidate the precise interactions occurring between the substrate and the amino acids at this particular site based solely on structural information.
  
  (3) Line 55 - Define EA again.
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have redefined “EA” as the abbreviation for ethylamine in the revised manuscript.
  
  (4) Line 58 - The meaning of "determined by the quality formation of tea" is not clear.
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have modified it in the revised manuscript.
  
  (5) Line 65 - Missing words between "despite they".
  
  Response: Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.
  
  (6) Line 67 - Need a reference for the statement about lower activity?
  
  Response: Thank you for the question and comments. We have provided the following reference to support this statement in the revised manuscript.
  
  Reference: Bai, P. et al. (2021) Biochemical characterization of specific Alanine Decarboxylase (ADC) and its ancestral enzyme Serine Decarboxylase (SDC) in tea plants (Camellia sinensis). BMC Biotechnol. 21, 17.
  
  (7) Line 100-101 - The meaning of "its closer relationship was Dicots plants." is not clear.
  
  Response: We have revised the sentence in the revised manuscript, as follows: “Phylogenetic analysis indicated that CsAlaDC is homologous with SerDCs in Dicots plants.”
  
  (8) Line 139 - Missing a word between "as well as" and "of".
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.
  
  (9) Line 142 - The usage of comprised here is not correct. It would be more correct to say "The overall architecture of CsAlaDC and AtSerDC is homodimeric with the two subunits...".
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.
  
  (10) Line 148-149 - I didn't understand the statement about the "N-terminal structures" Are these structures obtained from protein samples that have a truncated N-terminus?
  
  Response: Group II PLP-dependent amino acid decarboxylases comprise three distinct structural domains: the N-terminal domain, the large domain, and the C-terminal domain. Each of these domains possesses unique structural features. Similarly, CsAlaDC and AtSerDC can be divided into three structural domains based on their specific characteristics. To obtain more stable proteins for further experiments, we truncated both proteins. The removed segment is part of the N-terminal domain and was deleted from the protein's N-terminus.
  
  (11) Line 153 - Say "is composed of" instead of "composes of".
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.
  
  (12) Line 156 - I didn't understand the statement about the cofactor binding process. What is the cofactor observed? And how can we say anything about the binding process from a single static structure of the enzyme? It might be better to say that the cofactor binding site is located at the subunit junction - but the identity of the cofactor still needs to be defined first.
  
  Response: Thank you for your comments and suggestions. The cofactor mentioned here is PLP. Our intention was to describe the bound state of PLP at the active site, not the binding process. The description has been corrected in the revised manuscript.
  
  (13) Lines 157-158 - I didn't understand the conclusion about the roles of each monomer. In the images in Figure 3 - both monomers appear to bind PLP but the substrate is not present - so it's not clear how conclusions can be drawn about differential substrate binding in the two subunits.
  
  Response: Thank you very much for your careful reading and valuable suggestions. The main idea we want to convey is that this protein possesses two active sites. At each active site, the two monomers carry out distinct functions. We agree that our previous conclusion was inaccurate because no substrate is present in the structure, and we have made the necessary amendments in the revised manuscript.
  
  (14) Line 161 - I would say loop instead of ring.
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.
  
  (15) Line 165 - Please provide some references for this statement. It would also be ideal to state the proximity of the Zn-binding motif to the active site or otherwise provide some information about the role of the motif based on its location.
  
  Response: Thank you for your comments and suggestions. We have provided the following references to support this statement in the revised manuscript.
  
  Author response image 1.
  
  (A) Structure of histidine decarboxylase. (B) Structure of glutamate decarboxylase.
  
  Reference:
  
  30 Komori, H. et al. (2012) Structural study reveals that Ser-354 determines substrate specificity on human Histidine Decarboxylase. J Biol Chem. 287, 29175-83.
  
  31 Huang, J. et al. (2018) Lactobacillus brevis CGMCC 1306 glutamate decarboxylase: Crystal structure and functional analysis. Biochem Biophys Res Commun. 503, 1703-1709.
  
  In CsAlaDC, the zinc is positioned at a distance of 29.6 Å from the active center, whereas in AtSerDC, the zinc is situated 29 Å away from the active center. Hence, we hypothesize that this structure does not impact the enzyme's catalytic activity but might be correlated with its stability.
  
  (16) Lines 166-178 - This paragraph appears to be a list of all of the interactions between the protein, PLP, and the EA product. It would be ideal to provide some text to explain why these interactions are important and what we can learn from them.
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have conducted additional analysis of the functional roles of the active-site amino acid residues that interact with PLP. This analysis focuses on how these residues aid PLP binding, determine its orientation, and contribute to the catalytic mechanism. These details are included in the revised manuscript.
  
  (17) Line 192 - Bond not bound.
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have made corrections in the revised manuscript.
  
  (18) Lines 201-207 - It would be ideal to verify that the inclusion of 5 mM DTT affects Zn binding. It's not clear to me that this reagent would necessarily disrupt Zn binding. Under certain circumstances, it could instead promote Zn association. For example, if the Cys ligands are oxidized initially but then become reduced? I don't think the current experiment really provides any insight into the role of the Zn.
  
  Response: Thank you for your valuable insights regarding the role of DTT and its potential effects on Zn binding in our experiments. The main function of DTT is to protect or restore the reduced state of proteins and other biological molecules, particularly by disrupting the crosslinking formed by thiol (-SH) groups and disulfide bonds to maintain the function and structure of proteins. Therefore, the reason for DTT's inhibition of enzyme activity is unknown, and we cannot provide a reasonable explanation for this phenomenon. As a result, we have removed the section discussing the inhibition of enzyme activity by DTT in our revised manuscript.
  
  Reviewer #3 (Public Review):
  
  In the manuscript titled "Structure and Evolution of Alanine/Serine Decarboxylases and the Engineering of Theanine Production," Wang et al. solved and compared the crystal structures of Alanine Decarboxylase (AlaDC) from Camellia sinensis and Serine Decarboxylase (SerDC) from Arabidopsis thaliana. Based on this structural information, the authors conducted both in vitro and in vivo functional studies to compare enzyme activities using site-directed mutagenesis and subsequent evolutionary analyses. This research has the potential to enhance our understanding of amino acid decarboxylase evolution and the biosynthetic pathway of the plant-specialized metabolite theanine, as well as to further its potential applications in the tea industry.
  
  Response: Thank you very much for taking the time to review this manuscript. We appreciate all your insightful comments.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Page 6, Figure 2, Page 23 (Methods)
  
  "The supernatants were purified with a Ni-Agarose resin column followed by size-exclusion chromatography."
  
  What kind of SEC column did the authors use? Can the authors provide the SEC elution profile comparison results and size standard curve?
  
  Response: We used a Superdex 200 (HiLoad 16/600) column for size-exclusion chromatography. The SEC elution profiles of AtSerDC and CsAlaDC, along with the standard curve of the SEC column, are presented below.
  
  Author response image 2.
  
  (A) Comparison of elution profiles of CsAlaDC and AtSerDC. (B) Elution profile of Blue Dextran 2000. (C) Elution profile of the protein standard mixture (aldolase, 158,000 Da, 71.765 mL; conalbumin, 75,000 Da, 79.391 mL; ovalbumin, 44,000 Da, 83.767 mL; carbonic anhydrase, 29,000 Da, 90.019 mL; ribonuclease A, 13,700 Da, 98.145 mL). (D) Size standard curve of the Superdex 200 (HiLoad 16/600) column.
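  A size standard curve of this kind is commonly obtained by fitting log10(molecular weight) against elution volume. The sketch below is illustrative only: it uses the five standards listed in the caption, fits a straight line by ordinary least squares, and uses elution volume directly rather than a void-volume-corrected Kav, which is a simplification. The function names are hypothetical and the authors' actual fitting procedure is not described in the response.

```python
import math

# Standards from the caption: (molecular weight in Da, elution volume in mL).
STANDARDS = [
    (158000, 71.765),  # aldolase
    (75000, 79.391),   # conalbumin
    (44000, 83.767),   # ovalbumin
    (29000, 90.019),   # carbonic anhydrase
    (13700, 98.145),   # ribonuclease A
]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


volumes = [ve for _, ve in STANDARDS]
log_mw = [math.log10(mw) for mw, _ in STANDARDS]
intercept, slope = fit_line(volumes, log_mw)


def estimate_mw(elution_volume_ml):
    """Estimate apparent molecular weight (Da) from an elution volume (mL)."""
    return 10 ** (intercept + slope * elution_volume_ml)


print(f"log10(MW) = {intercept:.3f} + ({slope:.4f}) * Ve")
```

  Calling `estimate_mw` with the elution volume of a sample peak gives an apparent molecular weight, which can then be compared with the mass expected for a monomer or a dimer. The estimate is only meaningful within the range spanned by the standards.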
  
  Page 6 & Page 24 (Methods)
  
  "The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, and 0.025 mM purified enzyme, was prepared and incubated at standard conditions (45 ℃ and pH 8.0 for CsAlaDC, 40 ℃ and pH 8.0 for AtSerDC for 30 min)."
  
  (1) The enzymatic activities of CsAlaDC and AtSerDC were measured at two different temperatures (45 and 40 ℃), but their activities were directly compared. Is there a reason for experimenting at different temperatures?
  
  Response: We determined that the optimal reaction temperature for AtSerDC is 40°C and for CsAlaDC is 45°C through our verification process. Consequently, all subsequent experiments were performed at these specific temperatures.
  
  Author response image 3.
  
  (A) Relative activity of CsAlaDC at different temperatures. (B) Relative activity of AtSerDC at different temperatures.
  
  (2) Enzyme activities were measured at temperatures above 40℃, which is not a physiologically relevant temperature and may affect the stability or activity of the proteins. At the very least, the authors should provide temperature-dependent protein stability data (e.g., CD spectra analysis) or, if possible, temperature-dependent enzyme activities, to show that their experimental conditions are suitable for studying the activities of these enzymes.
  
  Response: Thank you very much for your careful reading. We have already validated that the experimental temperature we used did not significantly affect the stability of the protein before experimenting. The results are shown in the figure below:
  
  Author response image 4.
  
  The two proteins were incubated separately in water baths at 25°C, 37°C, 45°C, 60°C, and 80°C for 15 minutes. Enzymatic reactions were then carried out in the standard reaction system, with untreated enzyme serving as the control. The results suggest that the temperatures at which we performed the assays do not have a significant impact on the stability of the enzymes.
  
  (3) The authors used 20 mM of substrate. What are the physiological concentrations of alanine and serine typically found in plants?
  
  Response: The content of alanine in tea plant roots ranges from 0.28 to 4.18 mg/g DW (Yu et al., 2021; Cheng et al., 2017), which corresponds to a physiological alanine concentration of 3.14 mM to 46.92 mM in tea plant roots. The content of serine in plants ranges from 0.014 to 17.6 mg/g DW (Kumar et al., 2017), which corresponds to a physiological serine concentration of 0.13 mM to 167.48 mM. The substrate concentration of 20 mM used in this study therefore falls within the range of alanine and serine concentrations found in plants.
  
  Yu, Y. et al. (2021) Glutamine synthetases play a vital role in high accumulation of theanine in tender shoots of albino tea germplasm "Huabai 1". J. Agric. Food Chem. 69 (46), 13904-13915.
  
  Cheng, S. et al. (2017) Studies on the biochemical formation pathway of the amino acid L-theanine in tea (Camellia sinensis) and other plants. J. Agric. Food Chem. 65 (33), 7210-7216.
  
  Kumar, V. et al. (2017) Differential distribution of amino acids in plants. Amino Acids. 49(5), 821-869.
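  The conversion from mg/g DW to mM in the response above can be reproduced with a short calculation. This is a sketch under a stated assumption: the response does not give the volume basis, but the reported values are recovered if 1 g of dry weight is treated as 1 mL, so that mg/g equals g/L. The molar masses are the standard values for alanine and serine; the function name is hypothetical.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {"alanine": 89.09, "serine": 105.09}


def mg_per_g_to_mM(content_mg_per_g, molar_mass):
    """Convert mg/g to mmol/L, ASSUMING 1 g of tissue occupies 1 mL.

    The 1 g ~ 1 mL basis is inferred from the reported numbers and is
    not stated in the response.
    """
    grams_per_litre = content_mg_per_g  # mg/g == g/kg, taken here as g/L
    return grams_per_litre / molar_mass * 1000.0


# Content ranges quoted in the response (mg/g DW).
ranges = {"alanine": (0.28, 4.18), "serine": (0.014, 17.6)}
for name, (low, high) in ranges.items():
    low_mM = mg_per_g_to_mM(low, MOLAR_MASS[name])
    high_mM = mg_per_g_to_mM(high, MOLAR_MASS[name])
    print(f"{name}: {low_mM:.2f} to {high_mM:.2f} mM")
```

  Because the contents are given per gram of dry weight, concentrations in hydrated tissue would be lower than these figures, so the conversion is best read as an upper-bound estimate.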
  
  Pages 6-7 & Table 1
  
  (1) Use the correct notation for Km and Vmax. Also, the authors show kinetic parameters and use multiple units (e.g., mmol/L or mM for Km).
  
  Response: Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected this in the revised manuscript.
  
  (2) When comparing the catalytic efficiency of enzymes, kcat/Km (or Vmax/Km) is generally used. The authors present a comparison of catalytic activity from results to conclusion. A clarification of what results are being compared is needed.
  
  Response: Thank you for your comments and suggestions. The catalytic activity is assessed by comparing reaction rates.
  
  Page 7 & Figure 3
  
  In Figure 3A, the authors describe the overall structure, but a simple explanation or labeling within the figure should be added.
  
  Response: Thank you very much for your suggestions, we have made modifications to Figure 3A as follows:
  
  Author response image 5.
  
  Crystal structures of CsAlaDC and AtSerDC. (A) Dimer structure of CsAlaDC. The N-terminal domain, large domain, and C-terminal domain of chain A are shown in light pink, khaki, and sky blue, respectively. Chain B is shown in spring green. The PLP molecule is shown as a sphere model. The zinc finger structure at the C-terminus of CsAlaDC is indicated by the red box. The gray spheres represent zinc ions, while the red dotted lines depict the coordination bonds formed by the zinc ions with cysteine and histidine.
  
  Figures 3F & 4A
  
  In these figures, the two structures are overlaid and compared, but the colors are too similar to see the differences. The authors should use a different color scheme.
  
  Response: Thank you very much for your suggestions, we have made modifications to the Figure 3F & 4A as follows:
  
  Author response image 6.
  
  (Figure 3F) - The monomers of CsAlaDC and AtSerDC are superimposed. CsAlaDC is depicted in spring green, while AtSerDC is shown in plum. The conserved amino acid catalytic ring is indicated by the red box. (Figure 4A) - Superposition of substrate binding pocket amino acid residues in CsAlaDC and AtSerDC. The amino acid residues of CsAlaDC are shown in spring green, the amino acid residues of AtSerDC are shown in plum, with the substrate specificity-related amino acid residue highlighted in a red ellipse.
  
  Pages 7 & 8
  
  Figures 3 and 4 do not include illustrations of what the authors describe in the text. The reader will not be able to understand the descriptions until they download and view the structures themselves. The authors should create additional figures to make it easier for readers to understand the structures.
  
  Response: Thank you very much for your suggestions, we have included supplementary figure 1 in the revised manuscript, which presents more elaborate structural depictions of the two proteins.
  
  Pages 9 & 10
  
  "This result suggested this Tyr is required for the catalytic activity of CsAlaDC and AtSerDC."
  
  The author's results are interesting, but it is recommended to perform the experiments in a specific order. First, experiments should determine whether mutagenesis affects the protein's stability (e.g., CD, as discussed earlier), and second, whether mutagenesis affects ligand binding (e.g., ITC, SPR, etc.), before describing how site-directed mutagenesis alters enzyme activity. In particular, the authors' hypothesis would be much more convincing if they could show that the ligand binding affinity is similar between WT and mutants.
  
  Response: Thank you for your insightful feedback on our manuscript, which we greatly appreciate. Your suggestion to methodically sequence the experiments provides a clear pathway to bolster the strength and conclusiveness of our results.
  
  We agree that it is crucial to first assess the stability of the mutant proteins, as changes therein could inadvertently affect catalytic activity. To this end, we have employed circular dichroism (CD) to study the potential structural alterations in the proteins induced by mutations. The experimental results are shown in the following figure:
  
  Author response image 7.
  
  (A) Circular Dichroism Spectra of CsAlaDC (WT). (B) Circular Dichroism Spectra of CsAlaDC (Y336F). (C) Circular Dichroism Spectra of AtSerDC (WT). (D) Circular Dichroism Spectra of AtSerDC (Y341F).
  
  The experimental results indicate that the secondary structure of the mutant proteins remains unchanged, which means the mutations do not alter the protein's stability.
  
  The ligand PLP forms a Schiff base structure with the ε-amino group of a lysine residue in the protein, with maximum absorbance around 420-430 nm. Since we have already added PLP during the protein purification process, as long as the absorbance of mutant proteins and wild-type proteins is the same at 420-430 nm at equivalent concentrations, it indicates that the mutant proteins do not affect the binding of the ligand PLP. Therefore, we scanned the UV-visible absorption spectra of both the wild-type and mutant proteins, and the results are as presented in the following figure:
  
  Author response image 8.
  
  (A) UV-Visible Absorption Spectra of CsAlaDC (WT) compared to CsAlaDC (Y336F). (B) UV-Visible Absorption Spectra of AtSerDC (WT) compared to AtSerDC (Y341F).
  
  The mutant protein and the wild-type protein exhibit similar absorbance at 420-430 nm, indicating that the mutation does not affect the binding of PLP to the protein.
  
  The above experiments have confirmed that the mutations do not significantly affect the stability of the protein or the affinity for the ligand, so we can more confidently attribute changes in enzyme activity to the specific role of the tyrosine residue in question. We believe this comprehensive approach will substantiate our hypothesis and illustrate the necessity of this Tyr residue for the catalytic activity of CsAlaDC and AtSerDC enzymes.
  
  Figure 3
  
  In the 3D structure figure provided by the authors, the proposed reaction mechanism of the enzyme and the involved amino acids are not included. Can the authors add a supplementary figure with a schematic drawing that includes more information, such as distances?
  
  Response: Thank you for your valuable feedback on our manuscript. We completely agree that a schematic drawing with additional details, including distances, would enhance the clarity and understanding of the enzymatic mechanism. In response to your suggestion, we have added a supplementary figure 2 in the revised manuscript that accurately illustrates the proposed reaction pathway, highlighting the key amino acids involved.
  
  Page 10
  
  "The results showed that 5 mM L-DTT reduced the relative activity of CsAlaDC and AtSerDC to 22.0% and 35.2%, respectively"
  
  The authors primarily use relative activity to compare WT and mutants. Can the authors specify the exact experiments, units, and experimental conditions? Is it Vmax or catalytic efficiency? If so, under what specific experimental conditions?
  
  Response: Thank you for your attention and review of our research paper, we appreciate your suggestions and feedback. The experimental protocol employed to evaluate the influence of DTT on protein catalytic efficiency is outlined as follows:
  
  The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, 5 mM L-DTT, and 0.025 mM purified enzyme, was prepared and incubated at standard conditions (45 °C and pH 8.0 for CsAlaDC for 5 min, 40 °C and pH 8.0 for AtSerDC for 2 min). A reaction without DTT served as the control. The reaction was then stopped with 20 μL of 10% trichloroacetic acid. The product was derivatized with 6-aminoquinolyl-N-hydroxy-succinimidyl carbamate (AQC) and analyzed by UPLC. All enzymatic assays were performed in triplicate.
  
  However, due to the unknown mechanism of DTT inhibition on protein activity, we have removed this part of the content in the revised manuscript.
  
  Pages 10-12
  
  The identification of 'Phe106 in CsAlaDC' and 'Tyr111 in AtSerDC,' along with the subsequent mutagenesis and enzymatic activity assays, is intriguing. However, the current manuscript lacks an explanation and discussion of the underlying reasons for these results. As previously mentioned, it would be helpful to gain insights and analysis from WT-ligand and mutant-ligand binding studies (e.g., ITC, SPR, etc.). Furthermore, the authors' analysis would be more convincing with accompanying structural analysis, such as steric hindrance analysis.
  
  Response: Thank you for your insightful comments and constructive feedback on our manuscript. We appreciate the interest you have expressed in the identification of 'Phe106 in CsAlaDC' and 'Tyr111 in AtSerDC' and their functional implications based on mutagenesis and enzymatic assays.
  
  In order to investigate the binding status of the mutant protein and the ligand PLP, we scanned the UV-visible absorption spectra of both the wild-type and mutant proteins; the results are presented in the following figure:
  
  Author response image 9.
  
  (A) UV-Visible Absorption Spectra of CsAlaDC (WT) compared to CsAlaDC (F106Y). (B) UV-Visible Absorption Spectra of AtSerDC (WT) compared to AtSerDC (Y111F).
  
  The mutant protein and the wild-type protein exhibit similar absorbance at 420-430 nm, indicating that the mutation does not affect the binding of PLP to the protein. Therefore, we can conclude that the change in activity of the mutant protein is caused by the substitution of the amino acid at that site, i.e., the amino acid at that site affects substrate specificity. By comparing the structures of the two proteins, we can see that the Tyr at position 111 of AtSerDC is a hydrophilic amino acid, which increases the hydrophilicity of the active site, and thus the substrate is the hydrophilic amino acid Ser. In contrast, the amino acid at the corresponding site in CsAlaDC is Phe, which, lacking a hydroxyl group compared to Tyr, increases the hydrophobicity of the active site, making the substrate lean towards the hydrophobic amino acid Ala. We have added a discussion of the potential reasons for this result to the revised manuscript's discussion section.
  
  Page 5 & Figure 1B
  
  "As expected, CsSerDC was most closed to AtSerDC, which implies that they shared similar functions. However, CsAlaDC is relatively distant from CsSerDC."
  
  In Figure 1B, CsSerDC and AtSerDC are in different clades, and this figure does not show that the two enzymes are closest. To provide another quantitative comparison, please provide a matrix table showing amino acid sequence similarities as a supplemental table.
  
  Response: Many thanks for your constructive suggestion. We added a matrix table showing amino acid sequence similarities in the supplemental materials. The results showed that the similarity of amino acid sequences between CsSerDC and AtSerDC is 86.21%, which is higher than that between CsAlaDC and CsSerDC (84.92%). These data support the description of Figure 1B. We added the description of the amino acid sequence similarity analysis to the revised manuscript. The description "As expected, CsSerDC was most closed to AtSerDC, which implies that they shared similar functions." is not accurate enough, so we revised it to "As expected, CsSerDC was closer to AtSerDC, which implies that they shared similar functions." in the revised manuscript.
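
As a rough illustration of how a pairwise matrix of this kind can be tabulated, the sketch below computes percent identity over pre-aligned sequences. It rests on several assumptions: the sequences are short made-up placeholders rather than the actual enzyme sequences, the function name `percent_identity` is hypothetical, and percent identity is only one of several possible similarity measures. The response does not state which tool or measure was actually used.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap columns where both sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    # Skip columns in which either sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Made-up aligned fragments; placeholders only, not the real enzyme sequences.
aligned = {
    "seqA": "MKVLSTYGDA",
    "seqB": "MKVLSTFGDA",
    "seqC": "MRVL-TFGEA",
}
names = list(aligned)
# Full pairwise matrix, stored as a dict keyed by (row name, column name).
matrix = {(x, y): percent_identity(aligned[x], aligned[y])
          for x in names for y in names}
```

The matrix is symmetric with 100% on the diagonal, so only the upper or lower triangle needs to be reported in a supplemental table.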
  
  Page 5 & Figure 1C
  
  Figure 1C, which shows a multiple sequence alignment with the amino acid sequences of the 6 SerDCs and CsAlaDC, clearly shows the differences between the sequences of AlaDC and other SerDCs. However, the authors' hypothesis would be more convincing if they showed that this difference is also conserved in AlaDCs from other plants. Can the authors show a new multiple-sequence alignment by adding more amino acid sequences of other AlaDCs?
  
  Response: Thank you for your comments and suggestions. We aim to discover additional alanine decarboxylases. However, at present, the only experimentally confirmed alanine decarboxylase is CsAlaDC. No experimentally verified alanine decarboxylases have been found in other plant species.
  
  Figure 5A
  
  Figure 5A is missing the error bar.
  
  Response: Figure 5A serves as a preliminary screening for these mutants, without conducting repeated experiments. Subsequently, only the L110F and P114A mutants, which exhibited significantly improved activity, underwent further experimental verification to confirm their enhanced functionality.
  
URL

biorxiv.org/content/10.1101/2023.09.04.556203v2
www.biorxiv.org www.biorxiv.org

Dysfunctional S1P/S1PR1 signaling in the dentate gyrus drives vulnerability of chronic pain-related memory impairment

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  This work from Cui, Pan, Fan, et al explores memory impairment in chronic pain mouse models, a topic of great interest in the neurobiology field. In particular, the work starts from a very interesting observation, that WT mice can be divided into susceptible and unsusceptible to memory impairment upon modelling chronic pain with CCI. This observation represents the basis of the work where the authors identify the sphingosine receptor S1PR1 as down-regulated in the dentate gyrus of susceptible animals and demonstrate through an elegant range of experiments involving AAV-mediated knockdown or overexpression of S1PR1 that this receptor is involved in the memory impairment observed with chronic pain. Importantly for translational purposes, they also show that activation of S1PR1 through a pharmacological paradigm is able to rescue the memory impairment phenotype.
  
  The authors also link reduced dendritic branching and a reduced number of mature excitatory synapses in the DG to the memory phenotype.
  
  They then proceed to explore possible mechanisms downstream of S1PR1 that could explain this reduction in dendritic spines. They identify integrin α2 as an interactor of S1PR1 and show a reduction in several proteins involved in actin dynamics, which are crucial for dendritic spine formation and plasticity.
  
  They thus hypothesize that the interaction between S1PR1 and Integrin α2 is fundamental for the activation of Rac1 and Cdc42 and consequently for the polymerisation of actin; a reduction in this pathway upon chronic pain would thus lead to impaired actin polymerisation, synapse formation, and thus impaired memory.
  
  The work is of great interest and the experiments are of very good quality with results of great importance. I have however some concerns. The main concern I have relates to the last part of the work, namely Figures 8 and 9, which I feel are not at the same level as the results presented in the previous 7 Figures, which are instead outstanding.
  
  In particular:
  
  - In Figure 8, given the reduction in all the proteins tested, the authors need to check some additional proteins as controls. One good candidate could be RhoA, considering the authors say it is activated by S1PR2 and not by S1PR1;
  
  Thanks for your suggestion. We tested the expression level of RhoA in mice 7 days and 21 days post CCI as negative controls (Supplemental Figure 9).
  
  - In addition to the previous point, could the authors also show that the number of neurons is not grossly different between susceptible and unsusceptible mice? This could be done by simply staining for NeuN or performing a western blot for a neuronal-specific protein (e.g. Map2 or beta3-tubulin);
  
  As suggested, we performed immunofluorescence using NeuN antibody to detect the number of neurons in susceptible and unsusceptible mice. The number is not significantly different between the two populations (Supplementary Figure 7).
  
  - In Figure 8, the authors should also evaluate the levels of activated RAC1 and activated Cdc42, which are much more important than just basal levels of the proteins to infer an effect on actin dynamics. This is possible through kits that use specific adaptors to pulldown GTP-Rac1 and GTP-Cdc42;
  
  Thanks for your constructive suggestion. An elevated level and hyperactivation of Rac1 protein are both associated with actin dynamics and dendritic development [1]. We agree that showing the levels of activated RAC1 would allow a stronger inference about its effect on actin dynamics. However, the purpose of the experiment in Figure 8 is to show that the levels of actin organization-related proteins change with the expression level of S1PR1, leading to the conclusion that actin organization was disrupted; it is not intended to show specifically that S1PR1 activates these proteins. We apologize for the confusion, but we think the current data are sufficient to support the conclusion.
  
  Thanks again for your advice. Your understanding is greatly appreciated.
  
  - In Figure 9C, the experiment is performed in an immortalised cell line. I feel this needs to be performed at least in primary hippocampal neurons;
  
  Thanks for your suggestion. As suggested, we performed the experiment in primary hippocampal neurons. Knockdown of S1pr1 in primary hippocampal neurons induced reduction in the number of branches and filamentous actin. Please refer to the updated Figure 9C.
  
  - In Figure 9D, the authors use a Yeast two-hybrid system to demonstrate the interaction between S1PR1 and Integrin α2. However, as the yeast two-hybrid system is based on the proximity of the GAL4 activating domain and the GAL4 binding domain, which are used to activate the transcription of reporter genes, the system is not often used when probing the interaction between transmembrane proteins. Could the authors use other transmembrane proteins as negative controls?;
  
  Thanks for your question. We apologize for the unclear description in the Methods. The traditional yeast two-hybrid system can only detect protein interactions that occur in the nucleus and cannot detect interactions between membrane proteins. Here, we utilized the split-ubiquitin membrane-based yeast two-hybrid system. Briefly, ubiquitin, a protein of 76 amino acid residues that can mediate proteasomal degradation of target proteins, is split into two domains, Cub at the C-terminus and NubG at the N-terminus, which are fused to and expressed with the bait protein “Bait” and the prey protein “Prey”, respectively. Cub is additionally fused to a transcription factor. If the Bait and Prey proteins bind, Cub and NubG are brought together and reconstitute a complete ubiquitin, which is recognized by ubiquitin-specific proteases; these cleave off the fused transcription factor, which then enters the nucleus and activates expression of the reporter gene. We then determine whether the Bait and Prey proteins interact through the growth of the yeast.
  
  Thanks again for pointing this out. We reworded the method in M&M (Line 678-696).
  
  - In Figure 9E, the immunoblot is very unconvincing. The bands in the inputs are very weak for both ITGA2 and S1PR1, the authors do not show the enrichment of S1PR1 upon its immunoprecipitation and the band for ITGA2 in the IP fraction has a weird appearance. Were these experiments performed on DG lysates only? If so, I suggest the authors repeat the experiment using the whole brain (or at least the whole hippocampus) so as to have more starting material. Alternatively, if this doesn't work, or in addition, they could also perform the immunoprecipitation in heterologous cells overexpressing the two proteins;
  
  Thanks for the question and suggestion. We used DG lysates prepared from both dentate gyri of a single mouse as the starting material. We updated the result, which now shows clearer bands (Figure 9E).
  
  - About the point above, even if the results were convincing, the authors can't say that they demonstrate an interaction in vivo. In co-IP experiments, the interaction is much more likely to occur in the lysate during the incubation period rather than being conserved from the in vivo state. These co-IPs demonstrate the ability of proteins to interact, not necessarily that they do it in vivo. If the authors wanted to demonstrate this, they could perform a Proximity ligation assay in primary hippocampal neurons, using antibodies against S1PR1 and ITGA2.
  
  Thanks for your concern. Co-immunoprecipitation (Co-IP) is the gold standard to identify protein-protein interactions [2], and it is one of the most efficient techniques to study these protein-protein interactions in vivo [3]. We repeated the experiment and followed the experimental procedure exactly to avoid the protein interaction due to over-incubation. Over-incubation, particularly at room temperature, may result in non-specific binding and therefore high background, thus we performed Co-IPs at 4°C to preserve protein interactions. We agree that Proximity ligation assay is better suited for studies of endogenously expressed proteins in primary cells [4]. Since we optimized the experiment procedure to avoid non-specific binding and particularly, Co-IP utilized proteins from DG lysates which could validate the specificity of the protein interaction in native tissue, we prefer to keep the Co-IP result in Figure 9E.
  
  Thanks again for your suggestion. We appreciate your understanding on this matter.
  
  - In Figure 9H, could the authors increase the N to see if shItga2 causes further KD in the CCI?
  
  As suggested, we repeated the experiment and increased the N to 6. As shown in the following picture, shItga2 did not cause further KD in the CCI.
  
  Author response image 1.
  
  - To conclusively demonstrate that S1PR1 and ITGA2 participate in the same pathway, they could show that knocking down the two proteins at the same time does not have additive effects on behavioral tests compared to the knockdown of each one of them in isolation.
  
  Thanks for your suggestion. As suggested, we knocked down the two proteins at the same time and did not observe additive effects on behavioral tests compared to the knockdown of each one of them in isolation. Please refer to Figure 9L-O.
  
  Other major concerns:
  
  - Supplementary Figure 5: the image showing colocalisation between S1PR1 and CamKII is not very convincing. Is the S1PR1 antibody validated on Knockout or knockdown in immunostaining?;
  
  S1PR1 is a membrane receptor and the S1P1 antibody (PA1-1040, Invitrogen) shows membranous staining with diffuse dot-like signals (Please refer to the image “A” provided by ThermoFisher Scientific). Here, we utilized the antibody to detect the expression of S1PR1 in DG granule cells. We can see the diffuse dot-like signals aggregated in each single granule cell. CaMKII shows intense staining around the border of the granule cell soma (Image “B”) [5]. According to the images shown in Supplementary Figure 5B, we concluded that S1PR1 is expressed in CaMKII+ cells.
  
  Besides, as suggested, we validated the S1PR1 antibody on knockdown in immunostaining (Image “C” and “D”). The expression of S1PR1 is significantly decreased compared with the control.
  
  Author response image 2.
  
  - It would be interesting to check S1PR2 levels as a control in CCI-chronic animals;
  
  As suggested, we quantified the S1PR2 levels in Sham and CCI animals, and there is no significant difference between groups (Supplementary Figure 9).
  
  - Figure 1: I am a bit concerned about the Ns in these experiments. In the chronic pain experiments, the N for Sham is around 8 whereas it is around 20 for CCI animals. Although I understand higher numbers are necessary to see the susceptible and unsusceptible populations, I feel that then the same number of Sham animals should be used;
  
  Thanks for your concern. In the preliminary experiment, we noticed that the ratio of susceptible and unsusceptible populations is around 1:1. After the behavioral tests, we need to further take samples to investigate molecular and cellular changes of each group. Thus, we set sham around 8 and CCI around 20 to ensure that after characterization into susceptible and unsusceptible groups, each group has relatively equal numbers for further investigations.
  
  - Figures 1E and 1G have much higher Ns than the other panels. Why is that? If they have performed this high number of animals why not show them in all panels?;
  
  Thanks for your concern. For Figure 1B, C, D and F, we showed the data for a single batch of experiments, while for Figure 1E and 1G, we used data collected from all batches. By showing data from a single batch, we wanted to demonstrate that the ratio of susceptible to unsusceptible mice is relatively stable and does not depend only on a large sample size.
  
  - In the experiments where viral injection is performed, the authors should show a zoomed-out image of the brain to show the precision of the injection and how spread the expression of the different viruses was;
  
  As suggested, we showed the zoomed-out image in Supplementary Figure 6. The viruses are mainly expressed in the hippocampal DG.
  
  - The authors should check if there is brain inflammation in CCI chronic animals. This would be interesting to explain if this could be the trigger for the effects seen in neurons. In particular, the authors should check astrocytes and microglia. This is of interest also because the pathways altered in Figure 8A are related to viral infection.
  
  - If the previous point shows increased brain inflammation, it would be interesting for the authors to check whether a prolonged anti-inflammatory treatment in CCI animals administered before the insurgence of memory impairment could stop it from happening;
  
  - In addition, the authors should speculate on what could be the signal that can induce these molecular changes starting from the site of injury;
  
  - Also, as the animals are all WT, the authors should speculate on what could render some animals prone to have memory impairments and others resistant.
  
  Thanks for the above four suggestions. We have observed inflammation including T cell infiltration and microglia activation in the hippocampal DG in CCI chronic animals and also used S1PR1 modulator which has anti-lymphocyte mediated inflammatory effect to prevent the insurgence of memory impairment from happening. We also examined the alteration in the numbers of peripheral T-lymphocyte subsets and the serum levels of cytokines. Furthermore, we found a neuron-microglia dialogue in the DG which may promote the resilience to memory impairment in CCI animals. Since these are unpublished results, we apologize that we would not give much detailed information to the public at the current stage. We will publish these data as soon as possible. Thanks for your understanding.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The study investigates the molecular mechanisms underlying chronic pain-related memory impairment by focusing on S1P/S1PR1 signaling in the dentate gyrus (DG) of the hippocampus. Through behavioural tests (Y-maze and Morris water maze) and RNA-seq analysis, the researchers segregated chronic pain mice into memory impairment-susceptible and -unsusceptible subpopulations. They discovered that S1P/S1PR1 signaling is crucial for determining susceptibility to memory impairment, with decreased S1PR1 expression linked to structural plasticity changes and memory deficits.
  
  Knockdown of S1PR1 in the DG induced a susceptible phenotype, while overexpression or pharmacological activation of S1PR1 promoted resistance to memory impairment and restored normal synaptic structure. The study identifies actin cytoskeleton-related pathways, including ITGA2 and its downstream Rac1/Cdc42 signaling, as key mediators of S1PR1's effects, offering new insights and potential therapeutic targets for chronic pain-related cognitive dysfunction.
  
  This manuscript consists of a comprehensive investigation and significant findings. The study provides novel insights into the molecular mechanisms of chronic pain-related memory impairment, highlighting the critical role of S1P/S1PR1 signaling in the hippocampal dentate gyrus. The clear identification of S1P/S1PR1 as a potential therapeutic target offers promising avenues for future research and treatment strategies. The manuscript is well-structured, methodologically sound, and presents valuable contributions to the field.
  
  Strengths:
  
  (1) The manuscript is well-structured and written in clear, concise language. The flow of information is logical and easy to follow.
  
  (2) The segregation of mice into memory impairment-susceptible and -unsusceptible subpopulations is innovative and well-justified. The statistical analyses are robust and appropriate for the data.
  
  (3) The detailed examination of S1PR1 expression and its impact on synaptic plasticity and actin cytoskeleton reorganization is impressive. The findings are significant and contribute to the understanding of chronic pain-related memory impairment.
  
  Weaknesses:
  
  (1) Results: While the results are comprehensive, some sections are data-heavy and could be more reader-friendly with summarized key points before diving into detailed data.
  
  Thanks for the suggestion. We now open each part/paragraph with a sentence summarising what the following experiments investigate, to make the Results more reader-friendly. These sentences are highlighted in blue in the main text.
  
  (2) Discussion: There is a need for a more balanced discussion regarding the limitations of the study. For example, addressing potential biases in the animal model or limitations in the generalizability of the findings to humans would strengthen the discussion. Also, providing specific suggestions for follow-up studies would be beneficial.
  
  As suggested, we discussed more on the limitations of this study and outlined some directions for future research (Line 481-498).
  
  (3) Conclusion: The conclusion, while concise, could better highlight the study's broader impact on the field and potential clinical implications.
  
  Thanks. We reworded the conclusion to better highlight the impacts of this study (Line 501-505).
  
  Reviewer #3 (Public Review):
  
  Summary of the Authors' Objectives:
  
  The authors aimed to delineate the role of S1P/S1PR1 signaling in the dentate gyrus in the context of memory impairment associated with chronic pain. They sought to understand the molecular mechanisms contributing to the variability in memory impairment susceptibility and to identify potential therapeutic targets.
  
  Major Strengths and Weaknesses of the Study:
  
  The study is methodologically robust, employing a combination of RNA-seq analysis, viral-mediated gene manipulation, and pharmacological interventions to investigate the S1P/S1PR1 pathway. The use of both knockdown and overexpression approaches to modulate S1PR1 levels provides compelling evidence for its role in memory impairment. The research also benefits from a comprehensive assessment of behavioral changes associated with chronic pain.
  
  However, the study has some weaknesses. The categorization of mice into 'susceptible' and 'unsusceptible' groups based on memory performance requires further validation. Additionally, the reliance on a single animal model may limit the generalizability of the findings. The study could also benefit from a more detailed exploration of the impact of different types of pain on memory impairment.
  
  Assessment of the Authors' Achievements:
  
  The authors successfully identified S1P/S1PR1 signaling as a key factor in chronic pain-related memory impairment and demonstrated its potential as a therapeutic target. The findings are supported by rigorous experimental evidence, including biochemical, histological, and behavioral data. However, the study's impact could be enhanced by further exploration of the molecular pathways downstream of S1PR1 and by assessing the long-term effects of S1PR1 manipulation.
  
  Impact on the Field and Utility to the Community:
  
  This study is likely to have a significant impact on pain research by providing a novel perspective on the mechanisms underlying memory impairment in chronic pain conditions. The identification of the S1P/S1PR1 pathway as a potential therapeutic target could guide the development of new treatments.
  
  Additional Context for Readers:
  
  The study's approach to categorizing susceptibility to memory impairment could inspire new methods for stratifying patient populations in clinical settings.
  
  Recommendations:
  
  (1) A more detailed explanation of the k-means clustering algorithm and its application in categorizing mice should be provided.
  
  As suggested, we explained the k-means clustering algorithm in detail (Line 697-711).
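
To make the idea concrete, here is a minimal sketch of a two-cluster k-means split, under stated assumptions. This response does not describe the features, initialisation, or software the authors used, so everything below is illustrative: the scores are made-up one-dimensional values, `kmeans_two_groups` is a hypothetical helper, and the centroids are seeded deterministically at the lowest and highest score rather than at random.

```python
def kmeans_two_groups(scores, iterations=100):
    """Split 1-D scores into two clusters with Lloyd's algorithm (k = 2)."""
    # Deterministic start: lowest and highest score as the initial centroids.
    centroids = [min(scores), max(scores)]
    labels = [0] * len(scores)
    for _ in range(iterations):
        # Assignment step: each score joins its nearest centroid.
        labels = [0 if abs(s - centroids[0]) <= abs(s - centroids[1]) else 1
                  for s in scores]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == k]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[k]
            )
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical memory-performance scores for CCI mice (higher = better memory).
scores = [0.21, 0.25, 0.30, 0.28, 0.55, 0.60, 0.58, 0.52]
labels, centroids = kmeans_two_groups(scores)
# Cluster 0 starts from the lowest score, so it is read here as "susceptible".
groups = ["susceptible" if lab == 0 else "unsusceptible" for lab in labels]
```

With real behavioral data, several measures per animal would usually be clustered together, and a library implementation with multiple random restarts would be a more robust choice than this hand-rolled version.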
  
  (2) The discussion on the potential influence of different pain types or sensitivities on memory impairment should be expanded.
  
  Thanks for your suggestion. We discussed this point in the limitations of this study (Line 484-491).
  
  (3) The protocol for behavioral testing should be clarified and the potential for learning or stress effects should be addressed.
  
  Thanks for your suggestion. We clarified the order of the battery of behavioral tests in this study (Line 537-542). We start with the least stressful test (Y-maze) and leave the most stressful for last (Morris water maze) [6]. In addition, we conducted behavioral assays to show that a one-day rest is enough to reduce carryover effects from the prior test (Y-maze). We examined stress-related behaviors one day after the Y-maze (23 d post CCI) using the open field test (OFT) and the elevated plus maze (EPM). As shown in Author response image 3, the tests gave no indication that the mice were under stress. Thus, the order in which the tests were performed is appropriate in this study.
  
  Author response image 3.
  
  (4) Conduct additional behavioral assays for other molecular targets implicated in the study.
  
  We agree that other molecular targets on susceptibility to memory impairment would be interesting to know. Our study was designed to focus specifically on ITGA2 this time and we'd like to keep the focus intact, but we have included your point as a consideration for future study (Lines 496-498). Thank you for the suggestion.
  
  (5) The effective drug thresholds and potential non-specific effects of pharmacological interventions should be discussed in more detail.
  
  As suggested, we emphasized this point of drug SEW2871 in Line 242-245.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Minor concerns:
  
  - In Figure 6E the lines of the different groups are not visible. Showing the errors as error bars for each point would probably be better;
  
  We apologize for the mistake of using mean±SD here instead of mean±SEM. After changing to mean±SEM, the lines of Figure 6E, Figure 7E and 7L become much clearer. It looks a little bit messy to show the error bars since there are numerous points, so we prefer to keep the line style.
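
The effect of this change can be sketched numerically: the SEM is the SD divided by the square root of the group size, so the error band around each line narrows. The values below are hypothetical and are not taken from Figure 6E; `sd_and_sem` is an illustrative helper name.

```python
from math import sqrt
from statistics import stdev

def sd_and_sem(values):
    """Return the sample SD and the standard error of the mean."""
    sd = stdev(values)            # sample SD (n - 1 denominator)
    sem = sd / sqrt(len(values))  # SEM = SD / sqrt(n)
    return sd, sem

# Hypothetical escape latencies (s) for one group on one training day.
latencies = [42.0, 50.0, 38.0, 46.0, 44.0, 40.0, 48.0, 52.0]
sd, sem = sd_and_sem(latencies)
```

Note that SD describes the spread of individual animals while SEM describes the precision of the group mean, so the two are not interchangeable and the figure legend should state which one is plotted.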
  
  - Do the authors have any speculation on why the % time in the quadrant is not further affected in the KD Itga2 in CCI animals (Figure 9K)?;
  
  In CCI animals, the level of S1PR1 expression is decreased. ITGA2 may participate in the same pathway with S1PR1. Thus, knocking down ITGA2 in CCI animals will not further affect the animal behaviors. This has been proved by knocking down the two proteins at the same time and no additive effects were observed on behavioral tests compared to the knockdown of each one of them in isolation (Figure 9L-O).
  
  - In the methods, it's unclear if in the multiple infusion, the animals were anaesthetised or kept awake;
  
  We have clarified this point in the Methods: mice were deeply anesthetized with 1% pentobarbital sodium (40 mg/kg, i.p.) (Line 649-650).
  
  - As the DG is quite small, could the authors clarify if, when performing western blots, they used the two DGs from one animal for each sample or if they pulled together the DGs of several animals?;
  
  We used the two DGs from one animal for each sample. The amount of protein extracted from each sample is enough for 20-30 Western blot assays. We have now added this to the Methods for clarity (Line 612).
  
  - Is it possible to check the correlation between performance in the YM and MWM with S1PR1 levels?;
  
  We would also be interested in this point. However, our current data cannot address it, because it is difficult to manipulate S1PR1 levels using the KD and overexpression viruses.
  
  - EM images have a poor resolution in the figures, could the authors show higher-resolution images?;
  
  We have inserted 300 DPI images for high resolution output.
  
  - In line 268 there is a mention of an "ShLamb1"?
  
  We apologize for the mistake and it was revised.
  
  Reviewer #3 (Recommendations For The Authors):
  
  This study explored the role of S1P/S1PR1 signaling within the dentate gyrus (DG) in chronic pain-related memory impairment using a murine model. The authors identified decreased expression of S1PR1 in the DG of mice susceptible to memory deficits. They demonstrated that S1PR1 knockdown increased susceptibility to memory deficits, whereas its overexpression or pharmacological activation mitigated these effects. Further biochemical and immunofluorescence analyses indicated that disruptions in S1P/S1PR1 signaling were related to disruptions in actin cytoskeleton dynamics, influenced by molecular pathways involving ITGA2, Rac1/Cdc42 signaling, and the Arp2/3 complex. These findings offer intriguing insights and suggest a potential therapeutic target for treating memory impairment in chronic pain.
  
  Major Concerns:
  
  The following five major concerns are the same as the five recommendations from Reviewer 3 on Page 9-10. Please refer to the answers above.
  
  (1) The division of subjects into 'susceptible' and 'unsusceptible' categories requires further clarification regarding the methodologies and rationale employed, particularly concerning the use of the k-means clustering algorithm in data analysis. This explanation will strengthen the scientific grounding of the categorization process.
  
  (2) The categorization of 'susceptible' and 'unsusceptible' groups might also benefit from a more detailed analysis or discussion concerning the influence of different pain sensitivities or types of pain assessments. Although the study mentions that memory impairment stands independent of pain thresholds, a more nuanced exploration could provide deeper insights.
  
  (3) The article could benefit from more clarity on the protocol of behavioral testing, especially regarding the potential effects of repeated testing on performance outcomes due to learning or stress.
  
  (4) While the connection between S1P/S1PR1 signaling and the molecular pathways highlighted (ITGA2, Rac1/Cdc42, Arp2/3) is intriguing, only ITGA2 underwent further behavioral validation in vivo. Conducting additional behavioral assays for one or more of the molecular targets could substantially strengthen these findings.
  
  (5) Discussions regarding effective drug thresholds and the potential for non-specific effects are essential to fully evaluate the implications of pharmacological interventions utilized in the study.
  
  Minor Concerns:
  
  (1) Clarification of evidence of the specific infusion sites in pharmacological experiments would enhance the transparency and replicability of these methods.
  
  For the infusion of the S1PR1 agonist, a guide cannula (internal diameter 0.34 mm, RWD) was unilaterally implanted into the DG of the hippocampus (-1.3 A/P, -1.95 M/L, and -2.02 D/V), as evidenced by Figure 5B.
  
  (2) It would be beneficial if the manuscript provided details regarding the efficiency and reach of viral transfection within the neuronal population. This information would help in assessing the impact of genetic manipulations.
  
  S1PR1 immunostaining showed that the efficiency is quite high and the reach of viral transfection is sufficient.
  
  Author response image 4.
  
  (3) The manuscript should make explicit the normalization techniques used in quantitative assessments such as Western blotting, including the housekeeping genes or proteins used for this purpose.
  
  Here, we normalized the Western blot data to a housekeeping protein, with GAPDH as the internal control. First, the stained blot is imaged, a rectangle is drawn around the target protein in each lane, and the signal intensity inside the rectangle is measured using ImageJ. The signal intensity is then normalized by dividing it by the signal intensity of the loading control (GAPDH) detected on the same blot. Finally, the average of the ratios from the control group is calculated, and all individual ratios are divided by this average to obtain the normalized values (Line 619-625).
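
  The normalization steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize_blot`, the group labels, and the intensity values are hypothetical and do not come from the manuscript.

```python
from statistics import mean

def normalize_blot(target, gapdh, groups, control="control"):
    """Normalize band intensities to GAPDH, then to the control-group mean."""
    # Step 1: divide each target-band intensity by the GAPDH intensity of the same lane.
    ratios = [t / g for t, g in zip(target, gapdh)]
    # Step 2: average the ratios of the control lanes only.
    control_mean = mean(r for r, grp in zip(ratios, groups) if grp == control)
    # Step 3: express every lane relative to that control average.
    return [r / control_mean for r in ratios]

# Hypothetical ImageJ intensities (arbitrary units): two control and two treated lanes.
target = [100.0, 120.0, 50.0, 60.0]
gapdh = [200.0, 200.0, 200.0, 200.0]
groups = ["control", "control", "treated", "treated"]
normalized = normalize_blot(target, gapdh, groups)
```

  By construction, the control lanes average to 1 after this procedure, so values for the other groups read directly as fold changes relative to control.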
  
  (4) Details confirming that the control groups in behavioral assessments were subjected to handling and experimental conditions comparable to those of the chronic pain groups, barring nerve injury, are crucial for maintaining the integrity of the comparative analysis.
  
  We agree that the control group and the experimental group are identical in all respects except for one difference: nerve injury. We have added this point to the Methods (Line 520-522).
  
  Minor Recommendations:
  
  The following four minor recommendations are the same as the four minor concerns from Reviewer 3 on Pages 12-13. Please refer to the answers above.
  
  (1) Clarify the specifics of infusion site verification in pharmacological experiments.
  
  (2) Provide details on the efficiency and neuronal reach of viral transfections.
  
  (3) Explicitly describe the normalization techniques used in quantitative assessments.
  
  (4) Ensure that control groups in behavioral assessments undergo comparable handling to maintain analysis integrity.
  
  References
  
  (1) Gualdoni, S., et al., Normal levels of Rac1 are important for dendritic but not axonal development in hippocampal neurons. Biology of the Cell, 2007. 99(8): p. 455-464.
  
  (2) Alam, M.S., Proximity Ligation Assay (PLA). Curr Protoc Immunol, 2018. 123(1): p. e58.
  
  (3) Song, P., S. Zhang, and J. Li, Co-immunoprecipitation Assays to Detect In Vivo Association of Phytochromes with Their Interacting Partners. Methods Mol Biol, 2021. 2297: p. 75-82.
  
  (4) Krieger, C.C., et al., Proximity ligation assay to study TSH receptor homodimerization and crosstalk with IGF-1 receptors in human thyroid cells. Frontiers in Endocrinology, 2022. 13.
  
  (5) Arruda-Carvalho, M., et al., Conditional Deletion of α-CaMKII Impairs Integration of Adult-Generated Granule Cells into Dentate Gyrus Circuits and Hippocampus-Dependent Learning. The Journal of Neuroscience, 2014. 34(36): p. 11919-11928.
  
  (6) Wolf, A., et al., A Comprehensive Behavioral Test Battery to Assess Learning and Memory in 129S6/Tg2576 Mice. PLoS One, 2016. 11(1): p. e0147733.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.30.596721v3
www.biorxiv.org www.biorxiv.org

Progressive neural engagement within the IFG-pMTG circuit as gesture and speech entropy and MI advances

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Responses to Editors:
  
  We appreciate the editors’ concern regarding the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process, particularly given the close temporal and spatial proximity of the stimulation windows and the potential for prolonged disruption. We agree that stimulation techniques, such as transcranial magnetic stimulation (TMS), can evoke or modulate neuronal activity both locally within the target region and in remote connected areas of the network, and that this complex interaction makes it more challenging to draw clear conclusions about the causal relationship between stimulation and cognitive function. However, we believe that cause-and-effect relationships in cognitive neuroscience studies using non-invasive brain stimulation (NIBS) can still be robustly established if key assumptions are explicitly tested and confounding factors are rigorously controlled (Bergmann & Hartwigsen, 2021, J Cogn Neurosci).
  
  In our experiment, we addressed these concerns by including a sham TMS condition, an irrelevant control task, and multiple control time points. The results showed that TMS selectively disrupted the IFG-pMTG interaction during specific time windows of the task related to gesture-speech semantic congruency, but not in the sham TMS condition or the control task (gender congruency effect) (Zhao et al., 2021, JN). This selective disruption provides strong evidence for a causal link between IFG-pMTG connectivity and gesture-speech integration in the targeted time window.
  
  Regarding the potential for transient artifacts from TMS, we acknowledge that previous research has demonstrated that single-pulse TMS induces brief artifacts (0–10 ms) due to direct depolarization of cortical neurons, which momentarily disrupts electrical activity in the stimulated area (Romero et al., 2019, NC). However, in the case of paired-pulse TMS (ppTMS), the interaction between the first and second pulses is more complex. The first pulse increases membrane conductance in the target neurons via shunting inhibition mediated by GABAergic interneurons. This effectively lowers neuronal membrane resistance, “leaking” excitatory current and diminishing the depolarization induced by the second pulse, leading to a reduction in excitability during the paired-pulse interval. This mechanism suppresses the excitatory response to the second pulse, which is reflected in a reduced motor evoked potential (MEP) (Paulus & Rothwell, 2016, J Physiol).
  
  Furthermore, ppTMS has been widely used in previous studies to infer causal temporal relationships and explore the neural contributions of both structurally and functionally connected brain regions, across timescales as brief as 3–60 ms. We have reviewed several studies that employed paired-pulse TMS to investigate neural dynamics in regions such as the tongue and lip areas of the primary motor cortex (M1), as well as high-level semantic regions like the pMTG, PFC, and ATL (Table 1). These studies consistently demonstrate the methodological rigor and precision of double-pulse TMS in elucidating the temporal dynamics between different brain regions within short temporal windows.
  
  Given these precedents and the evidence provided, we respectfully assert the validity of the methods employed in our study. We therefore kindly request the editors to reconsider the assessment that “the methods are insufficient for studying tightly-coupled brain regions over short timescales.” We hope that the editors’ concerns about the complexities of TMS-induced effects have been adequately addressed, and that our study’s design and results provide a clear and convincing causal argument for the role of IFG-pMTG in gesture-speech integration.
  
  Author response table 1.
  
  Double-pulse TMS studies on brain regions over 3–60 ms time intervals
  
  Reference
  
  Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.
  
  Amemiya, T., Beck, B., Walsh, V., Gomi, H., & Haggard, P. (2017). Visual area V5/hMT+ contributes to perception of tactile motion direction: a TMS study. Scientific reports, 7(1), 40937.
  
  Muessgens, D., Thirugnanasambandam, N., Shitara, H., Popa, T., & Hallett, M. (2016). Dissociable roles of preSMA in motor sequence chunking and hand switching—a TMS study. Journal of Neurophysiology, 116(6), 2637-2646.
  
  Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.
  
  Pitcher, D. (2014). Facial expression recognition takes longer in the posterior superior temporal sulcus than in the occipital face area. Journal of Neuroscience, 34(27), 9173-9177.
  
  Bardi, L., Kanai, R., Mapelli, D., & Walsh, V. (2012). TMS of the FEF interferes with spatial conflict. Journal of cognitive neuroscience, 24(6), 1305-1313.
  
  D’Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2012). The role of the motor system in discriminating normal and degraded speech sounds. Cortex, 48(7), 882-887.
  
  Pitcher, D., Duchaine, B., Walsh, V., & Kanwisher, N. (2010). TMS evidence for feedforward and feedback mechanisms of face and body perception. Journal of Vision, 10(7), 671-671.
  
  Gagnon, G., Blanchet, S., Grondin, S., & Schneider, C. (2010). Paired-pulse transcranial magnetic stimulation over the dorsolateral prefrontal cortex interferes with episodic encoding and retrieval for both verbal and non-verbal materials. Brain Research, 1344, 148-158.
  
  Kalla, R., Muggleton, N. G., Juan, C. H., Cowey, A., & Walsh, V. (2008). The timing of the involvement of the frontal eye fields and posterior parietal cortex in visual search. Neuroreport, 19(10), 1067-1071.
  
  Pitcher, D., Garrido, L., Walsh, V., & Duchaine, B. C. (2008). Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions. Journal of Neuroscience, 28(36), 8929-8933.
  
  Til Ole Bergmann, Gesa Hartwigsen; Inferring Causality from Noninvasive Brain Stimulation in Cognitive Neuroscience. J Cogn Neurosci 2021; 33 (2): 195–225. https://doi.org/10.1162/jocn_a_01591
  
  Romero, M.C., Davare, M., Armendariz, M. et al. Neural effects of transcranial magnetic stimulation at the single-cell level. Nat Commun 10, 2642 (2019). https://doi.org/10.1038/s41467-019-10638-7
  
  Paulus W, Rothwell JC. Membrane resistance and shunting inhibition: where biophysics meets state-dependent human neurophysiology. J Physiol. 2016 May 15;594(10):2719-28. doi: 10.1113/JP271452. PMID: 26940751; PMCID: PMC4865581.
  
  Staat, C., Gattinger, N., & Gleich, B. (2022). PLUSPULS: A transcranial magnetic stimulator with extended pulse protocols. HardwareX, 13. https://doi.org/10.1016/j.ohx.2022.e00380
  
  Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. https://doi.org/10.1523/jneurosci.1355-21.2021.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors quantified information in gesture and speech, and investigated the neural processing of speech and gestures in pMTG and LIFG, depending on their informational content, in 8 different time-windows, and using three different methods (EEG, HD-tDCS and TMS). They found that there is a time-sensitive and staged progression of neural engagement that is correlated with the informational content of the signal (speech/gesture).
  
  Strengths:
  
  A strength of the paper is that the authors attempted to combine three different methods to investigate speech-gesture processing.
  
  We sincerely thank the reviewer for recognizing our efforts in conducting three experiments to explore the neural activity linked to the amount of information processed during multisensory gesture-speech integration. In Experiment 1, we observed that the extent of inhibition in the pMTG and LIFG was closely linked to the overlapping gesture-speech responses, as quantified by mutual information. Building on the established roles of the pMTG and LIFG in our previous study (Zhao et al., 2021, JN), we then expanded our investigation to determine whether the dynamic neural engagement between the pMTG and LIFG during gesture-speech processing was also associated with the quality of the information. This hypothesis was further validated through high-temporal resolution EEG, where we examined ERP components related to varying information contents. Notably, we observed a close time alignment between the ERP components and the time windows of the TMS effects, which were associated with the same informational matrices in gesture-speech processing.
  
  Weaknesses:
  
  (1) One major issue is that there is a tight anatomical coupling between pMTG and LIFG. Stimulating one area could therefore also result in stimulation of the other area (see Silvanto and Pascual-Leone, 2008). I therefore think it is very difficult to tease apart the contribution of these areas to the speech-gesture integration process, especially considering that the authors stimulate these regions in time windows that are very close to each other in both time and space (and the disruption might last longer over time).
  
  Response 1: We greatly appreciate the reviewer’s careful consideration. We trust that the explanation provided above has clarified this issue (see Response to Editors for detail).
  
  (2) Related to this point, it is unclear to me why the HD-TDCS/TMS is delivered in set time windows for each region. How did the authors determine this, and how do the results for TMS compare to their previous work from 2018 and 2023 (which describes a similar dataset+design)? How can they ensure they are only targeting their intended region since they are so anatomically close to each other?
  
  Response 2: The current study builds on a series of investigations that systematically examined the temporal and spatial dynamics of gesture-speech integration. In our earlier work (Zhao et al., 2018, J. Neurosci), we demonstrated that interrupting neural activity in the IFG or pMTG using TMS selectively disrupted the semantic congruency effect (reaction time costs due to semantic incongruence), without affecting the gender congruency effect (reaction time costs due to gender incongruence). These findings identified the IFG and pMTG as critical hubs for gesture-speech integration. This informed the brain regions selected for subsequent studies.
  
  In Zhao et al. (2021, J. Neurosci), we employed a double-pulse TMS protocol, delivering stimulation within one of eight 40-ms time windows, to further examine the temporal involvement of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, confirming the dynamic and temporally staged roles of these regions during gesture-speech integration.
  
  In Zhao et al. (2023, Frontiers in Psychology), we investigated the semantic predictive role of gestures relative to speech by comparing two experimental conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. We observed time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG only in the second condition, leading to the conclusion that gestures exert a semantic priming effect on co-occurring speech. These findings underscored the semantic advantage of gesture in facilitating speech integration, further refining our understanding of the temporal and functional interplay between these modalities.
  
  The design of the current study—including the choice of brain regions and time windows—was directly informed by these prior findings. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (mutual information, MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and ERPs to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.
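
  For readers less familiar with the two information-theoretic quantities named above, the following Python sketch computes Shannon entropy and mutual information from categorical responses. It is an illustration only: the response does not specify how the metrics were derived from the stimuli, so the paired response lists and the function names `entropy` and `mutual_information` are assumptions made for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """Mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y) for paired labels."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical paired responses to one gesture and one speech item.
gesture_responses = ["cut", "cut", "chop", "saw"]
speech_responses = ["cut", "cut", "chop", "chop"]

gesture_entropy = entropy(gesture_responses)
speech_entropy = entropy(speech_responses)
mi = mutual_information(gesture_responses, speech_responses)
```

  Higher entropy means the responses to a signal are spread over more alternatives, and higher MI means that knowing the response to one modality reduces uncertainty about the other.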
  
  We acknowledge that the rationale for the design of the current study was not fully articulated in the original manuscript. In the revised version, we provided a more comprehensive and coherent explanation of the logic behind the three experiments, as well as the alignment with our previous findings in Lines 75-102:
  
  ‘To investigate the neural mechanisms underlying gesture-speech integration, we conducted three experiments to assess how neural activity correlates with distributed multisensory integration, quantified using information-theoretic measures of MI. Additionally, we examined the contributions of unisensory signals in this process, quantified through unisensory entropy. Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation[26], thereby increasing or decreasing cortical excitability in the targeted brain area, respectively. This experiment aimed to determine whether the overall facilitation (Anodal-tDCS minus Sham-tDCS) and/or inhibition (Cathodal-tDCS minus Sham-tDCS) of these integration hubs is modulated by the degree of gesture-speech integration, as measured by MI.
  
  Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to further assess whether the activity of these regions was associated with relevant informational matrices. Specifically, we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI).
  
  Experiment 3 complemented these investigations by focusing on the temporal dynamics of neural responses during semantic processing, leveraging high-temporal event-related potentials (ERPs). This experiment investigated how distinct information contributors modulated specific ERP components associated with semantic processing. These components included the early sensory effects as P1 and N1–P2[27,28], the N400 semantic conflict effect[14,28,29], and the late positive component (LPC) reconstruction effect[30,31]. By integrating these ERP findings with results from Experiments 1 and 2, Experiment 3 aimed to provide a more comprehensive understanding of how gesture-speech integration is modulated by neural dynamics.’
  
  Although the IFG and pMTG are anatomically close, the consistent differentiation of their respective roles, as evidenced by our experiment across various time windows (TWs) and supported by previous research (see Response to editors for details), reinforces the validity of the stimulation effect observed in our study.
  
  References
  
  Zhao, W.Y., Riggs, K., Schindler, I., and Holle, H. (2018). Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integration. Journal of Neuroscience 38, 1891-1900. 10.1523/Jneurosci.1748-17.2017.
  
  Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. https://doi.org/10.1523/jneurosci.1355-21.2021.
  
  Zhao, W. (2023). TMS reveals a two-stage priming circuit of gesture-speech integration. Front Psychol 14, 1156087. 10.3389/fpsyg.2023.1156087.
  
  Bikson, M., Inoue, M., Akiyama, H., Deans, J.K., Fox, J.E., Miyakawa, H., and Jefferys, J.G.R. (2004). Effects of uniform extracellular DC electric fields on excitability in rat hippocampal slices. J Physiol-London 557, 175-190. 10.1113/jphysiol.2003.055772.
  
  Federmeier, K.D., Mai, H., and Kutas, M. (2005). Both sides get the point: hemispheric sensitivities to sentential constraint. Memory & Cognition 33, 871-886. 10.3758/bf03193082.
  
  Kelly, S.D., Kravitz, C., and Hopkins, M. (2004). Neural correlates of bimodal speech and gesture comprehension. Brain and Language 89, 253-260. 10.1016/s0093-934x(03)00335-3.
  
  Wu, Y.C., and Coulson, S. (2005). Meaningful gestures: Electrophysiological indices of iconic gesture comprehension. Psychophysiology 42, 654-667. 10.1111/j.1469-8986.2005.00356.x.
  
  Fritz, I., Kita, S., Littlemore, J., and Krott, A. (2021). Multimodal language processing: How preceding discourse constrains gesture interpretation and affects gesture integration when gestures do not synchronise with semantic affiliates. J Mem Lang 117, 104191. 10.1016/j.jml.2020.104191.
  
  Gunter, T.C., and Weinbrenner, J.E.D. (2017). When to take a gesture seriously: On how we use and prioritize communicative cues. J Cognitive Neurosci 29, 1355-1367. 10.1162/jocn_a_01125.
  
  Ozyurek, A., Willems, R.M., Kita, S., and Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. J Cognitive Neurosci 19, 605-616. 10.1162/jocn.2007.19.4.605.
  
  (3) As the EEG signal is often not normally distributed, I was wondering whether the authors checked the assumptions for their Pearson correlations. The authors could perhaps better choose to model the different variables to see whether MI/entropy could predict the neural responses. How did they correct the many correlational analyses that they have performed?
  
  Response 3: We greatly appreciate the reviewer’s thoughtful comments.
  
  (1) Regarding the normality of the EEG signals and the use of Pearson correlation: in Figure 5 of the manuscript, we have already included normal distribution curves to illustrate the relationships between average ERP amplitudes across each ROI or elicited cluster and the three information models.
  
  Additionally, we performed the Shapiro-Wilk test, a widely accepted method for assessing bivariate normality, on both the MI/entropy and averaged ERP data. The p-values for all three combinations were greater than 0.05, indicating that the sample data from all bivariate combinations were normally distributed (Author response table 2).
  
  Author response table 2.
  
  Shapiro-Wilk results of bivariate normality test
  
  To further consolidate the relationship between entropy/MI and various ERP components, we also conducted a Spearman rank correlation analysis (Author response table 3-5). While the correlation between speech entropy and ERP amplitude in the P1 component yielded a p-value of 0.061, all other results were consistent with those obtained from the Pearson correlation analysis across the three experiments. Therefore, our conclusion that progressive neural responses reflected the degree of information remains robust. Although the Spearman rank and Pearson correlation analyses yielded similar results, we opted to report the Pearson correlation coefficients throughout the manuscript to maintain consistency.
  
  Author response table 3.
  
  Comparison of Pearson and Spearman results in Experiment 1
  
  Author response table 4.
  
  Comparison of Pearson and Spearman results in Experiment 2
  
  Author response table 5.
  
  Comparison of Pearson and Spearman results in Experiment 3
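
  The difference between the two correlation measures compared in the tables above can be illustrated with a short, dependency-free Python sketch. It is not the analysis code used in the study; the helper names and the toy data are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of the ranks, which is why it is less sensitive to departures from normality.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

  On monotonic but non-linear data such as this toy example, the rank-based coefficient reaches 1 while the Pearson coefficient stays slightly below it.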
  
  (2) Regarding the reviewer’s comment ‘choose to model the different variables to see whether MI/entropy could predict the neural responses’, we employed Representational Similarity Analysis (RSA) (Popal et al., 2019) with MI and entropy as continuous variables. This analysis aimed to build a model to predict neural responses based on these feature metrics.
  
  To capture dynamic temporal features indicative of different stages of multisensory integration, we segmented the EEG data into overlapping time windows (40 ms in duration with a 10 ms step size). The 40 ms window was chosen based on the TMS protocol used in Experiment 2, which also employed a 40 ms time window. The 10 ms step size (equivalent to 5 time points) was used to detect subtle shifts in neural responses that might not be captured by larger time windows, allowing for a more granular analysis of the temporal dynamics of neural activity.
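
  The segmentation arithmetic can be checked with a small sketch. The 500 Hz sampling rate is inferred from the statement that 10 ms equals 5 time points, and the 1000 ms epoch length is an assumption chosen because it reproduces the 97 windows of 20 time points reported in this response; neither value is stated explicitly here, and the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(n_samples, width, step):
    """Return (start, stop) sample indices of every full window; stop is exclusive."""
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]

SAMPLING_RATE_HZ = 500  # assumption: implied by 10 ms = 5 time points
EPOCH_MS = 1000         # assumption: epoch length that yields 97 windows

n_samples = EPOCH_MS * SAMPLING_RATE_HZ // 1000  # samples per epoch
width = 40 * SAMPLING_RATE_HZ // 1000            # 40 ms window in samples
step = 10 * SAMPLING_RATE_HZ // 1000             # 10 ms step in samples

windows = sliding_windows(n_samples, width, step)
```

  Because consecutive windows overlap by 30 ms, neighbouring windows share most of their samples, which is worth keeping in mind when interpreting effects that span several adjacent windows.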
  
  Following segmentation, the EEG data were reshaped into a four-dimensional matrix (42 channels × 20 time points × 97 time windows × 20 features). To construct a neural similarity matrix, we averaged the EEG data across time points within each channel and each time window. The resulting matrix was then processed using the pdist function to compute pairwise distances between adjacent data points. This allowed us to calculate correlations between the neural matrix and three feature similarity matrices, which were constructed in a similar manner. These three matrices corresponded to (1) gesture entropy, (2) speech entropy, and (3) mutual information (MI). This approach enabled us to quantify how well the neural responses corresponded to the semantic dimensions of gesture and speech stimuli at each time window.
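
  A minimal sketch of the core RSA step for a single channel and time window is given below. It assumes one averaged amplitude per item and one feature value (for example, MI) per item, uses absolute differences as the distance, and correlates the two condensed distance vectors. The distance metric, the function names, and the example values are assumptions for illustration; the response does not specify them.

```python
from itertools import combinations
from math import sqrt

def pairwise_distances(values):
    """Condensed vector of absolute differences over all item pairs,
    analogous to a pdist-style computation on one-dimensional observations."""
    return [abs(a - b) for a, b in combinations(values, 2)]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rsa_score(neural, feature):
    """Correlate the neural dissimilarity vector with the feature dissimilarity vector."""
    return pearson(pairwise_distances(neural), pairwise_distances(feature))

# Hypothetical per-item values for one channel and one time window.
neural_amplitude = [0.10, 0.40, 0.35, 0.90]
mutual_info = [1.0, 2.0, 1.8, 3.5]
score = rsa_score(neural_amplitude, mutual_info)
```

  Repeating this computation for every channel and time window gives one correlation per cell, which is the quantity that is then tested for significance.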
  
  To determine the significance of the correlations between neural activity and feature matrices, we conducted a permutation test with 1000 permutations. In this procedure, we randomized the data or feature matrices and recalculated the correlations repeatedly, generating a null distribution against which the observed correlation values were compared. A correlation was considered statistically significant if it exceeded the threshold of the null distribution (p < 0.05). This permutation approach helps mitigate the risk of spurious correlations, ensuring that the relationships between the neural data and feature matrices are both robust and meaningful.
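
  The permutation logic can be sketched as follows. This is a simplified, one-sided version that shuffles one of two paired vectors; in a full RSA it is common to permute item labels and rebuild the distance matrix instead, and the response does not state which variant was used. The function name, the seed, and the toy data are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """One-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if pearson(x, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)

# Hypothetical, perfectly related toy vectors.
x = list(range(20))
y = [2 * v + 1 for v in x]
observed_r, p_value = permutation_p_value(x, y)
```

  With 1000 permutations the smallest attainable p-value under this convention is 1/1001, which bounds the resolution of any p-value reported from such a test.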
  
  Finally, significant correlations were subjected to clustering analysis, which grouped similar neural response patterns across time windows and channels. This clustering allowed us to identify temporal and spatial patterns in the neural data that consistently aligned with the semantic features of gesture and speech stimuli, thus revealing the dynamic integration of these multisensory modalities across time. Results are as follows:
  
  (1) Two significant clusters were identified for gesture entropy (Author response image 1 left). The first cluster was observed between 60-110 ms (channels F1 and F3), with correlation coefficients (r) ranging from 0.207 to 0.236 (p < 0.001). The second cluster was found between 210-280 ms (channel O1), with r-values ranging from 0.244 to 0.313 (p < 0.001).
  
  (2) For speech entropy (Author response image 1 middle), significant clusters were detected in both early and late time windows. In the early time windows, the largest significant cluster was found between 10-170 ms (channels F2, F4, F6, FC2, FC4, FC6, C4, C6, CP4, and CP6), with r-values ranging from 0.151 to 0.340 (p = 0.013), corresponding to the P1 component (0-100 ms). In the late time windows, the largest significant cluster was observed between 560-920 ms (across the whole brain, all channels), with r-values ranging from 0.152 to 0.619 (p = 0.013).
  
  (3) For mutual information (MI) (Author response image 1 right), a significant cluster was found between 270-380 ms (channels FC1, FC2, FC3, FC5, C1, C2, C3, C5, CP1, CP2, CP3, CP5, FCz, Cz, and CPz), with r-values ranging from 0.198 to 0.372 (p = 0.001).
  
  Author response image 1.
  
  Results of RSA analysis.
  
  These additional findings suggest that even using a different modeling approach, neural responses, as indexed by feature metrics of entropy and mutual information, are temporally aligned with distinct ERP components and ERP clusters, as reported in the current manuscript. This alignment serves to further consolidate the results, reinforcing the conclusion we draw. Considering the length of the manuscript, we did not include these results in the current manuscript.
  
  (3) In terms of the correction of multiple comparisons, in Experiment 1, two separate participant groups were recruited for HD-tDCS applied over either the IFG or pMTG. FDR correction was performed separately for each group, resulting in six comparisons for each brain region (three information matrices × two tDCS effects: anodal-sham or cathodal-sham). In Experiment 2, six comparisons (three information matrices × two sites: IFG or pMTG) were submitted for FDR correction. In Experiment 3, FDR correction was applied to the seven regions of interest (ROIs) within each component, resulting in five comparisons.
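
  As an illustration of correcting a family of six comparisons, the sketch below implements the Benjamini-Hochberg procedure. The response does not name the specific FDR method, so the choice of Benjamini-Hochberg is an assumption, and the six p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return Benjamini-Hochberg adjusted p-values and rejection decisions."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected

# Hypothetical p-values for six comparisons within one family.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
adjusted, rejected = benjamini_hochberg(p_values)
```

  In this toy family, three p-values that fall below 0.05 before correction no longer pass the threshold afterwards, which shows why the size of each correction family matters for the reported results.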
  
  Reference:
  
  Wilk, M.B. (2015). The Shapiro Wilk And Related Tests For Normality.
  
  Popal, H., Wang, Y., & Olson, I. R. (2019). A guide to representational similarity analysis for social neuroscience. Social cognitive and affective neuroscience, 14(11), 1243-1253.
  
  (4) The authors use ROIs for their different analyses, but it is unclear why and on the basis of what these regions are defined. Why not consider all channels without making them part of an ROI, by using a method like the one described in my previous comment?
  
  Response 4: For the EEG data, we conducted both a traditional ROI analysis and a cluster-based permutation approach. The ROIs were defined based on a well-established work (Habets et al., 2011), allowing for hypothesis-driven testing of specific regions. In addition, we employed a cluster-based permutation method, which is data-driven and helps enhance robustness while addressing multiple comparisons. This method serves as a complement to the hypothesis-driven ROI analysis, offering an exploratory, unbiased perspective. Notably, the results from both approaches were consistent, reinforcing the reliability of our findings.
  
  To make the methods more accessible to a broader audience, we clarified the relationship between these approaches in the revised manuscript in Lines 267-270: ‘To consolidate the data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work[40], and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons.’
  
  Additionally, we conducted an RSA analysis without defining specific ROIs, considering all channels in the analysis. This approach yielded consistent results, further validating the robustness of our findings across different analysis methods. See Response 3 for detail.
  
  Reference:
  
  Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462
  
  (5) The authors describe that they have divided their EEG data into a "lower half" and a "higher half" (lines 234-236), based on entropy scores. It is unclear why this is necessary, and I would suggest just using the entropy scores as a continuous measure.
  
  Response 5: To identify ERP components or spatiotemporal clusters that demonstrated significant semantic differences, we split each model into higher and lower halves based on entropy scores. This division allowed us to capture distinct levels of information processing and explore how different levels of entropy or mutual information (MI) related to neural activity. Specifically, the goal was to highlight the gradual activation process of these components and clusters as they correlate with changes in information content. Remarkably, consistent results were observed between the ERP components and clusters, providing robust evidence that semantic information conveyed through gestures and speech significantly influenced the amplitude of these components or clusters. Moreover, the semantic information was shown to be highly sensitive, varying in tandem with these amplitude changes.
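  The contrast between the half-split described above and the continuous treatment suggested by the reviewer can be sketched as follows. The per-item values are hypothetical, and the rank-based split (with the extra item going to the higher half when the count is odd) is an assumption, since the response does not state how ties or odd counts were handled.

```python
import numpy as np

def split_halves(entropy):
    """Return index arrays for the lower and higher half of items, ranked by entropy."""
    order = np.argsort(entropy, kind="stable")
    half = len(order) // 2
    return order[:half], order[half:]

entropy = np.array([0.2, 1.5, 0.9, 2.1])    # hypothetical per-item entropy
amplitude = np.array([1.0, 3.0, 2.0, 4.0])  # hypothetical per-item ERP amplitude

# Dichotomized view: difference in mean amplitude between the two halves.
lower, higher = split_halves(entropy)
half_difference = amplitude[higher].mean() - amplitude[lower].mean()

# Continuous view: correlation across all items, without discarding rank information.
continuous_r = np.corrcoef(entropy, amplitude)[0, 1]
```

  The split yields one contrast per component, whereas the correlation uses every item's entropy value; the two answer related but not identical questions.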
  
  Reviewer #2 (Public review):
  
  Comment:
  
  Summary:
  
  The study is an innovative and fundamental study that clarified important aspects of brain processes for integration of information from speech and iconic gesture (i.e., gesture that depicts action, movement, and shape), based on tDCS, TMS, and EEG experiments. They evaluated their speech and gesture stimuli in information-theoretic ways and calculated how informative speech is (i.e., entropy), how informative gesture is, and how much shared information speech and gesture encode. The tDCS and TMS studies found that the left IFG and pMTG, the two areas that were activated in fMRI studies on speech-gesture integration in the previous literature, are causally implicated in speech-gesture integration. The size of the tDCS and TMS effects is correlated with the entropy of the stimuli or mutual information, which indicates that the effects stem from the modulation of information decoding/integration processes. The EEG study showed that various ERP (event-related potential, e.g., N1-P2, N400, LPC) effects that have been observed in speech-gesture integration experiments in the previous literature are modulated by the entropy of speech/gesture and mutual information. This makes it clear that these effects are related to information decoding processes. The authors propose a model of how the speech-gesture integration process unfolds in time, and how IFG and pMTG interact with each other in that process.
  
  Strengths:
  
  The key strength of this study is that the authors used information theoretic measures of their stimuli (i.e., entropy and mutual information between speech and gesture) in all of their analyses. This made it clear that the neuro-modulation (tDCS, TMS) affected information decoding/integration and ERP effects reflect information decoding/integration. This study used tDCS and TMS methods to demonstrate that left IFG and pMTG are causally involved in speech-gesture integration. The size of tDCS and TMS effects are correlated with information-theoretic measures of the stimuli, which indicate that the effects indeed stem from disruption/facilitation of the information decoding/integration process (rather than generic excitation/inhibition). The authors' results also showed a correlation between information-theoretic measures of stimuli with various ERP effects. This indicates that these ERP effects reflect the information decoding/integration process.
  
  We sincerely thank the reviewer for recognizing our efforts and the innovation of employing information-theoretic measures to elucidate the brain processes underlying the multisensory integration of gesture and speech.
  
  Weaknesses:
  
  The "mutual information" cannot fully capture the interplay of the meaning of speech and gesture. The mutual information is calculated based on what information can be decoded from speech alone and what information can be decoded from gesture alone. However, when speech and gesture are combined, a novel meaning can emerge, which cannot be decoded from a single modality alone. For example, a person produces a gesture of writing something with a pen, while saying "He paid". The speech-gesture combination can be interpreted as "paying by signing a cheque". It is highly unlikely that this meaning is decoded when people hear speech only or see gestures only. The current study cannot address how such speech-gesture integration occurs in the brain, and what ERP effects may reflect such a process. Future studies can classify different types of speech-gesture integration and investigate neural processes that underlie each type. Another important topic for future studies is to investigate how the neural processes of speech-gesture integration change when the relative timing between the speech stimulus and the gesture stimulus changes.
  
  We greatly appreciate Reviewer 2’s thoughtful concern regarding whether "mutual information" adequately captures the interplay between the meanings of speech and gesture. We would like to clarify that the materials used in the present study involved gestures that were performed without actual objects, paired with verbs that precisely describe the corresponding actions. For example, a hammering gesture was paired with the verb “hammer”, and a cutting gesture was paired with the verb “cut”. In this design, all gestures conveyed redundant information relative to the co-occurring speech, creating significant overlap between the information derived from speech alone and that from gesture alone.
  
  We understand the reviewer’s concern about cases where gestures and speech might provide complementary, rather than redundant, information. To address this, we have developed an alternative metric for quantifying information gains contributed by supplementary multisensory cues, which will be explored in a subsequent study. However, for the present study, we believe that the observed overlap in information serves as a key indicator of multisensory convergence, a central focus of our investigation.
  
  Regarding the reviewer’s concern about how neural processes of speech-gesture integration may change with varying relative timing between speech and gesture stimuli, we would like to highlight findings from our previous study (Zhao, 2023, Frontiers in Psychology). In that study, we explored the semantic predictive role of gestures relative to speech under two timing conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Interestingly, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures play a semantic priming role for co-occurring speech. Building on this, we designed the present study with gestures deliberately preceding speech at its semantic identification point to reflect this semantic priming relationship. Additionally, ongoing research in our lab is exploring gesture and speech interactions in natural conversational settings to investigate whether the neural processes identified here remain consistent across varying contexts.
  
  To address potential concerns and ensure clarity regarding the limitations of the MI measurement, we have included a discussion of this in the revised manuscript in Lines 543-547: ‘Furthermore, MI quantifies overlap in gesture-speech integration, primarily when gestures convey redundant meaning. Consequently, the conclusions drawn in this study are constrained to contexts in which gestures serve to reinforce the meaning of the speech. Future research should aim to explore the neural responses in cases where gestures convey supplementary, rather than redundant, semantic information.’ This is followed by a clarification of the timing relationship between gesture and speech: ‘Note that the sequential cortical involvement and ERP components discussed above are derived from a deliberate alignment of speech onset with gesture DP, creating an artificial priming effect with gesture semantically preceding speech. Caution is advised when generalizing these findings to the spontaneous gesture-speech relationships, although gestures naturally precede speech[34].’ (Lines 539-543).
  
  Reviewer #3 (Public review):
  
  In this useful study, Zhao et al. try to extend the evidence for their previously described two-step model of speech-gesture integration in the posterior Middle Temporal Gyrus (pMTG) and Inferior Frontal Gyrus (IFG). They repeat some of their previous experimental paradigms, but this time quantifying Information-Theoretical (IT) metrics of the stimuli in a stroop-like paradigm purported to engage speech-gesture integration. They then correlate these metrics with the disruption of what they claim to be an integration effect observable in reaction times during the tasks following brain stimulation, as well as documenting the ERP components in response to the variability in these metrics.
  
  The integration of multiple methods, like tDCS, TMS, and ERPs to provide converging evidence renders the results solid. However, their interpretation of the results should be taken with care, as some critical confounds, like difficulty, were not accounted for, and the conceptual link between the IT metrics and what the authors claim they index is tenuous and in need of more evidence. In some cases, the difficulty making this link seems to arise from conceptual equivocation (e.g., their claims regarding 'graded' evidence), whilst in some others it might arise from the usage of unclear wording in the writing of the manuscript (e.g. the sentence 'quantitatively functional mental states defined by a specific parser unified by statistical regularities'). Having said that, the authors' aim is valuable, and addressing these issues would render the work a very useful approach to improve our understanding of integration during semantic processing, being of interest to scientists working in cognitive neuroscience and neuroimaging.
  
  The main hurdle to achieving the aims set by the authors is the presence of the confound of difficulty in their IT metrics. Their measure of entropy, for example, being derived from the distribution of responses of the participants to the stimuli, will tend to be high for words or gestures with multiple competing candidate representations (this is what would presumptively give rise to the diversity of responses in high-entropy items). There is ample evidence implicating IFG and pMTG as key regions of the semantic control network, which is critical during difficult semantic processing when, for example, semantic processing must resolve competition between multiple candidate representations, or when there are increased selection pressures (Jackson et al., 2021). Thus, the authors' interpretation of Mutual Information (MI) as an index of integration is inextricably contaminated with difficulty arising from multiple candidate representations. This casts doubt on the claims of the role of pMTG and IFG as regions carrying out gesture-speech integration as the observed pattern of results could also be interpreted in terms of brain stimulation interrupting the semantic control network's ability to select the best candidate for a given context or respond to more demanding semantic processing.
  
  Response 1: We sincerely thank the reviewer for pointing out the confound of difficulty. The primary aim of this study is to investigate whether the degree of activity in the established integration hubs, IFG and pMTG, is influenced by the information provided by gesture-speech modalities and/or their interactions. While we provided evidence for the differential involvement of the IFG and pMTG by delineating their dynamic engagement across distinct time windows of gesture-speech integration and associating these patterns with unisensory information and their interaction, we acknowledge that the mechanisms underlying these dynamics remain open to interpretation. Specifically, whether the observed effects stem from difficulties in semantic control processes, as suggested by the reviewer, or from resolving information uncertainty, as quantified by entropy, falls outside the scope of the current study. Importantly, we view these two interpretations as complementary rather than mutually exclusive, as both may be contributing factors. Nonetheless, we agree that addressing this question is a compelling avenue for future research.
  
  In the revised manuscript, we have included an additional analysis to assess whether the confounding effects of lexical or semantic control difficulty—specifically, the number of available responses—affect the neural outcomes. To address this, we performed partial correlation analyses, controlling for the number of responses.
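  A partial correlation of this kind can be sketched by regressing the covariate out of both variables and correlating the residuals. The sketch below is illustrative only: the variable names and the simulated data are assumptions, it handles a single covariate, and it does not compute the p-values or confidence intervals reported in the response.

```python
import numpy as np

def residualize(values, covariate):
    """Remove the least-squares linear effect of the covariate (plus an intercept)."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

def partial_corr(x, y, covariate):
    """Pearson correlation between x and y after controlling for one covariate."""
    x, y, covariate = (np.asarray(v, dtype=float) for v in (x, y, covariate))
    return float(np.corrcoef(residualize(x, covariate), residualize(y, covariate))[0, 1])

# Simulated example (hypothetical): both measures are driven by the number of responses.
rng = np.random.default_rng(0)
n_responses = rng.normal(size=500)
mi = n_responses + 0.5 * rng.normal(size=500)
effect = n_responses + 0.5 * rng.normal(size=500)

raw_r = float(np.corrcoef(mi, effect)[0, 1])
partial_r = partial_corr(mi, effect, n_responses)
```

  In this simulation the raw correlation is large while the partial correlation shrinks toward zero, which is the pattern one would expect when a shared covariate, rather than a direct relationship, drives the association.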
  
  We would like to clarify an important distinction between the measure of entropy derived from the distribution of responses and the concept of response diversity. Entropy, in our analysis, is computed based on the probability distribution of each response, as captured by the information entropy formula. In contrast, response diversity refers to the simple count of different responses provided. Mutual Information (MI), by its nature, is also an entropy measure, quantifying the overlap in responses. For reference, although we observed a high correlation between the three information matrices and the number of responses (gesture entropy & gesture response number: r = 0.976, p < 0.001; speech entropy & speech response number: r = 0.961, p < 0.001; MI & total response number: r = 0.818, p < 0.001), it is crucial to emphasize that these metrics capture different aspects of the semantic information represented. In the revised manuscript, we have provided a table detailing both entropy and response numbers for each stimulus, to allow for greater transparency and clarity.
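  The distinction between entropy and the number of distinct responses can be illustrated with a small sketch. The response lists are hypothetical and the base-2 logarithm (entropy in bits) is an assumption, since the response does not state the base used.

```python
import math
from collections import Counter

def shannon_entropy(responses):
    """Shannon entropy (bits) of the empirical distribution of responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two hypothetical items, each with 30 answers and exactly 3 distinct responses.
skewed = ["cut"] * 28 + ["slice", "saw"]             # one dominant answer
even = ["cut"] * 10 + ["slice"] * 10 + ["saw"] * 10  # answers spread evenly

distinct_skewed, distinct_even = len(set(skewed)), len(set(even))
entropy_skewed, entropy_even = shannon_entropy(skewed), shannon_entropy(even)
```

  Both items have the same response count, yet the evenly spread item has much higher entropy, because entropy weights each response by its probability rather than simply counting it.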
  
  Furthermore, we have added a comprehensive description of the partial correlation analysis conducted across all three experiments in the methodology section: for Experiment 1, please refer to Lines 213–222: ‘To account for potential confounds related to multiple candidate representations, we conducted partial correlation analyses between the tDCS effects and gesture entropy, speech entropy, and MI, controlling for the number of responses provided for each gesture and speech, as well as the total number of combined responses. Given that HD-tDCS induces overall disruption at the targeted brain regions, we hypothesized that the neural activity within the left IFG and pMTG would be progressively affected by varying levels of multisensory convergence, as indexed by MI. Moreover, we hypothesized that the modulation of neural activity by MI would differ between the left IFG and pMTG, as reflected in the differential modulation of response numbers in the partial correlations, highlighting their distinct roles in semantic processing[37].’
  
  Experiment 2: ‘To control for potential confounds, partial correlations were also performed between the TMS effects and gesture entropy, speech entropy, and MI, controlling for the number of responses for each gesture and speech, as well as the total number of combined responses. By doing this, we can determine how the time-sensitive contribution of the left IFG and pMTG to gesture–speech integration was affected by gesture and speech information distribution.’ (Lines 242–246).
  
  Experiment 3: ‘Additionally, partial correlations were conducted, accounting for the number of responses for each respective metric’ (Lines 292–293).
  
  As anticipated by the reviewer, we observed a consistent modulation of response numbers across both regions as well as across the four ERP components and associated clusters. The detailed results are presented below:
  
  Experiment 1: ‘However, partial correlation analysis, controlling for the total response number, revealed that the initially significant correlation between the Cathodal-tDCS effect and MI was no longer significant (r = -0.303, p = 0.222, 95% CI = [-0.770, 0.164]). This suggests that the observed relationship between Cathodal-tDCS and MI may be confounded by semantic control difficulty, as reflected by the total number of responses. Specifically, the reduced activity in the IFG under Cathodal-tDCS may be driven by variations in the difficulty of semantic control rather than a direct modulation of MI.’ (Lines 310-316) and ‘Importantly, the reduced activity in the pMTG under Cathodal-tDCS was not influenced by the total response number, as indicated by the non-significant correlation (r = -0.253, p = 0.295, 95% CI = [-0.735, 0.229]). This finding was further corroborated by the unchanged significance in the partial correlation between Cathodal-tDCS and MI, when controlling for the total response number (r = -0.472, p = 0.048, 95% CI = [-0.903, -0.041]).’ (Lines 324-328).
  
  Experiment 2: ‘Notably, inhibition of pMTG activity in TW2 was not influenced by the number of speech responses (r = -0.539, p = 0.087, 95% CI = [-1.145, 0.067]). However, the number of speech responses did affect the modulation of speech entropy on the pMTG inhibition effect in TW2. This was evidenced by the non-significant partial correlation between pMTG inhibition and speech entropy when controlling for speech response number (r = -0.218, p = 0.545, 95% CI = [-0.563, 0.127]).
  
  In contrast, the interrupted IFG activity in TW6 appeared to be consistently influenced by the confound of semantic control difficulty. This was reflected in the significant correlations with gesture response number (r = -0.480, p = 0.032, 95% CI = [-0.904, -0.056]), speech response number (r = -0.729, p = 0.011, 95% CI = [-1.221, -0.237]), and total response number (r = -0.591, p = 0.008, 95% CI = [-0.993, -0.189]). Additionally, partial correlation analyses revealed non-significant relationships between interrupted IFG activity in TW6 and gesture entropy (r = -0.369, p = 0.120, 95% CI = [-0.810, -0.072]), speech entropy (r = -0.455, p = 0.187, 95% CI = [-1.072, 0.162]), and MI (r = -0.410, p = 0.091, 95% CI = [-0.856, -0.036]) when controlling for response numbers.’ (Lines 349-363)
  
  Experiment 3: ‘To clarify potential confounds of semantic control difficulty, partial correlation analyses were conducted to examine the relationship between the elicited ERP components and the relevant information matrices, controlling for response numbers. Results consistently indicated modulation by response numbers in the relationship of ERP components with the information matrix, as evidenced by the non-significant partial correlations between the P1 amplitude (P1 component over ML: r = -0.574, p = 0.082, 95% CI = [-1.141, -0.007]) and the P1 cluster (r = -0.503, p = 0.138, 95% CI = [-1.102, 0.096]) with speech entropy; the N1-P2 amplitude (N1-P2 component over LA: r = -0.080, p = 0.746, 95% CI = [-0.554, 0.394]) and N1-P2 cluster (r = -0.179, p = 0.464, 95% CI = [-0.647, 0.289]) with gesture entropy; the N400 amplitude (N400 component over LA: r = 0.264, p = 0.247, 95% CI = [-0.195, 0.723]) and N400 cluster (r = 0.394, p = 0.095, 95% CI = [-0.043, 0.831]) with gesture entropy; the N400 amplitude (N400 component over LA: r = -0.134, p = 0.595, 95% CI = [-0.620, 0.352]) and N400 cluster (r = -0.034, p = 0.894, 95% CI = [-0.524, 0.456]) with MI; and the LPC amplitude (LPC component over LA: r = -0.428, p = 0.217, 95% CI = [-1.054, 0.198]) and LPC cluster (r = -0.202, p = 0.575, 95% CI = [-0.881, 0.477]) with speech entropy.’ (Lines 424-438)
  
  Based on the above results, we conclude that there is a dynamic interplay between the difficulty of semantic representation and the control pressures that shape the resulting neural responses. Furthermore, while the role of the IFG in control processes remains consistent, the present study reveals a more segmented role for the pMTG. Specifically, although the pMTG is well-established in the processing of distributed speech information, the integration of multisensory convergence, as indexed by MI, did not elicit the same control-related modulation in pMTG activity. A comprehensive discussion of the control process in shaping neural responses, as well as the specific roles of the IFG and pMTG in this process, is provided in the Discussion section in Lines (493-511): ‘Given that control processes are intrinsically integrated with semantic processing50, a distributed semantic representation enables dynamic modulation of access to and manipulation of meaningful information, thereby facilitating flexible control over the diverse possibilities inherent in a concept. Accordingly, an increased number of candidate responses amplifies the control demands necessary to resolve competing semantic representations. This effect was observed in the present study, where the association of the information matrix with the tDCS effect in IFG, the inhibition of pMTG activity in TW2, disruption of IFG activity in TW6, and modulation of four distinct ERP components collectively demonstrated that response quantity modulated neural activity. These results underscore the intricate interplay between the difficulty of semantic representation and the control pressures that shape the resulting neural responses.
  
  The IFG and pMTG, central components of the semantic control network, have been extensively implicated in previous research 50-52. While the role of the IFG in managing both unisensory information and multisensory convergence remains consistent, as evidenced by the confounding difficulty results across Experiments 1 and 2, the current study highlights a more context-dependent function for the pMTG. Specifically, although the pMTG is well-established in the processing of distributed speech information, the multisensory convergence, indexed by MI, did not evoke the same control-related modulation in pMTG activity. These findings suggest that, while the pMTG is critical to semantic processing, its engagement in control processes is likely modulated by the specific nature of the sensory inputs involved’
  
  Reference:
  
  Tesink, C.M.J.Y., Petersson, K.M., van Berkum, J.J.A., van den Brink, D., Buitelaar, J.K., and Hagoort, P. (2009). Unification of speaker and meaning in language comprehension: An fMRI study. J Cognitive Neurosci 21, 2085-2099. 10.1162/jocn.2008.21161
  
  Jackson, R.L. (2021). The neural correlates of semantic control revisited. Neuroimage 224, 117444. 10.1016/j.neuroimage.2020.117444.
  
  Jefferies, E. (2013). The neural basis of semantic cognition: converging evidence from neuropsychology, neuroimaging and TMS. Cortex 49, 611-625. 10.1016/j.cortex.2012.10.008.
  
  Noonan, K.A., Jefferies, E., Visser, M., and Lambon Ralph, M.A. (2013). Going beyond inferior prefrontal involvement in semantic control: evidence for the additional contribution of dorsal angular gyrus and posterior middle temporal cortex. J Cogn Neurosci 25, 1824-1850. 10.1162/jocn_a_00442.
  
  In terms of conceptual equivocation, the use of the term 'graded' by the authors seems to be different from the usage commonly employed in the semantic cognition literature (e.g., the 'graded hub hypothesis', Rice et al., 2015). The idea of a graded hub in the controlled semantic cognition framework (i.e., the anterior temporal lobe) refers to a progressive degree of abstraction or heteromodal information as you progress through the anatomy of the region (i.e., along the dorsal-to-ventral axis). The authors, on the other hand, seem to refer to 'graded manner' in the context of a correlation of entropy or MI and the change in the difference between Reaction Times (RTs) of semantically congruent vs incongruent gesture-speech. The issue is that the discourse through parts of the introduction and discussion seems to conflate both interpretations, and the ideas in the main text do not correspond to the references they cite. This is not overall very convincing. What is it exactly the authors are arguing about the correlation between RTs and MI indexes? As stated above, their measure of entropy captures the spread of responses, which could also be a measure of item difficulty (more diverse responses imply fewer correct responses, a classic index of difficulty). Capturing the diversity of responses means that items with high entropy scores are also likely to have multiple candidate representations, leading to increased selection pressures. Regions like pMTG and IFG have been widely implicated in difficult semantic processing and increased selection pressures (Jackson et al., 2021). How is this MI correlation evidence of integration that proceeds in a 'graded manner'? The conceptual links between these concepts must be made clearer for the interpretation to be convincing.
  
  Response 2: Regarding the concern of conceptual equivocation, we would like to emphasize that this study represents the first attempt to focus on the relationship between information quantity and neural engagement, a question addressed in three experiments. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and ERPs to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.
  
  Therefore, the incremental engagement of the integration hubs, IFG and pMTG, with the informativeness of gesture and speech during multisensory integration is different from the "graded hub," which refers to anatomical distribution. We sincerely apologize for this oversight. In the revised manuscript, we have revised the relevant passages to remove this conceptual equivocation in Lines 44-60: ‘Consensus acknowledges the presence of 'convergence zones' within the temporal and inferior parietal areas[1], or the 'semantic hub' located in the anterior temporal lobe[2], pivotal for integrating, converging, or distilling multimodal inputs. Contemporary theories frame the semantic processing as a dynamic sequence of neural states[3], shaped by systems that are finely tuned to the statistical regularities inherent in sensory inputs[4]. These regularities enable the brain to evaluate, weight, and integrate multisensory information, optimizing the reliability of individual sensory signals[5]. However, sensory inputs available to the brain are often incomplete and uncertain, necessitating adaptive neural adjustments to resolve these ambiguities[6]. In this context, neuronal activity is thought to be linked to the probability density of sensory information, with higher levels of uncertainty resulting in the engagement of a broader population of neurons, thereby reflecting the brain’s adaptive capacity to handle diverse possible interpretations[7,8]. Although the role of 'convergence zones' and 'semantic hubs' in integrating multimodal inputs is well established, the precise functional patterns of neural activity in response to the distribution of unified multisensory information—along with the influence of unisensory signals—remain poorly understood.
  
  To this end, we developed an analytic approach to directly probe the cortical engagement during multisensory gesture-speech semantic integration.’
  
  Furthermore, in the Discussion section, we have replaced the term 'graded' with 'incremental' (Line 456). Additionally, we have included a discussion on the progressive nature of neural engagement, as evidenced by the correlation between RTs and MI indices in Lines 483-492: ‘The varying contributions of unisensory gesture-speech information and the convergence of multisensory inputs, as reflected in the correlation between distinct ERP components and TMS time windows (TMS TWs), are consistent with recent models suggesting that multisensory processing involves parallel detection of modality-specific information and hierarchical integration across multiple neural levels[4,48]. These processes are further characterized by coordination across multiple temporal scales[49]. Building on this, the present study offers additional evidence that the multi-level nature of gesture-speech processing is statistically structured, as measured by information matrix of unisensory entropy and multisensory convergence index of MI, the input of either source would activate a distributed representation, resulting in progressively functioning neural responses.’
  
  Reference:
  
  Damasio, H., Grabowski, T.J., Tranel, D., Hichwa, R.D., and Damasio, A.R. (1996). A neural basis for lexical retrieval. Nature 380, 499-505. 10.1038/380499a0.
  
  Patterson, K., Nestor, P.J., and Rogers, T.T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience 8, 976-987. 10.1038/nrn2277.
  
  Brennan, J.R., Stabler, E.P., Van Wagenen, S.E., Luh, W.M., and Hale, J.T. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157, 81-94. 10.1016/j.bandl.2016.04.008.
  
  Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.
  
  Noppeney, U. (2021). Perceptual Inference, Learning, and Attention in a Multisensory World. Annu Rev Neurosci 44, 449-473. 10.1146/annurev-neuro-100120-085519.
  
  Ma, W.J., and Jazayeri, M. (2014). Neural coding of uncertainty and probability. Annu Rev Neurosci 37, 205-220. 10.1146/annurev-neuro-071013-014017.
  
  Fischer, B.J., and Pena, J.L. (2011). Owl's behavior and neural representation predicted by Bayesian inference. Nat Neurosci 14, 1061-1066. 10.1038/nn.2872.
  
  Ganguli, D., and Simoncelli, E.P. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput 26, 2103-2134. 10.1162/NECO_a_00638.
  
  Meijer, G.T., Mertens, P.E.C., Pennartz, C.M.A., Olcese, U., and Lansink, C.S. (2019). The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network. Prog Neurobiol 174, 1-15. 10.1016/j.pneurobio.2019.01.004.
  
  Senkowski, D., and Engel, A.K. (2024). Multi-timescale neural dynamics for multisensory integration. Nat Rev Neurosci 25, 625-642. 10.1038/s41583-024-00845-7.
  
  Reviewer #2 (Recommendations for the authors):
  
  I have a number of small suggestions to make the paper easier to understand.
  
  We sincerely thank the reviewer for their careful reading and thoughtful consideration. All suggestions have been thoroughly addressed and incorporated into the revised manuscript.
  
  (1) Lines 86-87, please clarify whether "chronometric double-pulse TMS" should lead to either excitation or inhibition of neural activities
  
  Double-pulse TMS elicits inhibition of neural activities (see responses to editors), which has been clarified in the revised manuscript in Lines 90-93: ‘we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI)’
  
  (2) Line 106 "validated by replicating the semantic congruency effect". Please specify what the task was in the validation study.
  
  The description of the validation task has been added in Lines 116-119: ‘To validate the stimuli, 30 participants were recruited to replicate the multisensory index of semantic congruency effect, hypothesizing that reaction times for semantically incongruent gesture-speech pairs would be significantly longer than those for congruent pairs.’
  
  (3) Line 112. "30 subjects". Are they Chinese speakers?
  
  Yes, all participants in the present study, including those in the pre-tests, are native Chinese speakers.
  
  (4) Line 122, "responses for each item" Please specify whether you mean here "the comprehensive answer" as you defined in 118-119.
  
  Yes, and this information has been added in Lines 136-137: ‘comprehensive responses for each item were converted into Shannon's entropy (H)’
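
The conversion of an item's responses into Shannon's entropy can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `shannon_entropy` and the example answers are hypothetical, and base-2 logarithms are an assumption (the response does not state the base).

```python
from collections import Counter
from math import log2


def shannon_entropy(answers):
    """Entropy (in bits) of the empirical distribution of free-text answers."""
    n = len(answers)
    # Each distinct answer contributes -p * log2(p), where p is its relative frequency.
    return sum(-(count / n) * log2(count / n) for count in Counter(answers).values())


# Hypothetical item: 60 answers split evenly between two interpretations.
focused = ["cut"] * 60
split = ["cut"] * 30 + ["saw"] * 30
print(shannon_entropy(focused), shannon_entropy(split))
```

An item on which all 60 respondents agree has zero entropy, while answers spread over more alternatives yield higher values.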
  
  (5) Line 163 "one of three stimulus types (Anodal, Cathodal or Sham)". Please specify whether the order of the three conditions was counterbalanced across participants. Or, whether the order was fixed for all participants.
  
  The order of the three conditions was counterbalanced across participants. A clearer description has been added to the revised manuscript in Lines 184-189: ‘Participants were divided into two groups, with each group undergoing HD-tDCS stimulation at different target sites (IFG or pMTG). Each participant completed three experimental sessions, spaced one week apart, during which 480 gesture-speech pairs were presented across various conditions. In each session, participants received one of three types of HD-tDCS stimulation: Anodal, Cathodal, or Sham. The order of stimulation site and type was counterbalanced using a Latin square design to control for potential order effects.’
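
One way to generate a Latin square ordering for the three stimulation types is sketched below. The cyclic construction and the assignment of participants to rows by index are assumptions made for illustration; the response does not specify which Latin square was used or how participants were allocated to it.

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears once per row and once per column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def session_order(participant_index, conditions=("Anodal", "Cathodal", "Sham")):
    """Hypothetical allocation rule: cycle participants through the rows of the square."""
    square = latin_square(list(conditions))
    return square[participant_index % len(square)]


for participant in range(3):
    print(participant, session_order(participant))
```

With this scheme each stimulation type occupies each session position equally often once the number of participants is a multiple of three.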
  
  (6) Line 191-192, "difference in reaction time between semantic incongruence and semantic congruent pairs)" Here, please specify which reaction time was subtracted from which one. This information is very crucial; without it, you cannot interpret your graphs.
  
  (17) Figure 3. Figure caption for (A). "The semantic congruence effect was calculated as the reaction time difference between...". You need to specify which condition was subtracted from what condition; otherwise, you cannot interpret this figure. "difference" is too ambiguous.
  
  Corrections have been made in the revised manuscript in Lines 208-211: ‘Neural responses were quantified based on the effects of HD-tDCS (active tDCS minus sham tDCS) on the semantic congruency effect, defined as the difference in reaction times between semantic incongruent and congruent conditions (Rt(incongruent) - Rt(congruent))’ and Lines 796-798: ‘The semantic congruency effect was calculated as the reaction time (RT) difference between semantically incongruent and semantically congruent pairs (Rt(incongruent) - Rt(congruent))’.
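
The two subtractions defined above can be made concrete with a short sketch. The function names and reaction times are hypothetical, and averaging trials with a simple mean is an assumption; the response only fixes the direction of each subtraction.

```python
from statistics import mean


def congruency_effect(rt_incongruent, rt_congruent):
    """Semantic congruency effect: Rt(incongruent) - Rt(congruent), in ms."""
    return mean(rt_incongruent) - mean(rt_congruent)


def stimulation_effect(active, sham):
    """Stimulation effect on the congruency effect: active tDCS minus sham tDCS.

    Each argument is a (rt_incongruent, rt_congruent) pair of trial lists.
    """
    return congruency_effect(*active) - congruency_effect(*sham)


# Hypothetical reaction times (ms) for one item under cathodal and sham stimulation.
cathodal = ([510, 510], [500, 500])
sham = ([530, 530], [500, 500])
print(stimulation_effect(cathodal, sham))
```

Under these definitions a negative value means the congruency effect was smaller under active stimulation than under sham.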
  
  (7) Line 363 "progressive inhibition of IFG and pMTG by HD-tDCS as the degree of gesture-speech interaction, indexed by MI, advanced." This sentence is very hard to follow. I don't understand what part of the data in Figure 3 speaks to "inhibition of IFG". And what is "HD-tDCS"? I think it is easier to read if you talk about correlation (not "progressive" and "advanced").
  
  High-Definition transcranial direct current stimulation (HD-tDCS) was applied to modulate the activity of pMTG and IFG, with cathodal stimulation inducing inhibitory effects and anodal stimulation facilitating neural activity. In Figure 3, we examined the relationship between the tDCS effects on pMTG and IFG and the three information matrices (entropy and MI). Our results revealed significant correlations between MI and the cathodal-tDCS effects in both regions. We acknowledge that the original phrasing may have been unclear, and in the revised manuscript, we have provided a more explicit explanation to enhance clarity in Lines 443-445: ‘Our results, for the first time, revealed that the inhibition effect of cathodal-tDCS on the pMTG and IFG correlated with the degree of gesture-speech multisensory convergence, as indexed by MI’.
  
  (8) Lines 367-368 I don't understand why gesture is top down and speech is bottom up. Is that because gesture precedes speech (gesture is interpretable at the point of speech onset)?
  
  Yes, since we employed a semantic priming paradigm by aligning speech onset with the gesture comprehension point, we interpret the gesture-speech integration process as an interaction between the top-down prediction from gestures and the bottom-up processing of speech. In the revised manuscript, we have provided a clearer and more coherent description that aligns with the results. Lines 445-449: ‘Moreover, the gradual neural engagement was found to be time-sensitive and staged, as evidenced by the selectively interrupted time windows (Experiment 2) and the distinct correlated ERP components (Experiment 3), which were modulated by different information contributors, including unisensory entropy or multisensory MI’
  
  (9) Line 380 - 381. Can you spell out "TW" and "IP"?
  
  (16) Line 448, NIBS, Please spell out "NIBS".
  
  "TW" has been spelled out in Line 459 (‘time windows (TW)’) and "IP" in Line 460 (‘identification point (IP)’). The term "NIBS" was replaced with "HD-tDCS and TMS" to specify more clearly the techniques employed: ‘Consistent with this, the present study provides robust evidence, through the application of HD-tDCS and TMS, that the integration hubs for gesture and speech, the pMTG and IFG, operate in an incremental manner.’ (Lines 454-457).
  
  (10) Line 419, The higher certainty of gesture => The higher the certainty of gesture is
  
  (13) Line 428, "a larger MI" => "a larger MI is"
  
  (12) Line 427-428, "the larger overlapped neural populations" => "the larger, the overlapped neural populations"
  
  Changes have been made in Line 522: ‘The higher the certainty of gesture is’, Line 531: ‘a larger MI is’, and Line 530: ‘the larger, overlapped neural populations’.
  
  (11) Line 423 "Greater TMS effect over the IFG" Can you describe the TMS effect?
  
  The TMS effect is now described as ‘Greater TMS inhibitory effect’ (Line 526).
  
  (14) Line 423 "reweighting effect" What is this? Please describe (and say which experiment it is about).
  
  A clearer description has been provided in Lines 535-538: ‘As speech entropy increases, indicating greater uncertainty in the information provided by speech, more cognitive effort is directed towards selecting the targeted semantic representation. This leads to enhanced involvement of the IFG and a corresponding reduction in LPC amplitude’.
  
  (15) Line 437 "the graded functionality of every disturbed period is not guaranteed" (I don't understand this sentence).
  
  A clearer description has been provided in Lines 552-557: ‘Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions[55,56], whether the absence of influence in certain TWs can be attributed to compensation by other connected brain areas, such as angular gyrus[57] or anterior temporal lobe[58], warrants further investigation. Therefore, caution is needed when interpreting the causal relationship between inhibition effects of brain stimulation and information-theoretic metrics (entropy and MI).’
  
  References:
  
  Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006
  
  Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013
  
  (18) Figure 4. "TW1", "TW2", etc. are not informative. Either replace them with the actual manuscript or add manuscript information (either in the graph itself or in the figure title).
  
  Information has been added to the figure title: ‘Figure 4. TMS impacts on semantic congruency effect across various time windows (TW).’ (Line 804). A detailed description of each time window has also been included in Lines 805-807: ‘(A) Five time windows (TWs) showing selective disruption of gesture-speech integration were chosen: TW1 (-120 to -80 ms relative to speech identification point), TW2 (-80 to -40 ms), TW3 (-40 to 0 ms), TW6 (80 to 120 ms), and TW7 (120 to 160 ms).’
  
  (19) Table 2C.
  
  The last column is titled "p(xi, yi)". I don't understand why the authors use this label for this column.
  
  In the formula, at the very end, there is "p(xi|yi). I wonder why it is p(xi|yi), as opposed to p(yi|xi).
  
  Mutual Information (MI) was calculated by subtracting the entropy of the combined gesture-speech dataset (Entropy(gesture + speech)) from the sum of the individual entropies of gesture and speech (Entropy(gesture) + Entropy(speech)). Thus, the label p(xi,yi) was intended to describe the entropy of the combined dataset. We acknowledge the potential ambiguity in the original description, and in the revised manuscript, we have changed the label p(xi,yi) to ‘p(xi+yi)’ (Line 848) in Table 2C, and the relevant equation of MI ‘’. We have also provided a clear description of the MI calculation in Lines 143-146: ‘MI was used to measure the overlap between gesture and speech information, calculated by subtracting the entropy of the combined gesture-speech dataset (Entropy(gesture + speech)) from the sum of their individual entropies (Entropy(gesture) + Entropy(speech)) (see Appendix Table 2C)’.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) The authors should try and produce data showing that the confound of difficulty due to the number of lexical or semantic representations is not underlying high-entropy items if they wish to improve the credibility of their claim that the disruption of the congruency effect is due to speech-gesture integration. Additionally, they should provide more evidence either in the form of experiments or references to better justify why mutual information is an index for integration in the first place.
  
  Response 1: An additional analysis has been conducted to assess whether the number of lexical or semantic representations affects the neural outcomes; please see the details in the Responses to Reviewer 3 (public review), response 1.
  
  Mutual information (MI), a concept rooted in information theory, quantifies the reduction in uncertainty about one signal when the other is known, thereby capturing the statistical dependence between them. MI is calculated as the difference between the individual entropies of each signal and their joint entropy, which reflects the total uncertainty when both signals are considered together. This metric aligns with the core principle of multisensory integration: different modalities reduce uncertainty about each other by providing complementary, predictive information. Higher MI values signify that the integration of sensory signals results in a more coherent and unified representation, while lower MI values indicate less integration or greater divergence between the modalities. As such, MI serves as a robust and natural index for assessing the degree of multisensory integration.
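
For reference, the textbook identity that this paragraph paraphrases can be written out as below. The symbols X and Y (standing for the two signals) and the base-2 logarithm are notational assumptions introduced here, not taken from the manuscript.

```latex
% Shannon entropy of a discrete signal X
H(X) = -\sum_{x} p(x)\,\log_2 p(x)

% Joint entropy of the two signals considered together
H(X,Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x,y)

% Mutual information as the sum of individual entropies minus the joint entropy,
% equivalently the reduction in uncertainty about X once Y is known
I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y)
```

I(X;Y) is zero when the two signals are statistically independent and grows as knowing one signal removes more of the uncertainty about the other.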
  
  To date, the use of MI as an index of integration has been limited, with one notable study by Tremblay et al. (2016), cited in the manuscript, using pointwise MI to quantify the extent to which two syllables mutually constrain each other. While MI has been extensively applied in natural language processing to measure the co-occurrence strength between words (e.g., Lin et al., 2012), its application as an index of multisensory convergence, particularly in the context of gesture-speech integration as employed in this study, is novel. In the revised manuscript, we have clarified the relationship between MI and multisensory convergence: ‘MI assesses shared information between modalities[25], indicating multisensory convergence and acting as an index of gesture-speech integration’ (Lines 73-74).
  
  Also, in our study, we calculated MI as per its original definition, by subtracting the entropy of the combined gesture-speech dataset from the sum of the individual entropies of gesture and speech. The detailed calculation method is provided in Lines 136-152: ‘To quantify information content, comprehensive responses for each item were converted into Shannon's entropy (H) as a measure of information richness (Figure 1A bottom). With no significant gender differences observed in both gesture (t(20) = 0.21, p = 0.84) and speech (t(20) = 0.52, p = 0.61), responses were aggregated across genders, resulting in 60 answers per item (Appendix Table 2). Here, p(xi) and p(yi) represent the distribution of 60 answers for a given gesture (Appendix Table 2B) and speech (Appendix Table 2A), respectively. High entropy indicates diverse answers, reflecting broad representation, while low entropy suggests focused lexical recognition for a specific item (Figure 2B). MI was used to measure the overlap between gesture and speech information, calculated by subtracting the entropy of the combined gesture-speech dataset (Entropy(gesture + speech)) from the sum of their individual entropies (Entropy(gesture) + Entropy(speech)) (see Appendix Table 2C). For specific gesture-speech combinations, equivalence between the combined entropy and the sum of individual entropies (gesture or speech) indicates absence of overlap in response sets. Conversely, significant overlap, denoted by a considerable number of shared responses between gesture and speech datasets, leads to a noticeable discrepancy between combined entropy and the sum of gesture and speech entropies. Elevated MI values thus signify substantial overlap, indicative of a robust mutual interaction between gesture and speech.’
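
The verbal description above can be turned into a small sketch. It rests on one possible reading of ‘p(xi+yi)’, namely that the combined term sums the gesture and speech answer proportions for each response before applying the entropy formula; this reading is an assumption, chosen because it reproduces the stated property that non-overlapping response sets give a combined entropy equal to the sum of the individual entropies. It is not the joint-distribution form of MI, the function names and answers are hypothetical, and base-2 logarithms are assumed.

```python
from collections import Counter
from math import log2


def proportions(answers):
    """Relative frequency of each distinct answer."""
    n = len(answers)
    return {answer: count / n for answer, count in Counter(answers).items()}


def entropy(weights):
    """Entropy formula applied to a mapping of answer -> proportion (in bits)."""
    return sum(-w * log2(w) for w in weights.values() if w > 0)


def overlap_mi(gesture_answers, speech_answers):
    """Entropy(gesture) + Entropy(speech) - Entropy(gesture + speech).

    Assumption: the combined term uses p(x_i) + p(y_i) for each response i.
    """
    p = proportions(gesture_answers)
    q = proportions(speech_answers)
    combined = {key: p.get(key, 0.0) + q.get(key, 0.0) for key in set(p) | set(q)}
    return entropy(p) + entropy(q) - entropy(combined)


# Hypothetical response sets of 60 answers each.
gesture = ["cut"] * 30 + ["saw"] * 30
print(overlap_mi(gesture, ["cook"] * 60))  # no shared answers
print(overlap_mi(gesture, gesture))        # fully shared answers
```

Under this reading, response sets with no shared answers yield a value of zero, and the value rises as more answers are shared between the gesture and speech datasets.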
  
  Additional examples are outlined in Appendix Table 2 (Lines 841-848).
  
  This novel application of MI as a multisensory convergence index offers new insights into how different sensory modalities interact and integrate to shape semantic processing.
  
  Reference:
  
  Tremblay, P., Deschamps, I., Baroni, M., and Hasson, U. (2016). Neural sensitivity to syllable frequency and mutual information in speech perception and production. Neuroimage 136, 106-121. 10.1016/j.neuroimage.2016.05.018
  
  Lin, W., Wu, Y., & Yu, L. (2012). Online Computation of Mutual Information and Word Context Entropy. International Journal of Future Computer and Communication, 167-169.
  
  (2) Finally, if the authors wish to address the graded hub hypothesis as posited by the controlled semantic cognition framework (e.g., Rice et al., 2015), they would have to stimulate a series of ROIs progressing gradually through the anatomy of their candidate regions showing the effects grow along this spline, more than simply correlate MI with RT differences.
  
  Response 2: We appreciate the reviewer’s thoughtful consideration. The incremental engagement of the integration hub of IFG and pMTG along with the informativeness of gesture and speech during multisensory integration is different from the concept of "graded hub," which refers to anatomical distribution. See Responses to reviewer 3 (public review) response 2 for details.
  
  (3) The authors report significant effects with p values as close to the threshold as p=0.049 for the pMTG correlation in Experiment 1, for example. How confident are the authors these results are reliable and not merely their 'statistical luck'? Especially in view of sample sizes that hover around 22-24 participants, which have been called into question in the field of non-invasive brain stimulation (e.g., Mitra et al., 2021)?
  
  Response 3: In Experiment 1, a total of 52 participants were assigned to two groups, each undergoing HD-tDCS stimulation over either the inferior frontal gyrus (IFG) or posterior middle temporal gyrus (pMTG), yielding 26 participants per group for correlation analysis. Power analysis, conducted using G*Power, indicated that a sample size of 26 participants per group would provide sufficient power (0.8) to detect a large effect size (0.5) at an alpha level of 0.05, justifying the chosen sample size. To control for potential statistical artifacts, we compared the results to those from the unaffected control condition.
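
The power figure quoted above can be checked approximately with the Fisher z transformation. This is a rough sketch, not a reproduction of the G*Power computation: it assumes that the effect size of 0.5 refers to a population correlation, and it reports both one- and two-tailed power because the response does not say which test was specified. The function name `correlation_power` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05, two_tailed=True):
    """Approximate power to detect a correlation r with n observations (Fisher z)."""
    z = NormalDist()
    # Under the alternative, atanh(sample r) is roughly normal with
    # mean atanh(r) and standard error 1 / sqrt(n - 3).
    shift = atanh(r) * sqrt(n - 3)
    if two_tailed:
        critical = z.inv_cdf(1 - alpha / 2)
        return z.cdf(shift - critical) + z.cdf(-shift - critical)
    critical = z.inv_cdf(1 - alpha)
    return z.cdf(shift - critical)


print(correlation_power(0.5, 26, two_tailed=False))
print(correlation_power(0.5, 26, two_tailed=True))
```

Under this approximation, 26 participants give power of roughly 0.84 for a one-tailed test and roughly 0.75 for a two-tailed test at r = 0.5; exact routines can differ slightly from these values.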
  
  In Experiment 1, participants performed a gender categorization task, in which they responded as accurately and quickly as possible to the gender of the voice they heard, while gender congruency (e.g., a male gesture paired with a male voice or a female gesture with a male voice) was manipulated. This manipulation served as a direct control, enabling the investigation of automatic and implicit semantic interactions between gesture and speech. This information was provided in the manuscript in Lines 167-172: ‘An irrelevant factor of gender congruency (e.g., a man making a gesture combined with a female voice) was created[22,23,35]. This involved aligning the gender of the voice with the corresponding gender of the gesture in either a congruent (e.g., male voice paired with a male gesture) or incongruent (e.g., male voice paired with a female gesture) manner. This approach served as a direct control mechanism, facilitating the investigation of the automatic and implicit semantic interplay between gesture and speech[35]’. Correlation analyses were conducted to examine the HD-tDCS effects on gender congruency, comparing reaction times for gender-incongruent versus congruent trials. No significant correlations were found between the HD-tDCS effects on gender congruency and MI in either the IFG (Cathodal-tDCS effect with MI: r = 0.102, p = 0.677; Anodal-tDCS effect with MI: r = 0.178, p = 0.466) or the pMTG (Cathodal-tDCS effect with MI: r = -0.201, p = 0.410; Anodal-tDCS effect with MI: r = -0.232, p = 0.338).
  
  Moreover, correlations between the HD-tDCS effect on semantic congruency and gesture entropy, speech entropy, and mutual information (MI) were examined. P-values of 0.290, 0.725, and 0.049 were observed, respectively.
  
  The absence of an HD-tDCS effect on gender congruency, coupled with the lack of significance when correlated with the other information matrices, highlights the robustness of the significant finding at p = 0.049.
  
  (4) The distributions of entropy for gestures and speech are very unequal. Whilst entropy for gestures has high variability (.12-4.3), that of speech is very low (ceiling effect?) with low variance. Can the authors comment on whether they think this might have affected their analyses or results in any way? For example, do they think this could be a problem when calculating MI, which integrates both measures? L130-131.
  
  Response 4: We sincerely thank the reviewer for raising this insightful question. The core premise of the current study is that brain activity is modulated by the degree of information provided. Accordingly, the 20 entropy values for gesture and speech represent a subset of the overall entropy distribution, with the degree of entropy correlating with a distributed pattern of neural activity, regardless of the scale of variation. This hypothesis aligns with previous studies suggesting that neuronal activity is linked to the probability density of sensory information, with higher levels of uncertainty resulting in the engagement of a broader population of neurons, thereby reflecting the brain’s adaptive capacity to handle diverse possible interpretations (Fischer & Pena, 2011; Ganguli & Simoncelli, 2014).
  
  Importantly, we conducted another EEG experiment with 30 subjects. Given the inherent differences between gesture and speech, it is important to note that speech, being more structurally distinct, tends to exhibit lower variability than gesture. To prevent an imbalance in the distribution of gesture and speech, we manipulated the information content of each modality. Specifically, we created three conditions for both gesture and speech (i.e., 0.75, 1, and 1.25 times the identification threshold), thereby ensuring comparable variance between the two modalities: gesture (mean entropy = 2.91 ± 1.01) and speech (mean entropy = 1.82 ± 0.71) (Author response table 6).
  
  Full-factorial RSA analysis revealed an early P1 effect (0-100 ms) for gesture and a late LPC effect (734-780 ms) for speech (Author response image 2b). Crucially, the identified clusters showed significant correlations with both gesture (Author response image 2c1) and speech entropy (Author response image 2c3), respectively. These findings replicate the results of the present study, demonstrating that, irrespective of the variance in gesture and speech entropy, both modalities elicited ERP amplitude responses in a progressive manner that aligned with their respective information distributions.
  
  Regarding the influence on MI values, since MI was calculated based on the overlapping responses between gesture and speech, a reduction in uncertainty during speech comprehension would naturally result in a smaller contribution to the MI value. However, as hypothesized above, the MI values were also assumed to represent a subset of the overall distribution, where the contributions of both gesture and speech are expected to follow a normal distribution. This hypothesis was further supported by our replication experiment. When the contributions of gesture and speech were balanced, a correlation between MI values and N400 amplitude was observed (Author response image 2c2), consistent with the results reported in the present manuscript. These findings not only support the idea that the correlation between MI and ERP components is unaffected by the subset of MI values but also confirm the replicability of our results.
  
  Author response table 6.
  
  Quantitative entropy for each gesture stimulus (BD: before discrimination point; DP: discrimination point; AD: after discrimination point) and speech stimulus (BI: before identification point; IP: identification point; AI: after identification point).
  
  Author response image 2.
  
  Results of group-level analysis and full-factorial RSA. a: The full-factorial representational similarity analysis (RSA) framework is illustrated schematically. Within the general linear model (GLM), the light green matrix denotes the representational dissimilarity matrix (RDM) for gesture semantic states, while light blue matrix represents speech semantic states, and the light red matrix illustrates the semantic congruency effect. The symbol ‘e’ indicates the random error term. All matrices, including the neural dissimilarity matrix, are structured as 18 * 18 matrices, corresponding to 18 conditions (comprising 3 gesture semantic states, 3 speech semantic states, and 2 congruency conditions). b: Coding strength for gesture states, speech states and congruency effect. Shaded clusters represent regions where each factor exhibited significant effects. Clusters with lower opacity correspond to areas where the grand-mean ERP amplitudes across conditions showed the highest correlation with unimodal entropy or MI. c1-c6: Topographical correlation maps illustrate the four significant RSA clusters (top), accompanied by the highest correlations between ERP amplitudes within the significant RSA clusters and the information matrices (bottom). Black dots represent electrodes exhibiting significant correlations, while black stars highlight the electrode with the highest correlation coefficient.
  
  (5) L383: Why are the authors calling TW2 pre-lexical and TW6 post-lexical? I believe they must provide evidence or references justifying calling these periods pre- and post-lexical. This seems critical given the argument they're trying to make in this paragraph.
  
  Response 5: The time windows (TWs) selected for the current study were based on our previous work (Zhao et al., 2021, J. Neurosci). In that study, we employed a double-pulse TMS protocol, delivering stimulation across eight 40-ms time windows: three windows preceding the speech identification point (TWs 1-3) and five windows following it (TWs 4-8). The pre-lexical time windows (TWs 1-3) occur before speech identification, while the post-lexical time windows (TWs 4-8) occur after this point. In the revised manuscript, we have made this clear in Lines 462-466:
  
  “In TW2 of gesture-speech integration, which precedes the speech identification point[23] and represents a pre-lexical stage, the suppression effect observed in the pMTG was correlated with speech entropy. Conversely, during TW6, which follows the speech identification point[23] and represents a post-lexical stage, the IFG interruption effect was influenced by both gesture entropy, speech entropy, and their MI”
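
The mapping from window number to timing and lexical stage can be sketched as follows. It assumes the eight 40-ms windows are contiguous and start at -120 ms relative to the speech identification point, which matches the boundaries quoted for TW1-TW3, TW6, and TW7; the boundaries for TW4, TW5, and TW8 are inferred from that assumption, and the function names are hypothetical.

```python
def time_window(index, width_ms=40, first_start_ms=-120):
    """Start and end (ms, relative to the speech identification point) of TW `index`."""
    if not 1 <= index <= 8:
        raise ValueError("time windows are numbered 1 to 8")
    start = first_start_ms + (index - 1) * width_ms
    return start, start + width_ms


def lexical_stage(index):
    """Windows ending at or before the identification point are pre-lexical."""
    _, end = time_window(index)
    return "pre-lexical" if end <= 0 else "post-lexical"


for tw in range(1, 9):
    print(f"TW{tw}", time_window(tw), lexical_stage(tw))
```

On this layout TW1-TW3 fall before the identification point and TW4-TW8 after it, in line with the pre- and post-lexical labels used in the response.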
  
  Reference:
  
  Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.
  
  (6) Below, I recommend the authors improve their description of the criteria employed to select ROIs. This is important for several reasons. For example, the lack of a control ROI presumably not implicated in integration makes the interpretation of the specificity of the results difficult. Additionally, other regions have been proposed more consistently by recent evidence as multimodal integrators, like for example, the angular gyrus (Humphreys, 2021), or the anterior temporal lobe. The inclusion of IFG as a key region for integration and the oversight of angular gyrus seems to me unjustified in the light of recent evidence.
  
  Response 6: We appreciate the reviewer’s thoughtful consideration. The selection of IFG and pMTG as ROIs was based on a meta-analysis of multiple fMRI studies on gesture-speech integration, in which these two locations were consistently identified as activated. See Table 2 for details of the studies and coordinates of brain locations reported.
  
  Author response table 7.
  
  Meta-analysis of previous studies on gesture-speech integration.
  
  Based on the meta-analysis of previous studies, we selected the IFG and pMTG as ROIs for gesture-speech integration. The rationale for selecting these brain regions is outlined in the introduction in Lines 65-68: ‘Empirical studies have investigated the semantic integration between gesture and speech by manipulating their semantic relationship[15-18] and revealed a mutual interaction between them[19-21] as reflected by the N400 latency and amplitude[14] as well as common neural underpinnings in the left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG)[15,22,23]’.
  
  This is further described in Lines 79-80: ‘Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG’, and in Lines 87-90: ‘Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to assess whether the activity of these regions was associated with relevant informational matrices’.
  
  In the Methods section, we clarified the selection of coordinates in Lines 193-199: ‘Building on a meta-analysis of prior fMRI studies examining gesture-speech integration[22], we targeted Montreal Neurological Institute (MNI) coordinates for the left IFG at (-62, 16, 22) and the pMTG at (-50, -56, 10). In the stimulation protocol for HD-tDCS, the IFG was targeted using electrode F7 as the optimal cortical projection site[36], with four return electrodes placed at AF7, FC5, F9, and FT9. For the pMTG, TP7 was selected as the cortical projection site[36], with return electrodes positioned at C5, P5, T9, and P9.’
  
  The selection of IFG or pMTG as integration hubs for gesture and speech has also been validated in our previous studies. Specifically, Zhao et al. (2018, J. Neurosci) applied TMS to both areas. Results demonstrated that disrupting neural activity in the IFG or pMTG via TMS selectively impaired the semantic congruency effect (reaction time costs due to semantic incongruence), while leaving the gender congruency effect unaffected. These findings identified the IFG and pMTG as crucial hubs for gesture-speech integration, guiding the selection of brain regions for our subsequent studies.
  
  In addition, Zhao et al. (2021, J. Neurosci) employed a double-pulse TMS protocol across eight 40-ms time windows to explore the temporal dynamics of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, further supporting the dynamic and temporally staged involvement of these regions in gesture-speech integration.
  
  While we have a solid rationale for selecting the IFG and pMTG as key regions, we acknowledge the reviewer's point that the involvement of additional functionally and anatomically connected brain areas cannot be excluded. We have included this as a limitation in the discussion in Lines 552-557: ‘Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions[55,56], whether the absence of influence in certain TWs can be attributed to compensation by other connected brain areas, such as angular gyrus[57] or anterior temporal lobe[58], warrants further investigation. Therefore, caution is needed when interpreting the causal relationship between inhibition effects of brain stimulation and information-theoretic metrics (entropy and MI).’
  
  References:
  
  Willems, R.M., Ozyurek, A., and Hagoort, P. (2009). Differential roles for left inferior frontal and superior temporal cortex in multimodal integration of action and language. Neuroimage 47, 1992-2004. 10.1016/j.neuroimage.2009.05.066.
  
  Drijvers, L., Jensen, O., and Spaak, E. (2021). Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information. Human Brain Mapping 42, 1138-1152. 10.1002/hbm.25282.
  
  Drijvers, L., and Ozyurek, A. (2018). Native language status of the listener modulates the neural integration of speech and iconic gestures in clear and adverse listening conditions. Brain and Language 177, 7-17. 10.1016/j.bandl.2018.01.003.
  
  Drijvers, L., van der Plas, M., Ozyurek, A., and Jensen, O. (2019). Native and non-native listeners show similar yet distinct oscillatory dynamics when using gestures to access speech in noise. Neuroimage 194, 55-67. 10.1016/j.neuroimage.2019.03.032.
  
  Holle, H., and Gunter, T.C. (2007). The role of iconic gestures in speech disambiguation: ERP evidence. J Cognitive Neurosci 19, 1175-1192. 10.1162/jocn.2007.19.7.1175.
  
  Kita, S., and Ozyurek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 48, 16-32. 10.1016/S0749-596x(02)00505-3.
  
  Bernardis, P., and Gentilucci, M. (2006). Speech and gesture share the same communication system. Neuropsychologia 44, 178-190. 10.1016/j.neuropsychologia.2005.05.007.
  
  Zhao, W.Y., Riggs, K., Schindler, I., and Holle, H. (2018). Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integration. Journal of Neuroscience 38, 1891-1900. 10.1523/Jneurosci.1748-17.2017.
  
  Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.
  
  Hartwigsen, G., Bzdok, D., Klein, M., Wawrzyniak, M., Stockert, A., Wrede, K., Classen, J., and Saur, D. (2017). Rapid short-term reorganization in the language network. Elife 6. 10.7554/eLife.25964.
  
  Jackson, R.L., Hoffman, P., Pobric, G., and Ralph, M.A.L. (2016). The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. Journal of Neuroscience 36, 1490-1501. 10.1523/JNEUROSCI.2999-15.2016.
  
  Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006
  
  Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013
  
  (7) Some writing is obscure or unclear, in part due to superfluous words like 'intricate neural processes' on L74. Or the sentence in L47 - 48 about 'quantitatively functional mental states defined by a specific parser unified by statistical regularities' which, even read in context, fails to provide clarity about what a quantitatively functional mental state is, or how it is defined by specific parsers (or what these are), and what is the link to statistical regularities. In some cases, this lack of clarity leads to difficulties assessing the appropriateness of the methods, or the exact nature of the claims. For example, do they mean degree of comprehension instead of comprehensive value? I provide some more examples below:
  
  Response 7: We appreciate the reviewer’s thoughtful consideration. The revised manuscript now includes a clear description and a detailed explanation of the association with the statistical logic, addressing the concerns raised in Lines 47-55: ‘Contemporary theories frame the semantic processing as a dynamic sequence of neural states[3], shaped by systems that are finely tuned to the statistical regularities inherent in sensory inputs[4]. These regularities enable the brain to evaluate, weight, and integrate multisensory information, optimizing the reliability of individual sensory signals [5]. However, sensory inputs available to the brain are often incomplete and uncertain, necessitating adaptive neural adjustments to resolve these ambiguities[6]. In this context, neuronal activity is thought to be linked to the probability density of sensory information, with higher levels of uncertainty resulting in the engagement of a broader population of neurons, thereby reflecting the brain’s adaptive capacity to handle diverse possible interpretations[7,8].’
  
  References:
  
  Brennan, J.R., Stabler, E.P., Van Wagenen, S.E., Luh, W.M., and Hale, J.T. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157, 81-94. 10.1016/j.bandl.2016.04.008.
  
  Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.
  
  Noppeney, U. (2021). Perceptual Inference, Learning, and Attention in a Multisensory World. Annual Review of Neuroscience, Vol 44, 2021 44, 449-473. 10.1146/annurev-neuro-100120-085519.
  
  Ma, W.J., and Jazayeri, M. (2014). Neural coding of uncertainty and probability. Annu Rev Neurosci 37, 205-220. 10.1146/annurev-neuro-071013-014017.
  
  Fischer, B.J., and Pena, J.L. (2011). Owl's behavior and neural representation predicted by Bayesian inference. Nat Neurosci 14, 1061-1066. 10.1038/nn.2872.
  
  Ganguli, D., and Simoncelli, E.P. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput 26, 2103-2134. 10.1162/NECO_a_00638.
  
  Comment 7.1: a) I am not too sure what they mean by 'response consistently provided by participants for four to six consecutive instances' [L117-118]. They should be clearer with the description of these 'pre-test' study methods.
  
  Response 7.1: Thank you for this insightful question. An example of a participant's response to the gesture 'an' is provided below (Table 3). Initially, within 240 ms, the participant provided the answer "an," which could potentially be a guess. To ensure that the participant truly comprehends the gesture, we repeatedly present it until the participant’s response stabilizes, meaning the same answer is given consistently over several trials. While one might consider fixing the number of repetitions (e.g., six trials), this could lead to participants predicting the rule and providing the same answer out of habit. To mitigate this potential bias, we allow the number of repetitions to vary flexibly between four and six trials.
  
  We understand that the initial phrase might be ambiguous. In the revised manuscript, we have changed the phrase to: ‘For each gesture or speech, the action verb consistently provided by participants across four to six consecutive repetitions—with the number of repetitions varied to mitigate learning effects—was considered the comprehensive response for the gesture or speech.’ (Lines 130-133)
  
  Author response table 8.
  
  Example of participant's response to the gesture 'an'
  
  Comment 7.2: b) I do not understand the paragraph in L143 - 146. This is important to rephrase for clarification. What are 'stepped' neural changes? What is the purpose of 'aggregating' neural responses with identical entropy / MI values?
  
  Response 7.2: It is important to note that the 20 stimuli exhibit 20 increments of gesture entropy values, 11 increments of speech entropy values, and 19 increments of mutual information values (Appendix Table 3). This discrepancy arises from the calculation of entropy and mutual information, where the distributions were derived from the comprehensive set of responses contributed by all 30 participants. As a result, these values were impacted not only by the distinct nameabilities of the stimuli but also by the entirety of responses provided. Consequently, in the context of speech entropy, 9 items demonstrate a nameability of 1, signifying unanimous comprehension among all 30 participants, resulting in an entropy of 0. Moreover, stimuli 'ning' and 'jiao' share an identical distribution, leading to an entropy of 0.63. Regarding MI, a value of 0.66 is computed for the combinations of stimuli 'sao' (gesture entropy: 4.01, speech entropy: 1.12, Author response image 3) and 'tui' (gesture entropy: 1.62, speech entropy: 0, Author response image 4). This indicates that these two sets of stimuli manifest an equivalent degree of integration.
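As a rough illustration of why unanimous naming yields an entropy of 0, and why two stimuli with the same response distribution receive the same value, the following Python sketch computes Shannon entropy from a list of naming responses. The base-2 logarithm, the function name `response_entropy`, and the example responses are assumptions made for illustration; they are not taken from the manuscript.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in bits, assuming log base 2) of a list of naming responses."""
    counts = Counter(responses)
    total = len(responses)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

# Hypothetical data: all 30 participants give the same verb (nameability of 1).
unanimous = ["verb_a"] * 30
# Hypothetical data: responses split evenly between two verbs.
split = ["verb_a"] * 15 + ["verb_b"] * 15

print(response_entropy(unanimous))  # 0.0
print(response_entropy(split))      # 1.0
```

Because the function depends only on the response proportions, any two stimuli with identical distributions receive identical entropy values, whatever the verbs are.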
  
  Author response image 3.
  
  Example of gesture answers (gesture sao), speech answers (speech sao), and mutual information (MI) for the ‘sao’ item
  
  Author response image 4.
  
  Example of gesture answers (gesture tui), speech answers (speech tui), and mutual information (MI) for the ‘tui’ item
  
  To precisely assess whether lower entropy/MI corresponds to a smaller or larger neural response, neural responses (ERP amplitude or TMS inhibition effect) with identical entropy or MI values were averaged before undergoing correlational analysis. We understand that the original phrasing might be ambiguous. The description has been clarified in the revised manuscript in Lines 157-160: ‘To determine whether entropy or MI values corresponds to distinct neural changes, the current study first aggregated neural responses (including inhibition effects of tDCS and TMS or ERP amplitudes) that shared identical entropy or MI values, prior to conducting correlational analyses.’
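The aggregation step described above can be sketched as follows: responses belonging to stimuli with the same entropy or MI value are averaged, and the correlation is then computed over the unique values. This is a minimal sketch, not the authors' analysis code; the helper names and all numbers are hypothetical, and grouping by exact equality assumes the entropy/MI values were rounded consistently beforehand.

```python
import math
from collections import defaultdict

def average_by_value(info_values, neural_responses):
    """Average neural responses over stimuli that share an identical entropy/MI value."""
    groups = defaultdict(list)
    for value, response in zip(info_values, neural_responses):
        groups[value].append(response)
    values = sorted(groups)
    means = [sum(groups[v]) / len(groups[v]) for v in values]
    return values, means

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: five stimuli, two of which share an entropy of 0.0.
entropy = [0.0, 0.0, 0.5, 1.0, 1.5]
effect = [10.0, 14.0, 20.0, 25.0, 33.0]

values, means = average_by_value(entropy, effect)
print(values)  # [0.0, 0.5, 1.0, 1.5]
print(means)   # [12.0, 20.0, 25.0, 33.0]
print(round(pearson_r(values, means), 3))
```

Averaging first means each distinct entropy or MI value contributes exactly one point to the correlation, so values shared by many stimuli (such as a speech entropy of 0) do not dominate the fit.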
  
  Comment 7.3: c) The paragraph in L160-171 is confusing. Is it an attempt to give an overview of all three experiments? If so, consider moving to the end or summarising what each experiment is at the beginning of the paragraph giving it a name (i.e., TMS). Without that, it is unclear what each experiment is counterbalancing or what 'stimulation site' refers to, for example, leading to a significant lack of clarity.
  
  Response 7.3: We apologize for the ambiguity. In the revised manuscript, we have moved the relevant description to the beginning of each experiment.
  
  ‘Experiment 1: HD-tDCS protocol and data analysis
  
  Participants were divided into two groups, with each group undergoing HD-tDCS stimulation at different target sites (IFG or pMTG). Each participant completed three experimental sessions, spaced one week apart, during which 480 gesture-speech pairs were presented across various conditions. In each session, participants received one of three types of HD-tDCS stimulation: Anodal, Cathodal, or Sham. The order of stimulation site and type was counterbalanced using a Latin square design to control for potential order effects’ (Lines 183-189)
  
  ‘Experiment 2: TMS protocol and data analysis
  
  Experiment 2 involved 800 gesture-speech pairs, presented across 15 blocks over three days, with one week between sessions. Stimulation was administered at three different sites (IFG, pMTG, or Vertex). Within the time windows (TWs) spanning the gesture-speech integration period, five TWs that exhibited selective disruption of integration were selected: TW1 (-120 to -80 ms relative to the speech identification point), TW2 (-80 to -40 ms), TW3 (-40 to 0 ms), TW6 (80 to 120 ms), and TW7 (120 to 160 ms)[23] (Figure 1C). The order of stimulation site and TW was counterbalanced using a Latin square design.’ (Lines 223-230)
  
  ‘Experiment 3: Electroencephalogram (EEG) recording and data analysis
  
  Experiment 3, comprising a total of 1760 gesture-speech pairs, was completed in a single-day session.’ (Lines 249-250)
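The Latin square counterbalancing mentioned in the passages quoted above can be illustrated with a short Python sketch. The response does not say which Latin square was used, so the cyclic construction and the function name `cyclic_latin_square` are assumptions; the sketch only shows the defining property that every condition appears exactly once in every serial position.

```python
def cyclic_latin_square(conditions):
    """Return a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

# The three HD-tDCS stimulation types named for Experiment 1.
stimulation_types = ["Anodal", "Cathodal", "Sham"]
orders = cyclic_latin_square(stimulation_types)

# Hypothetical assignment: successive participants cycle through the rows.
for row_number, order in enumerate(orders, start=1):
    print(row_number, order)
# 1 ['Anodal', 'Cathodal', 'Sham']
# 2 ['Cathodal', 'Sham', 'Anodal']
# 3 ['Sham', 'Anodal', 'Cathodal']
```

With three sessions per participant, each row gives one session order, so across every block of three participants each stimulation type occurs once in the first, second, and third session.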
  
  Comment 7.4: d) L402-406: This sentence is not clear. What do the authors mean by 'the state of [the neural landscape] constructs gradually as measured by entropy and MI'? How does this construct a neural landscape? The authors must rephrase this paragraph using clearer language since in its current state it is very difficult to assess whether it is supported by the evidence they present.
  
  Response 7.4: We apologize for the ambiguity. In the revised manuscript, we have provided a clearer description in Lines 483-492: ‘The varying contributions of unisensory gesture-speech information and the convergence of multisensory inputs, as reflected in the correlation between distinct ERP components and TMS time windows (TMS TWs), are consistent with recent models suggesting that multisensory processing involves parallel detection of modality-specific information and hierarchical integration across multiple neural levels[4,48]. These processes are further characterized by coordination across multiple temporal scales[49]. Building on this, the present study offers additional evidence that the multi-level nature of gesture-speech processing is statistically structured, as measured by information matrix of unisensory entropy and multisensory convergence index of MI, the input of either source would activate a distributed representation, resulting in progressively functioning neural responses’
  
  References:
  
  Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.
  
  Meijer, G.T., Mertens, P.E.C., Pennartz, C.M.A., Olcese, U., and Lansink, C.S. (2019). The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network. Prog Neurobiol 174, 1-15. 10.1016/j.pneurobio.2019.01.004.
  
  Senkowski, D., and Engel, A.K. (2024). Multi-timescale neural dynamics for multisensory integration. Nat Rev Neurosci 25, 625-642. 10.1038/s41583-024-00845-7.
  
  (8) Some writing suffers from conceptual equivocation. For example, the link between 'multimodal representation' and gesture as a type of multimodal extralinguistic information is not straightforward. What 'multimodal representations' usually refer to in semantic cognition is not the co-occurrence of gesture and speech, but the different sources or modalities that inform the structure of a semantic representation or concept (not the fact we use another modality vision to perceive gestures that enrich the linguistic auditory communication of said concepts). See also my comment in the public review regarding the conceptual conflation of the graded hub hypothesis.
  
  Response 8: We aimed to clarify that the integration of gesture and speech, along with the unified representation it entails, is not merely a process whereby perceived gestures enhance speech comprehension. Rather, there exists a bidirectional influence between these two modalities, affecting both their external forms (Bernardis et al., 2006) and their semantic content (Kita et al., 2003; Kelly et al., 2010). Given that multisensory processing is recognized as an interplay of both top-down and bottom-up mechanisms, we hypothesize that this bidirectional semantic influence between gesture and speech operates similarly. Consequently, we recorded neural responses—specifically the inhibitory effects observed through TMS/tDCS or ERP components—beginning at the onset of speech, which marks the moment when both modalities are accessible.
  
  We prioritize gesture for two primary reasons. Firstly, from a naturalistic perspective, speech and gesture are temporally aligned; gestures typically precede their corresponding speech segments by less than one second (Morrel-Samuels and Krauss, 1992). This temporal alignment has prompted extensive research aimed at identifying the time windows during which integration occurs (Obermeier et al., 2011, 2015). Results indicate that local integration of gesture and speech occurs within a time frame extending from -200 ms to +120 ms relative to gesture-speech alignment, where -200 ms indicates that gestures occur 200 ms before speech onset, and +120 ms signifies gestures occurring after the identification point of speech.
  
  Secondly, in our previous study (Zhao, 2023), we investigated this phenomenon by manipulating gesture-speech alignment across two conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Notably, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures serve a semantic priming function for co-occurring speech.
  
  We recognize that our previous use of the term "co-occurring speech" may have led to ambiguity. Therefore, in the revised manuscript, we have replaced those sentences with a detailed description of the properties of each modality in Lines 60-62: ‘Even though gestures convey information in a global-synthetic way, while speech conveys information in a linear segmented way, there exists a bidirectional semantic influence between the two modalities[9,10]’
  
  Conceptual conflation of the graded hub hypothesis has been clarified in the Response to Reviewer 3 (public review) response 2.
  
  References:
  
  Bernardis, P., & Gentilucci, M. (2006). Speech and gesture share the same communication system. Neuropsychologia, 44(2), 178-190
  
  Kelly, S. D., Ozyurek, A., & Maris, E. (2010b). Two sides of the same coin: speech and gesture mutually interact to enhance comprehension. Psychological Science, 21(2), 260-267. doi:10.1177/0956797609357327
  
  Kita, S., & Ozyurek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16-32. doi:10.1016/s0749-596x(02)00505-3
  
  Obermeier, C., & Gunter, T. C. (2015). Multisensory Integration: The Case of a Time Window of Gesture-Speech Integration. Journal of Cognitive Neuroscience, 27(2), 292-307. doi:10.1162/jocn_a_00688
  
  Obermeier, C., Holle, H., & Gunter, T. C. (2011). What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help. Journal of Cognitive Neuroscience, 23(7), 1648-1663. doi:10.1162/jocn.2010.21498
  
  Morrel-Samuels, P., & Krauss, R. M. (1992). Word familiarity predicts temporal asynchrony of hand gestures and speech. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(3), 615-622. doi:10.1037/0278-7393.18.3.615
  
  Hostetter, A., and Mainela-Arnold, E. (2015). Gestures occur with spatial and Motoric knowledge: It's more than just coincidence. Perspectives on Language Learning and Education 22, 42-49. doi:10.1044/lle22.2.42.
  
  McNeill, D. (2005). Gesture and thought (University of Chicago Press). 10.7208/chicago/9780226514642.001.0001.
  
  Zhao, W. (2023). TMS reveals a two-stage priming circuit of gesture-speech integration. Front Psychol 14, 1156087. 10.3389/fpsyg.2023.1156087.
  
  (9) The last paragraph of the introduction lacks a conductive thread. The authors describe three experiments without guiding the reader through a connecting thread underlying the experiments. Feels more like three disconnected studies than a targeted multi-experiment approach to solve a problem. What is each experiment contributing to? What is the 'grand question' or thread unifying these?
  
  Response 9: The present study introduced three experiments to explore the neural activity linked to the amount of information processed during multisensory gesture-speech integration. In Experiment 1, we observed that the extent of inhibition in the pMTG and LIFG was closely linked to the overlapping gesture-speech responses, as quantified by mutual information. Building on the established roles of the pMTG and LIFG in our previous study (Zhao et al., 2021, JN), we then expanded our investigation to determine whether the dynamic neural engagement between the pMTG and LIFG during gesture-speech processing was also associated with the quality of the information. This hypothesis was further validated through high-temporal resolution EEG, where we examined ERP components related to varying information qualities. Notably, we observed a close time alignment between the ERP components and the time windows of the TMS effects, which were associated with the same informational matrices in gesture-speech processing.
  
  Linkage of the three experiments has been clarified in the introduction in Lines 75-102: ‘
  
  To investigate the neural mechanisms underlying gesture-speech integration, we conducted three experiments to assess how neural activity correlates with distributed multisensory integration, quantified using information-theoretic measures of MI. Additionally, we examined the contributions of unisensory signals in this process, quantified through unisensory entropy. Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation[26], thereby increasing or decreasing cortical excitability in the targeted brain area, respectively. This experiment aimed to determine whether the overall facilitation (Anodal-tDCS minus Sham-tDCS) and/or inhibitory (Cathodal-tDCS minus Sham-tDCS) of these integration hubs is modulated by the degree of gesture-speech integration, as measure by MI.
  
  Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to further assess whether the activity of these regions was associated with relevant informational matrices. Specifically, we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI).
  
  Experiment 3 complemented these investigations by focusing on the temporal dynamics of neural responses during semantic processing, leveraging high-temporal event-related potentials (ERPs). This experiment investigated how distinct information contributors modulated specific ERP components associated with semantic processing. These components included the early sensory effects as P1 and N1–P2[27,28], the N400 semantic conflict effect[14,28,29], and the late positive component (LPC) reconstruction effect[30,31]. By integrating these ERP findings with results from Experiments 1 and 2, Experiment 3 aimed to provide a more comprehensive understanding of how gesture-speech integration is modulated by neural dynamics’
  
  References:
  
  Bikson, M., Inoue, M., Akiyama, H., Deans, J.K., Fox, J.E., Miyakawa, H., and Jefferys, J.G.R. (2004). Effects of uniform extracellular DC electric fields on excitability in rat hippocampal slices. J Physiol-London 557, 175-190. 10.1113/jphysiol.2003.055772.
  
  Federmeier, K.D., Mai, H., and Kutas, M. (2005). Both sides get the point: hemispheric sensitivities to sentential constraint. Memory & Cognition 33, 871-886. 10.3758/bf03193082.
  
  Kelly, S.D., Kravitz, C., and Hopkins, M. (2004). Neural correlates of bimodal speech and gesture comprehension. Brain and Language 89, 253-260. 10.1016/s0093-934x(03)00335-3.
  
  Wu, Y.C., and Coulson, S. (2005). Meaningful gestures: Electrophysiological indices of iconic gesture comprehension. Psychophysiology 42, 654-667. 10.1111/j.1469-8986.2005.00356.x.
  
  Fritz, I., Kita, S., Littlemore, J., and Krott, A. (2021). Multimodal language processing: How preceding discourse constrains gesture interpretation and affects gesture integration when gestures do not synchronise with semantic affiliates. J Mem Lang 117, 104191. 10.1016/j.jml.2020.104191.
  
  Gunter, T.C., and Weinbrenner, J.E.D. (2017). When to take a gesture seriously: On how we use and prioritize communicative cues. J Cognitive Neurosci 29, 1355-1367. 10.1162/jocn_a_01125.
  
  Ozyurek, A., Willems, R.M., Kita, S., and Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. J Cognitive Neurosci 19, 605-616. 10.1162/jocn.2007.19.4.605.
  
  Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.
  
  (10) The authors should provide a clearer figure to appreciate their paradigm, illustrating clearly the stimulus presentation (gesture and speech).
  
  Response 10: To reduce ambiguity, unnecessary arrows were deleted from Figure 1.
  
  Comment 11.1: (11) Required methodological clarifications to better assess the strength of the evidence presented:
  
  a) Were the exclusion criteria only handedness and vision? Did the authors exclude based on neurological and psychiatric disorders? Psychoactive drugs? If not, do they think the lack of these exclusion criteria might have influenced their results?
  
  Response 11.1: Upon registration, each participant is required to complete a questionnaire alongside the consent form and handedness questionnaire. This procedure is designed to exclude individuals with potential neurological or psychiatric disorders, as well as other factors that may affect their mental state or reaction times. Consequently, none of the participants reported in the manuscript had any of the aforementioned neurological or psychiatric disorders. The questionnaire is attached below:
  
  Author response image 4.
  
  Comment 11.2: b) Are the subjects from the pre-tests (L112-113) and the replication study (L107) a separate sample or did they take part in Experiments 1-3?
  
  Response 11.2: The participants in each pre-test and experiment were independent, resulting in a total of 188 subjects. Since the stimuli utilized in this study were previously validated and reported (Zhao et al., 2021), the 90 subjects who participated in the three pre-tests are not included in the final count for the current study, leaving a total of 98 participants reported in the manuscript in Lines 103-104: ‘Ninety-eight young Chinese participants signed written informed consent forms and took part in the present study’.
  
  Comment 11.3: c) L176. The authors should explain how they selected ROIs. This is very important for the reasons outlined above.
  
  Response 11.3: Please see Response to Comment 6 for details.
  
  Comment 11.4: d) The rationale for Experiment 1 and its analysis approach should be explicitly described. Why perform Pearson correlations? What is the conceptual explanation of the semantic congruency effect and why should it be expected to correlate with the three information-theoretic metrics? What effects could the authors expect to find and what would they mean? There is a brief description in L187-195 but it is unclear.
  
  Response 11.4: We thank the reviewer for their rigorous consideration. The semantic congruency effect is widely used as an index of multisensory integration. Therefore, the effects of HD-tDCS on the IFG and pMTG, as measured by changes in the semantic congruency effect, serve as an indicator of altered neural responses to multisensory integration. In correlating these changes with behavioral indices of information degree, we aimed to assess whether the integration hubs (IFG and pMTG) function progressively during multisensory gesture-speech integration. The rationale for using Pearson correlations is based on the hypothesis that the 20 sets of stimuli used in this study represent a sample from a normally distributed population. Thus, even with changes in the sample (e.g., using another 20 values), the gradual relationship between neural responses and the degree of information would remain unchanged. This hypothesis is supported by the findings from another experiment (see details in Response to Comment 4).
  
  In the revised manuscript, we have provided a clear description of the rationale for Experiment 1 in Lines 206-219: ‘To examine the relationship between the degree of information and neural responses, we conducted Pearson correlation analyses using a sample of 20 sets. Neural responses were quantified based on the effects of HD-tDCS (active tDCS minus sham tDCS) on the semantic congruency effect, defined as the difference in reaction times between semantic incongruent and congruent conditions (Rt(incongruent) - Rt(congruent)). This effect served as an index of multisensory integration[35] within the left IFG and pMTG. The variation in information was assessed using three information-theoretic metrics. To account for potential confounds related to multiple candidate representations, we conducted partial correlation analyses between the tDCS effects and gesture entropy, speech entropy, and MI, controlling for the number of responses provided for each gesture and speech, as well as the total number of combined responses. Given that HD-tDCS induces overall disruption at the targeted brain regions, we hypothesized that the neural activity within the left IFG and pMTG would be progressively affected by varying levels of multisensory convergence, as indexed by MI.’
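The two difference scores defined in the quoted passage, the semantic congruency effect (Rt(incongruent) - Rt(congruent)) and the HD-tDCS effect (active minus sham), can be written out as a short Python sketch. The function names, the use of simple means, and the reaction times are hypothetical and serve only to make the arithmetic concrete; the partial correlation step is not shown.

```python
from statistics import mean

def congruency_effect(rt_incongruent, rt_congruent):
    """Semantic congruency effect: mean RT(incongruent) minus mean RT(congruent)."""
    return mean(rt_incongruent) - mean(rt_congruent)

def tdcs_effect(active, sham):
    """Congruency effect under active (anodal or cathodal) stimulation minus sham."""
    return congruency_effect(*active) - congruency_effect(*sham)

# Hypothetical reaction times (ms) for one stimulus set: (incongruent, congruent).
sham = ([560.0, 580.0], [520.0, 540.0])
cathodal = ([555.0, 565.0], [530.0, 550.0])

print(congruency_effect(*sham))     # 40.0
print(tdcs_effect(cathodal, sham))  # -20.0
```

In this made-up case the congruency effect shrinks from 40 ms under sham to 20 ms under cathodal stimulation, giving a tDCS effect of -20 ms; one such value per stimulus set would then enter the correlation with gesture entropy, speech entropy, or MI.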
  
  Additionally, in the introduction, we have rephrased the relevant rationale in Lines 75-86: ‘To investigate the neural mechanisms underlying gesture-speech integration, we conducted three experiments to assess how neural activity correlates with distributed multisensory integration, quantified using information-theoretic measures of MI. Additionally, we examined the contributions of unisensory signals in this process, quantified through unisensory entropy. Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation[26], thereby increasing or decreasing cortical excitability in the targeted brain area, respectively. This experiment aimed to determine whether the overall facilitation (Anodal-tDCS minus Sham-tDCS) and/or inhibitory (Cathodal-tDCS minus Sham-tDCS) of these integration hubs is modulated by the degree of gesture-speech integration, as measure by MI’
  
  Reference:
  
  Kelly, S.D., Creigh, P., and Bartolotti, J. (2010). Integrating speech and iconic gestures in a Stroop-like task: Evidence for automatic processing. Journal of Cognitive Neuroscience 22, 683-694. 10.1162/jocn.2009.21254.
  
  Comment 11.5: e) The authors do not mention in the methods if FDR correction was applied to the Pearson correlations in Experiment 1. There is a mention in the Results Figure, but it is unclear if it was applied consistently. Can the authors confirm, and explicitly state the way they carried out FDR correction for this family of tests in Experiment 1? This is especially important in the light of some of their results having a p-value of p=.049.
  
  Response 11.5: FDR correction was applied to Experiment 1, and all reported p-values were corrected using this method. In the revised manuscript, we have included a reference to FDR correction in Lines 221-222: ‘False discovery rate (FDR) correction was applied for multiple comparisons.’
  
  In Experiment 1, since two separate participant groups (each N = 26) were recruited for the HD-tDCS over either the IFG or pMTG, FDR correction was performed separately for each group. Therefore, for each brain region, six comparisons (three information matrices × two tDCS effects: anodal-sham or cathodal-sham) were submitted for FDR correction.
  
  In Experiment 2, six comparisons (three information matrices × two sites: IFG or pMTG) were submitted for FDR correction. In Experiment 3, FDR correction was applied to the seven regions of interest (ROIs) within each component, resulting in five comparisons.
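To make the family-wise structure concrete, the sketch below applies FDR correction to one family of six p-values. The response does not name the FDR procedure, so the Benjamini-Hochberg step-up method is an assumption here, and the function name `bh_adjust` and the p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed procedure), in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical family of six uncorrected p-values
# (three information measures x two tDCS effects for one stimulation site).
raw = [0.004, 0.012, 0.020, 0.030, 0.060, 0.200]
print([round(p, 4) for p in bh_adjust(raw)])
# [0.024, 0.036, 0.04, 0.045, 0.072, 0.2]
```

Running the correction separately per participant group, as described for Experiment 1, would mean calling the function once per family of six rather than once on all twelve tests.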
  
  The confidence of a p-value of 0.049 was clarified in Response to Comment 3.
  
  Comment 11.6: f) L200. What does the abbreviation 'TW' stand for in this paragraph? When was it introduced in the main text? The description is in the Figure, but it should be moved to the main text.
  
  Comment 11.7: g) How were the TWs chosen? Is it the criterion in L201-203? If so, it should be moved to the start of the paragraph. What does the word 'selected' refer to in that description? Selected for what? The explanation seems to be in the Figure, but it should be in the main text. It is still not a complete explanation. What were the criteria for assigning TWs to the IFG or pMTG?
  
  Response 11.6 & 11.7: Since the two comments are related, we provide a combined response. 'TW' refers to time window, the selection of which was based on our previous study (Zhao et al., 2021, J. Neurosci). In Zhao et al. (2021), we employed the same experimental protocol—using inhibitory double-pulse transcranial magnetic stimulation (TMS) over the IFG and pMTG in one of eight 40-ms time windows relative to the speech identification point (IP; the minimal length of lexical speech), with three time windows before the speech IP and five after. Based on this previous work, we believe that these time windows encompass the potential gesture-speech integration process. Results demonstrated a time-window-selective disruption of the semantic congruency effect (i.e., reaction time costs driven by semantic conflict), with no significant modulation of the gender congruency effect (i.e., reaction time costs due to gender conflict), when stimulating the left pMTG in TW1, TW2, and TW7, and when stimulating the left IFG in TW3 and TW6. Based on these findings, the present study selected the five time windows that showed a selective disruption effect during gesture-speech integration.
  
  Note that in the present study, we applied stimulation to both the IFG and pMTG across all five time windows, and further correlated the TMS disruption effects with the three information matrices.
  
  We recognize that the rationale for the choice of time windows was not sufficiently explained in the original manuscript. In the revised manuscript, we have added the relevant description in Lines 223-228: ‘Stimulation was administered at three different sites (IFG, pMTG, or Vertex). Within the time windows (TWs) spanning the gesture-speech integration period, five TWs that exhibited selective disruption of integration were selected: TW1 (-120 to -80 ms relative to the speech identification point), TW2 (-80 to -40 ms), TW3 (-40 to 0 ms), TW6 (80 to 120 ms), and TW7 (120 to 160 ms)[23] (Figure 1C). The order of stimulation site and TW was counterbalanced using a Latin square design.’
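  The window boundaries quoted above follow a regular 40-ms grid starting 120 ms before the speech identification point. A minimal sketch of that arithmetic is given below; the function and constant names are illustrative and do not come from the manuscript.

```python
WINDOW_MS = 40         # width of each time window, as stated in the response
FIRST_ONSET_MS = -120  # onset of TW1 relative to the speech identification point


def time_window(index):
    """Return (start_ms, end_ms) of TW<index> relative to the speech IP."""
    if not 1 <= index <= 8:
        raise ValueError("the protocol defines eight time windows (TW1-TW8)")
    start = FIRST_ONSET_MS + WINDOW_MS * (index - 1)
    return start, start + WINDOW_MS


# The five windows selected for the present study.
SELECTED = [1, 2, 3, 6, 7]
selected_windows = {f"TW{i}": time_window(i) for i in SELECTED}
```

  With this grid, TW1 to TW3 fall before the identification point and TW4 to TW8 after it, matching the "three before, five after" layout described in the response.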
  
  Comment 11.8: h) Again, the rationale for the Pearson correlations of semantic congruency with information-theoretic metrics should be explicitly outlined. What is this conceptually?
  
  Response 11.8: Given that the rationale behind Experiment 1 and Experiment 2 is similar—both investigating the correlation between interrupted neural effects and the degree of information—we believe that the introduction of the Pearson correlation between semantic congruency and information-theoretic metrics, as presented in Experiment 1 (see Response to Comment 11.4 for details), is sufficient for both experiments.
  
  Comment 11.9: i) What does 'gesture stoke' mean in the Figure referring to Experiment 3? Figure 1D is not clear. What are the arrows referring to?
  
  Response 11.9: According to McNeill (1992), gesture phases differ based on whether the gesture depicts imagery. Iconic and metaphoric gestures are imagistic and typically consist of three phases: a preparation phase, a stroke phase, and a retraction phase. Figure 4 provides an example of these three phases using the gesture ‘break’. In the preparation phase, the hand and arm move away from their resting position to a location in gesture space where the stroke begins. As illustrated in the first row of Figure 4, during the preparation phase of the ‘break’ gesture, the hands, initially in a fist and positioned downward, rise to a center-front position. In the stroke phase, the meaning of the gesture is conveyed. This phase occurs in the central gesture space and is synchronized with the linguistic segments it co-expresses. For example, in the stroke phase of the ‘break’ gesture (second row of Figure 4), the two fists move 90 degrees outward before returning to a face-down position. The retraction phase involves the return of the hand from the stroke position to the rest position. In the case of the ‘break’ gesture, this involves moving the fists from the center front back into the resting position (see third row of Figure 4).
  
  Therefore, in studies examining gesture-speech integration, gestures are typically analyzed starting from the stroke phase (Habets et al., 2011; Kelly et al., 2010), a convention also adopted in our previous studies (Zhao et al., 2018, 2021, 2023). We acknowledge that this should be explained explicitly, and in the revised manuscript, we have added the following clarification in Lines 162-166: ‘Given that gestures induce a semantic priming effect on concurrent speech[33], this study utilized a semantic priming paradigm in which speech onset was aligned with the DP of each gesture[23,33], the point at which the gesture transitions into a lexical form[34]. The gesture itself began at the stroke phase, a critical moment when the gesture conveys its primary semantic content[34].’
  
  Additionally, Figure 1 has been revised in the manuscript to eliminate ambiguous arrows (see Response 10 for details).
  
  Author response image 5.
  
  An illustration of the gesture phases of the 'break' gesture.
  
  References:
  
  Habets, B., Kita, S., Shao, Z. S., Ozyurek, A., & Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. Journal of Cognitive Neuroscience, 23(8), 1845-1854. doi:10.1162/jocn.2010.21462
  
  Kelly, S. D., Creigh, P., & Bartolotti, J. (2010). Integrating Speech and Iconic Gestures in a Stroop-like Task: Evidence for Automatic Processing. Journal of Cognitive Neuroscience, 22(4), 683-694. doi:10.1162/jocn.2009.21254
  
  Comment 11.10: j) L236-237: "Consequently, four ERP components were predetermined" is very confusing. Were these components predetermined? Or were they determined as a consequence of the comparison between the higher and lower halves for the IT metrics described above in the same paragraph? The description of the methods is not clear.
  
  Response 11.10: The components selected were based on a comparison between the higher and lower halves of the information metrics. By stating that these components were predetermined, we aimed to emphasize that the components used in our study are consistent with those identified in previous research on semantic processing. We acknowledge that the phrasing may have been unclear, and in the revised manuscript, we have provided a more explicit description in Lines 267-276: ‘To consolidate the data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work[40], and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons.
  
  For the traditional ROI analysis, grand-average ERPs at electrode Cz were compared between the higher (≥50%) and lower (<50%) halves for gesture entropy (Figure 5A1), speech entropy (Figure 5B1), and MI (Figure 5C1). Consequently, four ERP components were determined: the P1 effect observed within the time window of 0-100 ms[27,28], the N1-P2 effect observed between 150-250ms[27,28], the N400 within the interval of 250-450ms[14,28,29], and the LPC spanning from 550-1000ms[30,31].’
  
  Reference: Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462.
  
  (12) In the Results section for Experiment 2 (L292-295), it is not clear what the authors mean when they mention that a more negative TMS effect represents a stronger interruption of the integration effect. If I understand correctly, the correlation reported for pMTG was for speech entropy, which does not represent integration (that would be MI).
  
  Response 12: Since the TMS effect was defined as active TMS minus Vertex TMS, the inhibitory TMS effect is inherently negative. A greater inhibitory TMS effect corresponds to a larger negative value, such that a more negative TMS effect indicates a stronger disruption of the integration process. We acknowledge that the previous phrasing was somewhat ambiguous. In the revised manuscript, we have rephrased the sentence as follows: ‘a larger negative TMS effect signifies a greater disruption of the integration process’ (Lines 342-343).
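  The sign convention can be made concrete with a short sketch. It assumes the quantity being differenced is the semantic congruency effect in milliseconds; all numbers are hypothetical and serve only to show that the most negative difference marks the strongest disruption.

```python
def tms_effect(active, vertex):
    """TMS effect = value under active-site TMS minus value under Vertex TMS."""
    return active - vertex


# Hypothetical semantic congruency effects (ms) per time window; not study data.
congruency_effect_ms = {
    "TW1": {"active": 12.0, "vertex": 30.0},
    "TW2": {"active": 25.0, "vertex": 28.0},
    "TW3": {"active": 31.0, "vertex": 29.0},
}

effects = {
    tw: tms_effect(values["active"], values["vertex"])
    for tw, values in congruency_effect_ms.items()
}

# The most negative effect corresponds to the strongest disruption.
strongest_disruption = min(effects, key=effects.get)
```

  A positive difference, as in the hypothetical TW3 entry, would indicate no inhibitory disruption under this convention.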
  
  Multisensory integration transcends simple data amalgamation, encompassing complex interactions at various hierarchical neural levels and the parallel detection and discrimination of raw data from each modality (Benetti et al., 2023; Meijer et al., 2019). Therefore, we regard the process of gesture-speech integration as involving both unisensory processing and multisensory convergence. The correlation of gesture and speech entropy reflects contributions from unisensory processing, while the mutual information (MI) index indicates the contribution of multisensory convergence during gesture-speech integration. The distinction between these various source contributions will be the focus of Experiment 2 and Experiment 3, as described in the revised manuscript Lines 87-102: ‘Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to further assess whether the activity of these regions was associated with relevant informational matrices. Specifically, we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI).
  
  Experiment 3 complemented these investigations by focusing on the temporal dynamics of neural responses during semantic processing, leveraging high-temporal event-related potentials (ERPs). This experiment investigated how distinct information contributors modulated specific ERP components associated with semantic processing. These components included the early sensory effects as P1 and N1–P2[27,28], the N400 semantic conflict effect[14,28,29], and the late positive component (LPC) reconstruction effect[30,31]. By integrating these ERP findings with results from Experiments 1 and 2, Experiment 3 aimed to provide a more comprehensive understanding of how gesture-speech integration is modulated by neural dynamics’.
  
  References:
  
  Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.
  
  Meijer, G.T., Mertens, P.E.C., Pennartz, C.M.A., Olcese, U., and Lansink, C.S. (2019). The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network. Prog Neurobiol 174, 1-15. 10.1016/j.pneurobio.2019.01.004.
  
  (13) I find the description of the results for Experiment 3 very hard to follow. Perhaps if the authors have decided to organise the main text by describing the components from earliest to latest, the Figure organisation should follow suit (i.e., organise the Figure from the earliest to the latest component, instead of gesture entropy/speech entropy / mutual information). This might make the description of the results easier to follow.
  
  Response 13: As suggested, we have reorganized the results of Experiment 3 based on components from earliest to latest, together with an updated Figure 5.
  
  The results are detailed in Lines 367-423: ‘Topographical maps illustrating amplitude differences between the lower and higher halves of speech entropy demonstrate a central-posterior P1 amplitude (0-100 ms, Figure 5B). Aligning with prior findings[27], the paired t-tests demonstrated a significantly larger P1 amplitude within the ML ROI (t(22) = 2.510, p = 0.020, 95% confidence interval (CI) = [1.66, 3.36]) when contrasting stimuli with higher 50% speech entropy against those with lower 50% speech entropy (Figure 5D1 left). Subsequent correlation analyses unveiled a significant increase in the P1 amplitude with the rise in speech entropy within the ML ROI (r = 0.609, p = 0.047, 95% CI = [0.039, 1.179], Figure 5D1 right). Furthermore, a cluster of neighboring time-electrode samples exhibited a significant contrast between the lower 50% and higher 50% of speech entropy, revealing a P1 effect spanning 16 to 78 ms at specific electrodes (FC2, FCz, C1, C2, Cz, and CPz, Figure 5D2 middle) (t(22) = 2.754, p = 0.004, 95% confidence interval (CI) = [1.65, 3.86], Figure 5D2 left), with a significant correlation with speech entropy (r = 0.636, p = 0.035, 95% CI = [0.081, 1.191], Figure 5D2 right).
  
  Additionally, topographical maps comparing the lower 50% and higher 50% gesture entropy revealed a frontal N1-P2 amplitude (150-250 ms, Figure 5A). In accordance with previous findings on bilateral frontal N1-P2 amplitude[27], paired t-tests displayed a significantly larger amplitude for stimuli with lower 50% gesture entropy than with higher 50% entropy in both ROIs of LA (t(22) = 2.820, p = 0.011, 95% CI = [2.21, 3.43]) and RA (t(22) = 2.223, p = 0.038, 95% CI = [1.56, 2.89]) (Figure 5E1 left). Moreover, a negative correlation was found between N1-P2 amplitude and gesture entropy in both ROIs of LA (r = -0.465, p = 0.039, 95% CI = [-0.87, -0.06]) and RA (r = -0.465, p = 0.039, 95% CI = [-0.88, -0.05]) (Figure 5E1 right). Additionally, through a cluster-permutation test, the N1-P2 effect was identified between 184 to 202 ms at electrodes FC4, FC6, C2, C4, C6, and CP4 (Figure 5E2 middle) (t(22) = 2.638, p = 0.015, 95% CI = [1.79, 3.48], (Figure 5E2 left)), exhibiting a significant correlation with gesture entropy (r = -0.485, p = 0.030, 95% CI = [-0.91, -0.06], Figure 5E2 right).
  
  Furthermore, in line with prior research[42], a left-frontal N400 amplitude (250-450 ms) was discerned from topographical maps of gesture entropy (Figure 5A). Specifically, stimuli with lower 50% values of gesture entropy elicited a larger N400 amplitude in the LA ROI compared to those with higher 50% values (t(22) = 2.455, p = 0.023, 95% CI = [1.95, 2.96], Figure 5F1 left). Concurrently, a negative correlation was noted between the N400 amplitude and gesture entropy (r = -0.480, p = 0.032, 95% CI = [-0.94, -0.03], Figure 5F1 right) within the LA ROI. The identified clusters showing the N400 effect for gesture entropy (282 – 318 ms at electrodes FC1, FCz, C1, and Cz, Figure 5F2 middle) (t(22) = 2.828, p = 0.010, 95% CI = [2.02, 3.64], Figure 5F2 left) also exhibited significant correlation between the N400 amplitude and gesture entropy (r = -0.445, p = 0.049, 95% CI = [-0.88, -0.01], Figure 5F2 right).
  
  Similarly, a left-frontal N400 amplitude (250-450 ms) [42] was discerned from topographical maps for MI (Figure 5C). A larger N400 amplitude in the LA ROI was observed for stimuli with lower 50% values of MI compared to those with higher 50% values (t(22) = 3.00, p = 0.007, 95% CI = [2.54, 3.46], Figure 5G1 left). This was accompanied by a significant negative correlation between N400 amplitude and MI (r = -0.504, p = 0.028, 95% CI = [-0.97, -0.04], Figure 5G1 right) within the LA ROI. The N400 effect for MI, observed in the 294–306 ms window at electrodes F1, F3, Fz, FC1, FC3, FCz, and C1 (Figure 5G2 middle) (t(22) = 2.461, p = 0.023, 95% CI = [1.62, 3.30], Figure 5G2 left), also showed a significant negative correlation with MI (r = -0.569, p = 0.011, 95% CI = [-0.98, -0.16], Figure 5G2 right).
  
  Finally, consistent with previous findings[30], an anterior LPC effect (550-1000 ms) was observed in topographical maps comparing stimuli with lower and higher 50% speech entropy (Figure 5B). The reduced LPC amplitude was evident in the paired t-tests conducted in ROIs of LA (t(22) = 2.614, p = 0.016, 95% CI = [1.88, 3.35]); LC (t(22) = 2.592, p = 0.017, 95% CI = [1.83, 3.35]); RA (t(22) = 2.520, p = 0.020, 95% CI = [1.84, 3.24]); and ML (t(22) = 2.267, p = 0.034, 95% CI = [1.44, 3.10]) (Figure 5H1 left). Simultaneously, a marked negative correlation with speech entropy was evidenced in ROIs of LA (r = -0.836, p = 0.001, 95% CI = [-1.26, -0.42]); LC (r = -0.762, p = 0.006, 95% CI = [-1.23, -0.30]); RA (r = -0.774, p = 0.005, 95% CI = [-1.23, -0.32]) and ML (r = -0.730, p = 0.011, 95% CI = [-1.22, -0.24]) (Figure 5H1 right). Additionally, a cluster with the LPC effect (644 - 688 ms at electrodes Cz, CPz, P1, and Pz, Figure 5H2 middle) (t(22) = 2.754, p = 0.012, 95% CI = [1.50, 4.01], Figure 5H2 left) displayed a significant correlation with speech entropy (r = -0.699, p = 0.017, 95% CI = [-1.24, -0.16], Figure 5H2 right).’
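  The paired comparisons reported above (t with 22 degrees of freedom, implying 23 paired observations) rest on the standard paired t statistic. The sketch below shows that computation under the assumption that each participant contributes one mean amplitude per entropy half; the variable names and the toy values are illustrative, not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t(high, low):
    """Return (t, df) for a paired t-test on two equal-length samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences
    and sd is the sample standard deviation (n - 1 in the denominator).
    """
    if len(high) != len(low):
        raise ValueError("paired samples must have the same length")
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy amplitudes for three hypothetical participants (higher vs. lower half).
t_value, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

  Obtaining a p-value additionally requires the t distribution with the returned degrees of freedom, which this sketch leaves to a statistics library.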
  
  (14) In the Discussion (L394 - 395) the authors mention for the first time their task being a semantic priming paradigm. This idea of the task as a semantic priming paradigm allowing top-down prediction of gesture over speech should be presented earlier in the paper, perhaps during the final paragraph of the introduction (as part of the rationale) or during the explanation of the task. The authors mention top-down influences earlier and this is impossible to understand before this information about the paradigm is presented. It would also make the reading of the paper significantly clearer. Critically, an appropriate description of the paradigm is missing in the Methods (what are the subjects asked to do? It states that it replicates an effect in Ref 28, but this manuscript does not contain a clear description of the task). To further complicate things, the 'Experimental Procedure' section of the methods states this is a semantic priming paradigm of gestures onto speech (L148) and proceeds to provide two seemingly irrelevant references (for example, the Pitcher reference is to a study that employed faces and houses as stimuli). How is this a semantic priming paradigm? The study where I found the first mention of this paradigm seems to clearly classify it as a Stroop-like task (Kelly et al, 2010).
  
  Response 14: We appreciate the reviewer’s thorough consideration. The experimental paradigm employed in the current study differs from the Stroop-like task utilized by Kelly et al. (2010). In their study, the video presentation started with the stroke phase of the gesture, while speech occurred 200 ms after the gesture onset.
  
  As detailed in our previous study (Zhao et al., 2023, Frontiers in Psychology), we confirmed the semantic predictive role of gestures in relation to speech by contrasting two experimental conditions: (1) gestures preceding speech by a fixed 200 ms interval, and (2) gestures preceding speech at the semantic identification point of the gesture. Our findings revealed time-window-selective disruptions in the semantic congruency effect in the IFG and pMTG, but only in the second condition, suggesting that gestures exert a semantic priming effect on concurrent speech.
  
  This work highlighted the semantic priming role of gestures in the integration of speech found in Zhao et al. (2021, Journal of Neuroscience). In the study, a comparable approach was adopted by segmenting speech into eight 40-ms time windows based on the speech discrimination point, while manipulating the speech onset to align with the gesture identification point. The results revealed time-window-selective disruptions in the semantic congruency effect, providing support for the dynamic and temporally staged roles of the IFG and pMTG in gesture-speech integration.
  
  Given that the present study follows the same experimental procedure as our prior work (Zhao et al., 2021, Journal of Neuroscience; Zhao et al., 2023, Frontiers in Psychology), we refer to this design as a "semantic priming" of gesture upon speech. We agree with the reviewer that a detailed description should be clarified earlier in the manuscript. To address this, we have added a more explicit description of the semantic priming paradigm in the methods section of the revised manuscript in Lines 162-166: ‘Given that gestures induce a semantic priming effect on concurrent speech[33], this study utilized a semantic priming paradigm in which speech onset was aligned with the DP of each gesture[23,33], the point at which the gesture transitions into a lexical form[34]. The gesture itself began at the stroke phase, a critical moment when the gesture conveys its primary semantic content [34].’
  
  The task participants completed was outlined immediately following the explanation of the experimental paradigm: ‘Gesture–speech pairs were presented randomly using Presentation software (www.neurobs.com). Participants were asked to look at the screen but respond with both hands as quickly and accurately as possible merely to the gender of the voice they heard’ (Lines 177-180).
  
  Wrongly cited references have been corrected.
  
  (15) L413-417: How do the authors explain that they observe this earlier ERP component and TMS effect over speech and a later one over gesture in pMTG when in their task they first presented gesture and then speech? Why mention STG/S when they didn't assess this?
  
  (19) L436-440: This paragraph yields the timing of the findings represented in Figure 6 even more confusing. If gesture precedes speech in the paradigm, why are the first TMS and ERP results observed in speech?
  
  Response 15 & 19: Since these two aspects are closely related, we offer a comprehensive explanation. Although gestures were presented before speech, the integration process occurs once both modalities are available. Consequently, ERP and TMS measurements were taken after speech onset to capture the integration of the two modalities. Neural responses were used as the dependent variable to reflect the degree of integration, specifically gesture-speech semantic congruency in the TMS study and high-low semantic variance in the ERP study. Therefore, the observed early effect can be interpreted as an interaction between the top-down influence of gesture and the bottom-up processing of speech.
  
  To isolate the pure effect of gesture, neural activity would need to be recorded from gesture onset. However, if one aims to associate the strength of neural activity with the degree of gesture information, recording from the visual processing areas would be more appropriate.
  
  To avoid unnecessary ambiguity, the phrase "involved STG/S" has been removed from the manuscript.
  
  (16) L427-428: I find it hard to believe that MI, a behavioural metric, indexes the size of overlapped neural populations activated by gesture and speech. The authors should be careful with this claim or provide evidence in favour.
  
  Response 16: Mutual information (MI) is a behavioral metric that indexes the distribution of overlapping responses between gesture and speech (for further details, please see the Response to Comment 1). In the present study, MI was correlated with neural responses evoked by gesture and speech, with the goal of demonstrating that neural activity progressively reflects the degree of information conveyed, as indexed by MI.
  
  (17) Why would you have easier integration (reduced N400) with larger gesture entropy in IFG (Figure 6(3))? Wouldn't you expect more difficult processing if entropy is larger?
  
  (18) L431-432: The claim that IFG stores semantic information is controversial. The authors provide two references from the early 2000s that do not offer support for this claim (the IFG's purported involvement according to these is in semantic unification, not storage).
  
  Response 17 & 18: As outlined in the Responses to Comment 1 of the public review, we have provided a re-explanation of the IFG as a semantic control region. Additionally, we have clarified the role of the IFG in relation to the various stages of gesture-speech integration in Lines 533-538: ‘Last, the activated speech representation would disambiguate and reanalyze the semantic information and further unify into a coherent comprehension in the pMTG[12,37]. As speech entropy increases, indicating greater uncertainty in the information provided by speech, more cognitive effort is directed towards selecting the targeted semantic representation. This leads to enhanced involvement of the IFG and a corresponding reduction in LPC amplitude’.
  
  (20) Overall, the grammar makes some parts of the discussion hard to follow (e.g. the limitation in L446-447: 'While HD tDCS and TMS may impact functionally and anatomically connected brain regions, the graded functionality of every disturbed period is not guaranteed')
  
  Response 20: A clearer description has been provided in the revised manuscript in Lines 552-557: ‘Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions[55,56], whether the absence of influence in certain TWs can be attributed to compensation by other connected brain areas, such as angular gyrus[57] or anterior temporal lobe[58], warrants further investigation. Therefore, caution is needed when interpreting the causal relationship between inhibition effects of brain stimulation and information-theoretic metrics (entropy and MI).’
  
  References:
  
  Hartwigsen, G., Bzdok, D., Klein, M., Wawrzyniak, M., Stockert, A., Wrede, K., Classen, J., and Saur, D. (2017). Rapid short-term reorganization in the language network. Elife 6. 10.7554/eLife.25964.
  
  Jackson, R.L., Hoffman, P., Pobric, G., and Ralph, M.A.L. (2016). The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. Journal of Neuroscience 36, 1490-1501. 10.1523/JNEUROSCI.2999-15.2016
  
  Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006
  
  Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013
  
  (21) Inconsistencies between terminology employed in Figures and main text (e.g., pre-test study in text, gating study in Figure?)
  
  Response 21: Consistency has been ensured by changing ‘gating study’ to ‘pre-tests’ in Figure 1 (Line 758).
  
URL

biorxiv.org/content/10.1101/2022.11.23.517759v4
www.biorxiv.org www.biorxiv.org

Sex-dependent gastrointestinal colonization resistance to MRSA is microbiota and Th17 dependent

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Lejeune et al. demonstrated sex-dependent differences in the susceptibility to MRSA infection. The authors demonstrated the role of the microbiota and sex hormones as potential determinants of susceptibility. Moreover, the authors showed that Th17 cells and neutrophils contribute to sex hormone-dependent protection in female mice.
  
  Strengths:
  
  The role of microbiota was examined in various models (gnotobiotic, co-housing, microbiota transplantation). The identification of responsible immune cells was achieved using several genetic knockouts and cell-specific depletion models. The involvement of sex hormones was clarified using ovariectomy and the FCG model.
  
  Weaknesses:
  
  The mechanisms by which specific microbiota confer female-specific protection remain unclear.
  
  We thank the reviewer for highlighting the strengths of the manuscript, including the models and techniques we employ. We agree that the relationship between the microbiota and sex-dependent protection is less developed compared with other aspects of the study. As detailed below, we are attempting to identify specific microbes that confer female-specific protection and links with sex hormones. We have promising but preliminary results. Thus, in our revised manuscript, we added new data on the host response as suggested by the detailed comments from the Reviewers. We also elaborate on the potential role of the microbiota in the discussion section.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The authors nicely showed that the transfer of the protective phenotype by FMT requires the female sex in recipients (Figure 2E). However, it remains unclear whether the female sex is required to develop protective microbiota in donor mice, as only the female NYU donor-male Jax recipient combination was tested. What happens if the microbiota from male NYU mice is transplanted into female Jax mice? If sex hormones act only on the downstream of the microbiota, such mice would show the protective phenotype. However, if sex hormones are required to establish a protective microbiota, the transplantation of microbiota from male NYU mice will not confer protection in recipient female Jax mice.
  
  The Reviewer’s comment is well taken. We have not conducted the suggested experiment of FMT from male NYU mice to JAX female mice yet because we are pursuing an in vitro approach that we hope will eventually provide a more definitive answer. We observed that stool from female NYU mice and not JAX mice inhibits MRSA when cultured under anaerobic conditions, and this inhibitory activity is eliminated by filtration (Author response image 1A). We also observed that stool from male NYU mice inhibits MRSA growth to a similar extent as stool from female NYU mice (Author response image 1B). This result suggests that the protective role of sex hormones is downstream of the microbiota. We are in the process of identifying the specific microbiota member to support this conclusion.
  
  Author response image 1.
  
  Stool from NYU mice inhibits MRSA growth in vitro. (A) MRSA CFU/mL in media (TSB) following culture with unfiltered or filtered stool homogenate from female NYU or JAX mice. Stool homogenate or TSB alone was added in a 1:1 ratio to 1x10<sup>6</sup> CFU/mL MRSA and cultured anaerobically for up to 24 hours. (B) MRSA CFU/mL in TSB following culture with unfiltered stool homogenate from NYU male or female mice. Stool homogenate or TSB alone was added in a 1:1 ratio to 1x10<sup>6</sup> CFU/mL MRSA. 3 experimental replicates performed; stool taken from 6 individual mice per condition. Mean MRSA burden ± SEM. Area under the curve analysis + One way ANOVA with Sidak’s multiple comparisons test. ns: not significant.
  
  (2) The results clearly showed the involvement of the specific microbiota in NYU mice in the sex-dependent bias in susceptibility to MRSA. However, the mechanisms by which specific microbiota promotes female sex-mediated protection need to be better described. Is this simply attributed to the different Th17 cell numbers in NYU and Jax mice (i.e., increased commensal-specific Th17 cells in NYU like Taconic mice)? Or is it possible that NYU microbiota impacts the regulation of sex hormones or their downstream signaling? What about the level of sex hormones in NYU and Jax mice? Are these levels equivalent or different? Do NYU and Jax microbiotas regulate the expression of sex hormone receptors in immune cells differently?
  
  These are great questions. We do not observe baseline differences in Th17 cells like those between JAX and Taconic mice (Figure 5B), suggesting that the mechanism is different. However, it is quite possible that an antigen-specific T cell population, or a Th17 cell population specifically, is present at low levels and expands rapidly upon MRSA colonization. We have added this possibility to the discussion in the revised manuscript. To address the Reviewer’s question about the effect of the microbiota on sex hormones, we first sought to determine which sex hormone is necessary. Using estrogen receptor knockouts (Esr1<sup>-/-</sup>), we were able to implicate estrogen and have added this important finding to the manuscript (Fig 6C). Then, we measured levels of estradiol in stool samples but did not observe a difference between NYU and JAX female mice (Author response image 2). We provide the results below but did not add them to the revised manuscript because we found it difficult to draw a conclusion without more extensive profiling as well as quantification of the receptor on specific immune cell subsets and cell-type specific knockouts. Also, see our response to Reviewer #3 regarding receptor expression. Although we have yet to explain the role of the microbiota, we hope the Reviewer agrees that we have promising yet preliminary results and that the new experiments we added to the manuscript have further strengthened the mechanism on the host-side.
  
  Author response image 2.
  
  Estradiol levels in stool samples prior to MRSA inoculation. (A) Estradiol levels in stool samples collected prior to MRSA inoculation in male and female mice bred at NYU or purchased from Jackson Labs. Frozen stool samples were normalized by weight and processed using the DetectX® Estradiol ELISA Kit (Arbor Assays).
  
  (3) The authors claimed that Th17-mediated recruitment of neutrophils likely promotes the clearance of MRSA in female NYU mice. However, the experimental evidence supporting this claim could be stronger. The authors should show the neutrophil recruitment in the gut mucosa in female and male NYU mice. Also, the levels of neutrophils between NYU and Jax female mice should be examined. To further strengthen the link between Th17 and neutrophils, it would be ideal to analyze neutrophil recruitment in mice lacking Th17 cells (i.e., Rag2-/-, anti-CD4 treated, Rorgt-/- mice).
  
  We agree and now include a more detailed analysis of neutrophils. We found that the number of neutrophils in the intestine was not higher in NYU female mice compared with NYU male mice, with or without MRSA. Instead, we show that neutrophils in NYU female mice display higher levels of surface CD11b, a sign of activation, compared to males following inoculation with MRSA. We have added these findings to the revised manuscript (Fig 5H and I). IL-17 can activate neutrophils and increase their antimicrobial activity. Consistent with this possibility, we now show that female mice lacking the IL-17 receptor lose the enhanced colonization resistance. Based on these findings, we have modified this aspect of the conclusion, and thank the reviewer for the helpful suggestion.
  
  Reviewer #2 (Public review):
  
  The current study by Lejeune et al. investigates factors that allow for persistent MRSA infection in the GI tract. They developed an intriguing model of intestinal MRSA infection that does not use the traditional antibiotic approach, thereby allowing for a more natural infection that includes the normal intestinal microbiota. This model is more akin to what might be expected to be observed in a healthy human host. They find that biological sex plays a clear role in bacterial persistence during infection but only in mice bred at an NYU Facility and not those acquired from Jackson Labs. This clearly indicates a role for the intestinal microbiome in affecting female bacterial persistence but not male persistence which was unaffected by the origin of the mice and thus the microbiome. Through a series of clever microbiome-specific transfer experiments, they determine that the NYU-specific microbiome plays a role in this sexual dimorphism but is not solely responsible. Additional experiments indicate that Th17 cells, estrogen, and neutrophils also participate in the resistance to persistent infection. Notably, they assess the role of sex chromosomes (X/Y) using the established four core genotype model and find that these chromosomes appear to play little role in bacterial persistence.
  
  Overall, the paper nicely adds to the growing body of literature investigating how biological sex impacts the immune system and the burden of infectious disease. The conclusions are mostly supported by the data although there are some aspects of the data that could be better addressed and clarified.
  
  We thank the Reviewer for appreciating our contribution and for these supportive comments. We have added several experiments to fill in gaps, and text revisions to increase clarity and acknowledge limitations.
  
  (1) There is something of a disconnect between the initial microbiome data and the later data that analyzes sex hormones and chromosomes. While there are clearly differences in microbial species across the two sites (NYU and JAX) how these bacterial species might directly interact with immune cells to induce female-specific responses is left unexplored. At the very least it would help to try and link these two distinct pieces of data to try and inform the reader how the microbiome is regulating the sex-specific response. Indeed, the reader is left with no clear exploration of the microbiota's role in the persistence of the infection and thus is left wanting.
  
  We agree. This comment is similar to Reviewer #1’s feedback. As mentioned above, we are attempting to clarify the association between sex differences and the microbiota and have included preliminary results for the Reviewers. However, addressing this disconnect will require substantially more investigation. Instead, we have added insightful new data that elaborate on aspects of the host response. We hope the Reviewer agrees that the revised manuscript is stronger and that further delineation of the microbiota can be addressed by future studies.
  
  (2) While the authors make a reasonable case that Th17 T cells are important for controlling infection (using RORgt knockout mice that cannot produce Th17 cells), it is not clear how these cells even arise during infection since the authors make most of the observations 2 days post-infection, which is long before a normal adaptive immune response would be expected to arise. The authors acknowledge this, but their explanation is incomplete. The increase in Th17 cells they observe is predicated on mitogenic stimulation, so they are not specific (at least in this study) for MRSA. It would be helpful to see a specific restimulation of these cells with MRSA antigens to determine if there are pre-existing, cross-reactive Th17 cells specific for MRSA and microbiota species which could then link these two as mentioned above.
  
  We acknowledge that this is a limitation of our study. Although an experiment demonstrating pre-existing, cross-reactive T cells would help support our conclusion, aspects of MRSA biology may make the results of this experiment difficult to interpret. We have consulted with an expert on MRSA virulence factors, co-lead author Dr. Victor Torres, about the feasibility of this experiment. MRSA possesses superantigens, such as Staphylococcal enterotoxin B, which bind directly to specific Vβ regions of T-cell receptors (TCR) and major histocompatibility complex (MHC) class II on antigen-presenting cells, resulting in hyperactivation of T lymphocytes and monocytes/macrophages. Additionally, other MRSA virulence factors, such as α-hemolysin and LukED, induce cell death of lymphocytes. MRSA’s enterotoxins are heat stable, so heat-inactivation of the bacterium may not help in this matter. For these reasons, it is unlikely that we can perform a simple restimulation of lymphocytes with MRSA antigens.
  
  A study by Shao et al. provides an example of a host commensal species inducing Th17 cells with cross-reactivity against MRSA. Upon intestinal colonization, the intestinal fungus Candida albicans influences T cell polarization towards a Th17 phenotype in the spleen and peripheral lymph nodes which provided protection to the host against systemic candidemia. Interestingly, this induction of protective Th17 cells, increased IL-17 and responsiveness in circulating Ly6G+ neutrophils also protected mice from intravenous infection with MRSA, indicating that T cell activation and polarization by intestinal C. albicans leads to non-specific protective responses against extracellular pathogens.
  
  Shao TY, Ang WXG, Jiang TT, Huang FS, Andersen H, Kinder JM, Pham G, Burg AR, Ruff B, Gonzalez T, Khurana Hershey GK, Haslam DB, Way SS. Commensal Candida albicans Positively Calibrates Systemic Th17 Immunological Responses. Cell Host & Microbe. 2019 Mar 13;25(3):404-417.e6. doi: 10.1016/j.chom.2019.02.004. PMID: 30870622; PMCID: PMC6419754.
  
  We have added a brief version of the above discussion in the revised manuscript. Also, as mentioned earlier, we have added new data strengthening the axis between Th17 and neutrophils, including showing that IL-17 receptor is necessary and that neutrophils display signs of heightened activation in female mice during MRSA colonization.
  
  (3) The ovariectomy experiment demonstrates a role for ovarian hormones; however, it lacks a control of adding back ovarian hormones (or at least estrogen) so it is not entirely obvious what is causing the persistence in this experiment. This is especially important considering the experiments demonstrating no role for sex chromosomes thus demonstrating that hormonal effects are highly important. Here it leaves the reader without a conclusive outcome as to the exact hormonal mechanism.
  
  This is a great suggestion. Rather than adding back ovarian hormones, we performed the more direct experiment and tested whether the estrogen receptor (ERα, encoded by Esr1) is necessary for the enhanced colonization resistance. Indeed, we observed that Esr1<sup>-/-</sup> female mice have increased MRSA burden compared to Esr1<sup>+/-</sup> littermates. We have added this new result (Figure 6C) and thank the Reviewer for their guidance.
  
  (4) The discussion is underdeveloped and is mostly a rehash of the results. It would greatly enhance the manuscript if the authors would more carefully place the results in the context of the current state of the field including a more enhanced discussion of the role of estrogen, microbiome, and T cells and how the field might predict these all interact and how they might be interacting in the current study as well.
  
  Author response: We thank the Reviewer for their feedback on improving the scholarship of the manuscript. We have expanded on the literature and the mechanistic model in both the discussion section and other parts to provide better context for our findings.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Using a mouse model of Staphylococcus aureus gut colonization, Lejeune et al. demonstrate that the microbiome, immune system, and sex are important contributing factors for whether this important human pathogen persists in the gut. The work begins by describing differential gut clearance of S. aureus in female B6 mice bred at NYU compared to those from Jackson Laboratories (JAX). NYU female mice cleared S. aureus from the gut but NYU male mice and mice of both sexes from JAX exhibited persistent gut colonization. Further experimentation demonstrated that differences between staphylococcal gut clearance in NYU and JAX female mice were attributed to the microbiome. However, NYU male and female mice harbor similar microbiomes, supporting the conclusion that the microbiome cannot account for the observed sex-dependent clearance of S. aureus gut colonization. To identify factors responsible for female clearance of S. aureus, the authors performed RNAseq on intestinal epithelial cells and cells enriched within the lamina propria. This analysis revealed sex-dependent transcriptional responses in both tissues. Genes associated with immune cell function and migration were distinctly expressed between the sexes. To determine which immune cell types contribute to S. aureus clearance, Lejeune et al. employed genetic and antibody-mediated immune cell depletion. This experiment demonstrated that CD4+ IL17+ cells and neutrophils promote the elimination of S. aureus from the gut. Subsequent experiments, including the use of the 'four core genotype model', were conducted to discern between the roles of sex chromosomes and sex hormones. This work demonstrated that sex-chromosome-linked genes are not responsible for clearance, increasing the likelihood that hormones play a dominant role in controlling S. aureus gut colonization.
  
  Strengths:
  
  A strength of the work is the rigorous experimental design. Appropriate controls were executed and, in most cases, multiple approaches were conducted to strengthen the authors' conclusions. The conclusions are supported by the data.
  
  The following suggestions are offered to improve an already strong piece of scholarship.
  
  Weaknesses:
  
  The correlation between female sex hormones and the elimination of S. aureus from the gut could be further validated by quantifying sex hormones produced in the four core genotype mice in response to colonization. Additionally, and this may not be feasible, but according to the proposed model administering female sex hormones to male mice should decrease colonization. Finally, knowing whether the quantity of IL-17a CD4+ cells change in the OVX mice has the potential to discern whether abundance/migration of the cells or their activation is promoted by female sex hormones.
  
  In the Discussion, the authors highlight previous work establishing a link between immune cells and sex hormone receptors, but whether the estrogen (and progesterone) receptor is differentially expressed in response to S. aureus colonization could be assessed in the RNAseq dataset. Differential expression of known X and Y chromosome-linked genes were discussed but specific sex hormones or sex hormone receptors, like the estrogen receptor, were not. This potential result could be highlighted.
  
  We appreciate the comment on the scholarship and thank the Reviewer for the insightful suggestions to improve this manuscript. We apologize for not including references that address some of the Reviewer’s questions. Other research groups have compared the levels of hormones between XX and XY males and females in the four core genotypes model and have found similar levels of circulating testosterone in adult XX and XY males. No difference was found in circulating estradiol levels in XX vs XY- females when tested at 4-6 or 7-9 months of age.
  
  Karen M. Palaszynski, Deborah L. Smith, Shana Kamrava, Paul S. Burgoyne, Arthur P. Arnold, Rhonda R. Voskuhl, A Yin-Yang Effect between Sex Chromosome Complement and Sex Hormones on the Immune Response. Endocrinology, Volume 146, Issue 8, 1 August 2005, Pages 3280–3285, https://doi.org/10.1210/en.2005-0284
  
  Sasidhar MV, Itoh N, Gold SM, Lawson GW, Voskuhl RR. The XX sex chromosome complement in mice is associated with increased spontaneous lupus compared with XY. Ann Rheum Dis. 2012 Aug;71(8):1418-22. doi: 10.1136/annrheumdis-2011-201246. Epub 2012 May 12. PMID: 22580585; PMCID: PMC4452281.
  
  Administering female sex hormones to males is a good idea. We did not observe an effect of injecting males with estrogen on MRSA colonization (data not shown), perhaps due to the dose or timing, or because it is not sufficient (i.e., additional hormones and factors may be required). Therefore, we analyzed the necessity of estrogen signaling and found that Esr1<sup>-/-</sup> female mice have impaired colonization resistance to MRSA. We have added this new experiment to the revised manuscript (Fig 6C).
  
  Examination of the levels of estrogen, progesterone, and androgen receptors in our cecal-colonic lamina propria RNA-seq dataset is an excellent idea. We observed a significant increase in the G-protein coupled estrogen receptor 1 (Gper1) and a non-significant increase in Estrogen receptor alpha (Esr1) following MRSA inoculation in the immune cell compartment. This analysis has been added to the revised manuscript (Supplemental Fig 6).
  
  Reviewer #3 (Recommendations for the authors)
  
  Minor editing issues:
  
  The topic sentence of the last paragraph in the Results section states - 'male sex defining gene sex determining region Y (Sry) has been moved from the Y chromosome to an autosome'. 'Sex defining gene' and sex-determining region seems redundant in this context. A sex-defining gene would presumably be located within a sex-determining region.
  
  Bold the letter 'F' in the Figure 5 legend.
  
  It's not clear from the Figure 6E legend when the IL-17A+ CD4+ cells were quantified, 2 dpi?
  
  In the third sentence of the second paragraph of the Discussion, the two references are merged together.
  
  We thank the Reviewer for pointing out these editing issues. They have been addressed in the revised manuscript.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.07.17.603994v2
www.biorxiv.org

WRNIP1 prevents transcription-associated genomic instability

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  This paper describes the role of the WRNIP1 AAA+ ATPase, particularly its ubiquitin-binding UBZ domain rather than its ATPase activity, in preventing R-loop formation when DNA replication is mildly perturbed. By combining cytological analysis for DNA damage, R-loop, and chromosome aberration with the proximity ligation assay for colocalization of various proteins involved in DNA replication and transcription, the authors provide solid evidence to support the claim. The authors also revealed a distinct role of WRNIP1 in the prevention of R-loop-induced DNA damage from FANCD2, which is inconsistent with the known relationship between WRNIP1 and FANCD2 in the repair of crosslinks.
  
  One concern is the relationship between WRNIP1 and FANCD2 (Figure 6) in the suppression of R-loop-induced DNA damage. This is different from the relationship in interstrand crosslink (ICL) repair (Socha et al. 2020), which shows the epistatic relationship between WRNIP1 as well as its UBZ domain and FANCD2 in ICL repair. The authors need to re-evaluate the role of FANCD2 in R-loop suppression under mild replication stress (MRS) by analyzing R-loop formation in FANCD2 knockdown (KD) cells, colocalization of FANCD2 with PCNA and RNA polymerase II by the PLA method, and fork restart by DNA combing.
  
  Along these lines, it is important to show whether the PLA signal between FANCD2 and R-loops depends on WRNIP1, since WRNIP1 recruits FANCD2 in ICL repair (Socha et al. 2020).
  
  In the study referenced by the reviewer, the authors implicated WRNIP1 in repairing interstrand crosslinks (ICLs) induced by agents such as TMP/UVA, MMC, and Cisplatin (Socha et al., 2020). For the repair of ICLs, the FANCD2/FANCI complex, the central component of the FA pathway, must be recruited to DNA. The study suggests a potential role for WRNIP1 in loading the FANCD2/FANCI complex onto DNA immediately after ICL formation. However, even in the absence of WRNIP1, a residual recruitment of the FANCD2/FANCI complex to DNA was observed, possibly due to alternative mechanisms, as proposed by the authors. Interestingly, the study did not establish a similar relationship between WRNIP1 and FANCD2 after treatments that do not induce ICLs, suggesting that WRNIP1 and FANCD2 may also play independent roles. Hence, our data demonstrating a distinct role of WRNIP1 from the FA pathway in response to R-loop-associated replication stress are not inconsistent with prior findings. Additionally, considering the ability of the UBZ domain to interact with ubiquitin in both its free form and when conjugated to other proteins, thereby regulating protein functions, it is not surprising that the UBZ domain of WRNIP1 may also play a role in the response to R-loop accumulation.
  
  Therefore, to address the reviewer's request for a more in-depth exploration of the role of FANCD2 in the regulation of R-loops, we chose to examine the impact of FANCD2 loss on the accumulation of R-loops in WRNIP1-deficient and WRNIP1 UBZ mutant cells, as well as on the dynamics of stalled forks following aphidicolin-induced MRS. Additionally, we investigated the colocalization between FANCD2 and R-loops in shWRNIP1WT, shWRNIP1 and shWRNIP1D37A cells. Details are provided below.
  
  In agreement with our observations, the analysis of R-loop formation upon MRS in WRNIP1-deficient cells depleted of FANCD2 revealed a significantly higher accumulation of R-loops in cells with a concomitant loss of both WRNIP1 and FANCD2 compared to those with a single deficiency (see Fig. 6D of the revised manuscript). Similar results were observed in the WRNIP1 UBZ mutant cells in which FANCD2 was abrogated (see Fig. 6D of the revised manuscript). It is important to note that, to eliminate contaminating free RNA, particularly dsRNA, which could interfere with the binding of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine the proximity between FANCD2 and R-loops more accurately, cells were treated with RNase III, following established protocols (Crossley et al., 2020).
  
  Furthermore, we examined the interaction of FANCD2 with R-loops using a proximity ligation assay (PLA). Our findings revealed significant colocalization between FANCD2 and R-loops in the absence of WRNIP1 and in WRNIP1 UBZ mutant cells following low-dose aphidicolin treatment and RNase III exposure, showing a significant increase compared to the control counterpart (shWRNIP1WT cells; see Fig. 6B of the revised manuscript). Consequently, we conclude that neither WRNIP1 nor its UBZ domain is necessary for FANCD2 recruitment under conditions of MRS.
  
  We also performed a DNA fiber assay to evaluate replication fork restart in shWRNIP1WT, shWRNIP1 and shWRNIP1D37A cells in which FANCD2 was abrogated. Our results show that FANCD2 depletion slightly decreased the ability of the cells to restart forks after MRS (see Fig. 6E of the revised manuscript).
  
  Given a low number (2-4) of PLA foci for WRNIP1-RNA polymerase II or WRNIP1 and R-loop (Figure 4B and 4D), how does this colocalization reflect the functional significance?
  
  The data from the PLA of Figures 4B and 4D are reported as the mean of three independent experiments. It is important to note that we have introduced a new Figure 4D. To selectively assess R-loop structures, cells were treated with RNase III, a double-stranded RNA-specific endoribonuclease, following established protocols (Crossley et al., 2020). Our PLA analysis confirms the localization of WRNIP1 at/near R-loops in shWRNIP1 and shWRNIP1D37A cells, and this phenomenon is more evident in WRNIP1 UBZ mutant cells (see Fig. 4D of the revised manuscript). Specifically, the new protocol allows us to visualize a higher number of PLA foci, and we observed that Aph increased the spots per nucleus in shWRNIP1D37A cells compared to the previous experiment.
  
  Regarding Fig. 4B, it is not uncommon for a low number of PLA spots per nucleus to correspond to a phenotypic effect. For instance, a similarly low average in the colocalization of PCNA or RNA pol II with FANCD2 has been observed in a prior paper as well, suggesting that transcription-replication collisions occur upon Aph-induced MRS (Okamoto et al., 2019). Also, not all R-loops may be “targeted” by WRNIP1.
  
  It would be helpful to readers if the authors were to provide a summary figure of this paper.
  
  As suggested by the reviewer, we have developed a model to summarize the findings obtained in our study (see Fig. 6F of the revised manuscript).
  
  Minor points:
  
  (1) Most of the cytological images in the paper show only colocalized ones, which makes it hard to see a signal. Please show a single-color image.
  
  For a better visualization of nuclei signals in the figures, single-color images have been provided for Figs. 2A; 3B; 4A, B, C, D and E; 6B and D; Suppl. Fig. 2A and B of the revised manuscript.
  
  (2) In Figure 2A, only one or two S9.6 focus(foci) can be seen. Why 1 or 2? This focus marks a specific chromosomal locus such as the centromere or telomere.
  
  We agree with the reviewer that the observed foci in nuclei may indicate a specific chromosomal locus, such as telomeres or centromeres.
  
  (3) Figure 3A, graph: Why does this graph not use a dot plot like Figure 1B and Figure 3C?
  
  The graph in Figure 3A has been represented as a dot plot, as requested.
  
  (4) Figure 1C: P values between unperturbed conditions should be provided.
  
  In Figure 1C, P values comparing unperturbed conditions were already included. The results showed no significance between shWRNIP1 and shWRNIP1D37A cells when compared to MRC5SV cells and, similarly, to shWRNIP1T294A cells, as indicated in the corresponding legend.
  
  (5) Figure 2B: Please provide the quantification or show the reproducibility of the data.
  
  The quantification of R-loops using the S9.6 monoclonal antibody is not accurate, as the specificity for RNA-DNA hybrids is questionable (Hartono et al., 2018). Therefore, to demonstrate the reproducibility of the findings in Fig. 2B, we conducted a repeat of the dot-blot experiment. We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells (see Fig. 2B of the revised manuscript). Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.
  
  (6) Figure 4A: the expression of RNaseH under aphidicolin addition increased colocalization of PCNA and RNA pol II. It is important to mention the result and provide an explanation of why it is increasing in the main text.
  
  Although the result may appear unexpected, and we lack experiments that explain the nature of this phenotype, a previous study reported that overexpression of RNase H1 in mammalian cells may lead to a dose-dependent reduction of certain proteins of the repair pathway, resulting in a significant accumulation of DNA damage (Shen et al., 2017). Consequently, the observed increase in TRCs upon RNase H1 overexpression in wild-type cells may be attributed to the disruption of proteins that, by impairing the repair process, can potentially cause more fork stalling and, consequently, more conflicts. We have introduced a comment in the text.
  
  Reviewer #2:
  
  This paper aims at establishing the role of WRN-interacting protein 1 (WRNIP1) and its UBZ domain (an N-terminal ubiquitin-binding zinc finger domain) on genome instability caused by mild inhibition of DNA synthesis by aphidicolin. The authors used human MRC5 fibroblasts investigated with standard methods in the field. The results clearly showed that WRNIP1 silencing and UBZ-mutation (D37A) increased DNA damage, chromosome aberrations, and transcription-replication conflicts caused by aphidicolin. The conclusions of the paper are overall well supported by results, however, aspects of some data analyses would need to be clarified and/or extended.
  
  (1) The methods (immunofluorescence microscopy and dot-blots) to determine R-loop levels can lack sensitivity and specificity. In particular, since the S9.6 antibody can bind to other structures besides heteroduplex, dot-blot analyses only grossly assess R-loop levels in cellular samples of purified nucleic acids, which are constituted by many different types of DNA/RNA structures.
  
  To eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). Under our experimental conditions, RNase III treatment significantly reduced the amount of dsRNA, nearly eliminating it, as evaluated using a specific antibody against dsRNA (see Suppl Fig 2 of the revised manuscript). To better appreciate the effect of the loss of WRNIP1 or its UBZ domain on R-loop accumulation and the amount of DNA damage, we have reproduced key data (see Figs 2B; 3B; 4D and E; 6B of the revised manuscript). Our analysis from immunofluorescence experiments, performed using a dsRNA ribonuclease (RNase III), confirms higher R-loop accumulation in WRNIP1-deficient or WRNIP1 UBZ mutant cells compared to control cells (Fig 3B). Additionally, proximity ligation assay (PLA) data are consistent with those previously presented and, in some cases, are more readily interpretable (see Figs 4D and E; 6B of the revised manuscript). Finally, we performed a new dot-blot experiment (see Fig. 2B of the revised manuscript). We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms a significant accumulation of the S9.6 signal in shWRNIP1 cells compared to shWRNIP1WT cells. Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.
  
  (2) Experimental plan has analyzed the impact of WRNIP1 lack or mutations at steady-state conditions. Thus, the possible role of WRNIP1 at an early step of the mechanism would require some sort of kinetics analysis of the molecular process, therefore not at steady-state conditions. The findings of a co-localization of R-loops and WRNIP1 have been obtained with the S9.6 antibody, which recognizes DNA-RNA heteroduplexes. Since WRNIP1 is known to be recruited at stalled forks and DNA cleavage sites, it is not surprising that WRNIP1 is very close to heteroduplexes, abundant structures at replication forks and cleavage sites. Similar interpretations may also be valid for Rad51/S9.6 co-localization findings.
  
  Investigating the potential role of WRNIP1 at an early step in the mechanism is undoubtedly very interesting and requires separate investigation. Our decision to explore the relevance of the loss of WRNIP1 or WRNIP1 mutations under steady-state conditions is based on a preliminary alkaline comet assay (provided below). The comet assay, performed at various exposure times of aphidicolin at a concentration of 0.4 micromolar, clearly indicates that the most significant effect on DNA damage accumulation in WRNIP1-deficient cells occurs after 24 hours of treatment. Therefore, we have chosen to study the transcription-associated genomic instability in our cells by treating them with a low-dose of aphidicolin for 24 hours to maximize the effect.
  
  Author response image 1.
  
  We agree that the presence of WRNIP1 or RAD51 in proximity to R-loops is consistent with their roles and may not be surprising. However, these experiments formally demonstrate their proximity to R-loops under our conditions. Notably, the new graphs, obtained from experiments repeated with RNase III treatment to reduce the amount of dsRNA and improve the specificity of the S9.6 antibody, show increased interaction of the UBZ-mutated form of WRNIP1 with R-loops when compared to the wild-type form. Additionally, it is more evident that the presence of RAD51 at/near R-loops is reduced in WRNIP1 UBZ mutant cells both in untreated conditions and after MRS (see Figs 4D and E of the revised manuscript).
  
  (3) Determination of DNA damage, chromosome aberration, and co-localization data are reported as means of measurements with appropriate statistics. However, the fold-change values relative to corresponding untreated samples are not reported. In some instances, it seems that WRNIP1 silencing or mutations actually reduce or do not affect aphidicolin effects. That leaves open the interpretation of specific results.
  
  To better evaluate the significance of the data presented in the study, we have introduced the fold-change values calculated with respect to the untreated samples, as requested by the reviewer. This allowed us to conclude that the loss of WRNIP1 or the expression of the UBZ mutant form of WRNIP1 does not reduce in any case the effects of aphidicolin-induced mild replication stress.
  
  I would suggest some additional experiments or analyses to get more convincing results:
  
  (1) DNA damage should be verified also with other methods, such as DNA damage markers pH2AX and 53BP1.
  
  The quantification of DNA damage was also corroborated by determining the percentage of gammaH2AX-positive cells, as reported in Supplementary Figure 1B. This result is consistent with the findings from the comet assay, confirming transcription-dependent DNA damage accumulation in shWRNIP1 and shWRNIP1D37A cells. Regarding the 53BP1 marker, we believe that the existing data sufficiently demonstrate DNA damage accumulation in the absence of WRNIP1 or when its UBZ domain is mutated, providing comprehensive support to the study without necessitating additional results.
  
  (2) Repair foci may also be detected with Rad51 foci. That will also provide evidence for increased DNA damage levels under the tested conditions.
  
  Our prior study identified WRNIP1 as a crucial factor for RAD51 function (Leuzzi et al., 2016). Loss of WRNIP1 indeed results in defective relocalization of RAD51 to chromatin. Consequently, the analysis of RAD51 foci may not be a useful readout for evaluating DNA damage levels under our conditions.
  
  (3) WRNIP1 effects should be presented as FC (fold-changes) of DNA damage, PLA results, chromosomal errors, etc, to provide evidence of the level of effects on the tested phenotypes.
  
  We have introduced the fold-change values calculated with respect to the untreated samples, as requested by the reviewer, for a more comprehensive analysis in the graph of Figs. 1B, C and D; 2A and B; 3A, B and C; 4A, B, C, D and E; 6B, C and D.
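
The fold-change calculation referred to above can be sketched as follows. This is only an illustration, assuming that the fold change is the mean of each condition divided by the mean of the matching untreated sample; the function name, condition labels, and numbers are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def fold_changes(measurements, baseline="untreated"):
    """Return the mean of each condition divided by the baseline mean.

    `measurements` maps a condition label to a list of replicate values.
    The labels and the choice of the mean are assumptions for illustration.
    """
    base = mean(measurements[baseline])
    if base == 0:
        raise ValueError("baseline mean is zero; fold change is undefined")
    return {cond: mean(vals) / base for cond, vals in measurements.items()}


# Made-up replicate values, for illustration only.
example = {
    "untreated": [10.0, 12.0, 8.0],
    "Aph": [20.0, 25.0, 15.0],
}
result = fold_changes(example)
print(result)
```

Reporting the untreated condition as 1.0 alongside the treated values makes it easy to see whether a given genotype amplifies or dampens the effect of the treatment.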
  
  (4) R-loop detection ideally should be performed by one of the several types of immunoprecipitation techniques. Alternatively, dot-blot assays should be performed with a 1:2 dilution series of each sample. Then, heteroduplexes should be detected with S9.6 along with a general aspecific dye for DNA quantity in each spot. Next, densitometric analyses of S9.6 signal should be normalized over DNA quantity.
  
  We acknowledge that the quantification of R-loops using the S9.6 monoclonal antibody is not accurate, as the specificity for RNA-DNA hybrids is questionable (Hartono et al., 2018). Therefore, to overcome this issue, we repeated the experiment shown in Fig. 2B. We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells (see Fig. 2B of the revised manuscript). Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.
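
The normalization described above (S9.6 signal over dsDNA signal, then expressed relative to wild-type untreated cells) can be sketched as follows. This is a minimal illustration under the assumption that each spot yields one densitometric value per antibody; the sample labels and intensities are hypothetical.

```python
def normalized_signal(s96, dsdna):
    """Divide the S9.6 densitometric value by the dsDNA loading value."""
    if dsdna <= 0:
        raise ValueError("dsDNA signal must be positive")
    return s96 / dsdna


def fold_change_vs_reference(spots, reference):
    """Express each normalized S9.6/dsDNA ratio relative to a reference spot.

    `spots` maps a sample label to a (S9.6, dsDNA) intensity pair.
    """
    ref = normalized_signal(*spots[reference])
    return {
        name: normalized_signal(s96, dsdna) / ref
        for name, (s96, dsdna) in spots.items()
    }


# Made-up intensities, for illustration only.
spots = {
    "wt_untreated": (100.0, 50.0),
    "kd_untreated": (300.0, 50.0),
    "kd_aph": (400.0, 40.0),
}
ratios = fold_change_vs_reference(spots, reference="wt_untreated")
print(ratios)
```

Dividing by the dsDNA signal first corrects for unequal DNA loading between spots, so that the subsequent fold change reflects hybrid content rather than the amount of material spotted.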
  
  (5) A major focus on WRNIP1 D37A and T294A mutations may also make the paper overall more convincing. For instance: do the mutations affect protein recruitment at damaged chromatin? Do they increase repair foci? Do they affect the recruitment of WRN or BLM helicases or specific nucleases at chromatin under the tested conditions of MRS?
  
  To address this point raised by the reviewer, we performed a chromatin experiment to assess the ability of WRNIP1 and its mutated forms to translocate to chromatin upon MRS. Our analysis shows that the mutated forms of WRNIP1 do not exhibit any defects in recruitment to chromatin, although the levels of the WRNIP1 ATPase mutant appear lower than the others (see Western blotting provided below for the reviewer’s use only, Fig. A). Additionally, we tested the presence of WRN helicase, which does not show any difference between cell lines (see Western blot provided below, Author Response image 2B).
  
  Author response image 2.
  
  (6) I suggest revising the text for spelling errors.
  
  The manuscript has been carefully revised to identify and correct any spelling errors that may have occurred.
  
  Reviewer #3:
  
  In the manuscript by Valenzisi et al., the authors report on the role of WRNIP1 to prevent R-loop and TRC-associated DNA damage. The authors claim WRNIP1 localizes to TRCs in response to replication stress and prevents R-loop accumulation, TRC formation, replication fork stalling, and subsequent DNA damage. While the findings are of potential significance to the field, the strength of evidence in support of the conclusions is lacking.
  
  Weaknesses:
  
  (1) The authors fail to utilize the proper controls throughout the manuscript in regard to the shWRNIP1, WT, and mutant cell lines. It is unclear why the authors failed to use the shWRNIP1WT line in the comet assay, DNA fiber assay, and the FANCD2 assays. This is a key control for i) the use of only a single shRNA (most studies will use at least 2 different shRNAs) and ii) the use of the mutant WRNIP1 lines. In several figures, the authors only show the effect of the UBZ mutant, but don't include the ATPase mutant or WT for comparison. Including these is essential.
  
  We agree with the reviewer's criticism that the use of shWRNIP1WT cells as a control is more appropriate. Therefore, all the new experiments presented in the revised version of the manuscript have been performed using the shWRNIP1WT cells. Notably, the new results are in line with those obtained using the MRC5SV cells, which makes us confident that our findings are reliable overall. By contrast, we do not feel that including the WRNIP1 ATPase mutant cells is always essential, since our data clearly demonstrate that the loss of ATPase activity of WRNIP1 does not affect transcription-associated genome instability.
  
  (2) The authors use the S9.6 antibody to conclude the loss of WRNIP1 causes more R-loops; however, it has been shown that this antibody detects dsRNA in addition to RNA-DNA hybrids. Accordingly, it cannot be ruled out that the increased S9.6 signal is due to increased dsRNA.
  
  To eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). Under our experimental conditions, RNase III treatment significantly reduced the amount of dsRNA, nearly eliminating it, as evaluated using a specific antibody against dsRNA (see Suppl Fig 2 of the revised manuscript). To better appreciate the effect of the loss of WRNIP1 or its UBZ domain on R-loop accumulation and the amount of DNA damage, we have reproduced key data (see Figs 3B; 4D and E; 6B, D and E of the revised manuscript). Our analysis from immunofluorescence experiments, performed using a dsRNA ribonuclease, confirms higher R-loop accumulation in WRNIP1-deficient or UBZ WRNIP1 mutant cells compared to control cells (Fig. 3B). Additionally, proximity ligation assay (PLA) data are consistent with those previously presented and, in some cases, are more readily interpretable (see Figs 4D and E; 6B of the revised manuscript).
  
  (3) Multiple pieces of data do not support the conclusions. For example, Figure 1D shows shWRNIP1 to reduce damage in Aph+DRB cells compared to MRC5SV cells with Aph+DRB. This result suggests that WRNIP1 actually increases DNA damage in stressed cells with transcription blocked. Another result is seen in Figure 4a, where the number of PLA spots (presumably TRCs) increases in the shWRNIP1WT cells with Aph+RNH1 compared to Aph alone. If R-loops are required for TRC accumulation, then the RNH1 should decrease the PLA foci. This result instead suggests that WRNIP leads to increased TRCs in stressed cells with R-loops cleared by RNH1.
  
  Regarding Figure 1D, in MRC5SV cells, DRB does not significantly increase DNA damage upon Aph treatment. Therefore, it is not correct to conclude that WRNIP1 exacerbates DNA damage in stressed cells with transcription blocked.
  
  Regarding Figure 4A, while the outcome may appear unexpected, and we do not provide data that explain the nature of this phenotype, a previous study demonstrated that overexpression of RNase H1 in mammalian cells may lead to a dose-dependent reduction of certain proteins of the repair pathway, leading to a significant accumulation of DNA damage (Shen et al., 2017). Accordingly, the observed increase in TRCs upon RNase H1 overexpression in wild-type cells may be attributed to the disruption of proteins that, by impairing the repair process, can potentially cause more fork stalling and, consequently, more conflicts. We have introduced a comment in the text.
  
  (4) The data are mostly phenomenological and fail to yield mechanistic insight. For example, the authors state that "it remains unclear whether WRNIP1 is directly involved in the mechanisms of R-loop removal/resolution". Unfortunately, the data presented in this manuscript do not provide new insights into this unresolved question.
  
  We agree with the reviewer that elucidating the mechanism by which WRNIP1 contributes to R-loop suppression would be of interest. Nevertheless, the findings presented here provide compelling evidence of a novel role for WRNIP1 in preventing R-loop accumulation. Investigating how WRNIP1 accomplishes this function will require significant effort, which we are committed to undertaking.
  
  (5) The authors only show merged images making it impossible to visualize differences in PLA foci.
  
  For a better visualization of nuclei signals in the PLA panels of Figs 4A, B, C, D and E; 6B, single-color images have been provided.
  
  In addition to including the controls I mentioned in the public review, I recommend investigating the mechanism of how WRNIP1 prevents R-loop accumulation. If it is indeed related to its UBZ domain, then does that mean ubiquitination is an important step in R-loop removal? I believe elucidating this would be a novel and significant contribution. If it's not related to ubiquitination, then how does the UBZ domain regulate R-loops?
  
  We agree with the reviewer that investigating the precise role of the UBZ domain of WRNIP1 in R-loop prevention would be of interest, and several experiments are required to adequately address this issue. However, as discussed, we hypothesize that the UBZ domain might contribute to directing WRNIP1 to DNA at TRC sites through RAD18.
  
  I recommend using purified RNH1-dead-GFP to detect R-loops as opposed to the S9.6 antibody. The Cimprich lab has published this recently as a tool for detecting R-loops in fixed cells.
  
  As explained in point 2), to eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we used treatment with RNase III, following established protocols (Crossley et al., 2020). New experiments are reported in the revised version of the manuscript for R-loops in all cell lines (see Fig. 3B of the revised manuscript).
  
  Additionally, colocalization by PLA of WRNIP1/R-loops, RAD51/R-loops, FANCD2/R-loops, and R-loop accumulation by anti-S9.6 antibody in cells depleted of FANCD2 are presented (see Figs. 4D and E; 6B and D of the revised manuscript).
  
  Furthermore, we repeated the dot-blot experiment (see Fig. 2B of the revised manuscript). We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells. Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.
  
  Importantly, overall, our findings suggest that treatment with RNase III does not substantially change the results obtained without it, but in some cases, such as in Fig. 4D, makes them more readily interpretable. Specifically, the new protocol allows us to visualize a higher number of PLA foci, and Aph increased the spots per nucleus in shWRNIP1D37A cells compared to the previous experiment (see Fig. 4D of the revised manuscript).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.23.546223v2
www.biorxiv.org www.biorxiv.org

New submission 12/09/2023, 08:42:40

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We would like to thank the reviewers for their thoughtful evaluation of our manuscript. We considered all the comments and prepared the revised version. The following are our responses to the reviewers’ comments. All references, including those in the original manuscript are included at the end of this point-by-point response.
  
  Reviewer #1 (Public Review):
  
  Weaknesses:
  
  1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.
  
  The reviewer inquires as to whether the microbial species described in this article are ubiquitously associated with Drosophila or not. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, yeasts such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These statements have been incorporated into our revised manuscript (lines 391-397). Nevertheless, the term “core” in the manuscript and title may lead to misunderstanding, as the generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we replaced the “core” with “key,” a term that is more appropriate to our context.
  
  2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?
  
  Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?
  
  The reviewer asked whether the microbial species detected from the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions would be needed, such as artificially introducing wild flies onto fresh bananas in the laboratory. Nevertheless, the microbes potentially originate from wild flies, as supported by the literature cited in our response to the Weakness 1).
  
  Alternative sources of microbes also merit consideration. For example, microbes may have been introduced to unfermented bananas by penetration through peel injuries (lines 1300-1301). In addition, they could be introduced by insects other than flies, given that rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. The explanation of these possibilities has been incorporated into DISCUSSION (lines 414-427) of our revised manuscript.
  
  Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?
  
  Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. For the traps where adult flies were caught, we identified the species of the drosophilids as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We added these descriptions in MATERIALS AND METHODS (lines 511-512 and 560-562), and DISCUSSION (lines 378-379).
  
  3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).
  
  Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.
  
  Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of the genes showed low expression levels. The RNA-seq data of the yeast-fed larvae are shown in Author response table 1. While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this may be due to intra-sample variability rather than the difference in the nutritional conditions. Similar expression profiles were observed in the bacteria-fed larvae as well (data not shown). Therefore, it is difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.
  
  Author response table 1.
  
  Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptides gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.
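
For readers unfamiliar with the TPM unit used in this table, the standard definition can be sketched as follows: each gene's read count is divided by its length, and the resulting rates are rescaled so that they sum to one million per sample. The gene names, counts, and lengths below are hypothetical, and the sketch does not reproduce the authors' actual quantification pipeline.

```python
def tpm(counts, lengths_kb):
    """Compute transcripts per million from read counts and gene lengths (kb)."""
    # Length-normalized rate for each gene.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Rescale so that all genes in the sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up counts and lengths, for illustration only.
counts = {"geneA": 100, "geneB": 300}
lengths_kb = {"geneA": 1.0, "geneB": 3.0}
values = tpm(counts, lengths_kb)
print(values)
```

Because TPM is a within-sample proportion, genes with very low counts yield small and noisy values, which is consistent with the difficulty of statistical comparison noted for the weakly expressed genes.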
  
  They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).
  
  We did not observe significant differences in the gene expression profiles of the larvae fed on different microbial species within bacteria or fungi, or between those fed on bacteria and those fed on fungi. For example, the gene expression profiles of larvae fed on the various supportive microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on various non-supportive microbes.
  
  Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria. Thus, it is challenging to discuss the potential differential impacts of yeast and bacteria on larval growth, if any.
  
  Author response image 1.
  
  Gene expression profiles of larvae fed on the various supporting microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which are obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisks in the label highlight “LAB + AAB” or “LAB” samples clustered separately from the other samples in those conditions; “” indicates a sample in a “LAB + AAB” condition (Lactiplantibacillus plantarum + Acetobacter orientalis), and “*” indicates a sample in a “LAB” condition (Leuconostoc mesenteroides). Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; Sa. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; BY4741, Saccharomyces cerevisiae BY4741 strain.
  
  4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?
  
  The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?
  
  Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with the microbiota in food samples. We found that adult flies and early-stage foods, as well as larvae and late-stage foods, harbored similar microbial species (Figure 1F). Additionally, previous studies examining the gut microbiota in wild adult flies have detected microbes belonging to the same species or taxa as those isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).
  
  While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we have cited the study by Dodge et al., 2023 in our revised manuscript and discussed the possibility that predominant microbes in adult flies may show a propensity for colonization (lines 410-413).
  
  Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?
  
  The reviewer inquires whether the supportive microbes in our study stimulate gut signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). Based on our RNA-seq data, this is unlikely. The aforementioned study demonstrated that seven protease genes are upregulated through Imd pathway stimulation by a bacterium that promotes larval growth. In our RNA-seq analysis, these seven genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response table 2A; Le. mes + A. ori in Author response table 2B). Rather, they exhibited a tendency to be upregulated by the presence of non-supportive microbes (St. bac or Pi. klu in Author response table 2A; La. pla in Author response table 2B).
  
  Author response table 2.
  
  Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.
  
  Reviewer #2 (Public Review):
  
  Weaknesses:
  
  The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.
  
  The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.
  
  The reviewer asks whether the isolated microbes were colonized in the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al. PLOS Biology, 2018), rather than larval stages. Developing larvae continually consume substrates which are already subjected to microbial fermentation and abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We have mentioned this point in DISCUSSION of the revised manuscript (lines 408-410).
  
  Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.
  
  While we recognize the importance of comprehensive mechanistic analysis, elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be a subject of future research.
  
  Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second-instar stage. This observation implies that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We mentioned a previously reported interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 433-436). LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of lactic acid to AAB culture plates, and the co-inoculation of AAB with LAB mutant strains defective in lactate production to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.
  
  Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.
  
  We appreciate the reviewer's recommendations. The explanation of the universality of our findings has been included in the revised DISCUSSION (lines 391-397). We have also added descriptions on the implication of compositional shifts occurring in adult microbiota (lines 404-413), possible inoculation routes of different microbes (lines 414-427), and hypotheses on the mechanism of larval growth promotion by yeasts (lines 469-476), all of which could be the focus of our future study.
  
  Reviewer #3 (Public Review):
  
  Weaknesses:
  
  Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?
  
  When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?
  
  We collected traps and early-stage samples 2.5 days after setting up the traps. This duration was determined from pilot experiments. A shorter collection time resulted in a lower likelihood of obtaining traps visited by adult flies, whereas a longer collection time caused overcrowding of larvae as well as deaths of adults from drowning in the liquid seeping out of the fruits. These procedural details have been included in the MATERIALS AND METHODS section of the revised manuscript (lines 523-526).
  
  What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?
  
  We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We have included these possibilities in the DISCUSSION section of the revised manuscript (lines 417-421).
  
  Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.
  
  We are grateful for the reviewer's insightful suggestion regarding shifts in the adult microbiome. We have included in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages or after adult eclosion (lines 404-413).
  
  Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used?
  
  In this metabolomic analysis, LC-MS/MS with triple quadrupole MS monitors the formation of fragment ions from precursor ions specific to each target compound. The use of PFPP columns, which provide excellent separation of amino acids and nucleobases, allows chromatographic peaks of many structural isomers to be separated into independent peaks. In addition, all measured compounds are compared with data from a standard library to confirm retention time agreement. Structural isomers were separated either by retention time on the column or by compound-specific MRM signals (in fact, leucine and isoleucine have both unique MRM channels and column separations). Detailed MRM conditions are identical to those of the previously published study (Oka et al., 2017). These details have been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 810-824).
  
  Were standard curves produced?
  
  Since relative quantification of metabolite amounts was performed in this study, no standard curve was generated to determine absolute concentrations. However, a standard compound of known concentration (single point) was measured to confirm retention time and relative area values.
  
  Were internal, deuterated controls used?
  
  Deuterium-labeled internal standards were not used in this study, because obtaining deuterium-labeled versions of every target compound is not realistic when a large number of compounds are measured. However, an internal standard (L-methionine sulfone) was added to the extraction solvent to calculate the recovery rate. This has been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 824-825).
  
  Reviewer #1 (Recommendations For The Authors):
  
  Additional comments 1. The authors should do a better job of presenting their data. It took me quite a while to understand the protocol of Figure 1. Panel 1A, B, C could be improved. For instance, 1A suggests that flies are transferred to the lab while this is in fact the banana trap. Indicate 'Banana trap colonized by flies' rather than 'wild-type flies in the trap'. 1C: should indicate that the food suspension comes from the banana trap. 1B,D,D: do not use pale color as legend. Avoid the use of indices in Figure 2 (Y1 rather than Y₁). Grey colors are difficult to distinguish in Figure 2. Etc. It is a pain for reviewers that figure legends are on the verso of each figure and not just below.
  
  We thank the reviewer for the detailed suggestions to improve the clarity and comprehensibility of our figures. We have improved the figures according to the suggestions. As for the figure legends, we have placed them below each respective figure whenever possible.
  
  Clarify in the text if 'sample' means food substratum or flies/larvae (ex. line 116 and elsewhere).
  
  We have revised the word “sample” throughout our manuscript and eliminated the confusion.
  
  Line 170 - clarify what you mean by fermented food.
  
  We have replaced the “fermented larval foods” with “fermented bananas” in our revised manuscript (line 165).
  
  Line 199 - what is the meaning of 'stocks'.
  
  We have replaced the “stocks” with “strains” (line 195).
  
  Line 320 - explain more clearly what the yeast-conditioned banana-agar plate and cell suspension supernatant are, and what the goals of using these media are. This will help in understanding the subsequent text.
  
  We have added a supplemental figure illustrating the sample preparation for the metabolomic analysis (Figure S6), with the following legend describing the procedure (lines 1335-1346): “Sample preparation process for the metabolomic analysis. We suspected that the supportive live yeast cells may release critical nutrients for larval growth, whereas the non-supportive yeasts may not. To test this possibility, we made three distinct sample preparations of individual yeast strains (yeast cells, yeast-conditioned banana-agar plates, and cell suspension supernatants). Yeast cells were for the analysis of intracellular metabolites, whereas yeast-conditioned banana-agar plates and cell suspension supernatants were for that of extracellular metabolites. The samples were prepared as the following procedures. Yeasts were grown on banana-agar plates for 2 days at 25°C, and then scraped from the plates to obtain “yeast cells.” Next, the remaining yeasts on the resultant plates were thoroughly removed, and a portion from each plate was cut out (“yeast-conditioned banana agar”). In addition, we suspended yeast cells from the agar plates into sterile PBS, followed by centrifugation and filtration to eliminate the yeast cells, to prepare “cell suspension supernatants.”
  
  Figure 5 is difficult to understand. Provide more explanation. Consider moving the 'all metabolites panel' to Supp. Better explain what this holidic medium is.
  
  The holidic medium is a medium that has been commonly used in the Drosophila research community, which contains ~40 known nutrients, and supports the larval development to pupariation (Piper et al., 2014; Piper et al., 2017). We have introduced this explanation to the RESULTS section of the manuscript (lines 322-327). However, the scope of our research reaches beyond the analysis of the holidic medium components, because feeding the holidic medium alone causes a significant delay in larval growth, suggesting a lack of nutritional components (Piper et al., 2014). Thus, we believe the "All Metabolites" panels should be placed alongside the corresponding “The holidic medium components” panels.
  
  I could not access Figure 6 when downloading the PDF. The page is white and an error message appears - it is problematic to review a paper lacking a figure.
  
  We regret any inconvenience caused, perhaps due to a system error. Please refer to the Author response image 2, which is identical to Figure 6 of our original manuscript.
  
  Author response image 2.
  
  Supportive yeasts facilitate larval growth by providing nutrients, including branched-chain amino acids, by releasing them from their cells (Figure 6 from the original manuscript). (A and B) Growth of larvae feeding on yeasts on banana agar supplemented with leucine and isoleucine. (A) The mean percentage of the live/dead individuals in each developmental stage. n=4. (B) The percentage of larvae that developed into second instar or later stages. The “Not found” population in Figure 6A was omitted from the calculation. Each data point represents data from a single tube. Unique letters indicate significant differences between groups (Tukey-Kramer test, p < 0.05). (C) The biosynthetic pathways for leucine and isoleucine with S. cerevisiae gene names are shown. The colored dots indicate enzymes that are conserved in the six isolated species, while the white dots indicate those that are not conserved. Abbreviations of genera are given in the key in the upper right corner. LEU2 is deleted in BY4741. (D-G) Representative image of Phloxine B-stained yeasts. The right-side images are expanded images of the boxed areas. The scale bar represents 50 µm. (H) Summary of this study. H. uvarum is predominant in the early-stage food and provides Leu, Ile, and other nutrients that are required for larval growth. In the late-stage food, AAB directly provides nutrients, while LAB and yeasts indirectly contribute to larval growth by enabling the stable larva-AAB association. The host larva responds to the nutritional environment by dramatically altering gene expression profiles, which leads to growth and pupariation. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; Pi. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; GF, germ-free.
  
  Line 323 - Consider rewriting this sentence (too long, explain what the holidic medium is and why this is interesting). "In the yeast-conditioned banana-agar plates, which were anticipated to contain yeast-derived nutrients, many well-known nutrients included in a chemically defined synthetic (holidic) medium for Drosophila melanogaster (Piper et al., 2014, 2017) were not increased compared to the sterile banana-agar plates; instead, they exhibited drastic decreases irrespective of the yeast species."
  
  We thank the reviewer for the suggestion to improve the readability of our manuscript. We have rewritten the sentence in the revised manuscript (lines 320-328) as follows: “The yeast-conditioned banana-agar plates were expected to contain yeast-derived nutrients. On the contrary, the result revealed a depletion of various metabolites originally present in the sterile banana agar (Figure 5A). This result prompted us to focus on the metabolites in the chemically defined (holidic) medium for Drosophila melanogaster (Piper et al., 2014; Piper et al., 2017). This medium contains ~40 known nutrients, and supports the larval development to pupariation, albeit at the half rate compared to that on a yeast-containing standard laboratory food (Piper et al., 2014; Piper et al., 2017). Therefore, the holidic medium could be considered to contain the minimal essential nutrients required for larval growth. Our analysis indicated a substantial reduction of these known nutrients in the yeast-conditioned plates compared to their original quantities (Figure 5B).”
  
  Reviewer #2 (Recommendations For The Authors):
  
  Suggestions for improved or additional experiments, data or analyses.
  
  It should be clearly shown (or stated) that isolated microbes, such as H. uvarum and Pa. agglomerans, are indigenous microbes in wild Drosophila melanogaster in their outdoor sampling.
  
  We thank the reviewer for the suggestions. Addressing the presence of isolated microbes within wild D. melanogaster adults is important, but is not feasible with our data for the following reasons. Our microbiota analysis of adults was conducted using pooled individuals of multiple Drosophila species, rather than using D. melanogaster exclusively. Moreover, the microbial isolation and the analysis of adult microbiota were carried out in two independent samplings (Figures 1A and 1E in the original manuscript, respectively). As a result, the microbial species detected in the adults were slightly different from those isolated from the food samples collected in the previous sampling. Nevertheless, it is worth noting that H. uvarum dominated in 2 out of the 3 adult samples, constituting >80% of the fungal composition. Pantoea agglomerans was not detected in the adults, although Enterobacterales accounted for >59% in 2 out of the 3 samples. Therefore, these isolated microbial species, or at least their phylogenetically related species, are presumed to be indigenous to wild D. melanogaster.
  
  If the reviewer’s suggestion was to state the dominance of H. uvarum and Pantoea agglomerans in early-stage foods, we have added a supplemental figure showing the species-level microbial compositions corresponding to Figure 1B of the original manuscript (Figure S1), and further revised the manuscript (lines 180-186).
  
  The reviewer supposes that the indigenous microbes of flies may differ from what they usually eat. In this study, the authors use banana-based food, but is it justified in terms of the natural environment of the places where those microbes were isolated? In other words, did sampled wild flies eat bananas outside the laboratory at Kyoto University?
  
  Drosophila spp. inhabit human residential areas and feed on various fermented fruits and vegetables. In the areas surrounding Kyoto University, they can be found in garbage in residential dwellings as well as supermarkets. In this regard, fruits are natural food sources of wild Drosophila in the area.
  
  Among various fruits, bananas were selected based on the following two reasons. Firstly, bananas were commonly used in previous Drosophila studies as a trap bait or a component of Drosophila food (Anagnostou et al., 2010; Stamps et al., 2012; Consuegra et al., 2020). Secondly, and rather practically, bananas can be obtained in Japan all year at a relatively low cost. Previous studies have used various fruits such as grapes (Quan and Eisen, 2018), figs (Pais et al., 2018), and raspberries (Cho and Rohlfs, 2023). However, these fruits are only available during limited seasons and are more expensive per volume than bananas. Thus, they were not practical for our study, which required large amounts of fruit-based culture media. We have included a brief explanation regarding this point in MATERIALS AND METHODS (lines 514-518).
  
  In Fig. 6B, the Leu and Ile experiment, is the added amount of those amino acids appropriate in the context that they mention "...... supportive yeasts had concentrations of both leucine and isoleucine that were at least four-fold higher than those of non-supportive yeasts"?
  
  We acknowledge that the supplementation should be carried out ideally in a quantity equivalent to the difference between the released amounts of supportive and non-supportive species. However, achieving this has been highly challenging. Previous studies determined the amount of amino acid supplementation by quantifying their concentration in the bacteria-conditioned media (Consuegra et al., 2020; Henriques et al., 2020). However, we found that quantifying the exact concentrations of the amino acids is not feasible with our yeasts. As shown in Figure 5B in the original manuscript, the amino acid contents were markedly reduced in the yeast-conditioned banana agar compared to the agar without yeasts, presumably because of the uptake by the yeasts. Thus, the amino acids released from yeast cells on the banana-agar plate are not expected to accumulate in the medium. As this reviewer pointed out, in the cell suspension supernatants of the supportive yeasts, concentrations of both leucine and isoleucine were at least four-fold higher compared to those of non-supportive yeasts (Figures 5G-H in the original submission). However, this measurement does not give the absolute amount of either amino acid available for larvae. Given these constraints, we opted for the amino acid concentrations in the holidic medium, which support larval growth under axenic conditions (Piper et al., 2014). We also showed that the supplementation of the amino acids at that concentration to the banana-agar plate was not detrimental to larval growth (Figures 6A-B in the original manuscript). These rationales have been included in the revised ‘Developmental progression with BCAA supplementation’ section in MATERIALS AND METHODS of our manuscript (lines 840-847).
  
  In addition to the above, other amino acids or nutrients could be included as control experiments.
  
  As mentioned in our manuscript (lines 365-368), we did supplement other amino acids, lysine and asparagine, which failed to rescue the larval growth.
  
  In the experiment of Fig. 2E, how about examining larval development using heat-killed LAB or yeast with live AAB? The reviewer speculates that one possibility is that AAB needs nutrients from LAB.
  
  We did not feed larvae with heat-killed LAB and live AAB for the following reasons. LAB grows very poorly on banana agar compared to yeasts, and preparation of LAB required many banana-agar plates even when we fed live bacteria to larvae. Feeding dead LAB would require far more plates, which makes this preparation impractical. Furthermore, heat-killing may not allow the investigation of the contribution of heat-unstable or volatile compounds.
  
  As for the reviewer's suggestion regarding the addition of heat-killed yeast with AAB, heat-killed yeast itself promotes larval growth, as shown in Figures 4G and 4H in the original manuscript, so the contribution of yeast cannot be examined using this method.
  
  Recommendations for improving the writing and presentation.
  
  It would be good to mention that during sample collection, other insects (other than Drosophila species) were not found in the food if this is true.
  
  Insects other than Drosophila spp. were found in several traps in the sampling shown in Figures 1C-F. These insects, rove beetles (Staphylinidae) and sap beetles (Nitidulidae), seemed to share a niche with Drosophila in nature. Therefore, we believe that the contamination of these insects did not interfere with our goal of obtaining larval food samples. We added these descriptions and explanations to MATERIALS AND METHODS (lines 527-531).
  
  There are many different kinds of bananas. Detailed information on the bananas used should be provided.
  
  We had already included the information on the banana in the MATERIALS AND METHODS section (line 622).
  
  Concerning the place of sample collection, detailed longitude, and latitude information can be provided (this is easily obtained from Google Maps). When the collection was performed should also be mentioned. This may suggest the environment of the "wild flies" they collected.
  
  We added a table listing the dates of our collections, along with the longitude and latitude of each sampling place (Table S1A).
  
  The reviewer could not find how the authors conducted heat killing of yeast.
  
  We added the following procedure to the ‘Quantification of larval development’ section in MATERIALS AND METHODS (lines 680-688). “When feeding heat-killed yeasts to larvae, yeasts were added to the banana-agar tubes and subsequently heated as following procedures. The yeasts were revived from frozen stocks on banana-agar plates, incubated at 25°C, and then streaked on fresh agar plates. After 2-day incubation, yeast cells were scraped from the plates and suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions were added to banana-agar tubes prepared as described, and after centrifugation at 3,000 x g for 5 min, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts, which compensates for the reduced amount due to their inability to proliferate. The tubes were subsequently heated at 80°C for 30 min before adding germ-free larvae.”
  
  The reviewer prefers that all necessary information on how to see figures be provided in figure legends. For example, an explanation of some abbreviations is missing.
  
  We carefully re-examined the figure legends and added necessary information.
  
  Many of the figures are not kind to readers, i.e., one needs to refer to the legends and main text very frequently. Adding subheadings (titles) to each figure may help.
  
  We added subheadings to our figures to improve the comprehensibility.
  
  Reviewer #3 (Recommendations For The Authors):
  
  I have some minor questions/suggestions about the manuscript that, if addressed, may increase the clarity and quality of the work.
  
  Please, when referring to microbial species in the abbreviated form, use only the first letter of the genus. For example, P. agglomerans should be used, not Pa. agglomerans.
  
  We are concerned about the potential confusion caused by using only the first letter of genera, since several genera mentioned in our work share the first letters, such as P (Pichia and Pantoea), S (Starmerella, Saccharomyces, and Saccharomycopsis), or L (Lactiplantibacillus and Leuconostoc). Therefore, we used only the unabbreviated form of the above seven genera in our revised manuscript. We have also made every effort to avoid abbreviations in our figures and tables, but found it necessary to retain two-letter abbreviations when spaces are particularly limiting.
  
  In lines 294-298, how exactly was the experiment where yeasts were killed by anti-fungal agents performed? If these agents killed the yeast, how was the microbial growth on plates required to have biomass for fly inoculation obtained? Please, clarify this section.
  
  The yeasts were grown on normal banana-agar plates before the addition onto the anti-fungal agents-containing banana agar. We added the following procedure to MATERIALS AND METHODS (lines 689-695). “When feeding yeasts on banana agar supplemented with antifungal agents, the yeasts were individually grown on normal banana agar twice before being suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions was introduced onto the anti-fungal agents (10 mL/L 10% p-hydroxybenzoic acid in 70% ethanol and 6 mL/L propionic acid, following the concentration described in Kanaoka et al., 2023)-containing banana agar in 1.5 mL tubes. After centrifugation, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts.”
  
  In lines 557-558, please clarify how rDNA copy numbers can be calculated in this way.
  
  Considering the results of the ITS and 16S sequencing analysis, it was highly likely that rDNAs from bananas and Drosophila were amplified along with microbial rDNA in this qPCR. To estimate the microbial rDNA copy number, we assumed that the proportion of microbial rDNA within the total amplification products remains consistent between the qPCR and the corresponding sequencing analysis, because the template DNA samples and amplified regions were shared between the analyses. Based on this, the copy number of microbial rDNA was estimated by multiplying the qPCR results with the microbial rDNA ratio observed in the ITS or 16S sequencing analysis of each sample. This methodology has been detailed in the MATERIALS AND METHODS section (lines 609-615).
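  The scaling described above can be illustrated with a minimal sketch. It assumes, as the response states, that the microbial share of amplification products is the same in the qPCR and in the sequencing run; the function name, argument names, and example numbers are hypothetical and are not taken from the manuscript.

```python
def estimate_microbial_rdna_copies(total_copies, microbial_reads, total_reads):
    """Scale a qPCR copy number by the microbial fraction of sequencing reads.

    All names here are illustrative assumptions, not the authors' code.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= microbial_reads <= total_reads:
        raise ValueError("microbial_reads must lie between 0 and total_reads")
    microbial_fraction = microbial_reads / total_reads
    return total_copies * microbial_fraction


# Hypothetical sample: 1,000,000 rDNA copies by qPCR, and 2,500 of 10,000
# ITS (or 16S) reads assigned to microbes rather than to banana or Drosophila.
example = estimate_microbial_rdna_copies(1_000_000, 2_500, 10_000)
print(example)  # prints 250000.0
```

  The estimate is only as good as the shared-proportion assumption, which is why the same template DNA and amplified region need to be used in both analyses.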
  
  In lines 609-611, how did you check for cells left from the previous day? Microscopy? Or do you mean that if there was liquid still in the sample you would not add more bacterial cultures? Please, clarify.
  
  We observed with the naked eye from outside the tubes to determine if additional AAB should be introduced. Since we placed AAB on the banana agar in a lump, we examined whether the lumps were gone or not. We have added these procedures in MATERIALS AND METHODS (lines 671-673).
  
  In Figure 2A, it is hard to differentiate between the gray tones. Please, improve this.
  
  We have distinguished the plots for different conditions by changing the shape of the markers on the graphs.
  
  In the legend of Figure 4, line 1101, I believe the panel letters are incorrect.
  
  We have corrected the manuscript (lines 1241-1242) from “heat-killed yeasts on banana agar (H and I) or live yeasts on a nutritionally rich medium (J and K)” to “heat-killed yeasts on banana agar (G and H) or live yeasts on a nutritionally rich medium (I and J).”
  
  In Figure S1, authors showed that bananas that were not inoculated still had detectable rDNA signal. Is this really because bacteria can penetrate the peel? Or could this be the “reagent microbiome”? Alternatively, could these microbes have been introduced during sample prep, such as cutting the bananas?
  
  The detection of rDNA in bananas that were not inoculated with microbes was unlikely to be due to microbial contamination during experimental manipulation. The reviewer pointed out the possibility that the “reagent microbiome”, presumably the microbes in PBS, are detected from the uninoculated bananas. This seems to be unlikely, considering the PBS was sterilized by autoclaving before use. To ensure that no viable microbe was left in the autoclaved PBS, we applied a portion of the PBS onto a banana-agar plate and confirmed no colony was formed after incubation for a few days. DNA derived from dead microbes might be present in the PBS, but the PBS-added bananas were incubated for 4 days, so it is also unlikely that a detectable amount of DNA remained until sample collection. Furthermore, we believe that no contamination occurred during sample preparation. Banana peels were treated with 70% ethanol before removing them extremely carefully to avoid touching the fruit inside. All tools were sterilized before use. Taking all of these into account, we speculate that the microbes were already present in the bananas before peeling. We added the details of the sample preparation processes in MATERIALS AND METHODS (lines 518-521 and 540).
  
  Other major revisions
  
  We deposited our yeast genome annotation data in the DDBJ Annotated/Assembled Sequences database, and the accession numbers have been added to the ‘Data availability’ section in MATERIALS AND METHODS (lines 868-873).
  
  The bacterial composition data in Figure 1B was corrected, because in the original version, the data for Place 3 and Place 4 was plotted in reverse. The original and revised plots are shown side by side in Author response image 3. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p5, lines 117-120).
  
  Author response image 3.
  
  Comparison of the original and revised version of bacterial composition graph in Figure 1B. Comparison of the original (left) and revised (right) version of the graph at the bottom of Figure 1B, which shows the result of bacterial composition analysis. The color key, which is unmodified, is placed below the revised version.
  
  The plot data and labels in the RNA-seq result heatmaps (Figures 3A and 4C) have been corrected. In these figures, row Z-scores of log2(TPM + 1) were to be plotted, as indicated by the key in each figure. However, in the original version, row Z-scores of TPM were erroneously plotted. Thus, Figures 3A and 4C of the original version have been replaced with the correct plots, and the original and revised plots are shown side by side in Author response images 4A and 4B. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 222-226 and p9, lines 277-281).
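  The difference between the two quantities can be sketched as follows. This is an illustration only: the TPM values are made up, the function names are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, since the response does not say which variant the plotting software applied.

```python
import math
import statistics


def row_zscores(values):
    """Return (value - row mean) / row SD for one heatmap row."""
    if len(values) < 2:
        raise ValueError("need at least two samples per row")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    if sd == 0:
        return [0.0] * len(values)  # flat row: no variation to display
    return [(v - mean) / sd for v in values]


def log_tpm_row_zscores(tpm_row):
    """Row Z-scores of log2(TPM + 1), the quantity named in the figure key."""
    return row_zscores([math.log2(t + 1) for t in tpm_row])


# Hypothetical TPM values for one gene across three samples.
tpm = [0, 1, 3]
print(log_tpm_row_zscores(tpm))  # Z-scores of log2(TPM + 1)
print(row_zscores(tpm))          # Z-scores of raw TPM, for comparison
```

  Because the log transform compresses large values, the two versions generally differ in magnitude within a row while preserving the ordering of samples.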
  
  Author response image 4.
  
  Comparison of the original and revised version of Figures 3A and 4C. (A and B) Comparison of the original (left) and revised (right) version of Figures 3A (A) or 4C (B).
  
  The keys in the original Figures 3D and 4F indicate that log2(fold change) was used to plot all data. However, when plotting the data from the previous study (Zinke et al., 2002), their “fold change value” was used. We have corrected the keys, plots, and legend of Figure 3D to reflect the different nature of the data from our RNA-seq analysis and those from microarray analysis by Zinke et al. The original and revised plots are shown side by side in Author response image 5. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 228-230 and p9, lines 277-284).
  
  Author response image 5.
  
  Comparison of the original and revised version of Figures 3D and 4F. (A and B) Comparison of the original (left) and revised (right) version of Figures 3D (A) or 4F (B).
  
  The labels in Figure S5C and S5D (Figure S4C and S4D in the original version) have been corrected (they are "Pichia kluyveri > Supportive" and "Starmerella bacillaris > Supportive" rather than "Non-support. > H. uva" and "Non-support. > K. hum"). Additionally, we have reintroduced the circle indicating the number of “dme04070: Phosphatidylinositol signaling system” DEGs in Figure S5D, which was missing in Figure S4D of the original version. The original and revised figures are shown in Author response image 6.
  
  Author response image 6.
  
  Comparison of the original and revised version of Figures S5C and S5D. (A and B) Comparison of the original (left) and revised (right) versions of Figures S5C (A) or S5D (B). The original figures corresponding to the aforementioned figures were Figures S4C and S4D, respectively.
  
  The "Fermentation stage" column in Table 1, which indicated whether each microbe was considered an early-stage microbe or a late-stage microbe, has been removed to avoid confusion. This is because some of the microbes (Hanseniaspora uvarum, Pichia kluyveri, and Pantoea agglomerans) were employed in both of the feeding experiments using the microbes detected from the early-stage foods (Figures 2A, 2B, S2A, and S2B) and those from the late-stage foods (Figures 2C, 2D, S2C, and S2D).
  
  The leftmost column in Table S7 has been edited to indicate species names rather than “Sample IDs,” because the IDs were not used anywhere else in the paper.
  
  References
  
  Chandler, J. A., Lang, J., Bhatnagar, S., Eisen, J. A. and Kopp, A. (2011). Bacterial communities of diverse Drosophila species: Ecological context of a host-microbe model system. PLoS Genetics 7, e1002272.
  
  Chandler, J. A., Eisen, J. A. and Kopp, A. (2012). Yeast communities of diverse Drosophila species: Comparison of two symbiont groups in the same hosts. Applied and Environmental Microbiology 78, 7327–7336.
  
  Cho, H. and Rohlfs, M. (2023). Transmission of beneficial yeasts accompanies offspring production in Drosophila—An initial evolutionary stage of insect maternal care through manipulation of microbial load? Ecology and Evolution 13, e10184.
  
  Consuegra, J., Grenier, T., Akherraz, H., Rahioui, I., Gervais, H., da Silva, P. and Leulier, F. (2020). Metabolic Cooperation among Commensal Bacteria Supports Drosophila Juvenile Growth under Nutritional Stress. iScience 23, 101232.
  
  Dodge, R., Jones, E. W., Zhu, H., Obadia, B., Martinez, D. J., Wang, C., Aranda-Díaz, A., Aumiller, K., Liu, Z., Voltolini, M., et al. (2023). A symbiotic physical niche in Drosophila melanogaster regulates stable association of a multi-species gut microbiota. Nat Commun 14, 1557.
  
  Erkosar, B., Storelli, G., Mitchell, M., Bozonnet, L., Bozonnet, N. and Leulier, F. (2015). Pathogen Virulence Impedes Mutualist-Mediated Enhancement of Host Juvenile Growth via Inhibition of Protein Digestion. Cell Host & Microbe 18, 445–455.
  
  Hanson, M. A. and Lemaitre, B. (2020). New insights on Drosophila antimicrobial peptide function in host defense and beyond. Current Opinion in Immunology 62, 22–30.
  
  Henriques, S. F., Dhakan, D. B., Serra, L., Francisco, A. P., Carvalho-Santos, Z., Baltazar, C., Elias, A. P., Anjos, M., Zhang, T., Maddocks, O. D. K., et al. (2020). Metabolic cross-feeding in imbalanced diets allows gut microbes to improve reproduction and alter host behaviour. Nat Commun 11, 4236.
  
  Oka, M., Hashimoto, K., Yamaguchi, Y., Saitoh, S., Sugiura, Y., Motoi, Y., Honda, K., Kikko, Y., Ohata, S., Suematsu, M., et al. (2017). Arl8b is required for lysosomal degradation of maternal proteins in the visceral yolk sac endoderm of mouse embryos. Journal of Cell Science jcs.200519.
  
  Pais, I. S., Valente, R. S., Sporniak, M. and Teixeira, L. (2018). Drosophila melanogaster establishes a species-specific mutualistic interaction with stable gut-colonizing bacteria. PLOS Biology 16, e2005710.
  
  Piper, M. D. W., Blanc, E., Leitão-Gonçalves, R., Yang, M., He, X., Linford, N. J., Hoddinott, M. P., Hopfen, C., Soultoukis, G. A., Niemeyer, C., et al. (2014). A holidic medium for Drosophila melanogaster. Nature Methods 11, 100–105.
  
  Piper, M. D. W., Soultoukis, G. A., Blanc, E., Mesaros, A., Herbert, S. L., Juricic, P., He, X., Atanassov, I., Salmonowicz, H., Yang, M., et al. (2017). Matching Dietary Amino Acid Balance to the In Silico-Translated Exome Optimizes Growth and Reproduction without Cost to Lifespan. Cell Metab 25, 610–621.
  
  Quan, A. S. and Eisen, M. B. (2018). The ecology of the drosophila-yeast mutualism in wineries. PLOS ONE 13, e0196440.
  
  Solomon, G. M., Dodangoda, H., McCarthy-Walker, T. T., Ntim-Gyakari, R. R. and Newell, P. D. (2019). The microbiota of Drosophila suzukii influences the larval development of Drosophila melanogaster. PeerJ 7, e8097.
  
  Zinke, I., Schütz, C. S., Katzenberger, J. D., Bauer, M. and Pankratz, M. J. (2002). Nutrient control of gene expression in Drosophila: microarray analysis of starvation and sugar-dependent response. The EMBO Journal 21, 6162–6173.
  
URL

biorxiv.org/content/10.1101/2023.08.01.551563v1
www.biorxiv.org

Interpretable Protein-DNA Interactions Captured by Structure-Sequence Optimization

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  Comment 0: Summary: This work presents an Interpretable protein-DNA Energy Associative (IDEA) model for predicting binding sites and affinities of DNA-binding proteins. Experimental results demonstrate that such an energy model can predict DNA recognition sites and their binding strengths across various protein families and can capture the absolute protein-DNA binding free energies.
  
  We appreciate the reviewer’s careful assessment of the paper, and we thank the reviewer for the insightful suggestions and comments.
  
  Comment 1: Strengths: (1) The IDEA model integrates both structural and sequence information, although such an integration is not completely original. (2) The IDEA predictions seem to have agreement with experimental data such as ChIP-seq measurements.
  
  We appreciate the reviewer’s positive comments on the strength of the paper.
  
  Comment 2: Weaknesses: (1) The authors claim that the binding free energy calculated by IDEA, trained using one MAX-DNA complex, correlates well with experimentally measured MAX-DNA binding free energy (Figure 2) based on the reported Pearson Correlation of 0.67. However, the scatter plot in Figure 2A exhibits distinct clustering of the points and thus the linear fit to the data (red line) may not be ideal. As such, the use of the Pearson correlation coefficient that measures linear correlation between two sets of data may not be appropriate and may provide misleading results for non-linear relationships.
  
  We thank the reviewer for the insightful comments and agree that a linear fit between our predictions and the experimental data may not be the best measure of performance. The primary utility of the IDEA model is to predict high-affinity DNA-binding sequences for a given DNA-binding protein by assessing the relative binding affinities across different DNA sequences. In this regard, the ranked order of predicted sequence binding affinities serves as a better metric for evaluating the success of this model. To evaluate this, we calculated both Spearman’s rank correlation coefficient, which does not rely on linear correlation, and the Pearson correlation coefficient between our predictions and the experimental results. As shown in Figure 2, our computation shows a Spearman’s rank correlation coefficient of 0.65 for the MAX-based predictions using one MAX-DNA complex (PDB ID: 1HLO), supporting the model’s capability to effectively distinguish strong from weak binders.
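  The difference between the two coefficients discussed above can be illustrated with a minimal, standard-library-only sketch. The data below are hypothetical and are not taken from the manuscript; they only show that a monotonic but non-linear relationship yields a perfect rank correlation while the Pearson coefficient stays below 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: predictions are monotonic in the measurement, not linear.
measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [m ** 3 for m in measured]

r_pearson = pearson(measured, predicted)
r_spearman = spearman(measured, predicted)
```

  Because the rank-based coefficient depends only on ordering, it matches the stated goal of distinguishing strong from weak binders more directly than a linear fit does.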
  
  Although our model generally captures the relative binding affinities across different DNA sequences, its predictive accuracy diminishes for low-affinity sequences (Figure 2).
  
  This could be due to two limitations of the current modeling framework: (1) The model is residue-based and estimates binding free energy as the additive sum of contributions from individual contacting amino-acid-nucleotide pairs. This assumption does not account for cooperative effects caused by simultaneous changes at multiple nucleotide positions. One potential direction to further improve the model would be to use a finer-grained representation by incorporating more atom types within contacting residues, and to use a many-body potential to better capture cooperative effects from multiple mutations. (2) The model assumes that the target DNA adopts the same binding interface as in the reference crystal structure. However, sequence-dependent DNA shape has been shown to be important in determining protein-DNA binding affinity [1]. To address this limitation, a future direction is to use deep-learning-based methods to incorporate predicted DNA shape or protein-DNA complex structures based on their sequences [2, 3] into our model prediction.
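  The additivity assumption in limitation (1) can be made concrete with a small sketch. Everything here is hypothetical: the energy values, the contact list, and the names `gamma` and `binding_energy` are illustrative and are not the IDEA implementation. The sketch only shows that, under a purely additive pairwise model with a fixed contact map, the effect of a double mutation equals the sum of the single-mutation effects.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
NUCLEOTIDES = "ACGT"

# Hypothetical 20-by-4 energy matrix; real values would come from training.
gamma = {(aa, nt): 0.0 for aa in AMINO_ACIDS for nt in NUCLEOTIDES}
gamma[("R", "C")] = -1.0
gamma[("K", "A")] = -0.5
gamma[("E", "A")] = 0.3


def binding_energy(contacts, protein_seq, dna_seq, energy_matrix):
    """Additive energy: sum over contacting (protein index, DNA index) pairs."""
    return sum(energy_matrix[(protein_seq[i], dna_seq[j])] for i, j in contacts)


# Contacts come from one reference structure and are reused for every DNA
# sequence, mirroring the fixed-interface assumption in limitation (2).
contacts = [(0, 0), (1, 1), (2, 1)]
protein = "RKE"

e_ref = binding_energy(contacts, protein, "CA", gamma)
e_mut1 = binding_energy(contacts, protein, "GA", gamma)    # change position 0
e_mut2 = binding_energy(contacts, protein, "CT", gamma)    # change position 1
e_double = binding_energy(contacts, protein, "GT", gamma)  # change both

# In an additive model the double-mutant effect equals the sum of the
# single-mutant effects, so cooperativity cannot be represented.
ddg_sum = (e_mut1 - e_ref) + (e_mut2 - e_ref)
ddg_double = e_double - e_ref
```

  A many-body potential, as suggested in the response, would add terms that depend on more than one nucleotide at a time and would break this equality.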
  
  To fully evaluate the predictive power of IDEA, we have included Spearman’s rank correlation coefficient for every correlation plot in this manuscript and have updated the relevant texts. Across all our analyses, the Spearman’s rank correlation coefficients reveal similar predictive performance as the Pearson correlation coefficients. Additionally, we have included in our discussion the current limitations of our model and potential directions for future improvement.
  
  We have edited our Discussion Section to include a discussion on the limitations of the current model. Specifically, the added texts are:
  
  “Although IDEA has proved successful in many examples, it can be improved in several aspects. The model currently assumes the training and testing sequences share the same protein-DNA structure. While double-stranded DNA is generally rigid, recent studies have shown that sequence-dependent DNA shape contributes to their binding specificity [1, 2, 4]. To improve predictive accuracy, one could incorporate predicted DNA shapes or structures into the IDEA training protocol. In addition, the model is residue-based and evaluates the binding free energy as the additive sum of contributions from individual amino-acid-nucleotide contacts. This assumption does not account for cooperative effects that may arise from multiple nucleotide changes. A potential refinement could utilize a finer-grained model that includes more atom types within contacting residues and employs a many-body potential to account for such cooperative effects.”
  
  Comment 3: (2) In the same vein, the linear Pearson Correlation analysis performed in Figure 5A and the conclusion drawn may be misleading.
  
  We thank the reviewer for the insightful comments. As noted in our response to the previous comment, we have added Spearman’s rank correlation coefficient in addition to the Pearson correlation coefficient to all correlation plots, including Figure 5A.
  
  Comment 4: (3) The authors included the sequences of the protein and DNA residues that form close contacts in the structure in the training dataset, whereas a series of synthetic decoy sequences were generated by randomizing the contacting residues in both the protein and DNA sequences. In particular, synthetic decoy binders were generated by randomizing either the DNA (1000 sequences) or protein sequences (10,000 sequences) from the strong binders. However, the justification for such randomization and how it might impact the model’s generalizability and transferability remain unclear.
  
  We thank the reviewer for the insightful comments. The number of randomized sequences was chosen to strike a balance between sufficient sequence coverage and computational feasibility. Because there are more amino acid types (20) than nucleotide types (4), we used more protein decoy sequences than DNA decoys. To examine the robustness of our choice against different numbers of decoy sequences, we repeated the transferability analysis within the bHLH superfamily (Figure 3A) and the generalizability analysis across 12 protein families (Figure 2E) using two additional decoy sequence combinations: (1) 1000 DNA sequences and 1000 protein sequences; (2) 100 DNA sequences and 1000 protein sequences. As shown in Figure S15, we achieved similar results to those reported using the original decoy set, demonstrating the robustness of our model prediction against variations in the number of decoys. We have included this figure as Figure S15.
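  One plausible way to generate such decoys is sketched below. This is an assumption for illustration only: the response does not specify whether contacting residues are shuffled or resampled, so the sketch resamples each contacting position uniformly from the alphabet. The function name `make_decoys`, the example sequences, and the contact positions are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGT"


def make_decoys(sequence, contact_positions, alphabet, n_decoys, seed=0):
    """Return n_decoys copies of sequence with contacting positions resampled.

    Positions outside contact_positions are left unchanged. Uniform
    resampling is an assumption; a shuffle would be an alternative.
    """
    rng = random.Random(seed)
    decoys = []
    for _ in range(n_decoys):
        chars = list(sequence)
        for pos in contact_positions:
            chars[pos] = rng.choice(alphabet)
        decoys.append("".join(chars))
    return decoys


# Hypothetical strong binder: a short DNA site and a short protein fragment.
dna = "CACGTG"
dna_contacts = [1, 2, 3, 4]
protein = "RRNHNLER"
protein_contacts = [0, 3, 6]

dna_decoys = make_decoys(dna, dna_contacts, NUCLEOTIDES, n_decoys=1000)
protein_decoys = make_decoys(protein, protein_contacts, AMINO_ACIDS, n_decoys=10000)
```

  The 1000 and 10,000 counts follow the original decoy set quoted in the reviewer's comment; the alternative combinations tested for Figure S15 would only change `n_decoys`.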
  
  Comment 5: (4) The authors performed Receiver Operating Characteristic (ROC) analysis and reported the Area Under the Curve (AUC) scores in order to quantitate the successful identification of the strong binders by IDEA. It would be beneficial to analyze the precision-recall (PR) curve and report the PRAUC metric which could be more robust.
  
  We agree with the reviewer that more robust statistical metrics should be used to evaluate our model’s performance. We have included the PRAUC score as an additional evaluation metric of the model’s performance. Due to a significant imbalance in the number of strong and weak binders from the experimental data [5], where the experimentally identified strong binders are far fewer than the weak binders, we reweighted the sample to achieve a balanced evaluation [6], using 0.5 as the baseline for randomized prediction. As shown in Figure S5, IDEA achieves successful predictions in 18 out of 22 cases, demonstrating its predictive accuracy.
  
  The updated PRAUC result has been included as Figure S5 in the manuscript. We have also included the detailed precision-recall curves for each case in Figure S4.
  
  In addition, we have provided PRAUC scores for comparing the performance of IDEA with other models, and have summarized these results in Table S2.
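  To illustrate how reweighting can move the random-prediction baseline of a precision-recall score to 0.5, here is a minimal sketch. It is an assumption about the general idea, not the procedure of reference [6] or the authors' code: each strong binder is weighted by the ratio of weak to strong binders, and the weighted precision is averaged over recall steps. The function name and data are hypothetical, and scores are assumed to be "higher means stronger" (for an energy model, the negated energy).

```python
def balanced_average_precision(scores, labels):
    """Average precision with positives up-weighted to balance the classes.

    labels are 1 for strong binders and 0 for weak binders; at least one
    of each is required. Tied scores are treated as one threshold step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    w_pos = n_neg / n_pos  # each positive counts as n_neg / n_pos negatives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp_weight = 0.0
    fp_weight = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1
        group = order[i:j]
        pos_in_group = sum(labels[k] for k in group)
        tp_weight += w_pos * pos_in_group
        fp_weight += len(group) - pos_in_group
        precision = tp_weight / (tp_weight + fp_weight)
        ap += precision * pos_in_group / n_pos  # precision times recall step
        i = j
    return ap


# Hypothetical imbalanced data: 2 strong binders among 10 sequences.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
perfect_scores = [9.0, 8.0, 5.0, 4.5, 4.0, 3.0, 2.5, 2.0, 1.0, 0.5]
flat_scores = [1.0] * 10  # an uninformative predictor

ap_perfect = balanced_average_precision(perfect_scores, labels)
ap_flat = balanced_average_precision(flat_scores, labels)
```

  Without reweighting, the uninformative predictor would score at the positive-class fraction (0.2 in this toy example), which makes results hard to compare across proteins with different class ratios.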
  
  Reviewer #2:
  
  Comment 0: Summary: Zhang et al. present a methodology to model protein-DNA interactions via learning an optimizable energy model, taking into account a representative bound structure for the system and binding data. The methodology is sound and interesting. They apply this model for predicting binding affinity data and binding sites in vivo. However, the manuscript lacks discussion of/comparison with state-of-the-art and evidence of broad applicability. The interpretability aspect is weak, yet over-emphasized.
  
  We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.
  
  Comment 1: Strengths: The manuscript is well organized with good visualizations and is easy to follow. The methodology is discussed in detail. The IDEA energy model seems like an interesting way to study a protein-DNA system in the context of a given structure and binding data. The authors show that an IDEA model trained on one system can be transferred to other structurally similar systems. The authors show good performance in discriminating between binding-vs-decoy sequences for various systems, and binding affinity prediction. The authors also show evidence of the ability to predict genome-wide binding sites.
  
  We appreciate the reviewer’s strong assessment of the strengths of this paper. We have further refined our Methods Section to ensure all modeling details are clearly presented.
  
  Comment 2: Weaknesses: An energy-based model that needs to be optimized for specific systems is inherently an uncomfortable idea. Is this kind of energy model superior to something like Rosetta-based energy models, which are generally applicable? Or is it superior to family-specific knowledge-based models? It is not clear.
  
  We thank the reviewer for the insightful comments. The protein-DNA energy model facilitates the calculation of protein-DNA binding free energy based on protein-DNA structures and sequences. Because this model is optimized using the structure-sequence relationship of given protein-DNA complexes, it features specificity based on the conserved structural interface characteristic of each protein family. Because of that, its predictive accuracy depends on the degree of protein-DNA interface similarity between the training and target protein-DNA pairs, and is distinct from a general protein-DNA energy model, such as a Rosetta-based energy model. The model has some connections to the family-specific energy model. As shown in Author response image 1, systems belonging to the same protein superfamily (MAX and PHO4) exhibit similar patterns in their learned energy models, in contrast to those from a different superfamily (PDX1).
  
  Author response image 1:
  
  Comparison of learned energy models for different protein-DNA complexes: MAX (A), PHO4 (B), and PDX1 (C). MAX and PHO4 are members of the Helix-loop-helix (HLH) CATH protein superfamily (4.10.280.10), while PDX1 belongs to another Homeodomain-like CATH protein superfamily (1.10.10.60).
  
  To compare our approach with both general and family-specific knowledge-based energy models, we conducted two studies. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide (e.g., phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups). For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with this generic one to test its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. As shown in Figure S6, the IDEA model generally achieves better performance than the generic energy model.
  
  Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families.
  
  As shown in Table S1 and Table S2, IDEA also shows better performance than rCLAMPS in most cases across the C2H2 and homeodomain families, demonstrating that it has better predictive accuracy than both state-of-the-art family-specific and generic knowledge-based models.
  
  We have included relevant texts in Appendix Section Comparison of IDEA predictive performance Using HT-SELEX data to clarify this point. The added texts are:
  
  “In addition, we compared the performance of IDEA with both general and family-specific knowledge-based energy models. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide, including phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups. For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with the DBD-Hunter model to assess its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families. rCLAMPS learns a position-dependent amino-acid-nucleotide interaction energy model. To incorporate this model into the binding free energy calculation, we averaged the energy contributions across all occurrences of each amino-acid-nucleotide pair, which resulted in a 20-by-4 residue-type-specific energy matrix. This matrix is structurally analogous to the IDEA-trained energy model and can be directly integrated into the binding free energy calculations. As shown in Figure S6, Table S1, and Table S2, the IDEA model generally outperforms DBD-Hunter and rCLAMPS, demonstrating that it can achieve better predictive accuracy than both generic and family-specific knowledge-based models.”
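  The averaging step that turns a position-dependent model into a residue-type matrix can be sketched as follows. The input layout (a dictionary keyed by position pair, amino acid, and nucleotide) and the function name are assumptions made for illustration; they do not reflect the actual rCLAMPS output format.

```python
from collections import defaultdict


def collapse_to_residue_types(position_energies):
    """Average position-dependent energies over every occurrence of each
    (amino acid, nucleotide) pair, dropping the position information."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_position, aa, nt), energy in position_energies.items():
        sums[(aa, nt)] += energy
        counts[(aa, nt)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


# Hypothetical position-dependent energies, keyed by
# ((protein position, DNA position), amino acid, nucleotide).
position_energies = {
    ((2, 0), "R", "G"): -1.0,
    ((5, 1), "R", "G"): -3.0,
    ((5, 1), "N", "A"): -0.5,
}

residue_matrix = collapse_to_residue_types(position_energies)
```

  Pairs that never occur in the position-dependent model are absent from the result, so filling a complete 20-by-4 matrix would need an explicit default for them.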
  
  Comment 3: Prediction of binding affinity is a well-studied domain and many competitors exist, some of which are well-used. However, no quantitative comparison to such methods is presented. To understand the scope of the presented method, IDEA, the authors should discuss/compare with such methods (e.g. PMID 35606422).
  
  We thank the reviewer for the insightful comments. As detailed in our response to Comment 5, we previously misused the term “binding specificity”, and would like to clarify that our model is designed to predict protein-DNA binding affinity. To compare the performance of IDEA with state-of-the-art protein-DNA predictive models, we examined the predictive accuracies of two additional popular computational models: ProBound [8] and DeepBind [9]. ProBound has been shown to have a better performance than several earlier predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To benchmark these models’ performance, we examined each method’s capability to identify strong binders with the HT-SELEX datasets covering 22 proteins from 12 protein families [5]. As suggested by Reviewer 1, we also calculated the PRAUC score, reweighted to account for data imbalance [6], as a complementary metric for evaluating the model performance.
  
  As shown in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive methods. It is important to note that both ProBound and DeepBind were trained on a curated version of the HT-SELEX data [13], which overlaps with the testing data [5]. Compared with them, IDEA was trained only on the given structural and sequence information from a single protein-DNA complex, thus independent of the testing data. In order to assess how IDEA performs when incorporating knowledge from HT-SELEX data, we augmented the training by randomly including half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models. Overall, IDEA can be used to predict protein-DNA affinities in the absence of known binding sequence data, thereby filling a critical gap when such experimental datasets are unavailable.
  
  Additionally, we have conducted a 10-fold cross-validation using the same HT-SELEX data [5] and found that IDEA outperformed a recent regression model that considers the shape of DNA with different sequences [5].
  
  We have revised our text to include the comparison between IDEA and other predictive models. Specifically, we revised the text in Section: IDEA Generalizes across Various Protein Families.
  
  The revised text reads:
  
  “To examine IDEA’s predictive accuracy across different DNA-binding protein families, we applied it to calculate protein-DNA binding affinities using a comprehensive HT-SELEX dataset [5]. We focused on evaluating the capability of IDEA to distinguish strong binders from weak binders for each protein with an experimentally determined structure. We calculated the probability density distribution of the top and bottom binders identified in the SELEX experiment. A well-separated distribution indicates the successful identification of strong binders by IDEA (Figure 2D and S4). Receiver Operating Characteristic (ROC) analysis was performed to calculate the Area Under the Curve (AUC) and the precision-recall curve (PRAUC) scores for these predictions. Further details are provided in the Methods Section Evaluation of IDEA Prediction Using HT-SELEX Data. Our analysis shows that IDEA successfully differentiates strong from weak binders for 80% of the 22 proteins across 12 protein families, achieving AUC and balanced PRAUC scores greater than 0.5 (Figure 2D and S5). To benchmark IDEA’s performance against other leading methods, we compared its predictions with several popular models, including the sequence-based predictive models ProBound [8] and DeepBind [9], the family-based energy model rCLAMPS [10], and the knowledge-based energy model DBD-Hunter [7]. IDEA demonstrates performance comparable to these state-of-the-art approaches, and incorporating sequence features further improves its prediction accuracy (Figure S6, Table S1, and Table S2). We also performed 10-fold cross-validation on the binding affinities of protein–DNA pairs in this dataset and found that IDEA outperforms a recent regression model that considers the shape of DNA with different sequences [5] (Figure S7). Details are provided in Section: Comparison of IDEA predictive performance Using HT-SELEX data.”
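  As a rough illustration of how an AUC quantifies the separation between top and bottom binders, the sketch below uses the pairwise definition of the ROC AUC: the probability that a randomly chosen strong binder scores higher than a randomly chosen weak binder, with ties counted as one half. The energies are hypothetical, and negating them to obtain "higher is stronger" scores is an assumption about sign convention rather than a statement about the authors' pipeline.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute one half. This is quadratic in the number of samples,
    which is fine for a small illustration.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted binding energies (lower means stronger binding).
strong_energies = [-5.1, -4.8, -4.2]
weak_energies = [-3.9, -2.0, -1.5, -0.7]

# Negate so that a higher score corresponds to a stronger predicted binder.
auc_separated = roc_auc([-e for e in strong_energies], [-e for e in weak_energies])

# Partially overlapping distributions give an intermediate value.
auc_overlap = roc_auc([3.0, 1.0], [2.0, 0.5])

# Identical scores for every sequence give the no-separation value.
auc_flat = roc_auc([1.0, 1.0], [1.0, 1.0, 1.0])
```

  Values of 1.0 and 0.5 correspond to the perfect-separation and no-separation cases that the response uses as reference points.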
  
  We also added one section Comparison of IDEA predictive performance Using HT-SELEX data in the Appendix to fully explain the comparison between IDEA and other popular models. The added texts are:
  
  “To benchmark the performance of IDEA against state-of-the-art protein-DNA predictive models, we evaluated its ability to recognize strong binders with the HT-SELEX datasets across 22 proteins from 12 families [5]. Specifically, we compare IDEA with two widely used sequence-based models: ProBound [8] and DeepBind [9]. ProBound has demonstrated superior performance over many other predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To use ProBound, we retrieved the trained binding model for each protein from motifcentral.org and used the GitHub implementation of ProBoundTools to infer the binding scores between protein and target DNA sequences. Except for POU3F1, binding models are available for all proteins. Therefore, we excluded POU3F1 and evaluated the protein-DNA binding affinities for the remaining 21 proteins. To use DeepBind, sequence-specific binding affinities were predicted directly with its web server. The Area Under the Curve (AUC) and the Precision-Recall AUC (PRAUC) scores were used as metrics for comparison. An AUC score of 1.0 indicates a perfect separation between the strong- and weak-binder distributions, while an AUC score of 0.5 indicates no separation. Because there is a significant imbalance in the number of strong and weak binders from the experimental data [5], where the strong binders are far fewer than the weak binders, we reweighted the samples to achieve a balanced evaluation, using 0.5 as the baseline for randomized prediction [6]. As summarized in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive models. In order to assess the performance of IDEA when augmented with additional protein-DNA binding data, we augmented IDEA using randomly selected half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models.”
  
  “We also performed 10-fold cross-validation using the same HT-SELEX datasets, following the protocol described in the Methods Section Enhanced Modeling Prediction with SELEX Data. For each protein, we divided the entire dataset into 10 equal, randomly assigned folds. In each iteration, we used randomly selected 9 of the 10 folds as the training dataset and the remaining fold as the testing dataset. This process was repeated 10 times so that each fold served as the test set once. We then reported the average R² scores across these iterations to evaluate IDEA’s predictive performance. Our results are compared with the 1mer and 1mer+shape methods from [5], the latest regression model that considers the shape of DNA with different sequences (Figure S7). This comparative analysis shows IDEA achieved higher predictive accuracy than the state-of-the-art sequence-based protein-DNA binding predictors for protein-DNA complexes that have available experimentally resolved structures.”
  
  “Overall, these results demonstrate that IDEA can be used to predict the protein-DNA pairs in the absence of known binding sequence data, thus filling an important gap in protein-DNA predictions when experimental binding sequence data are unavailable.”
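  The 10-fold cross-validation protocol quoted above can be sketched generically as follows. This is not the authors' code: a one-variable least-squares line stands in for the trained model, the data are synthetic, and the helper names (`k_fold_indices`, `r_squared`, `fit_line`) are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares slope and intercept; a stand-in for model training."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Synthetic data: a feature and a response that depends on it linearly.
x = [float(i) for i in range(50)]
y = [2.0 * v + 1.0 for v in x]

scores = []
held_out = []
for train, test in k_fold_indices(len(x), k=10):
    slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
    predictions = [slope * x[i] + intercept for i in test]
    scores.append(r_squared([y[i] for i in test], predictions))
    held_out.extend(test)

mean_r2 = sum(scores) / len(scores)
```

  Reporting the mean of the per-fold R² values, as in the quoted protocol, means every data point contributes to exactly one held-out evaluation.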
  
  Comment 4: The term “interpretable” has been used lavishly in the manuscript while providing little evidence on the matter. The only evidence shown is the family-specific residue-nucleotide interaction/energy matrix and speculations on how these values are biologically sensible. Recent works already present more biophysical, fine-grained, and sometimes family-independent interpretability (e.g. PMID 39103447, 36656856, 38352411, etc.). The authors should put into context the scope of the interpretability of IDEA among such works.
  
  We thank the reviewer for the insightful comment and agree that “interpretability” should be discussed in a relevant context. In our work, interpretability refers to the family-specific amino-acid-nucleotide interaction energies identified from the model training, which reveal interaction preferences within protein-DNA binding interfaces. As detailed in our response to Comment 6, we performed principal component analysis (PCA) on the learned energy models and observed clustering of learned energy models corresponding to protein families. Therefore, the IDEA-learned energy models can be used as a signature to capture the energetic preferences of amino-acid-nucleotide interactions within a given protein family. This preference can be used to infer preferred sequence binding motifs, similar to those identified by other computational tools [10, 4, 15, 16].
  
  We have revised the text to clarify “interpretability” as the family-specific amino-acid-nucleotide interactions that govern sequence-dependent protein-DNA binding, and to discuss IDEA’s interpretability within the context of recent works, including those suggested by the reviewers.
  
  We have revised the text in Introduction. The new text reads:
  
  “Here, we introduce the Interpretable protein-DNA Energy Associative (IDEA) model, a predictive model that learns protein-DNA physicochemical interactions by fusing available biophysical structures and their associated sequences into an optimized energy model (Figure 1). We show that the model can be used to accurately predict the sequence-specific DNA binding affinities of DNA-binding proteins and is transferrable across the same protein superfamily. Moreover, the model can be enhanced by incorporating experimental binding data and can be generalized to enable base-pair resolution predictions of genomic DNA-binding sites. Notably, IDEA learns a family-specific interaction matrix that quantifies energetic interactions between each amino acid and nucleotide, allowing for a direct interpretation of the “molecular grammar” governing sequence-specific protein-DNA binding affinities. This interpretable energy model is further integrated into a simulation framework, facilitating mechanistic studies of various biomolecular functions involving protein-DNA dynamics.”
  
  We have revised the text in Results. The new text reads:
  
  “IDEA is a coarse-grained biophysical model at the residue resolution for investigating protein-DNA binding interactions (Figure 1). It integrates both structures and corresponding sequences of known protein-DNA complexes to learn an interpretable energy model based on the interacting amino acids and nucleotides at the protein-DNA binding interface. The model is trained using available protein-DNA complexes curated from existing databases [17, 18].
  
  Unlike existing deep-learning-based protein-DNA binding prediction models, IDEA aims to learn a physicochemical-based energy model that quantitatively characterizes sequence-specific interactions between amino acids and nucleotides, thereby interpreting the “molecular grammar” driving the binding energetics of protein-DNA interactions. The optimized energy model can be used to predict the binding affinity of any given protein-DNA pair based on its structures and sequences. Additionally, it enables the prediction of genomic DNA binding sites by a given protein, such as a transcription factor. Finally, the learned energy model can be incorporated into a simulation framework to study the dynamics of DNA-binding processes, revealing mechanistic insights into various DNA-templated processes. Further details of the optimization protocol are provided in Methods Section Energy Model Optimization.”
  
  The revised text in Section: Discussion now reads:
  
  “Another highlight of IDEA is its ability to present an interpretable, family-specific amino acid-nucleotide interaction energy model for given protein-DNA complexes. The optimized IDEA energy model can not only predict sequence-specific binding affinities of protein-DNA pairs but also provide a residue-specific interaction matrix that dictates the preferences of amino acid-nucleotide interactions within specific protein families (Figure S11). This interpretable energy matrix would facilitate the discovery of sequence binding motifs for target DNA-binding proteins, complementing both sequence-based [24, 16, 25] and structure-based approaches [10, 26, 4, 15]. Additionally, we integrated this physicochemical-based energy model into a simulation framework, thereby improving the characterization of protein-DNA binding dynamics. IDEA-based simulation enables the investigation into dynamic interactions between various proteins and DNA, facilitating molecular-level understanding of the physical mechanisms underlying many DNA-binding processes, such as transcription, epigenetic regulations, and their modulation by sequence variations, such as single-nucleotide polymorphisms (SNPs) [22, 23].”
  
  Comment 5: The manuscript disregards subtle yet important differences in commonly used terminology in the field. For example, the authors use the term “specificity” and “affinity” almost interchangeably (for example, the caption for Figure 3A uses “specificity” although the Methods text describes the prediction as about “affinity”). If the authors are looking to predict specificity, IDEA needs to be put in the context of the corresponding state-of-the-art (PMID 36123148, 39103447, 38867914, 36124796, etc).
  
  We thank the reviewer for pointing out the conflation of “specificity” and “affinity” in our manuscript. To clarify, the primary function of IDEA is to predict the binding affinities of protein-DNA pairs in a sequence-specific manner. We have revised the text to clarify the distinction between affinity and specificity and to acknowledge prior works, including those provided by the reviewers, that focus on predicting protein-DNA binding specificity.
  
  We have revised the Section title “IDEA Accurately Predicts Protein-DNA Binding Specificity” to “IDEA Accurately Predicts Sequence-Specific Protein-DNA Binding Affinity”, and the Section title “Residue-Level Protein-DNA Energy Model for Predicting Protein-DNA Recognition Specificities” to “Predictive Protein-DNA Energy Model at Residue Resolution”.
  
  We have revised the text in Introduction. The revised text reads:
  
  “Computational methods complement experimental efforts by providing the initial filter for assessing sequence-specific protein-DNA binding affinity. Numerous methods have emerged to enable predictions of binding sites and affinities of DNA-binding proteins [27, 9, 1, 5, 28, 29, 30, 31, 8]. These methods often utilized machine-learning-based training to extract sequence preference information from DNA or protein by utilizing experimental high-throughput (HT) assays [27, 9, 1, 5, 28, 8], which rely on the availability and quality of experimental binding assays. Additionally, many approaches employ deep neural networks [29, 30, 31], which could obscure the interpretation of interaction patterns governing protein-DNA binding specificities. Understanding these patterns, however, is crucial for elucidating the molecular mechanisms underlying various DNA-recognition processes, such as those seen in TFs [32].”
  
  We have revised the text in Section: IDEA Demonstrates Transferability across Proteins in the Same CATH Superfamily.
  
  The revised text reads:
  
  “Since IDEA relies on the sequence-structure relationship of given protein-DNA complexes to reach predictive accuracy, we inquired whether the trained energy model from one protein-DNA complex could be generalized to predict the sequence-specific binding affinities of other complexes. To test this, we assessed the transferability of IDEA predictions across all 11 structurally available protein-DNA complexes within the MAX TF-associated CATH superfamily (CATH ID: 4.10.280.10, Helix-loop-helix DNA-binding domain). We trained IDEA based on each of these 11 complexes and then used the trained model to predict the MAX-based MITOMI binding affinity. Our results show that IDEA generally makes correct predictions of the binding affinity when trained on proteins that are homologous to MAX, with Pearson and Spearman Correlation coefficients larger than 0.5 (Figure 3A and Figure S10).”
  
  We have revised the caption of Figure 3: The revised text reads:
  
  “IDEA prediction shows transferability within the same CATH superfamily. (A) The predicted MAX binding affinity, trained on other protein-DNA complexes within the same protein CATH superfamily, correlates well with experimental measurement. The proteins are ordered by their probability of being homologous to the MAX protein, determined using HHpred [33]. Training with a homologous protein (determined as a hit by HHpred) usually leads to better predictive performance (Pearson Correlation coefficient > 0.5) compared to non-homologous proteins. (B) Structural alignment between 1HLO (white) and 1A0A (blue), two protein-DNA complexes within the same CATH Helix-loop-helix superfamily. The alignment was performed based on the Ebox region of the DNA [34]. (C) The optimized energy model for 1A0A, a protein-DNA complex structure of the transcription factor PHO4 and DNA, with 33.41% probability of being homologous to the MAX protein. The optimized energy model is presented in reduced units, as explained in the Methods Section: Training Protocol.”
  
  We have revised the text in Section Discussion: The revised text now reads:
  
  “The protein-DNA interaction landscape has evolved to facilitate precise targeting of proteins towards their functional binding sites, which underlie essential processes in controlling gene expression. These interaction specifics are determined by physicochemical interactions between amino acids and nucleotides. By integrating sequences and structural data from available protein-DNA complexes into an interaction matrix, we introduce IDEA, a data-driven method that optimizes a system-specific energy model. This model enables high-throughput in silico predictions of protein-DNA binding specificities and can be scaled up to predict genomic binding sites of DNA-binding proteins, such as TFs. IDEA achieves accurate de novo predictions using only protein-DNA complex structures and their associated sequences, but its accuracy can be further enhanced by incorporating available experimental data from other binding assay measurements, such as the SELEX data [35, 36, 37], achieving accuracy comparable to or better than state-of-the-art methods (Figures S2 and S7, Table S1 and S2). Despite significant progress in genome-wide sequencing techniques [38, 39, 40, 41], determining sequence-specific binding affinities of DNA-binding biomolecules remains time-consuming and expensive. Therefore, IDEA presents a cost-effective alternative for generating the initial predictions before pursuing further experimental refinement.”
  
  We have revised the text in Discussion to clarify that the acquired binding affinities of target DNA sequences can be used to help existing models to infer specific DNA binding motifs.
  
  The revised text now reads:
  
  “Another highlight of IDEA is its ability to present an interpretable, family-specific amino acid-nucleotide interaction energy model for given protein-DNA complexes. The optimized IDEA energy model can not only predict sequence-specific binding affinities of protein-DNA pairs but also provide a residue-specific interaction matrix that dictates the preferences of amino acid-nucleotide interactions within specific protein families (Figure S11). This interpretable energy matrix would facilitate the discovery of sequence binding motifs for target DNA-binding proteins, complementing both sequence-based [24, 16, 25] and structure-based approaches [10, 26, 4, 15]. Additionally, we integrated this physicochemical-based energy model into a simulation framework, thereby improving the characterization of protein-DNA binding dynamics. IDEA-based simulation enables the investigation into dynamic interactions between various proteins and DNA, facilitating molecular-level understanding of the physical mechanisms underlying many DNA-binding processes, such as transcription, epigenetic regulations, and their modulation by sequence variations, such as single-nucleotide polymorphisms (SNPs) [22, 23].”
  
  Comment 6: It is not clear how much the learned energy model is dependent on the structural model used for a specific system/family. It would be interesting to see the differences in learned model based on different representative PDB structures used. Similarly, the supplementary figures show a lack of discriminative power for proteins like PDX1 (homeodomain family), POU, etc. Can the authors shed some light on why such different performances?
  
  We thank the reviewer for the insightful comments and agree that the trained energy model should be presented in the context of protein families. To further analyze the dependence of the energy model on protein family, we visualized the trained energy models for 24 proteins, including all proteins from the HT-SELEX dataset as well as PHO4 (PDB ID: 1A0A) and CTCF (PDB ID: 8SSQ), spanning 12 distinct protein families. To quantitatively assess similarities and differences among these energy models, we flattened each normalized energy model into an 80-dimensional vector and performed principal component analysis (PCA). As shown in Author response image 1 and Figure S11, energy models optimized from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results shown in Figure 3A, where the energy model trained from PHO4 has better transferability than those from the other two systems.
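
  The flatten-and-project step described above can be illustrated with a minimal sketch. This is not the authors' code. It assumes that each energy model is a 20 × 4 matrix (amino acids by nucleotides), which would account for the 80-dimensional vector, and that normalization means scaling each vector to unit Euclidean norm. The function name `pca_project` and the random placeholder matrices are hypothetical and stand in for the trained models.

```python
import numpy as np

def pca_project(models, n_components=2):
    """Project flattened energy models onto their leading principal components."""
    # Flatten each (20, 4) matrix into an 80-dimensional row vector.
    X = np.stack([np.asarray(m, dtype=float).reshape(-1) for m in models])
    # Assumed normalization: scale every vector to unit Euclidean norm.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Center each feature, then read the principal axes off the SVD.
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]

# Placeholder input: 24 random matrices, NOT real IDEA energy models.
rng = np.random.default_rng(0)
models = [rng.normal(size=(20, 4)) for _ in range(24)]
coords, explained = pca_project(models)
```

  With real energy models, the rows of `coords` would be the points plotted in PCA space, and distances between rows would correspond to the distances between energy models discussed above.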
  
  We also greatly appreciate the reviewer’s suggestion to examine cases where IDEA failed to demonstrate strong discriminative power. When evaluating the model’s ability to distinguish between strong and weak binders, we used the available experimental structure most similar to the protein employed in the HT-SELEX experiments. In some instances, only the structure of the same protein from a different organism is available. For example, the HT-SELEX data for PDX1-DNA used the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, we used the mouse PDX1–DNA complex (PDB ID: 2H1K) for model training. The differences between species may limit the predictive accuracy of the model. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).
  
  We also examined the remaining cases where IDEA did not show a clear distinction between strong and weak binders: USF1, Egr1, and PROX1. For PROX1, we initially used the structure of a protein-DNA complex (PDB ID: 4Y60) in training. However, upon closer inspection, we discovered that this structure does not include the PROX1 protein, but SOX-18, a different transcription factor. This explains the inaccurate prediction made by IDEA. Since no experimental PROX1-DNA complex structure is currently available, we have removed this case from our HT-SELEX evaluation.
  
  IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.
  
  Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.
  
  We have included additional text in Section: IDEA Demonstrates Transferability across Proteins in the Same CATH Superfamily to discuss the PCA analysis and the dependence of the model’s transferability on the similarity among the learned energy models.
  
  The revised text now reads:
  
  “The transferability of IDEA within the same CATH superfamily can be understood from the similarities in protein-DNA binding interfaces, which determine similar learned energy models. For example, the PHO4 protein (PDB ID: 1A0A) shares a highly similar DNA-binding interface with the MAX protein (PDB ID: 1HLO) (Figure 3B), despite sharing only a 33.41% probability of being homologous. Consequently, the energy model derived from the PHO4-DNA complex (Figure 3C) exhibits a similar amino-acid-nucleotide interactive pattern as that learned from the MAX-DNA complex (Figure 2B). To further evaluate the similarity between the learned energy models and their connection to protein families, we performed principal component analysis (PCA) on the normalized energy models across 24 proteins from 12 protein families [5]. Our analysis (Figure S11) reveals that most of the energy models from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability between them. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results in Figure 3A, where the energy model trained on PHO4 has better transferability than those trained on USF1 or TCF4.”
  
  We have also added an Appendix section titled Analysis of examples where IDEA fails to recognize strong DNA binders to discuss the examples in which IDEA did not perform well:
  
  “We examine IDEA’s capability in identifying strong binders from the HT-SELEX dataset across 12 protein families [5]. The model successfully predicts 18 out of 22 protein-DNA systems, but the performance is reduced in 4 cases. Closer investigations revealed the source of these limitations. In some instances, only the protein from a different organism is available. For example, the PDX1 HT-SELEX data utilized the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, the mouse PDX1–DNA complex structure (PDB ID: 2H1K) was used for model training. Differences between model organisms may reduce predictive accuracy. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).
  
  IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.
  
  Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.”
  
  Comment 7: It is also not clear if IDEA’s prediction for reverse complement sequences is the same for a given sequence. If so, how is this property being modelled? Either this description is lacking or I missed it.
  
  We thank the reviewer for the insightful comments. Given a target protein-DNA sequence, the IDEA protocol substitutes it into a known protein-DNA complex structure to evaluate the binding free energy, which can be converted into binding affinity. IDEA uses sequence identity to determine whether the forward or reverse strand of the DNA should be replaced. Only the strand most similar to the target sequence is substituted. As a result, the model treats reverse-complement sequences differently. As the orientations of test sequences are specified from 5’ to 3’ in all datasets used in this study (e.g., processed MITOMI, HT-SELEX, and ChIP-seq data), this approach ensures that the target sequences are replaced and evaluated correctly. In cases where sequence orientation is not provided (though this was not an issue in this study), we recommend replacing both the forward and reverse strands with the target sequence separately and evaluating the corresponding protein–DNA binding free energies. Since strong binders are likely to dominate the experimental signals, the higher predicted binding affinity, with stronger binding free energies, should be taken as the model’s final prediction.
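
  The strand-selection logic described above can be sketched as follows. This is an illustration only, not the IDEA implementation: all function names are hypothetical, `energy_fn` is a hypothetical callable that returns a binding free energy (more negative meaning stronger binding), ties are broken in favor of the forward strand by assumption, and scoring the target together with its reverse complement is used here as a stand-in for substituting the target into each strand of the template in turn.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def choose_strand(target, forward):
    """Pick the template strand (both read 5' to 3') most similar to the target."""
    reverse = reverse_complement(forward)
    # Assumption: a tie is resolved in favor of the forward strand.
    if identity(target, forward) >= identity(target, reverse):
        return "forward"
    return "reverse"

def predict_unoriented(target, energy_fn):
    """When orientation is unknown, score both orientations and keep the stronger.

    energy_fn is hypothetical; lower (more negative) values mean stronger binding.
    """
    return min(energy_fn(target), energy_fn(reverse_complement(target)))
```

  For example, `choose_strand("GACGTG", "CACGTC")` selects the reverse strand, because the target is the exact reverse complement of the forward template strand.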
  
  We have added one section to the Methods Section titled Treatment of Complementary DNA Sequences to clarify these modeling details.
  
  The specific text reads:
  
  “To replace the DNA sequence in the protein-DNA complex structure with a target sequence, IDEA uses sequence identity to determine whether the target sequence belongs to the forward or reverse strand of the DNA in the protein-DNA structure. The more similar strand is selected and replaced with the target sequence. As the orientations of test sequences are specified from 5’ to 3’ in all datasets used in this study (e.g., processed MITOMI, HT-SELEX, and ChIP-seq data), this approach ensures that the target sequences are replaced and evaluated correctly. In cases where sequence orientation is not provided (though this was not an issue in this study), we recommend replacing both the forward and reverse strands with the target sequence separately and evaluating the corresponding protein–DNA binding free energies. Since strong binders are likely to dominate the experimental signals, the higher predicted binding affinity, with stronger binding free energy, should be taken as the model’s final prediction.”
  
  Comment 8: Page 21 line 403, the E-box core should be CACGTG instead of CACGTC.
  
  We apologize for our oversight and have corrected the relevant text.
  
  Comment 9: The citation for DNAproDB is outdated and should be updated (PMID 39494533).
  
  We thank the reviewer for pointing this out and have updated our citation accordingly.
  
  Reviewer #3:
  
  Comment 0: Summary: Protein-DNA interactions and sequence readout represent a challenging and rapidly evolving field of study. Recognizing the complexity of this task, the authors have developed a compact and elegant model. They have applied well-established approaches to address a difficult problem, effectively enhancing the information extracted from sparse contact maps by integrating artificial sequences decoy set and available experimental data. This has resulted in the creation of a practical tool that can be adapted for use with other proteins.
  
  We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.
  
  Comment 1: Strengths: (1) The authors integrate sparse information with available experimental data to construct a model whose utility extends beyond the limited set of structures used for training. (2) A comprehensive methods section is included, ensuring that the work can be reproduced. Additionally, the authors have shared their model as a GitHub project, reflecting their commitment to transparency of research.
  
  We appreciate the reviewer’s strong assessment of the strengths of this paper. In addition to sharing our model on GitHub, we have also uploaded the original data and the essential scripts required to reproduce the results presented in the manuscript. We hope this further demonstrates our commitment to transparency and reproducibility.
  
  Comment 2: Weaknesses: (1) The coarse-graining procedure appears artificial, if not confusing, given that full-atom crystal structures provide more detailed information about residue-residue contacts. While the selection procedure for distance threshold values is explained, the overall motivation for adopting this approach remains unclear. Furthermore, since this model is later employed as an empirical potential for molecular modeling, the use of P and C5 atoms raises concerns, as the interactions in 3SPN are modeled between Cα and the nucleic base, represented by its center of mass rather than P or C5 atoms.
  
  We appreciate the reviewer’s insightful comments. The selection of P and C5 atoms was based on different relative positions of protein and DNA across various complex structures, each with distinctive protein-DNA structural interfaces. To illustrate this, we selected two representative structures where our algorithm selected C5 and P atoms, respectively: MAX-DNA (PDB ID: 1HLO) and FOXP3 (PDB ID: 7TDW). As shown in Author response image 2, in the case of 1HLO, more C5 atoms are within the cutoff distance of 10 Å from the protein Cα atoms, thus capturing essential contacting interactions. In contrast, 7TDW has more P atoms within this cutoff. Importantly, several P atoms are distributed on the minor groove of the DNA, which were not captured by the C5 atoms. To maximize the inclusion of relevant structural contacts, we employed a filtering scheme that selectively chooses either P or C5 atoms based on their proximity to the protein to enhance the model prediction. We note that while this scheme is helpful, the IDEA predictions remain robust across different atom selections. To assess this robustness, we performed binding affinity predictions using only P atoms on the HT-SELEX dataset across 12 protein families [5]. Our predictions (Author response table 1) show comparable performance to that achieved using our filtering scheme.
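
  One way such a filtering scheme could work is sketched here. The exact criterion is not spelled out in this response, so the rule used (count the DNA atoms of each type that lie within 10 Å of any protein Cα atom and keep the type with the larger count) is an assumption, as are the function names, the tie-breaking choice, and the toy coordinates.

```python
import numpy as np

def count_within_cutoff(dna_xyz, ca_xyz, cutoff=10.0):
    """Count DNA atoms lying within `cutoff` angstroms of any protein C-alpha atom."""
    dna_xyz = np.asarray(dna_xyz, dtype=float)
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    # Pairwise distance matrix of shape (n_dna_atoms, n_ca_atoms).
    d = np.linalg.norm(dna_xyz[:, None, :] - ca_xyz[None, :, :], axis=-1)
    return int(np.sum(d.min(axis=1) <= cutoff))

def select_dna_atom(p_xyz, c5_xyz, ca_xyz, cutoff=10.0):
    """Choose "P" or "C5" according to which atom type has more atoms near the protein."""
    n_p = count_within_cutoff(p_xyz, ca_xyz, cutoff)
    n_c5 = count_within_cutoff(c5_xyz, ca_xyz, cutoff)
    # Assumption: a tie is resolved in favor of P atoms.
    return ("P" if n_p >= n_c5 else "C5"), n_p, n_c5

# Toy coordinates in angstroms, not taken from any PDB entry.
ca = [[0.0, 0.0, 0.0]]
p_atoms = [[5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
c5_atoms = [[3.0, 0.0, 0.0], [8.0, 0.0, 0.0]]
choice = select_dna_atom(p_atoms, c5_atoms, ca)
```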
  
  Author response image 2.
  
  Comparison between P and C5 atoms in proximity to the protein. 3D structures of MAX–DNA (A) and FOXP3–DNA (B) complexes, where P atoms (red sphere) and C5 atoms (blue sphere) that are within 10 Å of Cα atoms are highlighted.
  
  When incorporating the trained IDEA energy model into a simulation model, we acknowledge a potential mismatch between the resolution of the data-driven model (one coarse-grained site per nucleotide) and the 3SPN simulation model (three coarse-grained sites per nucleotide). The selection of nucleic base sites for molecular interactions in the 3SPN model follows our previous work [44] and its associated code implementation. While revisiting this part of the manuscript, we identified an inconsistency in the reported results in Figure 5A of our initial version: Specifically, we previously used the protein side-chain atoms, rather than only the Cα atoms, in model training. Retraining the data using the Cα atoms results in reduced prediction performance for the IDEA model (Figure 5A). Nonetheless, incorporating this updated energy model into simulations still yielded high accuracy in the predicted absolute binding free energies (Author response image 3A), demonstrating the robustness of our simulation framework in predicting absolute binding free energies against variations in atom selection during the IDEA model training. Following the reviewer’s suggestion, we also incorporated the IDEA-trained energy model as short-range van der Waals interactions between protein Cα atoms and DNA P atoms. As shown in Author response image 3B, our simulation reveals a slightly improved performance over our original implementation, with higher Pearson and Spearman correlation coefficients and a fitted slope closer to 1.0. This result suggests that a more consistent atom selection scheme between the data-driven and simulation models can improve the overall predictions. Accordingly, we have updated Figure 5 with this improved setup, using the simulation model with short-range vdW interactions implemented between protein Cα atoms and DNA P atoms (Figure 5C), ensuring consistency between the IDEA model and simulation framework.
  
  Author response table 1.
  
  Comparison of IDEA performance using two DNA atom selection schemes: the filtering scheme presented in the manuscript (C5 and P atoms) versus using only P atoms. Cases where the two schemes result in different atom selections are highlighted in bold.
  
  We acknowledge that a gap still exists between the resolution of the data-driven and simulation models. To ensure a completely consistent coarse-grained level between these two models, we will work on implementing the IDEA model output for 1-bead-per-nucleotide DNA simulation models in the future.
  
  Comment 3: (2) Although the authors use a standard set of metrics to assess model quality and predictive power, some ∆∆G predictions compared to MITOMI-derived ∆∆G values appear nonlinear, which casts doubt on the interpretation of the correlation coefficient.
  
  Author response image 3.
  
  Comparison of simulations using different representative atoms. (A) Protein-DNA binding simulation with the IDEA model incorporated as short-range van der Waals interactions between protein Cα atoms and nucleic base sites. (B) Protein-DNA binding simulation with the IDEA model incorporated as short-range van der Waals interactions between protein Cα atoms and DNA P atoms. The predicted free energies are robust to the choice of DNA representative atoms. The predicted binding free energies are presented in physical units, and error bars represent the standard deviation of the mean.
  
  We thank the reviewer for the insightful comments and agree that the linear fit between our model’s prediction and the experimental data may not be the best measure of performance. The primary utility of the IDEA model is to predict high-affinity DNA-binding sequences for a given DNA-binding protein by assessing the relative binding affinities across different DNA sequences. In this regard, the ranked order of predicted sequence binding affinities serves as a better metric for evaluating the success of this model. To evaluate this, we calculated both Spearman’s rank correlation coefficient, which does not rely on linear correlation, and the Pearson correlation coefficient between our predictions and the experimental results. As shown in Figure 2, our computation shows a Spearman’s rank correlation coefficient of 0.65 for the MAX-based predictions using one MAX-DNA complex (PDB ID: 1HLO), supporting the model’s capability to effectively distinguish strong from weak binders.
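
  To illustrate why a rank-based coefficient suits a nonlinear but monotonic relationship, the sketch below computes both coefficients with NumPy alone. The data are synthetic and unrelated to the MITOMI measurements, and the helper names are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def rank_average(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(rank_average(x), rank_average(y))

# Synthetic example: y increases monotonically but nonlinearly with x.
x = np.arange(1.0, 11.0)
y = np.exp(x)
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

  Here the ordering of `y` matches the ordering of `x` exactly, so the Spearman coefficient equals 1 up to rounding, whereas the Pearson coefficient is noticeably lower because the relationship is not linear.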
  
  As reflected in Figure 2 of the main text, although our model generally captures the relative binding affinities across different DNA sequences, its predictive accuracy diminishes for low-affinity sequences. This could be due to two limitations of the current modeling framework: (1) The model is residue-based and estimates binding free energy as the additive sum of contributions from individual contacting amino-acid-nucleotide pairs. This assumption does not account for cooperative effects caused by simultaneous changes at multiple nucleotide positions. One potential direction to further improve the model would be to use a finer-grained representation by incorporating more atom types within contacting residues, and to use a many-body potential to better capture cooperative effects from multiple mutations. (2) The model assumes that the target DNA adopts the same binding interface as in the reference crystal structure. However, sequence-dependent DNA shape has been shown to be important in determining protein-DNA binding affinity [1]. To address this limitation, a future direction is to use deep-learning-based methods to incorporate predicted DNA shape or protein-DNA complex structures based on their sequences [2, 3] into our model prediction.
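
  The additivity assumption in limitation (1) can be made concrete with a simplified sketch. It assumes a 20 × 4 energy matrix `gamma` indexed by amino acid and nucleotide and a fixed list of contacting (residue, nucleotide) index pairs; the function name, the random matrix, and the toy sequences are hypothetical, and the actual IDEA energy function may weight contacts differently.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGT"

def binding_energy(gamma, contacts, protein_seq, dna_seq):
    """Sum gamma[amino acid, nucleotide] over all contacting index pairs."""
    total = 0.0
    for i, j in contacts:
        aa = AMINO_ACIDS.index(protein_seq[i])
        nt = NUCLEOTIDES.index(dna_seq[j])
        total += gamma[aa, nt]
    return total

# Placeholder energy matrix and contact list, for illustration only.
rng = np.random.default_rng(1)
gamma = rng.normal(size=(20, 4))
contacts = [(0, 0), (1, 1), (2, 1)]

wild_type = binding_energy(gamma, contacts, "RKH", "AC")
mutant_1 = binding_energy(gamma, contacts, "RKH", "GC")  # change at position 0
mutant_2 = binding_energy(gamma, contacts, "RKH", "AT")  # change at position 1
double = binding_energy(gamma, contacts, "RKH", "GT")    # both changes at once

# In an additive model the double-mutant shift equals the sum of single shifts.
cooperativity = (double - wild_type) - ((mutant_1 - wild_type) + (mutant_2 - wild_type))
```

  Because every contact contributes independently, `cooperativity` is zero up to rounding for any choice of `gamma`, which is exactly why a model of this form cannot represent cooperative effects between nucleotide positions.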
  
  To fully evaluate the predictive power of IDEA, we have included Spearman’s rank correlation coefficient for every correlation plot in this manuscript. Across all our analyses, the Spearman’s rank correlation coefficients reveal similar predictive performance as the Pearson correlation coefficients. Additionally, we have included in our discussion the current limitations of our model and potential directions for future improvement.
  
  We have edited our Discussion Section to include a discussion on the limitations of the current model. Specifically, the added texts are:
  
  “Although IDEA has proved successful in many examples, it can be improved in several aspects. The model currently assumes the training and testing sequences share the same protein-DNA structure. While double-stranded DNA is generally rigid, recent studies have shown that sequence-dependent DNA shape contributes to their binding specificity [1, 2, 4]. To improve predictive accuracy, one could incorporate predicted DNA shapes or structures into the IDEA training protocol. In addition, the model is residue-based and evaluates the binding free energy as the additive sum of contributions from individual amino-acid-nucleotide contacts. This assumption does not account for cooperative effects that may arise from multiple nucleotide changes. A potential refinement could utilize a finer-grained model that includes more atom types within contacting residues and employs a many-body potential to account for such cooperative effects.”
  
  Comment 4: (3) The discussion section lacks information about the model’s limitations and a comprehensive comparison with other models. Additionally, differences in model performance across various proteins and their respective predictive powers are not addressed.
  
  We thank the reviewer for the insightful comments. As discussed in the response to Comment 3, the current structural model has several limitations, which may reduce predictive accuracy for weak DNA binders. We have noted these limitations in the Discussion section.
  
  To compare the performance of IDEA with state-of-the-art protein-DNA predictive models, we examined the predictive accuracies of two additional popular computational models: ProBound [8] and DeepBind [9]. ProBound has been shown to have a better performance than several earlier predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To benchmark these models’ performance, we examine each method’s capability to identify strong binders with the HT-SELEX datasets covering 22 proteins from 12 protein families [5]. As suggested by Reviewer 1, we also calculated the PRAUC score, reweighted to account for data imbalance [6], as a complementary metric for evaluating the model performance.
  
  As shown in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive methods. It is important to note that both ProBound and DeepBind were trained on a curated version of the HT-SELEX data [13], which overlaps with the testing data [5]. Compared with them, IDEA was trained only on the given structural and sequence information from a single protein-DNA complex, thus independent of the testing data. In order to assess how IDEA performs when incorporating knowledge from HT-SELEX data, we augmented the training by randomly including half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models. We further benchmarked IDEA using a 10-fold cross-validation on the same HT-SELEX data [5] and found that IDEA outperformed a recent regression model that considers the shape of DNA with different sequences [5]. Overall, IDEA can be used to predict protein-DNA affinities in the absence of known binding sequence data, thereby filling a critical gap when such experimental datasets are unavailable.
  
  In addition, we compared the performance of IDEA with both general and family-specific knowledge-based energy models. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide (e.g., phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups). For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with this generic one to test its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. As shown in Figure S6, the IDEA model generally achieves better performance than the generic energy model. Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families. As shown in Table S1 and Table S2, IDEA also shows better performance than rCLAMPS in most cases across the C2H2 and homeodomain families, demonstrating that it has better predictive accuracy than both family-specific and generic knowledge-based models.
  
  We have revised our text to include the comparison between IDEA and other predictive models. Specifically, we revised the text in Section: IDEA Generalizes across Various Protein Families.
  
  The revised text reads:
  
  “To examine IDEA’s predictive accuracy across different DNA-binding protein families, we applied it to calculate protein-DNA binding affinities using a comprehensive HT-SELEX dataset [5]. We focused on evaluating the capability of IDEA to distinguish strong binders from weak binders for each protein with an experimentally determined structure. We calculated the probability density distribution of the top and bottom binders identified in the SELEX experiment. A well-separated distribution indicates the successful identification of strong binders by IDEA (Figure 2D and S4). Receiver Operating Characteristic (ROC) analysis was performed to calculate the Area Under the Curve (AUC) and the precision-recall curve (PRAUC) scores for these predictions. Further details are provided in the Methods Section Evaluation of IDEA Prediction Using HT-SELEX Data. Our analysis shows that IDEA successfully differentiates strong from weak binders for 80% of the 22 proteins across 12 protein families, achieving AUC and balanced PRAUC scores greater than 0.5 (Figure 2E and S5). To benchmark IDEA’s performance against other leading methods, we compared its predictions with several popular models, including the sequence-based predictive models ProBound [8] and DeepBind [9], the family-based energy model rCLAMPS [10], and the knowledge-based energy model DBD-Hunter [7]. IDEA demonstrates performance comparable to these state-of-the-art approaches (Figure S6, Table S1, and Table S2), and incorporating sequence features further improves its prediction accuracy. We also performed 10-fold cross-validation on the binding affinities of protein–DNA pairs in this dataset and found that IDEA outperforms a recent regression model that considers the shape of DNA with different sequences [5] (Figure S7). Details are provided in Section: Comparison of IDEA predictive performance Using HT-SELEX data.”
  
  We also added one section Comparison of IDEA predictive performance Using HT-SELEX data in the Appendix to fully explain the comparison between IDEA and other popular models.
  
  The added text reads:
  
  “To benchmark the performance of IDEA against state-of-the-art protein-DNA predictive models, we evaluated its ability to recognize strong binders with the HT-SELEX datasets across 22 proteins from 12 families [5]. Specifically, we compared IDEA with two widely used sequence-based models: ProBound [8] and DeepBind [9]. ProBound has demonstrated superior performance over many other predictive protein-DNA models, including JASPAR 2018 [11], HOCOMOCO [12], Jolma et al. [13], and DeepSELEX [14]. To use ProBound, we retrieved the trained binding model for each protein from motifcentral.org and used the GitHub implementation of ProBoundTools to infer the binding scores between protein and target DNA sequences. Except for POU3F1, binding models are available for all proteins. Therefore, we excluded POU3F1 and evaluated the protein-DNA binding affinities for the remaining 21 proteins. To use DeepBind, sequence-specific binding affinities were predicted directly with its web server. The Area Under the Curve (AUC) and the Precision-Recall AUC (PRAUC) scores were used as metrics for comparison. An AUC score of 1.0 indicates a perfect separation between the strong- and weak-binder distributions, while an AUC score of 0.5 indicates no separation. Because there is a significant imbalance in the number of strong and weak binders from the experimental data [5], where the strong binders are far fewer than the weak binders, we reweighted the samples to achieve a balanced evaluation, using 0.5 as the baseline for randomized prediction [6]. As summarized in Figure S6, Table S1, and Table S2, IDEA ranked second among the three predictive models. To assess the performance of IDEA when augmented with additional protein-DNA binding data, we augmented IDEA using a randomly selected half of the HT-SELEX data (see the Methods Section Enhanced Modeling Prediction with SELEX Data). The augmented IDEA model achieved the best performance among all the models.”
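
  The two metrics described above can be illustrated with a minimal sketch. It assumes that a higher score means a stronger predicted binder (so predicted binding free energies would be negated first) and that "reweighting" means giving the strong-binder and weak-binder classes equal total weight; the function names are hypothetical and this is not taken from the IDEA code base.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC as the probability that a strong binder outranks a weak binder."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all strong-vs-weak pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()  # ties count as half
    return float(wins / (pos.size * neg.size))

def balanced_pr_auc(scores, labels):
    """Average precision with class weights chosen so that each class
    carries a total weight of 0.5 (assumed reweighting scheme).
    A random ranking then has an expected precision of about 0.5.
    Tied scores are not grouped in this simplified version."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    weights = np.where(labels, 0.5 / labels.sum(), 0.5 / (~labels).sum())
    order = np.argsort(-scores, kind="stable")  # best-scoring sequences first
    labels, weights = labels[order], weights[order]
    tp = np.cumsum(weights * labels)            # weighted true positives
    fp = np.cumsum(weights * ~labels)           # weighted false positives
    precision = tp / (tp + fp)
    recall = tp / tp[-1]
    d_recall = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(d_recall * precision))

# Toy example: one strong binder among nine weak binders.
free_energies = np.array([-1.0, -2.0, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9])
is_strong = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
print(roc_auc(-free_energies, is_strong), balanced_pr_auc(-free_energies, is_strong))
```

  With unweighted samples, a random classifier on this toy set would have a precision baseline near 0.1; the class weights move that baseline to roughly 0.5, which matches the baseline quoted in the text.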
  
  “In addition, we compared the performance of IDEA with both general and family-specific knowledge-based energy models. First, we incorporated a knowledge-based generic protein-DNA energy model (DBD-Hunter) learned from the protein-DNA database, reported by Skolnick and coworkers [7], into our prediction protocol. This model assigns interaction energies to different functional groups within each DNA nucleotide, including phosphate (PP), sugar (SU), pyrimidine (PY), and imidazole (IM) groups. For our comparison, we averaged the energy contributions of these groups within each nucleotide and replaced the IDEA-learned energy model with the DBD-Hunter model to assess its ability to differentiate strong binders from weak binders in the HT-SELEX dataset [5]. Additionally, we compared IDEA with rCLAMPS, a family-specific energy model developed to predict protein-DNA binding specificity in the C2H2 and homeodomain families. rCLAMPS learns a position-dependent amino-acid-nucleotide interaction energy model. To incorporate this model into the binding free energy calculation, we averaged the energy contributions across all occurrences of each amino-acid-nucleotide pair, which resulted in a 20-by-4 residue-type-specific energy matrix. This matrix is structurally analogous to the IDEA-trained energy model and can be directly integrated into the binding free energy calculations. As shown in Figure S6, Table S1, and Table S2, the IDEA model generally outperforms DBD-Hunter and rCLAMPS, demonstrating that it can achieve better predictive accuracy than both generic and family-specific knowledge-based models.”
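
  The averaging step that turns a position-dependent model into a 20-by-4 residue-type matrix can be sketched as follows. The input format (a flat list of amino acid, base, energy tuples, one per position-pair occurrence) and the choice to leave unobserved pairs at zero are assumptions made for illustration; they are not the rCLAMPS file format or the authors' script.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # row order of the output matrix
BASES = "ACGT"                        # column order of the output matrix

def average_to_residue_type_matrix(entries):
    """Average energies over all occurrences of each amino-acid/base pair.

    entries: iterable of (amino_acid, base, energy) tuples (hypothetical format).
    Returns a 20x4 array; pairs that never occur stay at 0.0 (assumption).
    """
    total = np.zeros((len(AMINO_ACIDS), len(BASES)))
    count = np.zeros_like(total)
    for aa, base, energy in entries:
        i, j = AMINO_ACIDS.index(aa), BASES.index(base)
        total[i, j] += energy
        count[i, j] += 1
    # Divide only where at least one occurrence was recorded.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

# Two position pairs involve Arg-G, one involves Lys-A.
matrix = average_to_residue_type_matrix(
    [("R", "G", -1.0), ("R", "G", -3.0), ("K", "A", 2.0)]
)
print(matrix.shape, matrix[AMINO_ACIDS.index("R"), BASES.index("G")])
```

  Averaging discards the positional information that rCLAMPS learns, which is the price of making the matrix interchangeable with the IDEA energy model.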
  
  “We also performed 10-fold cross-validation using the same HT-SELEX datasets, following the protocol described in the Methods Section Enhanced Modeling Prediction with SELEX Data. For each protein, we divided the entire dataset into 10 equal, randomly assigned folds. In each iteration, we used 9 of the 10 folds as the training dataset and the remaining fold as the testing dataset. This process was repeated 10 times so that each fold served as the test set once. We then reported the average R² scores across these iterations to evaluate IDEA’s predictive performance. Our results are compared with the 1mer and 1mer+shape methods from [5], the latest regression model that considers the shape of DNA with different sequences (Figure S7). This comparative analysis shows IDEA achieved higher predictive accuracy than the state-of-the-art sequence-based protein-DNA binding predictors for protein-DNA complexes that have available experimentally resolved structures.”
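
  The cross-validation loop described above can be sketched as below. A plain least-squares regressor stands in for the IDEA model, and the `fit`/`predict` callables, feature matrix, and seed are hypothetical; only the fold logic and the averaged R² follow the text.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def k_fold_r2(X, y, fit, predict, k=10, seed=0):
    """Mean R2 over k folds; every fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random, near-equal folds
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], predict(model, X[test_idx])))
    return float(np.mean(scores))

# Stand-in regressor (assumption): ordinary least squares with an intercept.
def fit_linear(X, y):
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(k_fold_r2(X, y, fit_linear, predict_linear))
```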
  
  “Overall, these results demonstrate that IDEA can be used to predict the protein-DNA pairs in the absence of known binding sequence data, thus filling an important gap in protein-DNA predictions when experimental binding sequence data are unavailable.”
  
  We also greatly appreciate the reviewer’s suggestion to examine the model’s performance across different proteins. To do this, we first evaluated the dependence of IDEA prediction on the availability of experimental structures similar to the target protein-DNA complexes. To quantitatively assess similarities and differences among the IDEA-derived energy models, we flattened each normalized energy model into an 80-dimensional vector and performed principal component analysis (PCA). As shown in Author response image 1 and Figure S11, energy models optimized from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results shown in Figure 3A, where the energy model trained from PHO4 has better transferability than those from the other two systems. Therefore, the availability of experimental structures from protein-DNA complexes more similar to the target can lead to better predictive performance.
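
  The flatten-and-project step can be sketched in a few lines. The normalization used here (dividing each 20-by-4 matrix by its Frobenius norm) is an assumption, since the response does not state how the energy models were normalized, and `pca_project` is a hypothetical helper rather than the authors' code.

```python
import numpy as np

def pca_project(energy_models, n_components=2):
    """Flatten 20x4 energy models to 80-dimensional vectors and run PCA via SVD.

    Returns the coordinates of each model on the leading components and the
    fraction of variance explained by each of those components.
    """
    rows = []
    for model in energy_models:
        m = np.asarray(model, dtype=float)
        rows.append((m / np.linalg.norm(m)).reshape(-1))  # assumed normalization
    X = np.stack(rows)                  # shape: (n_models, 80)
    X = X - X.mean(axis=0)              # center each feature
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return coords, explained[:n_components]

# Toy data: two "families", each with three noisy copies of one base model.
rng = np.random.default_rng(0)
family_a, family_b = rng.normal(size=(20, 4)), rng.normal(size=(20, 4))
models = [family_a + 0.01 * rng.normal(size=(20, 4)) for _ in range(3)]
models += [family_b + 0.01 * rng.normal(size=(20, 4)) for _ in range(3)]
coords, explained = pca_project(models)
print(coords.round(3), explained.round(3))
```

  In this toy setting, models from the same family land close together in the projected space, which is the pattern the response reports for real protein families.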
  
  We also examined cases in which the IDEA model failed to show strong discriminative power for protein-DNA complexes in the HT-SELEX datasets [5] (Figures 2E and S5). When evaluating the model’s ability to distinguish between strong and weak binders, we used the available experimental structure most similar to the protein employed in the HT-SELEX experiments. In some instances, only the structure of the same protein from a different organism is available. For example, the HT-SELEX data for PDX1-DNA used the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, we used the mouse PDX1–DNA complex (PDB ID: 2H1K) for model training. The differences between species may limit the predictive accuracy of the model. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).
  
  We also examined the remaining cases where IDEA did not show a clear distinction between strong and weak binders: USF1, Egr1, and PROX1. For PROX1, we initially used the structure of a protein-DNA complex (PDB ID: 4Y60) in training. However, upon closer inspection, we discovered that this structure does not include the PROX1 protein, but SOX-18, a different transcription factor. This explains the inaccurate prediction made by IDEA. Since no experimental PROX1-DNA complex structure is currently available, we have removed this case from our HT-SELEX evaluation.
  
  IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.
  
  Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.
  
  In summary, IDEA’s predictive performance depends on the availability of experimental structures closely related to the target protein-DNA complexes, both in terms of protein sequences and model organisms.
  
  We have included additional text in Section: IDEA Demonstrates Transferability across Proteins in the Same CATH Superfamily to discuss the PCA analysis and the dependence of the model’s transferability on the similarity among the learned energy models.
  
  The revised text now reads:
  
  “The transferability of IDEA within the same CATH superfamily can be understood from the similarities in protein-DNA binding interfaces, which determine similar learned energy models. For example, the PHO4 protein (PDB ID: 1A0A) shares a highly similar DNA-binding interface with the MAX protein (PDB ID: 1HLO) (Figure 3B), despite sharing only a 33.41% probability of being homologous. Consequently, the energy model derived from the PHO4-DNA complex (Figure 3C) exhibits an amino-acid-nucleotide interaction pattern similar to that learned from the MAX-DNA complex (Figure 2B). To further evaluate the similarity between the learned energy models and their connection to protein families, we performed principal component analysis (PCA) on the normalized energy models across 24 proteins from 12 protein families [5]. Our analysis (Figure S11) reveals that most of the energy models from the same protein family fall within the same cluster, while those from different protein families exhibit distinct patterns. Moreover, the relative distance between energy models in PCA space reflects the degree of transferability between them. For example, PHO4 (PDB ID: 1A0A) is positioned close to MAX (PDB ID: 1HLO), whereas USF1 (PDB ID: 1AN4) and TCF4 (PDB ID: 6OD3) are farther away. This is consistent with the results in Figure 3A, where the energy model trained on PHO4 has better transferability than those trained on USF1 or TCF4.”
  
  We have also added an Appendix section titled Analysis of examples where IDEA fails to recognize strong DNA binders to discuss the examples in which IDEA did not perform well:
  
  “We examine IDEA’s capability in identifying strong binders from the HT-SELEX dataset across 12 protein families [5]. The model successfully predicts 18 out of 22 protein-DNA systems, but the performance is reduced in 4 cases. Closer investigations revealed the source of these limitations. In some instances, only the structure of the same protein from a different organism is available. For example, the PDX1 HT-SELEX data utilized the human PDX1 protein, but no human PDX1–DNA complex structure is available. Therefore, the mouse PDX1–DNA complex structure (PDB ID: 2H1K) was used for model training. Differences between model organisms may reduce predictive accuracy. A similar limitation applies to POU3F1, where an available mouse complex (PDB ID: 4Y60) was used to predict human protein–DNA interactions. Notably, DeepBind [9], a sequence-based prediction tool, also failed to distinguish strong from weak binders when using the mouse POU3F1 protein (AUC score: 0.457), but this was corrected with the human POU3F1 protein (AUC score: 0.956).
  
  IDEA also fails to fully resolve the binding preference of USF1. A closer examination of the HT-SELEX data reveals a lack of distinction among the sequences, as most sequences, including those with the lowest M-word (binding affinity) scores, contain the DNA-binding E-box sequence CACGTG. Therefore, USF1 represents a challenging example where the experimental data only consists of strong binders with limited variations in binding affinity, which likely results from differences in flanking sequences of the E-box motif.
  
  Egr1 stands as a peculiar example. Whereas IDEA does not effectively distinguish between the strong and weak binders in the current HT-SELEX dataset, its predictions are consistent with other experimental datasets, including binding affinities measured by k-MITOMI [42] (Figure S8A, B), preferred binding sequences from protein-binding microarray, an earlier HT-SELEX experiment, and bacterial one-hybrid data [43]. Therefore, further investigation of the current HT-SELEX data is needed to reconcile these differences.”
  
  Comment 5: The authors provide an implementation of their model via GitHub, which is commendable. However, it unexpectedly requires the Modeller suite, despite no details about homology modeling being included in the methods section.
  
  We thank the reviewer for the helpful comments. We did not use the homology modeling module of Modeller. Instead, we only used a single Python script, buildseq.py, from the Modeller package to extract the protein and DNA sequences from the given PDB structure. We have clarified this in the README file on our GitHub repository.
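
  For readers without Modeller, the same kind of sequence extraction can be approximated in plain Python. The sketch below is not buildseq.py and does not use the Modeller API; it reads fixed-column ATOM records, assumes standard residue names (three-letter amino acids and DA/DT/DC/DG), and skips HETATM records and modified residues.

```python
PROTEIN = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
DNA = {"DA": "A", "DT": "T", "DC": "C", "DG": "G"}

def chain_sequences(pdb_text):
    """Return {chain_id: {"kind": "protein" | "dna", "seq": str}}.

    Residues are taken in file order; each (chain, residue number,
    insertion code) combination contributes one letter.
    """
    sequences = {}
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()       # PDB columns 18-20
        chain = line[21]                    # PDB column 22
        residue_key = (chain, line[22:27])  # columns 23-27: number + insertion code
        if residue_key in seen:
            continue
        seen.add(residue_key)
        if resname in PROTEIN:
            kind, letter = "protein", PROTEIN[resname]
        elif resname in DNA:
            kind, letter = "dna", DNA[resname]
        else:
            continue                        # unknown or modified residue
        entry = sequences.setdefault(chain, {"kind": kind, "seq": ""})
        entry["seq"] += letter
    return sequences
```

  Calling `chain_sequences(open("complex.pdb").read())` on a local file (the file name is a placeholder) would return one entry per chain, which can then be split into protein and DNA sequences.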
  
  Comment 6: While the manuscript is written in clear and accessible English, some sentences are quite long and could benefit from rephrasing (e.g., lines 49-52).
  
  Thank you for the helpful suggestion. We agree that the original sentence was overly long and have revised it by splitting it into two for improved clarity and readability.
  
  The revised version reads:
  
  “The very robustness of evolution [46, 47, 48, 49] provides an opportunity to extract the sequence-structure relationships embedded in existing complexes. Guided by this principle, we can learn an interpretable binding energy landscape that governs the recognition processes of DNA-binding proteins.”
  
  Comment 7: In line 82, the citations appear out of place, as the context seems to suggest the use of the newly developed model.
  
  Thank you for this insightful suggestion. We have rephrased the sentence to better connect with the context of this section.
  
  The revised text now reads:
  
  “Finally, the learned energy model can be incorporated into a simulation framework to explore the dynamics of DNA-binding processes, revealing mechanistic insights into various DNA-templated processes.”
  
  Comment 8: Line 143 ”different structure from the bHLH TFs and thus requires a different atom” This is the first instance in the manuscript where the atom selection for distance thresholding is mentioned, making the text somewhat confusing.
  
  We thank the reviewer for the insightful comment and agree that the atom selection scheme appears abruptly in this section. To improve clarity, we have moved the detailed atom selection scheme and its rationale to the Methods Section titled Structural Modeling of Protein and DNA.
  
  Comment 9: Figures: Overall, the figures are visually appealing but could be further improved.
  
  We appreciate the positive feedback regarding the visual presentation of our figures. Following the reviewer’s suggestions and to further enhance clarity, we have revised several figures to improve labeling, layout, and annotations.
  
  Comment 10: Figure 1: Consider changing the description ”highlighted in blue” to ”highlighted in blue on the structure.”
  
  We have revised the text based on your suggestion.
  
  Comment 11: Figure 2: Panel B is missing a color bar legend and units, as is the case in Figure 3C. Additionally, the placement of Panel C is unconventional - it appears it should be Panel D. The color scheme for the spheres is not fully described. Panel E: There are too many colors used; consider employing different markers to improve clarity.
  
  Thank you for the helpful suggestions.
  
  For Figure 2B and Figure 3C, we would like to clarify that the predicted energies are presented in reduced units due to an undetermined prefactor introduced during the model optimization. This point has now been clarified in the figure captions and is also explained in the Methods section titled Training Protocol.
  
  Additionally, we have rearranged Panels C and D to improve the figure layout and have fully described the color coding used in the structural representations.
  
  We have updated it to read:
  
  “Results for MAX-based predictions. (A) The binding free energies calculated by IDEA, trained using a single MAX–DNA complex (PDB ID: 1HLO), correlate well with experimentally measured MAX–DNA binding free energies [50]. ∆∆G represents the changes in binding free energy relative to that of the wild-type protein–DNA complex. (B) The heatmap, derived from the optimized energy model, illustrates key amino acid–nucleotide interactions governing MAX–DNA recognition, showing pairwise interaction energies between 20 amino acids and the four DNA bases—DA (deoxyadenosine), DT (deoxythymidine), DC (deoxycytidine), and DG (deoxyguanosine). Both the predicted binding free energies and the optimized energy model are expressed in reduced units, as explained in the Methods Section Training Protocol. Each cell represents the optimized energy contribution, where blue indicates more favorable (lower) energy values, and red indicates less favorable (higher) values. (C) The 3D structure of the MAX–DNA complex (zoomed in with different views) highlights key amino acid–nucleotide contacts at the protein–DNA interface. Notably, several DNA deoxycytidines (red spheres) form close contacts with arginines (blue spheres). Additional nucleotide color coding: adenine (yellow spheres), guanine (green spheres), thymine (pink spheres). (D) Probability density distributions of predicted binding free energies for strong (blue) and weak (red) binders of the protein ZBTB7A. The mean of each distribution is marked with a dashed line. (E) Summary of AUC scores for protein–DNA pairs across 12 protein families, calculated based on the predicted probability distributions of binding free energies.”
  
  We fully agree that Panel E was visually overwhelming. We have revised the plot by using a combination of color and marker shapes to more clearly distinguish between different protein families, as suggested.
  
  Comment 12: Typos:
  
  Line 18: Gene expressions → Gene expression?
  
  Line 28: performed → utilized ?
  
  We really appreciate the suggestions and have corrected the text accordingly.
  
  References
  
  (1) Tianyin Zhou, Ning Shen, Lin Yang, Namiko Abe, John Horton, Richard S Mann, Harmen J Bussemaker, Raluca Gordân, and Remo Rohs. Quantitative modeling of transcription factor binding specificities using DNA shape. Proceedings of the National Academy of Sciences, 112(15):4654–4659, 2015.
  
  (2) Jinsen Li, Tsu-Pei Chiu, and Remo Rohs. Predicting DNA structure using a deep learning method. Nat Commun, 15(1):1243, February 2024.
  
  (3) Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, May 2024.
  
  (4) Raktim Mitra, Jinsen Li, Jared M. Sagendorf, Yibei Jiang, Ari S. Cohen, Tsu-Pei Chiu, Cameron J. Glasscock, and Remo Rohs. Geometric deep learning of protein–DNA binding specificity. Nat Methods, 21(9):1674–1683, September 2024.
  
  (5) Lin Yang, Yaron Orenstein, Arttu Jolma, Yimeng Yin, Jussi Taipale, Ron Shamir, and Remo Rohs. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models. Mol Syst Biol, 13(2):910, February 2017.
  
  (6) Takaya Saito and Marc Rehmsmeier. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE, 10(3):e0118432, March 2015.
  
  (7) Mu Gao and Jeffrey Skolnick. DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions. Nucleic Acids Res, 36(12):3978–3992, July 2008.
  
  (8) H. Tomas Rube, Chaitanya Rastogi, Siqian Feng, Judith F. Kribelbauer, Allyson Li, Basheer Becerra, Lucas A. N. Melo, Bach Viet Do, Xiaoting Li, Hammaad H. Adam, Neel H. Shah, Richard S. Mann, and Harmen J. Bussemaker. Prediction of protein–ligand binding affinity from sequencing data with interpretable machine learning. Nat Biotechnol, 40(10):1520–1527, October 2022.
  
  (9) Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol, 33(8):831–838, August 2015.
  
  (10) Joshua L. Wetzel, Kaiqian Zhang, and Mona Singh. Learning probabilistic protein-DNA recognition codes from DNA-binding specificities using structural mappings. Genome Res, 32(9):1776–1786, September 2022.
  
  (11) Aziz Khan, Oriol Fornes, Arnaud Stigliani, Marius Gheorghe, Jaime A Castro-Mondragon, Robin van der Lee, Adrien Bessy, Jeanne Chèneby, Shubhada R Kulkarni, Ge Tan, Damir Baranasic, David J Arenillas, Albin Sandelin, Klaas Vandepoele, Boris Lenhard, Benoît Ballester, Wyeth W Wasserman, François Parcy, and Anthony Mathelier. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Research, 46(D1):D260–D266, January 2018.
  
  (12) Ivan V. Kulakovskiy, Ilya E. Vorontsov, Ivan S. Yevshin, Ruslan N. Sharipov, Alla D. Fedorova, Eugene I. Rumynskiy, Yulia A. Medvedeva, Arturo Magana-Mora, Vladimir B. Bajic, Dmitry A. Papatsenko, Fedor A. Kolpakov, and Vsevolod J. Makeev. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res, 46(D1):D252–D259, January 2018.
  
  (13) Arttu Jolma, Jian Yan, Thomas Whitington, Jarkko Toivonen, Kazuhiro R. Nitta, Pasi Rastas, Ekaterina Morgunova, Martin Enge, Mikko Taipale, Gonghong Wei, Kimmo Palin, Juan M. Vaquerizas, Renaud Vincentelli, Nicholas M. Luscombe, Timothy R. Hughes, Patrick Lemaire, Esko Ukkonen, Teemu Kivioja, and Jussi Taipale. DNA-Binding Specificities of Human Transcription Factors. Cell, 152(1-2):327–339, January 2013.
  
  (14) Maor Asif and Yaron Orenstein. DeepSELEX: inferring DNA-binding preferences from HT-SELEX data using multi-class CNNs. Bioinformatics, 36(Supplement 2):i634–i642, December 2020.
  
  (15) Oriol Fornes, Alberto Meseguer, Joachim Aguirre-Plans, Patrick Gohl, Patricia M Bota, Ruben Molina-Fernandez, Jaume Bonet, Altair Chinchilla-Hernandez, Ferran Pegenaute, Oriol Gallego, Narcis Fernandez-Fuentes, and Baldo Oliva. Structure-based learning to predict and model protein–DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genomics and Bioinformatics, 6(2):lqae068, April 2024.
  
  (16) Sofia Aizenshtein-Gazit and Yaron Orenstein. DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning. Bioinformatics, 38(Suppl 2):ii62–ii67, September 2022.
  
  (17) Stephen K Burley, Charmi Bhikadiya, Chunxiao Bi, Sebastian Bittrich, Henry Chao, Li Chen, Paul A Craig, Gregg V Crichlow, Kenneth Dalenberg, Jose M Duarte, Shuchismita Dutta, Maryam Fayazi, Zukang Feng, Justin W Flatt, Sai Ganesan, Sutapa Ghosh, David S Goodsell, Rachel Kramer Green, Vladimir Guranovic, Jeremy Henry, Brian P Hudson, Igor Khokhriakov, Catherine L Lawson, Yuhe Liang, Robert Lowe, Ezra Peisach, Irina Persikova, Dennis W Piehl, Yana Rose, Andrej Sali, Joan Segura, Monica Sekharan, Chenghua Shao, Brinda Vallat, Maria Voigt, Ben Webb, John D Westbrook, Shamara Whetstone, Jasmine Y Young, Arthur Zalevsky, and Christine Zardecki. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Research, 51(D1):D488–D508, November 2022.
  
  (18) Raktim Mitra, Ari S. Cohen, Jared M. Sagendorf, Helen M. Berman, and Remo Rohs. DNAproDB: an updated database for the automated and interactive analysis of protein-DNA complexes. Nucleic Acids Res, 53(D1):D396–D402, January 2025.
  
  (19) Natalia Petrenko, Yi Jin, Liguo Dong, Koon Ho Wong, and Kevin Struhl. Requirements for RNA polymerase II preinitiation complex formation in vivo. eLife, 8:e43654, January 2019.
  
  (20) Rudolf Jaenisch and Adrian Bird. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet, 33(3):245–254, March 2003.
  
  (21) Claire Marchal, Jiao Sima, and David M. Gilbert. Control of DNA replication timing in the 3D genome. Nat Rev Mol Cell Biol, 20(12):721–737, December 2019.
  
  (22) Lucia A. Hindorff, Praveen Sethupathy, Heather A. Junkins, Erin M. Ramos, Jayashri P. Mehta, Francis S. Collins, and Teri A. Manolio. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23):9362–9367, June 2009.
  
  (23) Tuuli Lappalainen, Alexandra J Scott, Margot Brandt, and Ira M Hall. Genomic analysis in the age of human genome sequencing. Cell, 177(1):70–84, 2019.
  
  (24) Sonali Mukherjee, Michael F. Berger, Ghil Jona, Xun S. Wang, Dale Muzzey, Michael Snyder, Richard A. Young, and Martha L. Bulyk. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet, 36(12):1331–1339, December 2004.
  
  (25) Shaoxun Liu, Pilar Gomez-Alcala, Christ Leemans, William J. Glassford, Lucas A. N. Melo, Xiang-Jun Lu, Richard S. Mann, and Harmen J. Bussemaker. Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning. bioRxiv, page 2024.01.24.577115, April 2025.
  
  (26) Tsu-Pei Chiu, Satyanarayan Rao, and Remo Rohs. Physicochemical models of protein–DNA binding with standard and modified base pairs. Proc. Natl. Acad. Sci. U.S.A., 120(4):e2205796120, January 2023.
  
  (27) Matthew T Weirauch, Atina Cote, Raquel Norel, Matti Annala, Yue Zhao, Todd R Riley, Julio Saez-Rodriguez, Thomas Cokelaer, Anastasia Vedenko, Shaheynoor Talukder, and others. Evaluation of methods for modeling transcription factor sequence specificity. Nature biotechnology, 31(2):126–134, 2013.
  
  (28) Chaitanya Rastogi, H. Tomas Rube, Judith F. Kribelbauer, Justin Crocker, Ryan E. Loker, Gabriella D. Martini, Oleg Laptenko, William A. Freed-Pastor, Carol Prives, David L. Stern, Richard S. Mann, and Harmen J. Bussemaker. Accurate and sensitive quantification of protein-DNA binding affinity. Proc. Natl. Acad. Sci. U.S.A., 115(16), April 2018.
  
  (29) Rahmatullah Roche, Bernard Moussad, Md Hossain Shuvo, Sumit Tarafder, and Debswapna Bhattacharya. EquiPNAS: improved protein–nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Research, 52(5):e27–e27, March 2024.
  
  (30) Yufan Liu and Boxue Tian. Protein–DNA binding sites prediction based on pretrained protein language model and contrastive learning. Briefings in Bioinformatics, 25(1):bbad488, November 2023.
  
  (31) Binh P. Nguyen, Quang H. Nguyen, Giang-Nam Doan-Ngoc, Thanh-Hoang Nguyen-Vo, and Susanto Rahardja. iProDNA-CapsNet: identifying protein-DNA binding residues using capsule neural networks. BMC Bioinformatics, 20(S23):634, December 2019.
  
  (32) Trevor Siggers and Raluca Gordân. Protein–DNA binding: complexities and multi-protein codes. Nucleic Acids Research, 42(4):2099–2111, February 2014.
  
  (33) Johannes Söding, Andreas Biegert, and Andrei N. Lupas. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research, 33(suppl 2):W244–W248, July 2005.
  
  (34) William Humphrey, Andrew Dalke, and Klaus Schulten. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics, 14:33–38, 1996.
  
  (35) Arttu Jolma, Teemu Kivioja, Jarkko Toivonen, Lu Cheng, Gonghong Wei, Martin Enge, Mikko Taipale, Juan M Vaquerizas, Jian Yan, Mikko J Sillanpää, and others. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome research, 20(6):861–873, 2010.
  
  (36) Nobuo Ogawa and Mark D Biggin. High-throughput SELEX determination of DNA sequences bound by transcription factors in vitro. Gene Regulatory Networks: Methods and Protocols, pages 51–63, 2012.
  
  (37) Alina Isakova, Romain Groux, Michael Imbeault, Pernille Rainer, Daniel Alpern, Riccardo Dainese, Giovanna Ambrosini, Didier Trono, Philipp Bucher, and Bart Deplancke. SMiLE-seq identifies binding motifs of single and dimeric transcription factors. Nature methods, 14(3):316–322, 2017.
  
  (38) Paul G. Giresi, Jonghwan Kim, Ryan M. McDaniell, Vishwanath R. Iyer, and Jason D. Lieb. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res., 17(6):877–885, January 2007.
  
  (39) Peter J Park. ChIP–seq: advantages and challenges of a maturing technology. Nature reviews genetics, 10(10):669–680, 2009.
  
  (40) Terrence S. Furey. ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nat Rev Genet, 13(12):840–852, December 2012.
  
  (41) Anna Bartlett, Ronan C. O’Malley, Shao-shan Carol Huang, Mary Galli, Joseph R. Nery, Andrea Gallavotti, and Joseph R. Ecker. Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat Protoc, 12(8):1659–1672, August 2017.
  
  (42) Marcel Geertz, David Shore, and Sebastian J Maerkl. Massively parallel measurements of molecular interaction kinetics on a microfluidic platform. Proceedings of the National Academy of Sciences, 109(41):16540–16545, 2012.
  
  (43) Gary D. Stormo and Yue Zhao. Determining the specificity of protein–DNA interactions. Nat Rev Genet, 11(11):751–760, November 2010.
  
  (44) Xingcheng Lin, Rachel Leicher, Shixin Liu, and Bin Zhang. Cooperative DNA looping by PRC2 complexes. Nucleic Acids Research, 49(11):6238–6248, June 2021.
  
  (45) P. L. Privalov, A. I. Dragan, and C. Crane-Robinson. Interpreting protein/DNA interactions: distinguishing specific from non-specific and electrostatic from non-electrostatic components. Nucleic Acids Research, 39(7):2483–2491, April 2011.
  
  (46) J D Bryngelson and P G Wolynes. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. U.S.A., 84(21):7524–7528, November 1987.
  
  (47) J. N. Onuchic, Z. Luthey-Schulten, and P. G. Wolynes. Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem, 48:545–600, 1997.
  
  (48) N. P. Schafer, B. L. Kim, W. Zheng, and P. G. Wolynes. Learning To Fold Proteins Using Energy Landscape Theory. Isr J Chem, 54(8-9):1311–1337, August 2014.
  
  (49) Wen-Ting Chu, Zhiqiang Yan, Xiakun Chu, Xiliang Zheng, Zuojia Liu, Li Xu, Kun Zhang, and Jin Wang. Physics of biomolecular recognition and conformational dynamics. Rep. Prog. Phys., 84(12):126601, December 2021.
  
  (50) Sebastian J. Maerkl and Stephen R. Quake. A Systems Approach to Measuring the Binding Energy Landscapes of Transcription Factors. Science, 315(5809):233–237, January 2007.
  
URL

biorxiv.org/content/10.1101/2024.05.26.595895v3
www.biorxiv.org www.biorxiv.org

DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the previous reviews.
  
  eLife assessment
  
  This important study reports a novel mechanism linking DHODH inhibition-mediated pyrimidine nucleotide depletion to antigen presentation. Alternative means of inducing antigen presentation provide therapeutic opportunities to augment immune checkpoint blockade for cancer treatment. While the solid mechanistic data in vitro are compelling, in vivo assessments of the functional relevance of this mechanism are still incomplete.
  
  Public Reviews:
  
  We thank all Reviewers for their insightful comments and excellent suggestions.
  
  Reviewer #1 (Public Review):
  
  The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, which is mechanistically mediated by the CDK9/P-TEFb pathway triggered by pyrimidine nucleotide depletion.
  
  No comment from authors
  
  The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.
  
  No comment from authors
  
  The interesting findings in the present study include the demonstration of a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it does not directly examine whether the increased antigen presentation induced by DHODH inhibition actually contributed to potentiating the efficacy of immune checkpoint blockade (ICB).
  
  No comment from authors for preceding text, comment addresses the following text
  
  Moreover, the mechanism by which pyrimidine depletion increases antigen presentation via CDK9/P-TEFb was not validated by genetic knockdown (KD) or knockout (KO) of CDK9/P-TEFb pathway components.
  
  We appreciate this comment, and we would like to explain why we did not pursue these approaches. According to DepMap, CRISPR/Cas9-mediated knockout of CDK9 in cancer cell lines is almost universally deleterious, scoring as “essential” in 99.8% (1093/1095) of all cell lines tested (see Author response image 1 below). This makes sense, as P-TEFb is required for productive RNA polymerase II elongation of most mammalian genes. As such, it was not feasible to generate cell lines with stable genetic knockout of CDK9 to test our hypothesis.
  
  While knockdown of CDK9 by RNA interference could support our results, DepMap data seems to indicate that RNAi-mediated knockdown of CDK9 is generally ineffective in silencing its activity, as this perturbation scored as “essential” in only 6.2% (44/710) of tested cell lines. This suggests that incomplete depletion of CDK9 will likely not be sufficient to block APP induction downstream of nucleotide depletion. Furthermore, RNAi-mediated depletion of CDK9 may trigger transcriptional changes in the cell by virtue of its many documented protein-protein interactions, and it would be difficult to establish a consistent “time zero” at which point CDK9 protein depletion is substantial but secondary effects of this have not yet occurred to a significant degree. These factors constitute major limitations of experiments using RNAi-mediated knockdown of CDK9.
  
  Author response image 1.
  
  Essentiality score from CRISPR and RNAi perturbation of CDK9 in cancer cell lines https://depmap.org/portal/gene/CDK9?tab=overview&dependency=RNAi_merged
  
  At any rate, we provide evidence that three different inhibitors of CDK9 (flavopiridol, dinaciclib, and AT7519) all inhibit our effect of interest (Fig 4B). The same results were observed using a previously validated CDK9-directed proteolysis targeting chimera (PROTAC2), and this was reversed by addition of excess pomalidomide (Fig 4C), which correlated with the presence/absence of CDK9 on western blot under the exact same conditions (Fig 4D).
  
  It is formally possible that all CDK9 inhibitors we tested are blocking BQ-mediated APP induction by some shared off-target mechanism (or perhaps by two or more different off-target mechanisms) AND this CDK9-independent target also happens to be degraded by PROTAC2. However, this would be an extraordinarily non-parsimonious explanation for our results, and so we contend that we have provided compelling evidence for the requirement of CDK9 for BQ-mediated APP induction.
  
  Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.
  
  We are intrigued by the results shown to us by Reviewer #1 in the linked preprint (Mishima et al 2022, https://doi.org/10.21203/rs.3.rs-2190326/v1). We have also observed in our unpublished data that very high concentrations of BQ (>150µM) cause loss of cell viability that is not rescued by uridine supplementation and that occurs even in DHODH knockout cells. This effect of high-dose BQ must be DHODH-independent. We also agree that Mishima et al provide compelling evidence that the ferroptosis-sensitizing effect of high-dose BQ treatment is due (at least in large part) to inhibition of FSP1.
  
  Although we showed that DHODH is strongly inhibited in tumor cells in vivo (Fig 5C), we did not directly measure the concentration of BQ in the tumor or plasma. Sykes et al (PMID: 27641501) found that the maximum plasma concentration (Cmax) for [BQ]free following a single IP administration in C57BL/6J mice (15mg/kg) is approximately 3µM, while the Cmax for [BQ]total was around 215µM. Because polar drug molecules bound to serum proteins (predominantly albumin) are not available to bind other targets, [BQ]free is the relevant parameter.
  
  Given a Cmax for [BQ]free of 3µM and half-life of 12.0 hours, we estimate that the steady-state [BQ]free with daily IP injections at this dose is around 4µM. Since we used an administration schedule of 10mg/kg every 24 hours, we estimate that the steady-state plasma [BQ]free in our system was 2.67µM (assuming initial Cmax of 2µM and half-life of 12.0 hours).
  
  To derive an upper-bound estimate for the Cmax of [BQ]free over the 12-day treatment period (Fig 5A-D), we will use the observed data for the 15mg/kg dose, and we will assume 1) that there is no clearance of BQ whatsoever and 2) that [BQ]free increases linearly with increasing [BQ]total. This yields a maximum free BQ concentration of 12 x 3 = 36µM.
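  
  The arithmetic in the two preceding paragraphs can be reproduced with a short sketch. It assumes (this is our reading, not a method the response states explicitly) instantaneous peaks, first-order elimination, and simple superposition of repeated doses, so that the steady-state peak equals the single-dose Cmax divided by one minus the fraction of drug remaining after one dosing interval. The function names are hypothetical; the inputs (Cmax of 3µM and 2µM, half-life of 12.0 hours, 24-hour dosing interval, 12 daily doses) are the values quoted above.

```python
def steady_state_peak(cmax_single_um, half_life_h, interval_h):
    """Steady-state peak concentration (uM) under repeated dosing.

    Assumes first-order elimination and superposition of doses
    (an illustrative one-compartment simplification).
    """
    # Fraction of the previous dose still present when the next dose is given.
    fraction_remaining = 0.5 ** (interval_h / half_life_h)
    # Sum of the geometric series 1 + f + f^2 + ... = 1 / (1 - f).
    return cmax_single_um / (1.0 - fraction_remaining)


def no_clearance_upper_bound(cmax_single_um, n_doses):
    """Worst-case peak (uM) if no drug were cleared between doses."""
    return cmax_single_um * n_doses


# Values quoted in the response: 15 mg/kg -> Cmax 3 uM; 10 mg/kg -> assumed Cmax 2 uM.
peak_15mgkg = steady_state_peak(3.0, half_life_h=12.0, interval_h=24.0)
peak_10mgkg = steady_state_peak(2.0, half_life_h=12.0, interval_h=24.0)
upper_bound = no_clearance_upper_bound(3.0, n_doses=12)

print(f"steady-state peak, 15 mg/kg daily: {peak_15mgkg:.2f} uM")
print(f"steady-state peak, 10 mg/kg daily: {peak_10mgkg:.2f} uM")
print(f"no-clearance upper bound, 12 doses: {upper_bound:.0f} uM")
```

  With a 12-hour half-life and a 24-hour interval, one quarter of each dose remains at the next injection, which is why the steady-state peak is 4/3 of the single-dose Cmax.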
  
  Therefore, we consider it very unlikely that plasma concentrations of free BQ in our experiment exceeded the lower limit of the ferroptosis-sensitizing dose range reported by Mishima et al. However, without direct pharmacokinetic analysis, we cannot say for sure what the maximal [BQ]free was under our experimental conditions.
  
  Reviewer #2 (Public Review):
  
  In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.
  
  The combination of DHODH inhibition with ICB reflects more of an additive response instead of a synergistic combination. Moreover, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. To confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition, the authors should examine whether depletion of immune cells can reduce the therapeutic efficacy of BQ in vivo.
  
  We concur with this assessment.
  
  Moreover, they should examine whether BQ treatment induces antigen presentation in non-malignant cells and APCs to determine the cancer specificity.
  
  Although we showed that this occurs in HEK-293T cells, we appreciate that this cell line is not representative of human cells of any organ system in vivo. So, we agree it is important to determine if DHODH inhibition induces antigen presentation in human tissues and professional antigen presenting cells, and this is an excellent focus for future studies.
  
  However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline (i.e. even in the absence of DHODH inhibitor treatment), since all nucleated cells express MHC-I.
  
  This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”
  
  If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.
  
  Finally, although the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level, only MHC-I is validated by flow cytometry. Given the importance of MHC-II expression on epithelial cancers, including melanoma, MHC-II should be validated as well.
  
  We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.
  
  If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.
  
  In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.
  
  [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]
  
  Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.
  
  Reviewer #3 (Public Review):
  
  Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group build on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increase in expression of genes involved in the antigen presentation pathway and MHC I after BQ treatment and they narrow the mechanism to be strictly pyrimidine and CDK9/P-TEFb dependent. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. This combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.
  
  No comment from authors
  
  Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the use of multiple controls across a panel of cell lines to exclude off-target effects and to confirm that effects are exclusively dependent on pyrimidine depletion. Overall, the authors do a thorough characterization of the mechanism that mediates MHC I upregulation using multiple strategies. Furthermore, the in vivo studies provide solid evidence for combining DHODH inhibitors with immune checkpoint blockade.
  
  No comment from authors
  
  However, despite the use of multiple cell lines, most experiments are only performed in one cell line, and it is hard to understand why particular gene sets, cell lines or time points are selected for each experiment. It would be beneficial to standardize experimental conditions and confirm the most relevant findings in multiple cell lines.
  
  We appreciate this comment, and we understand how the use of various cell lines may seem puzzling. We would like to explain how our cell line panel evolved over the course of the study. Our first indication that BQ caused APP upregulation came from transcriptomics experiments (Figs 1A-D, S1A) performed as part of a previous study investigating BQ resistance (Mullen et al, 2023 Cancer Letters). In that study, we used CFPAC-1 as a model for BQ sensitivity and S2-013 as a model for BQ resistance. We did RNA sequencing +/- BQ in these cell lines to look for gene expression patterns that might underlie resistance/sensitivity to BQ. When analyzing this data, we serendipitously discovered the APP/MHC phenomenon, which gave rise to the present study.
  
  Our next step was to extend these findings to cancer cell lines of other histologies, and we prioritized cell lines derived from common cancer types for which immunotherapy (specifically ICB) is clinically approved. This is why A549 (lung adenocarcinoma), HCT116 (colorectal adenocarcinoma), A375 (cutaneous melanoma), and MDA-MB-231 (triple-negative breast cancer) cell lines were introduced.
  
  Because PDAC is considered to have an especially “immune-cold” tumor microenvironment, we reasoned that even dramatically increasing cancer cell antigen presentation may be insufficient to elicit an effective anti-tumor immune response in vivo. So we shifted our focus towards melanoma, because a subset of melanoma patients is very responsive to ICB and loss of antigen presentation (by direct silencing or homozygous loss-of-function mutations in MHC-I components such as B2M, or by functional loss of IFN-JAK1/2-STAT signaling) has been shown to mediate ICB resistance in human melanoma patients. This is why we extended our findings to B16F10 murine melanoma cells, intending to use them for in vivo studies with syngeneic immunocompetent recipient mice.
  
  The PDAC cell line MiaPaCa2 was introduced because a collaborator at our institution (Amar Natarajan) happened to have IKK2 knockout MiaPaCa2 cells, which allowed us to genetically validate our inhibitor results showing that IKK1 and IKK2 (crucial effectors for NF-kB signaling) are dispensable for our effect of interest.
  
  Ultimately, realizing that our results spanned various human and murine cell lines, we chose to use HEK-293T cells to validate the general applicability of our findings to proliferating cells in 2D culture, since HEK-293T cells (compared to our cancer cell lines) have relatively few genetic idiosyncrasies and express MHC-I at baseline.
  
  The differential in vivo survival depending on dosing schedule is interesting. However, this section could be strengthened with a more thorough evaluation of the tumors at endpoint.
  
  Overall, this is an interesting manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic strategy combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these inhibitors have not shown optimal efficacy yet.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The main issue is that the study did not directly examine whether the increased antigen presentation induced by DHODH inhibition contributed to the potentiation of the efficacy of immune checkpoint blockade (ICB). The additional effect of BQ in the xenograft tumor study was not examined to determine whether it was due to increased antigen presentation by the cancer cells or merely due to cell cycle arrest caused by pyrimidine depletion in the tumor cells. The different administration timing of ICB with BQ treatment (Fig 5E) would not be sufficient to resolve this issue.
  
  We agree with this assessment, and we believe the experiment proposed by Reviewer #2 below (comparing the efficacy of BQ in Rag-null versus immunocompetent recipients) would address this question directly. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.
  
  (2) Additionally, in the in vivo study, the increase in surface MHC-I at the protein level upon BQ treatment was not examined in the tumor samples, and it was not confirmed whether increased antigen presentation by BQ treatment actually promoted an anti-cancer immune response in immune cells. These data would be necessary to support the story presented in the study.
  
  We attempted to show this by immunohistochemistry, but unfortunately the anti-H2-Db antibody that we obtained for this purpose did not have satisfactory performance to assess this in our tissue samples harvested at necropsy.
  
  (3) The mechanism by which pyrimidine depletion increases antigen presentation via CDK9/P-TEFb was not validated by genetic KD or KO of CDK9/P-TEFb pathway components. In general, results obtained only from inhibitor assays are limited by potential off-target effects.
  
  Please see our above reply to Reviewer #1 comment making this same point, where we spell out our rationale for not pursuing these experiments.
  
  (4) High concentrations of BQ (> 50 uM) have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, an iron-mediated lipid peroxidation-dependent cell death, independent of DHODH inhibition (https://www.researchsquare.com/article/rs-2190326/v1). The authors should discuss whether the dose used in the in vivo study reached the ferroptosis-sensitizing dose.
  
  Please see our above reply to Reviewer #1 comment making this same point, where we explain why we are very confident that the BQ dose administered in our animal experiments was far below the minimum reported BQ dose required to sensitize cancer cells to ferroptosis in vitro.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Major Points
  
  (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.
  
  We agree with this assessment and the proposed experiment comparing the response in Rag-null versus immunocompetent recipients. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.
  
  (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.
  
  We agree it is important to examine if our findings regarding nucleotide depletion and antigen presentation are true of APCs and other non-transformed cells, but we are not so concerned about the possibility of raising an immune response against non-malignant host tissues, as explained above. We have reproduced the relevant section below:
  
  “However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline, since all nucleated cells express MHC-I.
  
  This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”
  
  If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.”
  
  (3) In the title, the authors claim that DHODH inhibition enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan Meier curves reflect more of an additive response versus a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. This result seems to contradict the title.
  
  We do not agree with this assessment. Given that the effect of dual ICB alone was very marginal, while the effect of BQ monotherapy was quite marked, we cannot conclude from Fig 5 that BQ treatment inhibited ICB efficacy due to immune suppression.
  
  (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and a subsequent course of ICB then independently reduces tumor burden, rather than the two therapies functioning in synergy. To address this, the authors should measure the duration of BQ-mediated induction of antigen presentation after stopping treatment.
  
  We agree that the alternative explanation proposed by Reviewer #2 is possible and we appreciate the suggestion to test the stability of APP induction after stopping BQ treatment.
  
  (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).
  
  We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.
  
  If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.
  
  In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.
  
  [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]
  
  Minor Points
  
  (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.
  
  We appreciate this suggestion. We have changed Fig 4 by replacing the HLA-B track (old Fig 4E) with a representation of fold change (Ter/DMSO) in Pol II occupancy versus fold change (Ter/DMSO) in mRNA abundance for 23 relevant genes (new Fig 4G); both of these datasets were obtained from the Tan et al manuscript. This new figure panel (Fig 4G) also shows linear regression analysis demonstrating that Pol II occupancy and mRNA expression are significantly correlated for APP genes. While we recognize that this data in itself is not formal proof of our hypothesis, it does strongly support the notion that increased transcription is responsible for the increased mRNA abundance of APP genes that we have observed.
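  
  The kind of analysis described for Fig 4G (regressing the fold change in mRNA abundance on the fold change in Pol II occupancy) can be sketched as follows. This is only an illustration of an ordinary least-squares fit with a Pearson correlation; it is not the authors' actual pipeline, and the six data points are made-up placeholder values, not values from Tan et al. or from Fig 4G. The function names are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, pearson_r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


def t_statistic(pearson_r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return pearson_r * math.sqrt((n - 2) / (1.0 - pearson_r ** 2))


# Placeholder log2 fold changes (treated / DMSO) for six hypothetical genes.
# These numbers are invented for illustration only.
pol2_log2fc = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8]
mrna_log2fc = [0.3, 0.4, 1.0, 1.3, 1.2, 2.0]

slope, intercept, r = linear_fit(pol2_log2fc, mrna_log2fc)
t = t_statistic(r, len(pol2_log2fc))
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} t={t:.2f}")
```

  A p-value would follow from comparing the t statistic against a t distribution with n - 2 degrees of freedom, which a statistics package can provide.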
  
  (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.
  
  We fully agree with this comment. While the abundance of cancer cell surface MHC-I is an important factor for anticancer immunity, another crucial factor is the identity of peptides that are presented. Treatments that cause presentation of more immunogenic peptides can enhance T-cell recognition even in the absence of a relative change in cell surface MHC-I abundance.
  
  While we did not perform the immunopeptidomics experiments described, we can offer some speculation regarding this comment. As shown in Fig 1D-E, transcriptomics experiments suggest that immunoproteasome subunits (PSMB8, PSMB9, PSMB10) are upregulated upon DHODH inhibition. If this change in mRNA levels translates into greater immunoproteasome activity (which was not tested in our study), this would be expected to alter the repertoire of peptides available for presentation and could thereby change the immunopeptidome.
  
  However, this hypothesis requires direct testing, and we hope future studies will delineate the effects of DHODH inhibition and other cancer therapies on the immunopeptidome, as this area of research will have important clinical implications.
  
  (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.
  
  We agree with the assessment. A mechanistic link between depleted pyrimidines and CDK9 activity will be a subject of future studies.
  
  (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.
  
  Reviewer #1 raised this very same point, and we agree. Please see our reply to Reviewer #1, which details why we did not pursue this approach and argues that the evidence we present is compelling even in absence of genetic manipulation.
  
  Additionally, please see the new Fig 4E and 4F. Figure 4E, a repeat of Fig 4B using HCT116 cells, shows that, in this cell line, CDK9 inhibitors (flavopiridol, dinaciclib, and AT7519) block BQ-mediated APP induction, while PROTAC2 does not. Figure 4F shows that (for reasons we cannot fully explain) PROTAC2 does not lead to CDK9 degradation in HCT116 cells. These data strongly implicate CDK9, because they exclude a CDK9-degradation-independent effect of PROTAC2.
  
  (5) Figure 2B needs a legend.
  
  Thank you for pointing this out. We have added a legend to Fig 2B.
  
  (6) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.
  
  Thank you for this suggestion! We have amended the Discussion section to mention this important point. Please see paragraph 2 of the revised Discussion section where we have added the following text:
  
  “Because BQ-mediated APP induction does not require interferon signaling, this strategy may have particular relevance for clinical scenarios in which tumor antigen presentation is dampened by the loss or silencing of cancer cell interferon signaling, which has been demonstrated to confer both intrinsic and acquired ICB resistance in human melanoma patients.”
  
  Reviewer #3 (Recommendations For The Authors):
  
  The authors present convincing evidence of the mechanism by which pyrimidine nucleotides regulate MHC I levels and about the potential of combining DHODH inhibitors with dual immune checkpoint blockade (ICB). This is an interesting paper given the clinical relevance of DHODH inhibitors. The studies raise some questions, and some points might need clarifying as below:
  
  In Figure 2C, why do the authors focus on these two genes in the uridine rescue? These are important genes mediating antigen presentation, but it might be more interesting to see how H2-Db and H2-Kb expression correlate with the protein data shown in Fig 2D. Fig. 2C-2D is a relevant control, so it would be important to validate in a different cancer cell line (e.g. one of the PDAC cell lines used for the RNAseq).
  
  We appreciate this comment. Although Fig 3C shows that BQ-induced expression of H2-Db, H2-Kb, and B2m is reversed by uridine (in B16F10 cells), we recognize that this was not the best placement for this data, as it can easily be overlooked here since uridine reversal is not the main point of Fig 3C. We have left Fig 3C as is, because we think that the uridine reversal demonstrated in that panel serves as a good internal positive control for reversal of BQ-mediated APP induction in that experiment.
  
  We have repeated the experiments shown in the original Fig 2C and substituted the original Fig 2C with a new Fig 2C and Fig S2B, which show both Tap1 and Nlrc5 as well as H2-Db, H2-Kb, and B2m after treatment with either BQ (new Fig 2C) or teriflunomide (new Fig S2B). The original Fig S2B is now Fig S2C, and it shows that uridine has no effect on the expression of any of the genes assayed in the new Fig 2C or S2B.
  
  The reversibility of cell surface MHC-I induction was also validated in HCT116 cells (Fig 3F). We included the uridine reversal in Fig 3F to avoid duplicating the control and BQ FACS data in multiple panels.
  
  We have also added the qPCR data for HCT116 cells showing this same phenotype (at the mRNA level), which is the new Fig S2D.
  
  We decided to prioritize HCT116 cells for our mechanistic studies (Figures S2D, S4A, and 4E-F) because previous reports indicate that it is diploid and therefore less genetically deranged compared to our other cancer cell lines.
  
  Figure 2F shows an elegant experiment to rule out off-target effects related to cell death and to confirm that the increased MHC I expression is uniquely dependent on pyrimidines. DHODH has recently been implicated in ferroptosis, a highly immunogenic type of cell death. What are the authors' thoughts on BQ-induced ferroptosis as a possible contributor to the effects of ICB? Does BQ + ferroptosis inhibitor (ferrostatin) affect cell surface MHC I and/or expression of antigen processing genes?
  
  The potential role of DHODH in ferroptosis protection (Mao et al 2021) has important implications, so we are glad that multiple reviewers raised questions concerning ferroptosis. We did not directly test the effect of ferroptosis inducing agents (with or without BQ) on MHC-I/APP expression, but that is certainly a worthwhile line of investigation.
  
  The DHODH/ferroptosis issue is complicated by a study pointed out by Reviewer #1 that challenges the role of DHODH inhibition in BQ-mediated ferroptosis sensitization (Mishima et al, 2022). This study argues that high-dose BQ treatment causes FSP1 inhibition, and this underlies the effect of BQ on the cellular response to ferroptosis-inducing agents.
  
  Regardless of whether BQ-induced ferroptosis-sensitization is dependent on DHODH, FSP1, or some other factor, the Mao and Mishima studies agree that a relatively high dose of BQ is required to observe these effects (100-200µM for most cell lines and >50µM even in the most ferroptosis-sensitive cell lines). As we explained above, we consider it very unlikely that the in vivo BQ exposure in our experiments (Fig 5) was high enough to cause significant ferroptosis, especially in the absence of any dedicated ferroptosis-inducing agent (which is typically required to cause ferroptosis even in the presence of high-dose BQ).
  
  The authors nail down the mechanism to CDK9 (Fig 4). However, all these experiments are performed in 293T cells. I would like to see a repeat of Fig. 4B in a cancer cell line (either PDAC or B16). Also, does BQ have any effect on CDK9 expression/protein levels?
  
  We have added two figure panels that address this comment (new Fig 4E and 4F). Figure 4E (which is a repeat of Fig 4B with HCT116 cells) shows that CDK9 inhibitors (flavopiridol, AT7519, and dinaciclib) reverse BQ-mediated APP induction in HCT116 cells (this agrees with Fig S4A showing that flavopiridol reverses MHC induction by various nucleotide synthesis inhibitors in this cell line), but PROTAC2 does not. Figure 4F shows that PROTAC2 (for reasons we cannot explain) does not cause CDK9 degradation in HCT116 cells. This adds further support to our thesis that CDK9 is a critical mediator of BQ-mediated APP induction (because how else can this pattern of results be explained?). The text of the Results section has been amended to reflect this.
  
  We chose to use HCT116 cells for this repeat experiment 1) to align with Fig S4A and 2) because, as previously mentioned, we consider HCT116 to be a good cell line for mechanistic studies because of its relative lack of idiosyncratic genetic features (compared to CFPAC-1, for example, which was derived from a patient with cystic fibrosis).
  
  What are the differences in tumor size for the experiment shown in Figure 5E? What about tumor cell death in the ICB vs. BQ+ICB groups?
  
  Because this was a survival assay, direct comparison of tumor volumes between groups was not possible at later time points, since mice that die or have to be euthanized are removed from their experimental group, which lowers the average group tumor burden at subsequent time points. Although tumor volume was the most common euthanasia criterion reached, a subset of mice were either found dead or had to be euthanized for other reasons attributed to their tumor burden (moribund state, inability to ambulate or stand, persistent bleeding from tumor ulceration, severe loss of body mass, etc.). This confounds any comparison of endpoint measurements (such as immunohistochemical quantification of tumor cell death markers, T-cell markers, etc.).
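
  The attrition effect described above can be illustrated with a toy calculation. This is a hedged sketch, not an analysis from the study: the tumor volumes, the removal threshold, and the function name are all hypothetical. It averages tumor volume only over mice still on study and removes a mouse after the first time point at which its tumor reaches the limit.

```python
def mean_volume_among_survivors(volumes_by_mouse, limit):
    """Mean tumor volume per time point over mice still on study.

    A mouse is measured at the time point where it first reaches `limit`
    and is removed from all later time points.
    """
    n_times = len(volumes_by_mouse[0])
    on_study = [True] * len(volumes_by_mouse)
    means = []
    for t in range(n_times):
        current = [v[t] for v, keep in zip(volumes_by_mouse, on_study) if keep]
        means.append(sum(current) / len(current) if current else None)
        for i, v in enumerate(volumes_by_mouse):
            if on_study[i] and v[t] >= limit:
                on_study[i] = False
    return means


# Hypothetical volumes (arbitrary units) for three mice at four time points.
mice = [
    [100, 500, 1200, 2000],
    [100, 300, 600, 900],
    [100, 200, 300, 400],
]
means = mean_volume_among_survivors(mice, limit=1000)
```

  In this toy group every tumor keeps growing, yet the group mean falls between the last two time points because the mouse with the largest tumor has left the study, which is the reason later time points cannot be compared directly across groups.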
  
  The different response in the concurrent vs delayed treatment is very interesting. The authors suggest two possible mechanisms to explain this: "1) Concurrent BQ dampens the initial anticancer immune response generated by dual ICB, or b) cancer cell MHC-I and related genes are not maximally upregulated at the time of ICB administration with concurrent treatment". However, and despite the caveat of comparing the in vitro to the in vivo setting, Fig 2D shows upregulation of MHC I already at 24h of treatment in B16 cells. Have the authors checked T cell infiltration in the concurrent and delayed treatment setting?
  
  For the same reasons described in response to the preceding comment, tumors harvested upon mouse death/euthanasia from our survival experiment were not suitable for cross-cohort comparison of tumor endpoint measurements. An additional experiment in which mice are necropsied at a prespecified time point (before any mice have died or reached euthanasia criteria, as in the experiment for Fig 5A-D) would be required to answer this question.
  
  Page 5, line 181 -do the authors mean "nucleotide salvage inhibitors" instead of "synthesis"?
  
  We believe the reviewer is referring to the following sentence:
  
  “The other drugs screened included nucleotide synthesis inhibitors (5-fluorouracil, methotrexate, gemcitabine, and hydroxyurea), DNA damage inducers (oxaliplatin, irinotecan, and cytarabine), a microtubule targeting drug (paclitaxel), a DNA methylation inhibitor (azacytidine), and other small molecule inhibitors (Fig 2F).”
  
  In this context, we believe our use of “synthesis” instead of “salvage” is correct, because methotrexate and 5-FU inhibit thymidylate synthase (which mediates de novo dTTP synthesis), while gemcitabine and hydroxyurea inhibit ribonucleotide reductase (which mediates de novo synthesis of all dNTPs).
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.03.535399v2
www.biorxiv.org www.biorxiv.org

CRISPR-Edited DPSCs, Constitutively Expressing BDNF Enhance Dentin Regeneration in Injured Teeth

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This work employs both in vitro and in vivo/transplant methods to investigate the contribution of BDNF/TrkB signaling to enhancing differentiation and dentin-repair capabilities of dental pulp stem cells in the context of exposure to a variety of inflammatory cytokines. A particular emphasis of the approach is the employment of dental pulp stem cells in which BDNF expression has been enhanced using CRISPR technology. Transplantation of such cells is said to improve dentin regeneration in a mouse model of tooth decay.
  
  The study provides several interesting findings, including demonstrating that exposure to several cytokines/inflammatory agents increases the quantity of (activated) phospho-Trk B in dental pulp stem cells.
  
  However, a variety of technical issues weaken support for the major conclusions offered by the authors. These technical issues include the following:
  
  Thank you for your keen observation and evaluation, which helped us significantly improve our manuscript. We have addressed the concerns and comments point by point in detail and substantially revised the manuscript and Figures. We hope that the manuscript is acceptable in the current revised version.
  
  Detailed response to your comments/concerns is as follows:
  
  (1) It remains unclear exactly how the cytokines tested affect BDNF/TrkB signaling. For example, in Figure 1C, TNF-alpha increases TrkB and phospho-TrkB immunoreactivity to the same degree, suggesting that the cytokine promotes TrkB abundance without stimulating pathways that activate TrkB, whereas in Figure 2D, TNF-alpha has little effect on the abundance of TrkB, while increasing phospho-TrkB, suggesting that it affects TrkB activation and not TrkB abundance.
  
  Thank you for raising this concern. We recently demonstrated the effect and interaction of TNF-alpha and Ca2+/calmodulin-dependent protein kinase II on the regulation of inflammatory hDPSC dentino-differentiation via BDNF/TrkB receptor signaling, using a TrkB inhibitor (Ref. below, and Figure 9). We also agree with your concern: on re-analyzing our replicates, we found a clearer trend and a significant increase in TrkB abundance as well (please refer to revised Figure 2D).
  
  Ref.: Kim, Ji Hyun, et al. (2025) "Ca2+/calmodulin-dependent protein kinase II regulates the inflammatory hDPSCs dentino-differentiation via BDNF/TrkB receptor signaling." Frontiers in Cell and Developmental Biology 13: 1558736.
  
  (2) I find the histological images in Figure 3 to be difficult to interpret. I would have imagined that DAPI nuclear stains would reveal the odontoblast layer, but this is not apparent. An adjacent section labeled with conventional histological stains would be helpful here. Others have described Stro-1 as a stem cell marker that is expressed on a minority of cells associated with vasculature in the dental pulp, but in the images in Figure 3, Stro-1 label is essentially co-distributed with DAPI, in both control and injured teeth, indicating that it is expressed in nearly all cells. Although the authors state that the Stro-1-positive cells are associated with vasculature, I see no evidence that this is true.
  
  Thank you for your concern. STRO-1 is a mesenchymal stem cell marker that is also expressed in dental pulp stem cells (DPSCs); both populations are distributed in the pulp. DPSCs can contribute to tissue repair and regeneration in inflamed pulp by differentiating into odontoblasts and forming reparative dentin. Moreover, in carious and inflamed pulp, they are disorganized depending on the extent of infection/injury. Our purpose here was to point out the presence of DPSCs, which will differentiate into odontoblasts in such a scenario, rather than the vasculature. We have revised Figure 3 by adding magnified images and dotted lines to indicate the boundary between the pulp and dentin.
  
  Ref. Volponi A. A., Pang Y., Sharpe P. T. Stem cell-based biological tooth repair and regeneration. Trends in Cell Biology. 2010;20(12):715–722.
  
  (3) The data presented convincingly demonstrate that they have elevated BDNF expression in their dental pulp stem cells using a CRISPR-based approach. I have a number of questions about these findings. Firstly, nowhere in the paper do they describe the nature of the CRISPR plasmid they are transiently transfecting. Some published methods delete segments of the BDNF 3'-UTR while others use an inactivated Cas9 to position an active transactivator to sequences in the BDNF promoter. If it is the latter approach, transient transfection will yield transient increases in BDNF expression. Also, as BDNF employs multiple promoters, it would be helpful to know which promoter sequence is targeted, and finally, knowing the identity of the guide RNAs would allow assessment for the potential of off-target effects. I am guessing that the investigators employ a commercially obtained system from Santa Cruz, but nowhere is this mentioned. Please provide this information.
  
  Dear Reviewer, yes, you are right. We used a commercially obtained system from Santa Cruz, i.e., BDNF CRISPR Activation Plasmid (h): sc-400029-ACT and UltraCruz® Transfection Reagent (sc-395739), and these are now described in the Chemicals and Reagents section of the Materials and Methods as follows.
  
  “BDNF CRISPR Activation Plasmid (h) is a synergistic activation mediator (SAM) transcription activation system designed to upregulate gene expression specifically. BDNF CRISPR Activation Plasmid (h) consists of three plasmids at a 1:1:1 mass ratio: a plasmid encoding the deactivated Cas9 (dCas9) nuclease (D10A and N863A) fused to the transactivation domain VP64, and a blasticidin resistance gene; a plasmid encoding the MS2-p65-HSF1 fusion protein, and a hygromycin resistance gene; a plasmid encoding a target-specific 20 nt guide RNA fused to two MS2 RNA aptamers, and a puromycin resistance gene.”
  
  The resulting SAM complex binds to a site-specific region approximately 200-250 nt upstream of the transcriptional start site and provides robust recruitment of transcription factors for highly efficient gene activation.
  
  Following transfection, gene activation efficiency could be assayed by WB, IF, or IHC using the antibody pro-BDNF Antibody (5H8): sc-65514.
  
  Author response image 1.
  
  (4) Another question left unresolved is whether their approach elevated BDNF, proBDNF, or both. Their 28 kDa western blot band apparently represents proBDNF exclusively, with no mature BDNF apparent, yet only mature BDNF effectively activates TrkB receptors. On the other hand, proBDNF preferentially activates p75NTR receptors. The present paper never mentions p75NTR, which is a significant omission, since other investigators have demonstrated that p75NTR controls odontoblast differentiation.
  
  Dear reviewer, thank you for noticing the error.
  
  Pro-BDNF is produced as a 32-kDa precursor that undergoes N-glycosylation and glycosulfation on residues located within the pro-domain of the precursor. N-terminal cleavage of the precursor generates mature BDNF as well as a minor truncated form of the precursor (28 kDa) that arises by a different processing mechanism than mature BDNF. The precursor undergoes N-terminal cleavage within the trans-Golgi network and/or immature secretory vesicles to generate mature BDNF (14 kDa).
  
  We checked our data and band size and found a small labeling mistake (thank you for your keen observation in pointing it out). The CRISPR protocol required verification of gene activation by checking pro-BDNF, as mentioned in the methodology. The labeling in the figure has been revised to pro-BDNF, and the actual blot with a ladder is shown below for clarification.
  
  (5) In any case, no evidence is presented to support the conclusion that the artificially elevated BDNF expression has any effect on the capability of the dental pulp stem cells to promote dentin regeneration. The results shown in Figures 4 and 5 compare dentin regeneration with BDNF-over-expressing stem cells with results lacking any stem cell transplantation. A suitable control is required to allow any conclusion about the benefit of over-expressing BDNF.
  
  We confirmed the presence of BDNF-overexpressing cells by their higher GFP expression. Moreover, a significant increase in dentin mineralization volume indicates the advantage of BDNF-overexpressing stem cells. We recently published the in vitro effects of BDNF/TrkB on DPSC odontoblastic differentiation, which strongly support our in vivo data. At present, we are not in a position to conduct an additional animal study within a short period of time. We will certainly consider including a positive control in our future studies.
  
  Ref.: Kim, Ji Hyun, et al. (2025) "Ca2+/calmodulin-dependent protein kinase II regulates the inflammatory hDPSCs dentino-differentiation via BDNF/TrkB receptor signaling." Frontiers in Cell and Developmental Biology 13: 1558736.
  
  (6) Whether increased BDNF expression is beneficial or not, the evidence that the BDNF-overexpressing dental pulp stem cells promote dentin regeneration is somewhat weak. The data presented indicate that the cells increase dentin density by only 6%. The text and figure legend disagree on whether the p-value for this effect is 0.05 or 0.01. In either case, nowhere is the value of N for this statistic mentioned, leaving uncertainty about whether the effect is real.
  
  A significant increase of about 7.76% in dentin mineralization volume indicates the advantage of BDNF-overexpressing stem cells, and we believe this could be a breakthrough that advances stem cell engineering and therapy and allows this percentage to be raised in the future. The text in the Results section gives the p-value for this effect as 0.05. N was previously 3; we analyzed two more samples by CT scan and revised the results with N = 5, which improved the increase slightly, to about 8.53%. Thank you for noticing; the figure legend has been corrected to 0.05.
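
  The role of N in this exchange can be made concrete with a small sketch. It computes a percent increase between group means and an exact two-sided permutation p-value for the difference in means. The permutation test is a stand-in chosen for illustration; the statistical test actually used in the study is not stated here, and all names and numbers below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def percent_increase(control, treated):
    """Percent change of the treated mean relative to the control mean."""
    c = mean(control)
    return 100.0 * (mean(treated) - c) / c


def exact_permutation_p(control, treated):
    """Two-sided exact permutation p-value for a difference in group means.

    Enumerates every way of relabelling the pooled samples and counts the
    relabellings whose absolute mean difference is at least the observed one.
    """
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    total = sum(pooled)
    observed = abs(mean(treated) - mean(control))
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_treated):
        t_sum = sum(pooled[i] for i in idx)
        diff = abs(t_sum / n_treated - (total - t_sum) / n_control)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
        count += 1
    return hits / count


# Hypothetical, fully separated groups of size 3 and size 5.
p_n3 = exact_permutation_p([1, 2, 3], [4, 5, 6])
p_n5 = exact_permutation_p([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

  Under this particular test, three fully separated samples per group cannot give a two-sided p-value below 2/20 = 0.1, whereas five per group can reach 2/252 (about 0.008), which is why a p-value is hard to interpret when N is not reported alongside it.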
  
  Similarly, our in vitro data in the current study supports the notion that it adds up to mineralization and odontoblastic differentiation. We recently published that BDNF/TrkB significantly enhances calcium deposits and mineralization using a battery of in vitro experiments.
  
  Ref.: Kim, Ji Hyun, et al. (2025) "Ca2+/calmodulin-dependent protein kinase II regulates the inflammatory hDPSCs dentino-differentiation via BDNF/TrkB receptor signaling." Frontiers in Cell and Developmental Biology 13: 1558736.
  
  (7) The final set of experiments applies transcriptomic analysis to address the mechanisms mediating functional differences in dental pulp stem cell behavior. Unfortunately, while the Abstract indicates "we conducted transcriptomic profiling of TNFα-treated DPSCs, both with and without TrkB antagonist CTX-B", that does not match the experiment reported, which compared the transcriptome of control cells with cells simultaneously exposed to TNF-alpha and CTX-B. Since CTX-B blocks the functional response of cells to TNF-alpha, I don't understand how any useful interpretation can be attached to the data without controls for the effect of TNF alone and CTX-B alone.
  
  Dear reviewer, yes, we performed the treatments both alone and in combination. Earlier, we showed only the combined results and mentioned the interaction between TNFα and TrkB. We have now included the results from TNFα alone and from TNFα combined with CTX-B for better comparison (please refer to Figure 8). Figure 8C1 clearly shows the reversal of certain factors upon treatment with the TrkB inhibitor, compared with the TNFα-alone group in Figure 8C.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, the authors investigate the potential for overexpressing BDNF in dental pulp stem cells to enhance dentin regeneration. They suggest that in the inflammatory environment of injured teeth, there is increased signaling of TrkB in response to elevated levels of inflammatory molecules.
  
  Strengths:
  
  The potential application to dentin regeneration is interesting.
  
  Weaknesses:
  
  There are a number of concerns with this manuscript to be addressed.
  
  Thank you for your compliments, keen observation, and evaluation, which helped us significantly improve our manuscript. We have addressed the concerns and comments point by point in detail and substantially revised the manuscript and Figures. We hope that the manuscript is acceptable in the current revised version.
  
  Detailed response to your comments/concerns is as follows:
  
  (1) Insufficient citation of the literature. There is a vast literature on BDNF-TrkB regulating survival, development, and function of neurons, yet there is only one citation (Zhang et al 2012) which is on Alzheimer's disease.
  
  More references have been cited accordingly.
  
  (2) There are several incorrect statements. For example, in the introduction (line 80) TrkA is not a BDNF receptor.
  
  Thank you for noticing the typo; the sentence has been corrected.
  
  (3) Most important - Specific antibodies must be identified by their RRID numbers. To state that "Various antibodies were procured:... from BioLegend" is unacceptable, and calls into question the entire analysis. Specifically, their Western blot in Figure 4B indicates a band at 28 kDa that they say is BDNF, however the size of BDNF is 14 kDa, and the size of proBDNF is 32 and 37 kDa, therefore it is not clear what they are indicating at 28 kDa. The validation is critical to their analysis of BDNF-expressing cells.
  
  Dear reviewer, thank you for your kind concern. Sorry for the inconvenience; we have added RRID numbers of antibodies.
  
  Pro-BDNF is produced as a 32-kDa precursor that undergoes N-glycosylation and glycosulfation on residues located within the pro-domain of the precursor. N-terminal cleavage of the precursor generates mature BDNF as well as a minor truncated form of the precursor (28 kDa) that arises by a different processing mechanism than mature BDNF. The precursor undergoes N-terminal cleavage within the trans-Golgi network and/or immature secretory vesicles to generate mature BDNF (14 kDa).
  
  We checked our data and band size and found a mistake in reading the ladder size. The band shown is actually at 14 kDa. The labeling in the figure has been revised, and the actual blot with a ladder is shown below for clarification. Our data also indicate that the observed cellular effects are more consistent with BDNF/TrkB-mediated pathways, which are known to promote survival and differentiation.
  
  (4) Figure 2 indicates increased expression of TrkB and TrkA, as well as their phosphorylated forms in response to inflammatory stimuli. Do these treatments elicit increased secretion of the ligands for these receptors, BDNF and NGF, respectively, to activate their phosphorylation? Or are they suggesting that the inflammatory molecules directly activate the Trk receptors? If so, further validation is necessary to demonstrate that.
  
  Thank you for your kind concern. TNF-α increases the number of TrkB receptors. The enhanced TrkB activation may result from a greater number of receptors and/or increased activation of individual receptors. In either case, inflammatory agents enhance the TrkB receptor signaling pathway.
  
  We recently demonstrated the effect and interaction of TNF-alpha and Ca2+/calmodulin-dependent protein kinase II on the regulation of inflammatory hDPSC dentino-differentiation via BDNF/TrkB receptor signaling, using a TrkB inhibitor (Ref. below, and Figure 9). For now, we have added Figure 9 to illustrate the proposed mechanism of action based on our recent and current studies.
  
  Ref.: Kim, Ji Hyun, et al. (2025) "Ca2+/calmodulin-dependent protein kinase II regulates the inflammatory hDPSCs dentino-differentiation via BDNF/TrkB receptor signaling." Frontiers in Cell and Developmental Biology 13: 1558736.
  
  (5) Figure 7 - RNA-Seq data, what is the rationale for treatment with TNF+ CTX-B? How does this identify any role for TrkB signaling? They never define their abbreviations, but if CTX-B refers to cholera toxin subunit B, which is what it usually refers to, then it is certainly not a TrkB antagonist.
  
  Thank you for your concern. Cyclotraxin-B (CTX-B) is a TrkB antagonist (now stated in the revised manuscript). To identify the underlying mechanism, we sought to locate transcription factors that interact with TrkB/BDNF signaling and lead to differentiation and dentinogenesis. Therefore, we treated the cells with a TrkB inhibitor.
  
  Earlier, we showed only the combined results and mentioned the interaction between TNFα and TrkB. We have now included the results from TNFα alone and from TNFα combined with CTX-B for better comparison (please refer to Figure 8). Figure 8C1 clearly shows the reversal of certain factors upon treatment with the TrkB inhibitor, compared with the TNFα-alone group in Figure 8C. We agree that the precise role of CTX-B in modulating TrkB signaling requires further clarification; we have included this point in the revised Discussion and are currently working on this aspect.
  
  Reviewer #3 (Public review):
  
  In general, although the authors interpret their results as pointing towards a possible role of BDNF in dentin regeneration, the results are over-interpreted due to the lack of proper controls and focus on TrkB expression, but not its isoforms in inflammatory processes. Surprisingly, the authors do not study the possible role of p75 in this process, which could be one of the mechanisms intervening under inflammatory conditions.
  
  Thank you for your compliments, keen observation, and evaluation, which helped us significantly improve our manuscript. We have addressed the concerns and comments point by point in detail and substantially revised the manuscript and Figures. We hope that the manuscript is acceptable in the current revised version.
  
  Detailed response to your comments/concerns is as follows:
  
  (1) The authors claim that there are two Trk receptors for BDNF, TrkA and TrkB. To date, I am unaware of any evidence that BDNF binds to TrkA to activate it. It is true that two receptors have been described in the literature, TrkB and p75 or NGFR, but the latter is not TrkA despite its name and capacity to bind NGF along with other neurotrophins. It is crucial for the authors to provide a reference stating that TrkA is a receptor for BDNF or, alternatively, to correct this paragraph.
  
  Dear reviewer, we apologize for the inconvenience; it was an error. BDNF binds to TrkB, and the sentence has been corrected.
  
  (2) The authors discuss BDNF/TrkB in inflammation. Is there any possibility of p75 involvement in this process?
  
  Mature BDNF binds to the high-affinity receptor tyrosine kinase B (TrkB), activating signaling cascades, while pro-BDNF binds to the p75 neurotrophin receptor (p75NTR). So, we don’t think there’s a possibility, as our data shows mature BDNF production. Here, we initially screened the TrkA and TrkB involvement in dentinogenesis and chose to work with BDNF and its receptor TrkB. Future studies can be directed to elucidate its mechanism of action in the context of dentinogenesis.
  
  (3) The authors present immunofluorescence (IF) images against TrkB and pTrkB in the first figure. While they mention in the materials and methods section that these antibodies were generated for this study, there is no proof of their specificity. It should be noted that most commercial antibodies labeled as anti-TrkB recognize the extracellular domain of all TrkB isoforms. There are indications in the literature that pathological and excitotoxic conditions change the expression levels of TrkB-Fl and TrkB-T1. Therefore, it is necessary to demonstrate which isoform of TrkB the authors are showing as increased under their conditions. Similarly, it is essential to prove that the new anti-p-TrkB antibody is specific to this Trk receptor and, unlike other commercial antibodies, does not act as an anti-phospho-pan-Trk antibody.
  
  Thank you for your kind concern.
  
  Human TrkB has 7 isoforms and predicted Mw ranges from 35 to 93kDa. It has 11 potential N-glycosylation sites. The given antibody (isotype: Mouse IgG2a, κ) has been shown to interact with SHC1, PLCG1 and/or PLCG2, SH2B1 and SH2B2, NGFR, SH2D1A, SQSTM1 and KIDINS220, FRS2.
  
  We apologize for the misunderstanding and the error in the text. All antibodies were purchased commercially as validated products, and we did not test for any specific isoform. We have provided the details of the antibodies and reagents in the chemicals section of the methodology.
  
  (4) I believe this initial conclusion could be significantly strengthened, without opening up other interpretations of the results, by demonstrating the specificity of the antibodies via Western blot (WB), both in the presence and absence of BDNF and other neurotrophins, NGF, and NT-3. Additionally, using WB could help reinforce the quantification of fluorescence intensity presented by the authors in Figure 1. It's worth noting that the authors fixed the cells with 4% PFA for 2 hours, which can significantly increase cellular autofluorescence due to the extended fixation time, favoring PFA autofluorescence. They have not performed negative controls without primary antibodies to determine the level of autofluorescence and nonspecific background. Nor have they indicated optimizing the concentration of primary antibodies to find the optimal point where the signal is strong without a significant increase in background. The authors also do not mention using reference markers to normalize specific fluorescence or indicating that they normalized fluorescence intensity against a standard control, which can indeed be done using specific signal quantification techniques in immunocytochemistry with a slide graded in black-and-white intensity controls. From my experience, I recommend caution with interpretations from fluorescence quantification assays without considering the aforementioned controls.
  
  Thank you for your insightful comments. We have now included a negative control image in the revised Figures. This control confirms that the observed fluorescence signal is specific and not due to autofluorescence or nonspecific background. In our lab, we have been using these antibodies and already optimized the concentration to use in certain cell types. Additionally, we followed the manufacturer’s recommended antibody concentration and protocol throughout our experiments to ensure an optimal signal-to-noise ratio.
  
  We agree that extended fixation with 4% PFA may increase autofluorescence; however, including negative controls helps account for this effect. We also ensured consistent imaging parameters and applied the same exposure settings across all samples to allow for a valid comparison of fluorescence intensity. We appreciate your emphasis on careful quantification and have clarified these methodological details in the revised Methods section.
  
  (5) In Figure 2, the authors determine the expression levels of TrkA and TrkB using qPCR. Although they specify the primers used for GAPDH as a control in materials and methods, they do not indicate which primers they used to detect TrkA and TrkB transcripts, which is essential for determining which isoform of these receptors they are detecting under different stimulations. Similarly, I recommend following the MIQE guidelines (Minimum Information for Publication of Quantitative Real-Time PCR experiments), so they should indicate the amplification efficiency of their primers, the use of negative and positive controls to validate both the primer concentration used, and the reaction, the use of several stable reference genes, not just one.
  
  We appreciate the reviewer’s suggestion regarding the specificity of primers and the amplification efficiency. In response, we have now included the primer sequences used for detecting TrkA and TrkB transcripts in the revised Materials and Methods section (Quantitative real-time PCR analysis of odontogenic differentiation marker gene expression in dental pulp stem cells). This ensures clarity on which isoforms of these receptors were assessed under different conditions. We also acknowledge the importance of following MIQE guidelines, and we got the primer provided by Integrated DNA Technologies with standard desalting purification and guaranteed yield.
  
  (6) Moreover, the authors claim they are using the same amounts of cDNA for qPCRs since they have quantified the amounts using a Nanodrop. Given that dNTPs are used during cDNA synthesis, and high levels remain after cDNA synthesis from mRNA, it is not possible to accurately measure cDNA levels without first cleaning it from the residual dNTPs. Therefore, I recommend that the authors clarify this point to determine how they actually performed the qPCRs. I also recommend using two other reference genes like 18S and TATA Binding Protein alongside GAPDH, calculating the geometric mean of the three to correctly apply the 2^-ΔΔCt formula.
  
  Thank you for your kind concern. We agree that residual dNTPs from cDNA synthesis could impact the accuracy of cDNA quantification. To address this, we have used the commercially available and guaranteed kit. The kit used is mentioned in Materials and Methods. We will definitely consider using 18S and TATA Binding Protein alongside GAPDH in our future studies. For now, we request you consider the results generated against GAPDH control.
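  
  For illustration only, the multi-reference normalization the reviewer recommends can be sketched as below. This is a minimal sketch, not the analysis performed in the manuscript: the function name and all Ct values are hypothetical, and 100% amplification efficiency is assumed for every primer pair. Under that assumption, taking the geometric mean of the reference genes' relative quantities is equivalent to taking the arithmetic mean of their Ct values.

```python
from statistics import mean


def fold_change_ddct(target_ct_treated, ref_cts_treated,
                     target_ct_control, ref_cts_control):
    """Relative expression by the 2^-ddCt method with several reference genes.

    Assumes 100% amplification efficiency, so the geometric mean of the
    reference quantities (2^-Ct) equals 2^-(arithmetic mean of the Ct values).
    """
    # dCt: target Ct minus the averaged reference Ct, per condition
    dct_treated = target_ct_treated - mean(ref_cts_treated)
    dct_control = target_ct_control - mean(ref_cts_control)
    # ddCt: treated relative to control
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)


# Hypothetical Ct values; reference genes could be e.g. GAPDH, 18S, TBP
fc = fold_change_ddct(
    target_ct_treated=24.0, ref_cts_treated=[18.0, 20.0, 22.0],
    target_ct_control=26.0, ref_cts_control=[18.0, 20.0, 22.0],
)
print(f"fold change: {fc:.2f}")
```

  If primer efficiencies deviate noticeably from 100%, an efficiency-corrected model would be needed instead of the plain power of two.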
  
  (7) Similarly, given that the newly generated antibodies have not been validated, I recommend introducing appropriate controls for the validation of in-cell Western assays.
  
  We apologize for the text mistake. Antibodies were procured commercially and not generated. We have corrected the sentence.
  
  (8) The authors' conclusion that TrkB levels are minimal (Figure 2E) raises the question of whether what they are actually detecting in the previous experiments might not be the TrkB-Fl form. Therefore, it is essential to demonstrate beyond any doubt that both the antibodies used to detect TrkB and the primers used for qPCR are correct, and in the latter case, specify at which cycle (Ct) the basal detection of TrkB transcripts occurs. Treatment with TNF-alpha for 14 days could lead to increased cell proliferation or differentiation, potentially increasing overall TrkB transcript levels due to the number of cells in culture, not necessarily an increase in TrkB transcripts per cell.
  
  Thank you for your comments. We appreciate your kind concerns. Here, we are trying to demonstrate that TrkB gets activated in inflammatory conditions. We have also provided the details on primers and antibodies. We have used commercial antibodies and qPCR primers, and they have been extensively validated with previous publications. The efficiency and validation of qPCR primers were provided by a company.
  
  Moreover, we used the minimal concentration of TNF-alpha twice a week, and before using it, we did preliminary experiments to determine whether it affected any experimental condition.
  
  (9) Overall, there are reasonable doubts about whether the authors are actually detecting TrkB in the first three images, as well as the phosphorylation levels and localization of this receptor in the cells. For example, in Figure 3 A to J, it is not clear where TrkB is expressed, necessitating better resolution images and a magnified image to show in which cellular structure TrkB is expressed.
  
  Thank you for your comment. Here, we aimed to show the expression of TrkB receptors in inflamed/infected pulp, especially in minority-distributed DPSCs. TrkB is present on the cell membrane and perinuclear region. We have provided a single-cell (magnified) image in the figure for better clarification.
  
  (10) In Figure 4, the authors indicate they have generated cells overexpressing BDNF after recombination using CRISPR technology. However, the WB they show in Figure 4B, performed under denaturing conditions, displays a band at approximately 28kDa. This WB is absolutely incorrect with all published data on BDNF detection via this technique. I believe the authors should demonstrate BDNF presence by showing a WB with appropriate controls and BDNF appearing at 14kDa to assume they are indeed detecting BDNF and that the cells are producing and secreting it. What antibodies have been used by the authors to detect BDNF? Have the authors validated it? There are some studies reporting the lack of specificity of certain commercial BDNF antibodies, therefore it is necessary to show that the authors are convincingly detecting BDNF.
  
  Dear reviewer, thank you for your kind concern. Firstly, we apologize for the inconvenience.
  
  Pro-BDNF is produced as a 32-kDa precursor that undergoes N-glycosylation and glycosulfation on residues located within the pro-domain of the precursor. N-terminal cleavage of the precursor generates mature BDNF and a minor truncated form of the precursor (28 kDa) that arises by a different processing mechanism than mature BDNF. The precursor undergoes N-terminal cleavage within the trans-Golgi network and/or immature secretory vesicles to generate mature BDNF (14 kDa).
  
  We checked our data and band size, and it shows a mistake in recognizing ladder size. It is actually a 14kDa band which has been shown. The labeling has been revised in the figure, and the actual blot with a ladder has been shown below for clarification. Similarly, our data focused on the fact that the observed cellular effects are more consistent with BDNF/TrkB-mediated pathways, which are known to promote survival and differentiation.
  
  (11) While the RNA sequencing data indicate changes in gene expression in cells treated with TNFalpha+CTX-B compared to control, the authors do not show a direct relationship between these genetic modifications with the rest of their manuscript's argument. I believe the results from these RNA sequencing assays should be put into the context of BDNF and TrkB, indicating which genes in this signaling pathway are or are not regulated, and their importance in this context.
  
  Thank you for your concern. In order to identify the underlying mechanism, we sought to locate certain transcription factors interacting with the TrkB/BDNF signaling, leading to differentiation and dentinogenesis. Therefore, we treated the cells with a TrkB inhibitor.
  
  Earlier, we showed only the combined results and mentioned the interaction between TNFα and TrkB. We have included the results from TNFα alone and combined them with CTX-B for better comparison (Please refer to Figure 8). Figure 8C1 clearly shows the reversal of certain factors with the treatment of TrkB inhibitor compared to figure 8C with TNFα alone treated group. We agree that the precise role of CTX-B in modulating TrkB signaling requires further clarification. We have now included this point in the revised discussion while working on this aspect. In a parallel study, we are trying to dig deep, especially the TCF family, as they have been documented to interact indirectly with BDNF and TrkB.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Some minor textual issues
  
  Line 120: It is obvious that TNFα stimulation caused significant phosphorylation of TrkB (p < 0.01) compared to TrkA (p < 0.05).
  
  Thank you for noticing the typo. The sentence has been corrected.
  
  The authors should consider rewording this sentence - I do not understand the intended meaning.
  
  Line 126: pronounced peak at 10 ng/mL. I am not convinced there is a peak. Looks like a plateau to me. To call it a peak one would have to show that the values at 10 ng/ml and 20 ng/ml are statistically different.
  
  We meant here the peak compared to 0.1 and 1ng/mL concentration and not compared to 20 ng/mL. The sentence has been elaborated accordingly.
  
  Reviewer #3 (Recommendations for the authors):
  
  The authors should show how they have validated the specificity of all the used antibodies as well as the efficiency and specificity of their qPCR data.
  
  We procured the commercially available antibodies (all of them have been extensively validated with previous publications) and also performed negative controls (provided in revised figures). We frequently used Western blot and validated the antibodies by band size. Primer sequences are also provided in the revised manuscript. We checked primer specificity with a standard-curve R² ≥ 0.98 and a single peak in the melting curves. We edited accordingly in line 263.
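  
  As a rough illustration of the standard-curve check mentioned above, the sketch below fits Ct against log10 of template quantity and reports the slope, R², and the amplification efficiency derived from the slope, E = 10^(-1/slope) - 1. It is a sketch under stated assumptions, not the authors' procedure: the function name and the dilution-series values are hypothetical.

```python
def standard_curve(log10_quantity, ct):
    """Least-squares fit of Ct versus log10(template quantity).

    Returns (slope, intercept, r_squared, efficiency), where
    efficiency = 10 ** (-1 / slope) - 1 (1.0 corresponds to 100%).
    """
    n = len(ct)
    mean_x = sum(log10_quantity) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantity)
    syy = sum((y - mean_y) ** 2 for y in ct)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantity, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, r_squared, efficiency


# Hypothetical 10-fold dilution series (log10 copies) and measured Ct values
log10_copies = [5, 4, 3, 2, 1]
ct_values = [15.0, 18.32, 21.64, 24.96, 28.28]

slope, intercept, r2, eff = standard_curve(log10_copies, ct_values)
print(f"slope={slope:.2f}, R^2={r2:.3f}, efficiency={eff * 100:.1f}%")
```

  A slope near -3.32 corresponds to an efficiency near 100%. R² describes only the linearity of the dilution series, so the melting-curve check remains the evidence for a single amplification product.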
  
  Once again, we thank all of you for your efforts in evaluating our study. It really helped us improve the quality of the manuscript. We hope all the queries have been answered and the revised manuscript is acceptable.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.11.627879v2
www.biorxiv.org www.biorxiv.org

New submission 11/02/2024, 11:59:28

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We are thankful for the handling of our manuscript. The following is a summary of our response and what we have done:
  
  (1) We are most thankful for the very thorough evaluation of our manuscript.
  
  (2) We were a bit shocked by the very negative commentary of referee 2.
  
  (3) We think what put referee 2 off so much is that we were overconfident in the strength of our conclusions. We consider such overconfidence a big mistake. We have revised the manuscript to fix this problem.
  
  (4) We respond in great depth to all criticism and also go into technicalities.
  
  (5) We consider the possibility of a mistake. Yet, we carefully weighed the evidence advanced by referee 2 and by us and found that a systematic review supports our conclusions. Hence, we also resist the various attempts to crush our paper.
  
  (6) We added evidence (peripherin-antibody staining; our novel Figure 2) that suggests we correctly identified the inferior olive.
  
  (7) The eLife format – in which critical commentary is published along with the paper – is a fantastic venue to publish what appears to be a surprisingly controversial issue.
  
  eLife assessment
  
  This potentially valuable study uses classic neuroanatomical techniques and synchrotron X-ray tomography to investigate the mapping of the trunk within the brainstem nuclei of the elephant brain. Given its unique specializations, understanding the somatosensory projections from the elephant trunk would be of general interest to evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. However, the anatomical analysis is inadequate to support the authors' conclusion that they have identified the elephant trigeminal sensory nuclei rather than a different brain region, specifically the inferior olive.
  
  Comment: We are happy that our paper is considered to be potentially valuable. Also, the editors highlight the potential interest of our work for evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. The editors are more negative when it comes to our evidence on the identification of the trigeminal nucleus vs the inferior olive. We have five comments on this assessment. (i) We think this assessment is heavily biased by the comments of referee 2. We show that the referee’s comments are more about us than about our paper. Hence, the referee failed to do their job (refereeing our paper) and should not have succeeded in leveling our paper. (ii) We have no ad hoc knock-out experiments to distinguish the trigeminal nucleus vs the inferior olive. Such experiments (extracellular recording & electrolytic lesions, viral tracing) would be done in a week in mice, but they cannot and should not be done in elephants. (iii) We have extraordinary evidence. Nobody has ever described a similarly astonishing match of body (trunk folds) and myeloarchitecture in the brain before. (iv) We show that our assignment of the trigeminal nucleus vs the inferior olive is more plausible than the current hypothesis about the assignment of the trigeminal nucleus vs the inferior olive as defended by referee 2. We think this is why it is important to publish our paper. (v) We think eLife is the perfect place for our publication because the deviating views of referee 2 are published along.
  
  Change: We performed additional peripherin-antibody staining to differentiate the inferior olive and trigeminal nucleus. Peripherin is a cytoskeletal protein that is found in peripheral nerves and climbing fibers. Specifically, climbing fibers of various species (mouse, rabbit, pig, cow, and human; Errante et al., 1998) are stained intensely with peripherin-antibodies. What is tricky for our purposes is that there is also some peripherin-antibody reactivity in the trigeminal nuclei (Errante et al., 1998). Such peripherin-antibody reactivity is weaker, however, and lacks the distinct axonal bundle signature that stems from the strong climbing fiber peripherin-reactivity as seen in the inferior olive (Errante et al., 1998). As can be seen in our novel Figure 2, we observe peripherin-reactivity in axonal bundles (i.e. in putative climbing fibers), in what we think is the inferior olive. We also observe weak peripherin-reactivity, in what we think is the trigeminal nucleus, but not the distinct and strong labeling of axonal bundles. These observations are in line with our ideas but are difficult to reconcile with the views of the referee. Specifically, the lack of peripherin-reactive axon bundles suggests that there are no climbing fibers in what the referee thinks is the inferior olive.
  
  Errante, L., Tang, D., Gardon, M., Sekerkova, G., Mugnaini, E., & Shaw, G. (1998). The intermediate filament protein peripherin is a marker for cerebellar climbing fibres. Journal of neurocytology, 27, 69-84.
  
  Reviewer #1 :
  
  Summary:
  
  This fundamental study provides compelling neuroanatomical evidence underscoring the sensory function of the trunk in African and Asian elephants. Whereas myelinated tracts are classically appreciated as mediating neuronal connections, the authors speculate that myelinated bundles provide functional separation of trunk folds and display elaboration related to the "finger" projections. The authors avail themselves of many classical neuroanatomical techniques (including cytochrome oxidase stains, Golgi stains, and myelin stains) along with modern synchrotron X-ray tomography. This work will be of interest to evolutionary neurobiologists, comparative neuroscientists, and the general public, with its fascinating exploration of the brainstem of an icon sensory specialist.
  
  Comment: We are incredibly grateful for this positive assessment.
  
  Changes: None.
  
  Strengths:
  
  - The authors made excellent use of the precious sample materials from 9 captive elephants.
  
  - The authors adopt a battery of neuroanatomical techniques to comprehensively characterize the structure of the trigeminal subnuclei and properly re-examine the "inferior olive".
  
  - Based on their exceptional histological preparation, the authors reveal broadly segregated patterns of metabolic activity, similar to the classical "barrel" organization related to rodent whiskers.
  
  Comment: The referee provides a concise summary of our findings.
  
  Changes: None.
  
  Weaknesses:
  
  - As the authors acknowledge, somewhat limited functional description can be provided using histological analysis (compared to more invasive techniques).
  
  - The correlation between myelinated stripes and trunk fold patterns is intriguing, and Figure 4 presents this idea beautifully. I wonder - is the number of stripes consistent with the number of trunk folds? Does this hold for both species?
  
  Comment: We agree with the referee’s assessment. We note that cytochrome-oxidase staining is an at least partially functional stain, as it reveals constitutive metabolic activity. A significant problem of the work in elephants is that our recording possibilities are limited, which in turn limits functional analysis. As indicated in Figure 5 (our former Figure 4) for the African elephant Indra, there was an excellent match of trunk folds and myelin stripes. Asian elephants have more, and less conspicuous trunk folds than African elephants. As illustrated in Figure 7, Asian elephants have more, and less conspicuous myelin stripes. Thus, species differences in myelin stripes correlate with species differences in trunk folds.
  
  Changes: We clarify the relation of myelin stripe and trunk fold patterns in our description of Figure 7.
  
  Reviewer #2 (Public Review):
  
  The authors describe what they assert to be a very unusual trigeminal nuclear complex in the brainstem of elephants, and based on this, follow with many speculations about how the trigeminal nuclear complex, as identified by them, might be organized in terms of the sensory capacity of the elephant trunk.
  
  Comment: We agree with the referee’s assessment that the putative trigeminal nucleus described in our paper is highly unusual in size, position, vascularization, and myeloarchitecture. This is why we wrote this paper. We think these unusual features reflect the unique facial specializations of elephants, i.e. their highly derived trunk. Because we have no access to recordings from the elephant brainstem, we cannot back up all our functional interpretations with electrophysiological evidence; it is therefore fair to call them speculative.
  
  Changes: None.
  
  The identification of the trigeminal nuclear complex/inferior olivary nuclear complex in the elephant brainstem is the central pillar of this manuscript from which everything else follows, and if this is incorrect, then the entire manuscript fails, and all the associated speculations become completely unsupported.
  
  Comment: We agree.
  
  Changes: None.
  
  The authors note that what they identify as the trigeminal nuclear complex has been identified as the inferior olivary nuclear complex by other authors, citing Shoshani et al. (2006; 10.1016/j.brainresbull.2006.03.016) and Maseko et al (2013; 10.1159/000352004), but fail to cite either Verhaart and Kramer (1958; PMID 13841799) or Verhaart (1962; 10.1515/9783112519882-001). These four studies are in agreement, but the current study differs.
  
  Comment & Change: We were not aware of the papers of Verhaart and included them in the revised manuscript.
  
  Let's assume for the moment that the four previous studies are all incorrect and the current study is correct. This would mean that the entire architecture and organization of the elephant brainstem is significantly rearranged in comparison to ALL other mammals, including humans, previously studied (e.g. Kappers et al. 1965, The Comparative Anatomy of the Nervous System of Vertebrates, Including Man, Volume 1 pp. 668-695) and the closely related manatee (10.1002/ar.20573). This rearrangement necessitates that the trigeminal nuclei would have had to "migrate" and shorten rostrocaudally, specifically and only, from the lateral aspect of the brainstem where these nuclei extend from the pons through to the cervical spinal cord (e.g. the Paxinos and Watson rat brain atlases), to the spatially restricted ventromedial region of specifically and only the rostral medulla oblongata. According to the current paper, the inferior olivary complex of the elephant is very small and located lateral to their trigeminal nuclear complex, and the region from where the trigeminal nuclei are located by others appears to be just "lateral nuclei" with no suggestion of what might be there instead.
  
  Comment: We have three comments here:
  
  (1) The referee correctly notes that we argue the elephant brainstem underwent fairly major rearrangements. In particular, we argue that the elephant inferior olive was displaced laterally, by a very large cell mass, which we argue is an unusually large trigeminal nucleus. To our knowledge, such a large compact cell mass is not seen in the ventral brain stem of any other mammal.
  
  (2) The referee makes it sound as if it is our private idea that the elephant brainstem underwent major rearrangements and that the rest of the evidence points to a conventional ‘rodent-like’ architecture. This is far from the truth, however. Already from the outside appearance (see our Figure 1B and Figure 7A) it is clear that the elephant brainstem has huge ventral bumps not seen in any other mammal. An extraordinary architecture also holds at the organizational level of nuclei. Specifically, the facial nucleus – the most carefully investigated nucleus in the elephant brainstem – has an appearance distinct from that of the facial nuclei of all other mammals (Maseko et al., 2013; Kaufmann et al., 2022). If both the overall shape and the constituting nuclei of the brainstem are very different from other mammals, it is very unlikely if not impossible that the elephant brainstem follows in all regards a conventional ‘rodent-like’ architecture.
  
  (3) The inferior olive is an impressive nucleus in the partitioning scheme we propose (Figure 2). In fact – together with the putative trigeminal nucleus we describe – it’s the most distinctive nucleus in the elephant brainstem. We have not done volumetric measurements and cell counts here, but think this is an important direction for future work. What has informed our work is that the inferior olive nucleus we describe has the serrated organization seen in the inferior olive of all mammals. We will discuss these matters in depth below.
  
  Changes: None.
  
  Such an extraordinary rearrangement of brainstem nuclei would require a major transformation in the manner in which the mutations, patterning, and expression of genes and associated molecules during development occur. Such a major change is likely to lead to lethal phenotypes, making such a transformation extremely unlikely. Variations in mammalian brainstem anatomy are most commonly associated with quantitative changes rather than qualitative changes (10.1016/B978-0-12-804042-3.00045-2).
  
  Comment: We have two comments here:
  
  (1) The referee claims that it is impossible that the elephant brainstem differs from a conventional brainstem architecture because this would lead to lethal phenotypes etc. Following our previous response, this argument does not hold. It is out of the question that the elephant brainstem looks very different from the brainstem of other mammals. Yet, it is also evident that elephants live. The debate we need to have is not if the elephant brainstem differs from other mammals, but how it differs from other mammals.
  
  (2) In principle we agree with the referee’s thinking that the model of the elephant brainstem that is most likely to be correct is the one that requires the least amount of rearrangements to other mammals. We therefore prepared a comparison of the model the referee is proposing (Maseko et al., 2013; see Referee Table 1 below) with our proposition. We scored these models on their similarity to other mammals. We find that the referee’s ideas (Maseko et al., 2013) require more rearrangements relative to other mammals than our suggestion.
  
  Changes: Inclusion of Referee Table 1, which we discuss in depth below.
  
  The impetus for the identification of the unusual brainstem trigeminal nuclei in the current study rests upon a previous study from the same laboratory (10.1016/j.cub.2021.12.051) that estimated that the number of axons contained in the infraorbital branch of the trigeminal nerve that innervate the sensory surfaces of the trunk is approximately 400 000. Is this number unusual? In a much smaller mammal with a highly specialized trigeminal system, the platypus, the number of axons innervating the sensory surface of the platypus bill skin comes to 1 344 000 (10.1159. Yet, there is no complex rearrangement of the brainstem trigeminal nuclei in the brain of the developing or adult platypus (Ashwell, 2013, Neurobiology of Monotremes), despite the brainstem trigeminal nuclei being very large in the platypus (10.1159/000067195). Even in other large-brained mammals, such as large whales that do not have a trunk, the number of axons in the trigeminal nerve ranges between 400,000 and 500,000 (10.1007. The lack of comparative support for the argument forwarded in the previous and current study from this laboratory, and that the comparative data indicates that the brainstem nuclei do not change in the manner suggested in the elephant, argues against the identification of the trigeminal nuclei as outlined in the current study. Moreover, the comparative studies undermine the prior claim of the authors, informing the current study, that "the elephant trigeminal ganglion ... point to a high degree of tactile specialization in elephants" (10.1016/j.cub.2021.12.051). While clearly, the elephant has tactile sensitivity in the trunk, it is questionable as to whether what has been observed in elephants is indeed "truly extraordinary".
  
  Comment: These comments made us think that the referee is not talking about the paper we submitted, but that the referee is talking about us and our work in general. Specifically, the referee refers to the platypus and other animals dismissing our earlier work, which argued for a high degree of tactile specialization in elephants. We think the referee’s intuitions are wrong and our earlier work is valid.
  
  Changes: We prepared Author response image 1 (below), which puts the platypus brain, a monkey brain, and the elephant trigeminal ganglion (which contains a large part of the trunk innervating cells) in perspective.
  
  Author response image 1.
  
  The elephant trigeminal ganglion is comparatively large. Platypus brain, monkey brain, and elephant ganglion. The elephant has two trigeminal ganglia, which contain the first-order somatosensory neurons. They serve mainly for tactile processing and are large compared to a platypus brain (from the comparative brain collection) and are similar in size to a monkey brain. The idea that elephants might be highly specialized for trunk touch is also supported by the analysis of the sensory nerves of these animals (Purkart et al., 2022). Specifically, we find that the infraorbital nerve (which innervates the trunk) is much thicker than the optic nerve (which mediates vision) and the vestibulocochlear nerve (which mediates hearing). Thus, not everything is large about elephants; instead, the data argue that these animals are heavily specialized for trunk touch.
  
  But let's look more specifically at the justification outlined in the current study to support their identification of the unusually located trigeminal sensory nuclei of the brainstem.
  
  (1) Intense cytochrome oxidase reactivity.
  
  (2) Large size of the putative trunk module.
  
  (3) Elongation of the putative trunk module.
  
  (4) The arrangement of these putative modules corresponds to elephant head anatomy.
  
  (5) Myelin stripes within the putative trunk module that apparently match trunk folds.
  
  (6) Location apparently matches other mammals.
  
  (7) Repetitive modular organization apparently similar to other mammals.
  
  (8) The inferior olive described by other authors lacks the lamellated appearance of this structure in other mammals.
  
  Comment: We agree those are key issues.
  
  Changes: None.
  
  Let's examine these justifications more closely.
  
  (1) Cytochrome oxidase histochemistry is typically used as an indicative marker of neuronal energy metabolism. The authors indicate, based on the "truly extraordinary" somatosensory capacities of the elephant trunk, that any nuclei processing this tactile information should be highly metabolically active, and thus should react intensely when stained for cytochrome oxidase. We are told in the methods section that the protocols used are described by Purkart et al (2022) and Kaufmann et al (2022). In neither of these cited papers is there any description, nor mention, of the cytochrome oxidase histochemistry methodology, thus we have no idea of how this histochemical staining was done. To obtain the best results for cytochrome oxidase histochemistry, the tissue is either processed very rapidly after buffer perfusion to remove blood or in recently perfusion-fixed tissue (e.g., 10.1016/0165-0270(93)90122-8). Given: (1) the presumably long post-mortem interval between death and fixation - "it often takes days to dissect elephants"; (2) subsequent fixation of the brains in 4% paraformaldehyde for "several weeks"; (3) The intense cytochrome oxidase reactivity in the inferior olivary complex of the laboratory rat (Gonzalez-Lima, 1998, Cytochrome oxidase in neuronal metabolism and Alzheimer's diseases); and (4) The lack of any comparative images from other stained portions of the elephant brainstem; it is difficult to support the justification as forwarded by the authors. The histochemical staining observed is likely background reactivity from the use of diaminobenzidine in the staining protocol. Thus, this first justification is unsupported.
  
  Comment: The referee correctly notes that the description of our cytochrome-oxidase reactivity staining was lacking. This is a serious mistake of ours for which we apologize very much. The referee then makes it sound as if we messed up our cytochrome-oxidase staining, which is not the case. All successful (n = 3; please see our technical comments in the recommendation section) cytochrome-oxidase stainings were done with elephants with short post-mortem times (≤ 2 days) to brain removal/cooling and only brief immersion fixation (≤ 1 day). Cytochrome-oxidase reactivity in elephant brains appears to be more sensitive to quenching by fixation than is the case for rodent brains. We think it is a good idea to include a cytochrome-oxidase staining overview picture because we understood from the referee’s comments that we need to compare our partitioning scheme of the brainstem with that of other authors. To this end, we add a cytochrome-oxidase staining overview picture (Author response image 2) along with an alternative interpretation from Maseko et al., 2013.
  
  Changes: (1) We added details on our cytochrome-oxidase reactivity staining protocol and the cytochrome-oxidase reactivity in the elephant brain in the manuscript and in our response to the general recommendations.
  
  (2) We provide a detailed discussion of the technicalities of cytochrome-oxidase staining below in the recommendation section, where the referee raised further criticisms.
  
  (3) We include a cytochrome-oxidase staining overview picture (Author response image 2) along with an alternative interpretation from Maseko et al., 2013.
  
  Author response image 2.
  
  Cytochrome-oxidase staining overview. Coronal cytochrome-oxidase staining overview from African elephant cow Indra; the section is taken a few millimeters posterior to the facial nucleus. Brown is putatively neural cytochrome-reactivity, and white is the background. Black is myelin diffraction and (seen at higher resolution, when you zoom in) erythrocyte cytochrome-reactivity in blood vessels (see our Figure 1E-G); such blood vessel cytochrome-reactivity is seen, because we could not perfuse the animal. There appears to be a minimal outside-in-fixation artifact (i.e. a more whitish/non-brownish appearance of the section toward the borders of the brain). This artifact is not seen in sections from Indra that we processed earlier or in other elephant brains processed at shorter post-mortem/fixation delays (see our Figure 1C).
  
  The same structures can be recognized in Author response image 2 and Supplementary figure 36 of Maseko et al. (2013). The section is taken at an anterior-posterior level where we encounter the trigeminal nuclei in pretty much all mammals. Note that the neural cytochrome reactivity is very high in what we refer to as the trigeminal-nuclei-trunk-module and what Maseko et al. refer to as inferior olive. Myelin stripes can be recognized here as white omissions.
  
  At the same time, the cytochrome-oxidase-reactivity is very low in what Maseko et al. refer to as trigeminal nuclei. The indistinct appearance and low cytochrome-oxidase-reactivity of the trigeminal nuclei in the scheme of Maseko et al. (2013) is unexpected because trigeminal nuclei stain intensely for cytochrome-oxidase-reactivity in most mammals and because the trigeminal nuclei represent the elephant’s most important body part, the trunk. Staining patterns of the trigeminal nuclei as identified by Maseko et al. (2013) are very different at more posterior levels; we will discuss this matter below.
  
  Justifications (2), (3), and (4) are sequelae from justification (1). In this sense, they do not count as justifications, but rather unsupported extensions.
  
  Comment: These are key points of our paper that the referee does not discuss.
  
  Changes: None.
  
  (4) and (5) These are interesting justifications, as the paper has clear internal contradictions, and (5) is a sequela of (4). The reader is led to the concept that the myelin tracts divide the nuclei into sub-modules that match the folding of the skin on the elephant trunk. One would then readily presume that these myelin tracts are in the incoming sensory axons from the trigeminal nerve. However, the authors note that this is not the case: "Our observations on trunk module myelin stripes are at odds with this view of myelin. Specifically, myelin stripes show no tapering (which we would expect if axons divert off into the tissue). More than that, there is no correlation between myelin stripe thickness (which presumably correlates with axon numbers) and trigeminal module neuron numbers. Thus, there are numerous myelinated axons, where we observe few or no trigeminal neurons. These observations are incompatible with the idea that myelin stripes form an axonal 'supply' system or that their prime function is to connect neurons. What do myelin stripe axons do, if they do not connect neurons? We suggest that myelin stripes serve to separate rather than connect neurons." So, we are left with the observation that the myelin stripes do not pass afferent trigeminal sensory information from the "truly extraordinary" trunk skin somatic sensory system, and rather function as units that separate neurons - but to what end? It appears that the myelin stripes are more likely to be efferent axonal bundles leaving the nuclei (to form the olivocerebellar tract). This justification is unsupported.
  
  Comment: The referee cites some of our observations on myelin stripes, which we find unusual. We stand by the observations and comments. The referee does not discuss the most crucial finding we report on myelin stripes, namely that they correspond remarkably well to trunk folds.
  
  Changes: None.
  
  (6) The authors indicate that the location of these nuclei matches that of the trigeminal nuclei in other mammals. This is not supported in any way. In ALL other mammals in which the trigeminal nuclei of the brainstem have been reported they are found in the lateral aspect of the brainstem, bordered laterally by the spinal trigeminal tract. This is most readily seen and accessible in the Paxinos and Watson rat brain atlases. The authors indicate that the trigeminal nuclei are medial to the facial nerve nucleus, but in every other species, the trigeminal sensory nuclei are found lateral to the facial nerve nucleus. This is most salient when examining a close relative, the manatee (10.1002/ar.20573), where the location of the inferior olive and the trigeminal nuclei matches that described by Maseko et al (2013) for the African elephant. This justification is not supported.
  
  Comment: The referee notes that we incorrectly state that the position of the trigeminal nuclei matches that of other mammals. We think this criticism is justified.
  
  Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see below Referee Table 1). Here we acknowledge the referee’s argument and we also changed the manuscript accordingly.
  
  (7) The dual to quadruple repetition of rostrocaudal modules within the putative trigeminal nucleus as identified by the authors relies on the fact that in the neurotypical mammal, there are several trigeminal sensory nuclei arranged in a column running from the pons to the cervical spinal cord, these include (nomenclature from Paxinos and Watson in roughly rostral to caudal order) the Pr5VL, Pr5DM, Sp5O, Sp5I, and Sp5C. However, these nuclei are all located far from the midline and lateral to the facial nerve nucleus, unlike what the authors describe in the elephants. These rostrocaudal modules are expanded upon in Figure 2, and it is apparent from what is shown that the authors are attributing other brainstem nuclei to the putative trigeminal nuclei to confirm their conclusion. For example, what they identify as the inferior olive in Figure 2D is likely the lateral reticular nucleus as identified by Maseko et al (2013). This justification is not supported.
  
  Comment: The referee again compares our findings to the scheme of Maseko et al. (2013) and rejects our conclusions on those grounds. We think such a comparison of our scheme is needed, indeed.
  
  Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see below Referee Table 1).
  
  (8) In primates and related species, there is a distinct banded appearance of the inferior olive, but what has been termed the inferior olive in the elephant by other authors does not have this appearance, rather, and specifically, the largest nuclear mass in the region (termed the principal nucleus of the inferior olive by Maseko et al, 2013, but Pr5, the principal trigeminal nucleus in the current paper) overshadows the partial banded appearance of the remaining nuclei in the region (but also drawn by the authors of the current paper). Thus, what is at debate here is whether the principal nucleus of the inferior olive can take on a nuclear shape rather than evince a banded appearance. The authors of this paper use this variance as justification that this cluster of nuclei could not possibly be the inferior olive. Such a "semi-nuclear/banded" arrangement of the inferior olive is seen in, for example, giraffe (10.1016/j.jchemneu.2007.05.003), domestic dog, polar bear, and most specifically the manatee (a close relative of the elephant) (brainmuseum.org; 10.1002/ar.20573). This justification is not supported.
  
  Comment: We carefully looked at the brain sections referred to by the referee in the brainmuseum.org collection. We found, contrary to the referee’s claims, that dogs, polar bears, and manatees have a perfectly serrated appearance (a cellular arrangement in curved bands) of the inferior olive. Accordingly, we think the referee is not reporting the comparative evidence fairly and we wonder why this is the case.
  
  Changes: None.
  
  Thus, all the justifications forwarded by the authors are unsupported. Based on methodological concerns, prior comparative mammalian neuroanatomy, and prior studies in the elephant and closely related species, the authors fail to support their notion that what was previously termed the inferior olive in the elephant is actually the trigeminal sensory nuclei. Given this failure, the justifications provided above that are sequelae also fail. In this sense, the entire manuscript and all the sequelae are not supported.
  
  Comment: We disagree. To summarize:
  
  (1) Our description of the cytochrome oxidase staining lacked methodological detail, which we have now added; the cytochrome oxidase reactivity data are great and support our conclusions.
  
  (2)–(5) The referee does not really discuss our evidence on these points.
  
  (6) We were wrong and have now fixed this mistake.
  
  (7) The referee asks for a comparison to the Maseko et al. (2013) scheme (agreed, see Referee Table 1).
  
  (8) The referee bends the comparative evidence against us.
  
  Changes: None.
  
  A comparison of the elephant brainstem partitioning schemes put forward by Maseko et al 2013 and by Reveyaz et al.
  
  To start with, we would like to express our admiration for the work of Maseko et al. (2013). These authors did pioneering work on obtaining high-quality histology samples from elephants. Moreover, they made a heroic neuroanatomical effort, in which they assigned 147 brain structures to putative anatomical entities. Most of their data appear to refer to staining in a single elephant and one coronal sectioning plane. The data quality and the illustration of results are excellent.
  
  We studied mainly two large nuclei in six (now 7) elephants in three (coronal, parasagittal, and horizontal) sectioning planes. The two nuclei in question are the two most distinct nuclei in the elephant brainstem, namely an anterior ventromedial nucleus (the trigeminal trunk module in our terminology; the inferior olive in the terminology of Maseko et al., 2013) and a more posterior lateral nucleus (the inferior olive in our terminology; the posterior part of the trigeminal nuclei in the terminology of Maseko et al., 2013).
  
  Author response image 3 gives an overview of the two partitioning schemes for inferior olive/trigeminal nuclei along with the rodent organization (see below).
  
  Author response image 3.
  
  Overview of the brainstem organization in rodents & elephants
  
  The strength of the Maseko et al. (2013) scheme is the excellent match of the position of elephant nuclei to the position of nuclei in the rodent (Author response image 3). We think this positional match reflects the fact that Maseko et al. (2013) mapped a rodent partitioning scheme on the elephant brainstem. To us, this is a perfectly reasonable mapping approach. As the referee correctly points out, the positional similarity of both elephant inferior olive and trigeminal nuclei to the rodent strongly argues in favor of the Maseko et al. (2013) scheme, because brainstem nuclei are positionally very conservative.
  
  Other features of the Maseko et al. (2013) scheme are less favorable. The scheme marries two cyto-architectonically very distinct divisions (an anterior indistinct part and a super-distinct serrated posterior part) to be the trigeminal nuclei. We think merging entirely distinct subdivisions into one nucleus is a byproduct of mapping a rodent partitioning scheme on the elephant brainstem. Neither of the two subdivisions resembles the trigeminal nuclei of other mammals. The cytochrome oxidase staining patterns differ markedly across the anterior indistinct part (see our Author response image 3) and the posterior part of the trigeminal nuclei and do not match with the intense cytochrome oxidase reactivity of other mammalian trigeminal nuclei (Author response image 2). Our anti-peripherin staining (the novel Figure 2 of our manuscript) indicates that there are probably no climbing fibers in what Maseko et al. think is the inferior olive; this is a potentially fatal problem for the hypothesis. The posterior part of the Maseko et al. (2013) trigeminal nuclei has a distinct serrated appearance that is characteristic of the inferior olive in other mammals. Moreover, the inferior olive of Maseko et al. (2013) lacks the serrated appearance of the inferior olive seen in pretty much all mammals; this is a serious problem.
  
  The partitioning scheme of Reveyaz et al. comes with poor positional similarity but avoids the other problems of the Maseko et al. (2013) scheme. Our explanation for the positionally deviating location of trigeminal nuclei is that the elephant grew one of the largest, if not the largest, trigeminal systems of all mammals. As a result, the trigeminal nuclei grew through the floor of the brainstem. We understand this is a post hoc just-so explanation, but at least it is an explanation.
  
  The scheme of Reveyaz et al. was derived in an entirely different way from the Maseko model. Specifically, we were convinced that the elephant trigeminal nuclei ought to be very special because of the gigantic trigeminal ganglia (Purkart et al., 2022). Cytochrome-oxidase staining revealed a large distinct nucleus with an elongated shape. Initially, we were freaked out by the position of the nucleus and the fact that it was referred to as inferior olive by other authors. When we found an inferior-olive-like nucleus at a nearby (although admittedly unusual) location, we were less worried. We then optimized the visualization of myelin stripes (brightfield imaging etc.) and were able to collect an entire elephant trunk along with the brain (African elephant cow Indra). When we made the one-to-one match of Indra’s trunk folds and myelin stripes (former Figure 4, now Figure 5) we were certain that we had identified the trunk module of the trigeminal nuclei. We already noted at the outset of our rebuttal that we now consider such certainty a fallacy of overconfidence. In light of the comments of Referee 2, we feel that a further discussion of our ideas is warranted.
  
  A strength of the Reveyaz model is that nuclei look like single anatomical entities. The trigeminal nuclei look like trigeminal nuclei of other mammals, the trunk module has a striking resemblance to the trunk and the inferior olive looks like the inferior olive of other mammals.
  
  We evaluated the fit of the two models in the form of a table (Author response table 1; below). Unsurprisingly, Author response table 1 aligns with our views of elephant brainstem partitioning.
  
  Author response table 1
  
  Qualitative evaluation of elephant brainstem partitioning schemes
  
  ++ = Very attractive; + = attractive; - = unattractive; -- = very unattractive
  
  We scored features that are clear and shared by all mammals – as far as we know them – as very attractive.
  
  We scored features that are clear and are not shared by all mammals – as far as we know them – as very unattractive.
  
  Attractive features are either less clear or less well-shared features.
  
  Unattractive features are either less clear or less clearly not shared features.
  
  Author response table 1 suggests two conclusions to us. (i) The Reveyaz et al. model has mainly favorable properties. The Maseko et al. (2013) model has mainly unfavorable properties. Hence, the Reveyaz et al. model is more likely to be true. (ii) The outcome is not black and white, i.e., both models have favorable and unfavorable properties. Accordingly, we overstated our case in our initial submission and toned down our claims in the revised manuscript.
  
  What the authors have not done is to trace the pathway of the large trigeminal nerve in the elephant brainstem, as was done by Maseko et al (2013), which clearly shows the internal pathways of this nerve, from the branch that leads to the fifth mesencephalic nucleus adjacent to the periventricular grey matter, through to the spinal trigeminal tract that extends from the pons to the spinal cord in a manner very similar to all other mammals. Nor have they shown how the supposed trigeminal information reaches the putative trigeminal nuclei in the ventromedial rostral medulla oblongata. These are but two examples of many specific lines of evidence that would be required to support their conclusions. Clearly, tract tracing methods, such as cholera toxin tracing of peripheral nerves cannot be done in elephants, thus the neuroanatomy must be done properly and with attention to detail to support the major changes indicated by the authors.
  
  Comment: The referee claims that Maseko et al. (2013) showed by ‘tract tracing’ that the structures they refer to as trigeminal nuclei receive trigeminal input. This statement is at least slightly misleading. There is nothing of what amounts to proper ‘tract tracing’ in the Maseko et al. (2013) paper, i.e. tracing of tracts with post-mortem tracers. We tried proper post-mortem tracing but failed (no tracer transport), probably as a result of the limitations of our elephant material. What Maseko et al. (2013) actually did is look a bit for putative trigeminal fibers and where they might go. We also used this approach. In our hands, such ‘pseudo tract tracing’ works best in unstained material under bright field illumination, because myelin is very well visualized. In such material, we find: (i) massive fiber tracts descending dorsoventrally roughly from where both Maseko et al. (2013) and we think the trigeminal tract runs. (ii) These fiber tracts run dorsoventrally and approach what we think are the trigeminal nuclei from the lateral side.
  
  Changes: Ad hoc tract tracing; see above.
  
  So what are these "bumps" in the elephant brainstem?
  
  Four previous authors indicate that these bumps are the inferior olivary nuclear complex. Can this be supported?
  
  The inferior olivary nuclear complex acts "as a relay station between the spinal cord (n.b. trigeminal input does reach the spinal cord via the spinal trigeminal tract) and the cerebellum, integrating motor and sensory information to provide feedback and training to cerebellar neurons" (https://www.ncbi.nlm.nih.gov/books/NBK542242/). The inferior olivary nuclear complex is located dorsal and medial to the pyramidal tracts (which were not labeled in the current study by the authors but are clearly present in Fig. 1C and 2A) in the ventromedial aspect of the rostral medulla oblongata. This is precisely where previous authors have identified the inferior olivary nuclear complex and what the current authors assign to their putative trigeminal nuclei. The neurons of the inferior olivary nuclei project, via the olivocerebellar tract to the cerebellum to terminate in the climbing fibres of the cerebellar cortex.
  
  Comment: We agree with the referee that in the Maseko et al. (2013) scheme the inferior olive is exactly where we expect it from pretty much all other mammals. Hence, this is a strong argument in favor of the Maseko et al. (2013) scheme and a strong argument against the partitioning scheme suggested by us.
  
  Changes: Please see our discussion above.
  
  Elephants have the largest (relative and absolute) cerebellum of all mammals (10.1002/ar.22425), this cerebellum contains 257 × 10⁹ neurons (10.3389/fnana.2014.00046; three times more than the entire human brain, 10.3389/neuro.09.031.2009). Each of these neurons appears to be more structurally complex than the homologous neurons in other mammals (10.1159/000345565; 10.1007/s00429-010-0288-3). In the African elephant, the neurons of the inferior olivary nuclear complex are described by Maseko et al (2013) as being both calbindin and calretinin immunoreactive. Climbing fibres in the cerebellar cortex of the African elephant are clearly calretinin immunopositive and also are likely to contain calbindin (10.1159/000345565). Given this, would it be surprising that the inferior olivary nuclear complex of the elephant is enlarged enough to create a very distinct bump in exactly the same place where these nuclei are identified in other mammals?
  
  Comment: We agree with the referee that it is possible and even expected from other mammals that there is an enlargement of the inferior olive in elephants. Hence, a priori one might expect the ventral brain stem bumps to be the inferior olive; this is perfectly reasonable and is what was done by previous authors. The referee also refers to calbindin and calretinin antibody reactivity. Such antibody reactivity is indeed in line with the referee’s ideas and we considered these findings in our Referee Table 1. The problem is, however, that neither calbindin nor calretinin antibody reactivity is highly specific and indeed both nuclei in discussion (trigeminal nuclei and inferior olive) show such reactivity. Unlike the peripherin-antibody staining advanced by us, neither calbindin nor calretinin antibody reactivity can distinguish the two hypotheses debated.
  
  Changes: Please see our discussion above.
  
  What about the myelin stripes? These are most likely to be the origin of the olivocerebellar tract and probably only have a coincidental relationship with the trunk. Thus, given what we know, the inferior olivary nuclear complex as described in other studies, and the putative trigeminal nuclear complex as described in the current study, is the elephant inferior olivary nuclear complex. It is not what the authors believe it to be, and they do not provide any evidence that discounts the previous studies. The authors are quite simply put, wrong. All the speculations that flow from this major neuroanatomical error are therefore science fiction rather than useful additions to the scientific literature.
  
  Comment: It is unlikely that the myelin stripes are the origin of the olivocerebellar tract as suggested by the referee. Specifically, the lack of peripherin-reactivity indicates that these fibers are not climbing fibers (our novel Figure 2). In general, we feel the referee does not want to discuss the myelin stripes and obviously thinks we made up the strange correspondence of myelin stripes and trunk folds.
  
  Changes: Please see our discussion above.
  
  What do the authors actually have?
  
  The authors have interesting data, based on their Golgi staining and analysis, of the inferior olivary nuclear complex in the elephant.
  
  Comment: The referee reiterates their views.
  
  Changes: None.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.
  
  Comment: The referee gives a concise summary of our findings. The referee acknowledges the depth of our analysis and also notes our cellular results. The referee – in line with the comments of Referee 2 – also points out that a misidentification of the nucleus under study is potentially fatal for our analysis. We thank the referee for this fair assessment.
  
  Changes: We feel that we need to alert the reader more broadly to the misidentification concern. We think the critical comments of Referee 2, which will be published along with our manuscript, will go a long way in doing so. We think the eLife publishing format is fantastic in this regard. We will also include pointers to these concerns in the revised manuscript.
  
  Strengths:
  
  The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.
  
  Comment: Again, a very fair and balanced set of comments. We are thankful for these comments.
  
  Changes: None.
  
  Weaknesses:
  
  While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.
  
  Comment: The referee points out a significant weakness of our study, namely our limited understanding of the origin and targets of the axons constituting the myelin stripes. We are very much aware of this problem and this is also why we directed high-powered methodology like synchrotron X-ray tomograms to elucidate the structure of myelin stripes. Such analysis led to advances, i.e., we now think that what look like stripes are bundles, and we understand that the constituting axons tend to traverse the module. Such advances are insufficient, however, to provide a clear picture of myelin stripe connectivity.
  
  Changes: We think solving the problems raised by the referee will require long-term methodological advances and hence we will not be able to solve these problems in the current revision. Our long-term plans for confronting these issues are the following: (i) Improving our understanding of long-range connectivity by post-mortem tracing and MR-based techniques such as Diffusion-Tensor-Imaging. (ii) Improving our understanding of mid and short-range connectivity by applying even larger synchrotron X-ray tomograms and possibly serial EM.
  
  Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.
  
  Comment: The referee suggests another series of topics, which include the analysis of brain parts volumes or overall brain size. We agree these are important issues, but we also think such questions are beyond the scope of our study.
  
  Changes: We hope to publish comparative data on elephant brain size and shape later this year.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  I realize that elephant brains are a limiting resource in this project, along with the ability to perform functional investigations. However, I believe that Prof. Jon Kaas (Vanderbilt University) has one or more series of Nissl-stained brainstems from elephants. These might be of potential interest, as they were previously used to explore general patterns of trigeminal brainstem organization in a comparative manner (see Sawyer and Sarko, 2017, "Comparative Anatomy and Evolution of the Somatosensory Brain Stem" in the Evolution of Nervous System series) and might shed light on the positioning of the trigeminal complex and IO, with parts of the trigeminal nerve itself still attached to these sections.
  
  Comment: The referee suggests adding data from more elephants and we think this is a great suggestion because our ns are small. We followed this advice. We agree we need more comparative neuroanatomy of elephants and the urgency of this matter is palpable in the heated debate we have with Referee 2. Specifically, we need more long-range and short-range analysis of elephant brains.
  
  Changes: We plan to include data in the revised manuscript about cytoarchitectonics (Nissl), cytochrome-oxidase reactivity, and possibly also antibody reactivity from an additional animal, i.e., from the African elephant cow Bibi. The quality of this specimen is excellent and the post-mortem time to brain extraction was very short.
  
  We also have further plans for connectivity analysis (see our response above), but such data will not become available fast enough for the revision.
  
  Other recommendations:
  
  - A general schematic showing input from trunk to PrV to the trigeminal subnuclei (as well as possibly ascending connections) might be informative to the reader, in terms of showing which neural relay is being examined.
  
  Comment: We think this is a very good suggestion in principle, but we were not satisfied with the schematics we came up with.
  
  Changes: None.
  
  - Perhaps a few more sentences describing the significance of synchrotron tomography for those who may be unfamiliar.
  
  Comment & Change: We agree and implement this suggestion.
  
  - "Belly-shaped" trunk module description is unclear on page 9.
  
  Comment & Change: We clarified this matter.
  
  - Typo on the last sentence of page 9.
  
  Comment & Change: We fixed this mistake.
  
  Reviewer #2 (Recommendations For The Authors):
  
  The data is only appropriate for a specialized journal and is limited to the Golgi analysis of neurons within the inferior olivary complex of the elephant. This reviewer considers that the remainder of the work is speculation and that the paper in its current version is not salvageable.
  
  Comment: Rather than suggesting changes, the referee makes it clear that the referee does not want to see our paper published. We think this desire to reject is not rooted in a lack of quality of our work. In fact, we did an immense amount of work: detailed cytoarchitectonic analysis of six (now seven) elephant brainstems (rather than one, as in the case of our predecessors), cell counts, and X-ray tomography. Instead, we think the problem is rooted in the fact that we contradict the referee. To us, such suppression of diverging opinions – provided they are backed up with data – is a scientifically deeply unhealthy attitude. Science lives from the debate and this is why we did not exclude any referees even though we knew that our results do not align with the views of all of the few actors in the field.
  
  Changes: We think the novel eLife publishing scheme was developed to prevent such abuse. We look forward to having our data published along with the harsh comments of the referee. The readers and subsequent scientific work will determine who’s right and who’s wrong.
  
  In order to convince readers of the grand changes to the organization of the brainstem in a species suggested by the authors the data presented needs to be supported. It is not.
  
  Comment: Again, this looks to us like more of the ‘total-rejection-commentary’ than like an actual recommendation.
  
  Changes: None.
  
  The protocol for the cytochrome oxidase histochemistry is not available in the locations indicated by the authors, and it is very necessary to provide this, as I fully believe that the staining obtained is not real, given the state of the tissue used.
  
  Comment: We apologize again for not including the necessary details on our cytochrome-oxidase staining.
  
  From these comments (and the initial comments above) it appears that the referee is uncertain about the validity of cytochrome-oxidase staining. We (M.B., the senior author) have been doing this particular stain for approximately three decades. The referee being unfamiliar with cytochrome-oxidase staining is fine, but we can’t comprehend how the referee then comes to the ‘full belief’ that our staining patterns are ‘not real’ when the visual evidence indicates the opposite. We feel the referee does not want to believe our data.
  
  From hundreds of permutations, we can assure the referee that cytochrome-oxidase staining can go wrong in many ways. The most common failure outcome in elephants is a uniform light brown stain after hours or days of the cytochrome-oxidase reaction. This outcome is closely associated with long (≥2 days) post-mortem/fixation times and reflects the quenching of cytochrome-oxidases by fixation. Interestingly, cytochrome-oxidase staining in elephant brains is distinctly more sensitive to quenching by fixation than cytochrome-oxidase staining in rodent brains. Another, rarer failure of cytochrome-oxidase staining comes as entirely white or barely colored sections; this outcome is usually associated with a bad reagent (most commonly old DAB, but occasionally also old or bad catalase, in case you are using a staining protocol with catalase). Another nasty cytochrome-oxidase staining outcome is smeary all-black sections. In this case, a black precipitate sticks to sections and screws up the staining (filtering and more gradual heating of the staining solution usually solve this problem). Thus, you can get uniformly white, uniformly light brown, and smeary black sections as cytochrome-oxidase staining failures. What you never get from cytochrome-oxidase staining as an artifact are sections with a strong brown to lighter brown differential contrast. All sections with strong brown to lighter brown differential contrast (staining successes) show one and the same staining pattern in a given brain area, i.e., brownish barrels in the rodent cortex, brownish barrelettes (trigeminal nuclei) in the rodent brainstem, brownish putative trunk modules/inferior olives (if we believe the referee) in the elephant brainstem. Cytochrome-oxidase reactivity is in this regard remarkably different from antibody staining. In antibody staining you can get all kinds of interesting differential contrast staining patterns, which mean nothing. Such differential contrast artifacts in antibody staining arise as a result of insufficient primary antibody specificity, the secondary antibody binding non-specifically, and any number of other causes. The reason that the brown differential contrast of the cytochrome-oxidase reaction is pretty much fool-proof relates to the histochemical staining mechanism, which is based on the supply of specific substrates to a universal mitochondrial enzyme. The ability to reveal mitochondrial metabolism and the universal and ‘fool-proof’ staining qualities make the cytochrome-oxidase reactivity a fantastic tool for comparative neuroscience, where you always struggle with insufficient information about antigen reactivity.
  
  We also note that the contrast of cytochrome-oxidase reactivity seen in the elephant brainstem is spectacular. As the Referee can see in our Figure 1C, we observe a dark brown color in the putative trunk module, with the rest of the brain being close to white. Such striking cytochrome-oxidase reactivity contrast has been observed only very rarely in neuroanatomy: (i) In the rest of the elephant brain (brainstem, thalamus, cortex) we did not observe as striking contrast as in the putative trunk module (the inferior olive according to the referee). (ii) In decades of work with rodents, we have rarely seen such differential activity. For example, cortical whisker-barrels (a classic CO-staining target) in rodents usually come out as dark brown against a light brown background.
  
  What all of this commentary means is that patterns revealed by differential cytochrome-oxidase staining in the elephant brain stem are real.
  
  Changes: We added details on our cytochrome-oxidase reactivity staining protocol and commented on cytochrome-oxidase reactivity in the elephant brain in general.
  
  The authors need to recognize that the work done in Africa on elephant brains is of high quality and should not be blithely dismissed by the authors - this stinks of past colonial "glory", especially as the primary author on these papers is an African female.
  
  Comment: The referee notes that we unfairly dismiss the work of African scientists and that our paper reflects a continuation of our horrific colonial past because we contradict the work of an African woman. We think such commentary is meant to be insulting and prefer to return to the scientific discourse. We are staunch supporters of diversity in science. It is simply untrue that we do not acknowledge African scientists or the excellent work done in Africa on elephant brains. For example, we cite no less than four papers from the Manger group. We refer countless times in the manuscript to these papers, because these papers are highly relevant to our work. We indeed disagree with two anatomical assignments made by Maseko et al., 2013. Such differences should not be overrated, however. As we noted before, such differences relate to only 2 out of 147 anatomical assignments made by these authors. More generally, discussing and even contradicting papers is the appropriate way to acknowledge scientists. We already expressed that we greatly admire the pioneering work of the Manger group. In our view, the perfusion of elephants in the field is a landmark experiment in comparative neuroanatomy. We closely work with colleagues in Africa and find them fantastic collaborators. When the referee is accusing us of contradicting the work of an African woman, the referee is unfairly and wrongly accusing us of attacking a scientist’s identity. More generally, we feel the discussion should focus on the data presented.
  
  Changes: None.
  
  In addition, perfusing elephants in the field with paraformaldehyde shortly after death is not a problem "partially solved" when it comes to collecting elephant tissue (n.b., with the right tools the brain of the elephant can be removed in under 2 hours). It means the problem IS solved. This is evidenced by the quality of the basic anatomical, immuno-, and Golgi-staining of the elephant tissue collected in Africa.
  
  Comment: This is not a recommendation. We repeat: In our view, the perfusion of elephants in the field by the Manger group is a landmark experiment in comparative neuroanatomy. Apart from that, we think the referee got our ‘partially solved comment’ the wrong way. It is perhaps worthwhile to recall the context of this quote. We first describe the numerous limitations of our elephant material; admitting these limitations is about honesty. Then, we wanted to acknowledge previous authors who either paved the way for elephant neuroanatomy (Shoshani) or did a better job than we did (Manger; see the above landmark experiment). These citations were meant as an appreciation of our predecessors’ work and by far not meant to diminish their work. Why did we say that the problems of dealing with elephant material are only partially solved? Because elephant neuroanatomy is hard and the problems associated with it are by no means solved. Many previous studies rely on single specimens and our possibilities of accessing, removing, processing, and preserving elephant brains are limited and inferior to the conditions elsewhere. Doing a mouse brain is orders of magnitude easier than doing an elephant brain (because the problems of doing mouse anatomy are largely solved), yet it is hard to publish a paper with six elephant brains because the referees expect evidence at least half as good as what you get in mice.
  
  Changes: We replaced the ‘partially solved’ sentence.
  
  The authors need to give credit where credit is due - the elephant cerebellum is clearly at the core of controlling trunk movement, and as much as primary sensory and final stage motor processing is important, the complexity required for the neural programs needed to move the trunk either voluntarily or in response to stimuli, is being achieved by the cerebellum. The inferior olive is part of this circuit and is accordingly larger than one would expect.
  
  Comment: We think it is very much possible that the elephant cerebellum is important in trunk control.
  
  Changes: We added a reference to the elephant cerebellum in the introduction of our manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.15.567239v4
www.biorxiv.org www.biorxiv.org

New submission 07/07/2023, 08:59:19

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  MCM8 and MCM9 are paralogues of the eukaryotic MCM2-7 proteins. MCM2-7 form a heterohexameric complex to function as a replicative helicase while MCM8-9 form another hexameric helicase complex that may function in homologous recombination-mediated long-tract gene conversion and/or break-induced replication. MCM2-7 complex is loaded during the low Cdk period by ORC, CDC6, and Cdt1, when the origin DNA may intrude into the central channel via the MCM2-MCM5 entry "gate". In the S phase, MCM2-7 complex is activated as CMG helicase with the help of CDC45 and GINS complex. On the other hand, it still remains unclear how MCM8-9 complex is loaded onto DNA and then activated.
  
  In this study, the authors first investigated the cryo-EM structure of chicken MCM8-9 (gMCM8-9) complex. Based on the data obtained, they suggest that the observed gMCM8-9 structure might represent the structure of a loading state with possible DNA entry "gate". The authors further investigated the cryo-EM structure of human MCM8-9 (hMCM8-9) complex in the presence of the activator protein, HROB, and compared the structure with that obtained without HROB, which the authors published previously. As a result, they suggest that MCM8-9 complex may change the conformation upon HROB binding, leading to helicase activation. Furthermore, based on the structural analyses, they identified some important residues and motifs in MCM8-9 complex, mutations of which actually impaired the MCM8-9 activity in vitro and in vivo.
  
  Overall, the data presented would support the authors' conclusions and would be of wide interest for those working in the fields of DNA replication and repair. One caveat is that most of the structural data are shown only as ribbon model without showing the density map data obtained by cryo-EM, which makes accurate evaluation of the data somewhat difficult.
  
  We thank the reviewer for the positive comments on our work. To allow evaluation of all the structural data, in our revised manuscript we have presented the density maps of the cryo-EM structures of the gMCM8/9 complex in supplementary figures S5 and S6. In addition, the 3D cryo-EM maps of the gMCM8/9 complex and the hMCM8/9 NTD ring have been deposited in the EMDB database with accession numbers EMD-32346 and EMD-33989, respectively. The corresponding atomic models have been deposited at the RCSB PDB under the accession codes 7W7P and 7YOX, respectively. All these data were released in May 2023.
  
  Reviewer #2 (Public Review):
  
  MCM8 and MCM9 together form a hexameric DNA helicase that is involved in homologous recombination (HR) for repairing DNA double-strand breaks. The authors have previously reported on the winged-helix structure of the MCM8 (Zeng et al. BBRC, 2020) and the N-terminal structure of MCM8/9 hexameric complex (MCM8/9-NTD) (Li et al. Structure, 2021). This manuscript reports the structure of a near-complete MCM8/9 complex and the conformational change of MCM8/9-NTD in the presence of its binding protein, HROB, as well as the residues important for its helicase activity.
  
  The presented data might potentially explain how MCM8/9 works as a helicase. However, additional studies are required to conclude this point because the presented MCM8/9 structure is not a DNA-bound form and HROB is not visible in the presented structural data. Taking these points into account, this work will be of interest to biologists studying DNA transactions.
  
  A strength of this paper is that the authors revealed the near-complete MCM8/9 structure with 3.66 Å and 5.21 Å for the NTD and CTD, respectively (Figure 1). Additionally, the authors discovered a conformational change in the MCM8/9-NTD when HROB was included (Figure 4) and a flexible nature of MCM8/9-CTD (Figure S6 and Movie 1).
  
  The biochemical data that demonstrate the significance of the Ob-hp motif and the N-C linker for DNA helicase activity require careful interpretation (Figures 5 and 6). To support the conclusion, the authors should show that the mutant proteins form the hexamer without problems. Otherwise, it is conceivable that the mutant proteins are flawed in complex formation. If that is the case, the authors cannot conclude that these motifs are vital for the helicase function.
  
  A weakness of this paper is that the authors have already reported the structure of MCM8/9-NTD utilizing human proteins (Li et al. Structure, 2021). Although they succeeded in revealing the high-resolution structure of MCM8/9-NTD with the chicken proteins in this study, the two structures are extremely comparable (Figure S2), and the interaction surfaces seem to be the same (Figure 2).
  
  Another weakness of this paper is that the presented data cannot fully elucidate the mechanistic insights into how MCM8/9 functions as a helicase for two reasons. 1) The presented structures solely depict DNA-unbound forms. It is critical to reveal the structure of a DNA-bound form. 2) The MCM8/9 activator, HROB, is not visible in the structural data. Even though HROB caused a conformational change in MCM8/9-NTD, it is critical to visualize the structure of an MCM8/9-HROB complex.
  
  We appreciate the reviewer’s comments on our work. Regarding the first weakness mentioned above, the previously reported cryo-EM structure of hMCM8/9 NTD ring was achieved with a resolution of 6.6 Å. At this level of resolution, we were only able to observe the overall shape of the structure and a partial representation of the protein's secondary structure. It is hard for us to discern any specific details regarding the interaction interface between MCM8 and MCM9. In this study, we solved the structure of gMCM8/9 NTD ring with a resolution of 3.67 Å. We believe that the higher resolution of gMCM8/9 NTD structure provides a significant advantage in analyzing the interaction surface between MCM8 and MCM9. This improved resolution has enabled us to gain valuable insights into the assembly mechanism of the MCM8/9 hexamer, representing a significant step forward in our understanding of the MCM8/9 helicase complex. In response to the second weakness raised by the reviewer, we fully agree with the reviewer that high-resolution structures of the MCM8/9 complex with DNA or HROB are necessary to elucidate the mechanism of this helicase complex. We are actively working towards obtaining these complex structures using cryo-EM and X-ray crystal diffraction.
  
  Moreover, we would like to address the reviewer's concern regarding the mutant proteins used in the in vitro helicase assays. We have conducted additional experiments to confirm that these mutations do not impair the formation of the MCM8/9 hexamer. Specifically, we performed size exclusion chromatography (SEC) analyses of the wild-type (WT) MCM8/9 complex, as well as MCM8 and MCM9 mutant proteins (Author response image 1). The results demonstrated that all the proteins behaved consistently and displayed similar SEC profiles during the purification process. Notably, the N-C linker deletion mutant (hMCM8_Δ369-377+MCM9_Δ283-287) combining the MCM8 and MCM9 N-C linker deletions also behaved similarly to WT MCM8/9 (Author response image 2). These findings strongly suggest that the mutations in the OB-hps regions and the N-C linkers do not disrupt the hexamer formation of the MCM8/9 complex. Author response image 1 and Author response image 2 have been included in supplementary figures S8 and S11, respectively.
  
  Author response image 1.
  
  SEC profiles of WT and OB-hps mutants of MCM8/9 complex.
  
  Author response image 2.
  
  SEC profiles of WT and N-C linker mutant of MCM8/9 complex.
  
  Reviewer #1 (Recommendations For The Authors):
  
  I would like to provide some suggestions to improve the manuscript.
  
  1) Throughout the manuscript, more density map data obtained by the cryo-EM should be shown for accurate evaluation of the data. For example, in Figure 1C, the authors state that the inner channel of the gMCM8-9 hexamer is ~28 angstrom, apparently based on the ribbon model. This is not appropriate because the space in the ribbon model is not the same as that in the density map. For Figure 1B, they state that "The domain structures of gMCM8-9 fit well into their electron map". If so, please show the actual docking data. Also for Figure 2, the docking presentation between the side chains in the ribbon model and the density map should be shown.
  
  We sincerely thank the reviewer for the constructive suggestions. In addition to releasing our structural data in the EMDB and PDB, we have also followed the reviewer’s suggestions to include more density map data in the supplementary material. In fact, when calculating the diameter of the inner channel of the MCM8/9 hexamer, we also measured it on the density map (Author response image 3. A and B), and the result is consistent with the value reported in our manuscript. To further evaluate the structure of MCM8/9, we have included additional docking structures based on the density map (Author response image 3. C-F). Moreover, for Figure 2, more docking presentations are provided and the key residues involved in the hydrophobic interactions are highlighted in bold (Author response image 4). Author response image 3 and Author response image 4 have been included in supplementary figures S5 and S6, respectively.
  
  Author response image 3.
  
  The cryo-EM structure of gMCM8/9. (A and B) Reconstructed cryo-EM map of gMCM8/9. The diameter of the inner channel of MCM8/9 was measured at ~28 Å. (C-F) Representative regions of the cryo-EM structure of gMCM8/9 NTD are shown based on their density map. C, chain A (MCM9); D, chain B (MCM8); E, chain C (MCM9); F, chain D (MCM8).
  
  Author response image 4.
  
  Representative regions of the cryo-EM structure of gMCM8/9 NTD. (A and B) The regions mediating the hydrophobic interaction in Figure 2B. A (MCM8), B (MCM9). (C and D) The regions mediating the hydrophobic interaction in Figure 2C. C (MCM8), D (MCM9). The key residues are in bold.
  
  2) Figures 4, 5, and 6: For helicase assay, more detailed experimental conditions (e.g. concentrations of DNA substrates and proteins used) should be presented. In addition, it should be described how Flag-hMCM8-9 complex (Figure 4C) was purified.
  
  We sincerely appreciate the constructive suggestion provided by the reviewer. In the revised manuscript, we have included more experimental details on the helicase assays, including the concentrations of DNA substrates and proteins. The following paragraph describes the updated experimental procedure and is also provided in the revised version of the manuscript.
  
  Helicase assays: To prepare the substrate, the oligonucleotide (5'-(dT)40-GTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGCC-3') containing a 40 nt region complementary to the M13mp18(+) strand and a 40 nt oligo-dT at the 5′ end was labeled at the 3′ terminus with [α-32P] dCTP (Perkin Elmer) and annealed to the single-stranded DNA M13mp18 (24). 0.1 nM (in molecules) DNA substrate was mixed with 5 µg recombinant MCM8/9 complex or its mutants, as indicated, in a 15 µl reaction volume in helicase buffer (25 mM HEPES, pH 7.5, 1 mM magnesium acetate, 25 mM sodium acetate, pH 5.2, 4 mM ATP, 0.1 mg/ml BSA, 1 mM DTT). 2.5 µg HROB was used as an activator. To avoid re-annealing, the reaction was supplemented with a 100-fold excess of unlabeled oligonucleotide. The reactions were then incubated at 37 °C for 60 min and stopped by adding 1 µl of stop buffer (0.4% SDS, 30 mM EDTA, and 6% glycerol) and 1 µl of proteinase K (20 mg/ml, Sigma), followed by another 10 min incubation at 37 °C. The products were separated by 15% polyacrylamide gel electrophoresis in 1× TBE buffer and analyzed with an Amersham Typhoon (Cytiva).
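The quantities quoted in the helicase assay description can be sanity-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that the 15 µl reaction volume is the volume in which the quoted amounts are diluted (before stop buffer and proteinase K are added). It converts the 0.1 nM substrate concentration into an absolute amount, expresses the protein masses as mass concentrations, and confirms that the oligonucleotide consists of a 40 nt oligo-dT tail plus a 40 nt M13mp18-complementary region.

```python
# Values taken from the helicase assay description; names are illustrative.
REACTION_VOLUME_UL = 15.0   # reaction volume in microlitres (assumed final volume)
SUBSTRATE_NM = 0.1          # DNA substrate concentration in nM (in molecules)
MCM89_UG = 5.0              # recombinant MCM8/9 complex per reaction, in micrograms
HROB_UG = 2.5               # HROB activator per reaction, in micrograms

OLIGO_TAIL = "T" * 40       # 5' oligo-dT tail, written (dT)40 in the text
OLIGO_COMPLEMENT = "GTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGCC"  # anneals to M13mp18


def substrate_fmol(conc_nm: float, volume_ul: float) -> float:
    """Amount of substrate in femtomoles (nM x microlitres = fmol)."""
    return conc_nm * volume_ul


def mass_conc_mg_per_ml(mass_ug: float, volume_ul: float) -> float:
    """Mass concentration in mg/ml (micrograms per microlitre is numerically equal)."""
    return mass_ug / volume_ul


if __name__ == "__main__":
    print(f"substrate per reaction: {substrate_fmol(SUBSTRATE_NM, REACTION_VOLUME_UL):.2f} fmol")
    print(f"MCM8/9: {mass_conc_mg_per_ml(MCM89_UG, REACTION_VOLUME_UL):.3f} mg/ml")
    print(f"HROB:   {mass_conc_mg_per_ml(HROB_UG, REACTION_VOLUME_UL):.3f} mg/ml")
    print(f"oligonucleotide length: {len(OLIGO_TAIL + OLIGO_COMPLEMENT)} nt")
```

Molar protein concentrations are deliberately not computed, because the molecular weights of the complexes are not stated in the response.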
  
  In addition, to describe the expression of Flag-hMCM8/9 complex in Figure 4C, we have included the Pull-Down Assay in the “Material and Methods” section. The description is as follows: The HEK293T cells transfected with Flag-hMCM8/9-FL or Flag-hMCM8/9-NTD were cultured overnight and washed twice with cold phosphate-buffered saline (PBS). Cell pellets were resuspended with lysis buffer (20 mM Tris, pH 7.5, 150 mM NaCl, 5 mM EDTA, 0.5% NP-40, 10% glycerol, protease inhibitor cocktail (Roche, 04693132001)). After incubation for 45 min at 4 °C with gentle agitation, the whole-cell lysates were collected by centrifugation (12,000 × g for 15 min, at 4 °C). GST beads coupled with 2 μg GST-HROB or GST alone were then incubated with an equal volume of the above HEK293T cell lysates at 4 °C for 4 h. The beads were washed four times with lysis buffer. Proteins bound to the beads were separated by SDS–PAGE and subsequently immunoblotted with anti-Flag antibody (Cytiva).
  
  3) Figure 3C: This is just an assumed model. Please clearly state it in the manuscript.
  
  We appreciate the reviewer’s comment. We assume the reviewer is referring to Figure 5C, as Figure 3C depicts the top view of the gMCM8/9 hexamer structurally aligned with the MCM2-7 double hexamer (wheat) by aligning their respective C-tier rings. Figure 5C, on the other hand, represents an assumed model in which we docked a forked DNA fragment into the central channel of the gMCM8/9 hexamer. To make clear that this is an assumed model, we have added the following clarification to the revised manuscript: “We artificially docked a forked DNA into the central channel to generate a gMCM8/9-DNA model and found that the OB-hps of gMCM8 are capable to closely contact with it and insert their highly positively charged terminal loops into the major or minor grooves of the DNA strand, implying that they could be involved in substrate DNA processing and/or unwinding (Figure 5C)”.
  
  4) Figure S1, C and D: The coloring of the gMCM8-9 CTD appears to show higher resolution than the NTD. Might this be a misrepresentation?
  
  We appreciate the reviewer's valuable feedback, and we have thoroughly re-evaluated Figure S1C and D. Initially, the local resolution distributions of the gMCM8/9 NTD and gMCM8/9 CTD were calculated using CryoSPARC. Upon re-examination, we found that the resolution of the gMCM8/9 CTD density map may be lower than 3.66 Å, because this map does not reveal more structural details than what is observed in the gMCM8/9 NTD. Thus, although the map shown in Figure S1D may appear to show a greater distribution of high-resolution regions, we would like to clarify that this discrepancy could be attributed to an optical illusion. We thank the reviewer for bringing this to our attention.
  
  5) Figure S9: Is the "mean resolution" 5.21 angstrom identical to the Gold standard FSC? If not, please estimate the resolution using FSC, like other maps in this paper.
  
  We thank the reviewer for the constructive suggestion. In response to this feedback, we would like to clarify the resolution estimation process for the gMCM8/9 CTD. Initially, we calculated the resolution of the gMCM8/9 CTD using the gold standard Fourier shell correlation (FSC) method, which yielded a resolution of 3.66 Å. However, upon further analysis, we identified an issue with the GSFSC Resolution curves, which led to an overestimation of the resolution based on the density map of the gMCM8/9 CTD. To ensure a more reliable and accurate estimation, we employed the Phenix software package to calculate the mean resolution during the refinement process of the gMCM8/9 CTD structure. The calculated mean resolution was determined to be 5.21 Å, which aligns more reasonably with the characteristics of the density map. To address any potential misunderstandings and provide clarity, we have explicitly labeled and described the evaluation process for this mean resolution in the "Single particle data processing" section of the Materials and Methods.
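For readers unfamiliar with the metric under discussion, the Fourier shell correlation is conventionally defined as shown below. This is the standard textbook definition, given here only as background; it is not a reconstruction of the authors' CryoSPARC or Phenix processing. The symbols are assumptions introduced for illustration: F_1 and F_2 are the Fourier transforms of two independently refined half-maps, and S_k is the shell of Fourier voxels at spatial frequency k.

```latex
\mathrm{FSC}(k) =
  \frac{\sum_{\mathbf{q} \in S_k} F_1(\mathbf{q})\, F_2^{*}(\mathbf{q})}
       {\sqrt{\sum_{\mathbf{q} \in S_k} \lvert F_1(\mathbf{q}) \rvert^{2}
              \;\sum_{\mathbf{q} \in S_k} \lvert F_2(\mathbf{q}) \rvert^{2}}}
```

Under the gold-standard convention, the reported resolution is 1/k at the spatial frequency where FSC(k) first falls below 0.143. A global FSC value of this kind can differ from a model-based or local estimate when map quality is uneven, which is the type of discrepancy the response describes between the 3.66 Å and 5.21 Å figures.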
  
  Minor points:
  
  1) Throughout the manuscript, there are several typographical and grammatical errors, which should be corrected. For example, in "Introduction", "GNIS complex" should be "GINS complex".
  
  We thank the reviewer for pointing out the typographical and grammatical errors. We have corrected the grammar errors and polished our manuscript with the help of native speakers.
  
  Reviewer #2 (Recommendations For The Authors):
  
  1) "During HR repair, MCM8/9 was rapidly recruited to the DNA damage sites and colocalized with the recombinase Rad51 (21). It also interacted with the nuclease complex MRN (MRE11-RAD50-NBS1) and was required for DNA resection at DSBs to facilitate the HR repair (Introduction)."
  
  There is a debate about whether MCM8/9-HROB colocalizes with RAD51 and whether it works upstream or downstream of RAD51 (Park et al. MCB, 2013; Lee et al. Nat Commun., 2015; Lutzmann et al. Mol Cell, 2012; Nishimura et al. Mol Cell, 2012; Natsume et al. G&D, 2017; Hustedt et al. G&D, 2019; Huang et al. Nat Commun., 2020).
  
  We completely agree with the reviewer that previous studies have reported contradictory results regarding the function of MCM8/9 in homologous recombination. Based on the structural information on MCM8/9, we currently have no direct evidence to resolve the ongoing debate. Nonetheless, based on our findings, we speculate that the MCM8/9 complex is likely involved in multiple steps within the process of homologous recombination. The structural insights provided by our study serve as a foundation for further investigations and may contribute to a better understanding of the complex and multifaceted roles of MCM8/9 in homologous recombination repair.
  
  2) I noted that the BioRxiv version 1 (https://www.biorxiv.org/content/10.1101/2022.01.26.477944v1?versioned=true) contains a near-complete MCM8/9 with human protein based on the crystal analysis. Because its structure is comparable to chicken MCM8/9 revealed by cryo-EM, I highly suggest including this data in the manuscript.
  
  We would like to thank the reviewer for this suggestion. The resolution of the hMCM8/9 crystal structure presented in our previous BioRxiv version is 6.6 Å, which is a little low. Moreover, it cannot provide more information than the present cryo-EM structures of MCM8/9. We are dedicated to optimizing the crystal quality and implementing strategies to enhance the resolution of the structure. We hope to present an improved crystal structure of hMCM8/9 in our forthcoming article.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.01.26.477944v3
www.biorxiv.org www.biorxiv.org

Genome-wide mapping of native co-localized G4s and R-loops in living cells

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife assessment
  
  This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.
  
  We thank the editors for their helpful comments.
  
  Given that antibody-based methods have been reported to leave open the possibility of recognizing partially folded G4s and promoting their folding, we have employed the peroxidase activity of the G4-hemin complex to develop a new method for capturing endogenous G4s that significantly reduces the risk of capturing partially folded G4s. We have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.
  
  In Fig. 7, we applied the Dhx9 CUT&Tag assay to identify the G4s and R-loops directly bound by Dhx9 and further characterized the differential Dhx9-bound G4s and R-loops in the absence of Dhx9. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Furthermore, we showed that depletion of Dhx9 significantly altered the levels of G4s or R-loops around the TSS or gene bodies of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, and also their RNA levels (Fig. 7I). We consider this evidence sufficient to support the view that Dhx9 regulates mESC fate transcriptionally by directly modulating the G4s or R-loops within the key regulators of mESCs.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.
  
  Strengths:
  
  HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.
  
  We appreciate your valuable points.
  
  Weaknesses:
  
  This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.
  
  In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. Dhx9 has been reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. We found that 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs and that 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. We showed that depletion of Dhx9 significantly altered the RNA levels of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, which coincides with the significantly differential levels of G4s or R-loops around the TSS or gene bodies of these genes (Fig. 7). The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.
  
  There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).
  
  The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity. To identify the specific signals, we included the non-label control and used it to call confident HepG4 peaks in all HepG4-seq assays.
  
  The hemin-RNA G4 complex has also been reported to have peroxidase-mimicking activity and to trigger self-biotinylation signals similar to those of DNA G4s (PMID: 32329781, 31257395, 27422869). Therefore, it is not surprising to observe hemin-dependent signals in the cytoplasm, generated by cytoplasmic RNA G4s.
  
  In the revised version, we have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.
  
  The authors talk about co-occurrence between G4 and R-loops but their data does not actually demonstrate co-occurrence in time. If the same loci could form alternatively either R-loops or G4 and if DHX9 was somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. To manipulate R-loop levels in vivo and test how this affects HEPG4-seq signals would have been helpful.
  
  Single-molecule fluorescence studies have shown the existence of a positive feedback mechanism of G4 and R-loop formation during transcription (PMID: 32810236, 32636376), suggesting that G4s and R-loops could co-localize on the same molecule. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Although depletion of Dhx9 resulted in 6,171 Dhx9-bound co-localized G4s and R-loops with significantly altered levels of G4s or R-loops, only 276 of them (~4.5%) harbored both altered G4s and altered R-loops, suggesting that interacting G4s and R-loops are rare in living cells. At present, the genome-wide co-occurrence of two factors is mainly obtained by bioinformatic intersection analysis. We agree with this point and will carefully discuss it in the revised version. At the same time, we will make efforts to develop a new method to map co-localized G4s and R-loops on the same molecule in a future study.
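The intersection analysis mentioned above can be illustrated with a minimal Python sketch. It assumes peaks are given as half-open `(chrom, start, end)` tuples and that any overlap of at least one base pair counts as co-localization; the function name, the `min_overlap` parameter, and the example coordinates are hypothetical and are not taken from the manuscript's pipeline. As the reviewer notes, such an overlap shows co-localization in the pooled data, not co-occurrence on the same molecule.

```python
from collections import defaultdict

def overlapping_peaks(peaks_a, peaks_b, min_overlap=1):
    """Return peaks from peaks_a that overlap any peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    min_overlap is the minimum shared length in bp (assumed default: 1).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in peaks_a:
        for b_start, b_end in by_chrom[chrom]:
            # Shared length of the two intervals (negative if disjoint)
            if min(end, b_end) - max(start, b_start) >= min_overlap:
                hits.append((chrom, start, end))
                break
    return hits

# Hypothetical coordinates for illustration only
g4_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rloop_peaks = [("chr1", 150, 300), ("chr2", 400, 500)]
colocalized = overlapping_peaks(g4_peaks, rloop_peaks)
```

The nested loop is quadratic per chromosome, which is fine for a sketch; a sorted sweep or an interval tree would be the usual choice for genome-scale peak sets.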
  
  This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.
  
  In this study, we used the recombinant streptavidin monomer and anti-GP41 nanobody fusion protein (mSA-scFv) to specifically recognize hemin-G4-induced biotinylated G4s and then recruit the recombinant GP41-tagged Tn5 protein to these G4 sites. Similarly, the recombinant V5-tagged N-terminal hybrid-binding domain (HBD) of RNase H1 specifically recognizes R-loops and recruits the recombinant protein G-Tn5 (pG-Tn5) with the help of an anti-V5 antibody. Therefore, the spatial distance of Tn5 to the target sites is well controlled and very short, and the recruitment of Tn5 is specifically determined by the presence of G4s in HepG4-seq and of R-loops in HBD-seq. In addition, RNase treatment markedly abolished the HBD-seq signals, and the non-label controls exhibit an obvious reduction of HepG4-seq signals, demonstrating that the HBD-seq and HepG4-seq signals do not arise from tagmentation of accessible DNA.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.
  
  The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.
  
  Strengths:
  
  (1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.
  
  (2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.
  
  (3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.
  
  We appreciate your valuable points.
  
  Weaknesses:
  
  (1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin.
  
  The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity.
  
  (2) Other methods exploring a catalytic dead RNAseH or the HBD to pull down R-loops have been described before. The superior quality of the presented methods in comparison to existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-CHIP, etc) should be provided.
  
  Thank you for the suggestions. We have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.
  
  (3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.
  
  Dhx9 has been demonstrated in previous studies to be a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). We believe that the current new dataset and previous studies are sufficient to support the capability of Dhx9 to regulate co-localized G4s and R-loops.
  
  (4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.
  
  Thank you for the suggestions. We have included the discussion in the revised version.
  
  (5) The manuscript lacks appropriate statistical analyses to support the major conclusions.
  
  We apologize for this. Although we applied careful statistical analyses in this study, the lack of some statistical details made certain conclusions hard to follow. We have now added details of all statistical analyses.
  
  (6) The discussion could be expanded to address potential limitations and alternative explanations for the results.
  
  Thank you for the suggestions. We have included the discussion about this point in the revised version.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.
  
  Strengths:
  
  By utilizing the peroxidase activity of G4-hemin complex and combining proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing fusion protein and tag, the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.
  
  The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.
  
  Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.
  
  Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.
  
  In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.
  
  We appreciate your valuable points.
  
  Weaknesses:
  
  As we know, there are at least two structure data of S9.6 antibody very recently, and the questions about the specificity of the S9.6 antibody on RNA:DNA hybrids should be finished. The authors referred to (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against S9.6 antibodies needs also to be changed. However, as the authors had questioned the specificity of the S9.6 antibody, they should compare it in parallel with the data they have and the data generated by the widely used S9.6 antibody.
  
  Thank you for the updated information on the structural data for the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show the superior specificity of S9.6 for RNA:DNA hybrids over dsRNA and dsDNA. However, Fabian K. et al. reported that the binding affinity of S9.6 for RNA:DNA hybrids exhibits an obvious sequence-dependent bias, ranging from null to the nanomolar range (PMID: 28594954). We have included the comparison between S9.6-derived data and our HBD-seq data in Fig. 9 and the section “Comparisons of HepG4-seq and HBD-seq with previous methods”.
  
  Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.
  
  Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activity than anti-parallel G4s. Thus, the dynamics of G4 conformation could affect the HepG4-seq signals (PMID: 32329781). In the future, it may be necessary to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We have discussed this point in the revised version.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Figures 1A&1G. Although no merge images were provided, it seems that the biotin signals are strongly enriched outside the nucleus. This suggests that hemin is not specific for G4s in DNA. Does it mean that Hemin can also recognise G4 on RNAs? How do the authors understand the cytoplasmic signal?
  
  Hemin can indeed interact with RNA G4s to gain peroxidase activity, like the DNA G4-hemin complex (PMID: 27422869, 32329781, 31257395). The cytoplasmic signals in Figures 1A and 1G were derived from RNA G4s.
  
  Figure 1A: The fact that there is no Alexa647 signal without hemin or Bio-An does not actually demonstrate that the signals are specific. These controls do not actually test for the specificity of the G4-Hemin interaction.
  
  The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In this study, we performed immunofluorescence (IF) to confirm this phenomenon.
  
  Figure 1C: It looks like the HepG4-seq signals are simply an amplification of the noise given by the Tn5 (the non-label ctrl has the same pattern, albeit weaker). It is unclear why this happens but it might happen if somehow hemin increased the probability that the Tn5 is close to chromatin in an unspecific manner (it would cut G-rich, nucleosome-poor, accessible sites in an unspecific manner). To discard this possibility, it would be interesting to investigate directly which loci are biotinylated. For this, the authors could extract and sonicate the genomic DNA and use streptavidin to enrich for biotinylated fragments. Strand-specific DNA sequencing could then be used to map the biotinylated loci.
  
  The cell culture medium contains a certain amount of hemin from serum and a low dose of biotin from the basal medium DMEM, which cannot be avoided. This contaminating hemin and biotin generate the background signals observed in the non-label control samples. The biotinylated sites were specifically recognized by the recombinant streptavidin monomer, which further recruits Tn5 to the biotinylated sites with the help of the Moon-tag. In contrast to the HEK293 samples, much more robust HepG4-seq signals were observed in the mESC samples, and these signals were also abolished in the non-label control samples. Thus, the relatively low signal-to-noise ratio in the HEK293 samples suggests a low abundance of endogenous G4s in HEK293 cells. We therefore respectfully disagree that hemin increased the non-specific recruitment of Tn5. In addition, CUT&Tag technology has been widely demonstrated to have a much lower background, a higher signal-to-noise ratio, and higher sensitivity. Thus, we also respectfully disagree with replacing CUT&Tag with the traditional DNA library preparation method.
  
  Figure 1H: No spike-in was added and the data are not quantitative. The number of replicates is unclear. 70000 extra peaks (10x) after inhibition of BLM or WRN seems enormous. These extra peaks should be better characterised: do they contain G4 motifs? Are they transcribed? etc...; again what kind of controls should be used here, in case the inhibition of BLM and WRN has a global impact on chromatin accessibility?
  
  To compare different samples quantitatively, we normalized all samples according to their numbers of de-duplicated, uniquely mapped reads. Given that the inhibitors were dissolved in DMSO, we used DMSO as the control. Since Tn5 was specifically recruited to the biotinylated G4 sites through the recombinant streptavidin monomer protein and the Moon-tag system, chromatin accessibility will not affect Tn5 targeting as it normally does in ATAC-seq.
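The read-depth normalization described above can be sketched as follows. This is a minimal Python illustration that assumes a counts-per-million style scaling; the response does not state the exact scaling constant, so `per`, the function name, and the example counts are assumptions for illustration only.

```python
def normalize_counts(peak_counts, dedup_unique_reads, per=1_000_000):
    """Scale raw per-peak read counts by library size.

    dedup_unique_reads: number of de-duplicated, uniquely mapped reads
    per: scaling constant (assumed here to be one million reads)
    """
    if dedup_unique_reads <= 0:
        raise ValueError("library size must be positive")
    return [count * per / dedup_unique_reads for count in peak_counts]

# Hypothetical raw counts at the same three peaks in two libraries
dmso = normalize_counts([120, 30, 0], dedup_unique_reads=20_000_000)
inhibitor = normalize_counts([300, 45, 15], dedup_unique_reads=30_000_000)
```

After scaling, the second peak has the same normalized signal in both libraries even though its raw count differs, which is the effect library-size normalization is meant to achieve. Unlike a spike-in, this scaling cannot detect a global shift in signal between conditions.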
  
  As suggested, we analyzed the enriched motifs of the extra peaks induced by BLM or WRN inhibition and show in Supplementary Fig. 1E that the top enriched motifs are also G-rich. In addition, we analyzed the RNA-seq levels of the genes associated with these extra peaks. As shown in the figure below, the majority of these genes are actively transcribed.
  
  Author response image 1.
  
  Figure 2: The mutated version of HBD should have been used as a control. As shown clearly in PMID: 37819055, the HBD domain does interact in an unspecific manner with chromatin at low levels. As above, this might be enough to increase the local concentration of the Tn5 close to chromatin in the Cut&Tag approach and to cleave accessible sites close to TSS in an unspecific manner.
  
  As shown in Fig. 2B and Fig. 4A, we included the RNase treatment as the control and showed that the R-loop signals identified by HBD-seq are dramatically attenuated (Fig. 2B) or almost completely abolished (Fig. 4A) after RNase treatment. These data demonstrate the specificity of HBD-seq.
  
  Figure 2: What fraction of the HEPG4-seq signal is sensitive to RNase treatment? The authors used a combination of RNase A and RNase H but previous data have shown that the RNase A treatment is sufficient to remove the HBD-seq signal (which means that it is not actually possible on this sole basis to claim or disclaim that the signals do correspond to genuine R-loops). Do the authors have evidence that the RNase H treatment alone does impact their HBD-seq or HEPG4-seq signals?
  
  As shown in Fig. 2B and Fig. 4A, the R-loop signals identified by HBD-seq are all dramatically attenuated (Fig. 2B) or almost completely abolished (Fig. 4A) after RNase treatment. The specificity of HBD in recognizing R-loops has been carefully demonstrated in a previous study (PMID: 33597247). In this study, we used the same two copies of HBD (2xHBD) and replaced the GST tag with EGFP-V5 to reduce the possibility of variable high-molecular-weight aggregates caused by the GST tag. In addition, RNase H treatment has been shown to fail to completely abolish CUT&Tag signals, since a subset of DNA-RNA hybrids with high GC skew are partially resistant to RNase H (PMID: 32544226, 33597247). In consideration of the high GC skew of co-localized G4s and R-loops, we combined RNase A and RNase H. We currently do not have samples treated with RNase H alone.
  
  Figure 3A: "RNA-seq analysis revealed that the RNA levels of co-localized G4s and R-loops-associated genes are significantly higher": the differences are not very convincing.
  
  For Figure 3A, we performed the Mann-Whitney test to examine significance in the revised manuscript. The RNA levels of co-localized G4s and R-loops-associated genes are indeed significantly higher than those of all genes, G4s-associated genes, or R-loops-associated genes (Mann-Whitney test, p < 2.2E-16).
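For readers unfamiliar with the test named above, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It assumes the normal approximation with tie correction and no continuity correction, so p-values can differ slightly from those of statistical packages; the response does not say which software was used, and the expression values below are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average ranks to tied values and accumulate the tie term
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value

# Hypothetical log-expression values for two gene sets
colocalized_genes = [5.1, 6.3, 7.2, 8.0, 9.4]
all_genes = [1.2, 2.5, 3.1, 4.0, 4.8]
u_stat, p_val = mann_whitney_u(colocalized_genes, all_genes)
```

The test compares ranks rather than means, which suits skewed expression distributions. With gene sets of thousands of members, even a modest shift can produce a very small p-value, so an effect size is worth reporting alongside it.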
  
  Figure 3B: the patterns for "G4" and "co-localised G4 and R-loop" are extremely similar, suggesting that nearly all G4s mapped here could also form R-loops. If this is the case, most of the HEPG4-seq signals should be sensitive to exogenous RNase H treatment or to the in vivo over-expression of RNase H1. This should be tested (see above).
  
  The percentage of G4 peaks that are co-localized with R-loops is 80.3% (5,459 out of 6,799) in HEK293 cells and 72.0% (68,482 out of 95,128) in mESCs. Co-localization does not mean that the G4 and the R-loop interact with each other. We have shown that only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that the majority of co-localized G4s and R-loops do not interact with each other. Thus, we think that it is not necessary to perform the RNase H test.
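The fractions quoted above follow directly from the peak counts. A short Python check, using only the counts stated in this response (the helper name `percent` is illustrative):

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

hek293_fraction = percent(5459, 6799)    # co-localized G4 peaks / all G4 peaks, HEK293
mesc_fraction = percent(68482, 95128)    # co-localized G4 peaks / all G4 peaks, mESC
both_altered = percent(276, 6171)        # sites with both G4 and R-loop altered in dhx9KO
```

These evaluate to 80.3, 72.0 and 4.5, matching the percentages given in the text.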
  
  Figure 3C: there is no correlation between the FC of G4 and the FC of RNA; this is not really consistent with the idea that the stabilisation of G4 is the driver rather than a consequence of the transcriptional changes.
  
  Given that WRN or BLM inhibition induced a large amount of G4 accumulation (Fig. 1H-I), we examined the transcriptional effect on genes associated with these accumulated G4s in Fig. 3C. We indeed observed an effect of G4 accumulation on the transcription of G4-associated genes. That G4 stabilization triggers transcriptional changes does not mean that those changes should be highly correlated with the increase in G4 levels. To our knowledge, this type of correlation has not been observed in previous studies.
  
  l279: the overlap with H3K4me1 is really not convincing.
  
  For all G4 peaks, the H3K4me1 signals indeed exhibit a high background around the center of the G4 peaks, but we can still observe a clear peak at the center.
  
  Figure 5C: it should be clearly indicated here that the authors compare Cut&Tag and ChIP data. The origin of the ChIP-seq data is also unclear and should be indicated.
  
  Thank you for the suggestions. We have clarified this point.
  
  For the ChIP data, we have described the origin of ChIP-seq data in the “Data availability” section as below: “The ChIP-seq data of histone markers and RNAP are openly available in GNomEx database (accession number 44R) (Wamstad et al., 2012).”
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Figure 1A. An experimental condition lacking H2O2 (-H2O2) should be included.
  
  We have added this control in Fig. 1A.
  
  (2) Does RNAse H affect G4 profiles?
  
  We have not tested the effect of RNase H on G4 formation. However, we have shown that only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that the majority of co-localized G4s and R-loops do not interact with each other. Thus, we think that it is not necessary to perform the RNase H test on G4s. In addition, to treat cells with RNase H, we would have to permeabilize the cells first to let RNase H enter the nuclei. In doing so, we would lose the physiological picture of endogenous G4s.
  
  (3) Figure 2G. R-loops are detected upstream of the KPNB1 gene. What is this region? Is it transcribed?
  
  We apologize for the mistake made in preparing this figure. We have replaced it with the correct one in Fig. 2G. The R-loop is located around the TSS of KPNB1. We also show the RNA-seq data for this region in Author response image 2 below. This region is indeed transcribed.
  
  Author response image 2.
  
  (4) Did BLM and WRN inhibition specifically affect the expression of genes containing colocalized G4s and R-loops? Was the effect seen in other genes as well? Appropriate statistical analyses are needed.
  
  In Fig. 3, we have shown that the accumulation of co-localized G4s and R-loops induced by the inhibition of BLM or WRN caused significant expression changes in genes containing these structures (480 for BLM inhibition, 566 for WRN inhibition), most of which are localized at promoter-TSS regions. We indeed detected the effect in other genes as well. In total, 918 and 1,020 genes changed significantly (padjust < 0.05 and FC >= 2 or FC <= 0.5) upon BLM and WRN inhibition, respectively.
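The significance thresholds stated above can be written as a small filter. This Python sketch assumes that the adjusted p-value cutoff applies to both the up- and down-regulated branches (the most natural reading of the stated criteria) and that fold change is on a linear, not log2, scale. The function name and the example rows are hypothetical.

```python
def is_differential(padj, fold_change, alpha=0.05, fc_cutoff=2.0):
    """Apply the stated thresholds: padj < alpha and (FC >= cutoff or FC <= 1/cutoff)."""
    if padj is None:  # DESeq2 can report NA for padj; treated here as not significant
        return False
    return padj < alpha and (
        fold_change >= fc_cutoff or fold_change <= 1.0 / fc_cutoff
    )

# Hypothetical rows: (gene label, adjusted p-value, fold change vs. DMSO)
rows = [
    ("gene_a", 0.001, 3.2),   # up-regulated, significant
    ("gene_b", 0.030, 0.4),   # down-regulated, significant
    ("gene_c", 0.200, 5.0),   # large change, fails padj cutoff
    ("gene_d", 0.010, 1.3),   # significant padj, change too small
    ("gene_e", None, 4.0),    # padj not available
]
significant = [name for name, padj, fc in rows if is_differential(padj, fc)]
```

Only `gene_a` and `gene_b` pass both criteria in this toy table.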
  
  (5) The claim that "The co-localized G4s and R-loops-mediated transcriptional regulation in HEK293 cells" (title of Figure 3) is not supported by the presented data. A causality link is not established in this study, which only reports correlations between G4s/R-loops and transcription regulation.
  
  We respectfully disagree with this point. BLM and WRN are the best-characterized DNA G4-resolving helicases (Fry and Loeb, 1999; Mendoza et al., 2016; Mohaghegh et al., 2001). Here, we used selective small molecules to specifically inhibit their ATPase activity and observed a dramatic induction of G4 accumulation. Notably, the accumulated G4s that trigger the transcriptional changes are mainly located at the promoter-TSS region. If the transcriptional changes triggered the G4 accumulation, we should not observe such a biased distribution, and more accumulated G4s should be detected in the gene body.
  
  (6) The effect of Dhx9 KO on colocalized G4s/R-loops and transcription is not clear. The suggestion that Dhx9 could regulate transcription by modulating G4s, R-loops, and co-localized G4s and R-loops is not supported by the presented data. Additional experiments and statistical analyses are needed to conclude the role of Dhx9 on colocalized G4s/Rloops and transcription.
  
  Dhx9 has been extensively studied and reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Thus, it is not necessary to repeat these assays. To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. In wild-type mESCs, 47,857 co-localized G4s and R-loops are directly bound by Dhx9, and 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. These data clearly show the role of Dhx9 in directly modulating the stability of G4s and R-loops. Furthermore, we showed that loss of Dhx9 caused significant differential expression of 816 genes associated with co-localized G4s and R-loops directly bound by Dhx9, supporting transcriptional regulation by Dhx9. We performed the differential analysis following the standard pipelines: DESeq2 for RNA-seq and DiffBind for HepG4-seq and HBD-seq. The statistical details are described in the figure legends.
  
  (7) The conclusion that Dhx9 regulates the self-renewal and differentiation capacities of mESCs is vague. Additional experiments are needed to elucidate the exact contribution of Dhx9.
  
  In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. We have shown that depletion of Dhx9 significantly attenuated the proliferation of mESCs and also influenced their capacity to differentiate into the three germ layer lineages in the EB assay. In addition, we showed that depletion of Dhx9 significantly reduced the protein levels of the mESC pluripotency markers Nanog and Lin28a. The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The study on the involvement of native co-localized G4s and R-loops in transcriptional regulation further enriches the readers' understanding of genomic regulatory networks, and the functional dissection of Dhx9 also lays a good foundation for the study of the dynamic regulatory mechanisms of co-localized G4s and R-loops. Unfortunately, however, the authors lack a strong basis for questioning the widely used BG4 and S9.6 antibodies, and the co-localized G4s and R-loops sequencing data obtained by the developed and optimized method also lack parallel comparison with existing sequencing technologies, which cannot indicate that HepG4-seq and HBD-seq are more reliable and superior than BG4 and S9.6 antibody-based sequencing technologies. There are also some minor errors in the manuscript that need to be corrected.
  
  Thank you for the constructive comments. We have added a new section (“Comparisons of HepG4-seq and HBD-seq with previous methods”) and a new Fig. 9 to compare our methods in parallel with other widely used methods.
  
  (1) This work mainly focuses on co-localized G4s and R-loops, but in the introduction section, the interplay between G4s and R-loops is only briefly mentioned. It is suggested that the importance of the interplay of G4s and R-loops for gene regulation should be further expanded to help readers better understand the significance of studying co-localized G4s and R-loops.
  
  Thank you for the comments. Current studies on the interplay between G4s and R-loops are limited. We have summarized all that we could find in the literature.
  
  (2) The authors mentioned that "a steady state equilibrium is generally set at low levels in living cells under physiological conditions (Miglietta et al., 2020) and thus the addition of high-affinity antibodies may pull the equilibrium towards folded states", in my understanding this is one of the important reasons why the authors optimized the G4s and R-loops detection assays, I wonder if there is a reliable basis for this statement. If there is, I suggest that the authors can supplement it in the manuscript.
  
  The main reason we developed the new method was to obtain an antibody-free way to label endogenous G4s in living cells. We previously tried to capture endogenous G4s using tet-on-controlled BG4. Unfortunately, we found that even a short induction of BG4 in living cells was toxic. The traditional antibody-based methods rely on permeabilizing cells first to let the antibodies enter the nuclei. In this case, it is easy to lose the physiological picture of endogenous G4s. We will add more discussion about this point. For R-loops, we further optimized the GST-2xHBD-mediated method to avoid the problems of the GST tag. GST fusion proteins are prone to forming variable high-molecular-weight aggregates, and these aggregates often undermine the reliability of the fusion proteins.
  
  (3) Some questions about HepG4-seq:
  
  Is there a difference in hemin affinity for intramolecular G quadruplexes, interstrand G quadruplexes, and their different topologies? If so, does this bias affect the accuracy of sequencing results based on G4-hemin complexes?
  
  Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activity than anti-parallel G4s (PMID: 32329781). Thus, the dynamics of G4 conformation may affect the HepG4-seq signals. In the future, it may be necessary to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We have discussed this point in the revised version.
  
  HepG4-seq is based on proximity labeling and peroxidase activity of the G4-hemin complex. The authors tested and confirmed that the addition of hemin and Bio-An in the experiment had no significant influences on sequencing results, but the effect of exogenous H2O2 treatment may also need to be taken into account since ROS can mediate the formation of G4s.
  
  In the HepG4-seq protocol, we treat cells with H2O2 for only one minute. Thus, we expect any side effects of the H2O2 treatment to be limited over such a short time.
  
  (4) As we know, at least two structures of the S9.6 antibody have been reported very recently, and the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references the authors cite (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against S9.6 antibodies also needs to change. However, as the authors have questioned the specificity of the S9.6 antibody, they should compare their data in parallel with data generated by the widely used S9.6 antibody.
  
  Thank you for the updated information about the structural data on the S9.6 antibody. We respectfully disagree about the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show that S9.6 is more specific for RNA:DNA hybrids than for dsRNA and dsDNA. However, König et al. have reported that the binding affinities of S9.6 for RNA:DNA hybrids exhibit an obvious sequence-dependent bias, ranging from no binding to the nanomolar range (PMID: 28594954). We have included the comparison between S9.6-derived data and our HBD-seq data in Fig. 9 and in the section “Comparisons of HepG4-seq and HBD-seq with previous methods”.
  
  (5) It is hoped that the results of immunofluorescence experiments can be statistically analyzed.
  
  We have performed the statistical analysis and included the data in the new figure.
  
  (6) Some minor errors:
  
  Line 168, "G4-froming" should be "G4-forming";
  
  Figure 5E, the color of the "Repressed" average signal at the top of the HepG4-seq heatmap should be blue;
  
  Figure 7C, the abbreviation "Gloop" should be indicated in the text or in the figure caption.
  
  Thank you for pointing out these issues. We are sorry for these mistakes. We have corrected them in the revised version.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.03.597194v2
www.biorxiv.org www.biorxiv.org

Molecular determinants of Neu5Ac binding to tripartite ATP independent periplasmic (TRAP) transporter

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.
  
  Strengths:
  
  The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.
  
  Weaknesses:
  
  The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.
  
  The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.
  
  The structure of Fusobacterium nucleatum (Fn) that we have reported shows a significant sequence identity with the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind to the Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.
  
  The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.
  
  However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.
  
  Author response table 1.
  
  Reviewer #2 (Public Review):
  
  In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.
  
  The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.
  
  The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.
  
  Please see the comments above.
  
  Reviewer #3 (Public Review):
  
  The manuscript by Goyal et al reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.
  
  The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.
  
  We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.
  
  Reviewing Editor (Recommendations for the Authors):
  
  After discussing the reviews, the reviewers and reviewing editor have agreed on a list of the most important suggested revisions for the authors, which, if satisfactorily addressed, would improve the assessment of the work. These suggested revisions are listed below. We also include the full Recommendations For The Authors from each of the individual reviewers.
  
  (1) The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. Additional mutagenesis and activity experiments to test the contribution of this site to transport would strengthen the manuscript. Measuring Na+ concentration-response relations and calculating Hill slopes in WT vs. an M site mutant would be a good experiment. Given the lack of functional data and poor density, it does not seem appropriate to build the M site sodium in the PDB model.
  
  The density is well enough defined to suggest a bound metal (waters would not be clearly defined at this resolution). While our modeling of the site as a Na+ is arbitrary, this was done to satisfy the refinement programs, which require a known scatterer to be modeled. We could model this density with other metals, but unlike crystallographic refinement, real-space refinement of cryo-EM maps does not produce a difference map, which might have allowed us to identify the metal, although not conclusively. The density of the maps is good (we have added better figures to demonstrate this). We tried making multiple mutations to test for activity. Unfortunately, we are still struggling to express proteins with mutations in this site in sufficient quantities to carry out transport assays.
  
  Since we were unable to do these experiments, we ran MD simulations (carried out by Senwei Quan and Jane Allison at the University of Auckland). Our results are shown below. Without further studies, we are not certain that these should be included in the current paper (we will add them as authors if the editor feels that this evidence is critical).
  
  Author response table 2.
  
  We are showing this for review to indicate that K+, Ca2+, and Na+ were tried, and that only Na+ stays stably in the binding pocket. The rest of the results would also have to be explained, which would change the focus of the paper.
  
  We also provided the sequence to AlphaFold3 and asked it to identify the possible metal binding sites. When the input was Na+, it found all three binding sites.
  
  Summary: Both our experimental data and computational studies suggest that the observed metal binding site is real, but at the moment it is not possible to refine the structure with an unidentified metal modeled at this site. Computational studies suggest that this is a high-probability Na+ site.
  
  Demonstrating cooperativity between the Na+ sites and transport requires carrying out these experiments with mutations at these sites in a concentration-dependent manner. Unfortunately, we were unable to produce well-expressed, purified proteins carrying these mutations within a short time frame.
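
  As an illustration of the concentration-response analysis requested in point (1), the following minimal Python sketch estimates a Hill slope from transport rates measured at several Na+ concentrations. The function names, the assumption that Vmax is already known, and all numerical values are hypothetical; none of them are taken from the study.

```python
import math


def hill(conc, vmax, k_half, n):
    """Hill equation: transport rate at a given Na+ concentration."""
    return vmax * conc**n / (k_half**n + conc**n)


def hill_slope(concs, rates, vmax):
    """Estimate the Hill coefficient as the least-squares slope of
    log10(v / (vmax - v)) plotted against log10([Na+])."""
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(v / (vmax - v)) for v in rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical, noise-free illustration values (not data from the study).
na_mM = [1, 3, 10, 30, 100]
wt_rates = [hill(c, vmax=1.0, k_half=10.0, n=2.0) for c in na_mM]
mutant_rates = [hill(c, vmax=1.0, k_half=10.0, n=1.0) for c in na_mM]

wt_slope = hill_slope(na_mM, wt_rates, vmax=1.0)
mutant_slope = hill_slope(na_mM, mutant_rates, vmax=1.0)
print(f"WT Hill slope: {wt_slope:.2f}, mutant Hill slope: {mutant_slope:.2f}")
```

  A slope near 1 would indicate no apparent cooperativity, whereas a larger slope would be consistent with more than one Na+ acting cooperatively. With real, noisy measurements, a nonlinear fit of the Hill equation would normally be preferred over this linearization.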
  
  (2) The authors identified the Neu5Ac binding site but only tested 2 residues for their involvement in substrate interactions, which was very limited. Given that the major highlight of this paper is the identification of the Neu5Ac binding site, it would strengthen the manuscript if the authors provided a more extensive series of mutagenesis experiments - testing at least the effect of D521A would be important. One inconsistency is Ser345 mutagenesis not affecting transport, and the authors should further discuss in the text why they think that is.
  
  D521A has been tested in H. influenzae, and this mutation results in loss of transport. This residue is highly conserved and occupies the same position. We expect the result to remain the same.
  
  We have added a few extra lines to discuss Serine 345: “Ser 345 OG is 3.5Å away from the C1-carboxylate oxygen – a distance that would result in a weak interaction between the two groups. It is, therefore, not surprising that the mutation into Ala did not affect transport. The space created by the mutation can be occupied by a water molecule.”
  
  (3) The purification and assessment of the stability of the protein are described in text alone with no accompanying data. It would be beneficial to include these data (e.g. in the Supplementary info) as it allows the reader to evaluate the protein quality.
  
  This is now added as Supplementary Figure 2.
  
  (4) The structural figures throughout the paper could benefit from more clarity to better support the conclusions. Specific critiques are listed below:
  
  - Figure 1: since the unbound map has a similar reported resolution, displaying the unbound structure's substrate binding site with the same contour would clearly demonstrate that the appearance of this density is substrate-dependent.
  
  - Figure 1: the atomic fit of the ligand to the density, and the suggested coordination by side chain and backbone residues, would be useful in this figure.
  
  - Figure 1: I think it would be more intuitive to compare apo and bound structures with the same local resolution scale.
  
  We have remade Figure 1 “Architecture of FnSiaQM with nanobody. (A and B) Cryo-EM maps of FnSiaQM unliganded and sialic acid bound at 3.2 and 3.17 Å, respectively. The TM domain of FnSiaQM is colored using the rainbow model (N-terminus in blue and C-terminus in red). The nanobody density is colored purely in red. The density for modeled lipids is colored in tan and the unmodelled density in gray. The figures were made with Chimera at thresholds of 1.2 and 1.3 for the unliganded and sialic acid-bound maps. (C and D) The cytoplasmic view of apo and sialic acid bound FnSiaQM, respectively. Color coding is the same as in panels A and B. The density corresponding to sialic acid and sodium ions is in purple. The substrate binding sites of apo and sialic acid bound FnSiaQM are shown with key residues labeled. The density (blue mesh) around these atoms was made in PyMOL at 2 and 1.5 σ for the apo and the sialic acid-bound maps, respectively, with a carve radius of 2 Å.”
  
  The local resolution maps have been moved to Supplementary Figure 3.
  
  - Figure 3, Figure 5a: The mesh structures throughout the manuscript are blocky and very difficult to look at and interpret, especially for the ion binding sites, which are currently suggestive of but not definitively ion densities. Either using transparent surfaces, higher triangle counts, or smoothing the surface might help this.
  
  We have made Figure 3 again with higher triangle counts. We tried all three suggestions and this provided the best figure. We have replaced Figure 5A with density for Neu5Ac and residues around it.
  
  - Figure 5A: It would be important to show the densities of the entire binding pocket, especially coordinating side chains, to show the reader what is and isn't demonstrated by this structure.
  
  - It's not clear how Figure 5D is supposed to show that the cavity can accommodate Neu5Gc, as suggested by the text - please make the discussed cavity clearer in the Figure.
  
  We have now marked with an arrow the Methyl Carbon where the hydroxyl group is added. We have mentioned that in the legend. It is open to the periplasmic side of the cavity.
  
  - Supplementary Figure 4: Please label coordinating residue sites.
  
  Labels have been added to Supplementary Figure 6 which was earlier Supplementary Figure 4.
  
  (5) Intro section: the authors should introduce the work on HiSiaP around the role of the R147 residue in high-affinity Neu5Ac binding, which coordinates the carboxylate of Neu5Ac, and which is a generally conserved mechanism for organic acid binding in other TRAP transporters. This context will help magnify their discovery later that in the membrane domains, it is a key serine and not an arginine that coordinates the carboxylate group (probably as the local concentration of Neu5Ac is high and tight binding site is not desirable for rapid transport, which is mentioned in the discussion).
  
  Thank you for pointing this out. We have added a new sentence to the introduction.
  
  “All the SiaP structures show the presence of a conserved Arginine that binds to the C1-carboxylate of Neu5Ac, and this Arg residue is critical as the high electrostatic affinity may be important to have a strong binding affinity that sequesters the small amounts that reach the bacterial periplasmic space (Glaenzer et al., 2017).”
  
  (6) TRAP transporters exist for many organic compounds and not just sialic acid, which might be nice to make the reader aware of.
  
  We initially did not do this as this is an advance paper and this was discussed in the earlier paper (Currie et al., 2024). However, we have now added a sentence to the introduction. “Additionally, amino acids, C4-dicarboxylates, aromatic substrates and alpha-keto acids are also transported by TRAP transporters (Vetting et al., 2015).”
  
  (7) On p. 12, the authors describe the Neu5Ac binding site as a large solvent-exposed vestibule, having previously described the substrate-bound state as occluded. These descriptions should be adjusted to make clear which structure is being referenced. The clarity of this would be substantially improved if the authors included a figure that showed this occlusion - currently none of the structure figures clearly demonstrate what the authors are referring to. There are several conspicuous unmodeled densities proximal to the substrate, reminiscent of lipids (in between transport and scaffold domain) and possibly waters/ions. Given this, it is really surprising that the substrate binding site is described as "solvent-exposed" since the larger molecules seem to occlude the pocket. The authors should further process their dataset and discuss the implications of these surrounding densities.
  
  We have processed the data sets carefully with both CryoSPARC and RELION, and the resolution described here is the same with both software packages, with the CryoSPARC maps being slightly better in terms of interpretability of the peripheral helices; these are the maps described in the manuscript. The current sample (FnTRAP) with the nanobody is a relatively stable sample (in our experience with other similar proteins), as is evident from the number of images and particles needed to achieve a decent resolution, and thus the workflow is straightforward and simple. There are a number of non-protein densities, which in principle could be modelled, but we have taken the conservative approach of not modelling these extra densities (except for the two lipids and a few ions) because of the limited resolution. It is possible that increasing the number of particles would increase the resolution, but given the estimated B-factors (125 and 135 Å² for the unliganded and liganded maps, respectively), this would certainly require many more images, with no guarantee of increased resolution.
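
  The dependence of particle number on B-factor referred to above can be illustrated with the commonly used Rosenthal-Henderson relation, in which ln(N) grows linearly with 1/d² with slope B/2. The sketch below is a rough estimate under that assumption only; the function name and the 2.8 Å target resolution are hypothetical and chosen purely for illustration.

```python
import math


def particle_fold_increase(b_factor, d_current, d_target):
    """Estimated factor by which the particle count must grow to move from
    d_current to d_target (both in angstroms), assuming
    ln(N) = (B / 2) * (1 / d**2) + constant."""
    return math.exp((b_factor / 2.0) * (1.0 / d_target**2 - 1.0 / d_current**2))


# B-factors and current resolutions as quoted in the response;
# the 2.8 A target is a hypothetical choice.
for label, b_factor, d_current in [("unliganded", 125.0, 3.2),
                                   ("liganded", 135.0, 3.17)]:
    fold = particle_fold_increase(b_factor, d_current, d_target=2.8)
    print(f"{label}: roughly {fold:.1f}x more particles for 2.8 A")
```

  Under this model the required particle count rises exponentially as the target resolution improves, and the estimate ignores any limit set by sample heterogeneity or detector performance.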
  
  The question of outward open vs. outward occluded is a valid point. We have now modified this in the manuscript. “The Neu5Ac binding site has a large solvent-exposed vestibule towards the cytoplasmic side, while its periplasmic side is sealed off. The cryo-EM map shows the presence of multiple densities that could be modeled as lipids, possibly preventing the substrate from leaving the transporter. However, the densities are not defined well enough to model them as specific lipids; hence, they have not been modeled. We describe this as the “inward-facing open state” with the substrate bound.”
  
  (8) On p.15, the activity of FnSiaPQM in liposomes is reported, although the impetus for this study is not clear. Presumably, the reason for its inclusion is to ensure that the structurally characterized protein is active. It would be useful to say this at the start of the section if this is the case. This study nicely shows that the energetics and requirements of transport are identical to all the previous studies on Neu5Ac TRAP transporters - it would be good to acknowledge this somewhere in this section as well.
  
  These changes have been incorporated. We have added a line to say why we did this, and added as the last line that this is similar to the other SiaPQMs characterized.
  
  (9) Figure 5C. The authors show the transport activity with and without valinomycin. The authors do not explain the rationale for testing and reporting both conditions for these mutants; an explanation is required, or the data should be simplified. The expected membrane potential induced by valinomycin should be mentioned in the legend.
  
  We have simplified Figure 5C and added the expected membrane potential value.
  
  (10) The authors state that the S300A mutant is inactive. However, unless the authors also measured the background binding/transport of radiolabelled substrate in the absence of protein, then the accuracy of this statement is not clear because Figure 5C does indicate some activity for S300A, albeit much lower than WT. This is an important point in light of the authors' suggestion that the membrane protein does not need a binding site of high affinity or stringent selectivity.
  
  We thank the reviewer for pointing this out. We have now added a line to the experimental protocols: “The experimental values were corrected by subtracting the control, i.e. the radioactivity taken up in liposomes reconstituted in the absence of protein. The radioactivity associated with the control samples, i.e. empty liposomes, was less than 10% with respect to proteoliposomes.”
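
  The control correction quoted above amounts to a simple subtraction. A minimal sketch, assuming radioactivity is recorded as counts per minute; the function names and all numbers are hypothetical and are not measurements from the study:

```python
def corrected_uptake(proteoliposome_cpm, empty_liposome_cpm):
    """Subtract the empty-liposome control from each proteoliposome reading."""
    return [sample - empty_liposome_cpm for sample in proteoliposome_cpm]


def control_fraction(proteoliposome_cpm, empty_liposome_cpm):
    """Control signal expressed as a fraction of one proteoliposome reading."""
    return empty_liposome_cpm / proteoliposome_cpm


# Hypothetical readings: a wild-type sample and a low-activity mutant.
readings_cpm = [1000.0, 150.0]
control_cpm = 80.0

net_cpm = corrected_uptake(readings_cpm, control_cpm)
wt_control_share = control_fraction(readings_cpm[0], control_cpm)
print(net_cpm, wt_control_share)
```

  Whether a low-activity mutant retains residual transport then depends on whether its corrected value remains clearly above zero once replicate variability is taken into account.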
  
  (11) There are several issues and important omissions in the work cited:
  
  - It is not normal practice to cite a reference in the abstract and the citation is only to the second structure of HiSiaQM, which does not fairly reflect previous work in the field by only referring to their own work. Also throughout the article, it is normal practice with in-text citations to order them chronologically, i.e. earliest first. Please update this.
  
  This article was submitted as a “Research advance article”. The instructions specifically say that a “Research advance article should cite the article in eLife this paper advances.” Hence the citation of the “second structure of HiSiaQM”. In fact, in the manuscript we explicitly say “The first structure of _Hi_SiaQM (4.7 Å resolution) demonstrated that it is composed of 15 transmembrane helices and two helical hairpins.” We are following the policy laid out.
  
  Zotero organizes multiple references in alphabetical order; we did not choose to do it that way, and the suggestion of bias is not true. The final version of the accepted paper will use numbered citations, and this issue will automatically be resolved.
  
  - Intro: please cite the primary papers discovering other families of sialic acid transporters.
  
  - Intro: When introducing information on the binding site, dissociation constant of Neu5Ac, and thermodynamics of ligand binding to SiaP, the authors should also include references to the work done by others in addition to their own work.
  
  The Setty et al. paper was the first to demonstrate that the two-component systems are distinct, and that the binding protein of the TRAP system binds enthalpically while the binding protein of the ABC system binds entropically (SiaP vs SatA). As the reviewer points out, this is significant because it highlights how the Arg binding to the carboxylate is the enthalpic driver in this case and contributes to the difference between sugar binding to SiaP and SatA. Many studies have published binding affinities of molecules to SiaP, but this paper offers valuable insight into the differences between these systems. We have cited a number of the SiaP papers from other groups, including acknowledging the first structure of SiaP from H. influenzae by Muller et al., in 2006.
  
  - p.5 "TRAP transporters are postulated to employ an elevator-type mechanism...". This postulation has been experimentally tested and published, so should be discussed and referenced (Peter et al. 2024. https://doi.org/10.1038/s41467-023-44327-3).
  
  We have now corrected this error. We removed “are postulated to” and added the reference.
  
  - p.5 "Notably, the transport of Neu5Ac by TRAP transporters requires at least two sodium ions (Davies et al., 2023)." The requirement for at least 2 Na+ ions for Neu5Ac transport was first demonstrated in Mulligan et al. PNAS 2009, so should also be cited (for completion, so should Mulligan et al. JBC. 2012 and Currie et al. elife 2023, which have also shown this requirement is a commonality amongst all Neu5Ac TRAP transporters).
  
  Added.
  
  - P.12, Mulligan et al, JBC, 2012 should be added to the citations in the first sentence.
  
  Added.
  
  - p.19 "Interestingly, even the dicarboxylate transporter from V. cholerae (VcINDY) binds to its ligand via electrostatic interactions with both carboxylate groups". Other references are more appropriate than the one used to support this statement.
  
  We have also added references to Mancusso et al., 2012, Nie et al., 2017 and Sauer et al., 2022 here.
  
  - p.19. "The structure of the protein in the outward-facing conformation is unknown". The authors do not discuss the mechanistic findings from Peter et al 2024 Nat Comm here. The work described in that paper revealed an experimentally verified model of the OFS of HiSiaQM, so really needs to be included.
  
  This is not an experimentally determined 3D structure. They have shown the possible existence of this by microscopy, but the structure is not determined. The work mentioned is a wonderful piece of work, but it does not report the three-dimensional structure of the protein in the outward-facing conformation to allow us to understand the nature of the molecular interactions.
  
  - The reference to Kinz-Thompson et al 2022 on p. 6 is not appropriate - neither the HiSiaQM papers nor the PpSiaQM paper makes reference to this work when identifying the binding site. More suitable references are used, for example, Mancusso et al 2012, Nie et al 2017 and Sauer et al 2022; this should be reported accurately.
  
  Added the suggested references. We think the paper (Kinz-Thompson et al 2022) is relevant and have also kept that reference.
  
  - Garaeva et al report the opposite of what the authors mention - "In the human neutral amino acid transporter (ASCT2), which also uses the elevator mechanism, the HP1 and HP2 loops have been proposed to undergo conformational changes to enable substrate binding and release (Garaeva et al., 2019)." In fact, this paper suggested a one-gate model of transport (HP2), where HP1 seems uninvolved in gating.
  
  The Reviewer is correct. We were wrong and not clear. The entire paragraph has been rewritten.
  
  “While both the HP1 and HP2 loops have been hypothesized to be involved in gating, in the human neutral amino acid transporter (ASCT2) (which also uses the elevator mechanism), only the HP2 loops have been shown to undergo conformational changes to enable substrate binding and release (Garaeva et al., 2019). Hence, it is suggested that there is a single gate that controls substrate binding. Superposition of the _Pp_SiaQM and _Hi_SiaQM structures does not reveal any change in these loop structures upon substrate binding. For TRAP transporters, the substrate is delivered to the QM protein by the P protein; hence, these loop changes may not play a role in ligand binding or release. This may support the idea that there is minimal substrate specificity within SiaQM and that it will transport the cargo delivered by SiaP, which is more selective.”
  
  - p.19 "suggesting that SSS transporters have probably evolved to transport nine-carbon sugars such as Neu5Ac (Wahlgren et al, 2018)." Surely this goes without saying since Wahlgren et al 2018 demonstrated that SiaT, an SSS, could transport sialic acid? It's unclear why this was included here - perhaps it needs to be rewritten to make the point more clearly, but as it stands, this statement appears self-evident. Furthermore, these proteins can transport all kinds of molecules (see TCDB 2.A.21). This statement needs to be clarified.
  
  This was a comparison to the Neu5Ac binding sites in other Neu5Ac transporters. We have modified the sentence. “The polar groups bind to both the C1-carboxylate side of the molecule and the C8-C9 carbonyls, suggesting that Proteus mirabilis Neu5Ac transporter (SSS type) evolved specifically to transport nine-carbon sugars such as Neu5Ac (Wahlgren et al., 2018).” These were arguments we were making to suggest that the lack of tight binding could also mean reduced specificity.
  
  - The authors reconstitute the FnSiaQM and measure transport with SiaP, which closely resembles what is known for both HiSiaPQM and VcSiaPQM, which is not cited (https://doi.org/10.1074/jbc.M111.281030).
  
  - Regarding lipids between transport and scaffold domains: there is precedent for such lipids in the elevator transporter GltPh. Wang and Boudker (eLife 2020) proposed similar displacements during transport, and this work would be appropriate to cite here.
  
  We have now cited Mulligan et al., 2012. We also added a sentence on the findings of the GltPh paper by Wang and Boudker. Thank you for pointing this out.
  
  (12) p.9 "TRAP transporters, as their name suggests, comprise three units: a substrate-binding protein (SiaP) and two membrane-embedded transporter units (SiaQ and SiaM) (Severi et al., 2007)." This is somewhat odd phrasing because the existence of fused membrane components has been well-documented for a long time. The addition of "Many" at the start of the sentence fixes this.
  
  Added Many.
  
  (13) On p.12 the authors compare the ligand-induced conformational changes of FnSiaQM with ASCT2, citing Garaeva et al, 2019. This comparison does not make sense considering TRAP transporters and ASCT2 do not share a common fold. A far superior comparison is with DASS transporters, which actually do have the same fold as TRAP transporters. And, importantly, the Na+ and substrate-induced conformational changes have been investigated for DASS transporters revealing a unique mechanism likely shared by TRAP transporters (Sauer et al, Nat Comm, 2022). The text on p.12 should be adjusted to replace the ASCT comparison with a VcINDY comparison.
  
  The purpose of citing the ASCT2 paper was only to address the HP1 and HP2 gates. Those authors show that only HP2 changes conformation. Comparing the two FnSiaQM structures, with and without ligand, we see no change in either the HP1 or the HP2 loops. On page 17, where we describe the structure, we specifically mention that the overall architecture is similar to VcINDY and the DASS transporters.
  
  (14) p.12 "For TRAP transporters, the substrate is delivered to the QM protein by the SiaP protein;" Here "SiaP protein" should be "P protein".
  
  Corrected.
  
  (15) p.18. "periplasmic membrane" should be "cytoplasmic membrane".
  
  Corrected.
  
  (16) p.19. "This prevents Neu5Ac from binding..." There is no evidence for this so this needs to be softened, e.g. "This likely prevents Neu5Ac from...".
  
  Agree – Modified.
  
  (17) Figure 2B is rather small, cramped, and difficult to see. We suggest that the authors make that panel larger, or include it as a stand-alone supplementary figure.
  
  We have moved this figure into a supplementary figure as suggested by the reviewer.
  
  (18) The authors describe the Neu5Ac binding site in SiaQM. It would be helpful if the authors provided a figure in support of the statement that the Neu5Ac binding site architecture is similar to dicarboxylate in VcINDY (especially as Neu5Ac is a monocarboxylate).
  
  The Neu5Ac binding site is NOT similar to the VcINDY binding site. However, we understand the origin of the comment. We have now changed the sentence: “The overall architecture of the Neu5Ac binding site is similar to that of citrate/malate/fumarate in the di/tricarboxylate transporter of V. cholerae (VcINDY), but the residues involved in providing specificity are different (Kinz-Thompson et al., 2022; Mancusso et al., 2012; Nie et al., 2017; Sauer et al., 2022). Neu5Ac binds to the transport domain without direct interactions with the residues in the scaffold domain. The majority of the interactions are with residues in the HP1 and HP2 loops of the transport domain (Figure 5B). Asp521 (HP2), Ser300 (HP1), and Ser345 (helix 5) interact with the substrate through their side chains, except for one interaction between the main chain amino group of residue 301 and the C1-carboxylate oxygen of Neu5Ac. Mutation of the residue equivalent to Asp521 has been shown to result in loss of transport (Peter et al., 2022). To evaluate the role of residues Ser-300 and Ser-345, we mutated them to alanine and performed the transport assays.”
  
  (19) When comparing the binding modes of Neu5Ac to different proteins in Figure 6, it would be helpful to include the structure in this paper as well.
  
  The Neu5Ac binding site is present in figure 5. We would prefer not to show it again in Figure 6.
  
  Additionally, there is a clear binding mode of Neu5Ac in Figure 1 as well.
  
  (20) The manuscript would benefit from a more detailed comparison between Na+-bound (described as apo) and Na+/Neu5Ac structures, especially the prospective gates. If this transporter behaves anything like the archetypical ion-coupled glutamate transporters, some structural changes in the gates might be expected to facilitate transport domain movement when the substrate is loaded, but not when only Na+ is bound. It would be important to discuss and visualize these changes.
  
  We have described in the manuscript that there is NO change in the HP1 and HP2 gates between the unliganded structure and the Neu5Ac bound structure. The major difference we observe is the ordering of the third metal binding site.
  
  A figure comparing the substrate binding pockets between the different high-resolution structures would also be informative. Do the bonding distances between ligands and side chains significantly change between homologs?
  
  This is the only Neu5Ac-bound structure. Since substrate specificity comes from the variability of the residues that interact with the substrate, we do not believe that this figure would add much value.
  
  (21) A supplementary figure (or an inset to Figure 2) showing pairwise percent identity between different characterized QM transporters would be useful.
  
  We have now added a Supplementary Figure 4 showing the comparison of the three QM sequences whose structures have been determined.
  
  (22) There is relatively minimal EM processing. More rigorous processing would require relatively little effort and could boost resolution, making this a vastly improved manuscript with a much more confident interpretation of structures.
  
  We described the overall workflow. The processing was rigorous. After obtaining the first maps, we created templates from the structure and did template-based picking. We then did several rounds of 2D classification followed by homogeneous refinement and non-uniform refinement. We then made masks and carried out local refinement. We then took the best maps, did a 3D classification, and refined the 3D classes independently. Then, we regrouped them based on how similar they were. We then went back and picked particles again (we used different methods of particle picking, but template-based picking resulted in the final set of particles used) and went through the whole process again. At the end of the refinement, we carried out global and local CTF refinement followed by reference-based motion correction. The final refinement was then done with the Bayesian polished particles; it was a local refinement with a mask over only the transporter and the nanobody. After receiving the reviews, we tried multi-body refinement in Relion5. It did not improve resolution. We have expanded the legend to Supplementary Figure 2 (without listing all the different things we tried). The best resolution we obtained for the structure was 3.1 Å. However, it is important to note that the local resolution of the map around the ligand is good.
  
  We realized this is not easy to depict in a local resolution map. So, we wrote a script that takes every atom, collects all the local resolution values within a 5 Å sphere around it (we tried different radii and used the optimal one; we are preparing a manuscript to describe this), averages them, and assigns the average as the B-factor of that atom. We have moved the local resolution map figure to the supplement and replaced Figure 1 with a cartoon, in which the color represents the local resolution at each atom.
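The per-atom averaging described in this response can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' script: the function name, the nested-list map indexed as `[x][y][z]`, the isotropic `voxel_size`, and the `origin` argument are all hypothetical choices made here. Real cryo-EM maps often use a different axis order, which would need to be handled before calling it.

```python
import math

def per_atom_local_resolution(coords, locres, voxel_size,
                              origin=(0.0, 0.0, 0.0), radius=5.0):
    """Average the local-resolution voxels within `radius` Å of each atom.

    coords     : list of (x, y, z) atom positions in Å
    locres     : nested list indexed [x][y][z], local resolution in Å per voxel
                 (hypothetical layout; adapt to the real map's axis order)
    voxel_size : Å per voxel, assumed isotropic
    origin     : Å coordinates of voxel [0][0][0]
    Returns one average per atom, or None for atoms with no voxel in range.
    """
    dims = (len(locres), len(locres[0]), len(locres[0][0]))
    reach = math.ceil(radius / voxel_size)  # half-width of the search box in voxels
    averages = []
    for atom in coords:
        # Nearest voxel index to the atom along each axis.
        centre = [round((atom[d] - origin[d]) / voxel_size) for d in range(3)]
        lo = [max(centre[d] - reach, 0) for d in range(3)]
        hi = [min(centre[d] + reach + 1, dims[d]) for d in range(3)]
        total, count = 0.0, 0
        for i in range(lo[0], hi[0]):
            for j in range(lo[1], hi[1]):
                for k in range(lo[2], hi[2]):
                    pos = (origin[0] + i * voxel_size,
                           origin[1] + j * voxel_size,
                           origin[2] + k * voxel_size)
                    # Keep only voxels whose centre lies inside the sphere.
                    if math.dist(pos, atom) <= radius:
                        total += locres[i][j][k]
                        count += 1
        averages.append(total / count if count else None)
    return averages
```

The returned values could then be written into the B-factor column of the model with any structure-handling library, so that standard "color by B-factor" rendering displays the local resolution on the cartoon.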
  
  (23) Calling the structure without Neu5Ac bound an "apo" structure is confusing since it indeed has the ligand Na+ present and bound. "Na+" and "Na+/Neu5Ac" structures would be more appropriate.
  
  Changed all “apo” to “unliganded”.
  
URL

biorxiv.org/content/10.1101/2024.03.29.587382v2
www.biorxiv.org www.biorxiv.org

New submission 03/10/2023, 09:01:17

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We thank the reviewers for their insightful and constructive comments on our work, which have helped to strengthen the manuscript. In response to the additional suggestions provided by the reviewers, we have made revisions by adding or replacing five main figures and three supplementary figures, refining the text, and clarifying certain conclusions. Detailed responses to the reviewers’ points can be found below.
  
  Additional experiments, textual changes, or modulation of claims are needed to address weaknesses in the SOD1 portion of the study. Specifically:
  
  A) These studies require an assessment of the on-target efficacy of the inhibitors at the relevant concentration ranges. Ideally, they should have minimal effects against SOD1 knockout cell lines (an acute challenge at a time point before the growth defects become apparent) and show better efficacy in SOD1-overexpressing lines. Key experiments (changes in superoxide, OCR profiling, DNA alkaline comet assay) would be more convincing if they were carried out with SOD1 knockout lines to compare against the inhibitor effects (3-4 days after introducing sgSOD1 when growth defects are not apparent). In addition, SOD activity should be measured directly following inhibitor treatment.
  
  We agree with the reviewers that the distinction between on- and off-target effects of the pharmacologic SOD1 inhibitors is a critical point to address. We have validated that SOD activity is reduced following treatment with ATN-224 in Figure 2 – Figure supplement 1A.
  
  Nevertheless, we acknowledge that the potential for off-target effects of these inhibitors cannot be completely ruled out. To address this concern, we have incorporated a discussion regarding the potential off-target effects of both LCS-1 and ATN-224.
  
  B) Assays should be included to support that SOD1 activity is altered. ATN-224 and LCS-1 are used to inhibit SOD1 function in the majority of the experiments, which should be supported by SOD activity assays to confirm SOD inhibition. Further, the concentration of ATN-224 used in this paper (12.5 uM) is beyond the concentration that has been reported to inhibit SOD1 function in human blood cells. In Figure 4D, the authors demonstrate comparable SOD1 total protein levels in WT and PPM1D-mutant cells. However, the authors should further address whether PPM1D-mutation alters SOD1 activity via SOD activity assays.
  
  We thank the reviewers for these suggestions. We have performed SOD activity assays, which confirmed that SOD activity is inhibited upon treatment with ATN-224 at two concentrations (6.25 and 12.5 uM). Although we also did this for LCS-1-treated cells, in our hands we did not see reduced SOD activity. However, LCS-1 has been shown to inhibit SOD activity in other publications, including PMID: 21930909 and PMID: 32424294. From these assays, we have also found that PPM1D-mutant cells had increased SOD activity at baseline, despite having similar levels of SOD1 protein. These data have been added to Figure 2–Figure supplement 1A.
  
  C) Some conclusions are not fully supported by the data provided. The authors claimed that "upon inhibition of SOD1, there was an increase in ROS that was specific to the mutant cells" in Figure 2E. A comparison of ROS levels among untreated, ATN-224-treated, and LCS-1-treated PPM1D-mutant cells should have been made, and the statistical analysis among these groups should have been provided. Moreover, in Figure 2-Figure Supplement 1E, LCS-1 treatment does not increase ROS levels in PPM1D mutant LCLs. Performing these experiments with control and SOD1 deletion cells would have strengthened the results. Along with this point, the authors should comment on why SOD2 is not identified as a top hit in the CRISPR screen, as SOD2 deletion accumulates superoxide in cells.
  
  After performing additional statistical analyses for Figure 2E, we found that the minor increase in ROS levels in the mutant cells after SOD1 inhibition was not statistically significant. We have revised the text accordingly.
  
  As for why SOD2 was not identified as a top hit, we postulate that this may be due to inherent dependency of the WT cell lines on SOD2.
  
  D) Fig. 1 - SOD1 appears to be clustered with several other genes in the volcano plot (including FANC proteins). Did any other ROS-detoxifying enzymes show similar fitness scores? The effects of the SOD1 sgRNA are striking, however, it would be useful to see qPCR or immunoblot data confirming robust depletion.
  
  Thank you for your suggestion. We have validated the loss of SOD1 protein expression after SOD1 sgRNA deletion by immunoblot and have added this data to Figure 1– figure supplement 1D. While other ROS-detoxifying enzymes were not significantly enriched in the top 37 hits, interestingly, the Fanconi Anemia pathway also has roles in counteracting oxidative stress. FA-deficient cells have mitochondrial dysfunction and redox imbalance, and several of the FA family proteins are implicated in mitophagy. Therefore, there may be an interesting interplay between SOD1 and the FA pathway that is worth highlighting in the discussion of our manuscript even though there was no experimental investigation performed.
  
  E) Fig. 2 - What are the relative SOD1 levels in the mutant PPM1D vs. WT cell lines? The effects of the chemical inhibitors are stronger in MOLM-13 than in the other two lines. These data could also point to whether LCS-1 and ATN-224 cytotoxicity are on-target or off-target at these concentrations, which is a key issue not currently addressed in these studies. This is a particular concern as the OCI-AML2 line shows a stronger growth defect with CRISPR SOD1 KO (in Fig 1) but the smallest effects with these chemical inhibitors. The authors should also include SOD1 levels for Figure 1D and Figure 4-Figure supplement 1C.
  
  SOD1 protein expression is similar between WT and PPM1D-mutant cell lines and the loss of SOD1 after SOD1 sgRNA deletion was validated by immunoblot. These data have been added to Figure 1- figure supplement 1D and Figure 4D.
  
  F) Does SOD1 co-expression in PPM1D-mutant patient AML correspond to poorer disease outcomes? This can be evaluated in publicly available patient datasets and would support the idea of SOD1 synthetic lethality.
  
  Unfortunately, there are no publicly available patient datasets with sufficient cases of de novo PPM1D-mutant AML to assess this question.
  
  G) While endogenous mitochondrial superoxide levels are elevated in PPM1D mutant lines, it is entirely unclear why SOD1 inhibition should affect mitochondrial superoxide as it detoxifies cytosolic superoxide. Also unclear why the DCFDA signal (which measures total hydroperoxides) is increased under SOD1 inhibition - SOD1 dismutates superoxide radicals into hydrogen peroxide, therefore unless SOD2 is compensating for SOD1 loss, one might expect hydroperoxides to be lower (unless some entirely different oxidase is increasing their levels). None of these outcomes appear to be considered. Finally, it is not explained how lipid peroxidation, which requires the production of hydroxyl or similarly high-potency radicals, is being caused by increased superoxide or peroxides. One possibility is there is an increase in labile iron, in which case this phenotype would be rescued by the iron chelator desferal, and by the lipophilic antioxidant, ferrostatin.
  
  We measured intracellular labile iron levels by flow cytometry by staining the cells with FerroOrange at baseline and after SOD1 inhibition with our pharmacologic inhibitors (ATN-224 at 12.5 uM and LCS-1 at 1.25 uM). Across the three leukemia cell lines, we saw variable results in iron levels with no appreciable patterns (see below). Therefore, we cannot make conclusions about the contribution of labile iron to our observed phenotypes.
  
  Author response image 1.
  
  H) Do the sgSOD1 cells also show similar increases in MitoSox green, DCFDA, and BODIPY signal? These experiments would clarify whether the effects of the inhibitors are directly related to SOD1 loss or if they represent off-target effects from the inhibitors and/or compensatory changes in SOD2.
  
  We do not observe changes in SOD2 in the several contexts in which we have examined this. We cannot exclude off-target effects of the inhibitors and have clarified this in the text.
  
  I) The authors may want to assess whether Rac1 or NADPH oxidase activity is altered in the SOD1 KO in WT vs. PPM1D cells. Their results may be the consequence of compromised ROS-driven survival signaling or DNA repair rather than direct ROS-induced damage, which is not caused directly by superoxide (or hydrogen peroxide).
  
  We appreciate the reviewer’s recommendations. However, due to time constraints, we regret not being able to assess Rac1 or NADPH oxidase activity. Nevertheless, we recognize the possibility of altered ROS-driven signaling rather than ROS-induced damage as a driver of our phenotype and have incorporated this possibility into our discussion.
  
  J) Fig. 3 - the effects on mitochondrial respiratory parameters, while statistically significant, do not seem biologically striking. Also, these data are shown for OCI-AML2 cells which show the smallest cytotoxic effects with the SOD1 inhibitors among the 3 lines tested. They do however show the most robust growth defect with sgSOD1. This discrepancy could suggest that mitochondrial dysfunction does not underlie the observed growth defect and/or the inhibitor cytotoxicity is not on-target. Ideally, mitochondrial profiling should also be carried out on this cell line with inducible SOD1 depletion. Have the authors assessed whether the mitochondrial Bcl family proteins are affected by the inhibitors?
  
  We assessed a few members of the mitochondrial Bcl-family proteins including MCL-1, BCL-2, and BCL-XL during the revision process. PPM1D-mutant cells have mildly increased expression of these anti-apoptotic proteins at baseline and the expression is not altered by pharmacologic SOD1 inhibition (see Author response image 2 below). Due to time constraints, we were unable to perform seahorse assays and mitochondrial profiling in the SOD1-deletion cells.
  
  Author response image 2.
  
  K) Fig. 4 - Currently the data in this figure do not support the authors' claim that PPM1D-mutant cells have impaired antioxidant defense mechanisms, leading to an elevation in ROS levels and reliance on SOD1 for protection. It should be noted that oxidative stress specifically refers to adverse cellular effects of increasing ROS, not baseline levels of various redox parameters. Ideally, levels of GSSG/GSH would be a better measure of potential redox stress tolerance than the total antioxidant capacity assay. Finally, oxidative stress can be assessed by challenging the wt and mutant PPM1D cell lines with oxidant stressors such as paraquat which elevates superoxide, or drugs like erastin which elevate mitochondrial ROS. The immunoblot shows negligible changes in the antioxidant proteins assayed. Again, this blot should include SOD2 which is the most relevant antioxidant in the context of mitochondrial superoxide.
  
  We measured intracellular glutathione levels by flow cytometry and found that PPM1D-mutant cells had a greater proportion of cells with low levels of GSH. These data have been added as Figure 4D. We have also repeated the western blot to look at the antioxidant proteins catalase, SOD1, and thioredoxin after SOD1 deletion and pharmacologic SOD1 inhibition. We evaluated SOD2 protein levels in these experiments, as suggested. Smooth muscle actin (SMA) is included in the antibody cocktail as a loading control. However, it is unclear to us why PPM1D-mutant cells consistently have significantly higher levels of SMA. Therefore, we included a separate loading control, vinculin. Repeating these western blots showed a clearer difference between WT and PPM1D-mutant cells in the levels of these antioxidant proteins: PPM1D-mutant cells have decreased levels of catalase and thioredoxin. These blots also show that SOD2 levels may be mildly increased in the PPM1D-mutant cells at baseline but are not significantly upregulated upon SOD1 inhibition. We have replaced the original immunoblot from Figure 4D with the revised blots that more clearly demonstrate the reduced levels of catalase and thioredoxin, now Figure 4E.
  
  L) Fig. 5 - These data support that DNA breaks are elevated in PPM1D mutant vs. wt cells. However, the data with the chemical SOD1 inhibitor again do not convince us that the enhanced levels are due to on-target effects on SOD1. Use of the alkaline comet assay is appropriate for these studies and the 8-oxoguanine data do indicate contributions from oxidative DNA base damage. But these are unlikely to result directly from altered superoxide levels, as this species cannot directly oxidize DNA bases or cause DNA strand breaks.
  
  Thank you to the reviewers for raising this point. We have performed comet assays in SOD1-deletion cells to look at levels of DNA damage. Consistent with the reviewers’ point, we do not see a significant increase in DNA breaks after SOD1 deletion. We have removed the data using the SOD1 inhibitor and instead show the COMET analysis in the PPM1D-mut and SOD1-KO cells (see Figure 5F). We now make the point that increased DNA damage with SOD1 loss cannot explain the vulnerability of the double-mutant cells.
  
  M) Instead of using NAC, which elevates glutathione synthesis but also has several known side effects, the authors may want to determine whether Tempol, a SOD mimetic can rescue the effects of SOD1 knockout or inhibition. This would directly prove that SOD1 functional loss underlies the observed growth defect and cytotoxicity from genetic SOD1 knockdown or chemical inhibition.
  
  This is an excellent suggestion; we have added comments to this effect into the discussion.
  
  N) It is recommended the discussion focus more strongly on how the signaling function of superoxide vs. its reactions with other molecular entities to induce genotoxic outcomes could be contributing to the observed phenotypes. The discussion of FANC proteins, which were targets with similar fitness scores but not experimentally investigated at all, is an unwarranted digression.
  
  Thank you for this recommendation. We have expanded the discussion to focus more on the signaling functions of superoxide. However, considering the role of the Fanconi Anemia pathway in mitigating DNA damage and oxidative stress, we believe the discussion of the FANC proteins is important due to the possible intersection with SOD1. Therefore, we have refined this portion of the discussion to focus more on the interplay between SOD1 and FA.
  
  O) The complete lack of consideration of SOD2 in these studies is a missed opportunity as it reduces mitochondrial superoxide levels but elevates hydrogen peroxide levels. It would be very interesting to see whether SOD1 inhibition leads to compensatory increases in SOD2. SOD2 can be easily measured by immunoblot. Furthermore, measuring total superoxide via hydroethidium in a flow cytometric assay vs. mitochondrial ROS in PPM1D mut vs. wt cells and under SOD1 knockout would enable a determination of which species dominates (cytosolic or mitochondrial). These experiments are required to fill some logical gaps in the interpretation of their redox data.
  
  During the revision process, we have included SOD2 in our studies and have found that loss of SOD1 via genetic deletion and pharmacologic inhibition does not lead to compensatory increases in SOD2 (Figure 4D). Additionally, we have measured cytoplasmic superoxide levels using dihydroethidium to differentiate between cytoplasmic vs. mitochondrial superoxide. We found that at baseline levels, the mutant cells also harbored more cytoplasmic superoxide. We have added this figure as Figure 2C and moved the original mitochondrial superoxide data to Figure 2-figure supplement 1C.
  
  P) Given the DNA breaks observed in PPM1D mutant cells, it is highly recommended that the authors assess whether iron levels are elevated in mut vs. wt cells and whether desferal can rescue observed SOD1 inhibition defects. Also, it has been reported that PPM1D promotes homologous recombination by forming a stable complex with BRCA1-BARD1, thereby enhancing their recruitment to double-strand break sites. The authors should comment on why there is no difference in repair via HR in WT and PPM1D mutant cells in Figure 5C.
  
  Please see comment G regarding our findings about iron levels.
  
  The reviewers pose an interesting question as to why there is no difference in HR repair between WT and mutant cells, given the reported role of PPM1D in promoting HR. We have addressed this question in the main text. We believe that several factors can limit the extent of HR enhancement in PPM1D-mutant cells. For example, HR is typically confined to the S/G2 phase and thus may be constrained by cell cycling, among other regulatory mechanisms.
  
  Other comments:
  
  A) The authors described in the Method section that "The CRISPR Screen PPM1D mutant Cas9-expressing OCI-AML2 cell lines were transduced with lentivirus library supernatant." The authors need to provide information on whether the MOI of the CRISPR screen has been well controlled to ensure that the majority of the cell population has a single copy of sgRNA transduction.
  
  We performed a lentiviral titer curve prior to the screen to determine the volume of viral supernatant to add for a multiplicity of infection (MOI) of 0.3. This important detail has been added to our Methods.
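A rough way to see why an MOI of 0.3 addresses the single-copy concern is to model the number of integrations per cell as Poisson-distributed with mean equal to the MOI. That model is an assumption introduced here for illustration (it is a common approximation, not something stated in the response), and the function names below are hypothetical.

```python
import math

def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k integrations at a given MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)

def single_copy_fraction(moi):
    """Among transduced cells (one or more integrations), the fraction with exactly one."""
    transduced = 1.0 - poisson_pmf(0, moi)
    return poisson_pmf(1, moi) / transduced

moi = 0.3
print(f"fraction of cells transduced:        {1.0 - poisson_pmf(0, moi):.3f}")
print(f"single-copy fraction among those:    {single_copy_fraction(moi):.3f}")
```

Under this assumption, roughly a quarter of cells are transduced at MOI 0.3, and about 86% of those carry a single sgRNA, which is the trade-off a low MOI is meant to achieve.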
  
  B) The study convincingly shows differences between parental leukemic cells and the PPM1D mutants but one important control is missing in experiments related to Fig. 2 and 3. All PPM1D mutant clones used in this study were subjected to the blasticidin selection of the transduced cells to generate cells stably expressing Cas9 and subsequently, the clones with successful PPM1D targeting were expanded. The authors should demonstrate that increased ROS production is not just a consequence of the lentiviral transduction and antibiotic selection and that it corresponds to increased PPM1D activity in PPM1D mutant cells. To do that, authors could compare PPM1D clones to parental cells that underwent the same selection procedure (OCI-AML2-Cas9 cells and OCI-AML3-Cas9 cells).
  
  It is true that the parental OCI-AML2 and OCI-AML3 cell lines underwent four days of blasticidin selection to create the stably expressing Cas9 cell lines. However, after the four-day period, the blasticidin was removed from the cell culture media. From there, we introduced the PPM1D mutations into the Cas9-expressing “WT” cell lines using the RNP-based CRISPR/Cas9 delivery method, and single cells were then sorted into 96-well plates. Clones were expanded and validated using Sanger sequencing, TIDE analysis, and western blot. In all of our assays, we compare the WT Cas9 cells to the PPM1D-mutant Cas9 cells. Additionally, the cells have been expanded and passaged several times after blasticidin selection. Therefore, we believe it is unlikely that there are residual ROS-inducing effects from the antibiotic treatment.
  
  C) The authors mention that they identified 3530 genes differentially expressed in parental and PPM1D mutant cells (line 267) but it is unclear what was the threshold for statistical significance. They mention FDR<0.05 in the Methods but show GSEA analysis with FDR<0.25 in Figure 4A. Source data for Fig. 4 is missing and the list of differentially expressed genes is not shown.
  
  The source data files for Figures 1 and 4 will be uploaded with the revised manuscript. Upon reviewing the source data, we noticed an error in the number of differentially expressed genes. We have corrected this in line 274 and you will see that this correlates with Figure 4-source data 1. For the thresholds, we used an FDR<0.05 for the differential gene expression analysis, and an FDR <0.25 in the GSEA, which is an appropriate threshold for GSEA. We have clarified these thresholds in the methods section.
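To make the two cutoffs concrete, the sketch below applies a Benjamini-Hochberg adjustment to a set of p-values and filters at 0.05 and at 0.25. It is illustrative only: the p-values are invented, the function names are hypothetical, the response does not say which software produced the authors' FDR values, and GSEA derives its own q-values by a different procedure, so only the thresholding step carries over.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def passing(pvalues, fdr):
    """Indices of tests whose adjusted p-value is below the FDR cutoff."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < fdr]

# Invented p-values, purely for illustration.
example = [0.001, 0.008, 0.039, 0.041, 0.20]
print(passing(example, 0.05))  # stricter cutoff, as used for differential expression
print(passing(example, 0.25))  # looser cutoff, as used for gene-set enrichment
```

The looser 0.25 cutoff admits more candidates, which suits exploratory gene-set analysis, while 0.05 keeps the list of individual differentially expressed genes more conservative.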
  
  D) Include a definition of MFI in Figure legend Fig.2 and also in the Methods section. The unit should be indicated at both the x and y axes.
  
  We have defined MFI in the figure legends and methods sections and have updated the figures accordingly.
  
  E) Legend to Figure 2 - Figure Supplement 1 E should define the grey and pink columns (likely WT and mutants LCLs).
  
  Thank you. We have defined the grey and pink columns as WT and PPM1D-mutant cell lines, respectively for Figure 2 – Figure supplement 2D and E.
  
  F) Reporter assays in Fig. 5 convincingly show that NHEJ capacity is reduced in PPM1D mut cells. In the text, the authors state that this might reflect the impact of PPM1D on LSD1 (line 365). Although this might be the case, other options are equally possible. It would be appropriate to include a reference to the ability of PPM1D to counteract γH2AX and ATM, which generate the most upstream signals in DDR.
  
  Thank you to the reviewers for raising this excellent point. We have revised the text to incorporate the impact of PPM1D on γH2AX and ATM in NHEJ.
  
  G) The authors correctly state that truncation of PPM1D leads to protein stabilization (line 85) and that it is present in U2OS cells (line 355). These observations were first reported by Kleiblova et al 2013 and therefore one reviewer believes that this reference should be included. This study also identified a truncating PPM1D mutation in colon adenocarcinoma HCT116 cells, and the role of PPM1D mutation in promoting the growth of colon cancer has subsequently been tested in an animal model (Burocziova et al., 2019).
  
  Thank you. We have added this reference to our text in line 360.
  
URL

biorxiv.org/content/10.1101/2023.08.31.555634v2
www.biorxiv.org www.biorxiv.org

New submission 14/09/2023, 10:51:43

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We want to thank you for organizing the review process of our manuscript ‘Human skeletal muscle organoids model fetal myogenesis and sustain uncommitted PAX7+ myogenic progenitor’ for eLife and the reviewers for providing their criticisms.
  
  We have changed some figures within the manuscript and added two new supplementary figures, as outlined below.
  
  Reviewer #1 (Public Review):
  
  The authors aimed to establish a cell culture system to investigate muscle tissue development and homeostasis. They successfully developed a complex 3D cell model and conducted a comprehensive molecular and functional characterization. This approach represents a critical initial step towards using human cells, rather than animals, to study muscular disorders in vitro. Although the current protocol is time-consuming and the fetal cell model may not be mature enough to study adult-onset diseases, it nonetheless provides a valuable foundation for future disease modelling studies using isogenic iPSC lines or patient-derived cells with specific mutations. The manuscript does not explore whether or how this stem cell model can advance our understanding of muscular diseases, which would be an exciting avenue for future research. Overall, the detailed protocol presented in this paper will be useful for informing future studies and provides an important resource to the stem cells community. The inclusion of data on disease modelling using isogenic iPSC lines or patient-derived cells would further enhance the manuscript's impact.
  
  We agree that data on disease modelling using patient-derived cells would further enhance the manuscript's impact. The manuscript in its current form should present our skeletal muscle organoid differentiation protocol to the community, with a focus on the developmental processes that are mimicked by this model. We are not aiming to model diseases such as LGMD or Duchenne within the context of this study. Our protocol is just the starting point for us and others to use this organoid protocol for skeletal muscle disease modelling in further studies. We already have a study of Duchenne muscular dystrophy modelling using our organoid system under way.
  
  Reviewer #2 (Public Review):
  
  This paper illustrates that PSCs can model myogenesis in vitro by mimicking the in vivo development of the somite and dermomyotome. The advantages of this 3D system include (1) better structural distinctions, (2) the persistence of progenitors, and (3) the spatial distribution (e.g. migration, confinement) of progenitors. The finding is important, with implications for disease modeling. Indeed, the authors tried a DMD model, although it suffered from a lack of deeper characterization.
  
  The differentiation protocol is based on a current understanding of myogenesis and is compelling. They characterized the organoids in depth (e.g. many time points and immunofluorescence). The evidence is solid, and could be improved further by more rigorous analyses and descriptions, as described below.
  
  Major comments:
  
  1) Consistency between different cell lines.
  
  I see the authors used a few different PSC lines. Since organoid efficiency differs between lines, it is important to note the consistency between lines.
  
  2) Heterogeneity among each organoid
  
  Let's say authors get 10 organoids in one well. Are they similar to each other? Does each organoid possess similar composition of cells? To determine the heterogeneity, the authors could try either FACS or multiple sectioning of each organoid.
  
  Concerning the raised issue of consistency between different PSC lines, we stated under Material and Methods that skeletal muscle organoids were generated from six hiPSC lines: CB-CD34 iPSC, DMD iPSC, DMD_iPS1, BMD_iPS1, LGMD2A iPSC, LGMD2A-isogenic iPSC. We have evaluated the organoid approach with six hiPSC lines with independent genetic backgrounds, with more than 5 independent derivations per line and, for the control line (CB CD34+), more than 20 derivations. At the time of creating the first preprint in 2020, our reported protocol was based on about 45 independent differentiation inductions.
  
  The heterogeneity among each organoid is a valid point, however very cumbersome to address with FACS or multiple sectioning.
  
  We have now addressed the heterogeneity of organoids within a line and the consistency of organoids between different lines by diffusion map analysis for early organoid stages and further single cell RNA seq analyses for mature stages and include this data as Figure 4 – figure supplement 6.
  
  3) Consistency of Ach current between organoids.
  
  Related to comment 2, are the currents consistent between each organoid? How many organoids were recorded in the figures? Also, please comment on whether the currents differ between young and aged organoids.
  
  The acetylcholine (ACh)-induced changes in holding currents in Figure 3K are representative recordings with n=6. The further recordings in Figure 3 – figure supplement 3, for organoids derived from three additional lines, were also recorded with n=6. In all analyses, cells for electrophysiological characterization were taken from 8-week-old organoids.
  
  4) Communication between neural cells and muscle?
  
  The authors did scRNAseq, but have not analysed it in depth. I would recommend doing receptor-ligand mapping to address whether neural cells and muscle are interacting.
  
  We are now providing a characterization of the cell-cell communication network for all clusters at week 12 of human skeletal muscle organoid development as the new Figure 4 – figure supplement 5.
  
  5) More characterization of DMD organoids.
  
  One of the key applications of muscle organoids is disease modelling. They have generated DMD muscle organoids, but these were rarely characterized except for currents. I recommend conducting immunofluorescence of DMD organoids to confirm structural changes. It would be very intriguing to see scRNAseq of DMD organoids and align it with disease etiology.
  
  We agree that data on disease modelling using DMD patient-derived cells would further enhance the manuscript's impact. The manuscript in its current form should present our skeletal muscle organoid differentiation protocol to the community, with a focus on the developmental processes that are mimicked by this model. We already have a study of Duchenne muscular dystrophy modelling using our organoid system under way.
  
  6) More characterization of engraft.
  
  The authors could compare the size of myotubes between mouse and human.
  
  We have quantitatively evaluated the myotubes in the transplantation experiment illustrated in Figure 4I,J. The mean diameter is 41 ± 6 µm for the human and 63 ± 7 µm for the mouse fibers (n=15 each). See Author response image 1.
  
  Author response image 1.
  
  Do PAX7+ satellite cells exist in the graft? To exclude the possibility that cell fusion events account for the observation, I recommend engrafting into GFP+ immunodeficient mice. Could the authors comment on how long the grafts survive?
  
  We would identify satellite cells within our grafts as the DAPI-stained (blue) nuclei surrounded by green human lamin A/C, as in Author response image 2. We have analysed all our mice for engraftment six weeks post transplantation, similar to other groups in the field.
  
  Author response image 2.
  
  Reviewer #1 (Recommendations For The Authors):
  
  The manuscript ends abruptly with the mouse transplantation experiment that appears a bit preliminary. It basically shows that cells survive but functional (or ultrastructural) integration is not shown. Suggest clarifying motivation and interpretation of the in vivo data.
  
  Back in 2020, our manuscript had already passed through detailed review processes in which we struggled because we did not provide any in vivo data concerning repopulation by our progenitor cells. Coming from the human pluripotent stem cell biology field, we have never completely understood the value of these hybrid experiments that test human cells in mice.
  
  For the current version, we have made additional efforts to transplant our progenitor cells into injured skeletal muscle, similarly to other groups in the field (Alexander et al., 2016, Marg et al., 2019, Tanoury et al., 2020) (Figure 4I,J). A proof that 3D-derived progenitor cells have a clear repopulation advantage over progenitor cells derived in a 2D protocol would go beyond what can be done within the scope of our study. We are still mainly basing our claims on the extended bulk and single-cell RNA-seq comparison to progenitor cells obtained by others. However, to address the demand of several experts to test our cells in vivo as well, we now also provide in vivo data in the current manuscript version.
  
  Within the Discussion we are suggesting further evaluations using these transplantations: It would be of interest for future studies to investigate whether increased engraftment can be achieved in 3D protocols (Faustino Martins et al., 2020; Shahriyari et al., 2022; ours) versus 2D patterned progenitor cells.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor comments:
  
  7) Plot CD82 gene on UMAP of Figure 4
  
  We had provided a CD82 scRNAseq analysis within the t-SNE plots of Figure 3 – figure supplement 1, which demonstrates that CD82-positive cells almost exclusively overlap with Pax7-positive cells, being a subcluster of them. We agree that the reader will benefit from this further analysis, and we are now providing in Author response image 3 additional CD82 and Pax7 UMAP plots on the myogenic progenitor / satellite cell clustering analysis of Figure 4F within the new Figure 4 – figure supplement 4E.
  
  Author response image 3.
  
  8) Immunofluorescence of CD82 in organoids
  
  We have tried CD82 immunofluorescence analysis on our organoids but are not very satisfied with the technical outcome. The available CD82 antibody seems to be primarily suited for FACS analysis and not for immunohistochemistry on slices.
  
  9) Change the red-green color of the heatmap. Color-blind readers cannot see it well.
  
  We have changed all heatmaps to yellow-purple in the main Figure 2G and the Supplemental Figures S2.1 and S3.1.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2020.09.14.295832v5
www.biorxiv.org

Decoding the complexity of delayed wound healing following Enterococcus faecalis infection

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This is an interesting study that performs scRNA-Seq on infected and uninfected wounds. The authors sought to understand how infection with E. faecalis influences the transcriptional profile of healing wounds. The analysis demonstrated that there is a unique transcriptional profile in infected wounds with specific changes in macrophages, keratinocytes, and fibroblasts. They also speculated on potential crosstalk between macrophages and neutrophils and macrophages and endothelial cells using NicheNet analysis and CellChat. Overall the data suggest that infection causes keratinocytes to not fully transition which may impede their function in wound healing and that the infection greatly influenced the transcriptional profile of macrophages and how they interact with other cells.
  
  Strengths:
  
  It is a useful dataset to help understand the impact of wound infection on the transcription of specific cell types. The analysis is very thorough in terms of transcriptional analysis and uses a variety of techniques and metrics.
  
  Weaknesses:
  
  Some drawbacks of the study are the following. First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study. Wound healing is a dynamic and variable process so understanding the full course of the wound healing response would be very important to understand the impact of infection on the healing wound. Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study. Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of reepithelialization like human wounds. So while the conclusions are generally supported the scope of the work is limited.
  
  Thank you for your thoughtful review and acknowledgment of the thoroughness of our analysis.
  
  First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study.
  
  We acknowledge your concerns regarding the limitations of our study, particularly regarding the small number of mice per group and the examination of only one time point post-wounding. We agree that a more comprehensive analysis across multiple time points would provide a deeper understanding of the temporal changes induced by infection. While our primary focus in this study was to elucidate the foundational responses to bacteria-infected wounds, we attempted to augment our analysis by incorporating publicly available datasets of similar nature. However, these datasets lacked power in terms of cell number and populations. Nonetheless, we have bolstered our analysis by applying a cross-entropy test on the integrated dataset and reporting its significance (Figure S1F), ensuring the robustness of our single-cell RNA sequencing datasets.
  
  Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study.
  
  We also recognize the significance of comparing infected wounds to unwounded skin to establish a baseline for transcriptional changes. While we attempted to incorporate publicly available unwounded skin samples into our analysis, we encountered limitations in the number of cells, particularly within the immune population. This constraint is addressed in the Limitations section of the manuscript.
  
  Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds.
  
  Regarding the concern about differences between murine and human wound healing mechanisms, we took measures during tissue isolation to mitigate this issue, extracting incisions of the wounds rather than contracted tissues. Despite the primary mode of wound closure in mice being contraction, we believe our analysis still offers valuable insights into cellular responses to infection relevant to human wound healing.
  
  We appreciate your constructive criticism of our study. Despite these constraints, we believe our work provides valuable insights into the transcriptional changes induced by infection in healing wounds.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The authors have performed a detailed analysis of the complex transcriptional status of numerous cell types present in wounded tissue, including keratinocytes, fibroblasts, macrophages, neutrophils, and endothelial cells. The comparison between infected and uninfected wounds is interesting and the analysis suggests possible explanations for why infected wounds are delayed in their healing response.
  
  Strengths:
  
  The paper presents a thorough and detailed analysis of the scRNAseq data. The paper is clearly written and the conclusions drawn from the analysis are appropriately cautious. The results provide an important foundation for future work on the healing of infected and uninfected wounds.
  
  Weaknesses:
  
  The analysis is purely descriptive and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing. The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.
  
  We are thankful for your acknowledgment of the thoroughness of our analysis and the cautious nature of our conclusions.
  
  The analysis is purely descriptive, and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing.
  
  Regarding your concern about the purely descriptive nature of our analysis and the lack of functional validation of identified factors, we agree on the importance of understanding the functional roles of transcriptional changes in wound healing. To address this limitation, we plan to conduct functional experiments, such as perturbation assays or in vivo validation studies, to validate the roles of specific factors identified in our analysis.
  
  The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.
  
  We acknowledge the importance of comparing wounded tissue to unwounded skin to establish a baseline for understanding transcriptional changes. This point is noted and acknowledged in the limitations section of our manuscript.
  
  We appreciate your feedback and assure you that we will consider your suggestions in future iterations of our research.
  
  Recommendations For The Authors:
  
  We are grateful for the positive overall assessment of our revised work by the reviewers. Critical comments on specific aspects of our work are listed verbatim below followed by our responses.
  
  Reviewer 1 (Recommendations for the Authors):
  
  (1) The figures are a bit cluttered and hard to parse out. The different parts of the figure seem to be scattered all over the place with no consistent order.
  
  Thank you for your feedback regarding the figures in our manuscript. We acknowledge your concern that some panels may appear cluttered and challenging to navigate. In response, we made concerted efforts to declutter certain panels, taking into account page size constraints and ensuring a minimum font size for readability.
  
  (2) I didn't really understand what the last sentence on page 6 meant. Is this meant to say that these could be biomarkers of infection?
  
  We thank the reviewer for noting this lack of clarity. We revised the statement.
  
  Updated manuscript (lines 111-113)
  
  “Overall, the persistent E. faecalis infection contributed to higher Tgfb1 expression, whilst Pdgfa levels remained low, correlating with delayed wound healing.”
  
  (3) A reference on page 19 didn't format correctly.
  
  We thank the reviewer for catching the typos. We corrected the reference formatting.
  
  Updated manuscript (lines 503-505)
  
  “We confirm the immune-suppressive role of E. faecalis in wound healing, consistent with previous findings in different experimental settings (Chong et al., 2017; Kao et al., 2023; Tien et al., 2017).”
  
  (4) The title doesn't really address the scope of the finding which goes beyond immunomodulatory.
  
  The reviewer is correct! We therefore revised the title to cover all aspects of the study as:
  
  “Decoding the complexity of delayed wound healing following Enterococcus faecalis infection”
  
  Reviewer 2 (Recommendations for the Authors):
  
  (1) On page 6, the expression of Tgfb1 is described as "aggravated" by wounding alone. I am not sure whether this means Tgfb1 levels are increased or decreased. It appears from the data that it is increased, which was confusing to me since I interpreted "aggravated" as meaning decreased. So perhaps a different more straightforward word could be used to describe the data.
  
  We modified this ambiguous statement to:
  
  Updated manuscript (lines 105-106)
  
  “By contrast, wounding alone resulted in higher transforming growth factor beta 1 (Tgfb1) expression.”
  
  (2) On page 7, the authors state that "cells from infected wounds...demonstrated distinct clustering patterns compared to cells from uninfected wounds (Figure S1F)" but when I look at the data in this figure, I cannot really see a difference. Perhaps the differences could be more clearly highlighted?
  
  Thank you for pointing out this issue. We utilized the cross-entropy test for statistical comparison, employing UMAP embedding space data. While the data underwent batch correction based on infection status, the UMAP plots for each condition may appear visually similar. However, it is important to note that the number of cells per cluster varies significantly between the infected and uninfected conditions. This aspect influences the selection of points (cells) and their nearest neighbours for statistical testing within each cluster in the embedding space. To address this concern, we have included a table indicating the number of cells per cell type alongside the plot (Figure S1F), providing additional context for the interpretation of our results.
  
  Author response table 1.
  
  Author response image 1.
  
  (3) On page 8, Zeb2hi cells are described as "immunosuppressive" and yet the genes they are highlighted as expressing include Cxcl2 and IL1b, which I would classify as inflammatory, not immunosuppressive. Can the authors be a bit more clear on why they describe the phenotype of these cells as "immunosuppressive"?
  
  We agree with the reviewer that this is a bit counterintuitive. Conventionally, CXCL2 is thought to be a chemoattractant for neutrophil recruitment. However, the infection-specific keratinocyte cluster expressing Cxcl2, Il1b, and Wfdc17 along with Zeb2 and Thbs1 shows myeloid-derived suppressor cell-like features, which play immunosuppressive roles during infection and in cancer (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021).
  
  Updated manuscript (lines 159-163)
  
  “As the barrier to pathogens, keratinocytes secrete a broad range of cytokines that can induce inflammatory responses (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021). However, Zeb2hi keratinocytes co-expressing Cxcl2, Il1b, and Wfdc17, indicate myeloid-derived suppressor cell-like phenotype which implies an immunosuppressive environment (Hofer et al., 2021; Veglia et al., 2021).”
  
  (4) On pages 8-9, Keratinocytes are described to express MHC class II. I find this quite unexpected since class II is usually thought to be expressed primarily by APCs such as DCs and B cells. Is there a precedent for keratinocytes to express class II? The authors should acknowledge that this is unexpected and in need of further validation, or support the claim with references in which class II expression has been previously observed on keratinocytes (and is thus not unexpected)
  
  Although MHC class II expression is predominantly on immune cells, an antigen-presenting role for keratinocytes has been reported in many studies (Banerjee et al., 2004; Black et al., 2007; Carr et al., 1986; Gawkrodger et al., 1987; Jiang et al., 2020; Li et al., 2022; Oh et al., 2019; Tamoutounour et al., 2019). Therefore, the antigen-presenting role of keratinocytes is known and expected, and we think that this should be further investigated in the context of wound infection.
  
  Updated manuscript (lines 177-179)
  
  “These genes are associated with the major histocompatibility complex (MHC) class II, suggesting a self-antigen presenting keratinocyte population, which have a role in costimulation of T cell responses (Meister et al., 2015; Tamoutounour et al., 2019).”
  
  REFERENCES
  
  Alshetaiwi, H., Pervolarakis, N., McIntyre, L. L., Ma, D., Nguyen, Q., Rath, J. A., Nee, K., Hernandez, G., Evans, K., Torosian, L., Silva, A., Walsh, C., & Kessenbrock, K. (2020). Defining the emergence of myeloid-derived suppressor cells in breast cancer using single-cell transcriptomics. Science Immunology, 5(44), eaay6017. https://doi.org/10.1126/sciimmunol.aay6017
  
  Banerjee, G., Damodaran, A., Devi, N., Dharmalingam, K., & Raman, G. (2004). Role of keratinocytes in antigen presentation and polarization of human T lymphocytes. Scandinavian Journal of Immunology, 59(4), 385–394. https://doi.org/10.1111/j.0300-9475.2004.01394.x
  
  Black, A. P. B., Ardern-Jones, M. R., Kasprowicz, V., Bowness, P., Jones, L., Bailey, A. S., & Ogg, G. S. (2007). Human keratinocyte induction of rapid effector function in antigen-specific memory CD4+ and CD8+ T cells. European Journal of Immunology, 37(6), 1485–1493. https://doi.org/10.1002/eji.200636915
  
  Carr, M. M., McVittie, E., Guy, K., Gawkrodger, D. J., & Hunter, J. A. (1986). MHC class II antigen expression in normal human epidermis. Immunology, 59(2), 223–227.
  
  Gawkrodger, D. J., Carr, M. M., McVittie, E., Guy, K., & Hunter, J. A. (1987). Keratinocyte expression of MHC class II antigens in allergic sensitization and challenge reactions and in irritant contact dermatitis. The Journal of Investigative Dermatology, 88(1), 11–16. https://doi.org/10.1111/1523-1747.ep12464641
  
  Jiang, Y., Tsoi, L. C., Billi, A. C., Ward, N. L., Harms, P. W., Zeng, C., Maverakis, E., Kahlenberg, J. M., & Gudjonsson, J. E. (2020). Cytokinocytes: The diverse contribution of keratinocytes to immune responses in skin. JCI Insight, 5(20), e142067. https://doi.org/10.1172/jci.insight.142067
  
  Li, D., Cheng, S., Pei, Y., Sommar, P., Kärner, J., Herter, E. K., Toma, M. A., Zhang, L., Pham, K., Cheung, Y. T., Liu, Z., Chen, X., Eidsmo, L., Deng, Q., & Xu Landén, N. (2022). Single-Cell Analysis Reveals Major Histocompatibility Complex II‒Expressing Keratinocytes in Pressure Ulcers with Worse Healing Outcomes. The Journal of Investigative Dermatology, 142(3 Pt A), 705–716. https://doi.org/10.1016/j.jid.2021.07.176
  
  Oh, S., Chung, H., Chang, S., Lee, S.-H., Seok, S. H., & Lee, H. (2019). Effect of Mechanical Stretch on the DNCB-induced Proinflammatory Cytokine Secretion in Human Keratinocytes. Scientific Reports, 9(1), 5156. https://doi.org/10.1038/s41598-019-41480-y
  
  Siriwach, R., Ngo, A. Q., Higuchi, M., Arima, K., Sakamoto, S., Watanabe, A., Narumiya, S., & Thumkeo, D. (2022). Single-cell RNA sequencing identifies a migratory keratinocyte subpopulation expressing THBS1 in epidermal wound healing. iScience, 25(4), 104130. https://doi.org/10.1016/j.isci.2022.104130
  
  Tamoutounour, S., Han, S.-J., Deckers, J., Constantinides, M. G., Hurabielle, C., Harrison, O. J., Bouladoux, N., Linehan, J. L., Link, V. M., Vujkovic-Cvijin, I., Perez-Chaparro, P. J., Rosshart, S. P., Rehermann, B., Lazarevic, V., & Belkaid, Y. (2019). Keratinocyte-intrinsic MHCII expression controls microbiota-induced Th1 cell responses. Proceedings of the National Academy of Sciences of the United States of America, 116(47), 23643–23652. https://doi.org/10.1073/pnas.1912432116
  
  Veglia, F., Sanseviero, E., & Gabrilovich, D. I. (2021). Myeloid-derived suppressor cells in the era of increasing myeloid cell diversity. Nature Reviews. Immunology, 21(8), 485–498. https://doi.org/10.1038/s41577-020-00490-y
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.16.566967v3
www.biorxiv.org

Transplantation of exogenous mitochondria mitigates myocardial dysfunction after cardiac arrest

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer 3 (Public review):
  
  Major comments:
  
  (1) Can isolated mitochondria be transported to cultured cardiomyocytes, such as H9C2 cells, in vitro?
  
  Thank you for this insightful question. Mitochondria are highly dynamic organelles that play a crucial role in cellular energy metabolism. When cells encounter various stressors and increased energy demands, they can benefit from the incorporation of exogenous mitochondria. In 2013, Masuzawa et al. (Masuzawa et al., 2013) were the first to demonstrate that transplanted mitochondria are internalized by cardiomyocytes 2 to 8 hours after transplantation, significantly contributing to the preservation of myocardial energetics. Ali et al. (Ali et al., 2020) discovered that exogenous mitochondria could be internalized by H9C2 cardiomyocytes as quickly as 5 minutes after co-incubation, resulting in an acute enhancement of normal cellular bioenergetics following mitochondrial transplantation. Pacak et al. (Pacak et al., 2015) established that the internalization of mitochondria into cardiomyocytes is time-dependent and occurs through actin-dependent endocytosis.
  
  Collectively, this evidence illustrates that exogenous mitochondria can be effectively internalized by H9C2 cells and other cardiomyocytes. Our experiments further confirmed that transplanted mitochondria can be incorporated by the myocardium in vivo.
  
  (2) The description of results in the manuscript is too simple. It lacks detail on the rationale behind the experiments and the significance of the data.
  
  Thank you for this suggestion. We have realized that the results in the submitted manuscript have not been adequately interpreted. We have added necessary details on the rationale behind the experiments and the significance of the data to the results section (Lines 57~59, 69~73, 81~88, 91~98, 100~102, 103~104, 109~115, 124~129, 135~146, 149~157, 159~161, 168~169, 178~179). We would like to express our gratitude to the reviewers once again and hope that our modifications will meet their requirements.
  
  (3) The authors demonstrate that mitochondrial transplantation reduces cardiomyocyte apoptosis. Therefore, Western blot analysis of apoptosis-related caspases could be provided for further confirmation.
  
  Thank you for this constructive comment. We fully agree with the reviewer's perspective on the detection of apoptosis-related caspases and have conducted a Western blot assay to investigate the impact of mitochondria on myocardial tissue. Our new evidence indicates that rats receiving mitochondrial transplantation exhibited reduced expression of cleaved caspase-3 compared with those in the NS and Vehicle groups (Fig. 6G, 6H, Lines 168~169), suggesting that mitochondrial transplantation decreased the level of apoptosis in the myocardium.
  
  (4) Do donor mitochondria fuse with recipient mitochondria? Relevant experiments and data should be provided to address this question.
  
  This is a very helpful comment. Investigating the fate of transplanted mitochondria in myocardial cells after CA is of great significance. The internalization of exogenous mitochondria has been observed across various cell types (Liu et al., 2021; Shanmughapriya et al., 2020). Notably, a recent study indicated that after being incorporated into host cells, isolated mitochondria are transported to endosomes and lysosomes. Subsequently, most of these mitochondria escape from these compartments and fuse with the endogenous mitochondrial network (Cowan et al., 2017). We have discussed this in the manuscript (Lines 217~220).
  
  Oxidative stress, a pathophysiological phenomenon common to cells suffering from ischemia/reperfusion insults after CA/CPR, was implicated to promote internalization and survival of exogenous mitochondria (Aharoni-Simon et al., 2022). In our study, we confirmed that mitochondrial transplantation can enhance the metabolism of cardiomyocytes, increase ATP level, and reduce reactive oxygen species (ROS). Our results indirectly confirm that isolated mitochondria can successfully fuse with myocardial mitochondria.
  
  (5) In Figure 5A, the histograms are not labeled with the specific experimental groups.
  
  We apologize for this oversight. We have labeled the specific experimental groups in the histograms presented in Figure 6B and 6C (originally Figure 5A).
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) The age, gender, and strain of the donor rats should be specified in the Methods section. Additionally, it is not obvious what doses of mitochondria were injected into the rats and how the dosage was initially determined.
  
  Thanks for your suggestion. We have included relevant information about the donor rats in the Methods section (Lines 361~362).
  
  In the Mito group, each animal received 0.5 mL of a 1 × 10^9/mL mitochondrial suspension (Lines 342~345). A considerable amount of data has demonstrated the efficacy of mitochondrial transplantation in cellular, animal, and human research (Alemany et al., 2024; Kaza et al., 2017; Liu et al., 2023). However, there is currently no evidence to determine the optimal dosage for transplantation. In previous research, isolated mitochondria (1 × 10^9) delivered to the left coronary ostium in pigs were shown to be a viable treatment modality in cardiac ischemia-reperfusion injury (Blitzer et al., 2020; Guariento et al., 2020). Additionally, a dose of 1 × 10^9 mitochondria achieves the maximal hyperemic effect when administered via intracoronary injection (Shin et al., 2019). Considering that Sprague-Dawley (SD) rats are smaller than pigs and that there is a loss of mitochondria during pulmonary circulation, we adopted a mitochondrial transplantation dose of 5 × 10^8. We will explore the optimal dosage in our future research.
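  
  The adopted dose follows directly from the injected volume and the suspension concentration quoted above. A minimal sketch of that arithmetic, in which the function name `total_dose` is purely illustrative and not part of the authors' methods:

```python
def total_dose(volume_ml: float, concentration_per_ml: float) -> float:
    """Return the number of mitochondria delivered in one injection.

    Hypothetical helper for illustration only: dose = volume x concentration.
    """
    return volume_ml * concentration_per_ml


# Values quoted in the response: 0.5 mL of a 1e9 per mL suspension.
dose = total_dose(0.5, 1e9)
print(f"{dose:.1e}")  # 5.0e+08
```

  Under these figures, each rat receives half the 1 × 10^9 dose reported for the pig studies.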
  
  (2) In Figure 4a, the number of transplanted mitochondria appears to be very low. Considering the high number of mitochondria present in cardiomyocytes, it is unclear whether this small amount of transplanted mitochondria can significantly impact complex II activity and ATP levels in myocardial tissues, as shown in Figures 4b-d, or improve survival post-ROSC, as shown in Figure 2d. Could the observed benefits of mitochondrial transplantation be due to the indirect effects of the injected mitochondria, such as the release of mitochondrial contents, rather than the mitochondria themselves, as discussed by Bertero et al. (2021, Circ. Research)? This issue should be addressed in the manuscript.
  
  Thanks for this wonderful comment. As presented in Fig. 4 (originally Figure 4A), our results indicated the internalization of mitochondria by the myocardium, shown by colocalization of Mito-tracker and the myocardium marker. We would like to make the following points regarding Fig. 4:
  
  (1) Significant left ventricular systolic and diastolic dysfunction that occurs in the myocardium shortly after ROSC is referred to as post-cardiac arrest myocardial dysfunction (PAMD) (Laurent, et al., 2002). The efficacy of mitochondrial transplantation for the heart following ischemia-reperfusion injury has been demonstrated in cellular, animal, and human studies, despite inadequate mitochondrial internalization (Liu, et al., 2023). A low number of transplanted mitochondria may therefore still improve cardiac function.
  
  (2) Only biologically active mitochondria can be specifically labeled with Mito-tracker. Therefore, the mitochondria taken up by cardiomyocytes possess complete functionality. Previous results have demonstrated that mitochondrial contents, such as nonviable mitochondria, mitochondrial fractions, mitochondrial deoxyribonucleic acid, ribonucleic acid, exogenous adenosine diphosphate and ATP, do not provide protection to the ischemic heart (McCully, et al., 2017; McCully, et al., 2009).
  
  (3) The specific mechanism for mitochondrial internalization has yet to be fully elucidated. We fully agree with the reviewer's opinion that other mechanisms of mitochondrial transplantation may play a role in cardiac protection. Multiple mechanisms may be involved in the cardioprotective effect of mitochondrial transplantation, and we are actively seeking a reasonable approach to verify these hypotheses in an ongoing study (Lines 236~246).
  
  (3) In Figure 4g, the claims regarding sarcomere length, mitochondrial structure, the number of cristae, accumulated calcium etc. seem to rely on the visual interpretation of representative images. To ensure a reliable interpretation of the data, a blinded quantification of each image in each group should be conducted. The same applies to the claims made in Figure 5E.
  
  Thanks for this suggestion. We have quantitatively evaluated the electron microscope images and HE images of the myocardium to ensure reliable interpretation. Corresponding supplements have been added to the methods (Lines 433~441, 494~496), results sections (Lines 109~115, 178~179), and Figures 5C, 5D, 6K and 6H (originally Figures 4G and 5E).
  
  (4) In line 69, it is unclear why the authors claim that MAP and HR decrease at 1, 2, 3, and 4 hours after ROSC in all groups compared to the Sham group, despite stating in line 72 that "MAP and HR did not differ at any observational time points (P>0.05, Figure 2C)."
  
  We apologize for our inaccurate phrasing. In the presented study, there was no statistically significant difference between MAP and HR at any observational timepoints (P>0.05, Figure 2C). In the NS, Vehicle and Mito groups, the MAP and HR decreased at 1, 2, 3, and 4 hours after ROSC, reaching their nadir at 1 hour. Subsequently, MAP and HR increased gradually but did not show any statistically significant differences compared with the Sham group. (Lines 69~73).
  
  (5) The absence of increased mitochondrial content in the mito-groups should be discussed further in the manuscript.
  
  Thank you for your suggestion. We discussed the reasons why the mass of isolated mitochondria did not increase in Lines 224~235.
  
  (6) The N in Figure 5d should be provided.
  
  Thanks for your suggestion. We have revised the figure legend to include the N for Figure 6F (originally Figure 5D).
  
  (7) Figure 6 demonstrates content beyond the findings in this manuscript. This reviewer recommends limiting the graphical abstract to the findings specifically in this paper.
  
  Thanks for your great advice. We have revised Figure 7 (originally Figure 6) and restricted the graphical abstract to the findings presented in this paper.
  
  Minor issues:
  
  (8) The order of data in Figure 4 should be consistent with the text in the manuscript. Figures 4E-F-G are described before Figures 4B-C-D in the text. Similarly, Figure 5F was described before Figure 5E in the text.
  
  Thanks for your great advice. We have rearranged the order of the pictures to align with the text. Thank you for your proposal.
  
  (9) In Figure 4A, the locations of the epicardium, muscle, and endocardium should be indicated for clarity. Also, it is not obvious where the close-up box refers to in the actual image.
  
  Thank you for your suggestion. We primarily sought evidence of mitochondrial internalization within the endocardium, as injury occurs there first during myocardial ischemia (Kuwada and Takenaka, 2000). The close-up box in Fig. 4 refers to the endocardium.
  
  (10) In Figure 5A, the group annotations are missing from the MDA and SOD graphs. The standard deviation bars for the SOD vehicle and SOD mito groups (3rd and 4th columns) appear to overlap. Can the authors provide the actual p-values?
  
  We apologize for the omission of group annotations in the MDA and SOD graphs. The p-value between the Vehicle group and the Mito group was 0.004. The SOD activity levels of myocardial samples in each group are presented in Author response table 1.
  
  Author response table 1.
  
  The SOD activity levels of myocardial samples in groups (U/mg protein)
  
  (11) In line 58, NS abbreviation is used without defining what NS is.
  
  We apologize for not including the full name of NS. NS is the abbreviation of normal saline. It has now been defined in the manuscript (Line 58).
  
  (12) In line 118, what MDA stands for is not described until line 348. MDA should be defined in the text for the general audience.
  
  We apologize for this. We have defined it in the manuscript. (Lines 156~157)
  
  (13) In line 192, the authors state that "mitochondrial transplantation... increased the expression of antioxidant enzymes after four hours of ROSC," while only SOD activity levels were assessed in the manuscript. Increased activity levels do not necessarily imply an increase in expression levels. This discrepancy should be addressed in the Discussion section.
  
  We apologize for confusing ‘activity’ with ‘expression’. Although mitochondrial transplantation has been shown to be involved in the restoration of manganese superoxide dismutase levels after ischemic insults (Tashiro, et al., 2022), the changes in antioxidant enzyme expression were not evaluated at the protein level in this paper. To avoid misunderstandings, we have replaced the term ‘expression’ with ‘activity’ as appropriate. (Lines 268~271)
  
  (14) Mitochondria from non-ischemic gastrocnemius muscle of health donor animals were isolated and a manner that maximized their healing potential. This sentence is not clear.
  
  We apologize for the confusing sentence in the original manuscript. To improve clarity, we have revised that sentence. We isolated mitochondria from allogeneic gastrocnemius muscle tissue of healthy rats and maintained optimal mitochondrial activity and therapeutic effects. (Lines 199~201)
  
  Minor grammar issues:
  
  In line 153, mitochondrial should be mitochondria.
  
  Figure 2D: Percent servival should be percent survival.
  
  There should be a blank in complex IIactivity Figure 4B, and complex IV activity in Figure 4C.
  
  In line 134, Four hours of ROSC, Tissue samples from. Tissue is capital.
  
  In line 190, Similaerly should be similarly.
  
  Thank you for your valuable comments. We apologize for the grammatical issues caused by our oversight. We have made the necessary corrections in the manuscript (Lines 198, 179, and 268) and in the figures: Figure 2D, Figure 5E (originally Figure 4B), and Figure 5F (originally Figure 4C).
  
  Reviewer #2 (Recommendations For The Authors):
  
  Some details are lacking clarity, such as the rationale behind choosing certain doses or time points for interventions.
  
  Thank you for this valuable suggestion. We have explained the rationale behind the selection of the dosage and the timing of the intervention. (Lines 201~212)
  
  I would suggest verifying mitochondrial function using the seahorse experiment oxygen consumption, and to check mitochondrial oxidative stress. I would also suggest checking the mitochondrial permeability transition pore opening, using for example calcein cobalt quenching or simply a kit to examine this further.
  
  Thank you for your valuable advice. In our manuscript, we added results regarding mitochondrial reactive oxygen species (ROS) and the mitochondrial permeability transition pore (mPTP) opening. As anticipated, mitochondrial transplantation reduced the increase in mitochondrial ROS and the mPTP opening in ischemic myocardium. (Lines 135~146, 149~157, 442~455, 460~476, Figure 5H, 5I, 6A)
  
  We agree that Seahorse oxygen consumption measurements would be beneficial for understanding the intricacies of their interactions and enhancements. Additionally, Ali et al. (Ali, et al., 2020) have demonstrated that introducing non-autologous mitochondria from healthy skeletal muscle cells into normal cardiomyocytes results in a short-term improvement in bioenergetics, as measured using a Seahorse Extracellular Flux Analyzer. We have not yet conducted cellular experiments, because the process of isolating cells from the myocardial tissue of adult SD rats for Seahorse analysis can lead to secondary damage to the myocardial cells (Jacobson, et al., 1985). In this experiment, we measured ATP content and the activity of mitochondrial complexes to evaluate energy changes after mitochondrial transplantation. We will conduct cell experiments and utilize Seahorse measurements to further clarify the alterations in myocardial energy in the future.
  
  For Figure 3B, it would be beneficial to include the relative quantification of the mitochondrial marker COX-IV. Additionally, if feasible, I suggest verifying the representation of the mitochondria outer membrane TOM20 or VDAC.
  
  Thank you for your great suggestion. As suggested, we added TOM20 to assess the purity of the isolated mitochondria and reached the same conclusion: the isolated mitochondria exhibited high purity (Figure 3B). TOM20 was expressed in both muscle lysates and isolated mitochondria, whereas GAPDH was exclusively found in the muscle lysate. (We re-validated the purity of the mitochondria by using relative quantification of TOM20 and COX IV.)
  
  In Figure 2C, the clarity of the graphs depicting both arterial pressure (MAP) and heart rate (HR) is lacking and could potentially confuse the reader. I recommend incorporating color coding instead of relying solely on symbols, or by presenting the data in a more comprehensible format and that aligns with graph B as well.
  
  Thank you for your constructive comments. We have color-coded the diagrams in Figure 2B and 2C.
  
  In Figure 4A, please include high-magnification of the mitochondria to provide a more detailed examination.
  
  Thank you for this insightful comment. We have provided a high-magnification image of the mitochondria in Figure 4.
  
  Regarding lines 81-82, I recommend specifying the sentence more precisely for better clarity and understanding.
  
  Thank you for your comments. We have revised the sentences in lines 83~86 to enhance their clarity for readers.
  
  In the Materials and Methods section, it is crucial to provide precise details. For instance, when staining the exogenous mitochondria with MitoTracker Red, it is important to specify the duration of staining, such as the standard 20 minutes for example. Additionally, it is advisable to mention the number of times these mitochondria were washed with the respiratory solution to ensure thorough removal of excess MitoTracker, thus preventing unintended staining of endogenous mitochondria with MitoTracker red upon injection of pre-labeled mitochondria.
  
  Thank you for your suggestion. We have added the necessary details regarding Mito-Tracker Red staining (Lines 373~376). We have also added other details where necessary (Lines 373~376, 379~382, 395~396, 397~400, 487~488). We appreciate your suggestion once again.
  
  The sensitivity of JC-1 dye to temperature and pH fluctuations underscores the necessity for meticulous experimental conditions. It is crucial for the authors to elucidate why they chose to maintain the samples at 4 °C for 60 minutes, especially considering the dye's optimal operating temperature of 25 °C. Providing a rationale behind this deviation from standard protocol would enhance the scientific rigor and reproducibility of the study. Please add more information on the objectives used in the fluorescence microscope (BX53, OLYMPUS, Tokyo, Japan) and the software used.
  
  We sincerely apologize for the mistake in this sentence. The purified mitochondria, which are stained with JC-1, should be stored at 4°C and examined using a fluorescence microscope within 60 minutes. Purified mitochondria were incubated with JC-1 staining solution at 37°C for 20 minutes. The fluorescence microscope used in our experiment is equipped with a WHN 10/22 eyepiece, and the software version is OLYMPUS cellSens Standard 3.2. (Lines 379~382)
  
  Moreover, in the context of immunoblotting, it is imperative for the authors to furnish detailed information regarding the preparation of muscle tissue homogenates. Specifically, clarification is needed regarding the solution utilized for tissue grinding. Did the authors employ ice-cold RIPA lysis buffer or an alternative lysis buffer, supplemented with a protease inhibitor cocktail? Such details are pivotal for methodological transparency.
  
  Thanks for this wonderful comment. In the methods section, we added detailed information about protein extraction. (Lines 383~385)
  
  Furthermore, it would be beneficial for the authors to specify the instrument employed for scanning the immunoblots, as well as the software utilized for subsequent analysis of the immunoblot images. Providing this information would not only enhance the reproducibility of the findings but also facilitate the evaluation of the experimental results.
  
  Thank you for your suggestion. We have included the instrument used for scanning the Western blot, as well as the software used for image analysis in the manuscript. (Lines 397~400)
  
  Authors must exercise caution against copy-pasting. In line 282, there's a query regarding how the mitochondria were isolated. It is recommended to cite a specific reference and offer more comprehensive details. Despite the authors referencing a number within the text, the absence of numbered references makes it challenging to cross-reference.
  
  Thank you for pointing this out; we have updated the citation accordingly (Line 361).
  
  Figure 5C please double check some misspelling label errors (e.g: Vehicle and not Vehucle).
  
  We apologize for the misspelling in Figure 6E (originally Figure 5C) and have corrected it. Additionally, we have thoroughly reviewed the text for spelling errors and sincerely apologize once again for the previous mistakes. (Lines 249~252, 322)
  
  References:
  
  Aharoni-Simon M, Ben-Yaakov K, Sharvit-Bader M, Raz D, Haim Y, Ghannam W, Porat N, Leiba H, Marcovich A, Eisenberg-Lerner A, Rotfogel Z. 2022. Oxidative stress facilitates exogenous mitochondria internalization and survival in retinal ganglion precursor-like cells. SCI REP-UK 12:5122. doi:10.1038/s41598-022-08747-3
  
  Alemany VS, Nomoto R, Saeed MY, Celik A, Regan WL, Matte GS, Recco DP, Emani SM, Del NP, McCully JD. 2024. Mitochondrial transplantation preserves myocardial function and viability in pediatric and neonatal pig hearts donated after circulatory death. J THORAC CARDIOV SUR 167:e6-e21. doi:10.1016/j.jtcvs.2023.05.010
  
  Ali PP, Kenney MC, Kheradvar A. 2020. Bioenergetics Consequences of Mitochondrial Transplantation in Cardiomyocytes. J AM HEART ASSOC 9:e14501. doi:10.1161/JAHA.119.014501
  
  Blitzer D, Guariento A, Doulamis IP, Shin B, Moskowitzova K, Barbieri GR, Orfany A, Del NP, McCully JD. 2020. Delayed Transplantation of Autologous Mitochondria for Cardioprotection in a Porcine Model. ANN THORAC SURG 109:711-719. doi:10.1016/j.athoracsur.2019.06.075
  
  Cowan DB, Yao R, Thedsanamoorthy JK, Zurakowski D, Del NP, McCully JD. 2017. Transit and integration of extracellular mitochondria in human heart cells. SCI REP-UK 7:17450. doi:10.1038/s41598-017-17813-0
  
  Guariento A, Blitzer D, Doulamis I, Shin B, Moskowitzova K, Orfany A, Ramirez-Barbieri G, Staffa SJ, Zurakowski D, Del NP, McCully JD. 2020. Preischemic autologous mitochondrial transplantation by intracoronary injection for myocardial protection. J THORAC CARDIOV SUR 160:e15-e29. doi:10.1016/j.jtcvs.2019.06.111
  
  Jacobson SL, Banfalvi M, Schwarzfeld TA. 1985. Long-term primary cultures of adult human and rat cardiomyocytes. BASIC RES CARDIOL 80 Suppl 1:79-82. doi:10.1007/978-3-662-11041-6_15
  
  Kaza AK, Wamala I, Friehs I, Kuebler JD, Rathod RH, Berra I, Ericsson M, Yao R, Thedsanamoorthy JK, Zurakowski D, Levitsky S, Del NP, Cowan DB, McCully JD. 2017. Myocardial rescue with autologous mitochondrial transplantation in a porcine model of ischemia/reperfusion. J THORAC CARDIOV SUR 153:934-943. doi:10.1016/j.jtcvs.2016.10.077
  
  Kuwada Y, Takenaka K. 2000. [Transmural heterogeneity of the left ventricular wall: subendocardial layer and subepicardial layer]. J CARDIOL 35:205-218.
  
  Laurent I, Monchi M, Chiche JD, Joly LM, Spaulding C, Bourgeois B, Cariou A, Rozenberg A, Carli P, Weber S, Dhainaut JF. 2002. Reversible myocardial dysfunction in survivors of out-of-hospital cardiac arrest. J AM COLL CARDIOL 40:2110-2116. doi:10.1016/s0735-1097(02)02594-9
  
  Liu D, Gao Y, Liu J, Huang Y, Yin J, Feng Y, Shi L, Meloni BP, Zhang C, Zheng M, Gao J. 2021. Intercellular mitochondrial transfer as a means of tissue revitalization. SIGNAL TRANSDUCT TAR 6:65. doi:10.1038/s41392-020-00440-z
  
  Liu Q, Liu M, Yang T, Wang X, Cheng P, Zhou H. 2023. What can we do to optimize mitochondrial transplantation therapy for myocardial ischemia-reperfusion injury? MITOCHONDRION 72:72-83. doi:10.1016/j.mito.2023.08.001
  
  Masuzawa A, Black KM, Pacak CA, Ericsson M, Barnett RJ, Drumm C, Seth P, Bloch DB, Levitsky S, Cowan DB, McCully JD. 2013. Transplantation of autologously derived mitochondria protects the heart from ischemia-reperfusion injury. AM J PHYSIOL-HEART C 304:H966-H982. doi:10.1152/ajpheart.00883.2012
  
  McCully JD, Cowan DB, Emani SM, Del NP. 2017. Mitochondrial transplantation: From animal models to clinical use in humans. MITOCHONDRION 34:127-134. doi:10.1016/j.mito.2017.03.004
  
  McCully JD, Cowan DB, Pacak CA, Toumpoulis IK, Dayalan H, Levitsky S. 2009. Injection of isolated mitochondria during early reperfusion for cardioprotection. AM J PHYSIOL-HEART C 296:H94-H105. doi:10.1152/ajpheart.00567.2008
  
  Pacak CA, Preble JM, Kondo H, Seibel P, Levitsky S, Del NP, Cowan DB, McCully JD. 2015. Actin-dependent mitochondrial internalization in cardiomyocytes: evidence for rescue of mitochondrial function. BIOL OPEN 4:622-626. doi:10.1242/bio.201511478
  
  Shanmughapriya S, Langford D, Natarajaseenivasan K. 2020. Inter and Intracellular mitochondrial trafficking in health and disease. AGEING RES REV 62:101128. doi:10.1016/j.arr.2020.101128
  
  Shin B, Saeed MY, Esch JJ, Guariento A, Blitzer D, Moskowitzova K, Ramirez-Barbieri G, Orfany A, Thedsanamoorthy JK, Cowan DB, Inkster JA, Snay ER, Staffa SJ, Packard AB, Zurakowski D, Del NP, McCully JD. 2019. A Novel Biological Strategy for Myocardial Protection by Intracoronary Delivery of Mitochondria: Safety and Efficacy. JACC-BASIC TRANSL SC 4:871-888. doi:10.1016/j.jacbts.2019.08.007
  
  Tashiro R, Bautista-Garrido J, Ozaki D, Sun G, Obertas L, Mobley AS, Kim GS, Aronowski J, Jung JE. 2022. Transplantation of Astrocytic Mitochondria Modulates Neuronal Antioxidant Defense and Neuroplasticity and Promotes Functional Recovery after Intracerebral Hemorrhage. J NEUROSCI 42:7001-7014. doi:10.1523/JNEUROSCI.2222-21.2022
  
URL

biorxiv.org/content/10.1101/2024.04.29.591638v2
www.biorxiv.org www.biorxiv.org

CPT1A Mediates Radiation Sensitivity in Colorectal Cancer

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Fats and lipids serve many important roles in cancers, including serving as important fuels for energy metabolism in cancer cells by being oxidized in the mitochondria. The process of fatty acid oxidation is initiated by the enzyme carnitine palmitoyltransferase 1A (CPT1A), and the function and targetability of CPT1A in cancer metabolism and biology have been heavily investigated. This includes studies that have found important roles for CPT1A in colorectal cancer growth and metastasis.
  
  In this study, Chen and colleagues use analysis of patient samples and functional interrogation in animal models to examine the role CPT1A plays in colorectal cancer (CRC). The authors find that CPT1A expression is decreased in CRC compared to paired healthy tissue and that lower expression correlates with decreased patient survival over time, suggesting that CPT1A may suppress tumor progression. To functionally interrogate this hypothesis, the authors both use CRISPR to knockout CPT1A in a CRC cell line that expresses CPT1A and overexpress CPT1A in a CRC cell line with low expression. In both systems, increased CPT1A expression decreased cell survival and DNA repair in response to radiation in culture. Further, in xenograft models, CPT1A decreased tumor growth basally and radiotherapy could further decrease tumor growth in CPT1A-expressing tumors. As CRC is often treated with radiotherapy, the authors argue this radiosensitization driven by CPT1A could explain why CPT1A expression correlates with increased patient survival.
  
  Lastly, Chen and colleagues sought to understand why CPT1A suppresses CRC tumor growth and sensitizes the tumors to radiotherapy in culture. The antioxidant capacity of cells can increase cell survival, so the authors examine antioxidant gene expression and levels in CPT1A-expressing and non-expressing cells. CPT1A expression suppresses the expression of antioxidant metabolism genes and lowers levels of antioxidants. Antioxidant metabolism genes can be regulated by the FOXM1 transcription factor, and the authors find that CPT1A expression regulates FOXM1 levels and that antioxidant gene expression can be partially rescued in CPT1A-expressing CRC cells. This leads the authors to propose the following model: CPT1A expression downregulates FOXM1 (via some yet undescribed mechanism) which then leads to decreased antioxidant capacity in CRC cells, thus suppressing tumor progression and increasing radiosensitivity. This is an interesting model that could explain the suppression of CPT1A expression in CRC, but key tenets of the model are untested and speculative.
  
  Strengths:
  
  Analysis of CPT1A in paired CRC tumors and non-tumor tissue using multiple modalities combined with analysis of independent datasets rigorously show that CPT1A is downregulated in CRC tumors at the RNA and protein level.
  
  The authors use paired cell line model systems where CPT1A is both knocked out and overexpressed in cell lines that endogenously express or repress CPT1A respectively. These complementary model systems increase the rigor of the study.
  
  The finding that a metabolic enzyme generally thought to support tumor energetics actually is a tumor suppressor in some settings is theoretically quite interesting.
  
  We would like to thank Reviewer #1 for the positive comments.
  
  Weaknesses:
  
  The authors propose that CPT1A expression modulates antioxidant capacity in cells by suppressing FOXM1 and that this pathway alters CRC growth and radiotherapy response. However, key aspects of this model are not tested. The authors do not show that FOXM1 contributes to the regulation of antioxidant levels in CRC cells and tumors or if FOXM1 suppression is key to the inhibition of CRC tumor growth and radiosensitization by CPT1A. Thus, the model the authors propose is speculative and not supported by the existing data.
  
  We thank the reviewer for the valuable comment. In this study, we employed Western blotting to assess the protein levels of the ROS scavenging enzymes CAT, SOD1, and SOD2 following FOXM1 overexpression. This approach allowed us to evaluate how FOXM1 regulates ROS clearance and mediates cellular radiation resistance. Further in-vivo evidence is needed and will be addressed in future research.
  
  The authors propose two mechanisms by which CPT1A expression triggers radiosensitization: decreasing DNA repair capacity (Figure 3) and decreasing antioxidant capacity (Figure 5). However, while CPT1A expression does alter these capacities in CRC cells, neither is functionally tested to determine if altered DNA repair or antioxidant capacity (or both) are the reason why CRC cells are more sensitive to radiotherapy or are delayed in causing tumors in vivo. Thus, this aspect of the proposed model is also speculative.
  
  We thank the reviewer for the valuable comment. In this study, we combined a colony formation assay, multi-target single-hit survival model, comet assay, and Western blotting (for γH2AX) to evaluate DNA damage and repair in cells. Additionally, we employed qPCR, Western blotting, and enzyme activity kits to assess the direct ROS-scavenging activities of the peroxisomal enzymes CAT, SOD1, SOD2, and SOD3.
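
For readers unfamiliar with the multi-target single-hit survival model mentioned in this response, a minimal sketch of its standard textbook form, S = 1 − (1 − e^(−D/D0))^N, is given below. The parameter values are purely hypothetical and are not taken from the manuscript; how the authors fitted the model to their colony formation data is not described here.

```python
import math


def surviving_fraction(dose_gy: float, d0: float, n: float) -> float:
    """Multi-target single-hit model: S = 1 - (1 - exp(-D / D0)) ** N.

    dose_gy: radiation dose D in Gy
    d0:      mean lethal dose D0 (dose that reduces survival to 1/e on the
             exponential part of the curve)
    n:       extrapolation number N (number of targets per cell)
    """
    return 1.0 - (1.0 - math.exp(-dose_gy / d0)) ** n


# Hypothetical parameters, chosen only to illustrate the curve shape.
D0_EXAMPLE = 1.5
N_EXAMPLE = 2.0

for dose in (0, 2, 4, 6, 8):
    sf = surviving_fraction(dose, D0_EXAMPLE, N_EXAMPLE)
    print(f"{dose} Gy -> surviving fraction {sf:.4f}")
```

In this model, a smaller D0 or N fitted to the survival curve corresponds to greater radiosensitivity, which is the kind of comparison such a fit supports between CPT1A-expressing and non-expressing cells.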
  
  The authors find that CPT1A affects radiosensitization in cell culture and assess this in vivo. In vivo, CPT1A expression slows tumor growth even in the absence of radiotherapy, and radiotherapy only proportionally decreases tumor growth to the same extent as it does in CPT1A non-expressing CRC tumors. The authors propose from this data that CPT1A expression also sensitizes tumors to radiotherapy in vivo. However, it is unclear whether CPT1A expression causes radiosensitization in vivo or if CPT1A expression acts as an independent tumor suppressor to which radiotherapy has an additive effect. Additional experiments would be necessary to differentiate between these possibilities.
  
  We thank the reviewer for the valuable comment. As shown in Figure 4D, in the absence of CPT1A knockdown, radiotherapy reduced the percentage of Ki67-positive cells in the xenograft tumors by 32.9% (approximately 39.6% of the pre-irradiation baseline). In contrast, upon CPT1A knockdown, radiotherapy only led to a 14.5% reduction in the percentage of Ki67-positive cells (approximately 15.6% of the pre-irradiation baseline). Furthermore, as illustrated in Figures 4E and 4F, in the absence of CPT1A overexpression, radiotherapy resulted in a 0.10-g decrease in tumor weight (around 52.5% of the pre-irradiation weight), whereas with CPT1A overexpression, radiotherapy induced a more pronounced 0.12-g reduction in tumor weight (approximately 89.7% of the pre-irradiation weight). Collectively, these findings indicate that CPT1A exhibits a radiosensitising effect. We have incorporated these relevant details in the Results section (Lines 196-201 and 204-208).
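
The distinction between the absolute and relative reductions quoted above can be made explicit with a short sketch. It assumes that each first figure (e.g. 32.9% or 0.10 g) is an absolute drop and each parenthetical figure is that drop as a fraction of the pre-irradiation value; the baselines it prints are back-calculated under that assumption and are not values reported in the manuscript. All names are illustrative.

```python
def relative_reduction(baseline: float, after: float) -> float:
    """Drop expressed as a fraction of the pre-irradiation baseline."""
    return (baseline - after) / baseline


def implied_baseline(absolute_drop: float, relative_drop: float) -> float:
    """Baseline implied by an absolute drop and its relative size."""
    return absolute_drop / relative_drop


# (absolute drop, relative drop) pairs as quoted in the response.
reported = {
    "Ki67 %, no CPT1A knockdown": (32.9, 0.396),
    "Ki67 %, CPT1A knockdown": (14.5, 0.156),
    "Tumor weight g, no CPT1A overexpression": (0.10, 0.525),
    "Tumor weight g, CPT1A overexpression": (0.12, 0.897),
}

for label, (abs_drop, rel_drop) in reported.items():
    base = implied_baseline(abs_drop, rel_drop)
    after = base - abs_drop
    # Recompute the relative drop from the implied values as a consistency check.
    check = relative_reduction(base, after)
    print(f"{label}: baseline ~{base:.3g}, after ~{after:.3g}, relative drop {check:.1%}")
```

Comparing relative rather than absolute drops is what allows the two arms to be contrasted when their pre-irradiation baselines differ.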
  
  The authors propose in Figure 3 that DNA repair capacity is inhibited in CRC cells by CPT1A expression. However, the gH2AX immunoblots performed in Figure 3H-I that measure DNA repair kinetics are not convincing that CPT1A expression impairs DNA repair kinetics. Separate blots are shown for CPT1A expressing and non-expressing cell lines, not allowing for rigorous comparison of gH2AX levels and resolution as CPT1A expression is modulated.
  
  We thank the reviewer for the valuable comment. In this study, we also employed a colony formation assay, multi-target single-hit survival model, and comet assay to elucidate the impact of CPT1A on DNA repair capacity. These methods all indicated that DNA repair capacity is inhibited in CRC cells by CPT1A expression.
  
  There are conflicting studies (PMID: 37977042, 29995871) that suggest that CPT1A is overexpressed in CRC and contributes to tumor progression rather than acting as a tumor suppressor as the authors propose. It would be helpful for readers for the authors to discuss these studies and why there is a discrepancy between them.
  
  We thank the reviewer for the valuable comment. We have expanded the discussion of these findings in the relevant section of the manuscript (Lines 317-318). We speculated that the differences between our observations and previous reports may be attributable to the inherent heterogeneity of tumor tissues as well as variations in tumor stage.
  
  Reviewer #2 (Public Review):
  
  The manuscript by Chen et al. describes how low levels of CPT1A in colorectal cancer (CRC) confer radioresistance by expediting radiation-induced ROS clearance. The authors propose that this mechanism of ROS homeostasis is regulated through FOXM1. CPT1A is known for its role in fatty acid metabolism via beta-oxidation of long-chain fatty acids, making it important in many metabolic disorders and cancers.
  
  Previous studies have suggested that the upregulation of CPT1A is essential for the tumor-promoting effect in colorectal cancers (CRC) (PMID: 32913185). For example, CPT1A-mediated fatty acid oxidation promotes colorectal cancer cell metastasis (PMID: 2999587), and repression of CPT1A activity renders cancer cells more susceptible to killing by cytotoxic T lymphocytes (PMID: 37722058). Additionally, inhibition of CPT1A-mediated fatty-acid oxidation (FAO) sensitizes nasopharyngeal carcinomas to radiation therapy (PMID: 29721083). While this suggests a tumor-promoting effect for CPT1A, the work by Chen et al. suggests instead a tumor-suppressive function for CPT1A in CRC, specifically that loss or low expression of CPT1A confers radioresistance in CRC. This makes the findings important given that they oppose the previously proposed tumorigenic function of CPT1A. However, the data presented in the manuscript is limited in scope and analysis.
  
  Major Limitations:
  
  (1) Analysis of Patient Samples
  
  - Figure 1D shows that CPT1A levels are significantly lower in COAD and READ compared to normal tissues. It would be beneficial to show whether CPT1A levels are also significantly lower in CRC compared to other tumor types using TCGA data.
  
  We thank the reviewer for the valuable comment. We assessed the expression levels of CPT1A across all cancer types in the TCGA dataset and found that the abundance of CPT1A in CRC was significantly lower compared to cholangiocarcinoma (CHOL), esophageal carcinoma (ESCA), kidney chromophobe (KICH), acute myeloid leukemia (LAML), and stomach adenocarcinoma (STAD) (Author response image 1).
  
  Author response image 1.
  
  The mRNA level of CPT1A across all cancer types in the TCGA dataset.
  
  - The analysis should include a comparison of closely related CPT1 isoforms (CPT1B and CPT1C) to emphasize the specific importance of CPT1A silencing in CRC.
  
  We thank the reviewer for the valuable comment. We further examined the mRNA expression levels of the CPT1 isoforms CPT1B and CPT1C in COAD and READ tumor samples and their respective normal tissue counterparts. The results showed that CPT1B was significantly upregulated in READ tumor samples compared to normal tissues. Similarly, CPT1C was significantly overexpressed in both READ and COAD tumor samples relative to their normal tissue controls (Author response image 2).
  
  Author response image 2.
  
  The mRNA expression levels of CPT1B and CPT1C in rectal adenocarcinoma (READ) and colon adenocarcinoma (COAD) based on data from the TCGA database. A. CPT1B expression in READ. B. CPT1B expression in COAD. C. CPT1C expression in READ. D. CPT1C expression in COAD.
  
  - Figure 2 lacks a clear description of how IHC scores were determined and the criteria used to categorize patients into CPT1A-high and CPT1A-low groups. This should be detailed in the text and figure legend.
  
  We thank the reviewer for the valuable comment. We have provided a detailed description of the methodology used to determine the IHC scores and criteria applied to categorise patients into CPT1A-high and CPT1A-low groups in the Materials and Methods section (Lines 418-426) as well as the legend of Figure 2A.
  
  - Neither Figure 2B nor Figure 2C shows how many patients were assigned to the CPT1A-low and CPT1A-high groups.
  
  We thank the reviewer for the valuable comment. We have added the number of patients in the CPT1A-low and CPT1A-high groups to the legends of Figures 2B and 2C.
  
  (2) Model Selection and Experimental Approaches
  
  - The authors primarily use CPT1A knockout (KO) HCT116 cells and CPT1A overexpression (OE) SW480 cells for their experiments, which poses major limitations.
  
  We thank the reviewer for the valuable comment.
  
  - The genetic backgrounds of the cell lines (e.g., HCT116 being microsatellite instable (MSI) and SW480 not) should be considered as they might influence treatment outcomes. This should be acknowledged as a major limitation.
  
  We thank the reviewer for the valuable comment. Indeed, the genetic background differences among cell lines represent a significant limitation. We have addressed this issue in the discussion section (Lines 363-365).
  
  - Regardless of their CPT1A expression levels, for the experiments with HCT116 and SW480 cells in Figure 3C-F, it would be useful to see whether HCT116 cells can be further sensitized to radiotherapy in overexpression and whether SW480 cells can be desensitized through CPT1A KO.
  
  We thank the reviewer for the valuable comment. Due to the inherently high levels of CPT1A in the HCT116 cell line, we attempted to perform relevant experiments but were unable to achieve significant overexpression. Similarly, we faced challenges with the SW480 cell line, which has lower levels of CPT1A. We could thus not provide additional insights in this respect.
  
  - The use of only two CRC cell lines is insufficient to draw broad conclusions. Additional CRC cell lines should be used to validate the findings and account for genetic heterogeneity. The authors should repeat key experiments with additional CRC cell lines to strengthen their conclusions.
  
  We thank the reviewer for the valuable comment. To address this issue, we used a radiation-resistant variant of the HCT-15 cell line as a new approach to investigate whether CPT1A is associated with cellular radiation sensitivity. We believe that the data obtained from these acquired resistant cell lines are comparable to those from the ordinary cell lines mentioned in the reviewer’s comment.
  
  (3) Pharmacological Inhibition
  
  Several studies have reported beneficial outcomes of using CPT1 pharmacological inhibition to limit cancer progression (e.g., PMID: 33528867, PMID: 32198139), including its application in sensitization to radiation therapy (PMID: 30175155). Since the authors argue for the opposite case in CRC, they should show this through pharmacological means such as etomoxir and whether CPT1A inhibition phenocopies their observed genetic KO effect, which would have important implications for using this inhibitor in CRC patients.
  
  We thank the reviewer for the valuable comment. The referenced literature has indeed attracted our attention. Our research group is concurrently investigating the role of CPT1A in tumor radiotherapy and immunology, utilising CPT1A inhibitors for experimental validation. We look forward to publishing these related studies to further support the conclusions presented in our manuscript.
  
  (4) Data Representation and Statistical Analysis
  
  - The relative mRNA expression levels across the seven cell lines (Supplementary Figure 1C) differ greatly from those reported in the DepMap (https://depmap.org/portal/). This discrepancy should be addressed.
  
  We thank the reviewer for the valuable comment. The observed differences in mRNA levels may be attributable to variations in cell culture density. For subsequent radiation sensitivity experiments, we maintained our cell culture density at approximately 70–80% confluence.
  
  - The statistical significance of differences in mRNA and protein levels between RT-sensitive and RT-resistant cells should be shown (Supplementary Figure 1C, D).
  
  As suggested, we have included a statistical analysis of the differences in CPT1A mRNA levels between RT-sensitive and -resistant cells in Figure 3 and Supplementary Figure 1C. However, further analysis revealed no significant difference in CPT1A protein levels between these groups. This was attributed to the high variability in grayscale values observed between the groups.
  
  Conclusion
  
  The study offers significant insights into the role of CPT1A in CRC radioresistance, proposing a tumor-suppressive function. However, the scope and depth of the analysis need to be expanded to fully validate these claims. Additional CRC cell lines, pharmacological inhibition studies, and a more detailed analysis of patient samples are essential to strengthen the conclusions.
  
  We would like to thank Reviewer #2 for the comments.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The study aims to elucidate the role of CPT1A in developing resistance to radiotherapy in colorectal cancer (CRC). The manuscript is a collection of assays and analyses to identify the mechanism by which CPT1A leads to treatment resistance through increased expression of ROS-scavenging genes facilitated by FOXM1 and provides an argument to counter this role, leading to a reversal of treatment resistance.
  
  Strengths:
  
  The article is well written with sound scientific methodology and results. The assays performed are well within the scope of the hypothesis of the study and provide ample evidence for the role of CPT1A in the development of treatment resistance in colorectal cancer. While providing compelling evidence for their argument, the authors have also rightfully provided limitations of their work.
  
  We would like to thank Reviewer #3 for the positive comments.
  
  Weaknesses:
  
  The primary weakness of the study is acknowledged by the authors at the end of the Discussion section of the manuscript. The work heavily relies on bioinformatics and in vitro work with little backing of in vivo and patient data. In terms of animal studies, it is to be noted that the model they have used is nude mice with non-orthotopic, subcutaneous xenograft, which may not be the best recreation of the patient tumor.
  
  We thank the reviewer for the insightful comment. Our research group is continuing to explore the role of CPT1A in colorectal cancer radiotherapy and immunotherapy. In a new study, we used a C57BL/6 mouse model to conduct in-vivo experiments. Preliminary data suggest that CPT1A confers heightened radiosensitivity to immunocompetent mice. We look forward to the forthcoming publication of this ongoing research project.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  The manuscript was challenging to read and contained many typographical errors and tangents that were not relevant to the logic of the paper. For example, in lines 365-367 the authors talk about peroxisomes being important for redox balance and that they will target peroxisomal pathways. However, the authors do not perform any experiments targeting peroxisomal pathways, which left me quite perplexed. Careful proofreading of the manuscript would improve its utility for readers.
  
  We thank the reviewer for the insightful comments. We have made several additions throughout the manuscript to include more relevant information and experimental details, thereby improving the manuscript’s logical structure and readability. As described in the text, we used the DCFH-DA probe to measure ROS levels in cells, considering that regulation of intracellular ROS levels is a major function of peroxidases. We examined the transcriptional levels, protein expression, and enzymatic activities of peroxidases such as CAT, SOD1, SOD2, and SOD3 through qPCR, Western blotting, and specific assay kits.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Clarification and Flow
  
  Introduction Clarity: The introduction introduces several topics in succession without clearly connecting them. For example, the introduction of FOXM1 on Line 102 lacks clarity in its relationship to the study. Consider discussing these elements only in the discussion section to avoid confusion.
  
  We thank the reviewer for this insightful comment. We have moved the section on FOXM1 to the discussion to enhance readability (Lines 342-348).
  
  Explanation for Non-experts: Both the multi-target single-hit survival model and the comet assay require one sentence to explain their principles for non-experts in the field.
  
  As suggested, we have included brief explanations of the multi-target single-hit survival model and the comet assay in the Materials and Methods section to clarify these concepts to readers not familiar with the subject (Lines 458-460 and 462-465).
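For readers unfamiliar with it, the multi-target single-hit model is commonly written as SF = 1 - (1 - exp(-D/D0))^N, where SF is the surviving fraction at dose D, D0 is the mean lethal dose, and N is the extrapolation number. The sketch below is a minimal illustration of that standard textbook form only. The function names and the parameter values are hypothetical and are not taken from the manuscript.

```python
import math


def multitarget_survival(dose_gy, d0, n):
    """Surviving fraction under the multi-target single-hit model.

    SF = 1 - (1 - exp(-D / D0)) ** N
    dose_gy: radiation dose D in Gy
    d0:      mean lethal dose D0 in Gy (slope of the terminal log-linear part)
    n:       extrapolation number N (number of targets per cell)
    """
    if d0 <= 0 or n < 1:
        raise ValueError("d0 must be positive and n must be at least 1")
    return 1.0 - (1.0 - math.exp(-dose_gy / d0)) ** n


def quasi_threshold_dose(d0, n):
    """Quasi-threshold dose Dq = D0 * ln(N), a measure of the curve's shoulder."""
    return d0 * math.log(n)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to illustrate the curve shape.
    for dose in (0, 2, 4, 6, 8):
        print(dose, round(multitarget_survival(dose, d0=1.5, n=3), 4))
```

In practice D0 and N are fitted to clonogenic survival data; a larger N gives a broader shoulder at low doses, and with N = 1 the model reduces to a simple exponential.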
  
  (2) Specific Text Revisions
  
  - Line 302: "We transfected the CRISPR/Cas9 lentivirus into HCT 116 ... efficiency of the 2nd site was the highest" - Clarify what is meant by "second site." If you mean the second sgRNA, please use this term.
  
  As suggested, we have revised ‘2nd’ to ‘second’ (Lines 151 and 152).
  
  - Lines 358-359: For the results subsection "Low CPT1A levels accelerate post-radiation ROS scavenging," include an introductory sentence, such as: "To study the mechanism of low CPT1A expression in radiotherapy resistance, we conducted differential gene expression analysis between HCT116 CPT1A KO and NC cells."
  
  As suggested, we have added an introductory sentence in the section titled ‘Low CPT1A Levels Accelerate Post-Radiation ROS Scavenging’ (Lines 215-217).
  
  - Line 359: "The gene expression heatmap showed high consistency among replicates for both HCT 116-NC and HCT 116-KO cells (Supplementary Figure 3A)." If these are technical replicates performed on the same batch of KO or NC cells, please state this clearly.
  
  We have added the suggested information to improve clarity (Line 218).
  
  - Lines 360-362: "With CPT1A knockdown, we found 363 upregulated and 1290 downregulated genes (|log2(fold change)|>1 and P<0.05)." Ensure that the p-value is correct; it seems this should be q-value < 0.05.
  
  As suggested, we have revised ‘p’ to ‘q’ (Lines 220 and 496).
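The distinction between a p-value and a q-value cutoff matters when many genes are tested at once. The sketch below shows one common way to apply the thresholds quoted above (|log2(fold change)| > 1 and q < 0.05), assuming q-values are obtained by Benjamini-Hochberg adjustment. That choice of adjustment, the function names, and the gene labels are assumptions for illustration; the manuscript's actual pipeline is not described here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def classify_degs(genes, log2fc, pvalues, lfc_cutoff=1.0, q_cutoff=0.05):
    """Split genes into up- and downregulated lists using |log2FC| and q cutoffs."""
    qvalues = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, qvalues):
        if q < q_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


if __name__ == "__main__":
    # Hypothetical genes and statistics, for illustration only.
    genes = ["geneA", "geneB", "geneC", "geneD"]
    up, down = classify_degs(genes, [2.0, -1.5, 0.4, 3.0], [0.001, 0.002, 0.03, 0.5])
    print("up:", up, "down:", down)
```

A gene with a large fold change but a poor q-value, or a good q-value but a small fold change, is excluded; both conditions must hold.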
  
  - Line 363: Introduce the term "DEGs" as Differentially Expressed Genes in the main text, not just in the Materials and Methods (line 215).
  
  As suggested, we have introduced the term "DEGs" as Differentially Expressed Genes in the main text (Lines 221-222).
  
  - Lines 364-365: "Showing that the main enriched pathways were in peroxisomes, cell cycle nucleotide excision repair, and fatty acid degradation (Figure 5A)." The data does not support this statement. Clarify that the listed pathways are AMONG the enriched KEGG pathways.
  
  As suggested, we have revised the relevant part in the manuscript (Lines 222-224).
  
  - Line 370: "...following 6 Gy irradiation and 1 h of incubation with DCFH-DA (Figure 5C)." Write out the term DCFH-DA and explain it for non-experts: "a fluorescent redox probe used to detect reactive oxygen species."
  
  As suggested, we have added a brief explanation to clarify the term for readers not familiar with the subject (Lines 230-231).
  
  - Line 444: "CPT1A is an essential tumor suppressor." This statement has not been validated or referenced adequately.
  
  As suggested, we have removed the sentence to improve clarity.
  
  - Line 447: Clarify the relevance of the He, Zhang & Xu reference.
  
  We apologise for the error and have removed the reference.
  
  (3) Figure Improvements
  
  - Standardize Graph Labels: Ensure that graph axis labels and numbering are consistent and legible across the manuscript. For example, Figure 1A has large labels, while Figure 1B has much smaller labels. Ensure all graphs, such as 2C and 3G, have readable labels and numbering.
  
  We thank the reviewer for the insightful comment. We have revised the labels and numbering in Figures 1B, 2C, and 3G.
  
  - Figure 2B and 2C: Correct the x-axis label from "mouths" to "months."
  
  We thank the reviewer for this insightful comment. We have revised the labels in Figure 2B and 2C.
  
  - Figure 3 Legend: Clarify what is meant by "different groups of cell lines" in the legend of Figure 3. Specify whether these are single clones, pooled clones, or mixtures of cells in the text and/or figure legend.
  
  We thank the reviewer for this insightful comment. We have updated the legend of Figure 3 to enhance clarity.
  
  - Figures 3H and 3I: Label the blots clearly to indicate which refer to HCT116 NC and KO and which to SW480 RFP and OE.
  
  We thank the reviewer for this insightful comment. We have revised the labels in Figure 3H and 3I.
  
  - Supplementary Figure 2A: Describe the terms F and W in the legend.
  
  We thank the reviewer for this insightful comment. 'F' denotes fraction and 'W' denotes week. We have updated the legend of Figure 3 and Figure 3-figure supplement 2 to improve clarity.
  
  - Supplementary Data: Consider moving the data described in Supplementary Figure 2 to the main figures as it is among the most convincing data in the paper.
  
  We thank the reviewer for this insightful comment. We have decided to retain this figure at its current position, as we believe the data presented provide complementary evidence supporting the conclusion discussed earlier.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.26.586752v2
www.biorxiv.org www.biorxiv.org

Eugenol mimics exercise to promote skeletal muscle fiber remodeling and myokine IL-15 expression by activating TRPV1 channel

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Weaknesses:
  
  (1) Figure 1: Histomorphological analysis using immunostaining for type I, IIA, IIX, and IIB should be performed and quantified across different muscle groups and also in the soleus. Fiber type switch measured based on qPCR and Westerns does not sufficiently indicate the extent of fiber type switch. Better images for Fig. 1c should be provided.
  
  Thanks for your suggestion. In fact, we attempted immunofluorescent staining for Slow MyHC and Fast MyHC in GAS muscle. However, for the majority of our results, we only observed positive expression of Slow MyHC in a small portion of the muscle sections (as shown in the figure below), so we did not present this result.
  
  In addition, due to the size limitations on uploading image files to bioRxiv, we had to compress the images, resulting in lower-resolution pictures. We have attempted to submit clearer images in Fig. 1C.
  
  Author response image 1.
  
  Green: Slow MyHC; Red: Fast MyHC
  
  (2) Figure 2: Histomorphological analysis for SDH and NADH-TR should be performed and quantified in different muscle groups. Seahorse or Oroboros respirometry experiments should be performed to determine the actual increase in mitochondrial respiratory capacity either in isolated mitochondria or single fibers from vehicle and Eugenol-treated mice. EM of mitochondria should be added to determine the extent of mitochondrial remodeling. The current data is insufficient to indicate the extent of mitochondrial or oxidative remodeling.
  
  That's a good suggestion. However, we regret to inform you that we are unable to present these results due to a lack of relevant experimental equipment and samples.
  
  (3) Figure 2: Gene expression analysis is limited to a few transcriptional factors. A thorough analysis of gene expression through RNA-seq should be performed to get an unbiased effect of Eugenol on muscle transcriptome. This is especially important because eugenol is proposed to work through CaN/NFAT signaling, major transcriptional regulators of muscle phenotype.
  
  Thanks for your suggestion. Indeed, we believe that in terms of reliability and accuracy, RNA-seq is not as good as RT-qPCR. The advantage of RNA-seq lies in its high throughput, making it suitable for screening unknown transcription factor regulatory mechanisms. In this study, the signaling pathways regulating myokines and muscle fiber type transformation are known and limited, comprising only the CaN/NFATc1 and AMPK pathways. Since eugenol mainly acts through the Ca2+ pathway, we primarily focus on the CaN/NFATc1 signaling pathway.
  
  (4) I suggest the inclusion of additional exercise or performance testing including treadmill running, wheel running, and tensiometry. Quantification with a swimming test and measurement of the exact intensity of exercise, etc. is limited.
  
  That's a good suggestion. We apologize for being unable to detect this indicator due to a lack of relevant experimental equipment.
  
  (5) In addition to muscle performance, whole-body metabolic/energy homeostatic effects should also be measured to determine a potential increase in aerobic metabolism over anaerobic metabolism.
  
  That's a good suggestion. We apologize for being unable to detect this indicator due to a lack of relevant experimental equipment.
  
  (6) For the swimming test and other measurements, only 4 weeks of vehicle vs. Eugenol treatment was used. For this type of pharmacological study, a time course should be performed to determine the saturation point of the effect. Does exercise tolerance progressively increase with time?
  
  Thanks for your suggestion. Due to the potential damage that exhaustive swimming tests inflict on mice, the tested mice are subsequently eliminated to avoid potential interference with the experiment. Therefore, this experiment is only suitable for conducting tests at individual time points.
  
  (7) The authors should also consider measuring adaptation to exercise training with or without Eugenol.
  
  Thanks for your suggestion. The purpose of this study is to investigate whether eugenol mimics exercise under standard dietary conditions. In our future research, we will consider exploring the effects of eugenol under HFD and exercise conditions.
  
  (8) Histomorphological analysis of WAT is also lacking. EchoMRI would give a better picture of lean and fat mass.
  
  That's a good suggestion. However, we did not collect sections of WAT tissue, so we regret that we are unable to supplement this result. In addition, we apologize for being unable to measure lean and fat mass due to a lack of EchoMRI equipment.
  
  (9) The experiments performed to demonstrate that Eugenol functions through trpv1 are mostly correlational. Some experiments are needed with trpv1 KO or KD instead of inhibitor. Similarly, KD for other trpv channels should be tested (at least 1-4 that seem to be expressed in the muscle). Triple KO or trpv null cells should be considered to demonstrate that eugenol does not have another biological target.
  
  Thanks for your professional suggestion. AMG-517 is a specific inhibitor of TRPV1, with a much greater inhibitory effect on TRPV1 compared to other TRP channels. AMG-517 inhibits capsaicin (500 nM), acid (pH 5.0), or heat (45°C) induced Ca2+ influx in cells expressing human TRPV1, with IC50 values of 0.76 nM, 0.62 nM, and 1.3 nM, respectively. However, the IC50 values of AMG-517 for recombinant TRPV2, TRPV3, TRPV4, TRPA1, and TRPM8 cells are >20 μM (Gavva, 2008). Therefore, we believe that using AMG-517 instead of TRPV1 KO cells is sufficient to demonstrate the involvement of TRPV1 in the function of eugenol.
  
  While this study did not exclude the possibility of other TRP channels' involvement, it was based on the fact that eugenol does not promote mRNA expression of other TRP channels, as shown in Fig. 4A-C. Indeed, as far as we know, besides TRPV1, the effects of other TRP channels on myofiber type transformation remain unknown. This is an aspect that we plan to investigate in the future.
  
  Reference
  
  Gavva NR, Treanor JJ, Garami A, et al. Pharmacological blockade of the vanilloid receptor TRPV1 elicits marked hyperthermia in humans. Pain. 2008;136(1-2):202-210.
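The selectivity argument in the response to comment (9) can be made concrete with simple arithmetic on the IC50 values quoted there. The sketch below treats the reported ">20 μM" for the other TRP channels as a lower bound of 20,000 nM, so the ratios it prints are minimum fold-selectivities, not exact ones. The variable and function names are hypothetical.

```python
# IC50 values quoted in the response (nM), keyed by the TRPV1 activation mode.
TRPV1_IC50_NM = {"capsaicin": 0.76, "acid": 0.62, "heat": 1.3}

# Reported as ">20 uM" for TRPV2, TRPV3, TRPV4, TRPA1 and TRPM8; used as a floor.
OFF_TARGET_IC50_FLOOR_NM = 20_000.0


def min_fold_selectivity(on_target_nm, off_target_floor_nm):
    """Lower bound on fold-selectivity: off-target IC50 floor / on-target IC50."""
    if any(ic50 <= 0 for ic50 in on_target_nm.values()):
        raise ValueError("IC50 values must be positive")
    return {mode: off_target_floor_nm / ic50 for mode, ic50 in on_target_nm.items()}


if __name__ == "__main__":
    folds = min_fold_selectivity(TRPV1_IC50_NM, OFF_TARGET_IC50_FLOOR_NM)
    for mode, fold in folds.items():
        print(f"{mode}: at least {fold:,.0f}-fold selective for TRPV1")
```

Even for the least favorable activation mode (heat), the bound exceeds four orders of magnitude. This supports pharmacological selectivity among TRP channels but does not by itself exclude non-TRP targets of eugenol, which is the reviewer's remaining concern.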
  
  (10) Eugenol + TRPV1 inhibition studies are performed in C2C12 cells and only look at myofiber gene expression. This is incomplete. Some studies of mitochondrial and OXPHOS genes should be done.
  
  Thanks for your suggestion. In the inhibition experiment, we additionally examined the expression of mitochondrial complex proteins, as shown in Figure 5C. The relevant description has been added in lines 178-183 and 764-765.
  
  (11) The experiments linking Eugenol to Ca2+ handling and calcineurin/NFAT activation are all performed in C2C12 cells. There seems to be a link between Eugenol activation and CaN/NFAT activation and fiber type regulation in cells; however, this needs to be tested in mouse studies at the functional level using some of the parameters measured in aims 1 and 2.
  
  Thank you for your professional suggestion. We will attempt to continue these experiments in future studies.
  
  (12) The myokine studies are incomplete. The authors show a link between Eugenol treatment and myokines/IL-15 induction. However, this is purely correlational, without any experiments performed to show whether IL-15 mediates any of the effects of eugenol in mice.
  
  Indeed, previous studies have adequately demonstrated the regulation of skeletal muscle oxidative metabolism by IL-15. The initial aim of this experiment was to investigate the mechanism by which eugenol promotes IL-15 expression. Through inhibition assays, EMSA, and dual luciferase reporter gene experiments, we have thoroughly demonstrated that eugenol promotes IL-15 expression via the CaN/NFATc1 signaling pathway, thus establishing a novel link between CaN/NFATc1 signaling and the myokine IL-15 expression. In the subsequent experiments, we plan to knock out IL-15 in eugenol-treated C2C12 cells to explore whether IL-15 mediates the effects of eugenol. This will be another aspect of our investigation.
  
  (13) An additional major concern is that it cannot be ruled out that Eugenol is uniquely mediating its effects through trpv1. Ideally, muscle-specific trpv1 mice should be used to perform some experiments with Eugenol to confirm that this ion channel is involved in the physiological effects of eugenol.
  
  As you suggested, we agree that muscle-specific TRPV1 mice should be used to conduct some experiments with eugenol. In our mouse experiments, due to the lack of validation of skeletal muscle-specific TRPV1 knockout, we indeed cannot rule out that eugenol is uniquely mediating its effects through TRPV1. We acknowledge this as a limitation of our study. However, due to limitations in research funding and time, we are currently unable to supplement these experiments. Nevertheless, we believe that our results from in vitro experiments using a TRPV1 inhibitor (which selectively inhibits TRPV1) provide evidence of eugenol's action through TRPV1.
  
  Reviewer #2 (Public Review):
  
  Weaknesses:
  
  (1) Apart from Fig.2A and 2B, they mostly utilised protein expression changes as an index of tissue functional changes. Most of the data supporting the conclusions are thus rather indirect. More direct functional evidence would be more compelling. For example, a lipolysis assay could be used to measure the metabolic function of adipocytes after eugenol treatment in Fig.3. Functional activation of NFAT can be demonstrated by examining the nuclear translocation of NFAT.
  
  Thank you for your professional suggestion. Indeed, as shown in Figure 4G-I, we detected the expression of NFATc1 in the nucleus to illustrate its nuclear translocation.
  
  (2) To further demonstrate the role of TRPV1 channels in the effects of eugenol, TRPV1-deficient mice and tissues could also be used. Will the improved swimming test in Fig. 2B and increased CaN, NFAT, and IL-15 triggered by eugenol be all prevented in TRPV1-lacking mice and tissues?
  
  Thank you for your professional suggestion. We agree that muscle-specific TRPV1 mice should be used to conduct some experiments with eugenol. However, due to limitations in research funding and time, we are currently unable to supplement these experiments.
  
  (3) Direct evidence of eugenol activation of TRPV1 channels in skeletal muscles is also lacking. The flow cytometry assay was used to measure Ca2+ changes in the C2C12 cell line in Fig. 5A. But this assay is rather indirect. It would be more convincing to monitor real-time activation of TRPV1 channels in skeletal muscles not in cell lines using Ca2+ imaging or electrophysiology.
  
  Thank you for your professional suggestion. As you suggested, we initially planned to use the patch-clamp technique to detect membrane potential changes in skeletal muscle cells under eugenol treatment. However, due to technical limitations, this experiment was not successfully conducted. Therefore, we were compelled to rely solely on flow cytometry to detect Ca2+ levels.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Most of the mRNA and protein data are consistent with each other. However, some of them are not obvious. For example, PGC1a mRNA was increased by eugenol in Fig. 2C but not seen in protein in Fig. 2D. Similarly, Complex I and V mRNA was increased in Fig. 2C but not obvious at protein levels in Fig. 2D, even though they claimed that Complex I and V were both upregulated by eugenol (see: line 123). Another example: IL-15 mRNA was increased by EUG100 but not by EUG50 in the GAS muscle in Fig. 8A. However, EUG50 increased IL-15 protein expression in Fig. 8B. Similar conflict was also seen in IL-15 expression in the TA muscle in Fig. 8A and 8C.
  
  Thanks for your question. As shown in the table below, by standardizing with β-Actin, our statistical data indeed indicate that eugenol promotes the expression of Complex I and V proteins (although the upregulation is minimal). Additionally, protein and mRNA expression do not always correlate, which may be due to potential post-transcriptional and post-translational regulation.
  
  Author response table 1.
  
  (2) Line 115: Figure 2A should be Figure 2B; Line 119: Figure 2B should be Figure 2A. Alternatively, swap Fig2A with Fig. 2B.
  
  Thanks for your correction. We have revised the relevant content in lines 111-113 and 724-725.
  
  (3) Abbreviations of ADF and ADG in Fig. 3A should be defined.
  
  Thank you for your suggestion. We have defined these abbreviations in lines 123-125.
  
  (4) Line 154: TRPV1 mRNA expression was promoted by 25 and 50 μM eugenol, not by 12.5 μM.
  
  Thank you for your correction. We have revised it in line 150.
  
  (5) Line 173: Increased expression of NFAT suggests that NFAT is activated. This is a rather weak statement. It is more convincing to show the nuclear translocation of NFAT by eugenol treatment.
  
  Thank you for your correction. We have revised the description in line 166.
  
  (6) Line 185: The data showing EUG increased slow MyHC fluorescence intensity in Fig. 5D are not clear at all. Quantification is required.
  
  Thank you for your suggestion. We have attempted to submit clearer images in Figure 5E, and the quantification has been provided.
  
  (7) Line 235: IL-15 expression is positively correlated with MyHC IIa, suggesting IL-15 is a slow muscle myokine (See line 2398). However, MyHC IIa is a marker of fast muscle fibres (see line 50).
  
  Thank you for your correction. As you pointed out, MyHC IIa marks fast-twitch oxidative muscle fibers. We have replaced ‘slow’ with ‘oxidative’ in line 235.
  
  (8) Fig.9C and 9D show that inhibition of TRPV1 and CaN attenuated the upregulation of IL-15 mRNA and protein by eugenol in C2C12 cell line. This result is important in demonstrating the link of TRPV1 and CaN to IL-15. It will be more interesting and physiologically relevant to perform this experiment in primary skeletal muscle cells isolated from mice.
  
  Thank you for your suggestion. This is indeed an interesting idea. We will attempt to continue our experiments in mice and primary porcine muscle cells in future studies.
  
  (9) It is concerning that 4-week-old male mice were used for the study. The 4-week-old mice are immature. Adult mice over 8 weeks should be used. It is thus unknown whether the findings are broadly applicable to adult age.
  
  Thanks for your professional question. Age indeed has an impact on the muscle fiber type in mammals. Based on previously observed patterns of muscle fiber changes with age in various mammals (Katsumata et al., 2017; Pandorf et al., 2012; Hill et al., 2020), we believe that changes in muscle fiber types occur more frequently in juvenile mammals, mainly manifesting as a sharp increase in fast muscle fibers. Therefore, interventions during the juvenile stage might be more effective in promoting the transformation of fast to slow muscle fibers. As a result, in most of our group's research using nutritional interventions to regulate muscle fiber types, we tend to start interventions from the age of 4 weeks in mice. If we began intervention at 8 weeks, we speculate that the effectiveness would not be as potent as starting at 4 weeks. Below are the patterns of muscle fiber changes with age in various mammalian models, provided for reference:
  
  (1) Changes in muscle fiber types with age in pigs:
  
  As shown in the following figure, there is a dramatic change in the muscle fiber types 12 days post birth in pigs, especially with a sharp increase in fast muscle fibers, which continues until day 45. After 45 days of age, the changes in muscle fiber types become relatively gradual.
  
  Author response table 2.
  
  Developmental change of proportions of muscle fiber types in Longissimus dorsi muscle determined by histochemical analysis for myosin adenosine triphosphatase activity (%)
  
  Least squares means and pooled standard errors (n = 3). MHC, myosin heavy chain; ND, not detected. *P<0.10, **P<0.01. Least squares means followed by different letters on the same row are significantly different (P < 0.05).
  
  Reference:
  
  Katsumata, M., Yamaguchi, T., Ishida, A., & Ashihara, A. (2017). Changes in muscle fiber type and expression of mRNA of myosin heavy chain isoforms in porcine muscle during pre- and postnatal development. Animal science journal, 88(2), 364–371.
  
  (2) Changes in muscle fiber types with age in rats:
  
  As illustrated in the subsequent figure, the muscle fiber types in rats undergo significant changes before 20 days of age (3-week-old), notably with a pronounced increase in type IIb fast-twitch fibers. After reaching 20 days of age, the changes in type IIb muscle fibers tend to stabilize and become more gradual.
  
  Author response image 2.
  
  Reference:
  
  Pandorf, C. E., Jiang, W., Qin, A. X., Bodell, P. W., Baldwin, K. M., & Haddad, F. (2012). Regulation of an antisense RNA with the transition of neonatal to IIb myosin heavy chain during postnatal development and hypothyroidism in rat skeletal muscle. American journal of physiology. Regulatory, integrative and comparative physiology, 302(7), R854–R867.
  
  (3) Changes in muscle fiber types with age in mice:
  
  As depicted in the following figure, when comparing 10-week-old mice to 78-week-old aged mice, there are no significant changes in muscle fiber types.
  
  Author response image 3.
  
  Reference:
  
  Hill, C., James, R. S., Cox, V. M., Seebacher, F., & Tallis, J. (2020). Age-related changes in isolated mouse skeletal muscle function are dependent on sex, muscle, and contractility mode. American journal of physiology. Regulatory, integrative and comparative physiology, 319(3), R296–R314.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.18.549410v2
www.biorxiv.org www.biorxiv.org

Metabolic disruption impairs ribosomal protein levels, resulting in enhanced aminoglycoside tolerance

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this study, the authors investigate the tolerance of aminoglycosides in E. coli mutants deleted in the Krebs cycle and respiratory chain enzymes. The motivation for this study is unclear. Transport of aminoglycosides is pmf-dependent, as the authors correctly note, and knocking out energy-producing components leads to tolerance of aminoglycosides; this has been well established. In S. aureus, clinically relevant "small colony" strains selected for in the course of therapy with aminoglycosides acquire null mutations in the biosynthesis of heme or ubiquinone, and have been studied in detail. In E. coli, such knockouts have not been reported in clinical isolates, probably due to severe fitness costs.
  
  Response: We sincerely appreciate the time and consideration the reviewer dedicated to evaluating our manuscript. It's important to highlight that while the transport of aminoglycosides is PMF-dependent, recent studies underscore the potential role of metabolic mutations in antibiotic tolerance, a facet that warrants further investigation. For instance, the study by Heinemann’s and Michiels' groups explored genomic changes in E. coli strains (including uropathogenic UTI89 strains) subjected to daily antibiotic exposure (Van den Bergh et al., 2022). Notably, mutations predominantly occurred in genes of the nuo operon, a key component of E. coli energy metabolism, suggesting a link between metabolic adaptations and antibiotic tolerance. Furthermore, the research by Collins' group revealed previously unrecognized genes related to central metabolism (e.g., icd, gltD, sucA) that contribute to antibiotic resistance in E. coli cells exposed to multiple antibiotics, including aminoglycosides (Lopatkin et al., 2021). These findings are corroborated by the presence of similar mutations in clinical E. coli pathogens, as evidenced by the analysis of a large library of 7243 E. coli genomes from NCBI Pathogen Detection (Lopatkin et al., 2021). The clinical relevance of metabolic mutations in antibiotic tolerance is increasingly recognized, yet their underlying mechanisms remain enigmatic. Therefore, elucidating the role of metabolic pathways in conferring antibiotic tolerance is highly critical. We have updated the introduction to clearly convey our motivation in this study (see page 4).
  
  At the same time, single-cell analysis has shown that individual cells with a decrease in the expression of Krebs cycle enzymes are tolerant of antibiotics and have lower ATP (Manuse et al., PLoS Biol 19: e3001194). The authors of the study under review report that knocking out ICD, isocitrate dehydrogenase that catalyzes the rate-limiting step in the Krebs cycle, has little effect on aminoglycoside tolerance and actually leads to an increase in the level of ATP over time. This observation does not seem to make much sense and contradicts previous reports, specifically that E. coli ICD is tolerant of antibiotics and, not surprisingly, produces less ATP (Kabir and Shimizu, Appl Microbiol Biotechnol. 2004; 65(1):84-96; Manuse et al., PLoS Biol 19: e3001194). Mutations in other Krebs cycle enzymes, unlike ICD, do lead to a dramatic increase in tolerance of aminoglycosides according to the paper under review. This is all very confusing.
  
  Response: Although our data cannot be directly compared to that of Kabir and Shimizu (Mohiuddin Kabir and Shimizu, 2004), due to the utilization of entirely different experimental procedures and measurement techniques, we can draw some parallels to the study conducted by Lewis’ group (Manuse et al., 2021), despite certain differences in experimental protocols. Furthermore, the reviewer has made strong assertions regarding our manuscript based on the findings of Lewis’ group. Thus, we believe it's pertinent to expand our response regarding that study.
  
  In the study of Lewis’ group, bacterial cells were inoculated at a ratio of 1:100 into LB medium from an overnight culture (approximately 16 hours). Subsequently, the cultures were incubated at 37°C for approximately 2 hours, and ATP levels were measured using the BacTiter Glo kit (Promega, Madison, WI, USA). ATP levels were then normalized to cell density, determined through optical density measurements, and represented on a linear diagram. As demonstrated in Supplementary Figure S1c of their paper, there was a 10-15% reduction in normalized ATP levels in the icd mutant compared to the wild type. In our experiments, cells were grown for 24 hours in overnight cultures, diluted 100-fold in fresh media, and ATP levels were measured at 3, 4, 5, and 6 hours using the same kit. ATP levels were normalized to cell counts quantified by flow cytometry. Upon analyzing our data of the icd mutant for around 3 hours (the time point closest to that of the study of Lewis’ group), we observed a reduction of approximately 15-20% (without statistical significance) in the icd mutant compared to the wild-type (see raw data, linear plot, and logarithmic plot below; Author response image 1), which aligns with the findings of Lewis’ group.
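  The normalization described above (ATP signal divided by cell number, then compared between mutant and wild type) can be sketched as follows. This is a minimal illustration only: the function names and the numerical readings are hypothetical and are not taken from the manuscript or from Manuse et al.

```python
def normalized_atp(luminescence, cell_count):
    """Return the ATP signal per cell (arbitrary luminescence units per cell)."""
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return luminescence / cell_count


def percent_reduction(mutant, wild_type):
    """Percent reduction of the mutant value relative to the wild type."""
    return 100.0 * (wild_type - mutant) / wild_type


# Hypothetical readings, chosen only to illustrate the arithmetic.
wt = normalized_atp(luminescence=5.0e5, cell_count=1.0e7)
icd = normalized_atp(luminescence=4.1e5, cell_count=1.0e7)
reduction = percent_reduction(icd, wt)
print(f"{reduction:.1f}% reduction relative to wild type")
```

  Whether the denominator is an optical density reading or a flow cytometry count, the same ratio applies; only the unit of the normalized value changes.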
  
  We further investigated the gentamicin tolerance of both wild-type and icd mutant strains of E. coli BW25113 (Author response image 2). Our findings indicate that the increased sensitivity of the icd mutant of the MG1655 strain to gentamicin is similar to the observation in the other E. coli strain.
  
  Author response image 1.
  
  ATP levels in the icd mutant. ATP levels of both the mutant and wild-type strains were measured at t=3 hours of cell growth and normalized to cell counts. The figure presents the raw data (a), linear plot (b), and logarithmic plot (c) of the same dataset. This data corresponds to the first panel of Figure 3B in the manuscript.
  
  Author response image 2.
  
  Gentamicin tolerance of wild-type and icd mutant strains of E. coli BW25113. Both wild type and mutant strains were treated with gentamicin (50 µg/ml) for 5 hours at the mid-exponential phase. Cells were plated before and after treatment for CFU/ml counts. The dashed line represents the limit of detection. CFU: Colony forming units.
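  For readers less familiar with this type of assay, the survival fraction behind such plots can be computed roughly as below. This is a sketch under stated assumptions: the plating volumes, dilution factors, colony counts, and limit of detection are hypothetical, and clamping values to the limit of detection is one common convention rather than a procedure confirmed by the source.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Convert a plate count into CFU/ml of the undiluted culture."""
    return colonies * dilution_factor / plated_volume_ml


def survival_fraction(cfu_before, cfu_after, limit_of_detection):
    """Fraction of cells surviving treatment.

    Counts below the limit of detection are clamped to it (an assumed
    convention), so the result is an upper bound in that case.
    """
    if cfu_before <= 0:
        raise ValueError("cfu_before must be positive")
    return max(cfu_after, limit_of_detection) / cfu_before


# Hypothetical plate counts, for illustration only.
before = cfu_per_ml(colonies=50, dilution_factor=1e5, plated_volume_ml=0.01)
after = cfu_per_ml(colonies=20, dilution_factor=1e2, plated_volume_ml=0.01)
fraction = survival_fraction(before, after, limit_of_detection=100.0)
print(f"survival fraction: {fraction:.1e}")
```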
  
  We think that there are two primary reasons why our study cannot contradict the findings of the Lewis group:
  
  Firstly, our study cannot be directly compared to theirs, as they did not comprehensively explore the impact of gene deletions on cell metabolism beyond the measurement of ATP levels at a single time point (Manuse et al., 2021). Our study encompasses various metabolic parameters such as cellular ATP, redox status, proton motive force (PMF), intracellular pH, and drug uptake throughout the exponential and/or early stationary phase. Additionally, we conducted proteomic analysis for five different strains including mutants and wild type. Moreover, we performed pathway enrichment analysis grounded in the statistical background of the entire genome, encompassing various functional pathway classification frameworks such as Gene Ontology annotations, KEGG pathways, and Uniprot keywords. The results of these pathway enrichment analyses are now available in the Supplementary File (see Supplementary Tables 11-17 in the current manuscript). Thus, we believe it is unjust to deem our study contradictory compared to the Lewis group's study, which does not have a comprehensive analysis of the metabolism of the mutant strains they investigated.
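  Pathway enrichment against a whole-genome background is commonly evaluated with a one-sided hypergeometric test. The sketch below shows that calculation; it is an assumption that this resembles the test used by the enrichment tool, since the response does not name the statistic, and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, pathway_size, hits, hits_in_pathway):
    """One-sided p-value P(X >= hits_in_pathway).

    X is the number of pathway members among `hits` proteins drawn without
    replacement from a background of `genome_size` proteins, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, hits)
    total = comb(genome_size, hits)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical counts: 4000 background proteins, a 50-member pathway,
# 100 differentially abundant proteins, 10 of which are in the pathway.
p_value = hypergeom_enrichment_p(4000, 50, 100, 10)
print(f"enrichment p-value: {p_value:.3g}")
```

  In practice the resulting p-values would also be corrected for multiple testing across all pathways examined.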
  
  Secondly, our study cannot be compared to that specific study (Manuse et al., 2021) due to the utilization of a distinct antibiotic (ciprofloxacin). Cell tolerance is heavily reliant on the mechanism of action of the antibiotic used. Therefore, the reviewer should have focused on studies closely related to aminoglycoside tolerance. Our study is not confusing or contradictory, as Lewis’ group also demonstrated that the tolerance of the icd mutant to gentamicin was significantly reduced while the tolerance of other TCA cycle mutant strains was increased in a different study (Shan et al., 2015). However, they did not delve into the metabolism of these mutant strains, as we did. We now mention this point in our manuscript (see pages 14-15).
  
  Apart from the confusing data, it is not clear what useful information may be obtained from the choice of the experimental system. The authors examine exponentially growing cells of E. coli for tolerance of aminoglycosides. The population at this stage of growth is highly susceptible to aminoglycosides, and only some rare persister cells can survive. However, the authors do not study persisters. A stationary population of E. coli is tolerant of aminoglycosides, and this is clinically relevant, but this is not the subject of the study.
  
  Response: Respectfully, we must express our disagreement with the reviewer's comments. Our experimental system is meticulously organized and logically structured. Mutant strains such as gltA, sucA, and nuoI deletions exhibit increased tolerance to all aminoglycosides tested, with their fractions clearly increasing around the mid-exponential phase between 3-4 hours (refer to Figure 2B in our manuscript). This surge in tolerance is evident at the population level as well (as depicted in Figure 1A in our manuscript, where certain mutant strains demonstrate complete survival to streptomycin, with survival fractions nearing 1). Given the pronounced increase observed around the mid-exponential phase, we primarily characterize the metabolism of these cells during this growth phase.
  
  It's essential to note that any investigation into antibiotic tolerance and/or resistance holds immense significance, regardless of the growth phase under scrutiny, as antibiotic tolerance/resistance poses a substantial healthcare challenge. Additionally, metabolic mutant strains do not necessarily entail severe fitness costs, as evidenced by Figure S2A published by the Lewis group (Manuse et al., 2021), a finding consistent with our study (see Figure 2B in our manuscript). This phenomenon could confer a survival advantage to bacterial cells, as they may acquire metabolic mutations to bolster their tolerance without incurring significant fitness costs. Furthermore, numerous studies suggest that bacterial cells may opt for the evolutionary pathway leading to increased tolerance before acquiring resistance mechanisms (Levin-Reisman et al., 2017; Santi et al., 2021). The presence of metabolic mutations in clinical E. coli pathogens has also been confirmed through the analysis of a large library of 7243 E. coli genomes from NCBI Pathogen Detection by Collins’ group (Lopatkin et al., 2021). Consequently, comprehending the tolerance mechanisms of metabolic mutations holds paramount importance.
  
  References
  
  Levin-Reisman I, Ronin I, Gefen O, Braniss I, Shoresh N, Balaban NQ. 2017. Antibiotic tolerance facilitates the evolution of resistance. Science (1979) 355:826–830. doi:10.1126/science.aaj2191
  
  Lopatkin AJ, Bening SC, Manson AL, Stokes JM, Kohanski MA, Badran AH, Earl AM, Cheney NJ, Yang JH, Collins JJ. 2021. Clinically relevant mutations in core metabolic genes confer antibiotic resistance. Science (1979) 371. doi:10.1126/science.aba0862
  
  Manuse S, Shan Y, Canas-Duarte SJ, Bakshi S, Sun WS, Mori H, Paulsson J, Lewis K. 2021. Bacterial persisters are a stochastically formed subpopulation of low-energy cells. PLoS Biol 19. doi:10.1371/journal.pbio.3001194
  
  Mohiuddin Kabir M, Shimizu K. 2004. Metabolic regulation analysis of icd-gene knockout Escherichia coli based on 2D electrophoresis with MALDI-TOF mass spectrometry and enzyme activity measurements. Appl Microbiol Biotechnol 65:84–96. doi:10.1007/s00253-004-1627-1
  
  Santi I, Manfredi P, Maffei E, Egli A, Jenal U. 2021. Evolution of Antibiotic Tolerance Shapes Resistance Development in Chronic Pseudomonas aeruginosa Infections. mBio. doi:10.1128/mBio.03482-20
  
  Shan Y, Lazinski D, Rowe S, Camilli A, Lewis K. 2015. Genetic basis of persister tolerance to aminoglycosides in Escherichia coli. mBio 6. doi:10.1128/mBio.00078-15
  
  Van den Bergh B, Schramke H, Michiels JE, Kimkes TEP, Radzikowski JL, Schimpf J, Vedelaar SR, Burschel S, Dewachter L, Lončar N, Schmidt A, Meijer T, Fauvart M, Friedrich T, Michiels J, Heinemann M. 2022. Mutations in respiratory complex I promote antibiotic persistence through alterations in intracellular acidity and protein synthesis. Nat Commun 13:546. doi:10.1038/s41467-022-28141-x
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This interesting study challenges a dogma regarding the link between bacterial metabolism decrease and tolerance to aminoglycosides (AG). The authors demonstrate that mutants well-known for being tolerant to AG, such as those of complexes I and II, are not so due to a decrease in the proton motive force (PMF) and thus antibiotic uptake, as previously reported in the literature.
  
  Strengths:
  
  This is a complete study. These results are surprising and are based on various read-outs, such as ATP levels, pH measurement, membrane potential, and the uptake of fluorophore-labeled gentamicin. Utilizing a proteomic approach, the authors show instead that in tolerant mutants, there is a decrease in the levels of proteins associated with ribosomes (targets of AG), causing tolerance.
  
  Response: We sincerely appreciate the reviewer for taking the time to read our manuscript and offer valuable suggestions.
  
  Weaknesses:
  
  The use of a single high concentration of aminoglycoside: my main comment on this study concerns the use of an AG concentration well above the MIC (50 µg/ml or 25 µg/ml for uptake experiments), which is 10 times higher than previously used concentrations (Kohanski, Taber) in studies showing a link with PMF. This significant difference may explain the discrepancies in results. Indeed, a high concentration of AG can mask the effects of a metabolic disruption and lead to less specific uptake. However, this concentration highlights a second molecular level of tolerance. Adding experiments using lower concentrations (we propose 5 µg/ml to compare with the literature) would provide a more comprehensive understanding of AG tolerance mechanisms during a decrease in metabolism.
  
  Another suggestion would be to test iron limitation (using an iron chelator such as DIP), which has been shown to induce AG tolerance. Can the authors demonstrate if this iron limitation leads to a decrease in ribosomal proteins? This experiment would validate their hypothesis in the case of a positive result. Otherwise, it would help distinguish two types of molecular mechanisms for AG tolerance during a metabolic disruption: (i) PMF and uptake at low concentrations, (ii) ribosomal proteins at high concentrations.
  
  Response: While we acknowledge the intriguing possibility of exploring whether iron limitation results in a reduction of ribosomal proteins, we believe that this topic falls slightly outside the scope of our current study. This area warrants independent investigation since our current research did not specifically focus on iron-limited environments (LB medium is iron-rich, as referenced (Abdul-tehrani et al., 1999; Rodríguez-Rojas et al., 2015)). However, we fully concur with the notion that experimental outcomes may be contingent upon the concentration of aminoglycosides (AG). Hence, we repeated the critical experiments using a lower concentration of gentamicin (5 µg/mL), as suggested by the reviewer. Before delving into a discussion of these results, we wish to emphasize two key points. Firstly, the majority of our metabolic measurements, including ATP levels, redox activities, intracellular pH, and metabolomics, were conducted in mutant and wild-type cells in the absence of drugs. Our objective was to elucidate the impact of genetic perturbations of the TCA cycle on cell metabolism. Secondly, it's important to emphasize that our study does not invalidate the hypothesis that AG uptake is proton motive force (PMF)-dependent. We observed similar drug uptake across the strains tested, which is reasonable considering that their energy metabolism and PMF are not significantly altered compared to the wild type (at least we did not observe a consistent trend in their metabolic levels). Consequently, our study does not necessarily contradict previous claims (Taber et al., 1987). We have now clarified this point in the manuscript (see pages 1 and 13).
  
  When we employed a lower gentamicin concentration, we still noted a significant elevation in tolerance among the gltA, sucA, and nuoI mutant strains compared to the wild type. Also, it remained evident that the observed tolerance in the mutant strains cannot be ascribed to differences in drug uptake or impaired PMF, as the levels of drug uptake and the disruption of PMF by gentamicin (at lower concentrations) in the mutant strains were comparable to those of the wild type. Moreover, since our metabolic measurements and proteomics analyses failed to reveal any notable alterations in energy metabolism in these strains, the consistency in drug uptake levels across both mutant and wild-type strains, even at lower concentrations, further bolsters the validity of our findings obtained at higher gentamicin concentrations. The new results have been incorporated into the Supplementary file (see Supplementary Figures S1, S5, S7, and S9) and discussed throughout the manuscript.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations For The Authors):
  
  Line 120: Luria-Bertani (LB), used Lysogeny Broth.
  
  Line 180: "RSG dye can be reduced by bacterial reductases of PMF" to be reformulated.
  
  Response: The suggested corrections have been incorporated into the manuscript.
  
  References
  
  Abdul-tehrani H, Hudson AJ, Chang Y, Timms AR, Hawkins C, Williams JM, Harrison PM, Guest JR, Andrews SC. 1999. Ferritin Mutants of Escherichia coli Are Iron Deficient and Growth Impaired, and fur Mutants are Iron Deficient. Journal of Bacteriology.
  
  Rodríguez-Rojas A, Makarova O, Müller U, Rolff J. 2015. Cationic Peptides Facilitate Iron-induced Mutagenesis in Bacteria. PLoS Genet 11. doi:10.1371/journal.pgen.1005546
  
  Taber HW, Mueller JP, Miller PF, Arrow AS. 1987. Bacterial Uptake of Aminoglycoside Antibiotics. Microbiol Rev 51:439–457. doi:10.1128/mr.51.4.439-457.1987
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.12.20.572673v3
www.biorxiv.org www.biorxiv.org

New submission 11/12/2023, 08:20:56

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer 1
  
  R1.1) Although very robust and capable of handling several situations, the researcher has to keep in mind that processing has to follow some basic rules in order for this pipeline to work properly. For instance, fiducials and scales need to be included in the photograph, and the slabs must be photographed against a contrasting background.
  
  Our pipeline does indeed have some prerequisites in terms of data acquisition – at the very least, a ruler must be present in the photographs. A contrasting background is not strictly needed, but does definitely facilitate segmentation. We have edited the Introduction and Discussion to emphasize these prerequisites.
  
  R1.2) Also, only coronal slices can be used, which can be limiting for certain situations.
  
  While the 3D reconstruction based on Eq. 1 is quite general, the segmentation is indeed tailored to coronal slices of the cerebrum. As explained in the paper, this orientation is standard when slicing the cerebrum, but axial or sagittal slicing may also be of interest – particularly when dissecting the brainstem or cerebellum. We acknowledge this limitation in the Discussion of the revised manuscript.
  
  R1.3) In the future, segmentation of the histological slices could be developed and histological structures added (such as small brainstem nuclei, for instance). Also, dealing with axial and sagittal planes can be useful to some labs.
  
  While outside the scope of this paper, these are good ideas for future directions, and are considered in the Discussion of the revised version.
  
  Reviewer 2
  
  R2.1) The current method could only perform accurate segmentation on subcortical tissues. It is of more interest to accurately segment cortical tissues, whose morphometrics are more predictive of neuropathology. The authors also mentioned that they would extend the toolset to allow for cortical tissue segmentation in the future.
  
  We agree with the reviewer that cortical parcellation has high value. We have included a new option in Photo-SynthSeg to parcellate the cortex using a machine learning block already existing in SynthSeg 2.0 (Billot et al, PNAS, 2023); see example in Figure 2 of the revised manuscript. This parcellation is volumetric; more accurate methods based on surfaces are out of the scope of this article and remain as future work. The manuscript has been edited to reflect these changes.
  
  R2.2) Brain tissues are not rigid bodies, so dissected slices could be stretched or squeezed to some extent. Also, dissected slices that contain temporal poles may have several disjoined tissues. Therefore, each pixel in dissected photographs may go through slightly different transformations. The authors constrain that all pixels in each dissected photograph go through the same affine transform in the reconstruction step probably due to concerns of computational complexity. But ideally, dissected photographs should be transformed with some non-linear warping or locally linear transformations. Or maybe the authors could advise how to place different parts of dissected slices when taking dissection photographs to reduce such non-linearity of transforms.
  
  The reviewer is totally right. The problem with nonlinear warps is that, albeit trivial to implement, they compromise the robustness of the registration pipeline. This is because the nonlinear model introduces huge ambiguity in the space of solutions: for example, if one adds identical small nonlinear deformations to every slice, the objective function barely changes. The revised manuscript: (i) discusses this limitation more thoroughly; (ii) discusses nonlinear models for 3D reconstruction as future work; and (iii) makes recommendations about the tissue placement to minimize errors around the temporal pole.
  
  R2.3) For the quantitative evaluation of the segmentation on UW-ARDC, the authors calculated 2D Dice scores on a single slice for each subject. Could the authors specify how this single slice is chosen for each subject? Is it randomly chosen or determined by some landmarks? It's possible that the chosen slice is between dissected slices so SAMSEG cannot segment accurately.
  
  The slice is chosen to be close to the mid-coronal plane, while maximizing visibility of subcortical structures. The chosen slice is always a “real” dissected slice (rather than a digital “virtual” slice) and cannot be located in a gap between slices. This is clarified in the Quantitative Evaluation section of the revised manuscript.
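  For reference, the 2D Dice score computed on such a single slice is 2|A∩B| / (|A| + |B|) for an automated mask A and a manual mask B of one structure. A minimal sketch is given below; the helper name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary 2D masks."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Two empty masks agree perfectly (assumed convention).
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks for one structure on one slice (illustrative values only).
automated = [[1, 1, 0],
             [0, 1, 0]]
manual = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(automated, manual)
print(f"Dice = {score:.3f}")
```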
  
  R2.4) Also from Figure 3, it seems that SAMSEG outperforms Photo-SynthSeg on large tissues, WM/Cortex/Ventricle. Is there an explanation for this observation?
  
  Since we use a single central coronal slice when computing Dice, SAMSEG yields very high Dice scores for large structures with strong contrast (e.g., the lateral ventricles). However, Photo-SynthSeg provides better results across the board, particularly when considering 3D analysis (see Figure 2 and results on volume correlations). We have added a comment on this issue to the revised manuscript.
  
  R2.5) In the third experiment, quantitative evaluation of 3D reconstruction, each digital slice went through random affine transformations and illumination fields only. However, it's better to deform digital slices using random non-linear warping due to the non-rigidity of the brain as mentioned in R2.2. So, the reconstruction errors estimated here are quite optimistic. It would be more realistic if digital slices were deformed using random nonlinear warping.
  
  We agree with the reviewer and, as we acknowledge in the manuscript, the validation of the reconstruction error with synthetic data is indeed optimistic. The problem with adding nonlinear warps is that the results will depend heavily on the strength of the simulated deformation. We keep the warps linear as we believe that the value of this experiment lies in the trends that the errors reflect, as a function of slice thickness and its variability (“jitter”). This has been clarified in the revised manuscript.
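  As a rough illustration of what a random linear (affine) warp of a digital slice involves, the sketch below samples a 2D affine matrix from rotation, scaling, shear, and translation parameters. The parameter ranges and function names are hypothetical and are not the values used in the manuscript.

```python
import math
import random


def random_affine_2d(rng, max_rot_deg=10.0, max_log_scale=0.1,
                     max_shear=0.05, max_shift=5.0):
    """Sample a 3x3 homogeneous 2D affine matrix (rotation @ shear @ scale, plus shift)."""
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    sx = math.exp(rng.uniform(-max_log_scale, max_log_scale))
    sy = math.exp(rng.uniform(-max_log_scale, max_log_scale))
    sh = rng.uniform(-max_shear, max_shear)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c * sx, c * sh * sy - s * sy, tx],
        [s * sx, s * sh * sy + c * sy, ty],
        [0.0, 0.0, 1.0],
    ]


def apply_affine(matrix, point):
    """Map a 2D point (x, y) through a homogeneous affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
warp = random_affine_2d(rng)
print(apply_affine(warp, (10.0, 20.0)))
```

  Because every pixel of a slice shares the same matrix, this model cannot represent local stretching, which is the optimism acknowledged in the response.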
  
  Reviewer 2 (recommendations for the authors)
  
  RA2.1) In the abstract, the authors mentioned that the segmentations of the 3D reconstructed stack deal with 11 brain regions, however, in most sections, only 9 tissue masks were compared, such as in Table 1, 2, and Figure 3. Also in the supplementary video, there are only 10 rendered tissues. So, what are these 11 regions? Is the background nonbrain region also counted as a region? And how were these 11 regions derived from the original 36 annotated tissues in T1-39?
  
  We particularly thank the reviewer for noticing this.
  
  The 11 regions are white matter, cortex, ventricle, thalamus, caudate, putamen, pallidum, hippocampus, amygdala, accumbens area, and ventral diencephalon. These are all bilateral labels, i.e., 22 regions in total. The original 36 labels include these 22 and: four labels for the cerebellum (left and right cortex and white matter); the brainstem; five labels for cerebrospinal fluid regions that we do not consider; the left and right choroid plexus; and two labels for white matter hypointensities in the left and right hemisphere.
  
  As in many other papers, we leave “ventral diencephalon” and “accumbens area” out of the validation as they are not very well defined.
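  The label bookkeeping in this answer can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the response; the variable names are ours.

```python
# The 11 bilateral structures listed in the response.
bilateral = [
    "white matter", "cortex", "ventricle", "thalamus", "caudate", "putamen",
    "pallidum", "hippocampus", "amygdala", "accumbens area",
    "ventral diencephalon",
]

# Remaining labels of the original annotation, with counts as stated above.
other_labels = {
    "cerebellum (left/right cortex and white matter)": 4,
    "brainstem": 1,
    "cerebrospinal fluid regions": 5,
    "choroid plexus (left/right)": 2,
    "white matter hypointensities (left/right)": 2,
}

# Structures left out of the validation.
excluded = {"ventral diencephalon", "accumbens area"}

segmented = 2 * len(bilateral)                     # left + right labels
original = segmented + sum(other_labels.values())  # full annotated label set
evaluated = len([name for name in bilateral if name not in excluded])

print(segmented, original, evaluated)
```

  The three numbers match the 22 bilateral labels, the 36 original labels, and the 9 structures compared in the tables and figure.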
  
  We note that all regions except the accumbens are visible in Figure 1d. The ventral diencephalon is easy to miss as only a small portion of it is visible (when picking a slice, one needs to compromise in terms of how much of each structure is visible). Moreover, it has a very similar color to the cortex in the FreeSurfer convention (see picture below).
  
  Author response image 1.
  
  The accumbens is visible at 1m45s in the supplementary video, segmented in orange (see capture below).
  
  Author response image 2.
  
  We have clarified these issues in the revised version of the manuscript.
  
  RA2.2) In Figure 1(f), why are the hippocampal volumes of confirmed AD subjects larger than those of the healthy controls? Is this a typo or is there any explanation for this?
  
  Yes, it is a typo. Again, thank you very much for noticing this.
  
  RA2.3) Typo on P3, "sex and gender were corrected" should be "age and gender were corrected".
  
  This has been corrected in the revised version.
  
  RA2.4) In the MADRC dataset, the authors mentioned that there are 18 full brains and 58 hemispheres, however, the total data size is 78. Is this a typo?
  
  Yes, it is. It has been corrected in the revised version.
  
  RA2.5) Comparing the binary masks in Figure 5(d) and the photographs in Figure 5(c), some tissues below the ventricles with high intensities are also removed from masks. Is this done by manual editing? If so, how long does it usually take to edit a clean mask for each subject?
  
  We used a combination of thresholding, morphological operations (erosion/dilation), and minor manual edits when needed – particularly to remove chunks of pial surface when they are visible, in the most anterior slices. The average is a couple of minutes per photograph. In the future, we plan to use these manually curated images to train a supervised convolutional neural network to perform the task automatically. These details are provided in the revised manuscript.
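  A minimal sketch of the automated part of such a masking step (thresholding followed by erosion and dilation) is shown below. The threshold value, the 3x3 neighbourhood, and the order of operations are assumptions made for illustration; the response does not specify them, and the manual edits are not modelled.

```python
def threshold(image, level):
    """Binary mask of pixels brighter than `level`."""
    return [[1 if value > level else 0 for value in row] for row in image]


def _morph(mask, require_all):
    """3x3 erosion (require_all=True) or dilation (require_all=False).

    Windows are cropped at the image border.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                mask[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, height))
                for cc in range(max(c - 1, 0), min(c + 2, width))
            ]
            out[r][c] = int(all(window)) if require_all else int(any(window))
    return out


def erode(mask):
    return _morph(mask, require_all=True)


def dilate(mask):
    return _morph(mask, require_all=False)


def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 window."""
    return dilate(erode(mask))


# Toy image: a bright 3x3 block of tissue plus one isolated bright speck.
image = [
    [10, 10, 10, 10, 180],
    [10, 200, 200, 200, 10],
    [10, 200, 200, 200, 10],
    [10, 200, 200, 200, 10],
    [10, 10, 10, 10, 10],
]
mask = opening(threshold(image, level=100))
for row in mask:
    print(row)
```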
  
  RA2.6) In the method of 3d reconstruction, there are four weights for the optimization function. How did the authors determine such weights and do these weights have some impact on the reconstruction performance?
  
  The parameters were set by visual inspection of the output on a small pilot dataset, and do not have a strong impact on the reconstruction. The crucial aspect is to increase 𝜈 (the affine regularizer) and decrease 𝛼 (compliance with the external reference) when using a soft reference. These details have been added to the revised version.
  
  RA2.7) Finally for the deep learning-based segmentation, a U-Net was trained on GMM generated single-channel intensity synthetic images while the dissected photographs are color images with three channels. So, did the authors only input the grayscale photographs to the segmentation network? Are there any other preprocessing steps for color photographs, such as normalization? Is it possible to use GMM to generate color images as training data to better suit dissection photography?
  
  We did try simulating three channels during training, but the performance was actually worse than when simulating one channel and converting the RGB input to grayscale. This information has been added to the revised version.
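  Converting an RGB photograph to a single grayscale channel can be done, for example, with a weighted sum of the three channels. The sketch below uses the ITU-R BT.601 luma weights as an assumption; the response does not state which conversion was applied.

```python
# ITU-R BT.601 luma weights (an assumed choice; the source does not specify one).
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(pixel, weights=LUMA_WEIGHTS):
    """Weighted sum of the red, green, and blue values of one pixel."""
    red, green, blue = pixel
    return weights[0] * red + weights[1] * green + weights[2] * blue


def to_grayscale(image):
    """Convert a 2D list of (R, G, B) tuples to a 2D list of gray values."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]


# Toy 1x3 image: pure red, pure white, and black pixels.
photo = [[(255, 0, 0), (255, 255, 255), (0, 0, 0)]]
gray = to_grayscale(photo)
print(gray)
```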
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.08.544050v4
www.biorxiv.org www.biorxiv.org

Enteric glia regulate Paneth cell secretion and intestinal microbial ecology

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The role of enteric glial cells in regulating intestinal mucosal functions at a steady state has been a matter of debate in recent years. Enteric glial cell heterogeneity and related methodological differences likely underlie the contrasting findings obtained by different laboratories. Here, Prochera and colleagues used Plp1-CreERT2 driver mice to deplete the majority of enteric glia from the gut. They found that glial loss has very limited effects on the transcriptome of gut cells 11 days after tamoxifen treatment (used to induce DTA expression), and by extension - more specifically, has only minimal impact on cells of the intestinal mucosa. Interestingly, in the colon (where Paneth cells are not present) they did observe transcriptomic changes related to Paneth cell biology. Although no overt gene expression alterations were found in the small intestine - also not in Paneth cells - morphological, ultrastructural, and functional changes were detected in the Paneth cells of enteric glia-depleted mice. In addition, and possibly related to Paneth cell dysfunction, enteric glia-depleted mice also show alterations in intestinal microbiota composition.
  
  In their analyses of enteric glia from existing single-cell transcriptomic data sets, it is stated that these come from 'non-diseased' humans. However, the data on the small intestine is obtained from children with functional gastrointestinal disorders (Zheng 2023). Data on colonic enteric glia was obtained from colorectal cancer patients (Lee 2020). Although here the cells were isolated from non-malignant regions, saying that the large intestines of these patients are non-diseased is probably an overstatement.
  
  In the Zheng et al. dataset, “functional GI disorders” refers to biopsies from children that do not have any histopathologic evidence of digestive disease. The children do, however, have at least one GI symptom that prompted a diagnostic endoscopy with biopsies, leading to the designation of “functional” disorder. Given that diagnostic endoscopies are invasive procedures that necessitate anesthesia, obtaining biopsies from asymptomatic children without any clinical indication would not be allowable per most institutional review boards, leading the authors of that study to use these samples as a control group. We had thus used the “non-diseased” label to encompass these samples as well as those from the unaffected regions of large intestine from colorectal cancer patients. We now recognize, however, that this label could be misleading, so we have revised the Results and Figure Legends to more accurately reflect details of control tissue origin for this and the Lee et al. (2020) datasets. Per the reviewer’s suggestion, we have removed the term “non-diseased”.
  
  Another existing dataset including human mucosal enteric glia of healthy subjects is presented in Smillie et al (2019). It would be interesting to see how the current findings relate to the data from Smillie et al.
  
  Per the reviewer’s suggestion, we have now added an analysis of the Smillie et al. dataset in Supp. Fig. 1B. This dataset derives from colonic mucosal biopsies from 12 healthy adults (8480 stromal cells) and 18 adults with ulcerative colitis (10,245 stromal cells from inflamed bowel segments and 13,147 from uninflamed), all between the ages of 20-77 years. These data show that SOX10, PLP1, and S100B are selectively expressed within the putative glial cluster from colonic mucosa of both healthy adults and individuals with ulcerative colitis, whereas GFAP is not detected (Supp. Fig. 1B). These observations are consistent with our observations from the two other human datasets already included in our manuscript in Fig. 1 and Supp. Fig. 1.
  
  The time between enteric glia depletion and analyses (mouse sacrifice) must be a crucial determinant of the type of effects, and the timing thereof. In the current study 11 days after tamoxifen treatment was chosen as the time point for analyses, which is consistent with earlier work by the lab using the same model (Rao et al 2017). What would happen when they wait longer than 11 days after tamoxifen treatment? Data, not necessarily for all parameters, on later time points would strengthen the manuscript significantly.
  
  This is an excellent question, particularly given the longer-lived nature of Paneth cells relative to other epithelial cell types. As detailed in our previous study, Cre<sup>+</sup> mice in the Plp1CreER-DTA model are well-appearing and indistinguishable from their Cre-negative control littermates through 11dpt. Unfortunately, a limitation of the model is that beyond 11dpt, Cre<sup>+</sup> mice become anorexic, lose body weight, and have signs of neurologic debility such as hindlimb weakness and uncoordinated gait. These deficits are overt by 14dpt and likely due to targeting Plp1<sup>+</sup> glia outside the gut, such as Schwann cells and oligodendrocytes (as described in another study which used a similar model to study demyelination in the central nervous system, PMID: 20851998). Given these CNS effects and that starvation is well known to affect Paneth cell phenotypes (PMIDs: 1167179, 21986443), we elected not to examine timepoints beyond 11dpt. Technological advances that enable more selective cell depletion will allow study of chronic effects of enteric glial loss in the future.
  
  The authors found transcriptional dysregulation related to Paneth cell biology in the colon, where Paneth cells are normally not present. Given the bulk RNA sequencing approach, the cellular identity in which this shift is taking place cannot be determined. However, it would be useful if the authors could speculate on which colonic cell type they reckon this is happening in.
  
  Per the reviewer’s suggestion, we have added a paragraph to the Discussion addressing one plausible hypothesis to explain this observation. Paneth-like cells have been described in the large intestine and are known, particularly in humans, to express markers typical of Paneth cells, such as lysozyme and defensins (PMID: 27573849, 31753849). These cells could represent the source of the Paneth cell-like transcriptional signature observed in our model. Alternatively, ectopic expression of Paneth cell-associated genes in the colon has been documented in certain pathological conditions, such as colorectal cancer models (e.g., PMID: 15059925), where changes in the local microenvironment appear to trigger activation of Paneth cell genes. Similar, yet unidentified changes in our model could potentially underlie the transcriptional dysregulation related to Paneth cell biology observed here.
  
  On the other hand, enteric glia depletion was found to affect Paneth cells structurally and functionally in the small intestine, where transcriptional changes were initially not identified. Only when performing GSEA with the in silico help of cell type-specific gene profiles, differences in Paneth cell transcriptional programs in the small intestine were uncovered. A comment on this discrepancy would be helpful, especially for the non-bioinformatician readers among us.
  
  Standard differential gene expression analysis (DEG) of the effects of glial loss revealed significant differences only in the colon, and even then, only a handful of genes were changed. These changes were not accompanied by corresponding changes at the protein level, at least as detectable by IHC. In the small intestine, there were no significant differences by standard DEG thresholds. Unlike DEG, gene set enrichment analysis (GSEA) provides a significance value based on whether there is a higher than chance number of genes that are changing in a uniform direction, without consideration for the significance of the magnitude of change. Therefore, the GSEA detected that a significant number of genes in the curated Paneth cell gene list exhibited a positive fold change difference in the bulk RNA sequencing data. This prompted us to examine Paneth cells and other epithelial cell types in more detail by IHC, functional and ultrastructural analyses, which all converged on the observation that Paneth cells were relatively selectively disrupted in the epithelium of glia-depleted mice.
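  The contrast between per-gene thresholds and gene-set enrichment described above can be illustrated with a minimal sketch. It assumes an unweighted running-sum enrichment score with a gene-set permutation test, which is a simplification of published GSEA implementations and not necessarily the configuration used in the study. The gene names, fold changes, and the |log2FC| >= 1 cut-off (a stand-in for a real per-gene significance test) are all hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        # Step up at each set member, step down at every other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random sets of equal size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    set_size = sum(gene in gene_set for gene in ranked_genes)
    as_extreme = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_genes, set_size))
        if abs(enrichment_score(ranked_genes, null_set)) >= abs(observed):
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: 180 background genes spread around zero and a
# 20-gene set whose members all show small positive fold changes.
log2fc = {f"bg{i}": -0.45 + 0.005 * i for i in range(180)}
toy_set = {f"set{j}" for j in range(20)}
log2fc.update({f"set{j}": 0.30 + 0.005 * j for j in range(20)})

# No single gene passes the (assumed) per-gene cut-off ...
n_passing = sum(abs(value) >= 1.0 for value in log2fc.values())

# ... yet the set as a whole is concentrated near the top of the ranking.
ranked = sorted(log2fc, key=log2fc.get, reverse=True)
es, p_value = permutation_p_value(ranked, toy_set)
print(n_passing, round(es, 3), p_value)
```

  In this toy case the set is flagged because its members shift in the same direction, even though each individual change is too small to be called on its own.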
  
  From looking at Figure 3B it is clear that Paneth cells are not the only epithelial cell type affected (after less stringent in silico analyses) by enteric glial cell depletion. Although the authors show that this does not translate into ultrastructural or numerical changes of most of these cell types, this makes one wonder how specific the enteric glia - Paneth cell link is. Besides possible indirect crosstalk (via neurons), it is not clear if enteric glia more closely associate with Paneth cells as compared to these other cell types. Immunofluorescence stainings of some of these cells in the Plp1-GFP mice would be informative here.
  
  Enteric glia have long been reported to closely associate with crypts, the sites of residence for Paneth cells and intestinal stem cells (PMID: 7043279, 16423922). Consistent with these reports, our observations from Plp1-eGFP mice confirm that enteric glia often appose the entire base of small intestinal crypts (see Author response image 1 below). Given this reproducible observation, we did not pursue histological quantification to compare preferential glial apposition to specific epithelial cell types. Enteric glia have been reported to form close associations with enteroendocrine cells as well (PMID: 24587096), which is not surprising because these cells are highly innervated; however, our analyses did not reveal changes in the abundance and morphology of these cells or other epithelial cell types.
  
  Author response image 1.
  
  (A) Immunohistochemical staining of a small intestinal cross-section from a Vil1<sup>Cre</sup>Rosa26<sup>tdTomato/+</sup> Plp1<sup>eGFP</sup> transgenic mouse in which enteric glia are labeled with green fluorescent protein (GFP) and intestinal epithelial cells are labeled with tdTomato. (B) Mucosal glia closely associate with epithelial cells in intestinal crypts. Scale bar – 20µm.
  
  The authors mention IL-22 as a possible link, but do Paneth cells express receptors for transmitters commonly released by enteric glia? Maybe they can have a look at putative cell-cell interactions by mapping ligand-receptor pairs in the scRNAseq datasets they used.
  
  Beyond IL-22R, it is established that Paneth cells express receptors for secreted WNT proteins, which enteric glia have been shown to express (PMID: 34727519). This interaction could potentially be involved in glial regulation of Paneth cells, but mice lacking glia do not exhibit the same phenotypes as mouse models with disrupted WNT signaling. For example, animals lacking the WNT receptor Frizzled-5 in Paneth cells have mislocalization of Paneth cells to the villi (PMID: 15778706), which we do not readily observe in Plp1CreER-DTA mice. Furthermore, while mucosal enteric glia have been proposed as a source of WNT ligands, this role has been specifically attributed to GFAP+ cells, which may or may not be glia in the mucosa. Moreover, several other cell types in the mucosa around crypts have also been identified as significant sources of WNT ligands (PMID: 16083717, 22922422). We have now added these ideas to the Discussion.
  
  Per the reviewer’s suggestion to use bioinformatics to explore other potential ligand-receptor pairings that might underlie glial regulation of Paneth cells, we conducted a CellPhoneDB analysis focused on these two cell types with a collaborator. This analysis highlighted a handful of potential ligand-receptor interactions, but none of these pathways could be clearly linked to the observed Paneth cell phenotype. Furthermore, virtually all the candidate interactions were not specific to glia, with the candidate ligands expressed by many other more abundant cell types in the mucosa. For these reasons, we decided not to include this analysis in the revised manuscript.
  
  Previously the authors showed that enteric glia regulation of intestinal motility is sex-dependent (Rao et al 2017). While enteric glia depletion caused dysmotility in female mice, it did not affect motility in males. For this reason, most experiments in the current study were conducted in male mice only. However, for the experiments focusing on the effect of enteric glia depletion on host-microbiome interactions and intestinal microbiota composition both male and female mice were used. In Figure 8A male and female mice are distinctly depicted but this was not done for Figure 8C. Separate characterization of the microbiome of male and female mice would have helped to figure out how much intestinal dysmotility (in females) contributes to the effect on gut microbial composition. This is an important exercise to confirm that the effect on the microbiome is indeed a consequence of altered Paneth cell function, as suggested by the authors (in the results and discussion, and in the abstract).
  
  In our microbiome analysis, we initially analyzed males and females separately but did not observe significant differences between the two sexes. Thus, we merged the data to increase the statistical power of the genotype comparisons. It was an oversight on our part to not label the datapoints by sex as we did for the other data in the manuscript. We have now revised the figures related to microbiome characterization (Fig. 5D-E and Supp. Fig. 8C) to indicate the sexes of the mice used. Stratifying the data by sex within-sample revealed no major sex-specific differences in microbiome diversity or enriched/depleted biomarkers in the core genotype-dependent observations.
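  As a rough illustration of what stratifying a diversity measure by sex within each genotype involves, the sketch below computes the Shannon index per sample and averages it per (genotype, sex) group. The taxon counts and the choice of the Shannon index are assumptions made for illustration only; they are not the study's data or its analysis pipeline.

```python
import math
from collections import defaultdict


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mean_diversity_by_group(samples):
    """Average the per-sample Shannon index within each (genotype, sex) group."""
    grouped = defaultdict(list)
    for sample in samples:
        key = (sample["genotype"], sample["sex"])
        grouped[key].append(shannon_index(sample["counts"]))
    return {key: sum(values) / len(values) for key, values in grouped.items()}


# Hypothetical taxon counts per fecal sample.
samples = [
    {"genotype": "Cre-", "sex": "F", "counts": [40, 30, 20, 10]},
    {"genotype": "Cre-", "sex": "M", "counts": [38, 32, 18, 12]},
    {"genotype": "Cre+", "sex": "F", "counts": [70, 20, 8, 2]},
    {"genotype": "Cre+", "sex": "M", "counts": [68, 22, 7, 3]},
]

by_group = mean_diversity_by_group(samples)
for (genotype, sex), value in sorted(by_group.items()):
    print(genotype, sex, round(value, 3))
```

  If the genotype difference is similar in both sexes, pooling the sexes is easier to justify; a formal comparison would still need a statistical test on real replicate data.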
  
  In this context, it would also be interesting to compare the bulk sequencing data after enteric glia depletion between female and male mice.
  
  Our bulk sequencing analysis of the effects of glial loss was conducted in male mice only in order to assess the effects independent of colonic dysmotility, a phenotype observed only in female Plp1CreER-DTA animals (PMID: 28711628). Given that we found rather muted transcriptional changes in male mice, we chose not to perform subsequent transcriptional analyses in female mice, further reasoning that any changes identified would most likely be attributable to dysmotility rather than direct glial effects. Future studies focusing on sex differences in the small intestine, where motility in the Plp1CreER-DTA model is unaffected by glial loss, could provide additional insights, especially in light of the recently reported sex differences in the gene expression and activity levels of enteric glia in the myenteric plexus (PMID: 34593632, 38895433).
  
  Reviewer #1 (Recommendations For The Authors):
  
  - Intro 2nd paragraph: please add to the sentence: "They found no major defects in epithelial properties AT STEADY STATE (or during homeostasis)."
  
  Revised as suggested.
  
  - There seems to be a word missing in the 2nd sentence of paragraph 2 on page 4. "... but xxx consistent...".
  
  Reviewed and there were no missing words.
  
  - In the 2nd paragraph on page 8, when discussing GFAP expression in IBD patients, a reference is missing. Also, here it should be GFAP, not Gfap (in italics).
  
  Revised as suggested.
  
  Reviewer #2 (Public Review):
  
  This is an excellent and timely study from the Rao lab investigating the interactions of enteric glia with the intestinal epithelium. Two early studies in the late 1990s and early 2000s had previously suggested that enteric glia play a pivotal role in control of the intestinal epithelial barrier, as their ablation using mouse models resulted in severe and fatal intestinal inflammation. However, it was later identified that these inflammatory effects could have been an indirect product of the transgenic mouse models used, rather than due to the depletion of enteric glia. In previous studies from this lab, the authors had identified expression of PLP1 in enteric glia, and its use in CRE driver lines to label and ablate enteric glia.
  
  In the current paper, the authors carefully examine the role of enteric glia by first identifying that PLP1-creERT2 is the most useful driver to direct enteric glial ablation, in terms of the number of glial cells targeted, their proximity to the intestinal epithelium, and the relevance for human studies (GFAP expression is rather limited in human samples in comparison). They examined gene expression changes in different regions of the intestine using bulk RNA-seq following ablation of enteric glia by driving expression of diphtheria toxin A (PLP1-creERT2;Rosa26-DTA). Alterations in gene expression were observed in different regions of the gut, with specific effects in different regions. Interestingly, while there were gene expression changes in the epithelium, there were limited changes to the proportions of different epithelial cell types identified using immunohistochemistry in control vs glial-ablated mice. The authors then focused on the investigation of Paneth cells in the ileum, identifying changes in the ultrastructural morphology and lysozyme activity. In addition, they identified alterations in gut microbiome diversity. As Paneth cells secrete antimicrobial peptides, the authors conclude that the changes in gut microbiome are due to enteric glia-mediated impacts on Paneth cell activity.
  
  Overall, the study is excellent and delves into the different possible mechanisms of action, including the investigation of changes in enteric cholinergic neurons innervating the intestinal crypts. The use of different CRE drivers to target enteric glial cells has led to varying results in the past, and the authors should be commended on how they address this in the Discussion.
  
  We thank the reviewer for this positive feedback.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I have a few minor comments:
  
  Changes in bacterial diversity - the authors make a very compelling case that changes in the proportions of various intestinal microbiome species were impacted by the change in Paneth cell secretions resulting from the depletion of enteric glia. Another potential mechanism of action could be alterations in gut motility resulting from loss of enteric glia. It appears that faecal samples were collected from both male and female mice, and hence changes in colonic motility could be involved. This should be addressed in the Results and Discussion.
  
  We agree with the reviewer that GI dysmotility could influence microbial composition. To address this, we initially analyzed microbiome data separately for male and female mice, because only female Plp1CreER-Rosa26DTA mice exhibit dysmotility. We found no significant sex-specific differences in microbiome composition, however, which suggested to us that dysmotility was unlikely to be the primary driver of the observed microbial changes. Based on these findings, we opted to combine data from male and female mice in our final microbiome analysis. We have now revised the Results, Discussion, and Methods sections to clarify this.
  
  Supplementary Figure 2: it would be helpful to include some labels of landmarks on the tissues, and arrows pointing to immunoreactive cells.
  
  We have added labels and arrows to images in Supplementary Figure 2 per the reviewer’s suggestion.
  
  Figure 4B: It's hard to tell the difference in ultrastructural morphology of the Paneth cells between Cre- and Cre+ mice in the EM images. Heterogeneous granules (PG) seem to be labelled in cells from both genotypes of mice. Some outlines of cells or arrows pointing to errant granules would be helpful.
  
  We have added arrows indicating errant granules to images in Figure 4 per the reviewer’s suggestion.
  
  Reviewer #3 (Public Review):
  
  In this study, Prochera, et al. identify PLP1+ cells as the glia that most closely interact with the gut epithelium and show that genetic depletion of these PLP1+ glia in mice does not have major effects on the intestinal transcriptome or the cellular composition of the epithelium. Enteric glial loss, however, causes dysregulation of Paneth cell gene expression that is associated with morphological disruption of Paneth cells, diminished lysozyme secretion, and altered gut microbial composition.
  
  Overall, the authors need to first prove whether the Plp1CreER Rosa26DTA/+ mice system is viable.
  
  In previous work, we discovered that the gene Plp1 is broadly expressed by enteric glia and, within the mouse intestine, is quite specific to glial cells (PMID: 26119414). We characterized the Plp1CreER mouse line as a genetic tool in detail in this initial study. Then in a subsequent manuscript, we used Plp1CreER-DTA mice to genetically deplete enteric glia and study the consequences on epithelial barrier integrity, crypt cell proliferation, enteric neuronal health and gastrointestinal motility (PMID: 28711628). In this second study, we performed extensive validation of the Plp1CreER-DTA mouse model including detailed quantification of glial depletion in the small and large intestines across the myenteric, intramuscular and mucosal compartments by immunohistochemical (IHC) staining of whole tissue segments to sample thousands of cells. We found that the majority of S100B<sup>+</sup> enteric glia were depleted within 5 days in both sexes, including more than 88% loss of mucosal glia, and that this loss was stable at 3 subsequent timepoints (7, 9 and 14 days post-tamoxifen induction of Cre activity). Glial loss was further confirmed by IHC for GFAP in the myenteric plexus, and by ultrastructural analysis of the small intestine to ensure cell depletion rather than simply loss of marker expression. Our group was the first to use this model to study enteric glia, and since then similar models and our key observations have been replicated by other groups (PMID: 33282743, 34550727). Thus, we consider this model to be well established.
  
  Also, most experimental systems have been evaluated by immunohistochemistry, scRNAseq, and electron microscopy, but need quantitative statistical processing.
  
  RNA-sequencing and microbiome analyses are inherently quantitative (Figures 1A-B, Supp. Figure 1, Figure 2, Supp. Figure 4A, Figure 3A-B, Supp. Figure 5, Figure 5, and Supp. Figure 8C). Virtually all our other observations are also supported by quantitative analysis including analysis of mucosal glial markers (Fig. 1C-D), validation of Paneth cell transcript expression in the colon (Supp. Fig. 4B), measurement of epithelial cell type composition (Figure 3C, D), assessment of crypt innervation (Supp. Fig. 7E), and measurement of bacteria-to-crypt distance (Supp. Fig. 8A-B). The only observation that was not quantified was that of morphological abnormalities of Paneth cells. Given the inherently low sampling rate of EM studies, we felt that functional assays (explant secretion assays, effects on microbial composition) would be more meaningful for interrogation of a potential Paneth cell phenotype and thus elected to focus our quantitative analyses on those functional assays rather than further histological measurements.
  
  In addition, the value of the paper would be enhanced if the significance of why the phenotype appeared in the large intestine rather than the small intestine when PLP1 is deficient for Paneth cells is clarified.
  
  Please see detailed response to Reviewer 1 that addresses this comment and the corresponding addition to the Discussion.
  
  Major Weaknesses:
  
  (1) Supplementary Figure 2; Cannot be evaluated without quantification.
  
  Supplemental Figure 2 shows qualitative IHC observations that were highly reproducible across all the subjects indicated for each marker and align well with the quantitative transcriptional data from human subjects shown in Figure 1 and Supplemental Figure 1. The DAB staining in Supplemental Figure 2 could theoretically be quantified by staining intensity or counting cell number but we felt this would be arbitrary and difficult to achieve in a meaningful way with a single chromogen. The DAB reaction is associated with a non-linear relationship between amount of an antigen and staining intensity, especially at higher levels (PMID: 16978204, 19575836), because it is not a direct conjugate and relies upon an enzymatic reaction. The amplification step required for DAB staining using Horseradish Peroxidase (HRP) introduces variability, particularly with cytoplasmic markers and in complex tissue structures like the plexuses, where proteins are distributed throughout the glial network. Counting cell number also would not lead to fair comparisons between markers because while SOX10 shows a clear nuclear signal suitable for quantification, the other markers are all membrane or cytoplasmic proteins, making accurate counting nearly impossible in dense ganglia. Finally, quantifying cell number in 5-micron paraffin sections which have major differences in sampling from one subject to another in terms of presence of ganglia and ganglia size, would also make this prone to inaccuracy. Given these limitations and the robust qualitative data we have shown that aligns completely with the quantitative transcriptional analyses, we respectfully disagree with the reviewer’s comment.
  
  (2) Figure 2A; Is Plp1CreER Rosa26DTA/+ mice system established correctly? S100B immunohistology picture is not clear. A similar study is needed for female Plp1CreER Rosa26DTA/+ mice. What is the justification for setting 5 dpt, 11 dpt? Any consideration of changes to organs other than the intestine? Wouldn't it be clearer to introduce Organoid technology?
  
  Please see the detailed response to the first comment. The Plp1CreER-DTA mouse model is well-established and there are detailed experimental justifications for the 5 and 11dpt timepoints as well as the focus on male mice for RNA-sequencing analyses. As described in our previous work (PMID: 28711628), Plp1<sup>+</sup> cells throughout the animal would be affected, including Schwann cells and oligodendrocytes, which is why we limit our analyses to the first 11dpt, when there are fewer confounding variables. The S100B immunohistology picture in Figure 2A was intended to be a schematic graphical representation of the paradigm of glial loss, not a data figure. Extensive validation of glial loss in this model was shown in our previous study. To improve clarity, we have now enlarged the picture for the reader.
  
  Regarding the suggestion to use organoid technology, standard intestinal epithelial organoids do not incorporate any elements of the enteric nervous system (ENS), which is the focus of this study. Some groups have made heroic efforts to incorporate ENS components into intestinal organoids by introducing neural crest progenitor cells and grafting the hybrid organoids under the renal capsule in mice (example PMID: 27869805); but these studies are still limited, and it remains unclear how much the preparations reflect functional, natively innervated intestine. Our ex vivo explant assay preserves native ENS-epithelial interactions, providing a more effective model for studying the relationship between enteric glia and Paneth cells.
  
  (3) Figure 2B; Need an explanation for the 5 genes that were altered in the colon. Five genes should be evaluated by RT-qPCR. Why was there a lack of change in the duodenum and ileum?
  
  While RT-qPCR validation of differentially expressed genes was once common practice, especially with microarray data, there is now robust evidence for strong correlations between RNA sequencing (RNAseq) results and RT-qPCR measurements of gene expression (PMID: 26208977, 28484260). Notably, Rajkumar et al. (PMID: 26208977) demonstrated that RNAseq analyzed using DESeq2 (a method that we employed in our study) yields highly accurate results. They reported a 0% false positive rate and a 100% positive predictive value for DESeq2, rendering additional RT-qPCR validation redundant. We only performed RT-qPCR analysis of colonic Lyz1 expression because our IHC analyses failed to show any ectopic expression of the protein in the colons of Cre<sup>+</sup> mice (Supp. Figure 4D) and we wished to validate the gene expression change seen by RNAseq in an independent cohort to be absolutely sure. Per the detailed response to Reviewer 1, we do not have a mechanistic explanation for why there is selective transcriptional induction of Paneth cell-related genes in the colon upon glial depletion. We have elaborated on this in the revised Discussion.
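  For readers less familiar with the two statistics cited above, the sketch below shows how the false positive rate and positive predictive value are computed from validation counts. The counts are hypothetical and are not taken from Rajkumar et al.; the function names are illustrative.

```python
def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): share of truly unchanged genes called as changed."""
    if fp + tn == 0:
        raise ValueError("no true negatives or false positives to evaluate")
    return fp / (fp + tn)


def positive_predictive_value(tp, fp):
    """PPV = TP / (TP + FP): share of called genes that are truly changed."""
    if tp + fp == 0:
        raise ValueError("no positive calls to evaluate")
    return tp / (tp + fp)


# Hypothetical validation outcome: every gene called by the RNAseq pipeline
# is confirmed by the reference assay, so there are no false positives.
tp, fp, tn, fn = 50, 0, 950, 25

fpr = false_positive_rate(fp, tn)
ppv = positive_predictive_value(tp, fp)
print(f"FPR = {fpr:.0%}, PPV = {ppv:.0%}")
```

  Both quantities depend only on the positive calls and the true negatives, so a 0% false positive rate and a 100% positive predictive value describe the reliability of the genes that are called, not how many true changes are missed.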
  
  (4) Supplementary Figure 3; Top 3 genes should be evaluated by RT-qPCR.
  
  Given that none of the changes included in Supplementary Figure 3 for the duodenum or ileum reach the standard threshold for statistical significance and in view of the findings by Rajkumar, et al. (2015) described above, we don’t believe that evaluating expression of these genes by RT-qPCR would be informative in interpreting these negative results.
  
  (5) Supplementary Figure 4B, C, and D; Why not show analysis in the small intestine?
  
  We chose to focus on the colon for this analysis because this was the only region of the intestine that exhibited statistically significant differences in transcriptional profiles as assessed by DEG.
  
  (6) Supplementary Figure 4D; Cannot be evaluated without quantification.
  
  As shown in the representative images, no LYZ1 or DEFA5 signal was detected in the colons of Cre<sup>-</sup> or Cre<sup>+</sup> mice (n=3 mice per genotype; >100 crypts/mouse assessed), though it was readily detectable in the ileums of both genotypes. We have now added the number of crypts assessed to the figure legend.
  
  (7) Figure 3D; Cannot be evaluated without quantification.
  
  Please see Fig. 3C for quantification of each cell type marker shown in Figure 3D.
  
  (8) Supplementary Figure 5B and C; Top 3 genes should be evaluated by RT-qPCR.
  
  Please see detailed explanation to comments #3 and #4 above.
  
  (9) Supplementary Figure 6; Top 3 genes should be evaluated by RT-qPCR.
  
  This comment was likely made in error because Supplementary Fig. 6 does not show any gene expression data.
  
  (10) Figure 4A; Cannot be evaluated without quantification.
  
  We appreciate the reviewer’s comment here and worked hard to add quantification of the Paneth cell granule phenotype seen by light microscopy to our study. IHC for LYZ1 is typically the gold standard for assessment of Paneth cell granules by light microscopy. In our hands, however, we encountered persistent issues with IHC for this protein. While it very reproducibly detected Paneth cells with sufficient specificity to enable quantification of the number of immunoreactive cells (as shown in Figure 3C), it did not enable quantification of granule morphology because it consistently exhibited diffuse staining throughout the cell (see Author response image 2 below). This appearance persisted regardless of extensive titration of fixation parameters (time, temperature, fixative supplier, 10% NBF vs 4% PFA), tissue preparation (fixed as intact tubes versus “swiss-rolls”), permeabilization conditions, operator, antibody used, and other variables. Upon subsequently surveying the literature, it seems that similar diffuse staining patterns for LYZ1 have been observed by numerous other groups and this may simply be an experimental limitation.
  
  Author response image 2.
  
  Representative IHC images showing LYZ1 staining optimization. Ileal tissues from 8-10-week-old mice were prepared as either 'swiss-rolls' (A-D) or tubes (E-F) and fixed using different protocols: 10% neutral buffered formalin (NBF) from Epredia (#5710-LP) (A-B, E), 10% NBF from G-Biosciences (#786-1057) (C-D), or 4% paraformaldehyde (PFA) from VWR (#100503-917) (F). Fixations were conducted at room temperature (A, C) or at 4°C (B, D-F). Diffuse cytoplasmic LYZ1 staining is observed within Paneth cells, regardless of conditions of tissue preparation.
  
  As an alternative approach to detecting Paneth cell granules, we tried UEA-I lectin staining. This labeling approach was sufficient to reveal qualitative differences in Paneth granule morphology in Cre<sup>+</sup> mice, as shown in Fig. 4A. However, the transient nature of this lectin labeling made it very difficult to systematically quantify granule morphology in a blinded manner, as we did for our other analyses. Given these persistent challenges, we decided to present qualitative data on morphology by two orthogonal approaches (UEA-I staining by light microscopy and ultrastructure by EM) and focus on functional read-outs for quantitative analyses (explant secretion assays and microbiome analyses). In aggregate, we feel that these data provide robust and complementary evidence of the observed phenotype from independent experimental approaches.
  
  (11) Figure 4D; Cannot be evaluated without quantification.
  
  This comment was likely made in error because there is no Figure 4D.
  
  (12) Additional experiments on in vivo infection systems comparing Plp1CreER Rosa26DTA/+ mice and controls would be great.
  
  We agree that in vivo infection experiments would be very interesting to pursue, particularly given the potential role of Paneth cells in innate immunity. These studies are beyond the scope of the current manuscript, but we hope to report on them in the future.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Patients with inflammatory bowel disease (IBD); UC or CD.
  
  Revised per reviewer suggestion.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.15.589545v2
www.biorxiv.org www.biorxiv.org

New submission 11/02/2024, 12:18:21

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  Thanks to all the reviewers for their insightful and constructive comments, which are very helpful in improving the manuscript. We are encouraged by the many positive comments regarding the significance of our findings and the value of our data. Regarding the reviewers’ concern about cell classification, we used several additional marker genes to explain the identification of cell clusters and subclusters. We have performed further analyses and rewritten parts of the text to address the concerns raised. Here is a point-by-point response to the reviewers’ comments and concerns. Figures R1-R9 were provided only as additional information for reviewers and were not included in the revised manuscript.
  
  Reviewer #1 (Public Review):
  
  In the article "Temporal transcriptomic dynamics in developing macaque neocortex", Xu et al. analyze the cellular composition and transcriptomic profiles of the developing macaque parietal cortex using single-cell RNA sequencing. The authors profiled eight prenatal rhesus macaque brains at five timepoints (E40, E50, E70, E80, and E90) and obtained a total of around 53,000 high-quality cells for downstream analysis. The dataset provides a high-resolution view into the developmental processes of early and mid-fetal macaque cortical development and will potentially be a valuable resource for future comparative studies of primate neurogenesis and neural stem cell fate specification. Their analysis of this dataset focused on the temporal gene expression profiles of outer and ventricular radial glia and utilized pseudotime trajectory analysis to characterize the genes associated with radial glial and neuronal differentiation. The rhesus macaque dataset presented in this study was then integrated with prenatal mouse and human scRNA-seq datasets to probe species differences in ventricular radial glia to intermediate progenitor cell trajectories. Additionally, the expression profile of macaque radial glia across time was compared to those of mouse apical progenitors to identify conserved and divergent expression patterns of transcription factors.
  
  The main findings of this paper corroborate many previously reported and fundamental features of primate neurogenesis: deep layer neurons are generated before upper layer excitatory neurons, the expansion of outer radial glia in the primate lineage, conserved molecular markers of outer radial glia, and the early specification of progenitors. Furthermore, the authors show some interesting divergent features of macaque radial glial gene regulatory networks as compared to mouse. Overall, despite some uncertainties surrounding the clustering and annotations of certain cell types, the manuscript provides a valuable scRNA-seq dataset of early prenatal rhesus macaque brain development. The dynamic expression patterns and trajectory analysis of ventricular and outer radial glia provide valuable data and lists of differentially expressed genes (some consistent with previous studies, others reported for the first time here) for future studies.
  
  The major weaknesses of this study are the inconsistent dissection of the targeted brain region and the loss of more mature excitatory neurons in samples from later developmental timepoints due to the use of single-cell RNA-seq. The authors mention that they could observe ventral progenitors and even midbrain neurons in their analyses. Ventral progenitors should not be present if the authors had properly dissected the parietal cortex. The fact that they obtained even midbrain cells points to an inadequate dissection or poor cell classification. If this is the result of poor classification, it could be easily fixed by using more markers with higher specificity. However, if it is the result of a poor dissection, some of the cells in other clusters could potentially be from midbrain as well. The loss of more mature excitatory neurons is also problematic because on top of hindering the analysis of these neurons in later developmental periods, it also affects the cell proportions the authors use to support some of their claims. The study could also benefit from the validation of some of the genes the authors uncovered to be specifically expressed in different populations of radial glia.
  
  We thank the Reviewer for the comments and apologize for the shortcomings of tissue dissection and cell capture.
  
  We used more marker genes for major cell classification, such as SHOX2, IGFBP5, TAC1, PDYN, FLT1, and CYP1B1, in the new Figure 1D to improve the cell type annotation. Specifically, we relabeled cluster 20 as ventral LGE-derived interneuron precursors based on the expression of IGFBP5, TAC1, and PDYN, and relabeled cluster 23 from meningeal cells to thalamic cells based on the expression of ZIC2, ZIC4, and SHOX2. These cell types were excluded from the follow-up analysis. Because EN8 had previously been incorrectly defined as midbrain neurons, the dissection appeared to be poor. After carefully reviewing the data analysis process, we determined that EN8 was a small group of cells from cluster 23 that had been mistakenly selected during the excitatory neuron analysis, as shown in Figure R5(A); this has been corrected in the revision. In the revised manuscript, we deleted the previous EN8 subcluster and renumbered the remaining excitatory neuron subclusters in the new Figure 2.
  
  In addition, we also improved the description of sample collection as follows: “We collected eight pregnancy-derived fetal brains of rhesus macaque (Macaca mulatta) at five prenatal developmental stages (E40, E50, E70, E80, E90) and dissected the parietal lobe cortex. Because of the different development times of rhesus monkeys, prenatal cortex size and morphology are different. To ensure that the anatomical sites of each sample are roughly the same, we use the lateral groove as a reference to collect the parietal lobe for single-cell sequencing (as indicated by bright yellow in Figure S1A) and do not make a clear distinction between the different regional parts including primary somatosensory cortex and association cortices in the process of sampling”. As shown in Figure S1A, due to the small volume of the cerebral cortex at early time points, especially in E40, a small number of cells beyond the dorsal parietal lobe, including the ventral cortex cells and thalamus cells, were collected during the sampling process with the brain stereotaxic instrument.
  
  In this study, the BD method was used to capture single cells. Due to the fixed size of the micropores, this method might be less efficient in capturing mature excitatory neurons. However, it captures newborn neurons well at each sampling time point, so the generation of excitatory neurons at different developmental time points can be observed clearly, as shown in Figure 2, which aligns with our research purpose.
  
  To verify the reliability of our cell annotation results, we compared the cell-type annotations in our study with those of recently published research (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652), using the scmap package to project the major cell types in our macaque development scRNA-seq dataset onto GSE226451. The river plot in Author response image 1 illustrates the broadly similar cell type classifications in the two datasets.
  
  Author response image 1.
  
  River plot illustrating the relationships between the major cell type annotations in this study and those in the recently published developing macaque telencephalon dataset.
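
  The correspondence that a river plot visualizes can be summarized as counts of cells per (original label, projected label) pair. The following is a minimal Python sketch of that tabulation only. It does not reproduce the scmap projection itself (scmap is an R package), and the function name `label_flows` and the toy labels, including the placeholder "unassigned", are hypothetical and not taken from the study.

```python
from collections import Counter


def label_flows(source_labels, projected_labels):
    """Count cells for each (source label, projected label) pair.

    Both inputs are per-cell label lists in the same cell order.
    """
    if len(source_labels) != len(projected_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(source_labels, projected_labels))


# Hypothetical toy labels for six cells; not data from the study.
ours = ["vRG", "vRG", "oRG", "IPC", "EN", "EN"]
projected = ["vRG", "oRG", "oRG", "IPC", "EN", "unassigned"]

flows = label_flows(ours, projected)
for (src, dst), n in sorted(flows.items()):
    print(f"{src} -> {dst}: {n}")
```

  Each nonzero count corresponds to one band of the river plot, with the band width proportional to the number of cells.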
  
  Furthermore, we used bioinformatic analysis to validate the genes specifically expressed in outer radial glia. We verified terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). The results in Author response image 2 show that gene expression exhibited distinct states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of the ion channel genes ATP1A2 and SCN4B.
  
  Author response image 2.
  
  Heatmap showing the relative expression of genes displaying significant changes along the pseudotime axis from vRG to oRG in the dataset of Micali et al. 2023 (GEO: GSE226451). The columns represent cells ordered along the pseudotime axis.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This manuscript by Xu et al., is an interesting study aiming to identify novel features of macaque cortical development. This study serves as a valuable atlas of single cell data during macaque neurogenesis, which extends the developmental stages previously explored. Overall, the authors have achieved their aim of collecting a comprehensive dataset of macaque cortical neurogenesis and have identified a few unknown features of macaque development.
  
  Strengths:
  
  The authors have accumulated a robust dataset of developmental time points and have applied a variety of informatic approaches to interrogate this dataset. One interesting finding in this study is the expression of previously unknown receptors on macaque oRG cells. Another novel aspect of this paper is the temporal dissection of neocortical development across species. The identification that the regulome looks quite different, despite similar expression of transcription factors in discrete cell types, is intriguing.
  
  Weaknesses:
  
  Due to the focus on demonstrating the robustness of the dataset, the novel findings in this manuscript are underdeveloped. There is also a lack of experimental validation. This is a particular weakness for newly identified features (like receptors in oRG cells). It's important to show expression in relevant cell types and, if possible, perform functional perturbations on these cell types. The presentation of the data highlighting novel findings could also be clarified at higher resolution, and dissected through additional informatic analyses. Additionally, the presentation of ideas and goals of this manuscript should be further clarified. A major gap in the study rationale and results is that the data was collected exclusively in the parietal lobe, yet the rationale and interpretation of what this data indicates about this specific cortical area was not discussed. Last, a few textual errors about neural development are also present and need to be corrected.
  
  We thank you for your comments and suggestions concerning our manuscript. They are all valuable and helpful for revising and improving our paper and provide essential guidance for our research. We have studied the comments carefully and made corrections, which we hope will meet with approval. We have endeavored to address the multiple points raised by the referee.
  
  To support the reliability of our data and newly identified features, we verified terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). The results in Figure R2 show that gene expression exhibited distinct states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of the ion channel genes ATP1A2 and SCN4B.
  
  Our results mainly explore the conserved features of neocortex development across species. Through cross-species comparison, we identified the types of neural stem cells in the intermediate state, their generative trajectories, and the gene expression dynamics accompanying these trajectories. We further explored the stages of transcriptional dynamics as vRG generate oRG. Additional analysis was performed through transcription factor (TF) regulatory network analysis. We analyzed the TF regulatory network of human vRG with the pySCENIC workflow. The top TFs at every time point in human vRG were calculated, and we used the top 10 TFs and their top 5 target genes to perform interaction analysis and generate the regulatory network of human vRG in the revised Figure 6. Comparing the pySCENIC results for mouse, macaque, and human vRG, it was clear that the regulatory networks were not evolutionarily conserved. Compared with macaque, the regulatory network of transcription factors and target genes in humans is more complex. Some conserved regulatory relationships present in more than one species were identified, such as the HMGN3, EMX2, SOX2, and HMGA2 network at an early stage, when deep-layer neurons are generated, and the SOX10 and ZNF672 network at a late stage, when upper-layer neurons are generated.
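
  The selection step described above (top 10 TFs, each with its top 5 target genes) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes each TF already has a per-time-point ranking score and that TF-target links carry an importance weight, as in the TF/target/importance adjacency table that pySCENIC produces. The function name `top_regulatory_edges`, the scores, and the `GENE_*` targets are hypothetical.

```python
from collections import defaultdict


def top_regulatory_edges(tf_scores, adjacencies, n_tfs=10, n_targets=5):
    """Keep the n_tfs highest-scoring TFs and, for each, its n_targets
    highest-importance targets. Returns (tf, target, importance) edges."""
    top_tfs = sorted(tf_scores, key=tf_scores.get, reverse=True)[:n_tfs]

    # Group candidate targets by TF.
    by_tf = defaultdict(list)
    for tf, target, importance in adjacencies:
        by_tf[tf].append((importance, target))

    edges = []
    for tf in top_tfs:
        for importance, target in sorted(by_tf[tf], reverse=True)[:n_targets]:
            edges.append((tf, target, importance))
    return edges


# Hypothetical scores and links; the values are made up for illustration.
tf_scores = {"SOX2": 0.9, "EMX2": 0.7, "HMGA2": 0.4}
adjacencies = [
    ("SOX2", "GENE_A", 5.0),
    ("SOX2", "GENE_B", 9.0),
    ("SOX2", "GENE_C", 1.0),
    ("EMX2", "GENE_A", 2.0),
    ("HMGA2", "GENE_D", 3.0),
]

edges = top_regulatory_edges(tf_scores, adjacencies, n_tfs=2, n_targets=2)
for tf, target, importance in edges:
    print(tf, "->", target, importance)
```

  The resulting edge list is the kind of input a network-drawing tool would take; how the TFs were actually ranked at each time point is not specified in the response and is left as an assumption here.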
  
  The parietal lobe is the center of the somatic senses and is significant for interpreting words as well as for language understanding and processing. In this study, however, the parietal lobe area was selected mainly because of the convenience of sampling the dorsal neocortex, as we described in the Materials and Methods section: “Because of the different development times of rhesus monkeys, prenatal cortex size and morphology are different. To ensure that the anatomical sites of each sample are roughly the same, we use the lateral groove as a reference to collect the parietal lobe for single-cell sequencing (as indicated by bright yellow in Figure S1A) and do not make a clear distinction between the different regional parts including primary somatosensory cortex and association cortices in the process of sampling”.
  
  Thanks for carefully pointing out our manuscript's textual errors about neural development. We have corrected them, as described in the responses below.
  
  Reviewer #3 (Public Review):
  
  Summary: The study adds to the existing data that have established that cortical development in rhesus macaque is known to recapitulate multiple facets of cortical development in humans. The authors generate and analyze single cell transcriptomic data from the timecourse of embryonic neurogenesis.
  
  Strengths:
  
  Studies of primate developmental biology are hindered by the limited availability and limited replication. In this regard, a new dataset is useful.
  
  The study analyzes parietal cortex, while previous studies focused on frontal and motor cortex. This may be the first analysis of macaque parietal cortex and, as such, may provide important insights into arealization, which the authors have not addressed.
  
  Weaknesses:
  
  The number of cells in the analysis is lower than recent published studies which may limit cell representation and potentially the discovery of subtle changes.
  
  The macaque parietal cortex data is compared to human and mouse pre-frontal cortex. See data from PMCID: PMC8494648 that provides a better comparison.
  
  A deeper assessment of these data in the context of existing studies would help others appreciate the significance of the work.
  
  We thank the reviewer for these suggestions and constructive comments. We agree with the reviewer that the cell number in our study is lower than in recently published studies. The scRNA sequencing in this study was completed between 2018 and 2019, in the early stages of the application of single-cell sequencing technology. In addition, we have recently been unable to obtain additional macaque embryos to enlarge the sample size, since rhesus monkey samples are scarce. Therefore, the number of cells in our study is relatively small compared to recently published single-cell studies.
  
  The dataset suggested by the reviewer is extremely valuable, and we tried to perform the suggested analysis to explore temporal expression patterns in the parietal cortex of different species. The dataset from PMCID: PMC8494648 covers the developing human brain across regions from gestational week (GW) 14 to GW25. Since this dataset only covers the middle and late stages of embryonic neurogenesis, it did not fully match the developmental time points of our study for integration analysis. However, we cited the results of this study in the discussion section.
  
  The human regulatory analysis with the pySCENIC workflow was added to the new Figure 6 to compare the vRG regulatory networks of different species. Compared with macaque, the regulatory network of transcription factors and target genes in humans is more complex. Some conserved regulatory relationships present in more than one species were identified, such as the HMGN3, EMX2, SOX2, and HMGA2 network at an early stage, when deep-layer neurons are generated, and the SOX10 and ZNF672 network at a late stage, when upper-layer neurons are generated.
  
  In addition, we performed integration analysis of our dataset with the recently published macaque neocortex development dataset (GEO accession: GSE226451) to verify the reliability of our cell annotation results and terminal oRG differentiation genes. The river plot in Figure R1 illustrates the broadly similar cell type classifications in the two datasets. The result in Figure R2 showed that most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of the ion channel genes ATP1A2 and SCN4B.
  
  Reviewer #1 (Recommendations For The Authors):
  
  1) Throughout the manuscript, the term "embryonic" or "embryogenesis" is used in reference to all timepoints (E40-E90) in this study. The embryonic period is a morphologically and anatomically defined developmental period that ends ~E48-E50 in rhesus macaque. Prenatal or developing is a more accurate term when discussing all timepoints of this study.
  
  We thank the reviewer for pointing out this terminology that needs to be clarified. We have now replaced “embryonic” with “prenatal” as a more appropriate description for the sampling time points in the manuscript.
  
  2) Drosophila should be italicized in the introduction.
  
  Thank you for the suggestion. We have italicized “Drosophila” in the manuscript.
  
  3) Introduction - "In rodents, radial glia are found in the ventricular zone (VZ), where they undergo proliferation and differentiation." This sentence implies that only within rodents are radial glia found within the ventricular zone. Radial glia are present within the ventricular zone of all mammals.
  
  Thanks for the careful reading. This sentence has been corrected to “In mammals, radial glial cells are found in the ventricular zone (VZ), where they undergo proliferation and differentiation.”
  
  4) Figure 1A - an image of the E40 brain is missing.
  
  We first sampled the prenatal developmental cortex of rhesus monkeys at the E40 timepoint. Unfortunately, we forgot to save the photo of the sampling at the E40 time point.
  
  5) Figure 1B and 1C - it is unclear why cluster 20 is not annotated in Figure 1 as in the text it is stated "Each of the 28 identified clusters could be assigned to a cell type identity..." This cluster expresses VIM and PAX6 suggestive of ventricular radial glia and is located topographically approximate to IPC cluster 8 and seems to bridge the gap between neural stem cells and the interneuron clusters. Additionally, cluster 20 appears to be subclustered by itself in the progenitor subcluster UMAP (Figure 3A) suggestive of a batch effect or cells with low quality. The investigation, quality control, and proper annotation of this cluster 20 is necessary.
  
  We appreciate the reviewer’s suggestion. We examined the marker genes specifically expressed in cluster 20; cells in this cluster specifically expressed VIM, IGFBP5, and TAC1. According to the cell annotation results from a published study, we relabeled cluster 20 as ventral LGE-derived interneuron precursors (Yu, Yuan et al. Nat Neurosci. 2021. doi:10.1038/s41593-021-00940-3. PMID: 34737447). Cluster 20 cells have been deleted in the new Figure 3A.
  
  6) Figure 1B UMAP - it is unexpected that meningeal cells would cluster topographically closer to the excitatory neuron cluster (one could even argue that the meningeal cell cluster is located within the excitatory neuron clusters) instead of next to or with the endothelial cell clusters. This is suspicious for a mis-annotated cell cluster. ZIC2 and ZIC3 were used as the principal marker genes for meningeal cells. However, these genes are not specific for meninges (PanglaoDB) and had not been identified as marker genes in a developmental sc-RNAseq dataset of the developing mouse meninges (DeSisto et al. 2020). Additional marker genes (COL1A1, COL1A2, CEMIP, CYP1B1, SLC13A3) may be helpful to delineate the identity of this cluster and provide more evidence for a meningeal origin.
  
  We thank the reviewer for the constructive advice. In the violin plot in Author response image 3, we checked additional marker genes, including COL1A1, COL1A2, CEMIP, and CYP1B1. Cluster 23 does not express these marker genes but specifically expresses the thalamus marker genes SHOX2 (Rosin, Jessica M et al. Dev Biol. 2015. doi:10.1016/j.ydbio.2014.12.013. PMID: 25528224) and TCF7L2 (Lipiec, Marcin Andrzej et al. Development. 2020. doi:10.1242/dev.190181. PMID: 32675279). Based on these gene expression results, we added the marker genes SHOX2 and CYP1B1 to the violin plot in the new Figure 1D and corrected the definition of cluster 23 from meninges to thalamic cells in the revised manuscript and figures.
  
  Author response image 3.
  
  Violin plot of additional markers in cluster 23.
  
  7) From Figure 1A, it appears that astrocytes (cluster 13) are present at E40 and E50 timepoints. This is inconsistent with literature and experimental data of the timing of the neuron-glia switch in primates and inconsistent with the claim within the text that, "Collectively, these results suggested that cortical neural progenitors undergo neurogenesis processes during the early stages of macaque embryonic cortical development, while gliogenic differentiation... occurs in later stages." The clarification of the percentage of astrocytes at each timepoint would clarify this point.
  
  Following the suggestion, we calculated the percentage of astrocytes (cluster 13) at each time point. The proportion of astrocytes was as low as 0.1783% and 0.1046% at the E40 and E50 time points and increased significantly at E80 and E90, suggesting that the onset of macaque gliogenesis might be around embryonic days 80 to 90. This result is consistent with published research on the timing of the neuron-glia switch in primates (Rash, Brian G et al. Proc Natl Acad Sci U S A. 2019. doi:10.1073/pnas.1822169116. PMID: 30894491). In addition, we think that the cells in cluster 13 captured at the E40 to E50 time points, fewer than 200 in total, may be astrocyte precursor cells expressing the AQP4 gene (Yang, Lin, et al. Neuroscience bulletin. 2022. doi:10.1007/s12264-021-00759-9. PMID: 34374948).
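
  The per-time-point percentage reported above is the number of cells in the cluster at a time point divided by all cells captured at that time point, times 100. A minimal Python sketch of that calculation follows. The function name `cluster_percentage` and the toy cell counts are hypothetical and chosen only for illustration; they are not the counts from the study.

```python
from collections import Counter


def cluster_percentage(cells, cluster):
    """Percentage of cells belonging to `cluster` at each time point.

    `cells` is an iterable of (timepoint, cluster_id) pairs, one per cell.
    """
    totals = Counter()
    hits = Counter()
    for timepoint, cluster_id in cells:
        totals[timepoint] += 1
        if cluster_id == cluster:
            hits[timepoint] += 1
    return {tp: 100.0 * hits[tp] / totals[tp] for tp in totals}


# Hypothetical toy data: 1 of 1000 cells at E40 and 10 of 100 cells at E90
# fall in cluster 13.
cells = (
    [("E40", 1)] * 999 + [("E40", 13)] * 1
    + [("E90", 1)] * 90 + [("E90", 13)] * 10
)

pct = cluster_percentage(cells, 13)
for timepoint in sorted(pct):
    print(f"{timepoint}: {pct[timepoint]:.4f}%")
```

  Normalizing by the total number of cells per time point, rather than comparing raw counts, keeps the comparison fair when the time points differ in the number of captured cells.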
  
  8) A subcluster of ExN neurons was identified and determined to be of midbrain origin based on expression of TCF7L2. Did this subcluster express other known markers of the developing midbrain (OTX2, LMX1A, NR4A2, etc...)? Additionally, does this subcluster suggest that the limits of the dissection extended to the midbrain in samples E40 and E50?
  
  We apologize for the previous inadequacy of the excitatory neuron cell annotation. In the previous version of the manuscript, we misidentified the EN8 cells as midbrain cells. Following the reviewer’s suggestion, we examined the expression of more tissue-specific marker genes in EN8. As the violin plot in Author response image 4 shows, the other developing midbrain markers OTX2, NR4A2, and PAX7 were not expressed in EN8, whereas the thalamus marker genes SHOX2, TCF7L2, and NTNG1 were highly expressed. In addition, the dorsal cortex excitatory neuron markers NEUROD2, NEUROD6, and EMX1 were not expressed in EN8, which suggests that EN8 might not consist of cortical cells. After carefully reviewing the data analysis process, we determined that EN8 was a small group of cells from cluster 23 that had been mistakenly selected during the excitatory neuron analysis, as shown in Figure R5(A); this has been corrected in the revision. In the revised manuscript, we have removed the EN8 subcluster from the analysis of excitatory neurons and renumbered the remaining excitatory neuron subclusters in the new Figure 2 and Figure S3.
  
  Author response image 4.
  
  (A) Modified UMAP visualization of the excitatory neuron subclusters collected at all time points, related to Figure 2A. (B) Violin plot of different marker genes in EN8.
  
  9) "These data suggested that the cell fate determination by diverse neural progenitors occurs in the embryonic stages of macaque cortical development and is controlled by several key transcriptional regulators" The authors present a list of differentially expressed genes specific to the various radial glia clusters along pseudotime. Some of these radial glia DEGs are known and have been characterized by previous literature while other DEGs they have identified had not been previously shown to be associated with radial glia specification/maturation. However, this list of DEGs does not support the claim that cell fate determination is controlled by several key transcriptional regulators. What were the transcriptional regulators of radial glia specification identified in this study and how were they validated?
  
  We agree with the reviewer and acknowledge that the description of this part in the previous manuscript was inaccurate. The description has been deleted in the revised manuscript.
  
  10) "Comparing vRG to IPC trajectory between human, macaque, and mouse, we found this biological process of vRG-to-IPC is very conserved across species, but the vRG to oRG trajectory is divergent between species. The latter process is almost invisible in mice, but it is very similar in primates and macaque." Firstly, macaques are primates, and the text should be updated to reflect this. Secondly, from Figure 5C., it seems there were no outer radial glia detected at all within the vRG-oRG and vRG-IPC developmental trajectories. This would imply that oRGs are not "almost invisible" in mice, but rather do not exist. The authors need to clarify the presence or absence of identifiable outer radial glia in the integrated dataset and relate the relative abundance of these cells to their interpretation of the developmental trajectories for each species.
  
  We apologize for the inaccurate descriptions in the manuscript and thank the reviewer for pointing out these errors. Following both suggestions, the description has been corrected in the revised manuscript as "Comparing vRG to IPC trajectory between human, macaque, and mouse, we found this biological process of vRG-to-IPC is very conserved across species. However, the vRG to oRG trajectory is divergent between species because the oRG population was not identified in the mouse dataset. The latter process is almost invisible in mice but similar in humans and macaques".
  
  Several published studies have shown that oRG-like progenitor cells are present in the mouse embryonic neocortex (Wang, Xiaoqun et al. Nature neuroscience. 2011. doi:10.1038/nn.2807; Vaid, Samir et al. Development. 2018. doi:10.1242/dev.169276. PMID: 30266827). However, oRG cells were barely detected in the scRNA-seq datasets of mouse cortical development studies (Ruan, Xiangbin et al. Proc Natl Acad Sci U S A. 2021. doi:10.1073/pnas.2018866118. PMID: 33649223; Di Bella, Daniela J et al. Nature. 2021. doi:10.1038/s41586-021-03670-5. PMID: 34163074; Chen, Ao et al. Cell. 2022. doi:10.1016/j.cell.2022.04.003. PMID: 35512705). No oRG populations were detected in the mouse embryonic cortical development dataset (GEO: GSE153164) used for integration analysis in our study.
  
  11) "Ventral radial glia cells generate excitatory neurons by direct and indirect neurogenesis" This should be corrected to dorsal radial glia cells as this paper is discussing radial glia of the dorsal pallium.
  
  13) Editorially, gene names need to be italicized in the text, figures, and figure legends.
  
  14) Figure 5B - a scale bar showing the scale of the relative expression denoted by the dark blue color would be beneficial.
  
  15) Figure S7D is mislabeled in the figure legend.
  
  Merged response to points 11 to 15: Thank you for kindly pointing out the errors in our manuscript. We have corrected the above four points in the revised version.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Specific suggestions for authors:
  
  In the abstract the authors state: "thicker upper-layer neurons". I think it's important to be clear in the language by stating either that the layers are thicker or the neurons are most dense.
  
  Thanks for your good comments. The description “thicker upper-layer neurons” was corrected to “the thicker supragranular layer” in the revised manuscript. The supragranular layer in primates is much thicker than in rodents, both in absolute thickness and in proportion to the thickness of the whole neocortex (Hutsler, Jeffrey J et al. Brain research. 2005. doi:10.1016/j.brainres.2005.06.015. PMID: 16018988).
  
  The introduction needs additional clarification regarding the vRG vs oRG discussion. I was unclear what the main takeaway for readers should be. Similarly, the discussion of previous studies and the importance for comparing human and macaque could be clarified.
  
  We appreciate the suggestion and apologize for the shortcomings of the introduction part. We have rewritten the section and added additional clarification in the revised introduction. In the revised manuscript, the contents of the introduction are as follows:
  
  “The neocortex is the center for higher brain functions, such as perception and decision-making. Therefore, the dissection of its developmental processes can be informative of the mechanisms responsible for these functions. Several studies have advanced our understanding of the neocortical development principles in different species, especially in mice. Generally, the dorsal neocortex can be anatomically divided into six layers of cells occupied by distinct neuronal cell types. The deep-layer neurons project to the thalamus (layer VI neurons) and subcortical areas (layer V neurons), while neurons occupying more superficial layers (upper-layer neurons) preferentially form intracortical projections1. The generation of distinct excitatory neuron cell types follows a temporal pattern in which early-born neurons migrate to deep layers (i.e., layers V and VI), while the later-born neurons migrate and surpass early-born neurons to occupy the upper layers (layers II-IV) 2. In Drosophila, several transcription factors are sequentially explicitly expressed in neural stem cells to control the specification of daughter neuron fates, while very few such transcription factors have been identified in mammals thus far. Using single-cell RNA sequencing (scRNA-seq), Telley and colleagues found that daughter neurons exhibit the same transcriptional profiles of their respective progenitor radial glia, although these apparently heritable expression patterns fade as neurons mature3. However, the temporal expression profiles of neural stem cells and the contribution of these specific temporal expression patterns in determining neuronal fate have yet to be wholly clarified in humans and non-human primates. Over the years, non-human primates (NHP) have been widely used in neuroscience research as mesoscale models of the human brain. Therefore, exploring the similarities and differences between NHP and human cortical neurogenesis could provide valuable insight into unique features during human neocortex development.
  
  In mammals, radial glial cells are found in the ventricular zone (VZ), where they undergo proliferation and differentiation. The neocortex of primates exhibits an extra neurogenesis zone known as the outer subventricular zone (OSVZ), which is not present in rodents. As a result of evolution, the diversity of higher mammal cortical radial glia populations increases. Although ventricular radial glia (vRG) is also found in humans and non-human primates, the vast majority of radial glia in these higher species occupy the outer subventricular zone (OSVZ) and are therefore termed outer radial glia (oRG). Outer radial glial (oRG) cells retain basal processes but lack apical junctions 4 and divide in a process known as mitotic somal translocation, which differs from vRG 5. VRG and oRG are both accompanied by the expression of stem cell markers such as PAX6 and exhibit extensive self-renewal and proliferative capacities 6. However, despite functional similarities, they have distinct molecular phenotypes. Previous scRNA-seq analyses have identified several molecular markers, including HOPX for oRGs, CRYAB, and FBXO32 for vRGs7. Furthermore, oRGs are derived from vRGs, and vRGs exhibit obvious differences in numerous cell-extrinsic mechanisms, including activation of the FGF-MAPK cascade, SHH, PTEN/AKT, and PDGF pathways, and oxygen (O2) levels. These pathways and factors involve three broad cellular processes: vRG maintenance, spindle orientation, and cell adhesion/extracellular matrix production8.
  
  Some transcription factors have been shown to participate in vRG generation, such as INSM and TRNP1. Moreover, the cell-intrinsic patterns of transcriptional regulation responsible for generating oRGs have not been characterized.
  
  ScRNA-seq is a powerful tool for investigating developmental trajectories, defining cellular heterogeneity, and identifying novel cell subgroups9. Several groups have sampled prenatal mouse neocortex tissue for scRNA-seq 10,11, as well as discrete, discontinuous prenatal developmental stages in human and non-human primates 7,12 13,14. The diversity and features of primate cortical progenitors have been explored 4,6,7,15. The temporally divergent regulatory mechanisms that govern cortical neuronal diversification at the early postmitotic stage have also been focused on 16. Studies spanning the full embryonic neurogenic stage in the neocortex of humans and other primates are still lacking. Rhesus macaque and humans share multiple aspects of neurogenesis, and more importantly, the rhesus monkey and human brains share more similar gene expression patterns than the brains of mice and humans17-19. To establish a comprehensive, global picture of the neurogenic processes in the rhesus macaque neocortex, which can be informative of neocortex evolution in humans, we sampled neocortical tissue at five developmental stages (E40, E50, E70, E80, and E90) in rhesus macaque embryos, spanning the full neurogenesis period. Through strict quality control, cell type annotation, and lineage trajectory inference, we identified two broad transcriptomic programs responsible for the differentiation of deep-layer and upper-layer neurons. We also defined the temporal expression patterns of neural stem cells, including oRGs, vRGs, and IPs, and identified novel transcription factors involved in oRG generation. These findings can substantially enhance our understanding of neocortical development and evolution in primates.”
  
  Why is this study focused on the parietal lobe? This should be discussed in the introduction and interpretation of the data should be contextualized in the context of this cortical area.
  
  In this study, samples were collected from the parietal lobe area mainly for the following reasons:
  
  (1) To ensure that the same cortical region was collected at each time point, we used the lateral cerebral sulcus as a landmark and collected the parietal lobe tissue above it for single-cell sequencing. In addition, the parietal region is convenient for sampling the dorsal cortex.
  
  (2) Previous studies have established the timeline of macaque parietal lobe formation during prenatal development (Finlay, B L, and R B Darlington. Science. 1995. doi:10.1126/science.7777856. PMID: 7777856), which is also an essential reason for choosing the parietal lobe as the research object.
  
  Figure 1:
  
  Difficult to appreciate how single cell expression reflects the characterization of layers described in Figure 1A. A schematic for temporal development would be helpful. Also, how clusters correspond to discrete populations of excitatory neurons and progenitors would improve figure clarity. Perhaps enlarge and annotate the UMAPS on the bottom of Figure 1A.
  
  We thank the reviewer for the suggestion and apologize that Figure 1A does not convey the relationship between single-cell expression and neocortex layer formation. In the revised manuscript, the time point information associated with the hierarchy has been added to the diagram in Figure S1A. The UMAPs at the bottom of Figure 1A have been enlarged in the revised manuscript as the new Figure 1C.
  
  Labels on top of clusters for 1A/1B would be helpful as it's difficult to see which colors the numbers correspond to on the actual UMAP.
  
  Many thanks to the reviewer for the careful reading and helpful suggestions. We have adjusted the UMAP visualization in the revised version. The numbers in the label bar of Figure 1B have been moved beside the dots so that the dots can be seen more clearly.
  
  Microglia and meninges are also non-neural cells. This needs to be changed in the discussion of the results.
  
  Thanks for the suggestion. We have revised the manuscript accordingly. The description now reads: “According to the expression of the marker genes, we assigned clusters to cell type identities of neurocytes (including radial glia (RG), outer radial glia (oRG), intermediate progenitor cells (IPCs), ventral precursor cells (VP), excitatory neurons (EN), inhibitory neurons (IN), oligodendrocyte progenitor cells (OPC), oligodendrocytes, astrocytes, ventral LGE-derived interneuron precursors and Cajal-Retzius cells) or non-neuronal cell types (including microglia, endothelial, meninge/VALC (vascular cell)/pericyte, and blood cells). Based on the expression of the marker gene, cluster 23 was identified as thalamic cells, which are small numbers of non-cortical cells captured in the sample collection at earlier time points. Each cell cluster was composed of multiple embryo samples, and the samples from similar stages generally harbored similar distributions of cell types.”
  
  It's important to define the onset of gliogenesis in the text and figure. What panels/ages show this?
  
  We identified the onset of gliogenesis by quantifying the percentage of astrocytes (cluster 13) at each time point and added the result to Figure S1. The proportion of astrocytes was very low at E40 and E50 and increased significantly at E80 and E90, suggesting that macaque gliogenesis may begin around embryonic days 80 to 90. This result is consistent with published research on the timing of the neuron-glia transition in primates (Rash, Brian G et al. Proceedings of the National Academy of Sciences of the United States of America 201. doi:10.1073/pnas.1822169116. PMID: 30894491).
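
The per-time-point proportion described above can be sketched as follows. This is only an illustration of the arithmetic, not the authors' code: the function name `cluster_fraction_by_stage` and the toy cell list are hypothetical, and the counts are invented rather than taken from the dataset.

```python
from collections import Counter


def cluster_fraction_by_stage(cells, cluster_id):
    """Return {stage: fraction of cells at that stage assigned to cluster_id}.

    cells is an iterable of (stage, cluster) pairs, one pair per cell.
    """
    totals = Counter(stage for stage, _ in cells)
    hits = Counter(stage for stage, cluster in cells if cluster == cluster_id)
    return {stage: hits[stage] / totals[stage] for stage in totals}


# Toy (stage, cluster) pairs, invented for illustration only.
toy_cells = (
    [("E40", 1)] * 99 + [("E40", 13)] * 1
    + [("E80", 1)] * 80 + [("E80", 13)] * 20
)
fractions = cluster_fraction_by_stage(toy_cells, cluster_id=13)
```

A rise in this fraction between early and late stages is the kind of signal the response uses to place the onset of gliogenesis; a formal comparison would still need replicate-aware statistics.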
  
  Figure 2:
  
  Why are there so few neurons at E90? Is it capture bias, dissociation challenges (as postulated for certain neuronal subtypes in the discussion), or programmed cell death at this time point?
  
  We think this is because mature neurons at E90, with their abundant axons and processes, were hard to settle into the microwells used by the BD method for single-cell capture. Because the BD Rhapsody microwells have a fixed size, this single-cell capture method may be less efficient at capturing mature excitatory neurons, but it captures newborn neurons well at each sampling time point. In short, immature neurons are captured more easily than mature neurons at every time point in our study, so the generation of excitatory neurons at different developmental time points can be well observed, as shown in Figure 2, which aligns with our research purpose.
  
  The authors state: "We then characterized temporal changes in the composition of each EN subcluster. While the EN 5 and EN 11 (deep-layer neurons) subclusters emerged at E40 and E50 and disappeared in later stages, EN subclusters 1, 2, 3, and 4 gradually increased in population size from E50 to E80 (Figure 2D)." What about EN7? It's labeled as an upper layer neuron that is proportionally highest at E40. Could this be an interesting, novel finding? Does this indicate something unique about macaque corticogenesis? The authors don't describe/discuss this cell type at all.
  
  We apologize for the manuscript’s lack of detailed descriptions of the EN results. In our study, EN7 is identified as a CUX1-positive, PBX3-positive, and ZFHX3-positive excitatory neuron subcluster. The results in Fig. 2B show that EN7 was mainly captured from the early time point (E40/E50) samples. This description has been added to the revised manuscript.
  
  A Pbx/Zfhx3-positive excitatory neuron subtype was reported in the Moreau et al. study of mouse neocortex development (Moreau, Matthieu X et al. Development. 2021. doi:10.1242/dev.197962. PMID: 34170322). Our study verified that Pbx3/Zfhx3-positive cortical excitatory neurons also exist in the early stage of prenatal macaque cortex development.
  
  Is there any unique gene expression in identified subtypes that are surprising? Did the comparison against human data, in later figures, inform any unique features of gene expression?
  
  In our analysis of the excitatory neuron subclusters, we found no surprising results. In the subsequent integrated cross-species analyses, macaque excitatory neurons showed transcriptional characteristics similar to those of human excitatory neurons. In general, excitatory neurons tend to have a greater diversity in the cortex of animals that are more advanced in evolution (Ma, Shaojie et al. Science. 2022. doi:10.1126/science.abo7257. PMID: 36007006; Wei, Jia-Ru et al. Nat Commun. 2022. doi:10.1038/s41467-022-34590-1. PMID: 36371428; Galakhova, A A et al. Trends Cogn Sci. 2022. doi:10.1016/j.tics.2022.08.012. PMID: 36117080; Berg, Jim et al. Nature. 2021. doi:10.1038/s41586-021-03813-8. PMID: 34616067). Because only single-cell transcriptome data were analyzed in this study, and such data cover a limited range of information, we did not find any unique features of excitatory neurons in the prenatal developing macaque cortex in the comparison against the human dataset.
  
  Figure 3:
  
  The identification of terminal oRG differentiation genes is interesting. The confirmation of known gene expression as well as novel markers that indicate different states/stages of oRG cells is a valuable resource. As the identification of described ion channel expression is a novel finding, it should be explored more and would be strengthened by validation in tissue samples and, if possible, functional assays.
  
  E is the most novel part of this figure, but it's very hard to read. I think increasing the focus of this figure onto this finding and parsing these results more would be informative.
  
  Thanks for the positive comments. We apologize for the lack of clarity and conciseness in the figure visualizations. We divided the hypothesized vRG-to-oRG trajectory into three phases: onset, commitment, and terminal. The main information conveyed by Figure 3E is the dynamic gene expression along the developmental trajectory from vRG to oRG. Specific genes were selected and are shown in the schematic diagram of the new Figure 3.
  
  We verified the terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). Author response image 2 shows that the expression of these genes varied across states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of ion channel genes ATP1A2, ATP1A2, and SCN4B.
  
  I'm curious about the granularity of the oRG_C12 terminal cluster. Are there ways to subdivide the different cells that seem to be glial-committed vs actively dividing vs neurogenically committed to IPCs? In the text, the authors referred to different oRG populations, but they are annotated as the same cluster and cell type. The authors should clarify this.
  
  Following the reviewer's suggestion, we subdivided oRG_C12 into eight subclusters. Based on the marker genes in Author response image 5C, subclusters 1, 2, and 4 might be glial-committed, with AQP4/S100B-positive expression; subclusters 3, 6, and 7 might be neurogenically committed to IPCs, with NEUROD6-positive expression; and subclusters 0, 3, 5, 6, and 7 might be in an actively dividing state, with MKI67/TOP2A-positive expression.
  
  Author response image 5.
  
  Subdivision analysis of oRG_C12. (A) and (B) Subdivision of oRG_C12 visualized via UMAP. Cells are colored according to time point (A) and subcluster identity (B). (C) Violin plot of molecular markers for the subclusters.
  
  Figure 4:
  
  Annotating/labeling the various EN clusters (even as deep/upper) would help improve the clarity of this and other figures. It's clear what each progenitor subtype is but it's hard to read the transitions. Why are all the EN groups in pink/red? It makes the data challenging to interpret.
  
  In Figure 4A, we use different yellow/orange colors for deep-layer excitatory neuron subclusters (EN5 and EN10) and different red/pink colors for upper-layer excitatory neuron subclusters (EN1, EN2, EN3, EN4, EN6, EN7, EN8 and EN9). We have added this information to the legend of Figure 4 in the revised manuscript.
  
  E50 seems to be unique - what's EN11?
  
  Based on the molecular markers for EN subclusters in Author response image 2, we identified EN11 as a deep-layer excitatory neuron subcluster expressing BCL11B and FEZF2. As explained in the reply above, the BD microplate captures newborn neurons well at each time point. EN11 consisted mainly of newborn excitatory neurons at the E50 time point, which makes the subcluster appear unique.
  
  Author response image 6.
  
  Vlnplot of different markers in EN8.
  
  Figure 4E - the specificity of gene expression for deep vs upper layer markers seems to be over stated given the visualized gene expression pattern (ex FEZF2). Could the right hand panels be increased to better appreciate the data and confirm the specificity, as described.
  
  In our study, we used the slingshot method to infer cell lineages and pseudotimes; this method has been used to identify biological signals for different branching trajectories in many scRNA-seq studies. We apologize for the lack of clarity in Figure 4E. Because of the size limit on uploaded files, the file was compressed, which reduced the clarity of the image. Below, we provide Figure 4E at a higher resolution and, following the reviewer's suggestion, add slingshot branching tree results for several more genes.
  
  Figure 5:
  
  There are some grammatical typos at the bottom of page 8. In this section, it also feels like there is a missing logical step between expansion of progenitors through elongated developmental windows that impact long-term expansion of the upper cortical layers.
  
  We apologize for the grammatical typos and have corrected them in the revised manuscript. We understand the reviewer’s concern. Primates have much longer gestation than rodents, and a previous study showed that extending neurogenesis by transplanting mouse embryos to a rat mother specifically increases the number of upper-layer cortical neurons, with concomitantly abundant neurogenic progenitors in the subventricular zone (Stepien, Barbara K et al. Curr Biol. 2020. doi:10.1016/j.cub.2020.08.046. PMID: 32888487). We think this mechanism could also explain the much greater abundance of upper-layer neurons in primates.
  
  I'm curious about the IPCs that arise from the oRGs. Lineage trajectory shows vRG decision to oRG or IPC, but oRGs also differentiate into IPCs. Could the authors conjecture why they are not in this dataset or are indistinguishable from vRG-derived IPCs.
  
  Several published experiments have shown that oRG can generate IPCs in the developing human and macaque neocortex (Hansen, David V et al. Nature. 2010. doi:10.1038/nature08845. PMID: 20154730; Betizeau, Marion et al. Neuron. 2013. doi:10.1016/j.neuron.2013.09.032. PMID: 24139044). Clearly identifying the difference between IPCs generated from vRG and those generated from oRG at the transcriptional level in our single-cell transcriptome dataset is difficult. We hypothesize that the IPCs produced by both pathways have highly similar transcriptional features. Because of the limits of the scRNA-seq analysis algorithms used in this study, we could not distinguish the two kinds of IPC by either pseudo-time trajectory reconstruction or transcriptional data.
  
  Figure 6 :
  
  How are the types 1-5 in 6A defined? Were they defined in one species and then applied across the others?
  
  We applied the same analysis to each species. We first selected the vRG cells in each species dataset and screened for differentially expressed genes (DEGs) between adjacent developmental time points using the “FindMarkers” function (with min.pct = 0.25, logfc.threshold = 0.25). After separately normalizing the DEG expression matrix of each species dataset, we used the “standardise” function from the Mfuzz package to standardize the data. The vRG DEGs of each species were then grouped into five clusters using the fuzzy c-means algorithm implemented in the Mfuzz package in R.
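
For readers unfamiliar with the clustering step, the following is a minimal pure-Python sketch of per-gene standardisation followed by textbook fuzzy c-means. It is not the Mfuzz implementation and not the authors' code: the fuzzifier `m = 2.0`, the iteration count, the random initialisation, and the four toy gene profiles are all assumptions made for illustration, and the toy uses two clusters rather than the five used in the analysis.

```python
import math
import random


def standardise(profile):
    """Scale one gene's time course to mean 0 and standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / (n - 1))
    return [(x - mean) / sd for x in profile]


def fuzzy_c_means(data, c, m=2.0, n_iter=100, seed=0):
    """Textbook fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are membership-weighted means of the profiles.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )
        # Memberships follow from relative distances to the centers.
        for i, x in enumerate(data):
            dist = [max(math.dist(x, ctr), 1e-12) for ctr in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                )
    return centers, u


# Toy expression profiles over five time points: two rising, two falling genes.
toy = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [10, 8, 6, 4, 2]]
scaled = [standardise(gene) for gene in toy]
centers, memberships = fuzzy_c_means(scaled, c=2)
labels = [row.index(max(row)) for row in memberships]
```

Because each gene is standardised first, the clusters reflect the shape of the temporal pattern rather than absolute expression level, which is why the two rising genes group together despite their different magnitudes.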
  
  The temporal dynamics in the highlighted section in B have interesting, consistent patterns of gene expression of the genes described, but what about the genes below that appear less consistent temporally? What processes do not appear to be conserved, given those gene expression differences?
  
  Many thanks for the constructive comments. The genes in the lower part of Figure 6B are transcription factors whose temporal dynamics are not conserved among the vRG of the three species. We performed a functional enrichment analysis of these non-conserved transcription factors with the PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System (https://www.pantherdb.org/), and the results are shown in Author response image 7. The gene ontology (GO) analysis shows that the non-conserved transcription factors are related to various biological processes, cellular components, and molecular functions. However, further experiments are still needed to verify specific genes.
  
  Author response image 7.
  
  Gene Ontology (GO) analysis of transcription factors with non-conserved temporal patterns among mouse, macaque and human vRG cells.
  
  The identification of distinct regulation of gene networks, despite conservation of transcription factors in discrete cell types, is interesting. What does the comparison between humans and macaques indicate about regulatory differences evolutionarily?
  
  We thank the reviewer for the comments. We performed the TF regulatory network analysis of human vRG with the pySCENIC workflow. The top transcription factors at every time point in human vRG were calculated, and we used the top 10 TFs and their top 5 target genes to perform interaction analysis and generate the regulatory network of human vRG in the revised Figure 6. Comparing the pySCENIC results for mouse, macaque and human vRG, the regulatory networks were clearly not evolutionarily conserved. Compared with macaque, the regulatory network of transcription factors and target genes in humans is more complex. Some conserved regulatory relationships present in more than one species were identified, such as the HMGN3, EMX2, SOX2, and HMGA2 network at an early stage, when deep-layer neurons are generated, and the SOX10, ZNF672, ZNF672 network at a late stage, when upper-layer neurons are generated.
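
The selection of top TFs and top targets can be sketched as below. This is a hedged illustration only: the response does not say how TFs or targets were ranked, so the `tf_scores` dictionary, the `(tf, target, importance)` tuples, the function name `build_network`, and all toy names and values are hypothetical stand-ins rather than pySCENIC output.

```python
from collections import defaultdict


def build_network(tf_scores, adjacencies, n_tfs=10, n_targets=5):
    """Keep the highest-scoring TFs, then each kept TF's strongest targets.

    tf_scores:   {tf: score}, an assumed per-TF ranking score
    adjacencies: iterable of (tf, target, importance) tuples
    Returns a list of (tf, target, importance) edges.
    """
    top_tfs = sorted(tf_scores, key=tf_scores.get, reverse=True)[:n_tfs]
    by_tf = defaultdict(list)
    for tf, target, importance in adjacencies:
        by_tf[tf].append((importance, target))
    edges = []
    for tf in top_tfs:
        # Sorting (importance, target) pairs in reverse puts the strongest first.
        for importance, target in sorted(by_tf[tf], reverse=True)[:n_targets]:
            edges.append((tf, target, importance))
    return edges


# Toy inputs with placeholder names; smaller cut-offs keep the example short.
toy_scores = {"TF_A": 0.9, "TF_B": 0.5, "TF_C": 0.1}
toy_adjacencies = [
    ("TF_A", "g1", 3.0),
    ("TF_A", "g2", 1.0),
    ("TF_A", "g3", 2.0),
    ("TF_B", "g1", 0.5),
    ("TF_C", "g4", 9.0),
]
edges = build_network(toy_scores, toy_adjacencies, n_tfs=2, n_targets=2)
```

Running the same selection per species and per time point, then intersecting the resulting edge lists, is one simple way to look for regulatory relationships shared by more than one species.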
  
  Reviewer #3 (Recommendations For The Authors):
  
  The data should be compared to a similar brain region in human and mouse, if available. (See data from PMCID: PMC8494648).
  
  We appreciate the reviewer’s suggestions. In the species-integration analysis in Figure 6, the mouse data came from the somatosensory cortex, the macaque data in this study came mainly from the parietal lobe, and the human data included the frontal lobe (FL), parietal lobe (PL), occipital lobe (OL), and temporal lobe (TL). PMC8494648 offers high-quality data covering gestation week 14 to gestation week 25. However, the rhesus monkey developmental stages in our study are E40-E90, corresponding to pcw8-pcw21 in humans, so the developmental period covered by PMC8494648 does not fully match the macaque cortical development window that we focused on. It is therefore challenging to integrate this dataset (PMCID: PMC8494648) into our data analysis. However, we have cited the results of this valuable research (PMCID: PMC8494648) in the discussion of the revised manuscript.
  
  A deeper assessment of these data in the context of existing studies would help distinguish the work and enable others to appreciate the significance of the work.
  
  We appreciate the reviewer’s constructive suggestions. The human regulation analysis with the pySCENIC workflow was added to the new Figure 6 to compare the vRG regulatory networks of different species. Analysis of the regulatory activity in human, macaque and mouse prenatal neocortical neurogenesis indicated that, despite commonalities in the roles of classical developmental TFs such as GATA1, SOX2, HMGN3, TCF7L1, ZFX, EMX2, SOX10, NEUROG1, NEUROD1 and POU3F1, the regulatory networks were not evolutionarily conserved. The top 10 TFs at each time point in human, macaque, and mouse vRG and their top 5 target genes identified by pySCENIC were used as input to construct the transcriptional regulation networks (Figure 6 D, F and H). Some conserved regulatory TFs present in more than one species were identified, such as HMGN3, EMX2, SOX2, and HMGA2 at an early stage, when deep-layer neurons are generated, and SOX10, ZNF672, and ZNF672 at a late stage, when upper-layer neurons are generated.
  
  Besides, we performed some comparative analysis with our macaque dataset and the newly published macaque telencephalon development dataset. The results were only used to provide additional information to reviewers and were not included in the revised manuscript.
  
  To verify the reliability of our cell annotation results, we compared the cell-type assignments in our study with those in recently published research (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652), using the scmap package to project the major cell types in our macaque development scRNA-seq dataset onto GSE226451. The river plot in Author response image 1 illustrates the broadly similar cell type classifications of the two datasets. In addition, we used more marker genes for cell annotation to improve the cell type definitions in the new Figure 1D. The description of distinct excitatory neuronal types has also been improved in the new Figure 2.
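
The general idea of projecting cells onto a reference annotation can be illustrated with a simplified centroid-and-cosine-similarity sketch. This is deliberately not the scmap algorithm or its API: the function names, the mean-based centroids, the `0.7` cut-off, and the three-gene toy vectors are all assumptions chosen for brevity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroids(reference):
    """reference: {cell_type: [expression vectors]} -> {cell_type: mean vector}."""
    return {
        label: [sum(col) / len(cells) for col in zip(*cells)]
        for label, cells in reference.items()
    }


def project(query_cell, ref_centroids, threshold=0.7):
    """Assign the most similar reference type, or 'unassigned' below threshold."""
    best_label, best_sim = max(
        ((label, cosine(query_cell, c)) for label, c in ref_centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_sim >= threshold else "unassigned"


# Toy reference with two cell types measured on three placeholder genes.
toy_reference = {
    "RG": [[5, 0, 1], [4, 1, 0]],
    "EN": [[0, 5, 1], [1, 4, 0]],
}
ref = centroids(toy_reference)
label_a = project([6, 1, 0], ref)   # resembles the RG centroid
label_b = project([0, 0, 7], ref)   # resembles neither centroid
```

Tabulating the projected label against the original label for every cell gives the kind of correspondence that a river plot then displays.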
  
  Furthermore, we verified the terminal oRG differentiation genes in the recently published macaque telencephalic development dataset (Micali N, Ma S, Li M, et al. Science. doi:10.1126/science.adf3786. PMID: 37824652) (GEO accession: GSE226451). Author response image 2 shows that the expression of these genes varied across states/stages. Most of the oRG terminal differentiation marker genes identified in our study were also expressed in the oRG cells of the GSE226451 dataset. In particular, the two datasets were consistent in the expression of ion channel genes ATP1A2, ATP1A2, and SCN4B.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.13.548828v2
www.biorxiv.org www.biorxiv.org

New submission 25/09/2023, 08:44:23

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  The authors performed a meta-analysis of GC concentrations and metabolic rates in birds and mammals. They found close associations for all studies showing a positive association between these two traits. As GCs have been viewed with close links to "stress," authors suggest that this overlooks the importance of metabolism and perhaps GC variation does not relate to "stress" per se but an increase in metabolism instead.
  
  This is an important meta-analysis: although most researchers acknowledge the link between GCs and metabolism, metabolism is often overlooked in studies. The field of conservation physiology is especially focused on GCs being a "stress" hormone, which overlooks the importance of GCs in mediating energy balance, i.e., an animal that has high GC concentrations may not be doing that poorly compared to an animal with low GC concentrations; it might just be expending more energy, e.g., caring for young. The results, with overwhelming directionality and strong effect sizes, support a positive association between these two variables.
  
  My main concern lies in that most of the studies come from a few labs, therefore there may be limited data to test this relationship. I would include lab as a random effect to see how strong this effect might be.
  
  We think this is a good point, and we re-ran the main models in the manuscript with Lab included as a random effect (N = 35 experiments, 21 studies, 16 labs). This did not affect the results, leading to negligible changes in the model parameters (alternative model tables are shown in Author response table 1 and 2). In the revised version of the manuscript we mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).
  
  Author response table 1.
  
  Meta regression model testing the association between metabolic rate (MR) effect sizes and glucocorticoid effect sizes.
  
  Author response table 2.
  
  Meta regression model (quantitative approach) testing the effect of (a) Taxa, (b) Before / after effect, (c) Experiment / control effect, (d) Use of Metabolic Rate or Heart Rate as metabolic variable and (e) Treatment type, on the association between metabolic rate (MR) and glucocorticoid effect sizes across studies.
  
  Furthermore, I would like to see a test of the directionality of the two variables. Authors suggest that changes in metabolism affect GC levels but likely changes in GC levels would affect metabolism. Why not look into studies that have altered GC levels experimentally and see the effect on metabolism? Based on the close link, authors suggest that GCs may not play a role outside of "stress" beyond the stressor's effect on metabolic rate. However, if they were to investigate manipulations of GCs on metabolic rate, the link may or may not be there, which would be interesting to look at. I firmly believe that GCs are tightly linked to metabolism; however, I also think that GCs have a range of effects outside of metabolism as well, depending on the course and strength of the stressor.
  
  The directionality of the two variables is indeed a question of interest – we show that changes in metabolic rate affect GCs, but does the reverse also happen? In the schematic model we propose in Box 1, we propose that the effect is uni-directional, i.e. metabolic rate affects GC-levels, but GCs have no direct effect on metabolic rate. We note that there may however be an indirect effect, in that in the absence of a GC-response to an increase in metabolic rate the organism would after some time no longer be able to fuel the metabolic rate. Because we anticipate that more readers may raise this question, we have added the following paragraph to the discussion:
  
  “We selected studies in which experimental treatments affected MR, leading us to conclude that the most parsimonious explanation of our finding is that GC levels were causally related to MR. Suppose however that instead we reported a correlation between MR and GCs, using for example unmanipulated individuals. The question would then be justified whether changes in GCs affected MR or vice versa. Direct effects of GCs could be studied using pharmacological manipulations. However, while many studies show that GC administration induces a cascade of effects, when the function of GCs is to facilitate a level of MR, as opposed to regulate variation in MR, we do not anticipate such manipulations to induce an increase in MR (Box 1). On the other hand, when MR is experimentally increased in conjunction with pharmacological manipulations that suppress the expected GC-increase (an experiment that to our best knowledge has not yet been done), we would predict that the increase in MR can be maintained less well compared to the same MR treatment in the absence of the pharmaceutical manipulation. This result, we would interpret to demonstrate that maintaining a particular level of MR may be dependent on GCs as facilitator, but it would be misleading to interpret this pattern to indicate that GCs regulate MR, as is sometimes proposed. Additionally, it would be informative to investigate whether energy turnover immediately before blood sampling is a predictor of GC levels, as we would predict on the basis of the interpretation of our findings. Increasing the use of devices and techniques that monitor energy expenditure or its proxies (e.g. accelerometers) may be a way to increase our understanding of the generality of the GC-MR association.”
  
  We based our hypotheses and search criteria on the assumption that GCs induce physiological processes to help the organism meet energetic demands. Pharmacologically induced increases in GCs would lead to physiological responses and associations that we consider not comparable to the ones reported in this work, as we base our hypotheses on natural (i.e. non-pharmacologically induced) GC and MR variation. That said, with exogenous GC administration, we may expect GC cascade effects, but not necessarily an increase in MR. Here, while acknowledging that the link between GCs and metabolic rate may entail complex steps, we predict that GC administration may lead to an increase in blood glucose and may affect energy allocation at a tissue-specific level. However, such an increase may have no effect on whole-organism energy expenditure, unless energy expenditure is limited by glucose availability. We do acknowledge that it would be interesting to investigate the kind of associations between MR, GCs and other physiological variables (e.g. glucose) that appear when inducing an increase in GCs, as these would broaden our understanding of the mechanistic processes underlying these associations.
  
  We show that variation in GC levels was explained by variation in MR, independent of the stimulus that caused the increase in MR. We propose that the most parsimonious interpretation of our findings is that GC variation is an indicator of variation in MR, independent of the cause of variation in MR. We do not intend to prove causality when making predictions on the co-dependency of metabolic rate and GCs. In fact, our predictions do not imply that one trait necessarily affects the other per se, as this interplay is likely to be shaped by the environmental or physiological context (Box 1). Thus, the specific mechanisms underlying how changes in metabolic rate induce changes in GCs, or the other way around, need to be investigated. One step to tackle this in upcoming research would indeed be studying the effects of exogenous GCs on metabolic rate.
  
  In the manuscript, we clarify that GCs have a variety of cascade effects besides metabolism (Box 1). On the basis of our results, however, we suggest that many of the downstream effects of GCs may be interpreted as allocation adjustments to the metabolic level at which organisms operate (lines 235-236), but we do acknowledge that these cascade effects are complex and affect many systems besides metabolism.
  
  This work helps in the thinking that GCs are not the same as a "stress" hormone or labelling hormones with only one function. As hormones are naturally pleiotropic, the view of any one hormone being X is overly simplistic.
  
  We fully agree, but stress that we focus on how GCs are regulated, which may be less complex than their pleiotropic functions. Indeed, we consider that the many functions of GCs have potentially clouded the question as to how GCs are regulated.
  
  Reviewer #2 (Public Review):
  
  Where this study is interesting is that the authors do a meta-analysis of studies in which metabolic rate was experimentally manipulated and both this rate and glucocorticoid levels were simultaneously measured. Unsurprisingly, there are relatively few such studies and many are from the lab of Michael Romero. While the results of the analysis are compelling, they are not surprising. That said, this work is important.
  
  It is worth noting that in this analysis, the majority of the studies, if not all, are dealing with variation in baseline levels of glucocorticoids. That means the hormone is mostly acting metabolically at these lower levels and not as a stress response hormone as it does when levels are much higher. This difference is probably due to differences in receptors being activated. This could be discussed.
  
  As mentioned in Box 1, within our hypothesis framework we make no distinction between baseline and stress-induced GC-levels, and thereby in effect assume these to be points in a continuum from a metabolic perspective. Our results support this view, as our sample includes baseline- and stress-induced-range GC values, and these are not distinguishable (Fig. 3). We do however recognize that we did not return to this issue in the Discussion, while the same issue may well occur to many readers familiar with the literature. We therefore added the following paragraph to the discussion:
  
  “Note that in the context of our analysis we made no distinction between ‘baseline’ and ‘stress-induced’ GC-levels (Box 1). Firstly, because these concepts are not operationally well defined – baseline GC-levels are usually no better defined than ‘not stress-induced’. Secondly, when considering the facilitation of metabolic rate as primary driver of GC regulation, there does not appear to be a need to invoke different classes of GC-levels instead of the more parsimonious treatment as continuum. This is not to say that this also applies to the functional consequences of GC-level variation: it is well known that receptor types differ in sensitivity to GCs (Landys et al. 2006; Sapolsky et al. 2000; Romero 2004), thereby potentially generating step functions in the response to an increase in GC-levels.”
  
  We note further that to our best knowledge there are no standard or established thresholds that allow us to separate GC levels into “baseline” and “stress-induced”, and in any case these concentration ranges differ strongly among species and experimental set-ups (e.g. captive vs. free-living individuals). Consequently, many of the studies included in our work report what would typically be interpreted as “stress-induced” levels, and thus within the range of those reported by standardized stress protocols (e.g. levels above 20-30 ng/ml for corticosterone in bird species, Cohen et al. 2007, Jimeno et al. 2018; levels between 150-300 ng/ml in captive rats, Buwalda et al. 2012, Beerling et al. 2011; levels 2-10 times above baseline in humans, Sramek et al. 1999). We also want to note that we work with effect sizes, i.e. not GC levels, and that GC measurement units differ among studies. Mean GC values by study in the original units are shown in Table S3.
  
  Reviewer #1 (Recommendations For The Authors):
  
  L26: why is the causality in this direction? Not that I don't think that metabolic rate drives GC variation but the meta-analyses here could suggest the opposite direction as well? That GC phenotype could limit or promote metabolic activity? (In terms of the natural variation studies and not the experimental ones)
  
  See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.
  
  L27: again, I am not sure the meta-analyses can lead to this question. Although there is a tight link between GC and metabolic rate, there is still variation around that is unexplained.
  
  See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.
  
  L45: I think there is plenty of literature in the field that would say that GCs are linked to metabolism and don't define GCs as synonymous with stress. See MacDougall and others that you cite later in the paragraph: "GCs and stress are not synonymous." I think maybe shifting the strong language at the beginning might help with your argument later on.
  
  We do not disagree, but two considerations made us retain the ‘strong language’. Firstly, while many authors mention links between GCs and metabolic rate, as we read the literature, the quantitative importance of this link to understand GC variation is underestimated in our view. Secondly, the literature is rife with articles that clearly do not consider metabolic rate variation as a driver of the GC variation they observe.
  
  Box 1: on the diagram the link between GCs and learning is problematic as there are plenty of studies that show a negative effect on learning with GC exposure. It usually depends on the time course of GCs and learning outcomes.
  
  We agree with the referee's point. Learning was deleted from the diagram to avoid confusion.
  
  The diagram also suggests that GCs in the blood decrease insulin. For Aves that are rather insulin insensitive, the evidence that GCs affect insulin concentrations is very limited, even in the poultry literature.
  
  Indeed, and we now mention in box 1 that GC effects on insulin are primarily found in mammals, and less so in birds.
  
  Box 1 at the end also makes a point about GCs having complex downstream effects at baseline and stress-induced levels, besides energy mobilization but the abstract seems to indicate that there are limited effects of GCs outside of metabolism. Hence why I also advocate being careful about the wording in the abstract.
  
  The related abstract sentence has been rewritten to avoid this inconsistency (lines 17-18)
  
  L107: "being or not significant" meaning significant or not? The wording is awkward
  
  We reworded the sentence for clarity. We included studies reporting both significant and nonsignificant increases in metabolic rate.
  
  L110: why not look at whether experimental increases in GCs also induce increases in metabolic rate, i.e., the directionality of the two variables. (point 2)
  
  See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.
  
  The studies, although there are ~30, are overlapping in terms of labs, i.e., a lot of them came from the same lab. Did you think to include lab as a random effect to see if there are effects of one or two labs doing work that strengthened the results?
  
  We think this is a good point, and we ran the main models included in the manuscript including Lab as random effect (N= 35 experiments, 21 studies, 16 labs). Including Lab as random factor did not affect the results, leading to negligible changes in the model parameters. We provide tables with the model results in our previous response. In the text we now mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185)
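  The concern behind this exchange (whether one or two labs drive the pooled result) can also be illustrated with a leave-one-lab-out check. The sketch below is not the authors' analysis, which added Lab as a random effect to their models; it swaps in a simpler inverse-variance pooled mean, recomputed with each lab removed in turn. The lab names, effect sizes, and variances are invented purely for illustration and are not taken from the study.

```python
# Hypothetical (lab, effect size, sampling variance) rows.
# These values are invented for illustration; they are NOT data from the study.
EXPERIMENTS = [
    ("lab_A", 0.80, 0.04),
    ("lab_A", 0.65, 0.05),
    ("lab_B", 0.70, 0.06),
    ("lab_C", 0.90, 0.08),
    ("lab_D", 0.55, 0.05),
]


def pooled_effect(rows):
    """Inverse-variance weighted mean effect size (simple fixed-effect pooling)."""
    if not rows:
        raise ValueError("need at least one experiment")
    weights = [1.0 / var for _, _, var in rows]
    weighted_sum = sum(w * es for w, (_, es, _) in zip(weights, rows))
    return weighted_sum / sum(weights)


def leave_one_lab_out(rows):
    """Map each lab to the pooled effect obtained after dropping that lab."""
    labs = sorted({lab for lab, _, _ in rows})
    return {
        lab: pooled_effect([row for row in rows if row[0] != lab])
        for lab in labs
    }


if __name__ == "__main__":
    overall = pooled_effect(EXPERIMENTS)
    print(f"all labs: {overall:.3f}")
    for lab, estimate in leave_one_lab_out(EXPERIMENTS).items():
        print(f"without {lab}: {estimate:.3f} (shift {estimate - overall:+.3f})")
```

  If dropping any single lab shifts the pooled estimate only slightly, that is consistent with the negligible parameter changes described above. A multilevel model with Lab as a random effect remains the more complete treatment.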
  
  L314: I think it depends on the time course and intensity of the stressor. I firmly believe that outside of metabolic demands, high levels of GCs chronically or the inability to mount a proper stress response is indicative of pathology or something outside of metabolism.
  
  Whether the association between GCs and MR holds under a context of ‘chronic stress’ (i.e. understood as chronically elevated GCs) remains to be tested. We note, however, that chronically high levels of metabolic rate may potentially have pathological effects.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I find the title a bit misleading. The conclusion from the study is that glucocorticoid levels can reflect metabolic rate, not that glucocorticoid levels do not indicate stress. Remember, stress can certainly affect metabolic rate.
  
  We see the point but note that other drivers of variation in metabolic rate also increase GCs, as we show in our analysis, and hence we propose that GC variation always indicates variation in metabolic rate, and only stress when stress is the cause of the increase in metabolic rate.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.17.537243v3
www.biorxiv.org www.biorxiv.org

Iridescent structural coloration in a crested Cretaceous enantiornithine bird from Jehol Biota

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Li et al. describe a novel form of melanosome-based iridescence in the crest of an Early Cretaceous enantiornithine avialan bird from the Jehol Group.
  
  Strengths:
  
  Novel set of methods applied to the study of fossil melanosomes.
  
  Weaknesses:
  
  (1) Firstly, several studies have argued that these structures are in fact not a crest, but rather the result of compression. Otherwise, it would seem that a large number of Jehol birds have crests that extend not only along the head but the neck and hindlimb. It is more parsimonious to interpret this as compression as has been demonstrated using actuopaleontology (Foth 2011).
  
  Firstly, we respectfully acknowledge the reviewer’s interpretation.
  
  However, the new specimen we report here differs in preservation from Confuciusornis (Foth 2011), which belongs to a different clade and exhibits a feather crest that differs in both preservation and shape from that of the species described in this study. In the specimen the referee refers to (Foth 2011, Paläontologische Zeitschrift, Figure 3a), the cervical feathers are much longer than the feathers of the head region; it is quite incompletely preserved and much shorter in proportional length (relative to the skull) than the specimen we sampled (see picture below).
  
  Author response image 1.
  
  Our new specimen is well preserved, and the feather crest is interpreted as retaining its original shape; the cervical feathers are largely absent or very short.
  
  In the new specimen there is a large feather crest that gradually extends from the cranial region of the fossil bird, rather than the cervical region, as observed in the previously proposed Confuciusornis crest. The feather crest extends in a consistent direction (caudodistally), and the feathers in the head region of the bird are exceptionally well-preserved, retaining their original shape. The feathers measure about 1-2 cm at their longest barb. Feathers in the neck are much shorter (see Confuciusornis picture above).
  
  (2) The primitive morphology of the feather with their long and possibly not interlocking barbs also questions the ability of such feathers to be erected without geologic compression.
  
  We acknowledge that the specimen must have undergone some degree of compression during diagenesis and fossilization. Given that the rachis itself is already sufficiently thick (that the ligaments everting a crest would attach to), we conclude that it had the structural integrity to remain erect on the skull.
  
  (3) The feather is not in situ and therefore there is no way to demonstrate unequivocally that it is indeed from the head (it could just as easily be a neck feather)
  
  We conclude that it belongs to the head based on the similar suture, overall length, and its close position to the caudal part of the head. There are no similar types of feathers nearby, such as those found on the neck or other areas, which is why we reason that it is a head crest feather. Besides, the shape of the feather we sampled is dramatically different from the much softer and shorter ones detected on the neck.
  
  In addition, we further sampled the crest feather barb from in situ preserved feather crest. We also detected a similar pattern to what we originally found regarding the packing of melanosomes. This is now added to the text.
  
  (4) Melanosome density may be taphonomic; in fact, in an important paper that is notably not cited here (Pan et al. 2019) the authors note dense melanosome packing and attribute it to taphonomy. This paper describes densely packed (taphonomic) melanosomes in non-avian avialans, specifically stating, "Notably, we propose that the very dense arrangement of melanosomes in the fossil feathers (Fig. 2 B, C, and G-I, yellow arrows) does not reflect in-life distribution, but is, rather, a taphonomic response to postmortem or postburial compression" and if this paper was taken into account it seems the conclusions would have to change drastically. If in this case the density is not taphonomic, this needs to be justified explicitly (although clearly these Jehol and Yanliao fossils are heavily compressed).
  
  We have added a line acknowledging this possibility. We have accounted for the shrinkage effects caused by heat and compression, as detailed in our Supplementary Information (SI) file. Even when these changes are considered, they do not alter the main conclusions of our study. Besides, given that most of the melanosomes we used for the simulation are complete and well preserved, we consider the distortion to be rather limited, or at least minor compared to the changes seen in taphonomic experiments.
  
  (5) Color in modern birds is affected by the outer keratin cortex thickness which is not preserved but the authors note the barbs are much thicker (10um) than extant birds; this surely would have affected color so how can the authors be sure about the color in this feather?
  
  In extant birds, feather barbs of similar size are primarily composed of air spaces and quasi-ordered keratin structures, largely lacking dense melanosomes. The color-producing barb we have described here does not directly correspond to a feather type in modern birds for comparison. Since there is no direct extant analog to inform the keratin thickness and similar melanosome density, we apply an advanced 3-D FDTD modeling approach to the question of coloration reconstruction, rather than relying on statistical DFA approaches. In addition to the packed melanosomes, the thin external keratin cortex layer is also considered in the simulation.
  
  Additionally, even in the thinner melanosome-packed layers of barbules in living birds, iridescent coloration often is observed (e.g., Rafael Maia J. R. Soc. Interface 2009). This further supports the plausibility of our modeling approach and its relevance to understanding coloration in both extinct and extant species.
  
  (6) Authors describe very strange shapes that are not present in extant birds: "...different from all other known feather melanosomes from both extant and extinct taxa in having some extra hooks and an oblique ellipse shape in cross and longitudinal sections of individual melanosome" but again, how can it be determined that this is not the result of taphonomic distortion?
  
  We consistently observed similar hook-like structures not only in this feather but also in feathers from different positions of the crest. We do not believe that distortion would produce such a regular and consistent pattern; instead, distortion likely would result in random alterations, as demonstrated by prior taphonomic experiments.
  
  (7) The authors describe the melanosomes as hexagonally packed but this does not appear to be in fact the case, rather appearing quasi-periodic at best, or random. If the authors could provide some figures to justify this hexagonal interpretation?
  
  To further validate the regional hexagonal pattern, we expanded our sampling to additional sites. We observed similar patterns not only in various regions of the same barb but also across different feathers (see added SI Figures below). This extensive sampling supports the validity of the melanosome patterns identified in our original analysis.
  
  (8) One way to address these concerns would be to sample some additional fossil feathers to see if this is unique or rather due to taphonomy
  
  We sampled additional areas from the same feather as well as feathers from other regions of the head crest. The packing patterns are generally similar with slight variations in size (figure S6).
  
  (9) As an aside, why are the feet absent in the CT scan image?
  
  To achieve better image resolution, the field of view was adjusted, resulting in part of the feet being excluded from the CT scan.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors reconstructed the three-dimensional organization of melanosomes in fossilized feathers belonging to a spectacular specimen of a stem avialan from China. The authors then proceed to infer the original coloration and related ecological implications.
  
  Strengths:
  
  I believe the study is well executed and well explained. The methods are appropriate to support the main conclusions. I particularly appreciate how the authors went beyond the simple morphological inference and interrogated the structural implications of melanosome organization in three dimensions. I also appreciate how the authors were upfront with the reliability of their methods, results, and limitations of their study. I believe this will be a landmark study for the inference of coloration in extinct species and how to interrogate its significance in the future.
  
  We thank the referee for these positive comments.
  
  Weaknesses:
  
  I have a few minor comments.
  
  Introduction: I would suggest the authors move the paragraph on coloration in modern birds (lines 75-97) before line 64, as this is part of the reasoning behind the study. I believe this change would improve the flow of the introduction for the general reader.
  
  We thank the referee for the suggestion, and we made changes accordingly to improve the flow of introduction.
  
  Melanosome organization: I was surprised to find little information in the main text regarding this topic. As this is one of the major findings of the study, I would suggest the authors include more information regarding the general geometry/morphology of the single melanosomes and their arrangement in three dimensions.
  
  We thank the referee for this suggestion. We elaborated on the details of the melanosomes in the results as follows:
  
  Hooks are commonly observed on the oval-shaped melanosomes in cross-sectional views, with two dominant types identified on the dorsal and ventral sides (Figure 3c-d, red arrows). These hooks are deflected in opposing directions, linking melanosomes from different arrays (dorsal-ventral) together. The major axis(y) of the oval-shaped melanosomes (mean = 283 nm) is oriented toward the left side in cross-section, while the shorter axis(x) measures approximately 186 nm (Table S2). In oblique or near-longitudinal sections (Figure 3e-f), the hooked structures’ connections to the distal and proximal sides of neighboring melanosomes are clearly visible (blue arrows, Figure 3f). A similar pattern occurs in two additional regions of interest within the same feather (figure S5). Although the smaller proximal hooks in these sections are less distinct, this may reflect developmental variation during melanosome formation along the feather barb. Significantly smaller hooks were also observed in cross-sections of in-situ feather barbs from the anterior side of the feather crest (figure S6). The mean long axis (z) of the melanosomes is approximately 1774 nm (Table S2). Based on these observations, we propose that the hooked structures—particularly those on the dorsal, ventral, proximal, and distal sides of the melanosomes—enhance the structural integrity of the barb (figure S7). However, these features may be teratological and unique to this individual, as no similar structures have been reported in other sampled feathers. These hooks may stabilize the stacked melanosome rods and contribute to increased barb dimensions, such as diameter and length. The sections exhibit modified (or asymmetric) hexagonally packed melanosomes with presence of extra hooked linkages (Figure 3c-d and e-f). 
The long rod-like melanosomes are different from all other known feather melanosomes from both extant and extinct taxa in having some extra hooks and an oblique ellipse shape in cross and longitudinal sections of individual melanosomes (Durrer 1986, Zhang, Kearns et al. 2010). The asymmetric packing of the melanosomes (the major axis leans leftward) played a major role in the reduction of fossilized keratinous matrix within the barbs, which may correspond to a novel structural coloration in this extinct bird. The close-packed hexagonal melanosome pattern found in extant avian feathers yields rounded melanosome outlines, in contrast to the oval-shaped melanosomes (see figure S8, x<y) in the perpendicular section here. The asymmetric compact hexagonal packing (ACHP) of the melanosomes is different from the known pattern of melanosomes formed in the structure of barbules among extant birds (Eliason and Shawkey 2012), which has been seen as a regular hexagonal organization. The packing of the melanosomes in an asymmetric pattern, on the microscopic level, might be related to the asymmetrical path of the barb extension direction observed at the macroscopic level (figure S5).
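  To make the reported dimensions easier to compare, the short sketch below derives aspect ratios from the three mean axis lengths quoted in the passage above (x of approximately 186 nm, y = 283 nm, z of approximately 1774 nm, Table S2). The constant and function names are illustrative assumptions; only the three measurements come from the text, and the ratios are simple arithmetic on those means rather than values reported by the authors.

```python
# Mean melanosome axis lengths in nanometres, as quoted in the text (Table S2).
SHORT_AXIS_X_NM = 186.0   # shorter cross-sectional axis (x)
MAJOR_AXIS_Y_NM = 283.0   # major cross-sectional axis (y)
LONG_AXIS_Z_NM = 1774.0   # long axis of the rod (z)


def aspect_ratio(longer_nm, shorter_nm):
    """Ratio of two axis lengths; both must be positive."""
    if longer_nm <= 0 or shorter_nm <= 0:
        raise ValueError("axis lengths must be positive")
    return longer_nm / shorter_nm


if __name__ == "__main__":
    # Cross-sectional ellipticity: how far the outline departs from a circle (1.0).
    print(f"y/x = {aspect_ratio(MAJOR_AXIS_Y_NM, SHORT_AXIS_X_NM):.2f}")
    # Rod elongation relative to each cross-sectional axis.
    print(f"z/y = {aspect_ratio(LONG_AXIS_Z_NM, MAJOR_AXIS_Y_NM):.2f}")
    print(f"z/x = {aspect_ratio(LONG_AXIS_Z_NM, SHORT_AXIS_X_NM):.2f}")
```

  A y/x ratio well above 1 is what distinguishes the oval cross-sections described here from the rounded outlines expected under regular hexagonal packing.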
  
  Added Supplemental figure S5. STEM images of cross-sections taken from three different positions (indicated by white dashed lines in a) demonstrate similar melanosome packing styles. The dashed lines in (a) indicate the positions from which the corresponding sections were taken; black arrows indicate the individual barbs that accumulated together in this long crest feather. One distinct feature of these sections is the hooked-link structure that aligns the melanosomes into a modified hexagonal, packed arrangement. White arrows (in c, e, g) indicate the hooked structures observed in the selected melanosomes.
  
  Added Supplemental figure S6. STEM images showing melanosome structure from three fragments of the feather crest (indicated by dashed lines and white box in a) reveal the hooked linkages between melanosomes and the surrounding melanosome structures in (b), (c) and (d). Due to the shorter length of these feather barbs, the hook structures are not as well-defined as those in the longer feather samples shown in the main text.
  
  Keratin: the authors use such a term pretty often in the text, but how is this inference justified in the fossil? Can the authors extend on this? Previous studies suggested the presence of degradation products deriving from keratin, rather than immaculated keratin per se.
  
  We changed the term to "keratinous matrix" or "material" instead. We observed that the spaces between these melanosomes were filled by organic-rich tissue, which we propose may be taphonomically altered keratin.
  
  Ontogenetic assessment: the authors infer a sub-adult stage for the specimen, but no evidence or discussion is reported in the SI. Can the authors describe and discuss their interpretations?
  
  Thanks for the suggestion. We made an osteo-histological section and added our evaluation of the histology of the femoral bone tissue sampled from the specimen to justify the assessment of its ontogenetic stage.
  
  See Supplemental figure S2 for Femur Osteo-Histology
  
  SI file Femur Osteo-Histology
  
  Ground sections were acquired from the right side of the femur to assess the osteo-histological features of the bone and its ontogenetic stage. As shown in figure S2, long, flat-shaped lacunae are widely present and densely packed throughout the major part of the bone section. Very few secondary osteocytes are present, and parallel-fibered bone tissue is underdeveloped. The flattened osteocyte lacunae dominate the cellular shape, with observable vascular canals connecting different lacunae. Overall, the osteo-histology indicates that the bird was still in an active growth stage at the time of death, suggesting it was in its sub-adult growth phase.
  
  CT scan data: these data should be made freely available upon publication of the study.
  
  We will release our CT scan data on an open server (https://osf.io/kw7sd/) along with the final version of the manuscript.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The paper presents an in-depth analysis of the original colour of a fossil feather from the crest of a 125-million-year-old enantiornithine bird. From its shape and location, it would be predicted that such a feather might well have shown some striking colour and pattern. The authors apply sophisticated microscopic and numerical methods to determine that the feather was iridescent and brightly coloured and possibly indicates this was a male bird that used its crest in sexual displays.
  
  Strengths:
  
  The 3D micro-thin-sectioning techniques and the numerical analyses of light transmission are novel and state-of-the-art. The example chosen is a good one, as a crest feather is likely to have carried complex and vivid colours as a warning or for use in sexual display. The authors correctly warn that without such 3D study feather colours might be given simply as black from regular 2D analysis, and the alignment evidence for iridescence could be missed.
  
  Weaknesses: Trivial.
  
  Recommendations for the authors:
  
  Reviewer #3 (Recommendations for the authors):
  
  In a few places, the paper can be strengthened:
  
  Dimensionality of study method: In the first paragraph, you set things up (lines 60-62) to say that studies hitherto have been of melanosomes and packing in two dimensions... and I then expect you to say soon after, in the next paragraph, 'Here, we investigate a fossil feather in three dimensions...' or some such, but you don't.
  
  You come back to Methods at the end of the Introduction (lines 97-101), but again do not say whether you model the feather in three dimensions or not. Yes, you did - I finally learned at line 104 - you did micro serial sectioning. This needs to shift a long way forward into the Introduction.
  
  Thanks for the suggestions. We used serial sectioning to obtain different views of the microbodies that are proposed to be melanosomes, and reconstructed the three-dimensional volume of the melanosomes as well as the intercalated keratin.
  
  We restructured the introduction and now state earlier in the text that the three-dimensional data obtained in this study were also used for modeling.
  
  In the Results, there are not enough references to images. It's not enough to refer generally to 'Figures 3c-f' [line 133] and then go on to rapidly step through some amazing imagery (text lines 133-146) - you need to add an image citation to each observation so readers can know exactly which image is being described each time.
  
  We elaborated our description of the imaging to better describe the melanosomes in our results section. We added the description of the melanosome stacks given above (see our reply to Reviewer #2).
  
  The 3D data in Figures 3 and 4 is great and based on huge technical wizardry. The sketch model in Figure 4a is excellent, but could you not attempt an actual 3D block diagram showing the hexagonal arrangement of clusters of aligned melanosomes?
  
  We have also tried FIB-SEM at an additional location to validate our ultrathin section data. See the SI file.
  
  Added figure S7. Targeted feather barb block prepared in FIB-SEM, with volume rendering reconstruction based on the acquired sequential cross-sectional images; the volume reconstruction is visualized in the x-y plane (c-cross section view) and in x-z plane (d-sagittal section view).
  
  Modified Figure S8d shows the 3D model of aligned melanosomes. To show the arrangement more clearly, the schematic XY cross-section of the melanosomes 3D model is shown below (also shown in Supplementary Figure S8d).
  
  35: delete 'yield'
  
  Changed
  
  73: 'feather fell' ? = 'feather that has fallen'
  
  Changed
  
  305: excises ?= exercises
  
  Changed
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.21.619362v2
www.biorxiv.org www.biorxiv.org

TIPE drives a cancer stem-like phenotype by promoting glycolysis via PKM2/HIF-1α axis in melanoma

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Tian et al. describe how TIPE regulates melanoma progression, stemness, and glycolysis. The authors link high TIPE expression to increased melanoma cell proliferation and tumor growth. TIPE causes dimerization of PKM2, as well as translocation of PKM2 to the nucleus, thereby activating HIF-1alpha. TIPE promotes the phosphorylation of S37 on PKM2 in an ERK-dependent manner. TIPE is shown to increase stem-like phenotype markers. The expression of TIPE is positively correlated with the levels of PKM2 Ser37 phosphorylation in murine and clinical tissue samples. Taken together, the authors demonstrate how TIPE impacts melanoma progression, stemness, and glycolysis through dimeric PKM2 and HIF-1alpha crosstalk.
  
  Strengths:
  
  The authors manipulated TIPE expression using both shRNA and overexpression approaches throughout the manuscript. Using these models, they provide strong evidence of the involvement of TIPE in mediating PKM2 Ser37 phosphorylation and dimerization. The authors also used mutants of PKM2 at S37A to block its interaction with TIPE and HIF-1alpha. In addition, an ERK inhibitor (U0126) was used to block the phosphorylation of Ser37 on PKM2. The authors show how dimerization of PKM2 by TIPE causes nuclear import of PKM2 and activation of HIF-1alpha and target genes. Pyridoxine was used to induce PKM2 dimer formation, while TEPP-46 was used to suppress PKM2 dimer formation. TIPE maintains stem cell phenotypes by increasing the expression of stem-like markers. Furthermore, the relationship between TIPE and Ser37 PKM2 was demonstrated in murine and clinical tissue samples.
  
  Weaknesses:
  
  The evaluation of how TIPE causes metabolic reprogramming can be better assessed using isotope tracing experiments and improved bioenergetic analysis.
  
  Thank you very much for your suggestions. Unfortunately, we could not complete the isotope tracing experiments because we lack the necessary instruments, and we were unable to arrange them externally after consulting several companies. We apologize for this limitation, which we have discussed in our manuscript. Moreover, due to our negligence, only three metabolites were presented in the previous version of the manuscript. However, we have performed routine untargeted metabolomics to demonstrate how TIPE causes metabolic reprogramming. We have added the detailed results as a new figure, Figure S3, which shows that the glycolysis pathway, particularly pyruvate and lactic acid, is decreased after TIPE interference.
  
  Reviewer #2 (Public Review):
  
  In this article, Tian et al present a convincing analysis of the molecular mechanisms underpinning TIPE-mediated regulation of glycolysis and tumor growth in melanoma. The authors begin by confirming TIPE expression in melanoma cell lines and identify "high" and "low" expressing models for functional analysis. They show that TIPE depletion slows tumour growth in vivo, and using both knockdown and over-expression approaches, show that this is associated with changes in glycolysis in vitro. Compelling data using multiple independent approaches is presented to support an interaction between TIPE and the glycolysis regulator PKM2, and the over-expression of TIPE-promoted nuclear translocation of PKM2 dimers. Mechanistically, the authors also demonstrate that PKM2 is required for TIPE-mediated activation of HIF1a transcriptional activity, as assessed using an HRE-promoter reporter assay, and that TIPE-mediated PKM2 dimerization is p-ERK dependent. Finally, the dependence of TIPE activity on PKM2 dimerization was demonstrated on tumor growth in vivo and in the regulation of glycolysis in vitro, and ectopic expression of HIF1a could rescue the inhibition of PKM2 dimerization in TIPE overexpressing cells and reduced induction of general cancer stem cell markers, showing a clear role for HIF1a in this pathway. The main conclusions of this paper are well supported by data, but some aspects of the experiments need clarification and some data panels are difficult to read and interpret as currently presented.
  
  The detailed mechanistic analysis of TIPE-mediated regulation of PKM2 to control aerobic glycolysis and tumor growth is a major strength of the study and provides new insights into the molecular mechanisms that underpin the Warburg effect in cancer cells. However, despite these strengths, some weaknesses were noted, which if addressed will further strengthen the study.
  
  (1) The analysis of patient samples should be expanded to more directly measure the relationship between TIPE levels and melanoma patient outcome and progression (primary vs metastasis), to build on the association between TIPE levels and proliferation (Ki67) and hypoxia gene sets that are currently shown.
  
  Thank you for your suggestions. We have added the relationship between TIPE levels and progression (non-lymph node metastasis vs lymph node metastasis). In addition, we have added the association between TIPE and Ki67 or LDH levels as you advised, as shown in Figure 7.
  
  However, the relationship between TIPE levels and melanoma patient outcome is not presented in this article. One reason is that the tissue microarray lacks survival data. Interestingly, the TCGA dataset showed that higher TIPE expression is associated with a favorable prognosis in melanoma. We are also very curious about this. Our subsequent study indicated that TIPE might serve as a positive regulator of PD-L1. Therefore, higher expression of TIPE may confer greater sensitivity to immunotherapy, resulting in a favorable prognosis in melanoma. The detailed mechanisms will be discussed in a subsequent article, and we hope this will become a continuing research topic for TIPE in melanoma.
  
  At this stage we can disclose only that TIPE has a survival and immune signature similar to those of PD-L1 and PD-1 in melanoma, as follows:
  
  Author response image 1.
  
  (2) The duration of the in vivo experiments was not clearly defined in the figures, however, it was clear from the tumor volume measurements that they ended well before standard ethical endpoints in some of the experiments. A rationale for this should be provided because longer-duration experiments might significantly change the interpretation of the data. For example, does TIPE depletion transiently reduce or lead to sustained reductions in tumor growth?
  
  Thank you for your suggestions. We performed a pilot experiment before the formal experiments, and all time points were chosen on that basis. Furthermore, we have added the detailed time points to the figure legends as you suggested.
  
  (3) The analysis of general cancer stem cell markers is solid and interesting, however inclusion of neural crest stem cell markers that are more relevant to melanoma biology would greatly strengthen this aspect of the study.
  
  Thank you for your advice. We selected two neural crest stem cell markers, Nestin and Sox10, and tested their expression after overexpression of TIPE in G361 cells or interference of TIPE in A375 cells.
  
  (4) The authors should take care that all data panels are clearly readable in the figures to facilitate appropriate interpretation by the reader.
  
  Thank you for your suggestions. We have amended the data panels according to your advice to ensure they are clear and professionally presented.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Major points
  
  (1) In Figure 1D, glucose, pyruvate, and lactate were measured at a steady state. However, metabolites at steady state do not accurately depict changes in pathway activity. An isotope tracing experiment (i.e., using labelled 13C glucose) can be used to study glucose catabolism into pyruvate, as well as tracing into lactate or into the TCA cycle following changes in TIPE expression. In addition, although the authors point towards changes in metabolic reprogramming, only three metabolites were measured. The use of isotope tracing to monitor metabolites from more than one pathway would be suggested to support the claim that metabolism is being reprogrammed due to TIPE.
  
  Thank you very much for your suggestions. Unfortunately, we could not complete the isotope tracing experiments because we lack the necessary instruments, and the several companies we consulted were unable to help. We apologize for this limitation, which we have discussed in the manuscript. Moreover, due to our oversight, only three metabolites were presented in the previous manuscript. However, we have performed routine untargeted metabolomics to demonstrate how TIPE causes metabolic reprogramming. We have added the detailed results as a new figure, Figure S3, which shows that the glycolysis pathway, particularly pyruvate and lactic acid, is decreased after TIPE interference.
  
  (2) In Figure 1H, extracellular acidification was used to determine glycolytic activity. However, bicarbonate secretion can also greatly affect pH, and should be considered (PMID 25449966). Although total ATP content was measured, the contribution of ATP from glycolysis can be also determined (see PMID 28270511) to provide a more accurate representation of glycolytic ATP production.
  
  Thank you again for your suggestions. As described above, we will improve our measurement methods in future work, and we have discussed this weakness in the manuscript.
  
  (3) On page 5, lines 108-111, the authors show that "This process represents an important regulator of the TIPE family switching between oxidative phosphorylation and aerobic glycolysis, paving the way for cancer-specific metabolism in response to low-oxygen challenge." However, there is no data on oxidative phosphorylation. What is the effect of TIPE on oxygen consumption?
  
  Thank you for your careful and professional advice. We have conducted a thorough review of the manuscript for language accuracy and corrected this term to eliminate confusion and ensure the text is clear and professionally presented.
  
  Minor points
  
  (1) On page 3, line 68, it is unclear what is increasing lactate levels, as lactate can be transported inside of cells.
  
  Thanks for your suggestions, we have corrected this misdescription to improve the overall quality and readability of the manuscript.
  
  (2) In Figure 1B, RNA sequencing was performed on TIPE overexpressing G361 cells. The "ribosome" pathway has the highest count and lowest p-value. However, there is no mention of this in the text.
  
  Thank you for your suggestions. We selected aerobic glycolysis as our main focus based on the combined transcriptomics, metabolomics and Co-IP/MS results. The "ribosome" pathway you pointed out may be our next research topic.
  
  (3) It would be helpful to include the cell line in Figure S1B-C as well as in the figure legend.
  
  Thanks for your suggestions, we have added the cell line into Figure S1B-C as well as in the figure legend.
  
  (4) Concerning supplementary figures, it would be helpful to include the panel numbers when referring to them in the main text (see line 120 or 122 as an example).
  
  Thanks for your suggestions, we have added the panel numbers when referring to them in the main text.
  
  (5) The sentence on lines 127-131 is very confusing.
  
  Thanks for your suggestions, we have corrected the improper descriptions as you mentioned.
  
  (6) In Figure S3, qPCR is misspelled in the figure legend. Also, it would be helpful to include what is meant by "relative expression" on the y-axis of Figure S3A.
  
  Thank you for your suggestions. We have corrected the errors you pointed out. Because the y-axis represents the expression of both TIPE and HIF-1α, we consider the present label more suitable.
  
  (7) There is an extra space on line 196.
  
  Thank you for your suggestions. We have corrected this as you pointed out.
  
  (8) In Figure 7E LDH staining was performed. Which isoform of LDH was detected?
  
  Actually, we stained total LDH in Figure 7E.
  
  (9) On line 931, Warburg is misspelled.
  
  Thank you for your suggestion. We have corrected all the typos mentioned, including "Warburg" on line 931.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Major comments:
  
  - Supplementary Figure 2G. Unit of time measurement for tumor growth panel needs to be defined. If this refers to days, 5 days is a relatively short period to assess tumor growth differences in vivo, and indeed, 1000-1200mm3 is a standard ethical end-point for these types of models, and this experiment was concluded well before reaching these tumor sizes. Can the authors explain why they ended this experiment at this timepoint?
  
  Thank you for your suggestions. As you suggested, we have added the detailed time points to the figure legends. We performed a pilot experiment before the formal experiments, and all time points were chosen on that basis.
  
  - Supplementary Figure 2j - Correlation analysis between TIPE expression and overall survival outcome in melanoma patients is more relevant to support the experimental observations described in the paper than the correlation with Ki67. This analysis should also be provided. In addition, is there any difference in TIPE expression between primary and metastatic melanoma patients which would then more directly link TIPE with melanoma progression in patients?
  
  The relationship between TIPE levels and melanoma patient outcome is not presented in this article. One reason is that the tissue microarray lacks survival data. Interestingly, the TCGA dataset showed that higher TIPE expression is associated with a favorable prognosis in melanoma. We are also very curious about this. Our subsequent study indicated that TIPE might serve as a positive regulator of PD-L1. Therefore, higher expression of TIPE may confer greater sensitivity to immunotherapy, resulting in a favorable prognosis in melanoma. The detailed mechanisms will be discussed in a subsequent article, and we hope this will become a continuing research topic for TIPE in melanoma.
  
  Furthermore, we have added the relationship between TIPE levels and progression (non-lymph node metastasis vs lymph node metastasis), and Ki67 in Figure 7.
  
  - Figure 2 - The A2 domain protein represents a substantial reduction in the size of PKM2, which would likely have other structural effects that could affect interactions with TIPE. This should be discussed by the authors because, in this reviewer's opinion, the data presented do not shed light on the specific TIPE domain requirements for the interaction with PKM2.
  
  Thank you for your suggestions. We have discussed this issue in the manuscript.
  
  - Figure 4: The authors show that PKM2 recruitment to the promoters of GLUT1 and LDHA is induced by TIPE expression. Is HIF1a recruitment also induced by TIPE? This is a key gap in the detailed molecular analysis provided by the authors.
  
  Thank you for your suggestions. The phenomenon you mention is very interesting; however, the expression of GLUT1 and LDHA was completely decreased when we overexpressed TIPE together with PKM2 (S37A), compared with overexpression of TIPE and wild-type PKM2. Therefore, we believe that the higher expression of GLUT1 and LDHA was primarily promoted by TIPE-induced PKM2 recruitment.
  
  - Figure 6: The authors present nice data for general pluripotency/stem cell markers however given melanocytes arise from the neural crest, and neural crest markers are expressed during melanoma initiation and response to therapies, analysis of neural crest stem cell markers would be appropriate to include in this analysis. For example, Sox10, Pax3, NGFR, and AQP2 have all been identified as neural crest stem cell markers expressed in both melanoma patients and experimental models.
  
  Thank you for your advice. We selected two neural crest stem cell markers, Nestin and Sox10, and tested their expression after overexpression of TIPE in G361 cells or interference of TIPE in A375 cells.
  
  Minor comments:
  
  - All Figure and Supplementary Figure legends should indicate how many replicate experiments the data represents, and all error bars should be defined (StDev vs SEM).
  
  We have added as you suggested.
  
  - Supplementary Figure S1C - can the authors confirm the densitometry values on the western, as the band looks to be considerably larger than 1.6 fold higher compared to the control?
  
  We repeated the densitometry measurement using ImageJ. However, the result remained the same.
  
  - FACs panels in Supplementary Figure 2C-D are unreadable and should be enlarged.
  
  - Supplementary Figure S2i - quantification of Ki67 images appears warranted.
  
  - Supplementary Figure S2j - The text in the figure panel is too small and needs to be increased so the data can be interpreted accurately. Also, the authors should confirm the data is specifically from melanoma patients in the figure legend.
  
  We have improved the quality of the figures and revised their descriptions for greater clarity and coherence, ensuring that they effectively highlight the key results of our study.
  
  - Figure 1A - text on the heat map cannot be read. Gene-level information can be removed, and sample labels should be made larger. In panel D, no statistical analysis is shown for the metabolomics analysis. These should be added, or the authors should modify the text when referring to these data.
  
  We have improved the quality of the figures and revised their descriptions for greater clarity and coherence, ensuring that they effectively highlight the key results of our study.
  
  - Line 127: RNAseq data does not indicate a change in metabolites; text should be changed to say "TIPE dramatically promoted expression of genes...".
  
  We have corrected as you suggested.
  
  - Supplementary Figure S3c - Labels and correlation values are not readable.
  
  - Figure 2A - The text and details in the figure are difficult to read.
  
  - Figure S4 D-H - text in figure panels too small to read.
  
  Thank you for the three comments above. We have carefully reviewed the entire document to ensure all figures are clear and correctly cited, preventing any confusion and maintaining the integrity of our research findings.
  
  - Figure 3 - the legend restates the major observations and interpretations of the figure, however does not contain enough information about what the data represents or how it was generated. The interpretation of the data should be made in the main text. For example, in panel 3. F-G the number of individual cells quantified for the analysis should be stated. In addition, given the data are generated from two completely independent cell lines, it would be more appropriate to have separate graphs for the A375 cells and G361 cells. The signal levels in the respective controls at baseline are very different, and plotted together without clear labels, making the reader question the validity of the data when this just reflects different basal signals in different cell models.
  
  We have separated the graphs for the A375 cells and G361 cells.
  
  - Figure 4 B-C - IgG controls are missing in Co-IP experiments.
  
  We have added the IgG controls as you suggested.
  
  - Figure 5F - The unit of measure of time should be indicated on the axes; is this days?
  
  We measured the tumor volumes 7 times, once every 5 days. We have added a detailed description to the Materials and Methods section.
  
  - Line 348: error in text, mammosphere which should presumably be tumorsphere if from melanoma cells.
  
  Thanks for your suggestions, we have corrected this term to "tumorsphere" and conducted a thorough language and grammar review of the manuscript to ensure its professional presentation.
  
  - Methods: more experimental details for the transcriptomic, mass spec, and metabolomics studies should be provided. There are insufficient details if readers wish to repeat these experiments.
  
  Thanks for your suggestions, we have corrected as you advised.
  
URL

biorxiv.org/content/10.1101/2023.11.14.567124v2
www.biorxiv.org www.biorxiv.org

The replication principle revisited: a shared functional organization between pulvinar-cortical and cortico-cortical connectivity and its structural and molecular imaging correlates

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The current work explored the link between the pulvinar intrinsic organisation and its functional and structural connectivity patterns of the cortex using different dimensional reduction techniques. Overall they find relationships between pulvinar-cortical organization and cortico-cortical organization, and little evidence for clustered organization. Moreover, they investigate PET maps to understand how neurotransmitter/receptor distributions vary within the pulvinar and along its structural and functional connectivity axes.
  
  Strengths:
  
  There is a replication dataset and different modalities are compared against each other to understand the structural and functional organisation of the pulvinar complex.
  
  Weaknesses:
  
  (1) What is the motivation of the study and how does this work extend previous assessments of the organization of the complete thalamus within the gradient framework?
  
  Thank you for raising this central question. As already mentioned in the main text, the pulvinar is one of the largest and most prototypical associative nuclei, yet its organizational principles in the human brain remain relatively unexplored. The substantial body of anatomical research conducted in primate species suggests that multiple overlapping corticotopic representations coexist on the pulvinar complex.
  
  Existing connectivity-based parcellation studies of pulvinar organization often overlook these organizational principles, as the resulting parcellation may reflect a linear combination of single overlapping connectopies rather than accurately capturing their distinct and unique spatial arrangement.
  
  Investigations of thalamic connectivity have already revealed overarching organizational principles within the thalamus, which are partially reflected in its cytoarchitectonic subdivisions. These principles are associated with core and matrix thalamic neuronal subpopulations, and their distinct contributions to large-scale connectivity networks.
  
  Since gradient selection relies on the explained variance of the diffusion embeddings, and pulvinar-cortical connectivity likely accounts for only a limited portion of the variance in thalamocortical connectivity, we chose to focus specifically on the pulvinar nucleus. This approach was intended to ensure that the local connectivity principles of the pulvinar are not overshadowed by the broader connectotopical organization of the entire thalamus.
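
  The role of explained variance in gradient selection can be illustrated with a minimal diffusion-embedding sketch. It is not the pipeline used in the study: the Gaussian toy affinity, the function name `diffusion_embedding`, the `alpha` value and the eigenvalue-based definition of explained variance are all assumptions chosen for illustration.

```python
import numpy as np


def diffusion_embedding(affinity, n_components=3, alpha=0.5):
    """Diffusion-map embedding of a symmetric, non-negative affinity matrix.

    Returns the first non-trivial components ("gradients") and each
    component's share of the summed non-trivial eigenvalues.
    """
    A = np.asarray(affinity, dtype=float)

    # Density normalization (alpha = 0.5 is an assumed, commonly used value).
    d = A.sum(axis=1)
    K = A / np.outer(d ** alpha, d ** alpha)

    # Symmetric conjugate of the row-stochastic Markov matrix, so that a
    # symmetric eigensolver can be used.
    d_k = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d_k, d_k))

    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Map back to right eigenvectors of the Markov matrix. The first one is
    # constant (eigenvalue 1) and carries no spatial information.
    psi = evecs / np.sqrt(d_k)[:, None]
    lambdas = evals[1:n_components + 1]
    gradients = psi[:, 1:n_components + 1] * lambdas

    # Eigenvalue share among non-trivial components; tiny negative
    # eigenvalues from round-off are clipped to zero.
    explained = lambdas / np.clip(evals[1:], 0, None).sum()
    return gradients, explained


# Toy example: 60 hypothetical "voxels" placed along one spatial axis, with
# affinity decaying smoothly with distance.
x = np.linspace(0.0, 1.0, 60)
toy_affinity = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.2 ** 2))
gradients, explained = diffusion_embedding(toy_affinity, n_components=3)
print("explained variance per gradient:", np.round(explained, 3))
```

  Following the argument above, components that capture the global thalamic organization would be expected to take the largest eigenvalue shares in a whole-thalamus embedding; restricting the affinity matrix to pulvinar voxels recomputes those shares for pulvinar connectivity alone.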
  
  This rationale aligns with findings in topographically organized regions of the cerebral cortex, such as M1, S1 or visual areas. In these regions, distinct principles of topographical organization are not readily apparent when analyzing whole-brain connectivity embedding but emerge when dimensionality reduction is applied to region-specific connectivity data.
  
  (2) Why is the current atlas chosen for the delineation of the pulvinar and individualized maps not considered? Given the size of the pulvinar, more validation of the correctness of the atlas may be helpful.
  
  To improve signal-to-noise ratio and in alignment with previous studies, we performed diffusion embedding on the group-level, averaged connectivity matrices rather than estimating gradients at the individual subject level.
  
  The decision to use a standard-space atlas for pulvinar delineation, rather than individualized parcellation, was driven by technical considerations: 1) functional MRI data were already transformed to MNI space; and 2) individualized parcellation of thalamic nuclei can result in varying pulvinar volumes across subjects, complicating the averaging of connectivity data. By using a standard-space atlas, we ensured that connectivity was consistently extracted from the same set of voxels across all subjects.
  
  We selected the AAL3 atlas (Rolls et al., 2020) over other existing thalamic atlases for practical reasons: the atlas incorporates an ex-vivo thalamic parcellation (Iglesias et al., 2018) with a specific delineation of pulvinar nuclei, which was necessary for subsequent analyses. In the revised version of the manuscript, to validate our findings, we replicated the pulvinar gradient using a different pulvinar delineation from a recent, thalamus-specific atlas (Su et al., 2019). Notably, the spatial distribution of pulvinar connectivity and coexpression gradients remained consistent, regardless of the choice of the thalamic atlas, underscoring the robustness of our results.
  
  (3) Overall the study feels a little incremental and a repetition of what others have done already in the thalamus. It would be good to know how focusing only on the pulvinar changes interpretation, for example by comparing thalamic and pulvinar gradients?
  
  The authors acknowledge the existing body of literature that has examined thalamic connectivity through the lens of the connectivity gradient framework. While these studies may provide valuable insights into the functional topography of the pulvinar complex, given its prominent role within the thalamus, we contend that a focused analysis of pulvinar connectivity offers a unique opportunity to uncover the specific organizational principles of this nuclear complex. By isolating the pulvinar, we aimed to avoid the potential overshadowing of its local connectivity patterns by the broader connectotopical organization of the entire thalamus. However, as we believe that our findings are best interpreted within the broader context of general thalamic connectivity organization, we have included an additional paragraph in the Discussion section, which explores the similarities and differences between thalamic and pulvinar gradients, offering a more integrative perspective on our results.
  
  “In recent years, different works have explored the spatial arrangement of thalamic connectivity within a connectivity gradient framework. Diffusion embedding of thalamocortical functional connectivity has revealed a principal, medio-lateral gradient that was found to correlate with thalamic structural subdivisions, and a secondary, antero-posterior gradient associated with thalamic functional subfields, and showing progression from unimodal sensorimotor cortical networks to multimodal attention and associative networks. Interestingly, the principal thalamic gradient shows a medio-lateral arrangement on the pulvinar axis while the secondary gradients correspond more to a ventral-dorsal pulvinar axis (Yang et al. 2020). In particular, further independent investigations have suggested that the progressing pattern of thalamic connectivity from unimodal to transmodal cortices is strongly associated with the local density of core and matrix cell types, thus establishing a link between molecular properties and functional connectivity dynamics (Müller et al. 2020; Huang et al. 2024). Our findings complement and expand the existing literature by revealing a similar arrangement of cortical connectivity patterns on the pulvinar complex, and elucidating its relationship to in-vivo estimates of molecular markers of neurotransmission. We found that the gradient associated with unimodal-transmodal cortical connectivity accounted for the highest percentage of variance in cortico-pulvinar connectivity, in line with its well-acknowledged role as an associative nucleus. It is noteworthy that, in analyses of thalamocortical gradients, the pulvinar complex is situated towards the “sensorimotor” extreme of the unimodal-to-transmodal thalamic gradient (Yang et al., 2020). This likely reflects its prominent connectivity to visual and sensory areas compared to other thalamic nuclei.
Nevertheless, the extensive and intricate association of the pulvinar with multiple cortical networks is strongly evident in various functional connectivity investigations (Basile et al., 2021; Kumar et al., 2017, 2022). By isolating pulvinar-cortical from broader thalamocortical connectivity, our analysis was able to provide additional insights into the spatial organization of its connectivity with different cortical networks, highlighting the pulvinar's remarkable functional diversity and complexity.”
  
  (4) Could it be that the gradient patterns stem from lacking anatomical and functional resolutions (or low SNR) therefore generating no sharp boundaries?
  
  The gradient organization described in our results aligns with anatomical evidence in non-human primates (Shipp, 2003), and with existing neuroimaging studies in humans, which report limited correspondence between connectivity-based hard clustering solutions and histological delineation of pulvinar nuclei. However, we recognize the critical importance of assessing the impact of SNR on connectivity measures derived from functional and structural MRI. In the revised manuscript, we have included an additional analysis to investigate the potential impact of local noise on gradient reconstruction. This analysis involved sampling voxel-wise SNR estimates in the pulvinar from both BOLD and diffusion-weighted MRI data, averaging these estimates to generate group-level, modality-specific SNR maps. We then assessed spatial correlations between these maps and the gradient embeddings using the same methodological framework employed throughout the study. Our findings indicate that functional connectivity gradients are weakly, but significantly correlated with SNR, with the strongest correlation observed for the third gradient (left hemisphere G<sub>FC</sub>1 r = -0.30, SA-corrected p < 0.001, G<sub>FC</sub>2 r = 0.22, SA-corrected p = 0.05, G<sub>FC</sub>3 r = 0.55, SA-corrected p < 0.001; right hemisphere G<sub>FC</sub>1 r = -0.41, SA-corrected p < 0.001, G<sub>FC</sub>2 r = 0.22, SA-corrected p = 0.008, G<sub>FC</sub>3 r = 0.52, SA-corrected p = 0.017). In contrast, structural connectivity gradients showed no significant correlation with SNR (left hemisphere G<sub>SC</sub>1 r = 0.06, SA-corrected p = 0.82, G<sub>SC</sub>2 r = -0.33, SA-corrected p = 0.01; right hemisphere G<sub>SC</sub>1 r = 0.40, SA-corrected p = 0.28, G<sub>SC</sub>2 r = -0.19, SA-corrected p = 0.31).
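
  The correlation step described above can be sketched as follows. This is a simplified illustration on synthetic data, not the authors' analysis: the function name `permutation_correlation` is hypothetical, and the null model is a plain permutation test that ignores spatial autocorrelation, so its p-values are not equivalent to the SA-corrected values reported in the response.

```python
import numpy as np


def permutation_correlation(gradient, snr, n_perm=2000, seed=0):
    """Pearson correlation between two voxel-wise maps with a naive
    permutation p-value (two-sided).

    Note: shuffling voxels destroys spatial autocorrelation, so this null is
    more liberal than a spatial-autocorrelation-preserving one.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(gradient, dtype=float)
    s = np.asarray(snr, dtype=float)

    r_obs = np.corrcoef(g, s)[0, 1]

    # Build the null distribution by shuffling one map across voxels.
    null = np.empty(n_perm)
    for i in range(n_perm):
        null[i] = np.corrcoef(g, rng.permutation(s))[0, 1]

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(np.abs(null) >= abs(r_obs))) / (1 + n_perm)
    return r_obs, p_value


# Synthetic stand-ins for a gradient map and a group-level SNR map
# (300 hypothetical pulvinar voxels).
rng = np.random.default_rng(1)
toy_gradient = rng.normal(size=300)
toy_snr = 0.5 * toy_gradient + rng.normal(size=300)

r, p = permutation_correlation(toy_gradient, toy_snr)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

  A spatial-autocorrelation-aware analysis would replace the plain shuffle with surrogate maps that preserve the spatial structure of the data, which typically yields more conservative p-values.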
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Please add more literature on thalamus gradients and interpret this with care.
  
  Thank you for the suggestion. We have added the following paragraph in the Discussion section:
  
  “In recent years, different works have explored the spatial arrangement of thalamic connectivity within a connectivity gradient framework. Diffusion embedding of thalamocortical functional connectivity has revealed a principal, medio-lateral gradient that was found to correlate with thalamic structural subdivisions, and a secondary, antero-posterior gradient associated with thalamic functional subfields, and showing progression from unimodal sensorimotor cortical networks to multimodal attention and associative networks. Interestingly, the principal thalamic gradient shows a medio-lateral arrangement on the pulvinar axis while the secondary gradients correspond more to a ventral-dorsal pulvinar axis (Yang et al. 2020). In particular, further independent investigations have suggested that the progressing pattern of thalamic connectivity from unimodal to transmodal cortices is strongly associated with the local density of core and matrix cell types, thus establishing a link between molecular properties and functional connectivity dynamics (Müller et al. 2020; Huang et al. 2024). Our findings complement and expand the existing literature by revealing a similar arrangement of cortical connectivity patterns on the pulvinar complex, and elucidating its relationship to in-vivo estimates of molecular markers of neurotransmission. We found that the gradient associated with unimodal-transmodal cortical connectivity accounted for the highest percentage of variance in cortico-pulvinar connectivity, in line with its well-acknowledged role as an associative nucleus. It is noteworthy that, in analyses of thalamocortical gradients, the pulvinar complex is situated towards the “sensorimotor” extreme of the unimodal-to-transmodal thalamic gradient (Yang et al., 2020). This likely reflects its prominent connectivity to visual and sensory areas compared to other thalamic nuclei.
Nevertheless, the extensive and intricate association of the pulvinar with multiple cortical networks is strongly evident in various functional connectivity investigations (Basile et al., 2021; Kumar et al., 2017, 2022). By isolating pulvinar-cortical from broader thalamocortical connectivity, our analysis was able to provide additional insights into the spatial organization of its connectivity with different cortical networks, highlighting the pulvinar's remarkable functional diversity and complexity.
  
  As regards structural connectivity, existing accounts describe a medio-lateral organization of thalamocortical connections, corresponding to an antero-posterior gradient on the cortical mantle. This gradient organization appears to be anchored to genetic markers of different cell types (Oldham and Ball 2023). In line with these findings, we describe a principal axis of structural connectivity in the pulvinar complex that is arranged along the medio-lateral axis, and we reinforce the notion of a deep relationship between structural connections and molecular expression of neurotransmission markers. On the other hand, the patterns of connectivity with the cerebral cortex do not correspond to a clear antero-posterior axis on the cerebral cortex, probably reflecting the predominance of local connectivity over the global thalamic structural topography. Further investigations are warranted to ascertain whether the structural gradients of the pulvinar complex may be in continuity with this general cortico-thalamic connectivity gradient.”
  
  (2) Please state the motivation of the work more clearly and what makes it different from related literature.
  
  Thank you for pointing us to this lack of clarity. We have added the following paragraph in the Introduction section:
  
  “In particular, investigations of thalamic connectivity within the gradient framework have uncovered general organizational principles within the thalamus, which are partially reflected in thalamic cytoarchitecture subdivisions. These principles have been linked to core and matrix thalamic neuronal subpopulation, and to their differential contribution to large-scale connectivity networks (Müller et al., 2020; Yang et al., 2020). However, given the remarkable functional complexity and diversity of the pulvinar complex, these global spatial organization patterns likely capture only part of its functional topography. With this in mind, isolating pulvinar connectivity from the remaining thalamocortical connectome would ensure that local organizational principles are not obscured by the global connectotopic structure of the entire thalamus.”
  
  (3) Why did the authors opt for a whole brain labelling atlas, would a thalamus-specific atlas not be more suitable?
  
  Despite being a large-scale whole brain atlas, the labeling atlas of choice (AAL3) incorporates a thalamus-specific parcellation from previous work (Iglesias et al., 2018), derived from ex-vivo data and including subdivision of the pulvinar complex into anterior, inferior, lateral and medial nuclei. In the revised version of the manuscript, to validate our findings, we replicated the pulvinar gradient using a different pulvinar delineation from a recent, thalamus-specific atlas (Su et al., 2019). We show these results in Supplementary Figure 1. Notably, the spatial distribution of pulvinar connectivity and coexpression gradients remained consistent, regardless of the choice of the thalamic atlas, underscoring the robustness of our results.
  
  (4) How did the authors account for the potential low sensitivity of subcortical signals in the PET data?
  
  We acknowledge the inherent limitations in spatial sensitivity that are a common drawback of PET imaging. However, the PET data employed in the present study were derived from a high-quality dataset collected across multiple studies, predominantly acquired using high-resolution scanners (Hansen et al., 2022; see supplementary material at https://static-content.springer.com/esm/art%3A10.1038%2Fs41593-022-01186-3/MediaObjects/41593_2022_1186_MOESM3_ESM.xlsx for technical details). Furthermore, the reliability of neurotransmission marker measurements at the subcortical level has been validated against genetic transcription markers (Hansen, Markello, et al., 2022; Hansen, Shafiei, et al., 2022), ensuring robust and biologically meaningful results.
  
  (5) What about SNR of the metrics within the pulvinar?
  
  The referee raises a crucial and complex point, prompting us to conduct additional analyses. We recognize the critical importance of assessing the impact of SNR on connectivity measures derived from functional and structural MRI. In the revised manuscript, we have included an additional analysis to investigate the potential impact of local noise on gradient reconstruction. Therefore, we have incorporated the following text into the manuscript:
  
  Results (5. Reliability and Reproducibility):
  
  “To assess the influence of local noise on functional and structural connectivity gradients, we calculated the spatial correlation between gradient values and averaged voxel-wise estimates of signal-to-noise ratio (SNR) from functional and structural MRI data, respectively. We found that functional connectivity gradients are weakly, but significantly correlated with the SNR, with the strongest correlation observed for the third gradient (left hemisphere G<sub>FC</sub>1 r= -0.30, SA-corrected p < 0.001, G<sub>FC</sub>2 r= 0.22, SA-corrected p = 0.05, G<sub>FC</sub>3 r= 0.55, SA-corrected p < 0.001; right hemisphere G<sub>FC</sub>1 r= -0.41, SA-corrected p < 0.001, G<sub>FC</sub>2 r= 0.22, SA-corrected p = 0.008, G<sub>FC</sub>3 r= 0.52, SA-corrected p = 0.017). In contrast, structural connectivity gradients were not significantly associated with SNR (left hemisphere G<sub>SC</sub>1 r= 0.06, SA-corrected p = 0.82, G<sub>SC</sub>2 r= -0.33, SA-corrected p = 0.01; right hemisphere G<sub>SC</sub>1 r= 0.40, SA-corrected p = 0.28, G<sub>SC</sub>2 r=-0.19, SA-corrected p = 0.31) (Supplementary Figure 5).”
  
  Methods (4. Reliability and reproducibility assessment):
  
  “To evaluate the possible influence of SNR on connectivity-derived diffusion embeddings, we have performed a voxel-wise, modality-specific SNR assessment to investigate the correlation between the spatial distribution of noise and diffusion embeddings. For each subject, we separately calculated voxel-wise SNR maps for the left and right pulvinar, using both functional (BOLD) volumes and DWI data. For BOLD volumes, we employed the widely accepted definition of temporal signal-to-noise ratio (tSNR) (Murphy et al., 2006):

  tSNR = T<sub>mean</sub> / T<sub>std</sub>

  where T<sub>mean</sub> and T<sub>std</sub> are, respectively, the mean and the standard deviation of each voxel’s signal across the time series.
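  As a minimal sketch of the tSNR definition described above, read here as the mean of a voxel's time series divided by its standard deviation: the function name `temporal_snr`, the toy time series, and the choice of the sample standard deviation are assumptions for illustration, not details taken from the authors' pipeline.

```python
from statistics import mean, stdev


def temporal_snr(time_series):
    """Return tSNR = mean / standard deviation of one voxel's time series.

    Assumption: the sample standard deviation is used; the text does not
    state which estimator was applied.
    """
    sd = stdev(time_series)
    if sd == 0:
        raise ValueError("constant time series: tSNR is undefined")
    return mean(time_series) / sd


# Toy example: two hypothetical voxels with the same mean but different noise.
voxels = {
    "quiet_voxel": [100.0, 101.0, 99.0, 100.0, 100.0],
    "noisy_voxel": [100.0, 110.0, 90.0, 105.0, 95.0],
}
tsnr_map = {name: temporal_snr(ts) for name, ts in voxels.items()}
```

  With equal means, the voxel with the smaller temporal fluctuation receives the higher tSNR, which is the property the voxel-wise maps rely on.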
  
  For the DWI data, we applied a similar approach (Cai et al., 2021) that allows estimation of SNR from multiple b=0 diffusion-weighted volumes:

  SNR = S<sub>mean</sub> / S<sub>std</sub>

  where S is the voxel’s signal intensity, and the mean (S<sub>mean</sub>) and standard deviation (S<sub>std</sub>) were computed across all the b0-weighted volumes (18 for the HCP dataset; 7 for the LEMON dataset). Individual pulvinar SNR maps were then averaged to generate group-level estimates of SNR spatial distribution. The resulting, modality-specific average SNR maps were correlated with the diffusion gradients derived from the corresponding modality, following the same approach described in the previous section (Pearson’s correlation; p-values corrected for spatial autocorrelation using spatial null models, and Benjamini-Hochberg correction for FDR).”
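  The three steps described above (voxel-wise SNR from repeated b=0 volumes, averaging across subjects, and Pearson correlation with a gradient map) could be sketched as follows. All function names and numbers are hypothetical, and the spatial-autocorrelation correction of the p-values is not shown.

```python
from math import sqrt
from statistics import mean, stdev


def b0_snr(b0_values):
    """SNR of one voxel: mean / standard deviation across b=0 volumes."""
    return mean(b0_values) / stdev(b0_values)


def group_average(subject_maps):
    """Average voxel-wise maps (equal-length lists) across subjects."""
    return [mean(values) for values in zip(*subject_maps)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: 2 subjects, 3 voxels, 4 b=0 volumes per voxel.
subject_b0 = [
    [[100, 102, 98, 100], [80, 90, 70, 80], [60, 61, 59, 60]],
    [[100, 101, 99, 100], [80, 85, 75, 80], [60, 62, 58, 60]],
]
subject_snr = [[b0_snr(voxel) for voxel in subject] for subject in subject_b0]
group_snr = group_average(subject_snr)

gradient = [0.9, -0.4, 0.1]  # made-up gradient values for the same voxels
r = pearson_r(group_snr, gradient)
```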
  
  (6) The numbers of the screeplot / numbers in figures are quite small and not so easy to read.
  
  Thank you for highlighting this point. We have fixed this issue in the revised version of the Figures.
  
  (7) How do you know the pulvinar mask is not also picking up on the cortical spinal tract?
  
  To ensure that pulvinar masks did not pick up streamlines from the corticospinal tracts, we performed a thorough visual inspection of the tractograms that were employed for structural connectivity estimation. For each subject-specific tractogram, we randomly subsampled 10000 streamlines after transformation into MNI standard space and summed up these results to generate a group-level tractogram in standard space. The resulting track-density images (Author response image 1) demonstrate only minimal involvement of descending/ascending tracts from/to the brainstem and spinal cord, confirming the specificity of the pulvinar masks.
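  A rough sketch of the subsample-and-sum procedure described above, assuming streamlines are already expressed as lists of (x, y, z) points in standard space. The helper names, the voxel size, and the toy coordinates are illustrative; normalization of the final image is not shown.

```python
import random
from collections import Counter


def subsample(streamlines, n, rng):
    """Randomly keep at most n streamlines."""
    if len(streamlines) <= n:
        return list(streamlines)
    return rng.sample(streamlines, n)


def track_density(streamlines, voxel_size=1.0):
    """Count, per voxel, how many streamlines pass through it."""
    density = Counter()
    for line in streamlines:
        # A streamline contributes once to each voxel it visits.
        visited = {tuple(int(c // voxel_size) for c in point) for point in line}
        density.update(visited)
    return density


rng = random.Random(0)
# Hypothetical tractograms for two subjects (coordinates in mm).
subject_a = [
    [(0.2, 0.2, 0.2), (1.5, 0.2, 0.2)],
    [(0.4, 0.1, 0.3), (0.6, 0.9, 0.1)],
]
subject_b = [[(0.1, 0.1, 0.1), (1.1, 0.1, 0.1)]]

group_density = Counter()
for tractogram in (subject_a, subject_b):
    group_density += track_density(subsample(tractogram, 10000, rng))
```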
  
  Author response image 1.
  
  Group-level structural connectivity of the pulvinar complex. Track-density images have been normalized and overlaid on the MNI152 standard template.
  
  (8) There is no mention of whether the within-pulvinar gradients that are then correlated with PET patterns, or with each other, are tested for spatial autocorrelation. I believe it is only mentioned for the cortex.
  
  Thanks for providing us with the opportunity to clarify this important aspect, which is mentioned in the Methods section (3. Gradient analysis and statistics):
  
  “To account for the spatial autocorrelation (SA) properties of gradient maps, for all the correlations described, statistical significance was assessed using the permutational approach described in Burt et al. (2020). Briefly, this method takes as input geometric distance matrices for SA estimation and involves the generation of a given number of SA-preserving permuted surrogate maps, which are then employed as nulls to estimate a permutational null distribution of the test statistic (Burt et al. 2020). Pairwise Euclidean distances between left or right pulvinar voxel coordinates were employed for pulvinar null models, while for cortical parcellated connectivity data Euclidean distances were estimated between centroids of each cortical ROI. In both cases, 1000 surrogates were generated to estimate the null distribution. Statistical tests were controlled for false discovery rate (FDR) using Benjamini and Hochberg’s correction.”
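  To illustrate how a set of surrogate maps turns into a corrected p-value, the sketch below computes a two-sided permutation p-value from null statistics and then applies the Benjamini-Hochberg adjustment. It assumes the SA-preserving surrogates have already been generated (that step is not shown), and the `+1` correction in the p-value is a common convention rather than something stated in the text.

```python
def permutation_p_value(observed, null_values):
    """Two-sided p-value of `observed` against a null distribution.

    `null_values` would hold the test statistic recomputed on each
    SA-preserving surrogate map. The +1 terms count the observed value
    as one draw from the null (an assumed convention).
    """
    extreme = sum(1 for v in null_values if abs(v) >= abs(observed))
    return (extreme + 1) / (len(null_values) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up example values.
p_single = permutation_p_value(0.5, [0.1, -0.2, 0.6, -0.7])
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

  With 1000 surrogates, as in the text, the smallest attainable p-value under this convention is 1/1001.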
  
  However, to enhance readability, we have highlighted this concept in the Results section (3. The unimodal-to-transmodal gradient (G<sub>FC</sub>1) aligns with receptor expression on the dorso-ventral pulvinar axis):
  
  “To take into account the effects of spatial autocorrelation, we corrected the resulting p-values using a method based on SA-preserving spatial null models (Burt et al. 2020)”.
  
  (9) I don't fully understand why the mappings of the structural connectivity gradient are so patchy. Maybe some normalisation went wrong? Other papers on thalamic gradients show smoother patterns.
  
  We thank the Reviewer for the observation. After thoroughly reviewing the related code, we found no normalization errors. However, we identified a visualization issue, which has been addressed in the revised version. Specifically, the structural gradient representations shown in the figures were based on the averaged values of left and right pulvinar gradients, both of which include structural connectivity to either the ipsilateral or contralateral cerebral cortex. Since ipsilateral connectivity is more prominently represented than contralateral connectivity, this led to asymmetric gradient patterns between ipsilateral and contralateral cortical gradients, resulting in a patchy representation when gradients were averaged between left and right pulvinar. To resolve this, we adjusted the visualization by flipping the right pulvinar gradient representations along the x axis, aligning all the ipsilateral cortical connectivity on the left side and all the contralateral connectivity on the right. This adjustment produced smoother, more readable, and interpretable visualizations. Additionally, it allowed the asymmetry between ipsilateral and contralateral connections to be more clearly appreciated.
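  The effect of the flip can be seen in a toy one-dimensional example. The profiles below are invented values ordered from left to right across the cortex, and the helper names are hypothetical; the sketch only shows why averaging without alignment washes out the ipsilateral/contralateral asymmetry.

```python
def flip_x(profile):
    """Mirror a left-to-right profile (index 0 = leftmost position)."""
    return profile[::-1]


def average(a, b):
    """Element-wise mean of two equal-length profiles."""
    return [(va + vb) / 2 for va, vb in zip(a, b)]


# Toy cortical profiles ordered left -> right; values are made up.
# Each pulvinar connects more strongly to its ipsilateral cortex.
left_pulvinar = [0.8, 0.6, 0.2, 0.1]   # strong on the left (ipsilateral)
right_pulvinar = [0.1, 0.2, 0.6, 0.8]  # strong on the right (ipsilateral)

naive = average(left_pulvinar, right_pulvinar)            # asymmetry lost
aligned = average(left_pulvinar, flip_x(right_pulvinar))  # ipsilateral on the left
```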
  
  (10) The final statement of the abstract is misleading as we at this point don't know how making spatial pattern maps in the pulvinar may help understand the role of the pulvinar in health and disease.
  
  We appreciate the Reviewer’s suggestion and have updated the expression accordingly:
  
  “Our findings represent a significant step forward in advancing the understanding of pulvinar anatomy and function, offering an exploratory framework to investigate the role of this structure in both health and disease.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors aimed to explore and better understand the complex topographical organization of the human pulvinar, a brain region crucial for various high-order functions such as perception and attention. They sought to move beyond traditional histological subdivisions by investigating continuous 'gradients' of cortical connections along the dorsoventral and mediolateral axes. Using advanced imaging techniques and a comprehensive PET atlas of neurotransmitter receptors, the study aimed to identify and characterize these gradients in terms of structural connections, functional coactivation, and molecular binding patterns. Ultimately, the authors intended to provide a more nuanced understanding of pulvinar anatomy and its implications for brain function in both healthy and diseased states.
  
  Strengths:
  
  A key strength of this study lies in the authors' effort to comprehensively combine multimodal data, encompassing both functional and structural connectomics, alongside the analysis of major neurotransmitter distributions. This approach enabled a more nuanced understanding of the overarching organizational principles of the pulvinar nucleus within the broader context of whole-brain connectivity. By employing cortex-wide correlation analyses of multimodal embedding patterns derived from 'gradients,' which provide spatial maps reflecting the underlying connectomic and molecular similarities across voxels, the study offers a thorough characterization of the functional neuroanatomy of the pulvinar.
  
  Weaknesses:
  
  Despite its strengths, the current manuscript falls short in presenting the authors' unique perspectives on integrating the diverse biological principles derived from the various neuroimaging modalities. The findings are predominantly reported as correlations between different gradient maps, without providing the in-depth interpretations that would allow for a more comprehensive understanding of the pulvinar's role as a central hub in the brain's network. Another limitation of the study is the lack of clarity regarding the application of pulvinar and its subnuclei segmentation maps to individual brains prior to BOLD signal extraction and gradient reconstruction. This omission raises concerns about the precision and reproducibility of the findings, leaving their robustness less transparently evaluable.
  
  We thank the Reviewer for the valuable comments. While commonalities and discrepancies between structural and functional connectivity have been extensively explored in the literature, the relationship between functional connectivity and modulatory neurotransmission remains poorly understood. Specifically, while the role of thalamic modulatory neurotransmission has been thoroughly investigated in experimental animal models from an electrophysiological perspective, it remains relatively underexplored in the human brain. In our study, we identified significant associations between the spatial distribution of serotonergic, noradrenergic, dopaminergic and mu-opioid systems and functional pulvinar-cortical connectivity to specific functional networks. Evidence from pharmacological challenge studies using resting-state fMRI suggests that these neurotransmission systems may modulate network-specific thalamocortical connectivity directly or influence neural gain in cortico-cortical connectivity, a process partially dependent on thalamocortical connections to associative thalamic nuclei. However, the limitations of spatial and receptor specificity inherent to this approach, coupled with the predominantly correlational nature of our study design, prevented us from drawing more definitive conclusions on the biological relationship between neurotransmitter expression and functional connectivity. As regards the lack of clarity concerning signal extraction, we have now clarified that all the relevant steps of time series extraction were performed in standard space, without any further registration to individual subjects.
  
  Reviewer #2 (Recommendations for the authors):
  
  In line with the weaknesses that I raised above, my recommendations to the authors are two-fold:
  
  (1) Please provide readers with a more holistic viewpoint to better digest all the correlation analyses. For instance, in p18, the summary says:
  
  "G<sub>FC</sub>1, GRC1, and G<sub>SC</sub>2 substantially delineate multiscale differences between the ventral and dorsal aspects of the pulvinar. Moving along the ventral-dorsal axis of the pulvinar complex, more ventral regions showed higher functional connectivity to unimodal sensory processing networks, higher levels of 5HTT and NAT expression, and preferentially higher structural connectivity to modality-independent or low-level sensory processing cortices."
  
  We already knew, to some extent, of the existence of the dorsoventral axis in the pulvinar, as the authors already specified in the introduction. Beyond this simple report of a phenomenological observation, one may provide a more integrated discussion to pinpoint what commonalities or discrepancies the GFC, GRC, and GSC maps show and the potential common principles explaining their biological relationship (e.g., the high expression of 5HTT and NAT and functional connectivity). Such digested perspectives will grant the study unique insights into the functional system of the pulvinar.
  
  We have expanded on this topic in the Discussion section (Neurochemical correlates of pulvinar-cortical topographical organization) as follows:
  
  “Indeed, while commonalities and discrepancies between structural and functional connectivity have been extensively investigated, the relationship between functional connectivity and modulatory neurotransmission remains poorly understood. Our findings reveal stronger associations between pulvinar-cortical connectivity to specific functional networks and the spatial distribution of markers of serotonergic, noradrenergic, dopaminergic and opioid systems. Pharmacological challenge studies using resting-state functional MRI suggest that each of these neurotransmission systems may either directly modulate thalamocortical connectivity or influence neuronal gain in cortico-cortical functional connectivity, which is known to depend, in part, on cortical connections to associative thalamic nuclei, including the pulvinar.”
  
  (2) Specify the details if there was a QC procedure to check the signal extraction from the pulvinar subnuclei by applying the segmentation atlas at each individual.
  
  Preprocessed BOLD volumes were available in standard-space, and time series were extracted for each voxel within a standard-space mask of the pulvinar complex. All volumes underwent visual inspection to ensure the accuracy of the registration process. Regarding the pulvinar subnuclei, these structures were not segmented at the individual level.
  
  Reviewer #3 (Public review):
  
  Summary of the Study:
  
  The authors investigate the organization of the human pulvinar by analyzing DWI, fMRI, and PET data. The authors explore the hypothesis of the "replication principle" in the pulvinar.
  
  Strengths and Weaknesses of the Methods and Results:
  
  The study effectively integrates diverse imaging modalities to provide a view of the pulvinar's organization. The use of analysis techniques, such as diffusion embedding-driven gradients combined with detailed interpretations of the pulvinar, is a strength.
  
  Even though the study uses the best publicly available resolution possible with current MR-technology, the pulvinar is densely packed with many cell bodies, requiring even higher spatial resolution. In addition, the model order selection of gradients may vary with the acquired data quality. Therefore, the pulvinar's intricate organization needs further exploration with even higher spatial resolution to capture gradients closer to the biological organization of the pulvinar.
  
  Appraisal of the Study's Aims and Conclusions:
  
  The authors delineate the gradient organization of the pulvinar. The study provides a basis for understanding the pulvinar's role in mediating brain network communication.
  
  Impact and Utility of the Work:
  
  This work contributes to the field by offering insights into pulvinar organization.
  
  We thank the Reviewer for their positive assessment and constructive feedback. The Authors agree with the Reviewer that the spatial resolution of currently available in-vivo imaging methods is limited, and that gradient representation would indeed benefit from higher resolution data. However, we also note that the resolution of structural and functional volumes used in our study is consistent with existing literature on pulvinar connectivity. Additionally, the PET data employed in our work include multi-centric studies collected worldwide from healthy populations, and are primarily acquired using high-resolution scanners that allow spatial resolution up to 2 mm<sup>2</sup>. Notwithstanding, further investigations employing finer resolution imaging techniques, such as ultra-high field fMRI, may provide more detailed insights into pulvinar topographical organization at a finer scale.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) The HCP data contains genetically related datasets. Please mention whether the data-selection criteria for the selected 210 healthy subjects followed the genetically unrelated criteria.
  
  The HCP sample employed in this study consists of an initial cohort of 100 unrelated subjects, as provided in the HCP database, along with an additional random sample of 110 subjects. Subjects were selected without following a genetic criterion, as the family structure of the HCP dataset was part of a restricted access subset that we did not have access to at the time of processing. Subsequently, we obtained access to this information and determined that 178 out of 210 subjects (85%) are genetically unrelated. Of the remaining, genetically related subjects, 22 (~10% of the total sample) were included with another subject from the same family group (11 pairs); 6 (3%) were included with two other family members (2 triplets) and 4 (2%) were all part of the same family group. This information has been included in the Methods section for clarity.
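  The breakdown above can be checked with a few lines of arithmetic. Reading the last group as a single family of four is an interpretation of the text; the variable names are illustrative.

```python
total_subjects = 210
unrelated = 178

# (number of family groups, members per group) for the related subjects.
# Treating the final 4 subjects as one group of four is an assumption.
family_groups = {"pairs": (11, 2), "triplets": (2, 3), "quadruplet": (1, 4)}

related = {name: n * size for name, (n, size) in family_groups.items()}
counts = {"unrelated": unrelated, **related}

# The categories should account for every subject exactly once.
accounted_for = sum(counts.values())

percent = {name: round(100 * count / total_subjects)
           for name, count in counts.items()}
```

  The counts sum to 210 and the rounded percentages match those reported (85%, 10%, 3%, 2%).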
  
  (2) The study uses HCP data with an fMRI resolution of 2mm isotropic and diffusion MRI with 1.25mm. Additionally, the LEMON dataset includes 1.7mm isotropic DWI data and fMRI with 2.3mm isotropic resolution. Furthermore, the available PET data from the Hansen et al. 2022b study has a rather coarser spatial resolution. Therefore, it may be important to mention in the discussion that the pulvinar is densely packed with cell bodies and that their gradient organization might be better reflected with even higher spatial resolution or improved measurement techniques used in the study.
  
  We have revised the concluding section of the Discussion into a paragraph titled “Future perspectives and limitations”, and added the following text:
  
  “One notable limitation of this study lies in the relatively small size of the pulvinar complex compared to other larger cortical or subcortical structures. The high cellular density of the pulvinar poses a challenge for the relatively coarse resolution of currently available imaging techniques. Although the generally high quality of both the main and validation datasets, including rs-fMRI data (Uǧurbil et al. 2013; Babayan et al. 2019), aligns with current standards for imaging investigations of pulvinar connectivity, higher-resolution imaging approaches may offer more granular insights. Advanced techniques, such as ultra-high-field fMRI, hold promise for uncovering the fine-scale topographical organization of the pulvinar complex.”
  
  (3) The functional multiplicity of the Pulvinar nuclei among other thalamus nuclei is also illustrated in https://doi.org/10.1038/s42003-022-04126-w
  
  We thank the Reviewer for suggesting this important reference. We have added the following text in the Discussion section:
  
  “It is noteworthy that, in analyses of thalamocortical gradients, the pulvinar complex is situated towards the “sensorimotor” extreme of the unimodal-to-transmodal thalamic gradient (Yang et al., 2020). This likely reflects its prominent connectivity to visual and sensory areas compared to other thalamic nuclei. Nevertheless, the extensive and intricate association of the pulvinar with multiple cortical networks is strongly evident in various functional connectivity investigations (Basile et al., 2021; Kumar et al., 2017, 2022). By isolating pulvinar-cortical from broader thalamocortical connectivity, our analysis was able to provide additional insights into the spatial organization of its connectivity with different cortical networks, highlighting the pulvinar's remarkable functional diversity and complexity.”
  
  (4) In addition to DWI/DSI and PET, the study also uses fMRI, which allows for functional interaction in time. It may be worth reflecting in the discussion that the observed gradient organization of the pulvinar could have detailed aspects in the temporal domain, which might not be fully captured in the time-averaged embeddings.
  
  We thank the Reviewer for their insightful observation. The authors recognize that the exploration of brain temporal dynamics is a compelling area of research due to its extensive correlation with multiple hierarchical aspects of brain information processing. Examining the functional organization of the pulvinar complex in the temporal domain lies beyond the scope of the present work and will be the subject of further investigation. On the other hand, it is possible that certain aspects of the spatial organization of pulvinar connectivity may be influenced by the temporal dynamics of cortico-thalamic information processing. Intrinsic timescales have been consistently shown to progressively increase from unimodal to multimodal associative cortical regions. Furthermore, cortico-thalamic connectivity in matrix-rich regions has been correlated with cortical timescales.
  
  To address this point, we have added the following lines to the Discussion section:
  
  “In this context, it could be hypothesized that the observed gradient organization of the pulvinar may also exhibit specific patterns in the temporal domain. Indeed, multiple investigations have linked the temporal dynamics of cortical regions to different aspects of information processing (Rossi-Pool et al., 2021; Soltani et al., 2021). Notably, intrinsic neural timescales of functional activity have been associated with the functional specialization and gradient organization of the cerebral cortex (Golesorkhi et al., 2021), with shorter timescales in unimodal sensory regions and longer ones in transmodal networks (Ito et al., 2020; Murray et al., 2014). Moreover, thalamocortical connectivity has been shown to correlate with these patterns of intrinsic timescales (Müller et al., 2020). In addition, modulatory neurotransmitters such as serotonin and dopamine have been demonstrated to play a significant role in modulating functional cortical dynamics across different timescales (Hansen, Shafiei, et al., 2022; Luppi et al., 2023). Exploring how the spatial organization of the pulvinar relates to temporal dynamics and timescale modulation could provide valuable insights and represents a promising avenue for future investigations.”
  
  (5) The K-means clustering (Supplementary Figure 1) used has limitations, particularly with respect to the structure of the data. Another aspect is the reproducibility of the model-order selection. Did the reliability and reproducibility assessment produce a similar number of clusters with the LEMON data as with the HCP data?
  
  We acknowledge the limitations of k-means clustering, particularly regarding the stability and reproducibility of the model order. To address these concerns, we iteratively ran the clustering algorithm 50 times on bootstrap resamples to enhance the stability of the silhouette score estimates. In addition, we have now replicated the analysis on the secondary dataset, as suggested by the Reviewer (Author response image 2). The silhouette plots show a similar number of clusters between the two datasets for functional connectivity gradients, with minor differences observed in the results for structural connectivity gradients and multimodal gradient clustering. Notably, we did not find a high degree of similarity between the results of gradient clustering and histologically defined nuclei, further underscoring the distinct organizational patterns identified through our analysis.
  
  This reinforces the relevance of using gradient-based approaches to reveal insights into the functional and structural organization of the pulvinar complex that may not align strictly with discrete, histologically defined subdivisions.
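  For readers unfamiliar with the procedure, the bootstrap evaluation of silhouette scores described above might look roughly like the sketch below. It uses a plain Lloyd's k-means and a hand-written silhouette coefficient as stand-ins, since the implementation actually used is not stated; the toy data, the seed, and the use of the standard deviation across resamples as the error estimate are all assumptions.

```python
import random
from math import dist
from statistics import mean, stdev


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(sorted(set(points)), k)  # distinct starting centers
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(mean(axis) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (0 for singleton clusters by convention)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def bootstrap_silhouette(points, k, n_boot=50, seed=0):
    """Mean and spread of the silhouette score over bootstrap resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # resample with replacement
        scores.append(silhouette(sample, kmeans(sample, k, rng)))
    return mean(scores), stdev(scores)


# Toy data: two well-separated 2-D blobs standing in for gradient values.
_rng = random.Random(42)
blob_a = [(_rng.gauss(0, 0.3), _rng.gauss(0, 0.3)) for _ in range(30)]
blob_b = [(_rng.gauss(5, 0.3), _rng.gauss(5, 0.3)) for _ in range(30)]
points = blob_a + blob_b

score_k2, err_k2 = bootstrap_silhouette(points, k=2)
score_k4, err_k4 = bootstrap_silhouette(points, k=4)
```

  On this toy data the two-cluster solution scores higher than the four-cluster one, which is how a silhouette plot would indicate the preferred model order.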
  
  Author response image 2.
  
  K-means clustering of pulvinar gradients on the secondary dataset (LEMON) and their correspondence with histological pulvinar nuclei. Panels on the left show the silhouette plots for left and right pulvinar clustering solutions; error bars are standard error calculated across 50 resamples. Panels on the right show matrix plots of Dice similarity coefficients for pulvinar clusters against histological nuclei (AAL3 atlas). INF: inferior; ANT: anterior; LAT: lateral; MED: medial.
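  The Dice similarity coefficient used to compare clusters with histological nuclei is twice the size of the overlap divided by the summed sizes of the two voxel sets. A minimal sketch, with made-up voxel indices and hypothetical names:

```python
def dice(a, b):
    """Dice similarity coefficient between two collections of voxel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel indices for one gradient cluster and two nuclei.
cluster_voxels = {1, 2, 3, 4}
nuclei = {"MED": {3, 4, 5, 6}, "LAT": {7, 8}}

overlap = {name: dice(cluster_voxels, voxels) for name, voxels in nuclei.items()}
```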
  
  (6) The pulvinar correlates of the unimodal-transmodal cortical gradient (Figure 4) show an association with almost the entire brain (Figure 4C, violin plot). It would be interesting to back this association with known anatomical connectivity studies in animals that show connections to these network areas. To my limited knowledge, I am not aware of pulvinar tracer studies showing such extensive connectivity across the entire cortex.
  
  As our structural connectivity estimates are based on tractography, they are subject to the known limitation of potentially overestimating anatomical connectivity. A technical clarification is warranted: since structural connectivity is grouped by networks, it is strongly influenced by connections to specific cortical regions within each network. This explains the uneven and asymmetric distribution of structural gradient-weighted connectivity observed in our results and does not imply widespread connectivity across the entire cortex.
  
  Nonetheless, structural connectivity of the pulvinar to cortical regions in primates encompasses a remarkably broad array of cortical areas, including predominantly occipital (Adams et al., 2000; Benevento, 1976; Casanova et al., 1989), temporal (Berman & Wurtz, 2010; Gattass et al., 2018; Homman-Ludiye et al., 2020) and parietal cortices (Asanuma et al., 1985; Baleydier & Morel, 1992). Additionally, to a more limited extent, connections to the cingulate gyrus and portions of the lateral prefrontal cortex have also been documented (Baleydier & Mauguiere, 1985; Baleydier & Mauguiere, 1987). These connectivity patterns are in line with prior accounts of structural connectivity of the human pulvinar (Arcaro et al., 2015; Basile et al., 2021; Leh et al., 2008; Tamietto et al., 2012), and with the patterns identified in our work (Author response image 1). Such findings provide further validation of the structural connectivity profiles explored in the present study.
  
  References
  
  Adams, M. M., Hof, P. R., Gattass, R., Webster, M. J., & Ungerleider, L. G. (2000). Visual cortical projections and chemoarchitecture of macaque monkey pulvinar. The Journal of Comparative Neurology, 419(3), 377–393. https://doi.org/10.1002/(SICI)1096-9861(20000410)419:3<377::AID-CNE9>3.0.CO;2-E
  
  Arcaro, M. J., Pinsk, M. A., & Kastner, S. (2015). The anatomical and functional organization of the human visual pulvinar. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.1575-14.2015
  
  Asanuma, C., Andersen, R. A., & Cowan, W. M. (1985). The thalamic relations of the caudal inferior parietal lobule and the lateral prefrontal cortex in monkeys: Divergent cortical projections from cell clusters in the medial pulvinar nucleus. Journal of Comparative Neurology, 241(3), 357–381. https://doi.org/10.1002/cne.902410309
  
  Baleydier, C., & Mauguiere, F. (1985). Anatomical evidence for medial pulvinar connections with the posterior cingulate cortex, the retrosplenial area, and the posterior parahippocampal gyrus in monkeys. Journal of Comparative Neurology. https://doi.org/10.1002/cne.902320207
  
  Baleydier, C., & Mauguiere, F. (1987). Network organization of the connectivity between parietal area 7, posterior cingulate cortex and medial pulvinar nucleus: A double fluorescent tracer study in monkey. Experimental Brain Research, 66(2). https://doi.org/10.1007/BF00243312
  
  Baleydier, C., & Morel, A. (1992). Segregated thalamocortical pathways to inferior parietal and inferotemporal cortex in macaque monkey. Visual Neuroscience, 8(5), 391–405. https://doi.org/10.1017/S0952523800004922
  
  Basile, G. A., Bertino, S., Bramanti, A., Anastasi, G. P., Milardi, D., & Cacciola, A. (2021). In Vivo Super-Resolution Track-Density Imaging for Thalamic Nuclei Identification. Cerebral Cortex. https://doi.org/10.1093/cercor/bhab184
  
  Benevento. (1976). The Cortical Projections of the Inferior Pulvinar and Adjacent Lateral Pulvinar in the Rhesus Monkey ( Macaca. October, 108, 1–24.
  
  Berman, R. A., & Wurtz, R. H. (2010). Functional Identification of a Pulvinar Path from Superior Colliculus to Cortical Area MT. The Journal of Neuroscience, 30(18), 6342–6354. https://doi.org/10.1523/JNEUROSCI.6176-09.2010
  
  Cai, L. Y., Yang, Q., Hansen, C. B., Nath, V., Ramadass, K., Johnson, G. W., Conrad, B. N., Boyd, B. D., Begnoche, J. P., Beason-Held, L. L., Shafer, A. T., Resnick, S. M., Taylor, W. D., Price, G. R., Morgan, V. L., Rogers, B. P., Schilling, K. G., & Landman, B. A. (2021). PreQual: An automated pipeline for integrated preprocessing and quality assurance of diffusion weighted MRI images. Magnetic Resonance in Medicine, 86(1), 456. https://doi.org/10.1002/mrm.28678
  
  Casanova, C., Freeman, R. D., & Nordmann, J. P. (1989). Monocular and binocular response properties of cells in the striate-recipient zone of the cat’s lateral posterior-pulvinar complex. Journal of Neurophysiology. https://doi.org/10.1152/jn.1989.62.2.544
  
  Gattass, R., Soares, J. G. M., & Lima, B. (2018). Comparative Pulvinar Organization Across Different Primate Species (pp. 37–37). https://doi.org/10.1007/978-3-319-70046-5_8
  
  Golesorkhi, M., Gomez-Pilar, J., Tumati, S., Fraser, M., & Northoff, G. (2021). Temporal hierarchy of intrinsic neural timescales converges with spatial core-periphery organization. Communications Biology, 4(1), 277. https://doi.org/10.1038/s42003-021-01785-z
  
  Hansen, J. Y., Markello, R. D., Tuominen, L., Nørgaard, M., Kuzmin, E., Palomero-Gallagher, N., Dagher, A., & Misic, B. (2022). Correspondence between gene expression and neurotransmitter receptor and transporter density in the human brain. NeuroImage, 264, 119671. https://doi.org/10.1016/j.neuroimage.2022.119671
  
  Hansen, J. Y., Shafiei, G., Markello, R. D., Smart, K., Cox, S. M. L., Nørgaard, M., Beliveau, V., Wu, Y., Gallezot, J.-D., Aumont, É., Servaes, S., Scala, S. G., DuBois, J. M., Wainstein, G., Bezgin, G., Funck, T., Schmitz, T. W., Spreng, R. N., Galovic, M., … Misic, B. (2022). Mapping neurotransmitter systems to the structural and functional organization of the human neocortex. Nature Neuroscience, 25(11), 1569–1581. https://doi.org/10.1038/s41593-022-01186-3
  
  Homman-Ludiye, J., Mundinano, I. C., Kwan, W. C., & Bourne, J. A. (2020). Extensive Connectivity Between the Medial Pulvinar and the Cortex Revealed in the Marmoset Monkey. Cerebral Cortex, 30(3), 1797–1812. https://doi.org/10.1093/cercor/bhz203
  
  Iglesias, J. E., Insausti, R., Lerma-Usabiaga, G., Bocchetta, M., Van Leemput, K., Greve, D. N., van der Kouwe, A., Fischl, B., Caballero-Gaudes, C., & Paz-Alonso, P. M. (2018). A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology. NeuroImage, 183, 314–326. https://doi.org/10.1016/j.neuroimage.2018.08.012
  
  Ito, T., Hearne, L. J., & Cole, M. W. (2020). A cortical hierarchy of localized and distributed processes revealed via dissociation of task activations, connectivity changes, and intrinsic timescales. NeuroImage, 221, 117141. https://doi.org/10.1016/j.neuroimage.2020.117141
  
  Kumar, V. J., Beckmann, C. F., Scheffler, K., & Grodd, W. (2022). Relay and higher-order thalamic nuclei show an intertwined functional association with cortical-networks. Communications Biology, 5(1), 1–17. https://doi.org/10.1038/s42003-022-04126-w
  
  Kumar, V. J., van Oort, E., Scheffler, K., Beckmann, C. F., & Grodd, W. (2017). Functional anatomy of the human thalamus at rest. NeuroImage, 147, 678–691. https://doi.org/10.1016/j.neuroimage.2016.12.071
  
  Leh, S. E., Chakravarty, M. M., & Ptito, A. (2008). The Connectivity of the Human Pulvinar: A Diffusion Tensor Imaging Tractography Study. International Journal of Biomedical Imaging, 2008, 1–5. https://doi.org/10.1155/2008/789539
  
  Luppi, A. I., Hansen, J. Y., Adapa, R., Carhart-Harris, R. L., Roseman, L., Timmermann, C., Golkowski, D., Ranft, A., Ilg, R., Jordan, D., Bonhomme, V., Vanhaudenhuyse, A., Demertzi, A., Jaquet, O., Bahri, M. A., Alnagger, N. L. N., Cardone, P., Peattie, A. R. D., Manktelow, A. E., … Stamatakis, E. A. (2023). In vivo mapping of pharmacologically induced functional reorganization onto the human brain’s neurotransmitter landscape. Science Advances, 9(24), eadf8332. https://doi.org/10.1126/sciadv.adf8332
  
  Müller, E. J., Munn, B., Hearne, L. J., Smith, J. B., Fulcher, B., Arnatkevičiūtė, A., Lurie, D. J., Cocchi, L., & Shine, J. M. (2020). Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222, 117224. https://doi.org/10.1016/j.neuroimage.2020.117224
  
  Murphy, K., Bodurka, J., & Bandettini, P. A. (2006). How long to scan? The relationship between fMRI temporal signal to noise and necessary scan duration. NeuroImage, 34(2), 565. https://doi.org/10.1016/j.neuroimage.2006.09.032
  
  Murray, J. D., Bernacchia, A., Freedman, D. J., Romo, R., Wallis, J. D., Cai, X., Padoa-Schioppa, C., Pasternak, T., Seo, H., Lee, D., & Wang, X.-J. (2014). A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience, 17(12), 1661–1663. https://doi.org/10.1038/nn.3862
  
  Oldham, S., & Ball, G. (2023). A phylogenetically-conserved axis of thalamocortical connectivity in the human brain. Nature Communications, 14(1), 6032. https://doi.org/10.1038/s41467-023-41722-8
  
  Rolls, E. T., Huang, C.-C., Lin, C.-P., Feng, J., & Joliot, M. (2020). Automated anatomical labelling atlas 3. NeuroImage, 206, 116189. https://doi.org/10.1016/j.neuroimage.2019.116189
  
  Rossi-Pool, R., Zainos, A., Alvarez, M., Parra, S., Zizumbo, J., & Romo, R. (2021). Invariant timescale hierarchy across the cortical somatosensory network. Proceedings of the National Academy of Sciences, 118(3), e2021843118. https://doi.org/10.1073/pnas.2021843118
  
  Shipp, S. (2003). The functional logic of cortico-pulvinar connections. Philosophical Transactions of the Royal Society B: Biological Sciences, 358(1438), 1605–1624. https://doi.org/10.1098/rstb.2002.1213
  
  Soltani, A., Murray, J. D., Seo, H., & Lee, D. (2021). Timescales of cognition in the brain. Current Opinion in Behavioral Sciences, 41, 30–37. https://doi.org/10.1016/j.cobeha.2021.03.003
  
  Su, J. H., Thomas, F. T., Kasoff, W. S., Tourdias, T., Choi, E. Y., Rutt, B. K., & Saranathan, M. (2019). Thalamus Optimized Multi Atlas Segmentation (THOMAS): Fast, fully automated segmentation of thalamic nuclei from structural MRI. NeuroImage, 194, 272–282. https://doi.org/10.1016/j.neuroimage.2019.03.021
  
  Tamietto, M., Pullens, P., de Gelder, B., Weiskrantz, L., & Goebel, R. (2012). Subcortical Connections to Human Amygdala and Changes following Destruction of the Visual Cortex. Current Biology, 22(15), 1449–1455. https://doi.org/10.1016/j.cub.2012.06.006
  
  Yang, S., Meng, Y., Li, J., Li, B., Fan, Y.-S., Chen, H., & Liao, W. (2020). The thalamic functional gradient and its relationship to structural basis and cognitive relevance. NeuroImage, 218, 116960. https://doi.org/10.1016/j.neuroimage.2020.116960
  
URL

biorxiv.org/content/10.1101/2024.07.11.603063v2
www.biorxiv.org

Unraveling the Power of NAP-CNB's Machine Learning-enhanced Tumor Neoantigen Prediction

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reply to Reviewer #1 (Public Review):
  
  The post-processing step increases the number of putative neoantigens. As shown in Author response image 1, it does so through data augmentation: individual amino acids in a sequence are "mutated" to their most similar amino acid in the BLOSUM62 embedding. If most of these mutated sequences receive a positive prediction (binarized at a score >0.5), the prediction for the sequence is changed.
  
  Author response image 1.
  
  Post-processing pipeline to increase the number of putative neoantigens. A sequence can either be predicted using the forward method, which produces a raw score, or be passed to a majority vote over the ensemble of predictions for similar protein sequences.
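
  The majority-vote step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NAP-CNB implementation: the `NEAREST` lookup is a small hypothetical stand-in for a "most similar amino acid" table (a real table would be derived from BLOSUM62), `score_fn` is a placeholder for whatever model returns a raw score in [0, 1], and the function names are invented for this example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a "most similar amino acid" lookup.
# Assumption: a real table would cover all 20 residues and be derived
# from BLOSUM62; these few pairs are placeholders for illustration.
NEAREST: Dict[str, str] = {
    "I": "V", "V": "I",
    "F": "Y", "Y": "F",
    "K": "R", "R": "K",
    "D": "E", "E": "D",
}


def single_substitution_variants(seq: str,
                                 nearest: Dict[str, str] = NEAREST) -> List[str]:
    """Return one variant per position whose residue has a listed substitute."""
    variants = []
    for i, aa in enumerate(seq):
        sub = nearest.get(aa)
        if sub is not None:
            variants.append(seq[:i] + sub + seq[i + 1:])
    return variants


def majority_vote_label(seq: str,
                        score_fn: Callable[[str], float],
                        threshold: float = 0.5) -> bool:
    """Label a sequence by majority vote over its single-substitution variants.

    Each variant is scored and binarized at `threshold`; the sequence is
    labeled positive when more than half of the variants are positive.
    If no variant can be built, fall back to the raw score of `seq`.
    """
    variants = single_substitution_variants(seq)
    if not variants:
        return score_fn(seq) > threshold
    votes = sum(score_fn(v) > threshold for v in variants)
    return votes > len(variants) / 2
```

  With a toy scorer, a sequence whose raw score is below 0.5 can still be labeled positive when most of its variants score above 0.5, which is how a step of this kind can enlarge the candidate list.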
  
  In this article, we obtain the following candidates after post-processing.
  
  Author response table 1.
  
  Sequence Symbol Gene Prediction FPKM
  
  As mentioned, the prediction column shows a binary label. The full list of 402 sequences did not include any other sequences that met the majority-vote criterion.
  
  As noted by the reviewer, Table 3 of our original paper includes the scores of the direct prediction, which has four sequences in common with the post-processing criteria (*Pnp, *Adar, *Lrrc28 and *Nr1h2). The asterisk (*) indicates the mutated form of the peptide, i.e., the neoantigen.
  
  We selected the top 4 predicted antigens, which were identified both by direct prediction and after post-processing (*Pnp, *Adar, *Lrrc28 and *Nr1h2) (Wert-Carvajal et al. 2021). However, we encountered difficulty in synthesizing *Nr1h2 (mutated Nr1h2), and thus it could not be included in the study.
  
  We also decided to evaluate the immunogenicity of *Wiz, which was identified as a potential TNA only after post-processing. *Wiz exhibited lower levels of immunogenicity than *Pnp, *Adar, and *Lrrc28. However, unlike these, *Wiz is highly expressed in the tumor, and vaccination with *Wiz provided the strongest level of protection. These findings led us to incorporate post-processing into the NAP-CNB platform.
  
  We chose *Herc6 as a mutated antigen predicted not to be a TNA over other candidates because its expression in the tumor was similar to that of *Wiz.
  
  Depending on the experiment we used 4 or 5 animals per group (this is now clarified in the revised version).
  
  The software used for statistical analysis was GraphPad Prism.
  
  Reply to Reviewer #2 (Public Review):
  
  This is true: binding affinity does not always predict immune responses, but in most cases high-affinity peptides are immunogenic. There are of course other parameters that drive the effective priming of tumor-reactive CD8+ T cells through antigen cross-presentation, but the mechanisms of antigen presentation are not yet completely understood. High-affinity peptides are desirable candidates for neoantigen-based vaccines.
  
  Other comments of the reviewers
  
  Reviewer #1 (Recommendations For The Authors):
  
  - Please decipher all abbreviations when they appear for the first time, e.g. NAP-CNB, PBS, CFA, FIA, and so on.
  
  Done in the revised version.
  
  - Please be consistent with the capitalization of gene names (WIZ vs Wiz, TRP2 vs Trp2, and so on), and why there is an asterisk.
  
  Done in the revised version.
  
  - Please be clear about where you use cell lines or mice as a model. It's not clear.
  
  All work is done in mice, or cells isolated from vaccinated mice.
  
  - Why there is an asterisk in front of gene names?
  
  Explained in the revised version: the * indicates peptides that are the mutated version.
  
  - Please add a reference for the following statement in the Introduction: "However, the response rates of these therapies remain low and relapses are common."
  
  Done in the revised version.
  
  - Also please add a reference for the use of TRP2 as a positive control.
  
  Done in the revised version.
  
  Reviewer #2 (Recommendations For The Authors):
  
  - It may be helpful to validate a larger pool of antigens. This is not necessary however and could be done in a follow-up study.
  
  We are doing it for other studies with excellent results.
  
  - The negative PBS control should be included in Figure 1.
  
  Done in the modified figure 1C in the revised version.
  
  - Stats should be clearly indicated in Figure 2.
  
  Done in the revised version.
  
  - Some nuances should be discussed. Is a threshold of neoantigen expression required or is there a correlation with tumor control? On the flip side, these neoantigens that are not likely to elicit immune responses but are highly expressed are also not likely to mediate tumor control.
  
  These points have been discussed. Based on our data, strategies for designing antitumor therapies should prioritize antigens that are highly expressed in tumors, even if they are not the most immunogenic. However, it is worth noting that even low-expressed antigens can still elicit an antitumor immune response. Whenever possible, strategies should be developed to target multiple antigens simultaneously, aiming to minimize tumor escape.
  
  - This study focuses on CD8 T cell responses but CD4s are also important in tumor control. This could be mentioned in the discussion.
  
  This is true, but this article focuses on validating a platform that predicts the antigenicity of antigens presented in the context of MHC-I.
  
  - Ideally, we would want to see that these responses are not elicited with adjuvant alone as an additional control.
  
  The non-vaccinated control animals received PBS and adjuvant. This clarification has now been included in the text.
  
URL

biorxiv.org/content/10.1101/2023.11.22.568042v2
www.biorxiv.org

Three-dimensional single-cell transcriptome imaging of thick tissues

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200um in thickness. MERFISH, as well as other spatial transcriptomics technologies, have been mainly used for thin (e.g, 10um) tissue slices, which limits the dimension of spatial transcriptomics technique. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance to enable 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies to benefit diverse biomedical fields.
  
  Strengths:
  
  The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed, and the results are solid and compelling.
  
  Response: We thank the reviewer for the positive comments on our manuscript.
  
  Weaknesses:
  
  The biological application examples were limited to cell type/subtype classification in two brain regions. Additional examples of how the data could be used to address important biological questions will enhance the impact of the study.
  
  We appreciate the reviewer's suggestion that demonstrating the broader applications of our thick-tissue 3D MERFISH method to address important biological questions would enhance the impact of our study. In line with the reviewer's feedback, we have included discussions on how this method could be applied to address various biological questions in the summary (last) paragraph of our manuscript. These discussions highlight the versatility and utility of our approach in studying diverse biological processes beyond cell type classification.
  
  However, the goal of this work is to develop a method and establish its validity. While we are interested in applying it to addressing important biological questions in the future, we consider these applications beyond the scope of this work.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In their preprint, Fang et al present data on extending a spatial transcriptomics method, MERFISH, to 3D using a spinning disc confocal. MERFISH is a well-established method, first published by Zhuang's lab in 2015 with multiple follow-up papers. In the last few years, MERFISH has been used by multiple groups working on spatial transcriptomics, including approximately 12 million cell maps measured in the mouse brain atlas project. Variants of MERFISH were used to map epigenetic information complementary to gene expression and RNA abundance. However, MERFISH was always limited to thin ~10um sections to this date.
  
  The key contribution of this work by Fang et al. was to perform the optimization required to get MERFISH working in thick (100-200um) tissue sections.
  
  Major strengths and weaknesses:
  
  Overall the paper presents a technical milestone, the ability to perform highly multiplexed RNA measurements in 3D using MERFISH protocol. This is not the first spatial transcriptomics done in thick sections. Wang et al. 2018 - StarMAP used thick sections (150 um), and recently, Wang 2021 (EASI-FISH, not cited) performed serial HCR FISH on 300um sections. Data so far suggest that MERFISH has better sensitivity than in situ sequencing approaches (StarMAP) and has built-in multiplexing that EASI-FISH lacks. Therefore, while there is an innovation in the current work, i.e., it is a technically challenging task, the novelty, and overall contribution are modest compared to recently published work.
  
  The authors could improve the writing and the manuscript text that places their work in the right context of other spatial transcriptomics work. Out of the 25 citations, 12 are for previous MERFISH work by Zhuang's lab, and only one manuscript used a spatial transcriptomics approach that is not MERFISH. Furthermore, even this paper (Wang et al, 2018) is only discussed in the context of neuroanatomy findings. The fact that Wang et al. were the first to measure thick sections is not mentioned in the manuscript. The work by Wang et al. 2021 (EASI-FISH) is not cited at all, as well as the many other multiplexed FISH papers published in recent years that are very relevant. For example, a key difference between seqFISH+ and MERFISH was the fact that only seqFISH+ used a confocal microscope, and MERFISH has always been relying on epi. As this is the first MERFISH publication to use confocal, I expect citations to previous work in seqFISH and better discussions about differences.
  
  We thank the reviewer for recognizing our work as a technical milestone. Since the aim of this work is to build upon the strengths of MERFISH and address some of its limitations, we primarily cited previous MERFISH papers to clarify the specific improvements made in this work. Given the rapid growth of the spatial omics field, it has become impractical to comprehensively cite all method development papers. Instead, we cited a 2021 review article in the first sentence of the originally submitted manuscript and limited all discussions afterwards to MERFISH. In light of this reviewer’s suggestion to more broadly cite spatial transcriptomics work, we added two additional review articles on spatial omics. Spatial omics methods primarily fall into two categories: 1) imaging-based methods and 2) next-generation-sequencing-based methods. The 2021 review article [Zhuang, Nat Methods 18, 18–22 (2021)] included in the originally submitted manuscript is focused on imaging-based methods. The additional 2021 review article [Larsson et al., Nat Methods 18, 15–18 (2021)] that we have now included in the revised manuscript is focused on next-generation-sequencing-based methods. We also added a more recent review article published in 2023 [Bressan et al., Science 381:eabq4964 (2023)], which covers both categories of methods and includes more recent technology developments. All three review articles are now cited in parallel in the first introductory paragraph of the manuscript.
  
  Although we presented our work as an advance in MERFISH specifically, we do consider the reviewer’s suggestion of citing the 2018 STARmap paper [Wang et al., Science 361, eaat5961 (2018)] in the introduction of our manuscript reasonable. This STARmap paper was already cited in the results part of our originally submitted manuscript, and we have now described this work in the introduction of our revised manuscript (third paragraph), as this paper was the first to demonstrate 3D in situ sequencing in thick tissues. In addition, we thank the reviewer for bringing to our attention the EASI-FISH paper [Wang et al., Cell 184, 6361–6377 (2021)], which reported a method for thick-tissue FISH imaging and demonstrated imaging of 24 genes using multiple rounds of multi-color FISH imaging. We also recently became aware of a paper reporting 3D imaging of thick samples using PHYTOMap [Nobori et al., Nature Plants 9, 1026–1033 (2023)]. This paper, published a few days after we submitted our manuscript to eLife, demonstrated imaging of 28 genes in thick plant samples using multiple rounds of multi-color FISH and the probe targeting and amplification methods previously developed for in situ sequencing. We have included these two papers in the introduction section of our revised manuscript (third paragraph). We also expanded the discussion paragraph (last paragraph) of the manuscript to discuss these thick-tissue imaging methods in more detail, and in the same paragraph we included discussions of two recent bioRxiv preprints on thick-tissue transcriptomic imaging [Gandin et al., bioRxiv, doi:10.1101/2024.05.17.594641 (2024); Sui et al., bioRxiv, doi:10.1101/2024.08.05.606553 (2024)].
  
  However, we do not consider our use of confocal imaging in this work an advance in MERFISH because confocal microscopy, like epi-fluorescence imaging, is a commonly used approach that could be applied to MERFISH of thin tissues directly without any alteration of the protocol. Confocal imaging had been broadly used for both DNA and RNA FISH before any genome-scale imaging was reported. Confocal and epi-imaging geometries have their distinct advantages, and which of these imaging geometries to use is the researcher’s choice depending on instrument availability and experimental needs. Thus, we do not find it necessary to cite specific papers just for using confocal imaging in spatial transcriptomic profiling. Our real advance related to confocal imaging is the use of machine learning to increase the imaging speed. Without this improvement, 3D imaging of thick tissue using confocal microscopy would take a long time and would likely degrade image quality due to photobleaching of out-of-focus fluorophores before they are imaged. We thus cited several papers that used deep learning to improve imaging quality and/or speed [Laine et al., International Journal of Biochemistry & Cell Biology 140:106077 (2021); Ouyang et al., Nat Biotechnol 36:460–468 (2018); Weigert et al., Nat Methods 15:1090–1097 (2018)] in our original submission. Our unique contribution is the combination of machine learning with confocal imaging for 3D multiplexed FISH imaging of thick tissue samples, which had not been demonstrated previously.
  
  To get MERFISH working in 3D, the authors solved a few technical problems. To address reduced signal-to-noise due to thick samples, Fang et al. used non-linear filtering (i.e., deep learning) to enhance the spots before detection. To improve registrations, the authors identified an issue specific to their Z-Piezo that could be improved and replaced with a better model. Finally, the author used water immersion objectives to mitigate optical aberrations. All these optimization steps are reasonable and make sense. In some cases, I can see the general appeal (another demonstration of deep learning to reduce exposure time). Still, in other cases, the issue is not necessarily general enough (i.e., a different model of Piezo Z stage) to be of interest to a broad readership. There were a few additional optimization steps, i.e., testing four concentrations of readout and encoder probes. So while the preprint describes a technical milestone, achieving this milestone was done with overall modest innovation.
  
  We appreciate the reviewer's recognition of the technical challenges we have overcome in developing this 3D thick-tissue MERFISH method. To achieve high-quality thick-tissue MERFISH imaging, we had to overcome multiple different challenges. We agree with the reviewer that the solutions to some of the above challenges are intellectually more impressive than the remaining ones, which required relatively more mundane efforts. However, all of these are needed to achieve the overall goal, a goal that is considered a milestone by the reviewer. We believe that the impact of a method should be evaluated based on its capabilities, potential applications, and its adaptability for broader adoption. In this regard, we anticipate that our reported method will be a valuable and impactful contribution to the field of spatial biology.
  
  Data and code sharing - the only link in the preprint related to data sharing sends readers to a deleted Dropbox folder. Similarly, the GitHub link is a 404 error. Both are unacceptable. The author should do a better job sharing their raw and processed data. Furthermore, the software shared should not be just the MERlin package used to analyze but the specific code used in that package.
  
  We shared the data through Dropbox as a temporary data-sharing approach for the review process, because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).
  
  The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin v2.2.7 package itself, we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).
  
  Recommendations For The Authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) It will be good to expand the application section to demonstrate the utility of 3D MERFISH to address diverse types of biological questions for the two brain regions examined. At present, it only examined the localization of various cell clusters in the tissues. Can it be used to examine both short and long-range interactions, for example?
  
  We appreciate the reviewer's feedback and agree that demonstrating the broader applications of our 3D thick-tissue MERFISH imaging method in addressing diverse biological questions would enhance the impact of our study.
  
  In line with the reviewer’s comments, one of the analyses we performed in the manuscript was examining short-range interactions based on soma contact between adjacent neurons in the two brain regions studied (see third-to-last and second-to-last paragraphs of the Main text). This analysis provided insights into the spatial organization of inhibitory neurons and potential interactions between the same type of interneurons in these brain regions.
  
  Although long-range interactions, for example synaptic interactions between neurons, would be of great interest, our current 3D MERFISH measurements do not allow such interactions to be determined. Future research to enable measurements of synaptic interactions between molecularly defined neuronal subtypes would be interesting, but we consider this to be out of the scope of the current study.
  
  (2) For the nearest neighbor distance analysis in Figure 3, the method seems to be missing. Please add details about this analysis to allow better understanding. It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes. Please explain.
  
  We apologize for omitting the nearest-neighbor distance analysis from the Materials and Methods section. We have added a detailed description of this analysis to the Materials and Methods section of the revised manuscript (last subsection of Materials and Methods).
  
  Regarding the comment “It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes”, this is not necessarily counterintuitive given how we defined nearest-neighbor distances between neurons of the same subtype and nearest-neighbor distances between neurons of different subtypes. Here is how we performed this analysis for interneurons. First, we determined the nearest-neighbor neuron for each interneuron and classified the interneuron as either having another interneuron of the same type as its nearest neighbor or having a different type of interneuron or an excitatory neuron as its nearest neighbor. We then determined the distributions of the distances for these two types of nearest-neighbor pairs and compared these distributions. When a neuronal subtype forms a tight spatial cluster, such as the type-A cluster shown in the schematic below, the distances between nearest-neighbor A-A pairs are indeed small. However, the distance between a type-A neuron and a neuron of a different type (for example, type B) is not necessarily larger than that between two type-A neurons, if the nearest neighbor of this type-A neuron is a type-B neuron. These nearest-neighbor A-B pairs are likely formed between type-A neurons at the edge of the cluster and type-B neurons near the edge of the type-A cluster. If the distance of an A-B pair is not comparable to those of nearest-neighbor A-A pairs, the pair is unlikely to be a nearest-neighbor pair by our definition as described above.
  
  Author response image 1.
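
  The nearest-neighbor classification described in this response can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code deposited with the paper: the function name, the `(position, subtype)` cell representation, and the brute-force search are all hypothetical choices made for clarity.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical cell representation: ((x, y, z) position, subtype label).
Cell = Tuple[Tuple[float, float, float], str]


def nearest_neighbor_distances(
    cells: Sequence[Cell],
    query_types: Optional[Set[str]] = None,
) -> Dict[str, List[float]]:
    """Split nearest-neighbor distances by whether the neighbor shares the subtype.

    For each query cell, find its single nearest neighbor among all other
    cells, then file that distance under "same" or "different" depending
    on whether the neighbor has the same subtype label.
    """
    out: Dict[str, List[float]] = {"same": [], "different": []}
    for i, (pos_i, type_i) in enumerate(cells):
        if query_types is not None and type_i not in query_types:
            continue  # only cells of the requested subtypes act as queries
        best_d, best_type = math.inf, None
        for j, (pos_j, type_j) in enumerate(cells):
            if i == j:
                continue
            d = math.dist(pos_i, pos_j)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_d, best_type = d, type_j
        if best_type is not None:
            out["same" if best_type == type_i else "different"].append(best_d)
    return out
```

  Because only the single nearest neighbor of each cell contributes a distance, a type-A cell at the edge of a cluster enters the "different" group only when a cell of another type is closer than any type-A cell. Distances in the two groups therefore tend to be comparable even when the subtype is tightly clustered. For large datasets a spatial index would replace the quadratic loop.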
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The scholarship in this work is lacking. All of the non-MERFISH parts of the field of spatial transcriptomics are ignored. The work needs to be discussed in the context of the literature.
  
  We thank the reviewer for this suggestion and have included discussions of other spatial omics work, and other thick-tissue multiplexed imaging work in the Introduction and discussion section of the manuscript. Please see details in our response to the Public Review portion of this reviewer’s comments.
  
  (2) The data/code sharing links are broken and need to be fixed.
  
  Response: We shared the data through Dropbox as a temporary data-sharing approach for the review process, because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).
  
  The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin package itself (the MERFISH decoding package), we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922) to ensure that readers can fully reproduce the results presented in our manuscript.
  
URL

biorxiv.org/content/10.1101/2023.07.21.550124v2
www.biorxiv.org

Single-cell dissection of prognostic architecture and immunotherapy response in Helicobacter pylori infection associated gastric cancer

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  In this study, the authors conducted a single-cell RNA sequencing analysis of the cellular and transcriptional landscape of the gastric cancer tumor microenvironment, stratifying patients according to their H. pylori status into currently infected, previously infected, and non-infected patients. The authors comprehensively dissect various cellular compartments, including epithelial, stromal, and immune cells, and describe specific cell types and signatures to be associated with H. pylori infection, including i) inflammatory and EMT signatures in malignant epithelial cells, ii) inflammatory CAFs in stromal cells, iii) Angio-TAMs, TREM2+ TAMs, exhausted and suppressive T cells in immune cells. Looking at ligand-receptor interactions as well as correlations between cell type abundances, they suggest that iCAFs interact with immunosuppressive T cells via a NECTIN2-TIGIT axis, as well as Angio-TAMs through a VEGFA/B-VEGFR1 axis and thereby promote immune escape, tumor angiogenesis and resistance to immunotherapy.
  
  We sincerely appreciate the Reviewer's interest in our study and their valuable insights on how we can further enhance our work.
  
  The authors conduct a comprehensive and thorough analysis of the complex tumor microenvironment of gastric cancer, both single-cell RNA sequencing data as well as the analysis seem of high quality and according to best practices. The authors validate their findings using external datasets, and include some prognostic value of the identified signatures and cell types. However, most of their conclusions throughout the manuscript are based on the comparison between HPGC and healthy controls, which is not a valid comparison to determine which of the phenotypes are specifically driven by HP infection, e.g. Tregs are high in all GC types, independent of HP status. The same holds true for TREM+ TAMs and iCAFs, which are higher in GC in general. This makes it very difficult to assess the actual HP-driven signatures and cell types. Also, when looking at the correlation/transcriptional differences across different cell types and cellular interactions, the authors do not explicitly define if they are looking at the whole dataset (including healthy controls?) or only at certain patients (HPGC?), which again makes it difficult to interpret the results.
  
  We sincerely appreciate the reviewer's thorough assessment and valuable feedback on our study. During our analysis, although we did not specifically identify cell types unique to non-HpGC, ex-HpGC, or HpGC, we found that TREM+ TAMs and iCAFs were enriched in H. pylori-infected GC, with an even higher proportion in HpGC. This suggests that the enrichment of TREM+ TAMs and iCAFs is correlated with H. pylori infection status.
  
  However, gastric cancer is driven by multiple complex factors, including environmental influences, genetic mutations, and pathogenic infections. As a single factor, H. pylori infection does not significantly alter T cell proportions at the cellular level; rather, it affects the expression of immune checkpoint molecules (Author response image 1A-B). Importantly, we evaluated key molecules mediating the interaction of iCAFs with Angio-TAMs and Tregs. The results show that the expression of NECTIN, PVR, VEGF, IL11 and IL24 is higher in ex-HpGC than in non-HpGC, with the highest expression observed in HpGC, which further validates the H. pylori-driven signatures (Author response image 1C).
  
  The correlation analysis among different cell types was conducted within different groups based on their H. pylori infection status (Author response image 1C). However, transcriptional differences across different cell types and cellular interactions were analyzed using the entire dataset, including healthy controls. This approach ensured an unbiased identification of molecular and cellular-level differences among cell subtypes before determining whether these subtypes originated from HpGC or ex-HpGC.
  
  Author response image 1.
  
  A. Dot plot illustrating the enrichment of the TIGIT-PVR/NECTIN axis in the interaction between malignant epithelial cells and immunosuppressive T cells. B. Dot plot showing the expression of NECTIN2 and PVR in non-HpGC, ex-HpGC, and HpGC cells. C. Bubble plot showing the expression of NECTIN, PVR, VEGF, IL11 and IL24 in CAFs within non-HpGC, ex-HpGC, and HpGC samples. D. Correlation of cell type proportions (percentage) between Tregs, Angio-TAMs, TREM2+ TAMs and iCAFs.
  
  The authors aim to confirm some of their findings via immunofluorescence, which in principle is a great approach to validate their results. However, to be able to conclude that e.g. suppressive TIGIT+ T cells are located close to NECTIN2+ malignant epithelium and that this might facilitate immune escape in HPGC (Figure 4K), the authors should include stains that show that this is not the case in the other groups (nonHPGC, exHPGC and HC). The same holds true for Figure 5G.
  
  Thank you for your valuable feedback. We have added immunostaining of the ligand TIGIT and the receptor NECTIN2 on suppressive T cells and on the malignant epithelium, as well as of signature markers of Angio-TAMs and TREM2+ TAMs, including TREM2, SPP1, VEGF and CD68, in non-HpGC, ex-HpGC and HC samples (Figure S3, Figure S5). We found that TIGIT and NECTIN2 were expressed exclusively in HpGC and ex-HpGC samples, and not in non-HpGC or HC samples, with markedly higher expression in HpGC. Furthermore, Angio-TAMs and TREM2+ TAMs were exclusively enriched in HpGC and ex-HpGC samples and were barely detected in non-HpGC and HC samples. These results also support our finding that H. pylori infection status determines the enrichment of Angio-TAMs and TREM2+ TAMs, as well as the TIGIT-NECTIN-mediated interaction between suppressive T cells and malignant epithelium.
  
  In summary, this study provides a valuable resource on the cellular and transcriptional heterogeneity of the tumor microenvironment in gastric cancers, distinguishing between positive, negative, and previously positive HP-infected gastric cancer patients. Given that HP is the main risk factor for gastric cancer development, the study provides valuable insights into HP-driven transcriptional signatures and how these might contribute to this increased risk, however, the study would highly benefit from a clearer and more stringent comparison between HPGC and nonHPGC.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This study aims to describe the single-cell transcriptomes of H pylori-associated (Hp) gastric cancers and tumor microenvironment (TME), as a starting point to understand TME diversity stratified by Hp status. RNA-seq was performed for gastric cancers with current Hp+ (from N=9 people), ex-Hp+ (N=6), non-Hp (N=6), and healthy gastric tissue (N=6).
  
  The study expands on previous single-cell transcriptomic studies of gastric cancers and was motivated by previous observations about the effect of H pylori status on therapeutic outcomes. The study includes a brief review of previous work and provides valuable context for this study.
  
  We thank the Reviewer for recognizing the interest of the topic, and for sharing their views on how we might further strengthen our work.
  
  Strengths:
  
  The observations are supported by solid RNAseq study design and analysis. The authors describe correlations between Hp status and inferred molecular characteristics including cell lineages, enrichment for cell subclusters identified as tumour-infiltrating lymphocyte cell types, tumour-infiltrating myeloid cells, and cancer-associated fibroblasts.
  
  The observed correlations between Hp status and enrichment of cell subclusters were broadly corroborated using comparisons to deconvolved bulk RNAseq from publicly available gastric cancer data, providing a convincing starting point for understanding the diversity of tumour microenvironment by Hp-status.
  
  Weaknesses:
  
  The authors acknowledge several limitations of this study. The correlations with HP-status are based on a small number of participants per Hp category (N=9 with current Hp+; N=6 for ex-HP+ and non-HP), and would benefit from further validation to establish reproducibility in other cohorts.
  
  Thank you for your valuable suggestions. We acknowledge that this may limit the generalizability and statistical power of our findings. However, despite the limited sample size, our analysis revealed statistically significant trends (e.g., p-value < 0.05) or consistent patterns in the data. The sample size in this study was constrained by the availability of participants meeting the inclusion criteria, particularly in the ex-HP+ and non-HP groups. We view these findings as hypothesis-generating and aim to validate them in future studies with larger cohorts.
  
  The ligand-receptor cross-talk analysis and the suggestion that suppressive T cells could interact with the malignant epithelium through TIGIT-NECTIN2/PVR pairs, are preliminary findings based on transcriptomic analysis and immunostaining and will require further validation.
  
  We appreciate the reviewer's comment and agree that the ligand-receptor cross-talk analysis and the proposed interaction between suppressive T cells and malignant epithelium via TIGIT-NECTIN2/PVR pairs are preliminary findings. These insights were derived from transcriptomic data and immunostaining, which provide valuable but indirect evidence of potential interactions. Our analysis revealed co-expression patterns of TIGIT in suppressive T cells and NECTIN2/PVR in malignant epithelial cells, and immunostaining demonstrated spatial proximity between these cell types. Previous studies have established the functional significance of TIGIT-NECTIN2/PVR interactions in immune regulation (PMID: 19815499, 27978489), supporting the biological plausibility of our observations. While our current data provide a foundation for this hypothesis, future studies involving functional assays or in vivo models would be valuable to confirm the biological relevance of these interactions. We view these findings as exploratory and aimed at guiding future research into the role of suppressive T cells in the tumor microenvironment.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) Software versions are missing from the scRNAseq section of the Methods.
  
  Thank you for your feedback. The bioinformatic analyses were performed with Seurat version 4.1; we have added the software version to the revised manuscript.
  
  (2) There is a data link to a deposit in Zenodo, subject to data access request to the authors. Do the authors intend to publish the scRNAseq data?
  
  Thank you for your inquiry regarding the data availability. We fully intend to make the scRNA-seq data publicly accessible. Currently, the dataset has been deposited in Zenodo and is available upon request to ensure compliance with institutional and ethical guidelines. We are in the process of finalizing the necessary approvals for unrestricted public release. Once completed, we will update the Raw data with an open-access link to facilitate direct download.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.31.596846v2
www.biorxiv.org www.biorxiv.org

New submission 29/11/2023, 09:34:35

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Firstly, we must take a moment to express our sincere gratitude to the editorial board for allowing this work to be reviewed, and to the peer reviewers for taking the time and effort to review our manuscript. The reviews are thoughtful and reflect the careful work of scientists who undoubtedly have many things on their schedule. We cannot express our gratitude enough. This is not a minor sentiment. We appreciate the engagement.
  
  Allow us to briefly highlight some of the changes made to the revised manuscript, most in response to suggestions made by the reviewers:
  
  1) A supplementary figure that includes the calculation of drug applicability and variant vulnerability for a different data set–16 alleles of dihydrofolate reductase, and two antifolate compounds used to treat malaria–pyrimethamine and cycloguanil.
  
  2) New supplementary figures that add depth to the result in Figure 1 (the fitness graphs): we demonstrate how the rank order of alleles changes across drug environments and offer a statistical comparison of the equivalence of these fitness landscapes.
  
  3) A new subsection that explains our specific method used to measure epistasis.
  
  4) Improved main text with clarifications, fixed errors, and other addendums.
  
  5) Improved referencing and citations, in the spirit of better scholarship (now with over 70 references).
  
  Next, we’ll offer some general comments that we believe apply to several of the reviews, and to the eLife assessment. We have provided the bulk of the responses in some general comments, and in response to the public reviews. We have also included the suggestions and made brief comments to some of the individual recommendations.
  
  On the completeness of our analysis
  
  In our response, we’ll address the completeness issue first, as iterations of it appear in several of the reviews, and it seems to be one of the most substantive philosophical critiques of the work (there are virtually no technical corrections, outside of formatting and grammar fixes, which we are grateful to the reviewers for identifying).
  
  To begin our response, we will relay that we have now included an analysis of a data set corresponding to mutants of a protein, dihydrofolate reductase (DHFR), from Plasmodium falciparum (a main cause of malaria), across two antifolate drugs (pyrimethamine and cycloguanil). We have also decided to include this new analysis in the supplementary material (see Figure S4).
  
  Author response image 1.
  
  Drug applicability and variant vulnerability for 16 alleles of dihydrofolate reductase.
  
  Here we compute the variant vulnerability and drug applicability metrics for two drugs, pyrimethamine (PYR) and cycloguanil (CYC), both antifolate drugs used to treat malaria. This is a completely different system than the one that is the focus of the submitted paper, for a different biomedical problem (antimalarial resistance), using different drugs and targets. Further, the new data provide information on both different kinds of drugs and different drug concentrations (as suggested by Reviewer #1; we’ve also added a note about this in the new supplementary material). Note that these data have already been the subject of detailed analyses of epistatic effects, and so we did not include those here, but we do offer these references:
  
  ● Ogbunugafor CB. The mutation effect reaction norm (mu-rn) highlights environmentally dependent mutation effects and epistatic interactions. Evolution. 2022 Feb 1;76(s1):37-48.
  
  ● Diaz-Colunga J, Sanchez A, Ogbunugafor CB. Environmental modulation of global epistasis is governed by effective genetic interactions. bioRxiv. 2022:202211.
  
  Computing our proposed metrics across different drugs is relatively simple, and we could have populated our paper with suites of similar analyses across data sets of various kinds. Such a paper would, in our view, be spread too thin–the evolution of antifolate resistance and/or antimalarial resistance are enormous problems, with large literatures that warrant focused studies. More generally, as the reviewers doubtlessly understand, simply analyzing more data sets does not make a study stronger, especially one like ours, that is using empirical data to both make a theoretical point about alleles and drugs and offer a metric that others can apply to their own data sets.
  
  Our approach focused on a data set that allowed us to discuss the biology of a system: a far stronger paper, a far stronger proof-of-concept for a new metric. We will revisit this discussion about the structure of our study. But before doing so, we will elaborate on why the “more is better” tone of the reviews is misguided.
  
  We also note that the study from which the data originate (Mira et al. 2015) is focused on a single data set of a single drug-target system. We should also point out that Mira et al. 2015 made a general point about drug concentrations influencing the topography of fitness landscapes, not unlike our general point about metrics used to understand features of alleles and different drugs in antimicrobial systems.
  
  This isn’t meant to serve as a feeble appeal to authority – just because something happened in one setting doesn’t make it right for another. But other than a nebulous appeal to the fact that things have changed in the 8 years since that study was published, it is difficult to argue why one study system was permissible for other work but is somehow “incomplete” in ours. Double standards can be appropriate when they are justified, but in this case, it hasn’t been made clear, and there is no technical basis for it.
  
  Our study does what countless other successful ones do: utilizes a biological system to make a general point about some phenomena in the natural world. In our case, we were focused on the need for more evolution-inspired iterations of widely used concepts like druggability. For example, a recent study of epistasis focused on a single set of alleles, across several drugs, not unlike our study:
  
  ● Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, Hartl DL. Relevance of higher-order epistasis in drug resistance. Molecular biology and evolution. 2021 Jan;38(1):142-51.
  
  Next, we assert that there is a difference between an eagerness to see a new metric applied to many different data sets (a desire we share, and plan on pursuing in the future), and the notion that an analysis is “incomplete” without it. The latter is a more serious charge and suggests that the researcher-authors neglected to properly construct an argument because of gaps in the data. This charge does not apply to our manuscript, at all. And none of the reviewers effectively argued otherwise.
  
  Our study contains 7 different combinatorially-complete datasets, each composed of 16 alleles (not including the new analysis of antifolates that now appears in the revision). One can call these datasets “small” or “low-dimensional,” if they choose (we chose to put this front-and-center, in the title). They are, however, both complete and as large as or larger than many datasets in similar studies of fitness landscapes:
  
  ● Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular biology and evolution. 2017 May 1;34(5):1040-54.
  
  ● Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, Hartl DL. Relevance of higher-order epistasis in drug resistance. Molecular biology and evolution. 2021 Jan;38(1):142-51.
  
  ● Rodrigues JV, Bershtein S, Li A, Lozovsky ER, Hartl DL, Shakhnovich EI. Biophysical principles predict fitness landscapes of drug resistance. Proceedings of the National Academy of Sciences. 2016 Mar 15;113(11):E1470-8.
  
  ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.
  
  ● Lindsey HA, Gallie J, Taylor S, Kerr B. Evolutionary rescue from extinction is contingent on a lower rate of environmental change. Nature. 2013 Feb 28;494(7438):463-7.
  
  These are only five of very many such studies, some of them very well-regarded.
  
  Having now gone on about the point about the data being “incomplete,” we’ll next move to the more tangible comment-criticism about the low-dimensionality of the data set, or the fact that we examined a single drug-drug target system (β lactamases, and β-lactam drugs).
  
  The criticism, as we understand it, is that the authors could have analyzed more data.
  
  This is a common complaint, that “more is better” in biology. While we appreciate the feedback from the reviewers, we notice that no one specified what constitutes the right amount of data. Some pointed to other single data sets, but would analyzing two different sets qualify as enough? Perhaps to person A, but not to persons B - Z. This is a matter of opinion and is not a rigorous comment on the quality of the science (or completeness of the analysis).
  
  ● Should we analyze five more drugs of the same target (beta lactamases)? And what bacterial orthologs?
  
  ● Should we analyze 5 antifolates for 3 different orthologs of dihydrofolate reductase?
  
  ● And in which species or organism type? Bacteria? Parasitic infections?
  
  ● And why only infectious disease? Aren’t these concepts also relevant to cancer? (Yes, they are.)
  
  ● And what about the number of variants in the aforementioned target? Should one aim for small combinatorially complete sets? Or vaster swaths of sequence space, such as the ones generated by deep mutational scanning and other methods?
  
  We offer these options in part because, for the most part, we were not given an objective suggestion for an appropriate level of detail. This is because there is no answer to the question of what size of dataset would be most appropriate. Unfortunately, without a technical reason why a data set of unspecified size [X] or [Y] is best, we are left with a standard “do more work” peer review response, one that the authors are not inclined to engage seriously, because there is no scientific rationale for it.
  
  The most charitable explanation for why more datasets would be better is tied to the abstract notion that seeing a metric measured in different data sets somehow makes it more believable. This, as the reviewers undoubtedly understand, isn’t necessarily true (in fact, many poor studies mask a lack of clarity with lots of data).
  
  To double down on this take, we’ll even argue the opposite: that our focus on a single drug system is a strength of the study.
  
  The focus on a single-drug class allows us to practice the lost art of discussing the peculiar biology of the system that we are examining. Even more, the low dimensionality allows us to discuss–in relative detail–individual mutations and suites of mutations. We do so several times in the manuscript, and even connect our findings to literature that has examined the biophysical consequences of mutations in these very enzymes.
  
  (For example: Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular biology and evolution. 2017 May 1;34(5):1040-54.)
  
  Such detail is only legible in a full-length manuscript because we were able to interrogate a system in good detail. That is, the low-dimensionality (of a complete data set) is a strength, rather than a weakness. This was actually part of the design choice for the study: to offer a new metric with broad application but developed using a system where the particulars could be interrogated and discussed.
  
  Surely the findings that we recover are engineered for broader application. But to suggest that we need to apply them broadly in order to demonstrate their broad impact is somewhat antithetical to both model systems research and to systems biology, both of which have been successful in extracting general principles from singular (often simple) systems and models.
  
  An alternative approach, where the metric was wielded across an unspecified number of datasets, would lead to a manuscript that is unfocused, reading like many modern machine learning papers, where the analysis or discussion has little to do with actual biology. We very specifically avoided this sort of study.
  
  To close our comments regarding data: Firstly, we have considered the comments and analyzed a different data set, corresponding to a different drug-target system (antifolate drugs, and DHFR). Moreover, we don’t think more data has anything to do with a better answer or support for our conclusions or any central arguments. Our arguments were developed from the data set that we used but achieve what responsible systems biology does: introduces a framework that one can apply more broadly. And we develop it using a complete, and well-vetted dataset. If the reviewers have a philosophical difference of opinion about this, we respect it, but it has nothing to do with our study being “complete” or not. And it doesn’t speak to the validity of our results.
  
  Related: On the dependence of our metrics on drug-target system
  
  Several comments were made that suggest the relevance of the metric may depend on the drug being used. We disagree with this, and in fact, have argued the opposite: the metrics are specifically useful because they are not encumbered with unnecessary variables. They are the product of rather simple arithmetic that is completely agnostic to biological particulars.
  
  We explain, in the section entitled “Metric Calculations”:
  
  “To estimate the two metrics we are interested in, we must first quantify the susceptibility of an allelic variant to a drug. We define susceptibility as $1 - w$, where w is the mean growth of the allelic variant under drug conditions relative to the mean growth of the wild-type/TEM-1 control. If a variant is not significantly affected by a drug (i.e., growth under drug is not statistically lower than growth of wild-type/TEM-1 control, by t-test P-value < 0.01), its susceptibility is zero. Values in these metrics are summaries of susceptibility: the variant vulnerability of an allelic variant is its average susceptibility across drugs in a panel, and the drug applicability of an antibiotic is the average susceptibility of all variants to it.”
  
  That is, these can be animated to compute the variant vulnerability and drug applicability for data sets of various kinds. To demonstrate this (and we thank the reviewers for suggesting it), we have analyzed the antifolate-DHFR data set as outlined above.
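  As a minimal sketch of the arithmetic quoted above (not the authors' actual code), the two metrics can be computed from a table of relative growth values. The function names, the binary-string allele labels, the drug labels and all numbers below are hypothetical, and the significance test is assumed to have been run upstream and supplied as a table of booleans.

```python
from statistics import mean


def susceptibility_table(rel_growth, significant):
    """Susceptibility is 1 - w where the drug effect is significant, else 0.

    rel_growth[variant][drug]  : w, mean growth under drug relative to the
                                 wild-type/TEM-1 control.
    significant[variant][drug] : True if growth under drug was judged
                                 significantly lower than the control
                                 (assumed to be computed elsewhere).
    """
    return {
        variant: {
            drug: (1.0 - w) if significant[variant][drug] else 0.0
            for drug, w in by_drug.items()
        }
        for variant, by_drug in rel_growth.items()
    }


def variant_vulnerability(table):
    """Average susceptibility of each variant across all drugs in the panel."""
    return {variant: mean(by_drug.values()) for variant, by_drug in table.items()}


def drug_applicability(table):
    """Average susceptibility of all variants to each drug."""
    drugs = next(iter(table.values())).keys()
    return {drug: mean(table[v][drug] for v in table) for drug in drugs}


# Made-up illustration: 3 alleles x 2 drugs (not data from the study).
rel_growth = {
    "0000": {"drugA": 0.50, "drugB": 1.00},
    "0001": {"drugA": 0.25, "drugB": 0.75},
    "0011": {"drugA": 1.00, "drugB": 0.50},
}
significant = {
    "0000": {"drugA": True, "drugB": False},
    "0001": {"drugA": True, "drugB": True},
    "0011": {"drugA": False, "drugB": True},
}

table = susceptibility_table(rel_growth, significant)
vv = variant_vulnerability(table)
da = drug_applicability(table)
print(vv)  # {'0000': 0.25, '0001': 0.5, '0011': 0.25}
print(da)
```

  Because the calculation only needs a variants-by-drugs table, the same functions apply unchanged to any drug-target system, or to each drug concentration treated as a separate table.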
  
  Finally, we will make the following light, but somewhat cynical point (that relates to the “more data” point generally): the wrong metric applied to 100 data sets is little more than 100 wrong analyses. Simply applying the metric to a wide number of datasets has nothing to do with the veracity of the study. Our study, alternatively, chose the opposite approach: it used a data set for a focused study where metrics were extracted. We believe this to be a much more rigorous way to introduce new metrics.
  
  On the relevance of simulations
  
  Somewhat relatedly, the eLife summary and one of the reviewers mentioned the potential benefit of simulations. Reviewer 1 correctly highlights that the authors have a lot of experience in this realm, and so generating simulations would be trivial. For example, the authors have been involved in studies such as these:
  
  ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.
  
  ● Ogbunugafor CB, Wylie CS, Diakite I, Weinreich DM, Hartl DL. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS computational biology. 2016 Jan 25;12(1):e1004710.
  
  ● Ogbunugafor CB, Hartl D. A pivot mutation impedes reverse evolution across an adaptive landscape for drug resistance in Plasmodium vivax. Malaria Journal. 2016 Dec;15:1-0.
  
  From the above and dozens of other related studies, we’ve learned that simulations are critical for questions about the end results of dynamics across fitness landscapes of varying topography. To simulate across the datasets in the submitted study would be a small ask. We do not provide this, however, because our study is not about the dynamics of de novo evolution of resistance. In fact, our study focuses on a different problem, no less important for understanding how resistance evolves: determining static properties of alleles and drugs that provide a picture of the ability of an allele to withstand a breadth of drugs in a panel (variant vulnerability), or the ability of a drug in a panel to affect a breadth of drug targets (drug applicability).
  
  The authors speak on this in the Introduction:
  
  “While stepwise, de novo evolution (via mutations and subsequent selection) is a key force in the evolution of antimicrobial resistance, evolution in natural settings often involves other processes, including horizontal gene transfer and selection on standing genetic variation. Consequently, perspectives that consider variation in pathogens (and their drug targets) are important for understanding treatment at the bedside. Recent studies have made important strides in this arena. Some have utilized large data sets and population genetics theory to measure cross-resistance and collateral sensitivity. Fewer studies have made use of evolutionary concepts to establish metrics that apply to the general problem of antimicrobial treatment on standing genetic variation in pathogen populations, or for evaluating the utility of certain drugs’ ability to treat the underlying genetic diversity of pathogens”
  
  That is, the proposed metrics aren’t about the dynamics of stepwise evolution across fitness landscapes, and so simulating those dynamics doesn’t offer much for our question. What we have done instead is much more direct and allows the reader to follow a logic: clearly demonstrate the topography differences in Figure 1 (and Supplemental Figures S2 and S3 with rank order changes).
  
  Author response image 2.
  
  These results tell the reader what they need to know: that the topography of fitness landscapes changes across drug types. Further, we should note that Mira et al. 2015 already told the basic story that one finds different adaptive solutions across different drug environments. (Notably, without computational simulations).
  
  In summary, we attempted to provide a rigorous, clean, and readable study that introduced two new metrics. Appeals to adding extra analysis would be considered if they augmented the study’s goals. We do not believe this to be the case.
  
  Nonetheless, we must reiterate our appreciation for the engagement and suggestions. All were made with great intentions. This is more than one could hope for in a peer review exchange. The authors are truly grateful.
  
  eLife assessment
  
  The work introduces two valuable concepts in antimicrobial resistance: "variant vulnerability" and "drug applicability", which can broaden our ways of thinking about microbial infections through evolution-based metrics. The authors present a compelling analysis of a published dataset to illustrate how informative these metrics can be, but the study is still incomplete, as only a subset of a single dataset on a single class of antibiotics was analyzed. Analyzing more datasets, with other antibiotic classes and resistance mutations, and performing additional theoretical simulations could demonstrate the general applicability of the new concepts.
  
  The authors disagree strongly with the idea that the study is “incomplete,” and encourage the editors and reviewers to reconsider this language. Not only are the data combinatorially complete, but they are also larger in size than many similar studies of fitness landscapes. Insofar as no technical justification was offered for this “incomplete” summary, we think it should be removed. Furthermore, we question the utility of “theoretical simulations.” They are rather easy to execute but distract from the central aims of the study: to introduce new metrics, in the vein of other metrics–like druggability, IC50, MIC–that describe properties of drugs or drug targets.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The manuscript by Guerrero and colleagues introduces two new metrics that extend the concept of "druggability" (loosely speaking, the potential suitability of a particular drug, target, or drug-target interaction for pharmacological intervention) to collections of drugs and genetic variants. The study draws on previously measured growth rates across a combinatorially complete mutational landscape involving 4 variants of the TEM-50 (beta lactamase) enzyme, which confers resistance to commonly used beta-lactam antibiotics. To quantify how growth rate - in this case, a proxy for evolutionary fitness - is distributed across allelic variants and drugs, they introduce two concepts: "variant vulnerability" and "drug applicability".
  
  Variant vulnerability is the mean vulnerability (1-normalized growth rate) of a particular variant to a library of drugs, while drug applicability measures the mean across the collection of genetic variants for a given drug. The authors rank the drugs and variants according to these metrics. They show that the variant vulnerability of a particular mutant is uncorrelated with the vulnerability of its one-step neighbors and analyze how higher-order combinations of single variants (SNPs) contribute to changes in growth rate in different drug environments.
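  The one-step-neighbor comparison mentioned above can be illustrated with a short sketch. It assumes, purely for illustration, that each allele is labeled by a binary string marking the presence or absence of each mutation; the function names and the vulnerability values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def one_step_neighbors(allele):
    """Return the alleles that differ from `allele` at exactly one site.

    `allele` is a binary string such as "0101" (an assumed encoding in
    which each character marks the absence/presence of one mutation).
    """
    flip = {"0": "1", "1": "0"}
    return [
        allele[:i] + flip[allele[i]] + allele[i + 1:]
        for i in range(len(allele))
    ]


def mean_neighbor_vulnerability(vulnerability):
    """For each allele, average the vulnerability of its one-step neighbors.

    `vulnerability` must cover a combinatorially complete set of alleles,
    so that every neighbor is present as a key.
    """
    return {
        allele: mean(vulnerability[n] for n in one_step_neighbors(allele))
        for allele in vulnerability
    }


# Made-up values on a complete two-site landscape (4 alleles).
vulnerability = {"00": 0.0, "01": 0.25, "10": 0.5, "11": 0.75}
neighbor_means = mean_neighbor_vulnerability(vulnerability)

# Pairs of (own vulnerability, mean neighbor vulnerability); a correlation
# over these pairs is one way to ask whether neighbors resemble each other.
pairs = [(vulnerability[a], neighbor_means[a]) for a in vulnerability]
print(one_step_neighbors("0000"))
print(pairs)
```

  For a four-site landscape of 16 alleles, each allele has four one-step neighbors, so the same functions apply directly to 4-character labels.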
  
  The work addresses an interesting topic and underscores the need for evolution-based metrics to identify candidate pharmacological interventions for treating infections. The authors are clear about the limitations of their approach - they are not looking for immediate clinical applicability - and provide simple new measures of druggability that incorporate an evolutionary perspective, an important complement to the orthodoxy of aggressive, kill-now design principles. I think the ideas here will interest a wide range of readers, but I think the work could be improved with additional analysis - perhaps from evolutionary simulations on the measured landscapes - that tie the metrics to evolutionary outcomes.
  
  The authors greatly appreciate these comments, and the proposed suggestions by reviewer 1. We have addressed most of the criticisms and suggestions in our comments above.
  
  Reviewer #2 (Public Review):
  
  The authors introduce the notions of "variant vulnerability" and "drug applicability" as metrics quantifying the sensitivity of a given target variant across a panel of drugs and the effectiveness of a drug across variants, respectively. Given a data set comprising a measure of drug effect (such as growth rate suppression) for pairs of variants and drugs, the vulnerability of a variant is obtained by averaging this measure across drugs, whereas the applicability of a drug is obtained by averaging the measure across variants.
  
  The authors apply the methodology to a data set that was published by Mira et al. in 2015. The data consist of growth rate measurements for a combinatorially complete set of 16 genetic variants of the antibiotic resistance enzyme beta-lactamase across 10 drugs and drug combinations at 3 different drug concentrations, comprising a total of 30 different environmental conditions. For reasons that did not become clear to me, the present authors select only 7 out of 30 environments for their analysis. In particular, for each chosen drug or drug combination, they choose the data set corresponding to the highest drug concentration. As a consequence, they cannot assess to what extent their metrics depend on drug concentration. This is a major concern since Mira et al. concluded in their study that the differences between growth rate landscapes measured at different concentrations were comparable to the differences between drugs. If the new metrics display a significant dependence on drug concentration, this would considerably limit their usefulness.
  
  The authors appreciate the point about drug concentration, and it is one that the authors have made in several studies.
  
  The quick answer is that whether the metrics are useful for drug type-concentration A or B will depend on drug type-concentration A or B. If there are notable differences in the topography of the fitness landscape across concentrations, then we should expect the metrics to differ. What Reviewer #2 points out as a “major concern” is in fact a strength of the metrics: they are agnostic with respect to type of drug, type of target, size of dataset, or topography of the fitness landscape. And so, the authors disagree: the fact that drug concentration may strongly influence the value of the metrics does not limit their utility. It is simply another variable that one can consider when computing the metrics.
  
  As discussed above, we have analyzed data from a different data set, in a different drug-target problem (DHFR and antifolate drugs; see supplemental information). These analyses demonstrate how the metrics can be computed across different drug concentrations.
  
  As a consequence of the small number of variant-drug combinations that are used, the conclusions that the authors draw from their analysis are mostly tentative with weak statistical support. For example, the authors argue that drug combinations tend to have higher drug applicability than single drugs, because a drug combination ranks highest in their panel of 7. However, the effect profile of the single drug cefprozil is almost indistinguishable from that of the top-ranking combination, and the second drug combination in the data set ranks only 5th out of 7.
  
  We reiterate our appreciation for the engagement. Reviewer #2 generously offers some technical insight on measurements of epistasis, and their opinion on the level of statistical support for our claims. The authors are very happy to engage in a dialogue about these points. We disagree rather strongly, and in addition to the general points raised above (that speak to some of this), will raise several specific rebuttals to the comments from Reviewer #2.
  
  For one, Reviewer #2 is free to point to which arguments have “weak statistical support.” Having read the review, we aren’t sure what this is referring to. “Weak statistical support” generally applies to findings built from underpowered studies, or designs constructed in a manner that yields effect sizes or p-values that give low confidence that a finding is believable (or replicable). This sort of problem doesn’t apply to our study for various reasons, not least because our findings are strongly supported, based on a vetted data set, in a system that has long been the object of examination in studies of antimicrobial resistance.
  
  For example, we did not argue that magnetic fields alter the topography of fitness landscapes, a claim which must stand up to a certain sort of statistical scrutiny. Rather, we examined landscapes where the drug environment differed statistically from the non-drug environment and used them to compute new properties of alleles and drugs.
  
  We can imagine that the reviewer is referring to the low-dimensionality of the fitness landscapes in the study. Again: the features of the dataset are a detail that the authors put into the title of the manuscript. Further, we emphasize that it is not a weakness, but rather, allows the authors to focus, and discuss the specific biology of the system. And we responsibly explain the constraints around our study several times, though none of them have anything to do with “weak statistical support.”
  
  Even though we aren’t clear what “weak statistical support” means as offered by Reviewer 2, the authors have nonetheless decided to provide additional analyses, now appearing in the new supplemental material.
  
  We have included a new Figure S2, where we offer an analysis of the topography of the 7 landscapes, based on the Kendall rank order test. This tests the hypothesis that there is no correlation (concordance or discordance) between the topographies of the fitness landscapes.
  
  Author response image 3.
  
  Kendall rank test for correlation between the 7 fitness landscapes.
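
  For readers unfamiliar with the statistic behind such a rank test, the following is a minimal sketch of Kendall's tau computed between two landscapes. It assumes the tau-a form (ties counted as neither concordant nor discordant) and computes only the statistic, not a p-value; the variant used for Figure S2 is not stated here, and the example values are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    x and y hold the growth rates of the same variants, in the same order,
    in two different drug environments. Tied pairs count as neither.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical growth rates for four variants in two environments.
landscape_a = [1, 2, 3, 4]
landscape_b = [1, 3, 2, 4]
tau = kendall_tau(landscape_a, landscape_b)
```

  A value near 1 indicates that two environments rank the variants in nearly the same order, and a value near -1 indicates a reversed ranking.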
  
  In Figure S3, we test the hypothesis that the variant vulnerability values differ. To do this, we calculate a paired t-test. These are paired by haplotype/allelic variant, so the comparisons are change in growth between drugs for each haplotype.
  
  Author response image 4.
  
  Paired t-tests for variant vulnerability.
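
  The pairing described above can be sketched as follows. This is a minimal illustration that assumes each list holds growth values for the same haplotypes in the same order under two drugs; it returns only the t statistic and the degrees of freedom, and the numbers are hypothetical.

```python
import math


def paired_t(a, b):
    """Paired t statistic for two equal-length samples matched by haplotype.

    Returns (t, degrees_of_freedom). The p-value lookup is omitted.
    """
    if len(a) != len(b):
        raise ValueError("samples must be paired one-to-one")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the per-haplotype differences (n - 1 denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical growth of four haplotypes under two drugs.
drug_1 = [0.9, 0.7, 0.8, 0.6]
drug_2 = [0.5, 0.4, 0.6, 0.1]
t_stat, dof = paired_t(drug_1, drug_2)
```

  Pairing by haplotype removes between-haplotype variation from the comparison, so the test asks only whether the per-haplotype change between drugs differs from zero.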
  
  To this point raised by Reviewer #2:
  
  “For example, the authors argue that drug combinations tend to have higher drug applicability than single drugs, because a drug combination ranks highest in their panel of 7. However, the effect profile of the single drug cefprozil is almost indistinguishable from that of the top-ranking combination, and the second drug combination in the data set ranks only 5th out of 7.”
  
  Our study does not argue that drug combinations are necessarily correlated with a higher drug applicability. Rather, we specifically highlight that one of the combinations does not have a high drug applicability:
  
  “Though all seven drugs/combinations are β-lactams, they have widely varying effects across the 16 alleles. Some of the results are intuitive: for example, the drug regime with the highest drug applicability of the set—amoxicillin/clavulanic acid—is a mixture of a widely used β-lactam (amoxicillin) and a β-lactamase inhibitor (clavulanic acid) (see Table 3). We might expect such a mixture to have a broader effect across a diversity of variants. This high applicability is hardly a rule, however, as another mixture in the set, piperacillin/tazobactam, has a much lower drug applicability (ranking 5th out of the seven drugs in the set) (Table 3).”
  
  In general, we believe that the submitted paper is responsible with regards to how it extrapolates generalities from the results. Further, the manuscript contains a specific section that explains limitations, clearly and transparently (not especially common in science). For that reason, we’d encourage reviewer #2 to reconsider their perspective. We do not believe that our arguments are built on “weak” support at all. And we did not argue anything particular about drug combinations writ large. We did the opposite— discussed the particulars of our results in light of the biology of the system.
  
  Thirdly, to this point:
  
  “To assess the environment-dependent epistasis among the genetic mutations comprising the variants under study, the authors decompose the data of Mira et al. into epistatic interactions of different orders. This part of the analysis is incomplete in two ways. First, in their study, Mira et al. pointed out that a fairly large fraction of the fitness differences between variants that they measured were not statistically significant, which means that the resulting fitness landscapes have large statistical uncertainties. These uncertainties should be reflected in the results of the interaction analysis in Figure 4 of the present manuscript.”
  
  The authors are uncertain with regards to the “uncertainties” being referred to, but we’ll do our best to understand: our study utilized the 7 drug environments from Mira et al. 2015 with statistically significant differences between growth rates with and without drug. And so, this point about how the original set contained statistically insignificant treatments is not relevant here. We explain this in the methods section:
  
  “The data that we examine comes from a past study of a combinatorial set of four mutations associated with TEM-50 resistance to β-lactam drugs [39 ]. This past study measured the growth rates of these four mutations in combination, across 15 different drugs (see Supplemental Information).”
  
  We go on to say the following:
  
  “We examined these data, identifying a subset of structurally similar β-lactams that also included β-lactams combined with β-lactamase inhibitors, cephalosporins and penicillins. From the original data set, we focus our analyses on drug treatments that had a significant negative effect on the growth of wild-type/TEM-1 strains (one-tailed t-test of wild-type treatment vs. control, P < 0.01). After identifying the data from the set that fit our criteria, we were left with seven drugs or combinations (concentration in μg/ml): amoxicillin 1024 μg/ml (β-lactam), amoxicillin/clavulanic acid 1024 μg/ml (β-lactam and β-lactamase inhibitor), cefotaxime 0.123 μg/ml (third-generation cephalosporin), cefotetan 0.125 μg/ml (second-generation cephalosporins), cefprozil 128 μg/ml (second-generation cephalosporin), ceftazidime 0.125 μg/ml (third-generation cephalosporin), piperacillin and tazobactam 512/8 μg/ml (penicillin and β-lactamase inhibitor). With these drugs/mixtures, we were able to embody chemical diversity in the panel.”
  
  Again: The goal of our study was to develop metrics that can be used to analyze features of drugs and targets and disentangle these metrics into effects.
  
  Second, the interpretation of the coefficients obtained from the epistatic decomposition depends strongly on the formalism that is being used (in the jargon of the field, either a Fourier or a Taylor analysis can be applied to fitness landscape data). The authors need to specify which formalism they have employed and phrase their interpretations accordingly.
  
  The authors appreciate this nuance. Certainly, how to measure epistasis is a large topic of its own. But we recognize that we could have addressed this more directly and have added text to this effect.
  
  In response to these comments from Reviewer #2, we have added a new section focused on these points (reference syntax removed here for clarity; please see main text for specifics):
  
  “The study of epistasis, and discussions regarding the means to detect and measure now occupies a large corner of the evolutionary genetics literature. The topic has grown in recent years as methods have been applied to larger genomic data sets, biophysical traits, and the "global" nature of epistatic effects. We urge those interested in more depth treatments of the topic to engage larger summaries of the topic.”
  
  “Here we will briefly summarize some methods used to study epistasis on fitness landscapes. Several studies of combinatorially-complete fitness landscapes use some variation of Fourier Transform or Taylor formulation. One in particular, the Walsh-Hadamard Transform, has been used to measure epistasis across a wide number of study systems. Furthermore, studies have reconciled these methods with others, or expanded upon the Walsh-Hadamard Transform in a way that can accommodate incomplete data sets. These methods are effective for certain sorts of analyses, and we strongly urge those interested to examine these studies.”
  
  “The method that we've utilized, the LASSO regression, determines effect sizes for all interactions (alleles and drug environments). It has been utilized for data sets of similar size and structure, on alleles resistant to trimethoprim. Among many benefits, the method can accommodate gaps in data and responsibly incorporates experimental noise into the calculation.”
  
  As Reviewer #2 understands, there are many ways to examine epistasis on both high and low-dimensional landscapes. Reviewer #2 correctly offers two sorts of formalisms that allow one to do so. The two offered by Reviewer #2, are not the only means of measuring epistasis in data sets like the one we have offered. But we acknowledge that we could have done a better job outlining this. We thank Reviewer #2 for highlighting this, and believe our revision clarifies this.
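
  To make concrete what "all interactions" means for four biallelic sites, the following minimal sketch builds the design matrix that a regularized regression such as LASSO would be fit to. The 0/1 indicator encoding is an assumption (a ±1 encoding would change how the coefficients are interpreted, which is the reviewer's point about formalisms), and the fitting step itself is omitted. The site names are those given in the manuscript's methods.

```python
from itertools import combinations, product

SITES = ["M69L", "E104K", "G238S", "N276D"]


def interaction_terms(n_sites):
    """All non-empty subsets of site indices, ordered by interaction order."""
    terms = []
    for order in range(1, n_sites + 1):
        terms.extend(combinations(range(n_sites), order))
    return terms


def design_row(genotype, terms):
    """One feature per term: 1 if every site in the term is mutated, else 0.

    Assumes genotype is a tuple of 0/1 indicators (0 = wild-type residue).
    """
    return [int(all(genotype[i] for i in term)) for term in terms]


terms = interaction_terms(len(SITES))
genotypes = list(product([0, 1], repeat=len(SITES)))
X = [design_row(g, terms) for g in genotypes]
labels = [" x ".join(SITES[i] for i in term) for term in terms]
```

  With 16 genotypes and 15 interaction columns plus an intercept, an unregularized fit would be exactly determined, which is one reason a penalty that shrinks small coefficients toward zero is a natural choice.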
  
  Reviewer #3 (Public Review):
  
  The authors introduce two new concepts for antimicrobial resistance borrowed from pharmacology, "variant vulnerability" (how susceptible a particular resistance gene variant is across a class of drugs) and "drug applicability" (how useful a particular drug is against multiple allelic variants). They group both terms under an umbrella term "drugability". They demonstrate these features for an important class of antibiotics, the beta-lactams, and allelic variants of TEM-1 beta-lactamase.
  
  The strength of the result is in its conceptual advance and that the concepts seem to work for beta-lactam resistance. However, I do not necessarily see the advance of lumping both terms under "drugability", as this adds an extra layer of complication in my opinion.
  
  Firstly, the authors greatly appreciate the comments from Reviewer #3. They are insightful, and prescriptive. And allow us to especially thank reviewer 3 for supplying a commented PDF with some grammatical and phrasing suggestions/edits. This is much appreciated. We have examined all these suggestions and made changes.
  
  In general, we agree with the spirit of many of the comments. In addition to our prior comments on the scope of our data, we’ll communicate a few direct responses to specific points raised.
  
  I also think that the utility of the terms could be more comprehensively demonstrated by using examples across different antibiotic classes and/or resistance genes. For instance, another good model with published data might have been trimethoprim resistance, which arises through point mutations in the folA gene (although, clinical resistance tends to be instead conferred by a suite of horizontally acquired dihydrofolate reductase genes, which are not so closely related as the TEM variants explored here).
  
  In our new supplemental material, we now feature an analysis of antifolate drugs, pyrimethamine and cycloguanil. We have discussed this in detail above and thank the reviewer for the suggestion.
  
  Secondly, we agree that the study will have a larger impact when the metrics are applied more broadly. This is an active area of investigation, and our hope is that others apply our metrics more broadly. But as we discussed, such a desire is not a technical criticism of our own study. We stand behind the rigor and insight offered by our study.
  
  The impact of the work on the field depends on a more comprehensive demonstration of the applicability of these new concepts to other drugs.
  
  The authors don’t disagree with this point, which applies to virtually every potentially influential study. The importance of a single study can generally only be measured by its downstream application. But this hardly qualifies as a technical critique of our study and does not apply to our study alone. Nor does it speak to the validity of our results. The authors share this interest in applying the metric more broadly.
  
  Reviewer #1 (Recommendations For The Authors):
  
  The main weakness of the work, in my view, is that it does not directly tie these new metrics to a quantitative measure of "performance". The metrics have intuitive appeal, and I think it is likely that they could help guide treatment options-for example, drugs with high applicability could prove more useful under particular conditions. But as the authors note, the landscape is rugged and intuitive notions of evolutionary behavior can sometimes fail. I think the paper would be much improved if the authors could evaluate their new metrics using some type of quantitative evolutionary model. For example, perhaps the authors could simulate evolutionary dynamics on these landscapes in the presence of different drugs. Is the mean fitness achieved in the simulations correlated with, for example, the drug applicability when looking across an ensemble of simulations with the same drug but varied initial conditions that start from each individual variant? Similarly, if you consider an ensemble of simulations where each member starts from the same variant but uses a different drug, is the average fitness gain captured in some way by the variant vulnerability? All simulations will have limitations, of course, but given that the landscape is fully known I think these questions could be answered under some conditions (e.g. strong selection weak mutation limit, where the model could be formulated as a Markov Chain; see 10.1371/journal.pcbi.1004493 or doi: 10.1111/evo.14121 for examples). And given the authors' expertise in evolutionary dynamics, I think it could be achieved in a reasonable time. With that said, I want to acknowledge that with any new "metrics", it can be tempting to think that "we need to understand it all" before it is useful, and I don't want to fall into that trap here.
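
  The kind of model the reviewer describes can be sketched as a small Markov chain. This is a minimal illustration only, not an analysis performed in the manuscript: it assumes the strong selection weak mutation limit, assumes that the probability of moving to a fitter one-step neighbor is proportional to the fitness gain (one common approximation), and uses a hypothetical two-site landscape.

```python
def neighbors(genotype):
    """All genotypes one mutational step (one flipped site) away."""
    return [genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))]


def sswm_step_probs(genotype, fitness):
    """Transition probabilities out of one genotype under the assumed rule.

    Only fitter neighbors are reachable; a local peak maps to itself.
    """
    gains = {n: fitness[n] - fitness[genotype]
             for n in neighbors(genotype) if fitness[n] > fitness[genotype]}
    if not gains:
        return {genotype: 1.0}
    total = sum(gains.values())
    return {n: gain / total for n, gain in gains.items()}


def peak_distribution(start, fitness, max_steps=64):
    """Iterate the chain from `start` until the distribution stops changing."""
    dist = {start: 1.0}
    for _ in range(max_steps):
        new = {}
        for g, p in dist.items():
            for n, q in sswm_step_probs(g, fitness).items():
                new[n] = new.get(n, 0.0) + p * q
        if new == dist:
            break
        dist = new
    return dist


# Hypothetical two-site landscape with two local peaks, (1, 0) and (0, 1).
fitness = {(0, 0): 1.0, (1, 0): 1.5, (0, 1): 1.25, (1, 1): 0.5}
dist = peak_distribution((0, 0), fitness)
```

  Averaging the fitness reached over all starting variants for one drug, or over all drugs for one starting variant, would give the ensemble quantities the reviewer proposes comparing with the two metrics.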
  
  The authors respect and appreciate these thoughtful comments.
  
  As Reviewer #1 highlighted, the authors are experienced with building simulations of evolution. For reasons we have outlined above, we don’t believe they would add to the arc of the current story and may encumber the story with unnecessary distractions. Simulations of evolution can be enormously useful for studies focused on particulars of the dynamics of evolution. This submitted study is not one of those. It is charged with identifying features of alleles and drugs that capture an allele’s vulnerability to treatment (variant vulnerability) and a drug’s effectiveness across alleles (drug applicability). Both features integrate aspects of variation (genetic and environmental), and as such, are improvements over both metrics used to describe drug targets and drugs.
  
  The new metrics rely on means, which is a natural choice. Have the authors considered how variance (or other higher moments) might also impact evolutionary dynamics? I would imagine, for example, that the ultimate outcome of a treatment might depend heavily on the shape of the distribution, not merely its mean. This is also something one might be able to get a handle on with simulations.
  
  These are relevant points, and the authors appreciate them. Certainly, moments other than the mean might have utility. This is the reason that we computed the one-step neighborhood variant vulnerability–to see if the variant vulnerability of an allele was related to properties of its mutational neighborhood. We found no such correlation. There are many other sorts of properties that one might examine (e.g., shape of the distribution, properties of mutational network, variance, fano factor, etc). As we don’t have an informed reason to pursue any of this in lieu of others, we are pleased to investigate this in the future.
  
  Also, while we’ve addressed general points about simulations above, we want to note that our analysis of environmental epistasis does consider the variance. We urge Reviewer #1 to see our new section on “Notes on Methods Used to Measure Epistasis” where we explain some of this and supply references to that effect.
  
  As I understand it, the fitness measurements here are measures of per capita growth rate, which is reasonable. However, the authors may wish to briefly comment on the limitations of this choice-i.e. the fact that these are not direct measures of relative fitness values from head-to-head competition between strains.
  
  Reviewer #1 is correct: the metrics are computed from means. As Reviewer 1 definitely understands, debates over what measurements are proper proxies for fitness go back a long time. We added a slight acknowledgement about the existence of multiple fitness proxies in our revision.
  
  The authors consider one-step variant vulnerability. Have the authors considered looking at 2-step, 3-step, etc analogs of the 1-step vulnerability? I wonder if these might suggest potential vulnerability bottlenecks associated with the use of a particular drug/drug combo or trajectories starting from particular variants.
  
  This is an interesting point. We provided one-step values as a means of interrogating the mutational neighborhood of alleles in the fitness landscape. While there could certainly be other pattern-relationships between the variant vulnerability and features of a fitness landscape (as the reviewer recognizes), we don’t have a rigorous reason to test them, other than an appeal to “I would be curious if [Blank].” As in, attempting to saturate the paper with these sorts of examinations might be fun, could turn up an interesting result, but this is true for most studies.
  
  To highlight just how serious we are about future questions along these lines, we’ll offer one specific question about the relationship between metrics and other features of alleles or landscapes. Recent studies have examined the existence of “evolvability-enhancing mutations,” that propel a population to high-fitness sections of a fitness landscape:
  
  ● Wagner, A. Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 14, 3624 (2023). https://doi.org/10.1038/s41467-023-39321-8
  
  One present and future area of inquiry involves whether there is any relationship between metrics like variant vulnerability and these sorts of mutations.
  
  We thank Reviewer 1 for engagement on this issue.
  
  Fitness values are measured in the presence of a drug, but it is not immediately clear how the drug concentrations are chosen and, more importantly, how the choice of concentration might impact the landscape. The authors may wish to briefly comment on these effects, particularly in cases where the environment involves combinations of drugs. There will be a "new" fitness landscape for each concentration, but to what extent do the qualitative features, or whatever features drive evolutionary dynamics, change?
  
  This is another interesting suggestion. We have analyzed a new data set for dihydrofolate reductase mutants that contains a range of drug concentrations of two different antifolate drugs. The general question of how drug concentrations change evolutionary dynamics has been addressed in prior work of ours:
  
  ● Ogbunugafor CB, Wylie CS, Diakite I, Weinreich DM, Hartl DL. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS computational biology. 2016 Jan 25;12(1):e1004710.
  
  ● Ogbunugafor CB, Eppstein MJ. Competition along trajectories governs adaptation rates towards antimicrobial resistance. Nature ecology & evolution. 2016 Nov 21;1(1):0007.
  
  There are a very large number of environment types that might alter the drug applicability or variant vulnerability metrics. In our study, we used an established data set composed of different alleles of a beta-lactamase, with growth rates measured across a number of drug environments. These drug environments consisted of individual drugs at certain concentrations, as outlined in Mira et al. 2015. For our study, we examined those drugs that had a significant impact on growth rate.
  
  For a new analysis of antifolate drugs in 16 alleles of dihydrofolate reductase (Plasmodium falciparum), we have examined a breadth of drug concentrations (Supplementary Figure S4). This represents a different sort of environment that one can use to measure the two metrics (variant vulnerability or drug applicability). As we suggest in the manuscript, part of the strength of the metric is precisely that it can incorporate drug dimensions of various kinds.
  
  The metrics introduced depend on the ensemble of drugs chosen. To what extent are the chosen drugs representative? Are there cases where nonrepresentative ensembles might be advantageous?
  
  The authors thank the reviewer for this. The general point has been addressed in our comments above. Further, the general question of how a study of one set of drugs applies to other drugs applies to every study of every drug, as no single study interrogates every sort of drug ensemble. That said, we’ve explained the anatomy of our metrics, and have outlined how it can be directly applied to others. There is nothing about the metric itself that has anything to do with a particular drug type – the arithmetic is rather vanilla.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Regarding my comment about the different formalisms for epistatic decomposition analysis, a key reference is
  
  Poelwijk FJ, Krishna V, Ranganathan R (2016). The Context-Dependence of Mutations: A Linkage of Formalisms. PLoS Comput Biol 12(6): e1004771.
  
  The authors appreciate this, are fans of this work, and have cited it in the revision.
  
  An example where both Fourier and Taylor analyses were carried out and the different interpretations of these formalisms were discussed is
  
  Unraveling the causes of adaptive benefits of synonymous mutations in TEM-1 β-lactamase. Mark P. Zwart, Martijn F. Schenk, Sungmin Hwang, Bertha Koopmanschap, Niek de Lange, Lion van de Pol, Tran Thi Thuy Nga, Ivan G. Szendro, Joachim Krug & J. Arjan G. M. de Visser Heredity 121:406-421 (2018)
  
  The authors are grateful for these references. While we don’t think they are necessary for our new section entitled “Notes on methods used to detect epistasis,” we did engage them, and will keep them in mind for other work that more centrally focuses on methods used to detect epistasis. As the author acknowledges, a full treatment of this topic is too large for a single manuscript, let alone a subsection of one study. We have provided a discussion of it, and pointed the readers to longer review articles that explore some of these topics in good detail:
  
  ● C. Bank, Epistasis and adaptation on fitness landscapes, Annual Review of Ecology, Evolution, and Systematics 53 (1) (2022) 457–479.
  
  ● T. B. Sackton, D. L. Hartl, Genotypic context and epistasis in individuals and populations, Cell 166 (2) (2016) 279–287.
  
  ● J. Diaz-Colunga, A. Skwara, J. C. C. Vila, D. Bajic, Á. Sánchez, Global epistasis and the emergence of ecological function, bioRxiv
  
  Although the authors label Figure 4 with the term "environmental epistasis", as far as I can see it is only a standard epistasis analysis that is carried out separately for each environment. The analysis of environmental epistasis should instead focus on which aspects of these interactions are different or similar in different environments, for example, by looking at the reranking of fitness values under environmental changes [see Ref.[26] as well as more recent related work, e.g. Gorter et al., Genetics 208:307-322 (2018); Das et al., eLife 9:e55155 (2020)]. To some extent, such an analysis was already performed by Mira et al., but not on the level of epistatic interaction coefficients.
  
  The authors have provided a new analysis of how fitness value rankings have changed across drug environments, often a signature of epistatic effects across environments (Supplementary Figure S1).
  
  We disagree with the idea that our analysis is not a sort of environmental epistasis; we resolve coefficients between loci across different environments. As with every interrogation of G x E effects (G x G x E in our case), what constitutes an “environment” is a messy conversation. We have chosen the route of explaining very clearly what we mean:
  
  “We further explored the interactions across this fitness landscape and panels of drugs in two additional ways. First, we calculated the variant vulnerability for 1-step neighbors, which is the mean variant vulnerability of all alleles one mutational step away from a focal variant. This metric gives information on how the variant vulnerability values are distributed across a fitness landscape. Second, we estimated statistical interaction effects on bacterial growth through LASSO regression. For each drug, we fit a model of relative growth as a function of M69L x E104K x G238S x N276D (i.e., including all interaction terms between the four amino acid substitutions). The effect sizes of the interaction terms from this regularized regression analysis allow us to infer higher-order dynamics for susceptibility. We label this calculation as an analysis of “environmental epistasis.”
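
  The first of these two calculations can be sketched in a few lines. This is a minimal illustration that assumes variants are labeled as bitstrings (one character per site) and that every one-step neighbor is present in the data; the two-site example values are hypothetical.

```python
def one_step_neighbors(variant):
    """All bitstrings at Hamming distance 1 from `variant`."""
    flip = {"0": "1", "1": "0"}
    return [variant[:i] + flip[variant[i]] + variant[i + 1:]
            for i in range(len(variant))]


def neighbor_vulnerability(vv):
    """Mean variant vulnerability of each variant's one-step neighbors.

    `vv` maps each bitstring to its variant vulnerability and must be
    combinatorially complete so that every neighbor can be looked up.
    """
    return {v: sum(vv[n] for n in one_step_neighbors(v)) / len(v)
            for v in vv}


# Hypothetical variant vulnerabilities on a two-site landscape.
vv_example = {"00": 0.2, "01": 0.4, "10": 0.8, "11": 0.6}
nv = neighbor_vulnerability(vv_example)
```

  Comparing each variant's own value with its neighborhood mean is what allows one to ask whether vulnerability is locally correlated across the landscape.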
  
  As the grammar for these sorts of analyses continues to evolve, the best one can do is be clear about what they mean. We believe that we communicated this directly and transparently.
  
  As a general comment, to strengthen the conclusions of the study, it would be good if the authors could include additional data sets in their analysis.
  
  The authors appreciate this comment and have given this point ample treatment. Further, other main conclusions and discussion points are focused on the biology of the system that we examined. Analyzing other data sets may demonstrate the broader reach of the metrics, but it would not alter the strength of our own conclusions (or if they would, Reviewer #2 has not told us how).
  
  There are some typos in the units of drug concentrations in Section 2.4 that should be corrected.
  
  The authors truly appreciate this. It is a great catch. We have fixed this in the revised manuscript.
  
  Reviewer #3 (Recommendations For The Authors):
  
  I would suggest demonstrating the concepts for a second drug class, and suggest folA variants and trimethoprim resistance, for which there is existing published data similar to what the authors have used here (e.g. Palmer et al. 2015, https://doi.org/10.1038/ncomms8385)
  
  The authors appreciate this insight. As previously described, we have analyzed a data set of folA mutants for the Plasmodium falciparum ortholog of dihydrofolate reductase, and included these results in new supplemental material. Please see the supplementary material.
  
  There are some errors in formatting and presentation that I have annotated in a separate PDF file (https://elife-rp.msubmit.net/eliferp_files/2023/04/11/00117789/00/117789_0_attach_8_30399_convrt.pdf), as the absence of line numbers makes indicating specific things exceedingly difficult.
  
  The authors apologize for the lack of line numbers (an honest oversight), but moreover, are tremendously grateful for this feedback. We have looked at the suggested changes carefully and have addressed many of them. Thank you.
  
  One thing to note: we have included a version of Figure 4 that has effects on the same axes. It appears in the supplementary material (Figure S4).
  
  In closing, the authors would like to thank the editors and three anonymous reviewers for engagement and for helpful comments. We are confident that the revised manuscript qualifies as a substantive revision, and we are grateful to have had the opportunity to participate.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.08.536116v2
www.biorxiv.org www.biorxiv.org

New submission 11/08/2023, 09:01:09

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  We are grateful for the comments from the reviewers, which helped us to strengthen our analyses and communicate more effectively the details of our findings and their significance. To address their criticisms, we have performed new analyses and revised the text and figures. We believe the manuscript was significantly improved. We provide the line number of important parts of the text that were changed, here in this letter. Below, we address the specific comments from the reviewers in detail.
  
  Reviewer #1 (Public Review):
  
  Gehr and colleagues used an elegant method, using neuropixels probes, to study retinal input integration by mouse superior collicular cells in vivo. Compared to a previous report of the same group, they opto-tagged inhibitory neurons and defined the differential integration onto each group. Through these experiments, the author concluded that overall, there is no clear difference between the retina connectivity to excitatory and inhibitory superior colliculus neurons. The exception to that rule is that excitatory neurons might be driven slightly stronger than inhibitory ones. Technically, this work is performed at a high level, and the plots are beautifully conceived, but I have doubts if the interpretation given by the authors is solid. I will elaborate below.
  
  Some thoughts about the interpretation of the results.
  
  My main concern is the "survivor bias" of this work, which can lead to skewed conclusions. From the data set acquired, 305 connections were measured, 1/3 inhibitory and 2/3 excitatory. These connections arise from 83 RGC onto 124 RGC (I'm interpreting the axis of Fig.2 C). Here it is worth mentioning that different RGC types have different axonal diameters (Perge et al., 2009). Here the diameter is also related to the way cells relay information (max frequencies, for example). It is possible that thicker axons are easier to measure, given the larger potential changes would likely occur, and thus, selectively being picked up by the neuropixels probe. If this is the case, we would have a clear case of "survival bias", which should be tested and discussed. One way to determine if the response properties of axonal termini are from an unbiased sample is to make a rough functional characterization as generally performed (see Baden et al. 2006). This is fundamental since all other conclusions are based on unbiased sampling.
  
  First of all, we want to thank the reviewer for the detailed and constructive comments based on which we refined the analysis and updated the figures. We hope that our changes adequately address the concerns of reviewer #1.
  
  We would like to clarify that Fig. 2C represents an example from a single experiment. In total, we recorded 326 RGCs and 680 SC neurons, with 161 individual RGCs making connections onto 183 individual SC neurons. Moreover, we thank the reviewer for bringing up the important point about the potential “survivor bias”. To address this concern, we would like to provide some clarifications (see below). In addition, we have now added the point that different RGCs can have different axonal diameters, as requested by the reviewer (line 605).
  
  It is important to note that our approach does not capture the total pool of retinal inputs. Moreover, we did not want to convey the impression that our approach equally captures all retinal inputs to a given SC neuron, as this is not the case. Likewise, it is important to note that our current method does not allow for the measurement of axonal diameters. To obtain an estimate of axonal thickness, complementary techniques such as imaging/staining or electron microscopy would be needed. Our study aimed at characterizing connected RGC-SC pairs and how excitatory and inhibitory neurons in the SC integrate retinal inputs, providing valuable insight on their wiring principles.
  
  We greatly appreciate the reviewer for highlighting this limitation and we now address these points in the discussion of the revised version of our manuscript (line 603).
  
  Regarding the suggested “rough functional characterization” of the RGCs: we considered this analysis, but unfortunately we did not present the necessary stimuli, e.g. chirp, in all experiments to be able to perform it. Moreover, the dataset presented in this work contains only 326 RGCs, with 161 identified RGCs making connections to SC neurons. Thus, it is unlikely that our dataset uniformly covers all ~30 RGC types in the mouse. However, given that our dataset is the first measurement of RGC inputs to SC INs and SC EXNs in vivo, we believe it provides a first step and a foundation for future studies focusing on specific RGC types to refine our understanding of the RGC-SC circuitry. We now discuss this point in the revised manuscript (line 586).
  
  One aspect that is not clear to me is the measure of connectivity strength in Figure 2. Here it seems that connectivity strength is directly correlated with the baseline firing rate of the SC neuron (see example plots). If this is a general case, the synaptic strength can be assumed but would only differ in strength due to the excitability of the postsynaptic cell. This should be tested by plotting the correlation coefficient analysis against the baseline firing rate.
  
  We appreciate the reviewer for bringing up this important point. From the analysis perspective, we would like to clarify that the efficacy measure is independent of the baseline firing rate. It quantifies the probability of adding spikes on top of the baseline rate by subtracting the baseline firing rate before measuring the area of the peak (Usrey et al., 1999).
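
  The baseline subtraction described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code: the function name `efficacy`, the peak window (0.5 to 3.5 ms after each RGC spike) and the baseline window (10 to 2 ms before each RGC spike) are assumptions chosen for the example, and all spike times are taken to be in milliseconds.

```python
import numpy as np

def efficacy(pre_spikes, post_spikes,
             peak_window=(0.5, 3.5), baseline_window=(-10.0, -2.0)):
    """Baseline-corrected efficacy of a presynaptic -> postsynaptic pair.

    Spike times are in ms. Both windows are hypothetical defaults,
    not values taken from the manuscript.
    """
    pre = np.asarray(pre_spikes, dtype=float)
    post = np.sort(np.asarray(post_spikes, dtype=float))
    if pre.size == 0:
        return float("nan")

    def count_in(window):
        # Postsynaptic spikes falling in [t + window[0], t + window[1])
        # summed over every presynaptic spike time t.
        lo = np.searchsorted(post, pre + window[0], side="left")
        hi = np.searchsorted(post, pre + window[1], side="left")
        return float(np.sum(hi - lo))

    peak_count = count_in(peak_window)
    # Baseline rate (spikes per ms, summed over presynaptic spikes) ...
    base_rate = count_in(baseline_window) / (baseline_window[1] - baseline_window[0])
    # ... converted to the count expected in the peak window by chance.
    expected = base_rate * (peak_window[1] - peak_window[0])
    # Extra postsynaptic spikes per presynaptic spike.
    return (peak_count - expected) / pre.size
```

  Because the expected baseline count is removed before normalizing by the number of presynaptic spikes, a postsynaptic neuron that fires at a high rate but without a time-locked response yields a value near zero.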
  
  Furthermore, we acknowledge the reviewer’s interesting and valuable observation about the relationship of the firing rate and the excitability of the SC neuron in the example plots. To test whether the efficacy is directly related to the mean firing rate, we conducted additional analyses to show the efficacy measure as a function of the mean firing rate (Author response image 1 and Figure 2G). To that end, we utilized two different measures of firing rate: the mean firing rate during spontaneous activity (gray screen) over a duration of 10 sec (across 30 trials), which was interleaved with the natural movie presentations, and the overall firing rate throughout the entire recording session. Our findings indeed reveal a positive correlation, as predicted by the reviewer (Author response image 1; gray screen: EXC: r = 0.22721, p < 0.00081; INH: r = 0.34677, p = 0.00076; entire recording: EXC: r = 0.42685, p < 0.0005; INH: r = 0.43543, p = 0.00002).
  
  Author response image 1.
  
  Efficacy measure of connected RGC-SC pairs as a function of the mean firing rate during different stimulus conditions: during spontaneous activity (gray screen, left) and throughout the entire recording session (right).
  
  However, it is important to note that although we observe a correlation on the population level, the relationship between postsynaptic firing and efficacy is diverse. We identify pairs with strong connections despite the firing rate of the postsynaptic SC cell being low. Likewise, we also find pairs with weak connections despite the firing rate of the SC neuron being high (Author response image 2). These observations suggest that factors beyond the postsynaptic firing contribute to the efficacy of the connection. This is exemplified by the fact that SC neurons can receive both strong and weak connections from their convergent presynaptic RGC pool.
  
  Author response image 2.
  
  RGC-SC connectivity. Cross-correlograms showing 4 connected RGC-SC pairs (top) with two RGCs connecting onto the same SC neuron. Raster plots of SC neuron spiking activity in response to firing of the presynaptically connected RGC. The same SC neuron can receive both strong and weak RGC inputs.
  
  In summary, we thank the reviewer for bringing up this important question, and we believe that our additional analyses shed light on the relationship between firing rate and efficacy. This result is very interesting, and we include these findings in the updated Figure 2 in the revised manuscript (panel 2G), where they replace the peak-latency panel. Moreover, we now also address this point in the results and discussion sections of the revised manuscript (line 280 and line 525).
  
  My third concern is the assessment of functional similarity in Fig. 3. It is not clear to me why the similarity value was taken by the arithmetic mean. For example, even if the responses are identical for one connected pair that exclusively responds either to the ON or OFF sparse noise, the maximal value can only be 0.67. Perhaps I misunderstood something.
  
  We thank the reviewer for asking for clarification of how the similarity index is calculated, and we apologize for any confusion caused by our description. To clarify, the similarity index was calculated specifically between the responses of the RGC and the responses of the postsynaptic SC neuron, rather than between the neurons and the visual stimulus. As a result, the similarity index reflects the degree of similarity in the responses of the connected pairs. Therefore, if the responses of the RGC and the connected postsynaptic neuron are identical, regardless of whether they respond exclusively to ON, only to OFF, or a mixture of ON-OFF, the similarity index will be one. We have updated the relevant part in the methods section to make this point clearer to the reader (line 917).
  
  Secondly, correlations in natural(istic) movies can differ dramatically depending on the frame rate that the movie was acquired and the way it is displayed to the animal. What looks natural to us will elicit several artifacts at a retinal level, e.g., due to big jumps between frames (no direction-selective response) or overall little modulation (large spatial correlations). I would rather opt for uniform stimuli, as suggested previously. Of course, these are also approximations but can be easily reproduced by different labs and are not subjected to the intricacies of the detailed naturalistic stimulus used.
  
  We agree with the reviewer that spatiotemporal correlations of naturalistic stimuli are complex. To address this point, we added two stimuli with little spatiotemporal correlations to the similarity analysis. The first stimulus we added is a phase scrambled version of the natural movie (PSM, also taken from Froudarakis et al. (2014)). The second is a binary white noise checkerboard stimulus. These stimuli were presented randomly interleaved with the natural movie, for 30 trials each. The similarity index analysis revealed that even with uniform stimuli included, the average similarity index is correlated to the efficacy. We show this data now in Figure 3.
  
  Fourth. It is important to control the proportion of inhibitory cells activated optogenetically across the recording probe. Currently, it is not possible to assess if there are false negatives. One way of controlling for this would be to show that the number of inhibitory interneurons doesn't vary across the probe.
  
  We thank the reviewer for highlighting this important aspect of the experiment and analysis. We are aware of this point and therefore took extra care to minimize the biases that could be introduced by our recording and stimulation method. Our approach to including recorded excitatory and inhibitory neurons was conservative. Briefly:
  
  We included only excitatory and inhibitory neurons that were within the SC, defined by visually driven activity and continuous retinotopy (see method).
  
  We further restricted the included neurons to neurons that were located within the boundaries of the LED evoked responses, i.e. the recording channels with optogenetic evoked MUA responses within the SC (Figure 1 – figure supplement 1).
  
  Both excitatory and inhibitory SC neurons were selected in this way.
  
  These inclusion criteria were specifically designed to avoid sampling excitatory neurons from regions on the Neuropixels probe that lacked optogenetically evoked responses and thus to minimize the number of falsely labeled excitatory neurons.
  
  To illustrate these inclusion criteria and the resulting spatial distribution of the selected excitatory and inhibitory SC neurons along the 384 channels of the Neuropixels probe, we now added a supplementary figure (Figure 1 – figure supplement 1). This figure shows the multi-unit activity in response to optogenetic stimulation and the distribution of inhibitory and excitatory single units within the range of channels that are activated via LED stimulation for 3/11 selected experiments. This highlights that we employed stringent criteria for determining the boundaries and selecting which neurons to include in our study. The distribution of excitatory and inhibitory SC neurons is not significantly different for 9/11 experiments (Wilcoxon rank-sum test, p values = 0.307, 0.0115, 0.755, 0.834, 5.01 × 10⁻⁶, 0.79, 0.80, 0.26, 0.33, 0.08, 0.13). Moreover, in the two significantly different experiments only 2 RGC-SC EXC pairs were located in the region without identified SC INs, and thus will not affect the results. We now address this point in the methods section (line 859).
  
  Fifth. In Fig. 4, the ISI had a minimal bound of 5 ms. Why? This would cap the firing rate at 200Hz, but we know that RGC in explants can fire at higher frequencies for evoked responses. I would set a lower bound since it should come naturally from the after-depolarization block.
  
  The chosen 5 ms minimal bound was in the range used in previous literature, e.g. 4-30 ms in Usrey et al. (1998). To address the question of the reviewer, we re-analyzed the data with a lower bound of 2 ms (2 – 30 ms) to include RGCs that fire at frequencies higher than 200 Hz. However, we did not observe a clear difference between the 2-30 and 5-30 ms groups for inhibitory connections (SC IN: p = 0.604). Only the excitatory connections show a statistically significant difference (p = 0.011); however, the effect size is small (Cohen’s d: EXC = 0.063, INH = 0.030). Nonetheless, we updated a panel in Figure 4 to represent the 2-30 ms group (Figure 4F).
  
  Another aspect that remains unclear is to what extent the paired-spike ratio depends on the baseline firing rate. This would change the interpretation from the particular synaptic connection to the intrinsic properties of the cell and is plausible since the baseline firing rate varies tremendously.
  
  To address how the paired-spike ratio depends on the baseline firing rate we plotted the change of PSR depending on ISI as suggested by the reviewer.
  
  One related analysis would be to plot the change of PSR depending on the ISI. It would be intuitive to make a scatter plot for all paired spikes of all recorded neurons (separated into inhibitory and excitatory) of ISI vs. PSR.
  
  We appreciate the valuable suggestion from the reviewer. We have now separated the ISIs into distinct groups spanning 5 ms intervals represented in Author response image 3, right. These intervals range from 5-10 ms up to 25-30 ms. Notably, we observe a difference between the excitatory and inhibitory populations. The excitatory population exhibits a monotonic decrease in mean PSR across the intervals, while the inhibitory population shows a peak around 10/15 ms.
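
  The grouping of paired spikes by ISI can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function names, the requirement that the first spike be preceded by 30 ms of silence (`dead_time`), and the half-open bin edges are hypothetical choices made for the example. Spike times are in milliseconds, and the five bins mirror the 5-10 ms up to 25-30 ms groups mentioned above.

```python
import numpy as np

# ISI groups in ms, as described in the response (5-10 ms up to 25-30 ms).
ISI_BINS = [(5.0, 10.0), (10.0, 15.0), (15.0, 20.0), (20.0, 25.0), (25.0, 30.0)]

def select_spike_pairs(pre_spikes, isi_min=5.0, isi_max=30.0, dead_time=30.0):
    """Return (first, second) presynaptic spike times that form a pair.

    A pair is kept when the second spike follows the first by an ISI in
    [isi_min, isi_max) and the first spike is preceded by at least
    `dead_time` ms without presynaptic spikes (hypothetical criterion).
    """
    pre = np.sort(np.asarray(pre_spikes, dtype=float))
    if pre.size < 2:
        return np.array([]), np.array([])
    isi = np.diff(pre)  # isi[i] = pre[i + 1] - pre[i]
    # Silence before each candidate first spike; the earliest spike has none.
    preceding = np.concatenate(([np.inf], isi[:-1]))
    keep = (isi >= isi_min) & (isi < isi_max) & (preceding >= dead_time)
    return pre[:-1][keep], pre[1:][keep]

def count_pairs_per_bin(pre_spikes, bins=ISI_BINS):
    """Number of spike pairs falling in each ISI bin."""
    return {b: select_spike_pairs(pre_spikes, b[0], b[1])[0].size for b in bins}

def paired_spike_ratio(eff_first, eff_second):
    """PSR = efficacy of the second spike / efficacy of the first spike."""
    return eff_second / eff_first
```

  For each bin, the efficacy would then be computed separately for the selected first and second spikes and their ratio taken as the PSR for that ISI group. Passing `isi_min=2.0` reproduces the wider 2-30 ms selection discussed earlier.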
  
  Author response image 3.
  
  Change of mean paired-spike ratio (PSR) depending on ISI. Left) Comparison of PSR between two groups of different ISIs. The 2-30 ms group ensures that high-firing RGCs are included (excitatory pairs 2-30 vs 5-30 ms p = 0.011; inhibitory pairs 2-30 vs 5-30 ms p = 0.604, Wilcoxon signed-rank). Right) PSR for groups of different ISI intervals. Mean PSR ± SEM for excitatory groups: 2.0±0.09, 1.75±0.09, 1.51±0.05, 1.31±0.05, 1.2±0.05; inhibitory groups: 1.35±0.06, 1.51±0.09, 1.5±0.1, 1.22±0.06, 1.21±0.07. p E vs I (within group): 1.55 × 10⁻⁵, 9.55 × 10⁻², 4.21 × 10⁻¹, 3.74 × 10⁻¹, 6.22 × 10⁻¹, Wilcoxon rank-sum test.
  
  Panel 4E is confusing to me. Here what is plotted is efficacy 1st against PSR (which is efficacy 2nd/efficacy 1st). Given that you have a linear relation between efficacy 1st and efficacy 2nd (panel 4C), you are essentially re-plotting the same information, which should necessarily have a hyperbolic relationship: [ f(x) = y/x ]. Thus, fitting this with a linear function makes no sense and it has to be decaying if efficacy 2nd > efficacy1st as shown in 4C.
  
  We thank the reviewer for raising this question, which helped us to improve the representation and description of the results shown in Figure 4. Panel 4E is intended to investigate whether there is a correlation between the efficacy strength (eff 1st) and the amount of facilitation (PSR). From panel 4C it is already evident that the data points for high efficacies lie closer to the unity line than the data points for low efficacies. This suggests that the PSR is stronger for connections with smaller efficacies 1st. To quantify this relationship, we have plotted the efficacy 1st vs the PSR in panel 4E, which thus adds new information to the figure. Importantly, this panel is shown in log-log scales, and therefore the decaying relationship is not evident. If we had shown the data on a linear-linear scale, the decaying function would have been evident (Author response image 4). And indeed, as the reviewer pointed out, we cannot fit a hyperbolic relationship with a linear function. This is exactly the reason why we show the data in log-log scale and also estimate the Pearson correlation from the logs of the efficacies and PSRs.
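
  The log-log argument can be made concrete with a short sketch. It assumes, purely for illustration, that the PSR follows a power law of the first-spike efficacy, PSR = a · eff^k with k < 0; this functional form and the function name `loglog_fit` are assumptions and are not stated in the manuscript. Under that assumption log(PSR) = log(a) + k · log(eff), so a decaying curve on linear axes becomes a straight line on log-log axes, and a Pearson correlation computed on the logs is meaningful.

```python
import numpy as np

def loglog_fit(eff_first, psr):
    """Slope of a straight-line fit and Pearson r, both computed on log10 values.

    Returns (slope, r). A negative slope means that the PSR decays with
    increasing first-spike efficacy.
    """
    x = np.log10(np.asarray(eff_first, dtype=float))
    y = np.log10(np.asarray(psr, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)  # least-squares line in log-log space
    r = np.corrcoef(x, y)[0, 1]              # Pearson correlation of the logs
    return float(slope), float(r)

# Synthetic example (not real data): efficacies spanning two decades and a
# PSR that decays as a power law with multiplicative noise.
rng = np.random.default_rng(0)
eff = 10 ** rng.uniform(-3, -1, size=200)
psr = 0.5 * eff ** -0.2 * 10 ** rng.normal(0.0, 0.05, size=200)
slope, r = loglog_fit(eff, psr)
```

  On linear axes the same synthetic data trace a decaying curve that a straight line cannot capture, which is the reviewer's point; the log transform is what makes the linear summary appropriate.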
  
  In Author response image 4 we show the relationship plotted on a linear scale, using a hyperbolic cosecant function, a / sinh(b · x) + c, to fit the hyperbolic relationship.
  
  Author response image 4.
  
  Relationship between efficacy to 1st RGC and PSR visualized on linear scale using a hyperbolic fitting approach, a / sinh(b · x) + c.
  
  Finally, in Figure 5, the perspective is inverted, and the spike correlations are seen from the perspective of SC neurons. Here it would also be good to plot the cumulative histograms and not look at the averages.
  
  We added the cumulative histogram in Figure 5 (panel B), in addition to the raw data points and the mean.
  
  Regarding the similarity index and use of natural stats, please see my previous comments. Also, would it be possible to plot the contribution v/s the firing rate with the baseline firing rate with no stimulation or full-field stimulation? This is important since naturalistic movies have too many correlations and dependencies that make this plot difficult to interpret.
  
  We now show the contribution vs firing rates for different stimulus conditions in a new figure supplement (Figure 5 – figure supplement 1). We added the correlations to the different stimuli for baseline firing rate with no stimulation (gray background), full-field stimulation (checkerboard) and phase scrambled natural movie.
  
  Overall, the paper only speaks from excitatory and inhibitory differences in the introduction and results. However, it is known that there are three clear morphologically distinct classes of excitatory neurons (wide-field, narrow-field, and stellate). This topic is touched in the discussion but not directly in the context of these results. Smaller cells might likely be driven much stronger. Wide-field cells would likely not be driven by one RGC input only and will probably integrate from many more cells than 6.
  
  We thank the reviewer for this comment. We agree with the reviewer that addressing how the different excitatory and inhibitory cell-types integrate RGC input is important to understand the visual processing mechanisms in the SC. The presented study aimed at comparing the excitatory and inhibitory population in general using the VGAT-ChR2 mouse line. Understanding how specific genetically defined cell-types integrate RGC inputs is clearly very interesting and should be done. Unfortunately, the mouse lines that would allow targeting genetically identified inhibitory cell-types are still limited and therefore we can only use functional measurements to assess different types of neurons in the SC. We now address this point about distinct SC cell-types in the discussion (line 643).
  
  One possible functional measurement is the size of the receptive field, which, to some degree, could be used as a proxy for different morphologies, i.e. small receptive fields could hint towards compact morphology while large receptive fields could indicate a wider morphology. It is known for example that narrow-field and stellate cells have small RF sizes, while wide-field cells have large RFs. We studied the relationship between the RF size and spike waveform duration but did not find a significant correlation (Figure R6). Moreover, the spike waveform duration, as discussed in the manuscript, is not a valid criterion to separate EXNs and INs in the SC, as is common practice in the cortex. We now also looked into whether the connectivity strength is related to the RF size. Interestingly, while in the current dataset we do not find a significant correlation between the efficacy and the receptive field size for either EXNs or INs (Author response image 5, left), we do find a significant negative correlation between contribution and receptive field size for the excitatory neurons (Author response image 5, right). This result indicates that SC excitatory neurons with small receptive fields are more strongly coupled to the RGC input as compared to neurons with larger receptive fields.
  
  Author response image 5.
  
  Relationship between RF size and connectivity measures (efficacy and contribution) for RGC-SC EXN and RGC-SC IN pairs (two-sided Wilcoxon rank-sum test).
  
  Reviewer #2 (Public Review):
  
  This study follows up on a previous study by the group (Sibille et al Nature Communications 2022) in which high density Neuropixel probes were inserted tangentially through the superficial layers of the superior colliculus (SC) to record the activity of retinocollicular axons and postsynaptic collicular neurons in anesthetized mice. By correlating spike patterns, connected pairs could be identified which allowed the authors to demonstrate that functionally similar retinal axon-SC neuron pairs were strongly connected.
  
  In the current study, the authors use similar techniques in vGAT-ChR2 mice and add a fiber optic to identify light-activated GABAergic and non-light-activated nonGABAergic neurons. Using their previously verified techniques to identify connected pairs, within regions of optogenetic activation they identified 214 connected pairs of retinal axons and nonGABAergic neurons and 91 pairs of connected retinal axons and GABAergic neurons. The main conclusion is that retinal activity contributed more to the activity of postsynaptic nonGABAergic SC neurons than to the activity of postsynaptic GABAergic SC neurons.
  
  The study is very well done. The figures are well laid out and clearly establish the conclusions. My main comments are related to the comparison to other circuits and further questions that might be addressed in the SC.
  
  It is stated several times that the superior colliculus and the visual cortex are the two major brain areas for visual processing and these areas are compared throughout the manuscript. However, since both the dorsal lateral geniculate nucleus (dLGN) and SC include similar synaptic motifs, including triadic arrangements of retinal boutons with GABAergic and nonGABAergic neurons, it might be more relevant to compare and contrast retinal convergence and other features in these structures.
  
  Thank you for raising this crucial point. Indeed, the comparison to the thalamus is a valid argument, as both the SC and LGN are primary targets of RGC axon terminals. During the preparation of the manuscript, we extensively discussed whether it is more appropriate to compare our new SC dataset with the existing literature on the LGN or on the primary visual cortex (V1). Ultimately, we decided to use the visual cortex as the main comparison for the following reasons:
  
  The SC is widely recognized as an evolutionarily conserved circuit for visual computation and visually guided behaviors, while the dLGN is generally regarded as a relay station for RGC information to the visual cortex (Steriade, McCormick, 1997). Thus, we believe it is more relevant to compare the evolutionarily older visual circuit (SC) to the evolutionarily newer visual circuit (visual cortex).
  
  In the mouse, the dLGN contains only a limited number of inhibitory interneurons, which represent only approximately 6% of the total dLGN neuronal population (Butler, 2008; Evangelio et al., 2018). It has been suggested that the rodent somatosensory thalamus even lacks interneurons (Arcelli et al., 1997). Consequently, directly comparing inhibitory interneurons in the SC to those in the dLGN would pose challenges.
  
  Along the same lines, the density and also the diversity of inhibitory neurons in the SC are high and likely more comparable to those of inhibitory neurons in the visual cortex than to those in the dLGN circuit. In the dLGN, TC projection neurons far outnumber inhibitory neurons (Arcelli et al., 1997; Evangelio et al., 2018) and the dLGN is inhabited by just 1-2 classes of GABAergic retinorecipient interneurons (Arcelli et al., 1997; Jaubert-Miazza et al., 2005; Krahe et al., 2011; Ling et al., 2012). Classification approaches (e.g. 3D reconstruction) so far have not revealed any subclasses except for distinctions in intrinsic membrane properties (Leist et al., 2016), suggesting low interneuron diversity in the dLGN. This is in contrast to the vLGN, where a recent study found a diversity of GABAergic neurons (Sabbagh et al., 2021).
  
  In the thalamo-cortical circuit, there exists a notable difference in how cortical excitatory and cortical inhibitory neurons are driven by their thalamic input (Alonso and Swadlow, 2005; Cruikshank et al., 2007). This discrepancy forms the basis for several models of visual processing in the visual cortex (Kremkow et al., 2016; Taylor et al., 2021). This is why we wanted to assess whether the SC follows similar or different rules.
  
  That said, the reviewer is correct that the dLGN and the SC share certain wiring motifs, such as the triadic arrangements of retinal boutons. Unfortunately, the VGAT-ChR2 mouse line used in our study does not specifically label SC inhibitory neurons that are involved in the formation of triadic arrangements. Therefore, we are unable to draw specific conclusions regarding this point. To further investigate this aspect, the use of GAD67 mice, which have been shown to selectively label intrinsic interneurons that receive RGC input and contact non-GABAergic dendrites (Whyland et al., 2020), would be necessary. Nonetheless, we acknowledge the question raised by the reviewer and, in response, we have now provided a more in-depth comparison to the dLGN in the discussion section of the revised manuscript (line 565).
  
  The GABAergic and nonGABAergic neurons showed a wide range of firing rates. It might be interesting to sort the cells by firing rates to see if they exhibit different properties. For example, since the SC contains both GABAergic interneurons and projection neurons it would be interesting to examine whether GABAergic neurons with higher firing rates exhibit narrower spikes, similar to cortical fast spiking interneurons. Similarly, it might be of interest to sort the neurons by their receptive field sizes since this is associated with different SC neuron types.
  
  We thank the reviewer for the interesting suggestions for classifying SC neurons into different categories. The relationship between connectivity measures and RF size has been addressed in Author response image 5. We have now studied the relationship of spike waveforms and several measures such as firing rate and RF size in more detail (Author response image 6).
  
  As the baseline firing is generally low in SC and our experiments were performed under anesthetized conditions, we used the evoked firing rates to sort the cells by firing rates or RF sizes. We have added an analysis showing the mean firing rate (calculated over the full recording duration) as a function of the spike width (peak-to-trough duration). We observe no significant relationship between the different groups of cell types. The same holds if we sort the SC neurons by their RF size. RF sizes were calculated from PSTHs and summed RF for SL and SD. We do not see a relationship between neuron type and firing or RF size.
  
  Author response image 6.
  
  Mean firing rate (left) and RF size (right) as a function of peak-to-trough (PT) duration for excitatory and inhibitory SC neurons. Neither measure is correlated with the PT duration (Pearson correlation coefficient, two-sided Wilcoxon rank-sum test).
  
  The recording techniques allowed for the identification of the distance between connected retinocollicular fibers and postsynaptic neurons. It might also be interesting to compare the properties of connected pairs recorded at dorsal versus ventral locations since neurons with different genetic identities and response properties are located in different dorsal/ventral locations (e.g. Liu et al. Neuron 2023). Also, regarding the strength of connections, previous electron microscopy studies have shown that the retinocollicular terminals differ in density and size in the dorsal/ventral dimension (e.g Carter et al JCN 1991).
  
  We thank the reviewer for raising the interesting and relevant point of comparing the properties of the connected pairs across dorsal and ventral locations. Unfortunately, our tangential recording approach is not ideally suited for comparing the properties of neurons across the different SC depths. For comparing dorsally versus ventrally located neurons in the SC, as done in Liu et al., Neuron 2023, vertical recordings would be more appropriate. We now provide a discussion on this aspect (line 589).
  
  Was optogenetic activation of GABAergic neurons ever paired with visual activation? It would be interesting to examine the receptive fields of the nonGABAergic neurons before and after activation of the GABAergic neurons (as in Gale and Murphy J Neurosci 2016).
  
  This is an important point and indeed we have paired activation of GABAergic neurons with visual stimulation (checkerboard stimulus) to assess the impact of the GABAergic neurons on the firing of the excitatory neurons. We observed a diversity of effects, with some EXNs being strongly suppressed and others being only weakly suppressed. Thus, we predict that the receptive field of those EXN that are suppressed by optogenetically evoked IN firing, should be affected in some way. However, the checkerboard stimulus was only presented for a short duration (1 s) and for only a few trials (n = 30). Therefore, estimating the receptive fields of EXN before and after optogenetic activation of GABAergic neurons is unfortunately not possible with the existing dataset. We now mention this point in the discussion (line 668).
  
  Reviewer #3 (Public Review):
  
  This study performs in vivo recordings of neurons in the mouse superior colliculus and their afferents from the retina, retinal ganglion cells (RGCs). Building on a preparation they previously published, this study adds the use of optogenetic identification of inhibitory neurons (aka optotagging) to compare RGC connectivity to excitatory and inhibitory neurons in SC. Using this approach, the authors characterize connection probability, strength, and response correlation between RGCs and their target neurons in SC, finding several differences from what is observed in the retina-thalamus-visual cortex pathway. As such, this may be a useful dataset for efforts to understand retinocollicular connectivity and computations.
  
  Recommendations:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Some minor points.
  
  Fig.1G shows a difference in mean firing rates between inhibitory and excitatory cells. Please plot the cumulative distribution of firing rates to be able to scrutinize the data better.
  
  We have addressed this issue and updated panel G in Figure 1.
  
  Fig. 2C. The background of this plot is black; it is not possible to decipher much. Please change it to white.
  
  We have now changed panel C in Figure 2 to a white background.
  
  Fig. 4D would be better represented as a histogram since most points overlap.
  
  We now represent panel D in Figure 4 as a histogram.
  
  Citations. I would cite some of the foundational work, in some instances, e.g., in the first sentence (SC receives input from the retina)
  
  We have now addressed this issue and cited more foundational studies (e.g. line 68)
  
  The discussion is a bit long; the last paragraph can be removed, mainly because the previous section conflates superficial SC with the entire SC, which is confusing (e.g., Ayupe et al.). In this way, there is more space to discuss the direct implication of the study within the context of known cell types.
  
  We now shortened the discussion and provide more background about different SC cell types in the discussion (line 643).
  
  Reviewer #2 (Recommendations For The Authors):
  
  Minor correction: Whyland et al 2020 did not identify V1 input to horizontal cells. A more appropriate reference is Zingg et al Neuron 2017.
  
  We thank the reviewer for this important point and have now corrected the citation in line 613 in the discussion to Zingg et al 2017.
  
  Reviewer #3 (Recommendations For The Authors):
  
  Regarding the degree of convergence from RGC to SC, the Crair lab (Furman 2013) performed a quantal analysis in slice that is worth citing.
  
  We included this citation in the revised version of the manuscript (line 501).
  
  I have lost track at this point, but many labs (Heimel, Meister, Farrow, Cang, Isa, maybe others?) have observed that neighboring SC neurons have similar tuning for direction/orientation, but the circuit mechanisms are not well understood. Given the relatively weak correlation between response tuning of RGC axons and their SC target neurons, a useful comparison might be that of SC neurons and their neighbors, and whether SC neurons that show weaker correlation to their RGC axons show stronger correlations with their SC neighbors, which could implicate local connectivity within SC.
  
  We thank the reviewer for providing this interesting comment. With our recording approach we could study locally connected SC neurons. However, the focus of our study was to first characterize the retinocollicular connectivity, and investigating the intracollicular connectivity is therefore beyond the scope of the current study. We thank the reviewer for the valuable suggestion and will consider tackling this aspect in a separate study in the future.
  
  Is it possible any of these measurements are biased by laminar targeting of their probe within superficial SC? Their schematic seems to suggest they targeted the deeper part of superficial SC. Do they know whether they recorded throughout superficial SC or targeted the deeper layers closer to stratum opticum?
  
  Our recordings lie between the deeper and upper visual SC layers, depending on the recording site on the Neuropixels probe, as we use an angled insertion approach. Besides DiI staining (Author response image 7), we can estimate the location of the probe using functional measurements, i.e. visually driven channels and retinotopic locations of the recording sites. If the Neuropixels probe is inserted too superficially, the number of recording sites with visually driven activity is low. If the Neuropixels probe is inserted too deep in the visual layers, we see two separated regions on the probe with visually driven activity in which the retinotopy is non-continuous (please refer to Figure 2 in Sibille et al., 2022). In the recordings included in this study, the number of visually driven channels was generally high and the retinotopy was continuous, suggesting that we covered a region within the deeper and upper visual layers.
  
  Author response image 7.
  
  Functional estimation of probe location. DiI staining of Neuropixels probe (middle) and multi-unit activity across channels in response to visual stimulation (bottom). The white dashed lines in the middle and bottom panels mark the rough boundaries of the visual SC layers.
  
  In Fig. 4, the authors argue that firing in inhibitory neurons is less correlated with RGC input. Does their metric for contribution of retinal input control for the fact that inhibitory neurons have higher firing rates overall and, e.g., may be more depolarized at rest and likelier to fire spontaneous spikes but no less likely to be driven by retina? Or is the argument that their visual responses are more likely to be driven by V1 or local connections?
  
  We thank the reviewer for bringing up that point. The contribution measure estimates the fraction of SC spikes that were preceded by an RGC spike and it is thus, in theory, independent of the firing rate of the SC neuron. In practice, however, we agree that high firing SC neurons may be more likely to have a lower contribution value simply because a larger fraction of their spikes is not preceded by the activity of the presynaptic RGC. But this is exactly what we aimed at characterizing with this analysis. Where these non-RGC driven SC spikes originate from, whether from a more depolarized state of the neuron or by other sources such as V1 or local connections, we can only speculate about. That said, please note that despite SC INs having higher firing rates, not all of them show low contribution. Likewise, we also see SC neurons with low firing rates and low contribution values (new Supp Fig. 3).
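
  To make the contribution measure concrete, the following is a minimal sketch and not the authors' analysis code. It assumes spike times in seconds and a hypothetical lag window (`min_lag`, `max_lag`) within which an RGC spike counts as preceding an SC spike; the function name and the window bounds are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

def contribution(rgc_spikes, sc_spikes, min_lag=0.0005, max_lag=0.003):
    """Fraction of SC spikes preceded by an RGC spike within [min_lag, max_lag] seconds.

    The window bounds are hypothetical defaults, not values taken from the study.
    """
    rgc = sorted(rgc_spikes)
    if not sc_spikes:
        return float("nan")
    preceded = 0
    for t in sc_spikes:
        # Count RGC spikes that fall in [t - max_lag, t - min_lag].
        lo = bisect_left(rgc, t - max_lag)
        hi = bisect_right(rgc, t - min_lag)
        if hi > lo:
            preceded += 1
    return preceded / len(sc_spikes)
```

  With this definition, adding SC spikes that are not preceded by an RGC spike lowers the value, which is the firing-rate dependence discussed in the response above.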
  
  Minor point: The optotagging in the example cell doesn't cause the cell to fire for ~50 ms? That is odd. Typically, cells classified as optotagged fire within 5-10 ms of light onset. Is that a strange example cell or is there something different about the optotagging approach?
  
  Unfortunately, transient LED light onsets and offsets can induce light artifacts on Neuropixels probes (Jun et al., 2017; Steinmetz et al., 2021) and therefore it is challenging to use brief LED pulses for optotagging with Neuropixels probes. To avoid this overlap of artefacts and LED evoked spikes, we opted for a longer stimulus duration of 100 ms to activate VGAT neurons (Bennett et al., 2019; Siegle et al., 2019). Moreover, instead of a square pulse, we used a slow ramping for light onsets and offsets to minimize the magnitude of induced artifacts. In Author response image 8 we present examples of individual activated VGAT neurons responding to a 100 ms blue light pulse.
  
  Author response image 8.
  
  Optotagging approach. Example traces of a single stimulation pulse and protocol used for optogenetic stimulation. Evoked activity in response to LED stimulation (100ms, 100 trials) for six example SC IN neurons.
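
  The ramped light pulse described in the response above could be generated along the following lines. This is a sketch under stated assumptions: only the 100 ms duration comes from the text, while the raised-cosine ramp shape, the 10 ms ramp length, the 1 ms time step and the function name are hypothetical choices for illustration.

```python
import math

def ramped_pulse(duration_ms=100.0, ramp_ms=10.0, dt_ms=1.0, peak=1.0):
    """Return a light-command waveform with smooth onset and offset ramps.

    duration_ms matches the 100 ms pulse in the text; ramp_ms, dt_ms and the
    raised-cosine shape are hypothetical and chosen only for illustration.
    """
    n = int(round(duration_ms / dt_ms)) + 1
    wave = []
    for i in range(n):
        t = i * dt_ms
        if t < ramp_ms:
            # Smooth rise from 0 to the plateau.
            level = 0.5 * (1 - math.cos(math.pi * t / ramp_ms))
        elif t > duration_ms - ramp_ms:
            # Smooth fall back to 0.
            level = 0.5 * (1 - math.cos(math.pi * (duration_ms - t) / ramp_ms))
        else:
            level = 1.0
        wave.append(peak * level)
    return wave
```

  Compared with a square pulse, whose command jumps by the full amplitude in a single step, the largest sample-to-sample change here is a small fraction of the peak, which is the property intended to reduce onset and offset artifacts.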
  
  References
  
  Alonso J-M, Swadlow HA. 2005. Thalamocortical specificity and the synthesis of sensory cortical receptive fields. J Neurophysiol 94:26–32. doi:10.1152/jn.01281.2004
  
  Arcelli P, Frassoni C, Regondi MC, De Biasi S, Spreafico R. 1997. GABAergic neurons in mammalian thalamus: a marker of thalamic complexity? Brain Res Bull 42:27–37. doi:10.1016/s0361-9230(96)00107-4
  
  Bennett C, Gale SD, Garrett ME, Newton ML, Callaway EM, Murphy GJ, Olsen SR. 2019. Higher-Order Thalamic Circuits Channel Parallel Streams of Visual Information in Mice. Neuron 102:477–492.e5. doi:10.1016/j.neuron.2019.02.010
  
  Butler AB. 2008. Evolution of the thalamus: a morphological and functional review. Thalamus & Related Systems 4:35–58. doi:10.1017/S1472928808000356
  
  Cruikshank SJ, Lewis TJ, Connors BW. 2007. Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat Neurosci 10:462–468. doi:10.1038/nn1861
  
  Evangelio M, García-Amado M, Clascá F. 2018. Thalamocortical Projection Neuron and Interneuron Numbers in the Visual Thalamic Nuclei of the Adult C57BL/6 Mouse. Frontiers in Neuroanatomy 12.
  
  Froudarakis E, Berens P, Ecker AS, Cotton RJ, Sinz FH, Yatsenko D, Saggau P, Bethge M, Tolias AS. 2014. Population code in mouse V1 facilitates readout of natural scenes through increased sparseness. Nat Neurosci 17:851–857. doi:10.1038/nn.3707
  
  Jaubert-Miazza L, Green E, Lo F-S, Bui K, Mills J, Guido W. 2005. Structural and functional composition of the developing retinogeniculate pathway in the mouse. Vis Neurosci 22:661–676. doi:10.1017/S0952523805225154
  
  Jun JJ, Steinmetz NA, Siegle JH, Denman DJ, Bauza M, Barbarits B, Lee AK, Anastassiou CA, Andrei A, Aydın Ç, Barbic M, Blanche TJ, Bonin V, Couto J, Dutta B, Gratiy SL, Gutnisky DA, Häusser M, Karsh B, Ledochowitsch P, Lopez CM, Mitelut C, Musa S, Okun M, Pachitariu M, Putzeys J, Rich PD, Rossant C, Sun W, Svoboda K, Carandini M, Harris KD, Koch C, O’Keefe J, Harris TD. 2017. Fully integrated silicon probes for high-density recording of neural activity. Nature 551:232–236. doi:10.1038/nature24636
  
  Krahe TE, El-Danaf RN, Dilger EK, Henderson SC, Guido W. 2011. Morphologically Distinct Classes of Relay Cells Exhibit Regional Preferences in the Dorsal Lateral Geniculate Nucleus of the Mouse. J Neurosci 31:17437–17448. doi:10.1523/JNEUROSCI.4370-11.2011
  
  Kremkow J, Perrinet LU, Monier C, Alonso J-M, Aertsen A, Frégnac Y, Masson GS. 2016. Push-Pull Receptive Field Organization and Synaptic Depression: Mechanisms for Reliably Encoding Naturalistic Stimuli in V1. Frontiers in Neural Circuits 10.
  
  Leist M, Datunashvilli M, Kanyshkova T, Zobeiri M, Aissaoui A, Cerina M, Romanelli MN, Pape H-C, Budde T. 2016. Two types of interneurons in the mouse lateral geniculate nucleus are characterized by different h-current density. Sci Rep 6:24904. doi:10.1038/srep24904
  
  Ling C, Hendrickson ML, Kalil RE. 2012. Morphology, Classification, and Distribution of the Projection Neurons in the Dorsal Lateral Geniculate Nucleus of the Rat. PLOS ONE 7:e49161. doi:10.1371/journal.pone.0049161
  
  Sabbagh U, Govindaiah G, Somaiya RD, Ha RV, Wei JC, Guido W, Fox MA. 2021. Diverse GABAergic neurons organize into subtype-specific sublaminae in the ventral lateral geniculate nucleus. J Neurochem 159:479–497. doi:10.1111/jnc.15101
  
  Sibille J, Gehr C, Teh KL, Kremkow J. 2022. Tangential high-density electrode insertions allow to simultaneously measure neuronal activity across an extended region of the visual field in mouse superior colliculus. J Neurosci Methods 376:109622. doi:10.1016/j.jneumeth.2022.109622
  
  Siegle JH, Jia X, Durand S, Gale S, Bennett C, Graddis N, Heller G, Ramirez TK, Choi H, Luviano JA, Groblewski PA, Ahmed R, Arkhipov A, Bernard A, Billeh YN, Brown D, Buice MA, Cain N, Caldejon S, Casal L, Cho A, Chvilicek M, Cox TC, Dai K, Denman DJ, de Vries SEJ, Dietzman R, Esposito L, Farrell C, Feng D, Galbraith J, Garrett M, Gelfand EC, Hancock N, Harris JA, Howard R, Hu B, Hytnen R, Iyer R, Jessett E, Johnson K, Kato I, Kiggins J, Lambert S, Lecoq J, Ledochowitsch P, Lee JH, Leon A, Li Y, Liang E, Long F, Mace K, Melchior J, Millman D, Mollenkopf T, Nayan C, Ng L, Ngo K, Nguyen T, Nicovich PR, North K, Ocker GK, Ollerenshaw D, Oliver M, Pachitariu M, Perkins J, Reding M, Reid D, Robertson M, Ronellenfitch K, Seid S, Slaughterbeck C, Stoecklin M, Sullivan D, Sutton B, Swapp J, Thompson C, Turner K, Wakeman W, Whitesell JD, Williams D, Williford A, Young R, Zeng H, Naylor S, Phillips JW, Reid RC, Mihalas S, Olsen SR, Koch C. 2019. A survey of spiking activity reveals a functional hierarchy of mouse corticothalamic visual areas (preprint). Neuroscience. doi:10.1101/805010
  
  Steinmetz NA, Aydin C, Lebedeva A, Okun M, Pachitariu M, Bauza M, Beau M, Bhagat J, Böhm C, Broux M, Chen S, Colonell J, Gardner RJ, Karsh B, Kloosterman F, Kostadinov D, Mora-Lopez C, O’Callaghan J, Park J, Putzeys J, Sauerbrei B, van Daal RJJ, Vollan AZ, Wang S, Welkenhuysen M, Ye Z, Dudman JT, Dutta B, Hantman AW, Harris KD, Lee AK, Moser EI, O’Keefe J, Renart A, Svoboda K, Häusser M, Haesler S, Carandini M, Harris TD. 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372:eabf4588. doi:10.1126/science.abf4588
  
  Taylor MM, Contreras D, Destexhe A, Frégnac Y, Antolik J. 2021. An Anatomically Constrained Model of V1 Simple Cells Predicts the Coexistence of Push–Pull and Broad Inhibition. J Neurosci 41:7797–7812. doi:10.1523/JNEUROSCI.0928-20.2021
  
  Usrey WM, Reppas JB, Reid RC. 1999. Specificity and Strength of Retinogeniculate Connections. Journal of Neurophysiology 82:3527–3540. doi:10.1152/jn.1999.82.6.3527
  
  Usrey WM, Reppas JB, Reid RC. 1998. Paired-spike interactions and synaptic efficacy of retinal inputs to the thalamus. Nature 395:384–387. doi:10.1038/26487
  
  Whyland KL, Slusarczyk AS, Bickford ME. 2020. GABAergic cell types in the superficial layers of the mouse superior colliculus. J Comp Neurol 528:308–320. doi:10.1002/cne.24754
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.04.07.536092v2
www.biorxiv.org www.biorxiv.org

Increased listening effort and cochlear neural degeneration underlie behavioral deficits in speech perception in noise in normal hearing middle-aged adults

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration (CND) on auditory processing in listeners with normal audiograms. This effort is important because ~10% of people who seek help for hearing difficulties have normal audiograms and current hearing healthcare has nothing to offer them.
  
  The authors identify two shortcomings in previous work that they intend to fix. The first is a lack of cross-species studies that make direct comparisons between animal models in which CND can be confirmed and humans for which CND must be inferred indirectly. The second is the low sensitivity of purely perceptual measures to subtle changes in auditory processing. To fix these shortcomings, the authors measure envelope following responses (EFRs) in gerbils and humans using the same sounds, while also performing histological analysis of the gerbil cochleae, and testing speech perception while measuring pupil size in the humans.
  
  The study begins with a comprehensive assessment of the hearing status of the human listeners. The only differences found between the young adult (YA) and middle-aged (MA) groups are in thresholds at frequencies > 10 kHz and DPOAE amplitudes at frequencies > 5 kHz. The authors then present the EFR results, first for the humans and then for the gerbils, showing that amplitudes decrease more rapidly with increasing envelope frequency for MA than for YA in both species. The histological analysis of the gerbil cochleae shows that there were, on average, 20% fewer IHC-AN synapses at the 3 kHz place in MA relative to YA, and the number of synapses per IHC was correlated with the EFR amplitude at 1024 Hz.
  
  The study then returns to the humans to report the results of the speech perception tests and pupillometry. The correct understanding of keywords decreased more rapidly with decreasing SNR in MA than in YA, with a noticeable difference at 0 dB, while pupillary slope (a proxy for listening effort) increased more rapidly with decreasing SNR for MA than for YA, with the largest differences at SNRs between 5 and 15 dB. Finally, the authors report that a linear combination of audiometric threshold, EFR amplitude at 1024 Hz, and a few measures of pupillary slope is predictive of speech perception at 0 dB SNR.
  
  I only have two questions/concerns about the specific methodologies used:
  
  (1) Synapse counts were made only at the 3 kHz place on the cochlea. However, the EFR sounds were presented at 85 dB SPL, which means that a rather large section of the cochlea will actually be excited. Do we know how much of the EFR actually reflects AN fibers coming from the 3 kHz place? And are we sure that this is the same for gerbils and humans given the differences in cochlear geometry, head size, etc.?
  
  Thank you for raising this important point. The frequency regions that contribute to the generation of EFRs, especially at the suprathreshold sound levels presented here, are expected to be broad, with a greater leaning towards higher frequencies and reaching up to one octave above the center frequency. We have investigated this phenomenon in earlier published articles using both low/high pass masking noise and computational models using data from rodent models and humans (Encina-Llamas et al. 2017; Parthasarathy, Lai, and Bartlett 2016). So, the expectation here is that the EFRs reflect a wider frequency region centered at 3 kHz. Differences in cochlear activation regions between humans and gerbils for EFRs have not been systematically studied to our knowledge, but given the general agreement between humans and other rodent models stated above, we expect the activation regions to be similar in gerbils as well. Additionally, all current evidence points to cochlear synapse loss with age being flat across frequencies, in contrast to cochlear synapse loss with noise, which depends on the bandwidth of the noise exposure.
  
  Histological evidence for this flat loss across frequencies is found in mice and human temporal bones (Parthasarathy and Kujawa 2018; Sergeyenko et al. 2013; Wu et al. 2018). We find this to be true in our gerbils as well. Author response image 1 shows the patterns of synapse loss as a function of cochlear place. We focused on synapse loss at 3 kHz to keep the analysis focused on the center frequency of the stimulus and minimize compounding errors due to averaging synapse counts across multiple frequency regions. We have now added some explanatory language in the discussion.
  
  Author response image 1.
  
  Cochlear synapse counts per inner hair cell (IHC) in young and middle-aged gerbils as a function of cochlear frequency.
  
  (2) Unless I misunderstood, the predictive power of the final model was not tested on held-out data. The standard way to fit and test such a model would be to split the data into two segments, one for training and hyperparameter optimization, and one for testing. But it seems that the only split was for training and hyperparameter optimization.
  
  The goal of the analysis in this current manuscript was inference, rather than prediction, i.e., to find the important/significant variables that contribute to speech intelligibility in noise, rather than predicting the behavioral deficit of speech performance in a yet-unforeseen sample of adults.
  
  Additionally, we used a repeated 10-fold cross-validation approach for our model building exercise, as detailed in the Elastic Net Regression section of the methods. This repeated cross-validation calculates the mean squared error on a held-out fold and averages it over repetitions to reduce the inherent variability of randomly choosing a validation set. The repeated 10-fold CV approach is both more stable and more efficient than a validation set approach (splitting the data into two segments, training and test), and provides a better estimate of the test error by utilizing more observations for training (see Chapter 5 of James et al. 2021). These predictive MSEs, along with the R-squared for the final model, give us a good idea of the predictive performance because, for the linear model, the R-squared is the squared correlation between the observed and the predicted response. Future studies with a larger sample size can facilitate having a designated test set and still have enough statistical power to perform predictive analyses.
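
  The repeated k-fold procedure described above can be sketched as follows. This is an illustration only, not the authors' pipeline: an ordinary least-squares line with one predictor stands in for the elastic net, and the function names, the number of repeats and the seed are assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (a stand-in for the elastic net)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def repeated_kfold_mse(xs, ys, k=10, repeats=5, seed=0):
    """Average held-out mean squared error over `repeats` random k-fold splits."""
    rng = random.Random(seed)
    n = len(xs)
    fold_errors = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Partition the shuffled indices into k disjoint held-out folds.
        folds = [idx[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out) / len(held_out)
            fold_errors.append(mse)
    # Averaging across folds and repeats damps the split-to-split variability.
    return sum(fold_errors) / len(fold_errors)
```

  Every observation is held out exactly once per repeat, so all data contribute to both fitting and error estimation, which is the efficiency argument made in the response.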
  
  While I find the study to be generally well executed, I am left wondering what to make of it all. The purpose of the study with respect to fixing previous methodological shortcomings was clear, but exactly how fixing these shortcomings has allowed us to advance is not. I think we can be more confident than before that EFR amplitude is sensitive to CND, and we now know that measures of listening effort may also be sensitive to CND. But where is this leading us? I think what this line of work is eventually aiming for is to develop a clinical tool that can be used to infer someone's CND profile. That seems like a worthwhile goal but getting there will require going beyond exploratory association studies. I think we're ready to start being explicit about what properties a CND inference tool would need to be practically useful. I have no idea whether the associations reported in this study are encouraging or not because I have no idea what level of inferential power is ultimately required.
  
  Studies with CND have so far been largely inferential in humans, since currently we cannot confirm CND in vivo. Hence any measures of putative CND in humans can only be interpreted based on evidence from other animal studies. Our translational approach is partly meant to address this deficit, as mentioned in the Introduction section. By using identical stimuli, recording, acquisition and analysis parameters we hope to reduce some of the variability that may be associated with this inference between human and other animal models. Until direct measurements of CND in humans are possible, the intended goal is to provide diagnostic biomarkers that have face validity – i.e., that explain variance related to speech intelligibility deficits in this population.
  
  We’ve added more to the discussion to state that our work demonstrates the need for next generation diagnostic measures of auditory processing that incorporate cognitive factors associated with listening effort to better capture speech in noise perceptual abilities.
  
  That brings me to my final comment: there is an inappropriate emphasis on statistical significance. The sample size was chosen arbitrarily. What if the sample had been half the size? Then few, if any, of the observed effects would have been significant. What if the sample had been twice the size? Then many more of the observed effects would have been significant (particularly for the pupillometry). I hope that future studies will follow a more principled approach in which relevant effect sizes are pre-specified (ideally as the strength of association that would be practically useful) and sample sizes are determined accordingly.
  
  We agree that pre-determining sample sizes is the optimal approach towards designing a study. The sample sizes here were chosen a priori based on previously published data in young adults with normal hearing thresholds (McHaney et al. 2024; Parthasarathy et al. 2020). With the lack of published literature especially for the EFRs at 1024Hz AM in middle aged adults, there are practical challenges in pre-determining the sample size (given a prefixed power and an effect size) with limited precursors to supply good estimates of the parameters (e.g., mean, s.d. for each age group for a two-sample test). We hope that this data set now shared will enable us and other researchers to conduct power analyses for successive studies that use similar metrics on this population.
  
  Several authors, including Heinsburg and Weeks (2022), argue that post-hoc power could be “misleading and simply not informative” and encourage using other indicators of poorly powered studies such as the width of the confidence interval. Since the elastic net estimate is a non-linear and non-differentiable function of the response values—even for fixed tuning parameters—it is difficult to obtain an accurate estimate of its standard error (Tibshirani and Taylor 2012). While acknowledging the limitations of post-hoc power analyses, we performed a retrospective power calculation for our linear model with the predictors that we selected (EFR @ 1024Hz, Pupil slope for QuickSIN at selected SNRs and analysis windows, and PTA). The calculated Cohen’s effect size was 0.56, which is considered large (Cohen 2013). With this effect size, a power analysis with our sample size revealed a very high retrospective power of 0.99 with a significance level of 0.05. The minimum number of subjects needed to get 80% power with this effect size was N = 21. Hence for the final model, we are confident that our results hold true with adequate statistical power.
  
  So, in summary, I think this study is a valuable but limited advance. The results increase my confidence that non-invasive measures can be used to infer underlying CND, but I am unsure how much closer we are to anything that is practically useful.
  
  Thank you for your comments. We hope that this study establishes a framework for the eventual development of the next generation of objective diagnostic tests in the hearing clinic that provide insights into the underlying neurophysiology of the auditory pathway and take into account top-down contributors such as listening effort.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper addresses the bottom-up and top-down causes of hearing difficulties in middle-aged adults with clinically-normal audiograms using a cross-species approach (humans vs. gerbils, each with two age groups) mixing behavioral tests and electrophysiology. The study is not only a follow-up of Parthasarathy et al (eLife 2020), since there are several important differences.
  
  Parthasarathy et al. (2020) only considered a group of young normal-hearing individuals with normal audiograms yet with high complaints of hearing in noisy situations. Here, this issue is considered specifically regarding aging, using a between-subject design comparing young NH and older NH individuals recruited from the general population, without additional criterion (i.e. no specifically high problems of hearing in noise). In addition, this is a cross-species approach, with the same physiological EFR measurements with the same stimuli deployed on gerbils.
  
  This article is of very high quality. It is extremely clear, and the results show clearly a decrease of neural phase-locking to high modulation frequencies in both middle-aged humans and gerbils, compared to younger groups/cohorts. In addition, pupillometry measurements conducted during the QuickSIN task suggest increased listening efforts in middle-aged participants, and a statistical model including both EFRs and pupillometry features suggests that both factors contribute to reduced speech-in-noise intelligibility evidenced in middle-aged individuals, beyond their slight differences in audiometric thresholds (although they were clinically normal in both groups).
  
  These provide strong support to the view that normal aging in humans leads to auditory nerve synaptic loss (cochlear neural degeneration, CND, or, put differently, cochlear synaptopathy) as well as increased listening effort, before any clearly visible audiometric deficits as defined in current clinical standards. This result is very important for the community since we are still missing direct evidence that cochlear synaptopathy might likely underlie a significant part of hearing difficulties in complex environments for listeners with normal thresholds, such as middle-aged and senior listeners. This paper shows that these difficulties can be reasonably well accounted for by this sensory disorder (CND), but also that listening effort, i.e. a top-down factor, further contributes to this problem. The methods are sound and well described and I would like to emphasize that they are presented concisely yet in a very precise manner so that they can be understood very easily - even for a reader who is not familiar with the employed techniques. I believe this study will be of interest to a broad readership.
  
  I have some comments and questions which I think would make the paper even stronger once addressed.
  
  Main comments:
  
  (1) Presentation of EFR analyses / Interpretation of EFR differences found in both gerbils and humans:
  
  a) Could the authors comment further on why they think they found a significant difference only at the highest mod. frequency of 1024 Hz in their study? Indeed, previous studies employing SAM or RAM tones very similar to the ones employed here were able to show age effects already at lower modulation freqs. of ~100 Hz; e.g. there are clear age effects reported in human studies of Vasilikov et al. (2021) or Mepani et al. (2021), and also in animals (see Garrett et al. bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf).
  
  Previously published studies in animal models by us and others suggest that EFRs elicited to AM rates > 700Hz are most sensitive to confirmed CND (Parthasarathy and Kujawa 2018; Shaheen, Valero, and Liberman 2015). This is likely because these AM rates fall well outside of phase-locking limits in the auditory midbrain and cortex (Joris, Schreiner, and Rees 2004), and hence represent a ‘cleaner’ signal from the auditory periphery that may not be modulated by complex excitatory/inhibitory feedback circuits present more centrally (Caspary et al. 2008). We have also demonstrated that we are able to acquire high quality EFRs at 1024Hz AM rates both in a previously published study in young normal hearing adults (McHaney et al. 2024), and in middle aged adults in the present study as seen in Fig. 1 H-J. We posit that the lack of age-related differences at the lower AM rates may be indicative of compensatory plasticity (central ‘gain’) that occurs with age in more central regions of the auditory pathway (Auerbach, Radziwon, and Salvi 2019; Parthasarathy and Kujawa 2018). We now expand on this in the discussion. A secondary reason for the lack of change at slower modulation rates may be the difference in stimulus between the sinusoidally amplitude modulated tones used here and the rectangular amplitude modulated tones used in other studies, as discussed in response to the comment below.
  
  Furthermore, some previous EEG experiments in humans that used SAM tones with modulation freqs. of ~100 Hz showed that EFRs do not exhibit a single peak, i.e. there are peaks not only at fm but also at the first harmonics (e.g. 2fm or 3fm); see e.g. Garrett et al. bioRxiv https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf. Did the authors try to extract EFR strength by looking at the summed amplitude of multiple peaks (Vasilikov Hear Res. 2021), in particular for the lower modulation frequencies? (Indeed, there will be no harmonics for the higher mod. freqs.)
  
  We examined peak amplitudes for the AM rate and harmonics for the 110 Hz AM condition as shown in Author response image 2. The quantified amplitudes of the first four harmonics did not differ with age (ps > .08).
  
  Additionally, the harmonic structures obtained were also not as robust as would be expected with rectangular amplitude modulated stimuli. The choice of sinusoidal modulation may explain why. We have previously published studies systematically modulating the rise time of the envelope per cycle in amplitude modulated tones, where the individual period of the envelope is described by $\mathrm{Env}(t) = t^{x}(1 - t)$, where t goes from 0 to 1 in one period, and where x = 0.05 represents a highly damped envelope akin to the rising envelope of a rectangular modulation, and x = 1 represents a symmetric, near-sinusoidal envelope (Parthasarathy and Bartlett 2011). The harmonic structure was much more developed in the damped envelopes compared to the symmetric envelopes, and response amplitudes were also higher for the damped envelopes overall, a result also observed in Mepani et al., 2021. Hence, we believe the rapid rise time may contribute to the harmonic structures evidenced in studies using RAM stimuli, and the absence of this rapid onset may result in reduced harmonic structures in our EFRs. Some language regarding this issue is now added to the discussion.
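
  As an illustration of how the exponent x shapes the envelope family described above, the short sketch below samples one envelope period and locates its peak. Setting the derivative of t^x (1 - t) to zero gives a peak at t = x / (1 + x), so x = 1 peaks mid-period (symmetric) while x = 0.05 peaks within the first few percent of the period (rapid rise). The function names and the sampling density are assumptions made for this example.

```python
def envelope(x, n=1000):
    """Sample Env(t) = t**x * (1 - t) at n + 1 evenly spaced points of one period."""
    return [((i / n) ** x) * (1 - i / n) for i in range(n + 1)]

def peak_time(x, n=1000):
    """Fraction of the period at which the sampled envelope reaches its maximum."""
    env = envelope(x, n)
    return env.index(max(env)) / n

symmetric_peak = peak_time(1.0)   # mid-period peak for the near-sinusoidal case
damped_peak = peak_time(0.05)     # early peak for the highly damped case
```

  The earlier the peak, the steeper the onset of each envelope cycle, which is the property the response links to richer harmonic structure in the EFR.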
  
  Author response image 2.
  
  Harmonics analysis for the first four harmonics of envelope following responses elicited to the 110Hz AM stimulus.
  
  b) How do the present EFR results relate to FFR results, where effects of age are already at low carrier freqs? (e.g. Märcher-Rørsted et al., Hear. Res., 2022 for pure tones with freq < 500 Hz). Do the authors think it could be explained by the fact that this is not the same cochlear region, and that synapses die earlier in higher compared to lower CFs? This should be discussed. Beyond the main group effect of age, there were no negative correlations of EFRs with age in the data?
  
  We believe the current results are in close agreement with these studies showing deficits in pure tone phase locking with age. These tones are typically at ~300-500Hz or above, and phase locking to these tones likely involves the same or similar peripheral neural generators in the auditory nerve and brainstem. Emerging evidence also seems to suggest that TFS coding measured using pure tone phase locking is closely related to sound with amplitude modulation in the same range (Ponsot et al. 2024). Unpublished observations from our lab support this view as well. In this data set, we begin to see EFR responses at 512 Hz diverge with age, but this difference does not reach statistical significance. This may be due to specific AM frequencies selected or a lack of statistical power. Using more continuous AM frequency sweeps such as with our recently published dynamic amplitude modulated tones (Parida et al. 2024) may help resolve these AM frequency specific challenges and help us investigate changes over a broader range of AM frequencies. Ongoing studies are currently exploring this hypothesis. Some explanatory language is now presented in the discussion.
  
  (2) Size of the effects / comparing age effects between two species:
  
  Although the size of the age effect on EFRs cannot be directly compared between humans and gerbils - the comparison remains qualitative - could the authors at least provide references regarding the rate of synaptic loss with aging in both humans and gerbils, so that we understand that the yNH/MA difference can be compared between the two age groups used for gerbils; it would have been critical in case of a non-significant age effect in one species.
  
  Current evidence seems to suggest that humans have more synaptic loss than gerbils, though exact comparison of lifespan between the two species is challenging due to differences in slopes of growth trajectories between species. Post-mortem temporal bone studies demonstrate a ~40-50% loss of synapses in humans by the fifth decade of life. On the other hand, our gerbils in the current study showed approximately 15-20% loss. Based on our findings and previous studies, it is reasonable to assume that our gerbil data underestimate the temporal processing deficits that would be seen in humans due to CND.
  
  We have added this information and citations to the discussion section.
  
  Equalization/control of stimuli differences across the two species: For measuring EFRs, SAM stimuli were presented at 85 dB SPL for humans vs. 30 dB above the detection threshold (inferred from ABRs) for gerbils - I do not think the results strongly depend on this choice, but it would be good to comment on why you did not choose also to present stimuli 30 dB above thresholds in humans.
  
  We chose to record EFRs to stimuli presented at 85 dB SPL in humans, as opposed to 30 dB SL, because 30 dB SL in humans would have corresponded to an intensity that makes EEG recordings unfeasible. The average PTA across younger and middle-aged adults was 7.51 dB HL (~19.51 dB SPL), which would have resulted in an average stimulus intensity of ~50 dB SPL at 30 dB SL. This intensity level would have been far too low to reliably record EFRs without presenting many thousands of trials. In a pilot study, we recorded EFRs at 75 dB SL, which equated to an average of 83.9 dB SPL. Thus, we chose the suprathreshold level of 85 dB SPL for the current study to obtain reliable responses with just 1000 trials.
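The level arithmetic in this response can be written out as a small sketch. It assumes a single 12 dB offset between dB HL and dB SPL, which is simply the difference implied by the two figures quoted above (7.51 dB HL and ~19.51 dB SPL); in practice the HL-to-SPL reference values depend on frequency and transducer, so the constant and the function names here are illustrative only.

```python
# Assumed constant offset, inferred from 7.51 dB HL ~ 19.51 dB SPL in the text above.
HL_TO_SPL_OFFSET_DB = 12.0


def hl_to_spl(threshold_hl):
    """Convert a threshold in dB HL to dB SPL under the constant-offset assumption."""
    return threshold_hl + HL_TO_SPL_OFFSET_DB


def sl_to_spl(sensation_level, threshold_spl):
    """A level in dB SL is defined relative to the listener's own threshold."""
    return threshold_spl + sensation_level


mean_pta_hl = 7.51                                 # average PTA reported above
threshold_spl = hl_to_spl(mean_pta_hl)             # about 19.51 dB SPL
level_at_30_sl = sl_to_spl(30.0, threshold_spl)    # about 49.51 dB SPL
print(round(threshold_spl, 2), round(level_at_30_sl, 2))
```

Under these assumptions a 30 dB SL stimulus lands near 50 dB SPL for the average participant, roughly 35 dB below the fixed 85 dB SPL level that was used.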
  
  Simulations of EFRs using functional models could have been used to understand (at least in humans) how the differences in EFRs obtained between the two groups are quantitatively compatible with the differences in % of remaining synaptic connections known from histopathological studies for their age range (see the approach in Märcher-Rørsted et al., Hear. Res., 2022)
  
  We agree with the reviewer that phenomenological models would be a useful approach to examining differences between age groups and species. We have previously used the Zilany/Carney model to examine differences in EFRs with age in rats (Parthasarathy, Lai, and Bartlett 2016). It is unclear if such models will directly translate to responses from gerbils. However, this is a subject of ongoing study in our lab.
  
  (3) Synergetic effects of CND and listening effort:
  
  Could you test whether there is an interaction between CND and listening effort? (e.g. one could hypothesize that MA subjects with the largest CND have also higher listening effort).
  
  We have previously reported that EFRs and listening effort are not linearly related (McHaney et al. 2024). We found the same to be largely true in the current study as well. We ran correlations between EFR amplitudes at 1024 Hz and listening effort at each SNR level in the listening and integration windows. We did not observe any significant relationships between EFRs at 1024 Hz and listening effort in the listening window (all ps > .05). In the integration window, we did see a correlation between listening effort at SNR 5 and EFRs at 1024 Hz that remained significant after correcting for multiple comparisons (r = -.42, p-adj = .021). However, we chose to not report these multiple one-to-one correlations in the current study and instead opted for the elastic net regression analysis to better understand the multifactorial contributions to speech-in-noise abilities. These results also do not preclude non-linear relationships between listening effort and EFRs, which may be present based on emerging results (Bramhall, Buran, and McMillan 2025) and will be explored in future studies.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  A few more minor comments/questions:
  
  (1) How old were the YA gerbils on average? 18 weeks, or 19 weeks, or 22 weeks?
  
  Young gerbils were on average 22 weeks. We have updated the manuscript accordingly.
  
  (2) "Gerbils share the same hearing frequency range as humans" is misleading; the gerbil hearing range extends to much higher frequencies.
  
  We have revised the statement to say: “The hearing range of gerbils largely overlaps with that of humans, making them an ideal animal model for direct comparison in cross-species studies.”
  
  (3) The writing contains more than a few typos and grammatical errors.
  
  We have completed a thorough revision to correct grammatical and typographical errors.
  
  (4) Suggesting that correlation and linear modelling are "independent" methods is misleading since they are both measuring linear associations. A better word would be "different".
  
  Thank you for this suggestion. We have rephrased the sentence as “two separate approaches”
  
  (5) The phrase "Our results reveal perceptual deficits ... driven by CND" in the abstract is too strong. Correlation is not causation.
  
  We have revised this phrase to say they “are associated with CND.”
  
  Reviewer #2 (Recommendations for the authors):
  
  More general comments:
  
  (1) Recruitment criterion related to hearing-in-noise difficulties:
  
  If I understood correctly, the middle-aged participants recruited for this study do not have specific hearing in noise difficulties, some could, as with 10% in the general population, but they were not recruited using this criterion. If this is correct, this should be stated explicitly, as it constitutes an important methodological choice and a difference with your eLife 2020 study. If you were to use this specific recruitment criterion for both groups here, what differences would you expect?
  
  Our participants were not required to have specific complaints of speech perception in noise challenges to be eligible for this study. We included middle-aged adults here, as opposed to only younger adults as in Parthasarathy et al. (2020), with the assumption that middle-aged adults were likely to have some cochlear synapse loss and individual variability in the degree of synapse loss based on post-mortem data from human temporal bones. We have recently published studies identifying the specific clinical populations of patients with self-perceived hearing loss, including those adults who have received assessments for auditory processing disorders (Cancel et al. 2023). Ongoing studies in the lab are aimed at recruiting from this population.
  
  It is striking that the QuickSIN test does not exhibit the same variability at low SNRs here as the digits-in-noise test used in your eLife 2020 study. Why would QuickSIN be more appropriate than the Digits-in-noise test? Would you expect the same results with the Digits-in-noise test?
  
  Our 2020 eLife study investigated the effects of TFS coding in multi-talker speech intelligibility. TFS coding is specifically hypothesized to be related to multi-talker speech, compared to broadband maskers. The digits test was appropriate in that context as the ‘masker’ there was two competing speakers also speaking digits. In this study, we wanted to test the effects of CND on speech in noise perception using clinically relevant speech in noise tests. The Digits test is devoid of linguistic context and is essentially closed set (participants know that only a digit will be presented). However, QuickSIN consists of open set sentences of moderate context, making it closer to real world listening situations. Additionally, we recently published pupillometry recorded in response to QuickSIN in young adults (McHaney et al. 2024) and identified QuickSIN as a promising screening tool for self-perceived hearing difficulties (Cancel et al. 2023). These factors informed our choice of using QuickSIN in the current study.
  
  (2) Why is the increase in listening effort interpreted as an increase in gain? please clarify (p10, 1st paragraph; [these data suggest a decrease in peripheral neural coding, with a concomitant increase in central auditory activity or 'gain'])
  
  In the above referenced paragraph, we were discussing the increase in 40 Hz AM rate EFRs in middle-aged adults as an increase in central gain. We have revised parts of this paragraph to better communicate that we were discussing the EFRs and not listening effort: “We observed decreases in EFRs at modulation rates that were selective to the auditory periphery (i.e., 1024 Hz) in middle-aged adults, while EFRs primarily generated from the central auditory structures were not different from those in younger adults (Fig. 1K). These data suggest that middle-aged adults exhibited an increase in central auditory activity, or ‘gain’, in the presence of decreased peripheral neural coding. The perceptual consequences of this gain are unclear, but our findings align with emerging evidence suggesting that gain is associated with selective deficits in speech-in-noise abilities”
  
  (3) Further discussion on the relationship/differences between the EFR marker of CND (this study) and the MEMR marker of CND (Bharadwaj et al., 2022) is needed.
  
  We now make mention of other candidate markers of CND (ABR wave I and MEMRs) in the discussion and expand on why we chose the EFR.
  
  (4) Further analyses and discussion related to extended high-freq. thresholds would be needed:
  
  Did you test for a potential correlation of your EFR marker of CND with extended high-freq. thresholds? (could be paralleling the amount of CND in these individuals) Why won't you also consider measuring extended HF in gerbils?
  
  We acknowledge that there is increasing evidence to suggest extended high frequency thresholds may be an early marker for hidden hearing loss/CND. We have examined an additional correlation for extended high frequency pure tone averages (8k-16k Hz) with EFR amplitudes at 1024 Hz AM rate, which revealed a significant relationship (r = -.43, p < .001). However, we opted to exclude this analysis from our current study as we wanted to reduce reporting on several one-to-one correlations. Therefore, we chose the elastic net regression model to examine individual contributions to speech in noise abilities. EHF thresholds were included in the elastic net regression models, but were not found to be significant upon accounting for individual differences in PTA.
  
  Additionally, our electrophysiological experimental paradigm was not designed with the consideration of extended high frequencies—we used ER3C transducers which are not optimal for frequencies above ~6kHz. Future studies could use transducers such as the ER2 or free field speakers to examine the influence of extended high frequencies on the EFRs and measure high frequency thresholds in gerbils.
  
  Minor Comments:
  
  (1) Abstract: repetition of 'later in life' in the first two sentences - please reformulate.
  
  We have revised the first two sentences to state: “Middle-age is a critical period of rapid changes in brain function that presents an opportunity for early diagnostics and intervention for neurodegenerative conditions later in life. Hearing loss is one such early indicator linked to many comorbidities in older age.”
  
  (2) Sentence on page 3 [However, these behavioral readouts may minimize subliminal changes in perception that are reflected in listening effort but not in accuracies (26-28)] is not clear.
  
  We’ve added a sentence just after that states: “Specifically, two individuals may show similar accuracies on a listening task, but one individual may need to exert substantially more listening effort to achieve the same accuracy as the other.”
  
  (3) The second paragraph of page 11 should go to a methods (model) section, not to the discussion.
  
  We have now moved a portion of this paragraph to the Elastic Net Regression subsection of the Statistical Analysis in the Methods.
  
  (4) Please check references: references 13 and 25 are identical.
  
  Fixed
  
  References
  
  Auerbach, Benjamin D., Kelly Radziwon, and Richard Salvi. 2019. “Testing the Central Gain Model: Loudness Growth Correlates with Central Auditory Gain Enhancement in a Rodent Model of Hyperacusis.” Neuroscience 407:93–107. https://doi.org/10.1016/j.neuroscience.2018.09.036.
  
  Bramhall, Naomi F., Brad N. Buran, and Garnett P. McMillan. 2025. “Associations Between Physiological Indicators of Cochlear Deafferentation and Listening Effort in Military Veterans with Normal Audiograms.” Hearing Research, April, 109263. https://doi.org/10.1016/j.heares.2025.109263.
  
  Cancel, Victoria E., Jacie R. McHaney, Virginia Milne, Catherine Palmer, and Aravindakshan Parthasarathy. 2023. “A Data-Driven Approach to Identify a Rapid Screener for Auditory Processing Disorder Testing Referrals in Adults.” Scientific Reports 13 (1): 13636. https://doi.org/10.1038/s41598-023-40645-0.
  
  Caspary, D. M., L. Ling, J. G. Turner, and L. F. Hughes. 2008. “Inhibitory Neurotransmission, Plasticity and Aging in the Mammalian Central Auditory System.” Journal of Experimental Biology 211 (11): 1781–91. https://doi.org/10.1242/jeb.013581.
  
  Cohen, Jacob. 2013. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York: Routledge. https://doi.org/10.4324/9780203771587.
  
  Encina-Llamas, Gerard, Aravindakshan Parthasarathy, James Michael Harte, Torsten Dau, Sharon G. Kujawa, Barbara Shinn-Cunningham, and Bastian Epp. 2017. “Hidden Hearing Loss with Envelope Following Responses (EFRs): The off-Frequency Problem: 40th MidWinter Meeting of the Association for Research in Otolaryngology.”
  
  James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-1418-1.
  
  Joris, P. X., C. E. Schreiner, and A. Rees. 2004. “Neural Processing of Amplitude-Modulated Sounds.” Physiological Reviews 84 (2): 541–77. https://doi.org/10.1152/physrev.00029.2003.
  
  McHaney, Jacie R., Kenneth E. Hancock, Daniel B. Polley, and Aravindakshan Parthasarathy. 2024. “Sensory Representations and Pupil-Indexed Listening Effort Provide Complementary Contributions to Multi-Talker Speech Intelligibility.” Scientific Reports 14 (1): 30882. https://doi.org/10.1038/s41598-024-81673-8.
  
  Parida, Satyabrata, Kimberly Yurasits, Victoria E. Cancel, Maggie E. Zink, Claire Mitchell, Meredith C. Ziliak, Audrey V. Harrison, Edward L. Bartlett, and Aravindakshan Parthasarathy. 2024. “Rapid and Objective Assessment of Auditory Temporal Processing Using Dynamic Amplitude-Modulated Stimuli.” Communications Biology 7 (1): 1–10. https://doi.org/10.1038/s42003-024-07187-1.
  
  Parthasarathy, A., and E. L. Bartlett. 2011. “Age-Related Auditory Deficits in Temporal Processing in F-344 Rats.” Neuroscience 192:619–30. https://doi.org/10.1016/j.neuroscience.2011.06.042.
  
  Parthasarathy, A., J. Lai, and E. L. Bartlett. 2016. “Age-Related Changes in Processing Simultaneous Amplitude Modulated Sounds Assessed Using Envelope Following Responses.” Jaro-Journal of the Association for Research in Otolaryngology 17 (2): 119–32. https://doi.org/10.1007/s10162-016-0554-z.
  
  Parthasarathy, A., Kenneth E Hancock, Kara Bennett, Victor DeGruttola, and Daniel B Polley. 2020. “Bottom-up and Top-down Neural Signatures of Disordered Multi-Talker Speech Perception in Adults with Normal Hearing.” Edited by Barbara G Shinn-Cunningham, Huan Luo, Fan-Gang Zeng, and Christian Lorenzi. eLife 9 (January):e51419. https://doi.org/10.7554/eLife.51419.
  
  Parthasarathy, Aravindakshan, and Sharon G. Kujawa. 2018. “Synaptopathy in the Aging Cochlea: Characterizing Early-Neural Deficits in Auditory Temporal Envelope Processing.” The Journal of Neuroscience. https://doi.org/10.1523/jneurosci.3240-17.2018.
  
  Ponsot, Emmanuel, Pauline Devolder, Ingeborg Dhooge, and Sarah Verhulst. 2024. “Age-Related Decline in Neural Phase-Locking to Envelope and Temporal Fine Structure Revealed by Frequency Following Responses: A Potential Signature of Cochlear Synaptopathy Impairing Speech Intelligibility.” bioRxiv. https://doi.org/10.1101/2024.12.11.628010.
  
  Sergeyenko, Yevgeniya, Kumud Lall, M. Charles Liberman, and Sharon G. Kujawa. 2013. “Age-Related Cochlear Synaptopathy: An Early-Onset Contributor to Auditory Functional Decline.” Journal of Neuroscience 33 (34): 13686–94. https://doi.org/10.1523/jneurosci.1783-13.2013.
  
  Shaheen, L. A., M. D. Valero, and M. C. Liberman. 2015. “Towards a Diagnosis of Cochlear Neuropathy with Envelope Following Responses.” J Assoc Res Otolaryngol. https://doi.org/10.1007/s10162-015-0539-3.
  
  Tibshirani, Ryan J., and Jonathan Taylor. 2012. “Degrees of Freedom in Lasso Problems.” The Annals of Statistics 40 (2): 1198–1232. https://doi.org/10.1214/12-AOS1003.
  
  Wu, P. Z., L. D. Liberman, K. Bennett, V. de Gruttola, J. T. O’Malley, and M. C. Liberman. 2018. “Primary Neural Degeneration in the Human Cochlea: Evidence for Hidden Hearing Loss in the Aging Ear.” Neuroscience. https://doi.org/10.1016/j.neuroscience.2018.07.053.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.01.606213v3
www.biorxiv.org www.biorxiv.org

Notch signaling and Bsh homeodomain activity are integrated to diversify Drosophila lamina neuron types

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data are rigorous and well presented, and the conclusions are important. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.
  
  Thanks!
  
  I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.
  
  I again finish by commending the authors for this terrific piece of work.
  
  Thanks! It is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project.
  
  We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in the larval ventral nerve cord, all premotor neurons are NotchON neurons while all post-sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.
  
  Thanks for the positive feedback!
  
  Strengths:
  
  The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.
  
  Thanks!
  
  Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.
  
  Thanks!
  
  Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.
  
  Thanks for the positive feedback on both manuscripts.
  
  Weaknesses:
  
  Differential Notch activity in L4 and L5:
  
  ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist L1-expressed Dl that affect newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.
  
  We agree. Historically, LPCs are thought to be homogenous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single-cell RNAseq on LPCs to look for molecular heterogeneities. Nevertheless, whether L4 is generated by E(spl)-mɣ-GFP+ (NotchON) LPCs does not affect our conclusion that Notch signaling and the primary HDTF Bsh are integrated to specify L4 fate over L5.
  
  ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.
  
  Dl is transiently expressed in newborn L1 neurons. To knock down Dl in newborn L1, we need to express Dl-RNAi before the onset of Dl expression in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4, which is the one that we used.
  
  ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.
  
  We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because E(spl)-mɣ-GFP reporter is only expressed in LPCs but not lamina neurons. We now mention this in the Discussion.
  
  ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.
  
  Thanks! It is a great question why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper, and it will be a great direction to pursue in the future. We will add this to our Discussion.
  
  Notch role in establishing L4 vs L5 fates:
  
  ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.
  
  We disagree that the use of 27G05-Gal4 is problematic when performing Notch-KD because our conclusion from Notch-KD is that Bsh without Notch signaling activates Pdm3 and specifies L5 fate. However, 27G05-Gal4 does not have any effect on Pdm3 expression. To make this clearer, we will quantify the percentage of Pdm3+ L5 neurons in Bsh+ lamina neurons for Notch-KD experiment. We are sorry this wasn't clearer.
  
  ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.
  
  Thank you for catching this. We will correct it in the text.
  
  ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?
  
  Our data show that Bsh with transient Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is during newborn neurons.
  
  However, as suggested by the reviewer, we did the experiment (see figure below). We used Gal80ts (which inhibits Gal4 activity at 18°C) to temporally control Bsh-Gal4 activity for expressing N-ICD (the active form of Notch) in L5 neurons. We found that tub-Gal80ts, Bsh-Gal4>UAS-N-ICD is unable to induce ectopic L4 neurons when we shift the temperature from 18°C to 30°C to inactivate Gal80 at 15 hours after pupal formation, which is close to the end of lamina neurogenesis. However, it is unknown how many hours it takes to inactivate Gal80 and activate Bsh-Gal4, and thus we decided not to include this data in our manuscript.
  
  Author response image 1.
  
  L4-to-L3 conversion in the absence of Bsh
  
  ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.
  
  Our data show L4-to-L3 conversion in the absence of Bsh when Notch activity is present, and L5-to-L1 conversion in the absence of Bsh when Notch activity is absent. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, we currently have only Hey available as a Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.
  
  ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.
  
  That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).
  
  Different chromatin landscape in L4 and L5 neurons
  
  ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, that this is caused by Notch creating a differential chromatin landscape is based only in correlation. (NotchON cells having a different profile than NotchOFF). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down.e.g.: Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)" . These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.
  
  We agree and appreciate the comment; it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. The reviewer posits: “An alternative hypothesis: different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.” Yes, it is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF (e.g., Bsh). We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all premotor neurons are NotchON neurons while all post-sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.
  
  ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.
  
  We agree and think in L5 neurons, the secondary HDTF Pdm3 also contributes to L5-specific gene transcription during the synaptogenesis window, in addition to Bsh. We will include this in the text.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.15.545141v3
www.biorxiv.org www.biorxiv.org

New submission 11/02/2024, 13:48:18

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  We are grateful for the insightful suggestions and comments provided by the reviewers. Your constructive feedback has been valuable, and we are thankful for the opportunity to address each point.
  
  We appreciate both reviewers’ recognition of our devotion to rigorous methodology and experimental control in this study, as evidenced by the comments: “remarkable efforts were made to isolate peripheral confounds”, “a clear strength of the study is the multitude of control conditions … that makes results very convincing”, and “thorough design of the study”. Indeed, we hope to have provided evidence that is not merely solid but compelling for sound-driven motor inhibitory effects of online TUS. We hope that this will be reflected in the assessment. Our conclusions are supported by multiple experiments across multiple institutions using exemplary experimental control including (in)active controls and multiple sound-sham conditions. This contrasts with the sole use of flip-over sham or no-stimulation conditions used in the majority of work to date. Indeed, the current study communicates that substantiated inferences on the efficacy of ultrasonic neuromodulation cannot be made under insufficient experimental control.
  
  In response to the reviewers' comments, we have substantially changed our manuscript. Specifically, we have open-sourced the auditory masking stimuli and specified them in better detail in the text, we have improved the figures to reflect the data more closely, we have clarified the intracranial dose-response relationship, we have elaborated in the introduction, and we have further discussed the possibility of direct neuromodulation. We hope that you agree these changes have helped to substantially improve the manuscript.
  
  Public reviews
  
  1.1) Despite the main conclusion of the authors stating that there is no dose-response effects of TUS on corticospinal inhibition, both the comparison of Isppa and MEP decrease for Exp 1 and 2, and the linear regression between MEP decrease (relative to baseline) and the estimated Isppa are significant, arguing the opposite, that there is a dose-response function which cannot be fully attributed to difference in sound (since the relationship is inversed, lower intracranial Isppa leads to higher MEP decrease). These results suggest that dose-response function needs to be further studied in future studies.
  
  We thank the reviewer for bringing up this point. While we are convinced our study provides no evidence for a direct neuromodulatory dose-response relationship, we have realized that the manuscript could benefit from improved clarity on this point.
  
  A dose-response relationship between TUS intensity and motor cortical excitability was assessed by manipulating free-water Isppa (Figure 4C). Here, no significant effect of free-water stimulation intensity was observed for Experiment I or II, thus providing no evidence for a dose-response relationship (Section 3.2). To aid in clarity, ‘N.S.’ has been added to Figure 4C in the revised manuscript.
  
  However, it is likely that the efficacy of TUS would depend on realized intracranial intensity, which we estimated with 3D simulations for on-target stimulation. These simulations resulted in an estimated intracranial intensity for each applied free-water intensity (i.e., 6.35 and 19.06 W/cm2), for each participant. We then tested whether inter-individual differences in intracranial intensity during on-target TUS affected MEP amplitude. We have realized that the original visualization used to display these data and its explanation was unintuitive. Therefore, we have completely revised Supplementary Figure 6. Because of the substantial length of this section, we have not copied it here. Please see the Supplementary material for the implemented improvements.
  
  In brief, we now show MEP amplitudes on the y-axis, rather than expressing values as % change. This plot depicts how individuals with higher intracranial intensities during on-target TUS exhibit higher MEP amplitudes. However, this same relationship is observed for active control and sound-sham conditions. If there were a direct neuromodulatory dose-response relationship of TUS, this would be reflected as the difference between on-target and control conditions changing as the estimated intracranial intensity increases. This was not the case. Further, the fact that the difference between on-target stimulation and baseline changes across intracranial intensities is notable, but this occurs to an equal degree in the control conditions. Therefore, these data cannot be interpreted as evidence for a dose-response relationship.
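
  The logic of this argument can be illustrated with a minimal sketch. It is not the analysis reported in the manuscript (which used linear models on the real data); the function names and the numbers below are hypothetical and synthetic. The idea is only that a direct dose-response effect would appear as a difference in slope between the on-target and control conditions, not as a slope that both conditions share.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_difference(intensity, mep_on_target, mep_control):
    """Slope of on-target MEPs minus slope of control MEPs.

    A value near zero means MEP amplitude changes with estimated
    intracranial intensity equally in both conditions, i.e. no
    condition-specific dose-response effect.
    """
    return ols_slope(intensity, mep_on_target) - ols_slope(intensity, mep_control)


# Synthetic, illustrative values only (not data from the study).
intensity = [1.0, 2.0, 3.0, 4.0, 5.0]          # estimated intracranial intensity
on_target = [0.50, 0.60, 0.70, 0.80, 0.90]     # MEP amplitude, on-target TUS
control = [0.55, 0.65, 0.75, 0.85, 0.95]       # MEP amplitude, control condition

diff = slope_difference(intensity, on_target, control)
print(f"slope difference: {diff:.3f}")
```

  In this synthetic case both conditions rise with intensity at the same rate, so the slope difference is essentially zero, which mirrors the pattern described above. A real analysis would also need a test of whether the difference is statistically distinguishable from zero.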
  
  We hope the changes in Supplementary Figure 6 will make it clear that there is no evidence for direct intracranial dose-response effects.
  
  1.2) Other methods to test or mask the auditory confound are possible (e.g., smoothed ramped US wave) which could substantially solve part of the sound issue in future studies or experiments in deaf animals etc...
  
  We agree with the reviewer’s statement. We aimed to replicate the findings of online motor cortical inhibition reported in prior work using a 1000 Hz square wave modulation frequency. While ramping can effectively reduce the auditory confound, as noted in the discussion, this is not feasible for the short pulse durations (0.1-0.3 ms) employed in the current study (Johnstone et al., 2021). We have further clarified this point in the methods section of the revised manuscript as follows:
  
  “While ramping the pulses can in principle mitigate the auditory confound (Johnstone et al., 2021; Mohammadjavadi et al., 2019), doing so for such short pulse durations (<= 0.3 ms) is not effective. Therefore, we used a rectangular pulse shape to match prior work.”
  
  Mitigation of the auditory confound by testing deaf subjects is a valid approach, and has now been added to the revised manuscript in the discussion as follows:
  
  “Alternative approaches could circumvent auditory confounds by testing deaf subjects, or perhaps more practically by ramping the ultrasonic pulse to minimize or even eliminate the auditory confound.”
  
  1.3) Dose-response function is an extremely important feature for a brain stimulation technique. It was assessed in Exp II by computing the relationship between the estimated intracranial intensities and the modulation of corticospinal excitability (Fig. 3b, 3c). It is not clear why data from Experiment I could not be integrated in a global intracranial dose-response function to explore wider ranges of intracranial intensities and MEP variability.
  
  We chose not to combine data from Experiment 1 in a global intracranial dose-response function because TUS was applied at different fundamental frequencies and focal depths (Experiment I: 500 kHz, 35 mm; Experiment II: 250 kHz, 28 mm). We have now explicitly communicated this under Supplementary Figure 6:
  
  “It was not appropriate to combine data from Experiments I and II given the different fundamental frequencies and stimulation depths applied… we ran simple linear models for Experiment II, which had a sufficient sample size (n = 27) to assess inter-individual variability.”
  
  1.4) Furthermore, the dose response function as computed with the MEP change relative to baseline shows a significant effect (6.35W/cm2) or a trend (19.06 W/cm2) for a positive linear relationship. This comparison cannot disentangle the auditory confound from the pure neuromodulatory effect but given the direction of the relationship (lower Isppa associated with larger neuromodulatory effect), it is unlikely that it is driven by sound. This relationship is absent for the Active control condition or the Sound Sham condition, more or less matched for peripheral confound. This needs to be further discussed.
  
  Please refer to point 1.1
  
  1.5) The clear auditory confound arises from TUS pulsing at audible frequencies, which can be highly subject to inter-individual differences. Did the authors individually titrate the auditory mask to account for this intra- and inter-individual variability in auditory perception?
  
  In Experiments I-III, the auditory mask was identical between participants. In Experiment IV, the auditory mask volume and signal-to-noise ratio were adjusted per participant. In the discussion we recommend individualized mask titration. However, we do note that masking successfully blinded participants in Experiment II, despite using uniform masking stimuli (Supplementary Figure 5).
  
  1.6) How different is the masking quality when using bone-conducting headphones (e.g., Exp. 1) compared to in-ear headphones (e.g., Exp. 2)?
  
  In our experience, bone conducting headphones produce a less clear, fuzzier, sound than in-ear headphones. However, in-ear headphones block the ear canal and likely result in the auditory confound being perceived as louder. We have included this information in the discussion of the revised manuscript:
  
  “Titrating auditory mask quality per participant to account for intra- and inter-individual differences in subjective perception of the auditory confound would be beneficial. Here, the method chosen for mask delivery must be considered. While bone-conducting headphones align with the bone conduction mechanism of the auditory confound, they might not deliver sound as clearly as in-ear headphones or speakers. Nevertheless, the latter two rely on air-conducted sound. Notably, in-ear headphones could even amplify the perceived volume of the confound by obstructing the ear canal.”
  
  1.7) I was not able to find any report on the blinding efficacy of Exp. 1. Do the authors have some data on this?
  
  We do not have blinding data available for Experiment I. Following Experiment I, we decided it would be useful to include such an assessment in Experiment II.
  
  1.8) Was the possibility to use smoothed ramped US wave form ever tested as a control condition in this set of studies, to eventually reduce audibility? For such fast PRF, the slope would still need to be steep to stimulate the same power (AUC), so it might not be as efficient.
  
  We indeed tested smoothing (ramping) the waveform. There was no perceptible impact on the auditory confound volume. Indeed, prior research has also indicated that ramping over such short pulse durations is not effective (Johnstone et al., 2021). Taken together, we chose to continue with a square wave modulation as in prior TUS-TMS studies. We have updated the methods section of the manuscript with the following:
  
  “While ramping the pulses can in principle mitigate the auditory confound (Johnstone et al., 2021; Mohammadjavadi et al., 2019), doing so for such short pulse durations (<= 0.3 ms) is not effective. Therefore, we used a rectangular pulse shape to match prior work.”
  
  Importantly, our research shows that auditory co-stimulation can confound effects on motor excitability, and this likely occurred in multiple seminal TUS studies. While some preliminary work has been done on the efficacy of ramping in humans, future work is needed to determine what ramp shapes and lengths are optimal for reducing the auditory confound.
  
  1.9) There are other models or experiments that need to be discussed in order to clearly disassociate the TUS effect from the auditory confound effect, for instance, testing deaf animal models or participants, or experiments with multi-region recordings (to rule out the effects of the dense structural connectivity between the auditory cortex and the motor cortex).
  
  The suggestion to consider multi-region recording in future experiments is important. Indeed, the effects of the auditory confound are expected to vary between brain regions. In the primary motor cortex, we observe a learned inhibition, which is perhaps supported by dense structural connectivity with the auditory system. In contrast, in perceptual areas such as the occipital cortex, one might expect tuned attentional effects in response to the auditory cue. We suggest that it is likely that the impact of the auditory confound also operates on a more global network level. It is reasonable to propose that, in a cognitive task for example, the confound will affect task performance and related brain activity, ostensibly regardless of the extent of direct structural connectivity between the auditory cortex and the (stimulated) region of interest.
  
  Regarding the testing of deaf subjects, this has been included in the revised discussion as follows:
  
  “Alternative approaches could circumvent auditory confounds by testing deaf subjects, or perhaps more practically by ramping the ultrasonic pulse to minimize or even eliminate the auditory confound.”
  
  1.10) The concept of stochastic resonance is interesting but traditionally refers to a mechanism whereby a particular level of noise actually enhances the response of non-linear systems to weak sensory signals. Whether it applies to the motor system when probed with suprathreshold TMS intensities is unclear. Furthermore, whether higher intensities induce higher levels of noise is not straightforward either, considering the massive amount of work coming from other NIBS studies in particular. Noise effects are indeed a function of noise intensity, but exhibit an inverted U-shape dose-response relationship (Potok et al., 2021, eNeuro). In general SR is rather induced with low stimulation intensities, in particular in the perceptual domain (see Yamasaki et al., 2022, Neuropsychologia). In the same order of ideas, did the authors compare inter-trials variability across the different conditions?
  
  We thank the reviewer for these insightful remarks. Indeed, stochastic resonance is a concept first formalized in the sensory domain. Recently, the same principles have been shown to apply in other domains as well. For example, transcranial electric noise (tRNS) exhibits similar stochastic resonance principles as sensory noise (Van Der Groen & Wenderoth, 2016). Indeed, tRNS has been applied to many cortical targets, including the motor system. In the current manuscript, we raise the question of whether TUS might engage with neuronal activity following principles similar to tRNS. One prediction of this framework would be that TUS might not modulate excitation/inhibition balance overall, but instead exhibit an inverted U-shape dose-dependent relationship with stochastic noise. Please note, we do not use the ‘suprathreshold TMS intensity’ to quantify whether noise could bring a sub-threshold input across the detection threshold, nor whether it could bring a sub-threshold output across the motor threshold. Instead, we use the MEP read-out to estimate the temporally varying excitability itself. We argue that MEP autocorrelation captures the mixture of temporal noise and temporal structure in corticospinal excitability. Building on the non-linear response of neuronal populations, low stochastic noise might strengthen weakly present excitability patterns, while high stochastic noise might override pre-existing excitability. It is therefore not the overall MEP amplitude, but the MEP timeseries that is of interest to us. Here, we observe a non-linear dose-dependent relationship, matching the predicted inverted U-shape. Importantly, we did not intend to assume stochastic resonance principles in the motor domain as a given. We have now clarified in the revised manuscript that we propose a putative framework and regard this as an open question:
  
  “Indeed, human TUS studies have often failed to show a global change in behavioral performance, instead finding TUS effects primarily around the perception threshold where noise might drive stochastic resonance (Butler et al., 2022; Legon et al., 2018). Whether the precise principles of stochastic resonance generalize from the perceptual domain to the current study is an open question, but it is known that neural noise can be introduced by brain stimulation (Van Der Groen & Wenderoth, 2016). It is likely that this noise is state-dependent and might not exceed the dynamic range of the intra-subject variability (Silvanto et al., 2007). Therefore, in an exploratory analysis, we exploited the natural structure in corticospinal excitability that exhibits as a strong temporal autocorrelation in MEP amplitude.”
  
  Following the above reasoning, we felt it critical to estimate noise in the timeseries, operationalized as a t-1 autocorrelation, rather than capture inter-trial variability that ignores the timeseries history and requires data aggregation thereby reducing statistical power. Importantly, we would expect the latter index to capture global variability, putatively masking the temporal relationships which we were aiming to test. The reviewer raises an interesting option, inviting us to wonder if inter-trial variability might be sensitive enough, nonetheless. To this end, we compared inter-trial variability as suggested. This was achieved by first calculating the inter-trial variability for each condition, and then running a three-way repeated measures ANOVA on these values with the independent variables matching our autocorrelation analyses, namely, procedure (on-target/active control) × intensity (6.35/19.06) × masking (no mask/masked). This analysis did not reveal any significant interactions or main effects.
  
  Author response table 1.
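
  The distinction between the two indices discussed above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the exact estimator behind the "t-1 autocorrelation" is not specified here, so a standard lag-1 sample autocorrelation is assumed, inter-trial variability is assumed to be a coefficient of variation, and the MEP values are synthetic.

```python
from statistics import mean, pstdev


def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation: how well trial t-1 predicts trial t."""
    m = mean(series)
    denom = sum((v - m) ** 2 for v in series)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((series[i] - m) * (series[i - 1] - m) for i in range(1, len(series)))
    return num / denom


def coefficient_of_variation(series):
    """Inter-trial variability that ignores trial order (assumed index)."""
    return pstdev(series) / mean(series)


# Synthetic MEP amplitudes (mV). Both lists hold the same ten values;
# only the trial order differs.
drifting = [1.0, 1.1, 1.2, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.9]
shuffled = [1.2, 0.8, 1.1, 1.3, 0.9, 1.0, 1.2, 0.9, 1.1, 1.0]

for name, s in (("drifting", drifting), ("shuffled", shuffled)):
    print(name, round(lag1_autocorrelation(s), 3), round(coefficient_of_variation(s), 3))
```

  Because both series contain the same values, the order-blind variability index is identical for them, while the lag-1 autocorrelation is clearly positive for the slowly drifting series and negative for the reordered one. This is the sense in which an aggregate variability measure can mask temporal structure.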
  
  1.11) State-dependency/Autocorrelations: These values were extracted from Exp2 which has baseline trials. Can the authors provide autocorrelation values at baseline, with and without auditory mask? Can the authors comment on the difference between the autocorrelation profiles of the active TUS condition at 6.35W/cm2 or at 19.06W/cm2. They should somehow be similar to my understanding. Besides, the finding that TUS induces noise only when sound is present and at lower intensities is not well discussed.
  
  In the revised manuscript, we have now included baseline in the figure (Figure 4D). Regarding baseline with and without a mask, we must clarify that baseline involves only TMS (no mask), and sham involves TMS + masking stimulus (masked).
  
  The dose-dependent relationship of TUS intensity with autocorrelation is critical. One possible observation would have been that TUS at both intensities decreased autocorrelation, with higher intensities evoking a greater reduction. Here, we would have concluded that TUS introduced noise in a linear fashion.
  
  However, we observed that lower-intensity TUS in fact strengthened pre-existing temporal patterns in excitability (higher autocorrelation), while during higher-intensity TUS these patterns were overridden (lower autocorrelation). This non-linear relationship is not unexpected, given the non-linear responses of neurons.
  
  If this non-linear dependency is driven by TUS, one could expect it to be present during conditions both with and without auditory masking. However, the preparatory inhibition effect of TUS likely depends on the salience of the cue, that is, the auditory confound. In trials without auditory masking, the salience of the confound is highly dependent on (transmitted) intensity, with higher intensities being perceived as louder. In contrast, when trials are masked, the difference in cue salience between lower and higher intensity stimulation is minimized. Therefore, we would expect for any nuanced dose-dependent direct TUS effect to be best detectable when the difference in dose-dependent auditory confound perception is minimized via masking. Indeed, the dose-dependent effect of TUS on autocorrelation is most prominent when the auditory confound is masked.
  
  “In sum, these preliminary exploratory analyses could point towards TUS introducing temporally specific neural noise to ongoing neural dynamics in a dose-dependent manner, rather than simply shifting the overall excitation-inhibition balance. One possible explanation for the discrepancy between trials with and without auditory masking is the difference in auditory confound perception, where without masking the confound’s volume differs between intensities, while with masking this difference is minimized. Future studies might consider designing experiments such that temporal dynamics of ultrasonic neuromodulation can be captured more robustly, allowing for quantification of possible state-dependent or non-directional perturbation effects of stimulation.”
  
  1.12) Statistical considerations. Data from Figure 2 are considered in two-by-two comparisons. Why not reporting the ANOVA results testing the main effect of TUS/Auditory conditions as done for Figure 3. Statistical tables of the LMM should be reported.
  
  Full-factorial analyses and main effects for TUS/Auditory conditions are discussed from Section 3.2 onwards. These are the same data supporting Figure 2 (now Figure 3). We would like to note that the main purpose of Figure 2 is to demonstrate to the reader that motor inhibition was observed, thus providing evidence that we replicated motor inhibitory effects of prior studies. A secondary purpose is to visually represent the absence of direct and spatially specific neuromodulation. However, the appropriate analyses to demonstrate this are reported in following sections, from Section 3.2 onwards, and we are concerned that mentioning these analyses earlier will negatively impact comprehensibility.
  
  Statistical tables of the LMMs are provided within the open-sourced data and code reported at the end of the paper, embedded within the output which is accessible as a pdf (i.e., analysis/analysis.pdf).
  
  1.13) Startle effects: The authors dissociate two mechanisms through which sound cuing can drive motor inhibition, namely some compensatory expectation-based processes or the evocation of a startle response. I find the dissociation somehow artificial. Indeed, it is known that the amplitude of the acoustic startle response habituates to repetitive stimulation. Therefore, sensitization can well explain the stabilization of the MEP amplitude observed after a few trials.
  
  Thank you for bringing this to our attention. Indeed, an acoustic startle response would habituate over repetitive stimulation. A startle response would result in MEP amplitude being significantly altered in early trials. As the participant would habituate to the stimulus, the startle response would decrease. MEP amplitude would then return to baseline levels. However, this is not the pattern we observe. An alternative possibility is that participants learn the temporal contingency between the stimulus and TMS. Here, compensatory expectation-based change in MEP amplitude would be observed. In this scenario, there would be no change in MEP amplitude during early trials because the stimulus has not yet become informative of the TMS pulse timing. However, as participants learn how to predict TMS timing by the stimulus, MEP amplitude would decrease. This is also the pattern we observe in our data. We have clarified these alternatives in the revised manuscript as follows:
  
  “Two putative mechanisms through which sound cuing may drive motor inhibition have been proposed, positing either that explicit cueing of TMS timing results in compensatory processes that drive MEP reduction (Capozio et al., 2021; Tran et al., 2021), or suggesting the evocation of a startle response that leads to global inhibition (Fisher et al., 2004; Furubayashi et al., 2000; Ilic et al., 2011; Kohn et al., 2004; Wessel & Aron, 2013). Critically, we can dissociate between these theories by exploring the temporal dynamics of MEP attenuation. One would expect a startle response to habituate over time, where MEP amplitude would be reduced during startling initial trials, followed by a normalization back to baseline throughout the course of the experiment as participants habituate to the startling stimulus. Alternatively, if temporally contingent sound-cueing of TMS drives inhibition, MEP amplitudes should decrease over time as the relative timing of TUS and TMS is being learned, followed by a stabilization at a decreased MEP amplitude once this relationship has been learned.”
  
  1.14) Can the authors further motivate the drastic change in intensities between Exp1 and 2? Is it due to the 250-500 carrier difference? Is this coming from the loss of power at 500 kHz?
  
  The change in intensities between Experiments I and II was not an intentional experimental manipulation. Following completion of data acquisition, our TUS system received a firmware update that differentially corrected the 250 kHz and 500 kHz stimulation intensities. In this manuscript, we report the actual free-water intensities applied during our experiments.
  
  1.15) Exp 3: Did 4 separate blocks of TUS-TMS and normalized for different TMS intensities used with respect to baseline. But how different was it. Why adjusting and then re adjusting intensities?
  
  The TMS intensities required to evoke a 1 mV MEP under the four sound-sham conditions significantly differed from the intensities required for baseline. In the revised appendix, we have now included a figure depicting the TMS intensities for these conditions, as well as statistical tests demonstrating each condition required a significantly higher TMS intensity than baseline.
  
  TMS intensities were re-adjusted to avoid floor effects when assessing the efficacy of on-target TUS. Sound-sham conditions themselves attenuate MEP amplitude. This is also evident from the higher TMS intensities required to evoke a 1 mV MEP under these conditions. If direct neuromodulation by TUS would have further decreased MEP amplitude, the concern was that effects might not be detectable within such a small range of MEP amplitudes.
  
  1.16) In Exp 4, TUS targeted the ventromedial WM tract. Since direct electrical stimulation on white matter pathways within the frontal lobe can modulate motor output probably through dense communication along specific white matter pathways (e.g., Vigano et al., 2022, Brain), how did the authors ensure that this condition is really ineffective? Furthermore, the stimulation might have covered a lot more than just white matter. Acoustic and thermal simulations would be helpful here as well.
  
  Thank you for pointing out this possibility. Ultrasonic and electrical stimulation have quite distinct mechanisms of action. Therefore, it is challenging to directly compare these two approaches. There is a small amount of evidence that ultrasonic neuromodulation of white matter tracts is possible. However, the efficacy of white matter modulation is likely much lower, given the substantially lesser degree of mechanosensitive ion channel expression in white matter as opposed to gray matter (Sorum et al., 2020, PNAS). Further, recent work has indicated that ultrasonic neuromodulation of myelinated axonal bundles occurs within the thermal domain (Guo et al., 2022, SciRep), which is not possible with the intensities administered in the current study. Nevertheless, based on Experiment IV in isolation, it cannot be definitively excluded that TUS induced direct neuromodulatory effects in addition to confounding auditory effects. However, Experiment IV does not possess sufficient inferential power on its own and must be interpreted in tandem with Experiments I-III. Taken together with those findings, it is unlikely that a veridical neuromodulation effect is seen here, given the equivalent or lower stimulation intensities, the substantially deeper stimulation site, and the absence of an additional control condition in Experiment IV. This likelihood is further decreased by the fact that inhibitory effects under masking descriptively scale with the audibility of TUS.
  
  Off-target effects such as unintended co-stimulation of gray matter when targeting white matter is always an important factor to consider. Unfortunately, individualized simulations for Experiment IV are not available. However, the same type of transducer and fundamental frequency was used as in Experiment II, for which we do have simulations. Given the size of the focus and the very low in-situ intensities extending beyond the main focal point, it is incredibly unlikely that effective stimulation was administered outside white matter in a meaningful number of participants. Nevertheless, the reviewer is correct that this can only be directly confirmed with simulations, which remain infeasible due to both technical and practical constraints. We have included the following in the revised manuscript:
  
  “The remaining motor inhibition observed during masked trials likely owes to, albeit decreased, persistent audibility of TUS during masking. Indeed, MEP attenuation in the masked conditions descriptively scales with participant reports of audibility. This points towards a role of auditory confound volume in motor inhibition (Supplementary Fig. 8). Nevertheless, one could instead argue that evidence for direct neuromodulation is seen here. This is unlikely for a number of reasons. First, white matter contains a lesser degree of mechanosensitive ion channel expression and there is evidence that neuromodulation of these tracts may occur primarily in the thermal domain (Guo et al., 2022; Sorum et al., 2021). Second, Experiment IV lacks sufficient inferential power in the absence of an additional control and must therefore be interpreted in tandem with Experiments I-III. These experiments revealed no evidence for direct neuromodulation using equivalent or higher stimulation intensities and directly targeting grey matter while also using multiple control conditions. Therefore, we propose that persistent motor inhibition during masked trials owes to continued, though reduced, audibility of the confound (Supplementary Fig. 8). However, future work including an additional control (site) is required to definitively disentangle these alternatives.”
  
  1.17) Still for Exp 4, the rationale for the 100% MSO or 120% of rMT is not clear, especially with respect to Exp 1 and 2. Equipment is similar as well as raw MEPs amplitudes, therefore the different EMG gain might have artificially increased TMS intensities. Could it have impacted the measured neuromodulatory effects?
  
  Experiment IV was conducted independently at a different institute than Experiments I-II. In contrast to Experiments I-II, a gel pad was used to couple TUS to the participant’s head. The increased TMS-to-cortex distance introduced by the gel pad necessitates higher TMS intensities to compensate for the increased offset. In fact, in 9/12 participants, the intended intensity at 120% rMT exceeded the maximum stimulator output. In those cases, we defaulted to the maximum stimulator output (i.e., 100% MSO). We have clarified this in the revised supplementary material as follows:
  
  “We aimed to use 120% rMT (n = 3). However, if this intensity surpassed 100% MSO, we opted for 100% MSO instead (n = 9). The mean %MSO was 94.5 ± 10.5%. The TMS intensities required in this experiment were higher than those required in Experiment I-II using the same TMS coil, though still within approximately one standard deviation. This is likely due to the use of a gel pad, which introduces more distance between the TMS coil and the scalp, thus requiring a higher TMS intensity to evoke the same motor activity.”
  
  Regarding the EMG gain, this did not affect TMS intensities and did not impact the measured neuromodulatory effects. The EMG gain at acquisition is always considered during signal digitization and further analyses.
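The intensity rule described in this response (aim for 120% rMT, fall back to 100% MSO when that target exceeds the stimulator's range) can be illustrated with a minimal sketch. The function names and the example rMT values are hypothetical and are not taken from the study; only the 120% target and the 100% MSO ceiling come from the text.

```python
def tms_intensity(rmt_pct_mso, target_fraction=1.2, max_output=100.0):
    """Return the stimulation intensity in %MSO: target_fraction * rMT, capped at max_output."""
    return min(target_fraction * rmt_pct_mso, max_output)


def is_capped(rmt_pct_mso, target_fraction=1.2, max_output=100.0):
    """True when the intended intensity exceeds what the stimulator can deliver."""
    return target_fraction * rmt_pct_mso > max_output


# Hypothetical resting motor thresholds (in %MSO), for illustration only.
example_rmts = [65.0, 85.0, 90.0]
intensities = [tms_intensity(r) for r in example_rmts]
capped_flags = [is_capped(r) for r in example_rmts]
```

Under this rule the cap applies whenever rMT is above 100/1.2, i.e. roughly 83.3% MSO, which is consistent with most participants reaching the ceiling when a gel pad raises thresholds.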
  
  1.18) Exp. 4. It would be interesting to provide the changes in MEP amplitudes for those subjects who rated "inaudible" in the self-rating compared to the others. That's an important part of the interpretation: inaudible conditions lead to inhibition, so there is an effect. The auditory confound is not additive to the TUS effect.
  
  Previously, we only provided participants’ ratings of audibility, and showed that conditions that were rated as inaudible more often showed less inhibition, descriptively indicating that inaudible stimulation does not lead to inhibition. This interpretation is in line with our conclusion that the TUS auditory confound acts as a cue signaling the upcoming TMS pulse, thus leading to preparatory inhibition.
  
  We have now included an additional plot and discussion in Supplementary Figure 8 (Subjective Report of TUS Audibility). Here, we show the change in MEP amplitude from baseline for the three continuously masked TUS intensities as in the main manuscript, but now split by participant rating of audibility. Descriptively, less audible sounds result in no marked change or a smaller change in MEP amplitude. This supports our conclusion that direct neuromodulation is not being observed here. When participants were unsure whether they could hear TUS, or when they did hear TUS, more inhibition was observed. However, this is still to a lesser degree than unmasked stimulation, which was nearly always audible, and likely also more salient. This also supports our conclusion that these results indicate a role of cue salience rather than direct neuromodulation. Regarding masked conditions where participants were uncertain whether they heard TUS, the sound was likely sufficient to act as a cue, albeit potentially subliminally. After all, preparatory inhibition is not a conscious action undertaken by the participant either. We would also like to note that participants reported perceived audibility after each block, not after each trial, so self-reported audibility was not a fine-grained measurement. The data from Experiment IV suggest that the volume of the cue has an impact on motor inhibition. Taken together with the points mentioned in 1.16, it is not possible to conclude there is evidence for direct neuromodulation in Experiment IV.
  
  1.19) I suggest re-ordering the subpanels of the main figures to fit the chronological order of appearance in the text (e.g., Figure 1 with A) Ultrasonic parameters, B) 3D-printed clamp, C) Sound-TMS coupling, D) Experimental condition).
  
  We have restructured the figures in the manuscript to provide more clarity and to have greater alignment with the eLife format.
  
  2.1) Although auditory confounds during TUS have been demonstrated before, the thorough design of the study will lead to a strong impact in the field.
  
  We thank the reviewer for recognizing the impact of our work. They highlight that auditory confounds during TUS have been demonstrated previously. Indeed, our work builds upon a larger research line on auditory confounds. The current study goes beyond establishing the confound’s presence by quantifying its impact on motor cortical excitability, but perhaps more importantly by invalidating the most robust and previously replicable findings in humans. Further, this study provides a way forward for the field, highlighting the necessity of (in)active control conditions and tightly matched sham conditions for appropriate inferences in future work. We have amended the abstract to better reflect these points:
  
  “Primarily, this study highlights the substantial shortcomings in accounting for the auditory confound in prior TUS-TMS work where only a flip-over sham control was used. The field must critically reevaluate previous findings given the demonstrated impact of peripheral confounds. Further, rigorous experimental design via (in)active control conditions is required to make substantiated claims in future TUS studies.”
  
  2.2) A few minor [weaknesses] are that (1) the overview of previous related work, and how frequent audible TUS protocols are in the field, could be a bit clearer/more detailed
  
  We have expanded on previous related work in the revised manuscript:
  
  “Indeed, there is longstanding knowledge of the auditory confound accompanying pulsed TUS (Gavrilov & Tsirulnikov, 2012). However, this confound has only recently garnered attention, prompted by a pair of rodent studies demonstrating indirect auditory activation induced by TUS (Guo et al., 2022; Sato et al., 2018). Similar effects have been observed in humans, where exclusively auditory effects were captured with EEG measures (Braun et al., 2020). These findings are particularly impactful given that nearly all TUS studies employ pulsed protocols, from which the pervasive auditory confound emerges (Johnstone et al., 2021).”
  
  2.3) The acoustic control stimulus can be described in more detail
  
  We have elaborated upon the masking stimulus for each experiment in the revised manuscript as follows:
  
  Experiment I: “In addition, we also included a sound-only sham condition that resembled the auditory confound. Specifically, we generated a 1000 Hz square wave tone with 0.3 ms long pulses using MATLAB. We then added white noise at a signal-to-noise ratio of 14:1. This stimulus was administered to the participant via bone-conducting headphones.”
  
  Experiment II: “In this experiment, the same 1000 Hz square wave auditory stimulus was used for sound-only sham and auditory masking conditions. This stimulus was administered to the participant over in-ear headphones.”
  
  Experiment III: “Auditory stimuli were either 500 or 700 ms in duration, the latter beginning 100 ms prior to TUS (Supplementary Fig. 3.3). Both durations were presented at two pitches. Using a signal generator (Agilent 33220A, Keysight Technologies), a 12 kHz sine wave tone was administered over speakers positioned to the left of the participant as in Fomenko and colleagues (2020). Additionally, a 1 kHz square wave tone with 0.5 ms long pulses was administered as in Experiments I, II, IV, and prior research (Braun et al., 2020) over noise-cancelling earbuds.”
  
  Experiment IV: “We additionally applied stimulation both with and without a continuous auditory masking stimulus that sounded similar to the auditory confound. The stimulus consisted of a 1 kHz square wave with 0.3 ms long pulses. This stimulus was presented through wired bone-conducting headphones (LBYSK Wired Bone Conduction Headphones). The volume and signal-to-noise ratio of the masking stimulus were increased until the participant could no longer hear TUS, or until the volume became uncomfortable.”
  
  In the revised manuscript we have also open-sourced the audio files used in Experiments I, II, and IV, as well as a recording of the output of the signal generator for Experiment III:
  
  “Auditory stimuli used for sound-sham and/or masking for each experiment are accessible here: https://doi.org/10.5281/zenodo.8374148.”
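As a rough illustration of the kind of stimulus described for Experiments I and IV (a 1 kHz pulse train with 0.3 ms pulses plus white noise at 14:1), the following is a minimal Python sketch. It is not the authors' MATLAB code. The sampling rate, the 500 ms duration, the Gaussian noise, and the reading of "14:1" as a ratio of RMS amplitudes are all assumptions made for the example; the released audio files remain the authoritative reference.

```python
import math
import random


def pulse_train(freq_hz=1000.0, pulse_ms=0.3, duration_s=0.5, fs=44100):
    """Rectangular pulse train: 1.0 for the first pulse_ms of each period, else 0.0."""
    duty = (pulse_ms / 1000.0) * freq_hz  # fraction of each period spent "on"
    n_samples = int(round(duration_s * fs))
    # (i * freq_hz / fs) % 1.0 is the phase within the current period, in [0, 1).
    return [1.0 if (i * freq_hz / fs) % 1.0 < duty else 0.0 for i in range(n_samples)]


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def add_white_noise(signal, snr=14.0, seed=0):
    """Return (noisy_signal, noise) with rms(signal) / rms(noise) equal to snr.

    Assumption: the signal-to-noise ratio is interpreted as an RMS amplitude ratio.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = rms(signal) / (snr * rms(raw))
    noise = [scale * v for v in raw]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


tone = pulse_train()                      # assumed 500 ms at 44.1 kHz
stimulus, noise = add_white_noise(tone)   # assumed 14:1 RMS amplitude ratio
```

If the ratio were instead meant as a power ratio, the noise scale would use the square root of `snr`; the text does not specify which convention was used.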
  
  2.4) The finding that remaining motor inhibition is observed during acoustically masked trials deserves further discussion.
  
  We agree. Please refer to points 1.16 and 1.18.
  
  2.5) In several places, the authors state to have "improved" control conditions, yet remain somewhat vague on the kind of controls previous work has used (apart from one paragraph where a similar control site is described). It would be useful to include more details on this specific difference to previous work.
  
  In the revised manuscript, we have clarified the control condition used in prior studies as follows:
  
  Abstract:
  
  “Primarily, this study highlights the substantial shortcomings in accounting for the auditory confound in prior TUS-TMS work where only a flip-over sham control was used.”
  
  Introduction:
  
  “To this end, we substantially improved upon prior TUS-TMS studies implementing solely flip-over sham by including both (in)active control and multiple sound-sham conditions.”
  
  Methods:
  
  “We introduced controls that improve upon the sole use of flip-over sham conditions used in prior work. First, we applied active control TUS to the right-hemispheric face motor area, allowing for the assessment of spatially specific effects while also better mimicking on-target peripheral confounds. In addition, we also included a sound-only sham condition that closely resembled the auditory confound.”
  
  2.6) I also wondered how common TUS protocols are that rely on audible frequencies. If they are common, why do the authors think this confound is still relatively unexplored (this is a question out of curiosity). More details on these points might make the paper a bit more accessible to TUS-inexperienced readers.
  
  Regarding the prevalence of the auditory confound, please refer to point 2.2.
  
  Peripheral confounds associated with brain stimulation can have a strong impact on outcome measures, often even overshadowing the intended primary effects. This is well known from electromagnetic stimulation. For example, the click of a TMS pulse can strongly modulate reaction times (Duecker et al., 2013, PlosOne) with effect sizes far beyond that of direct neuromodulation. Unfortunately, this consideration has not yet fully been embraced by the ultrasonic neuromodulation community. This is despite long known auditory effects of TUS (Gavrilov & Tsirulnikov, 2012, Acoustical Physics). It was not until the auditory confound was shown to impact brain activity by Guo et al. and Sato et al. (2018, Neuron) that the field began to attend to this phenomenon. Mohammadjavadi et al. (2019, BrainStim) then showed that neuromodulation persisted even in deaf mice, and importantly, also demonstrated that ramping ultrasound pulses could reduce the auditory brainstem response (ABR). Braun and colleagues (2020, BrainStim) were the first to bring attention to the auditory confound in humans, while also discussing masking stimuli. This was followed by a study from Johnstone and colleagues (2021, BrainStim) who did preliminary work assessing both masking and ramping in humans. Recently, Liang et al. (2023) proposed a new form of masking colourfully titled the ‘auditory Mondrian’. Further research into the peripheral confounds associated with TUS is on the way.
  
  However, we agree that the confound remains relatively unexplored, particularly given the substantial impact it can have, as demonstrated in this paper. What is currently lacking is an assessment of the reproducibility of previous work that did not sufficiently consider the auditory confound. The current study constitutes a strong first step to addressing this issue, and indeed shows that results are not reproducible when using control conditions that are superior to flip-over sham, like (in)active control conditions and tightly matched sound-sham conditions. This is particularly important given the fundamental nature of this research line, where TUS-TMS studies have played a central role in informing choices for stimulation protocols in subsequent research.
  
  We would speculate that, with TUS opening new frontiers for neuroscientific research, there comes a rush of enthusiasm wherein laying the groundwork for a solid foundation in the field can sometimes be overlooked. Therefore, we hope that this work sends a strong message to the field regarding how strong of an impact peripheral confounds can have, also in prior work. Indeed, at the current stage of the field, we see no justification not to include proper experimental control moving forward. Only when we can dissociate peripheral effects from direct neuromodulatory effects can our enthusiasm for the potential of TUS be warranted.
  
  2.7) Results, Fig. 2: Why did the authors not directly contrast target TUS and control conditions?
  
  Please refer to point 1.1.
  
  2.8) The authors observe no dose-response effects of TUS. Does increasing TUS intensity also produce an increase in TUS-produced sounds? If so, should this not also lead to dose-response effects?
  
  We thank the reviewer for this insightful question. Yes, increasing TUS intensity results in an increased volume of the auditory confound. Under certain circumstances this could lead to ‘dose-response’ effects. In the manuscript, we propose that the auditory confound acts as a cue for the upcoming TMS pulse, thus resulting in MEP attenuation once the cue is informative (i.e., when TMS timing can be predicted by the auditory confound). In this scenario, volume can be taken as the salience of the cue. When the auditory confound is sufficiently salient, it should cue the upcoming TMS pulse and thus result in a reduction of MEP amplitude.
  
  If we take Experiment II as an example (Figure 3B), the 19.06 W/cm² stimulation would be louder than the 6.35 W/cm² intensity. However, as both intensities are audible, they both cue the upcoming TMS pulse. One could speculate that the very slight (nonsignificant) further decrease for 19.06 W/cm² stimulation could be due to more salient cueing.
  
  One might notice that MEP attenuation is less strong in Experiment I, even though higher intensities were applied. Directly contrasting intensities from Experiments I and II was not feasible due to differences in transducers and experimental design. From the perspective of sound cueing of the upcoming TMS pulse, the auditory confound cue was less informative in Experiment I than Experiment II, because TUS stimulus durations of both 100 and 500 ms were administered, rather than solely 500 ms durations. This could explain why descriptively less MEP attenuation was observed in Experiment I, where cueing was less consistent.
  
  Perhaps more convincing evidence of a sound-based ‘dose-response’ effect comes from Experiment IV (Figure 4B). Here, we propose that continuous masking reduced the salience of the auditory confound (cue), and thus, less MEP attenuation was observed. Indeed, we see less MEP change for masked stimulation. For the lowest administered volume during masked stimulation, there was no change in MEP amplitude from baseline. For higher volumes, however, there was a significant inhibition of MEP amplitude, though it was still less attenuation than unmasked stimulation. These results indicate a ‘dose-response’ effect of volume. When the volume (intensity) of the auditory confound was low enough, it was inaudible over the continuous mask (also as reported by participants), and thus it did not act as a cue for the upcoming TMS pulse, therefore not resulting in motor inhibition. When the volume (intensity) was higher, fewer participants reported not being able to hear the stimulation, so the cue was to a given extent more salient, and in line with the cueing hypothesis more inhibition was observed.
  
  In summary, because the volume of the auditory confound scales with the intensity of TUS, there may be dose-response effects of the auditory confound volume. Along the border of (in)audibility of the confound, as in masked trials of Experiment IV, we may observe dose-response effects. However, at clearly audible intensities (e.g., Experiment I & II), the size of such an effect would likely be small, as both volumes are sufficiently audible to act as a cue for the upcoming TMS pulse leading to preparatory inhibition.
  
  2.9) I wonder if the authors could say a bit more on the acoustic control stimulus. Some sound examples would be useful. The authors control for audibility, but does the control sound resemble the one produced by TUS?
  
  Please refer to point 2.3.
  
  2.10) The authors' claim that the remaining motor inhibition observed during masked trials is due to persistent audibility of TUS relies "only" on participants' descriptions. I think this deserves a bit more discussion. Could this be evidence that there is a TUS effect in addition to the sound effect?
  
  Please refer to points 1.16 and 1.18.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.02.22.527901v3
www.biorxiv.org www.biorxiv.org

New submission 22/09/2023, 09:43:49

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Some sentences need to be clarified and some additional data and references could be added.
  
  1) Line 18
  
  SRY is the sex-determining gene
  
  SRY is the testis-determining gene is more accurate as described in line 44
  
  Modification done
  
  2) Line 50
  
  Despite losing its function in early testis determination in mice, DMRT1 retained part of this function in adulthood when it is necessary to maintain Sertoli cell identity.
  
  "Losing its function" is misleading. The authors describe firstly that Dmrt1 has no obvious function in embryonic testis development but is critical for the maintenance of Sertoli cells in adult mice. The wording "losing its function in early testis" is confusing. Do the authors mean that despite the expression of Dmrt1 in early testis development, the function of Dmrt1 seems to be restricted to adults in mice? A comparison between the testis and ovary should be more cautious since Garcia-Alonso et al (2022) have shown that the transcriptomics of supporting cells between humans and mice is partly different.
  
  That is what we meant, and the sentence has been changed as follows: “Although DMRT1 is not required for testis determination in mice, it retained part of its function in adulthood when it is necessary to maintain Sertoli cell identity.” (line 51 to 53)
  
  3) Line 78
  
  XY DMRT1-/- rabbits showed early male-to-female sex reversal.
  
  Sex reversal indicates that there is no transient Sertoli cell differentiation followed by transdifferentiation into granulosa cells. This brings us to an interesting point. In the case of reprogramming, the transient Sertoli cells can produce AMH leading to the regression of the Mullerian ducts. In humans, some 9p-deleted XY patients have Mullerian duct remnants and feminized external genitalia. This finding indicates early defects in testis development.
  
  Are there also feminized external genitalia in XY Dmrt1−/− rabbits? Can the authors comment on the phenotype of the ducts?
  
  We propose to add “and complete female genitalia” at the end of the following sentence: “Secondly, thanks to our CRISPR/Cas9 genetically modified rabbit model, we demonstrated that DMRT1 was required for testis differentiation since XY DMRT1-/- rabbits showed early male-to-female sex reversal with differentiating ovaries and complete female genitalia.” (line 77 to 80)
  
  Indeed, from the first stage (16 dpc) at which we can predict the sex of the individual by observing its gonads during dissection, we always predict a female sex for XY DMRT1 KO fetuses. It is only genotyping that reveals an XY genotype. At birth, our rabbits are sexed by technicians from the facility and again, this time based on the external genitalia, they always phenotype these rabbits as females. In these XY KO rabbits, the supporting cells never differentiate into Sertoli cells, and ovarian differentiation occurs as early as in XX animals. Thus, these animals are fully feminized with female internal and external genitalia. Most 9p-deleted patients are not homozygous for the loss-of-function of DMRT1, and the remaining wild-type allele could explain the discrepancy between KO rabbits and humans.
  
  4) Line 53
  
  In the ovary, an equivalent to DMRT1 was observed since FOXL2 (Forkhead family box L2) is expressed in female supporting cells very early in development.
  
  Can the authors clarify what is the equivalent of DMRT1, is it FOXL2? DMRT1 heterozygous mutations result in XY gonad dysgenesis suggesting haploinsufficiency of DMRT1. However, to my knowledge, there is no evidence of haploinsufficiency in XX babies. Thus can we compare testis and ovarian genetics?
  
  We agree, the term “equivalent” is ambiguous, and we changed the sentence as follows: “In ovarian differentiation, FOXL2 (Forkhead family box L2) showed a similar function discrepancy between mice and goats as DMRT1 in the testis pathway. In the mouse, Foxl2 is expressed in female supporting cells early in development but does not appear necessary for fetal ovary differentiation. On the contrary, it is required in adult granulosa cells to maintain female-supporting cell identity.” (line 53 to 56)
  
  Regarding reviewer 2's question on haploinsufficiency in humans: the patient described in Murphy et al., 2015 is an XY individual with complete gonadal dysgenesis. However, it has been shown that the mutation carried by this patient leads to a dominant-negative protein, equivalent to a homozygous state (Murphy et al., 2022).
  
  For FOXL2 mutation in XX females, haploinsufficiency does not affect early ovarian differentiation (no sex reversal) but induces premature ovarian failure.
  
  We agree with the reviewer, we cannot compare testis and ovarian genetics considering two different genes.
  
  5) Line 55
  
  In mice, Foxl2 does not appear necessary for fetal ovary differentiation (Uda et al., 2004), while it is required in adult granulosa cells to maintain female-supporting cell identity (Ottolenghi et al., 2005). The reference Uhlenhaut et al (2009) reporting the phenotype of the deletion of Foxl2 in adults should be added.
  
  The reference has been added.
  
  6) Line 64
  
  These observations in the goat suggested that DMRT1 could retain function in SOX9 activation and, thus, in testis determination in several mammals.
  
  Lindeman et al (2021) have shown that DMRT1 can act as a pioneer factor to open chromatin upstream and Dmrt1 is expressed before Sry in mice (Raymond et al, 1999, Lei, Hornbaker et al, 2007). Whereas additional factors may compensate for the absence of Dmrt1, these results suggest that DMRT1 is also involved in Sox9 activation.
  
  Dmrt1 is indeed expressed before Sry/Sox9 in the mouse gonad. However, no binding site for DMRT1 could be observed at Sox9 enhancer 13 in mice. This does not support a role for DMRT1 in the activation of Sox9 expression in this species. Furthermore, in Lindeman et al 2021, the authors clearly state that DMRT1 acts as a pioneering factor for SOX9 only after birth. It does not appear to have this role before. One of the explanations put forward is that the state of chromatin is different during fetal development in mice: chromatin is more permissive and does not require a factor to facilitate its opening. This hypothesis is based in particular on the description of a similar chromatin profile in the precursors of XX and XY fetal supporting cells, where many common regions display an open structure (Garcia-Moreno et al., 2019). Once sex determination and differentiation are established, a sex-specific epigenome is set up in gonadal cells. Chromatin remodeling agents are then needed to regulate gene expression. We hypothesize that in non-murine mammals such as rabbits, the state of gonadal cell chromatin would be different in the fetal period, more repressed, requiring the intervention of specific factors for its opening, such as DMRT1.
  
  7) Figure 1
  
  Most of the readers might not be familiar with the developmental stages of the gonad in rabbits. A diagram of the key stages in gonad development would facilitate the understanding of the results.
  
  Thank you, it has been added in Figure 1.
  
  8) Figure 2
  
  Arrowheads are difficult to spot, could the authors use another color?
  
  Done
  
  9) Line 117: can the authors comment on the formation of the tunica albuginea? Do the epithelial cells acquire some specific characteristics?
  
  The formation of the tunica albuginea begins with the formation of loose connective tissue beneath the surface epithelium of the male gonad. The appearance of this tissue is concomitant with the loss of expression of DMRT1 in the cells of the coelomic epithelium. Our interpretation is that the contribution of the cells from the coelomic epithelium and their proliferation stops when the tunica begins to form because the structure of the tissue beneath the epithelium changes, and the cellular interactions between the epithelium and the tissue below remain disrupted. By contrast, these interactions persist in the ovary until around birth for ovigerous nest formation.
  
  10) The first part of the results described DMRT1 expression in rabbits. With the new single-cell transcriptomic atlas of human gonads, it would be important to describe the pattern of expression in this species. This could be described in the introduction in order to know the DMRT1 expression pattern in the human gonad before that of the rabbit.
  
  A comment on the expression pattern of DMRT1 in human fetal gonads has been added in the discussion section: “In the human fetal testis, DMRT1 expression is co-detected with SRY in early supporting gonadal cells (ESGCs), which become Sertoli cells following the activation of SOX9 expression (Garcia-Alonso et al., 2022)” (line 222 to 224)
  
  11) Figure 3 supplement 3
  
  Dotted line: delimitation of the ovarian surface epithelium. Could the authors check that there is a dotted line?
  
  Done
  
  12) Figure 5 and Line 186
  
  Quantification is missing such as the % of germ cells, % of meiotic germ cells.
  
  Quantification is difficult to perform in rabbits because of the size and the elongated shape of the gonad. Indeed, it is difficult to be sure that both sections (one from WT, the other from KO) come from strictly the same region of the gonad and that each section is perfectly longitudinal. See also our answer to reviewer 3 (point 7) on this aspect. We are currently trying to better characterize this XX phenotype and to find a marker of the pre-leptotene/leptotene stage that works in rabbits (SYCP3 would be the best, but we encountered major difficulties with different antibodies and even with an RNAscope probe). For now, the most convincing indirect evidence of this pre-meiotic blockage (in addition to HE staining at 18 dpp in the new Figure 6) is the persistence of POU5F1 (a pluripotency marker), specifically in the germinal lineage of KO XX and XY gonads. In addition to the new figure supplement 5, we can show you in Author response image 1: (i) the gonadal section at a lower magnification, where it is evident that there is a big difference between WT and KO germ cell POU5F1 staining; and (ii) POU5F1 expression from a bulk RNA-seq performed the day after birth (1 dpp), where the difference is also very clear at the transcript level.
  
  Author response image 1.
  
  13) Line 186,
  
  E is missing at preleptoten
  
  Added
  
  14) Figure supplement 7.
  
  A magnification of the histology of the gonads is missing.
  
  This figure is only intended to show gonadal size, and these are the same gonads as in the new Figure 6. The magnification is therefore shown in Figure 6.
  
  15) Discussion
  
  Line 201
  
  SOX9, well known in vertebrates,
  
  The references of the human DSD associated with SOX9 mutations are missing.
  
  Thank you, references have been added.
  
  16) Line 286
  
  One of the targets of WNT signaling is Bmp2 in the somatic cells and in turn, Zglp1, which is required for meiosis entry in the ovary as shown by Miyauchi et al (2017) and Nagaoka et al (2020). Does the level of BMP pathway vary in DMRT1 mutants?
  
  At 20 dpc, the expression level of BMP2 in XY and XX DMRT1 mutant gonads is similar to that of the XX control, which is lower than in the XY control (see the TPM values from our RNA-seq in Author response image 2).
  
  Author response image 2.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Here are my minor comments:
  
  1) Line 106- You mention that coelomic epithelial cells only express DMRT1. Please add an arrow to highlight where you refer to.
  
  Done
  
  2) Line 112: In mice, the SLCs also express Sox9 but not Sry apart from Pax8. You mention here that the SLCs are expressing SRY and DMRT1 in addition to PAX8. Could you perhaps explain the difference? Please refer to that in the results or discussion.
  
  We added a new sentence at the end of this paragraph on SLCs: “As in mice, these cells will express SOX9 at later stages (few of them are already SOX9 positive at 15 dpc), but unlike mice, they express SRY.” (line 114 to 115)
  
  We already have collaborations with different labs on these SLC cells, and we will certainly come back later on this aspect, remaining slightly off-topic here.
  
  3) Could you please explain why did you chose to target Exon 3 of DMRT1 and not exons 1-2 which contain the DM domain? Was it to prevent damaging other DMRT proteins? Is there an important domain or function in Exon 2?
  
  Our choice was mainly based on technical issues (rabbit genome annotation & sgRNA design), but we also wanted to avoid targeting the DM domain due to its strong conservation with other DMRT genes. Due to the poor quality of the rabbit genome, exons 1 and 2 are not well annotated in this species. We have amplified and sequenced the region encompassing exons 1 & 2 from our rabbit line, but the software used for sgRNA design does not predict good guides in this region. The two best sgRNAs were predicted on exon 3, and we used both to obtain more mutated alleles.
  
  4) Your scheme in Supp Figure 4 is not so clear. It is not clear that the black box between the two guides is part of Exon 3 (labelled in blue).
  
  The scheme has been improved.
  
  5) Did you only have 1 good founder rabbit in your experiment? Why did you choose to work with a line that had duplication rather than deletion?
  
  Very good point! In the first version of this paper, we tried to explain the long (around 2 years) story of breeding to obtain the founder animal. Here it is:
  
  During the genome editing process, we generated 6 mosaic founder animals (5 males and 1 female), then crossed them with wild-type animals to isolate each mutated allele in F1 offspring, which were then used to establish and amplify knockout lines. Unexpectedly, we observed a very low rate of mutated allele transmission (5 of 129 F1 animals), and only one mutated allele was conserved, from the unique surviving adult F1 animal. It consists of an insertion of the deleted 47 bp DNA fragment, flanked by the cutting sites of the two RNA guides used with Cas9.
  
  The main hypothesis to explain this mutation event is that, in the same embryonic cell, the deletion occurred on one allele and the deleted fragment was then inserted into the other allele. Under this scheme, the embryonic cell carries a homozygous DMRT1 knockout genotype, albeit heterogeneous, with a deleted allele (del47) and the present allele (insertion of a 47 bp fragment leading to an in-sense duplication). This may explain the very low frequency of transmission, since germ cells carrying a homozygous DMRT1-/- genotype will probably not be able to enter the meiotic process, as suggested by our results on XX and XY DMRT1-/- ovaries. Finally, under this hypothesis, the way we obtained this unique founder animal remains a mystery!
  
  6) Figure 4- real-time data- where does it say what is a,b,c,d of the significance? It should appear on the figure itself and not elsewhere.
  
  Modification done.
  
  7) If I understand correctly, you were able to get the rabbits born and kept to adulthood (you show in supp figure 7 their gonads). What was the external phenotype of these rabbits? Did the XY mutant gonads have the internal and external genitals of a female (oviduct, uterus, vagina etc.)?
  
  See our answer to Reviewer 1 on this question (point 3).
  
  8) Line 20: It is more correct to write 46, XY DSD rather than XY DSD
  
  Modification done.
  
  9) Line 21: you can remove the "the" after abolished
  
  Modification done.
  
  10) Line 31: consider replacing the first "and" by "as well as" since the sentence sounds strange with two "and".
  
  Modification done.
  
  11) Line 212- Please check with the eLife guidelines if they allow "data not shown" in the paper.
  
  This is unspecified.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The following points should be addressed.
  
  1) The in situ's in Fig 1 and 2 are very clear. Fig 1 and Fig 2, In situ hybridisation in tissue sections, it looked like DMRT1 could be expressed in some cells where SRY mRNA is absent @ E13.5dpc and 14.5 dpc. Do you think this is real, or maybe Sry is turned off now in those cells?
  
  Based on the results of in situ hybridizations, DMRT1 appears to be expressed by both coelomic epithelium and genital crest medullar cells in a pattern that is actually broader than that of SRY. Moreover, in rabbits, SRY expression seems to start in the medulla of the genital ridge rather than in the surface epithelium, as described in mice (see Figure 1 at 12 and 13 dpc). Nevertheless, more detailed analyses are needed to ensure the lineage of cells expressing SRY and/or DMRT1, such as single-cell RNAseq at these key stages of sexual determination in rabbits (from 12 to 16 dpc).
  
  2) It is curious that SRY expression is elevated in the DMRT1 KO (Knockout) rabbit gonads. Does this suggest feedback inhibition by DMRt1, or maybe indirect via effect on Sox9 (as I believe Sox9 feeds back to down-regulate Sry in mouse, for example).
  
  The maintenance of SRY expression in the DMRT1 -/- rabbit testis seems to be linked to the absence of SOX9 expression. We believe that, as in mice, SOX9 would down-regulate SRY (even if, in rabbits, SRY expression is never completely turned off).
  
  3) I suggest the targeting strategy and proof of DMRT1 knockout by sequencing etc. be brought out of the suppl. Data and shown as a figure in the text.
  
  See also our answer to reviewer 2 (point 5). Huge efforts were needed to obtain this DMRT1-mutated rabbit line, and of course, it constitutes the basis of the study. But considering the title and the main message of the article, we are not convinced that the targeting strategy should be moved into the main text.
  
  4) Unless there are limitations imposed by the journal, I also feel that Suppl Fig 5 (the immunostaining) deserves to be in the paper text too. The Fig showing loss of DMRt1 by immunostaining is important.
  
  We have moved figure supplement 5 into the main text: Figure 4E and figure supplement 5 have been combined into a new Figure 5.
  
  5) The RT-qPCR data should have the statistics clarified on the graphs. (e.g., it is stated that, although Sox9 mRNA is clearly down, there is a slight increase compared to control on KO XX gonads. Is this statistically significant? Figure legend states that the Kruskal-Wallis test is used, and significance is shown by letters. This is unclear. It would be better to use the more usual asterisks and lines to show comparisons.
  
  Modification done.
  
  6) Reference is made to DMRT1+/- rabbits having aberrant germ cell development, pointing to a dosage effect. This is interesting. Does the somatic part of the gonad look completely normal in the het knockouts?
  
  DMRT1 heterozygous male rabbits have a phenotype of secondary infertility with aging, and we are trying now to better characterize this phenotype. The problem is complex because, as we cannot carry out conditional KO, it remains difficult to decipher the consequence of DMRT1 haploinsufficiency in the Sertoli cells versus the germinal ones. Anyway, the somatic part is sufficiently normal to support spermatogenesis since heterozygous males are fertile at puberty and for some months thereafter.
  
  7) Can the authors indicate why meiotic markers were not used to explore the germ cell phenotype? It would be advantageous to use a meiotic germ cell marker to definitely show that the germ cells do not enter meiosis after DMRT1 loss. (Not just H/E staining or maintenance of POU). Example SYCP3, or STRA8 (as pre-meiotic marker) by in situ or immunostaining. Even though no germ cells were detected in adult KO gonads.
  
  The expression of pre-meiotic or meiotic markers is currently under study in DMRT1 -/- females. Transcriptomic data (RNA-seq) are also being analyzed. We are preparing a specific article on the role of DMRT1 in ovarian differentiation in rabbits. We felt it was important to reveal the phenotype observed in females in this first article, but we still need time to refine our description and understanding of the role of DMRT1 in the female.
  
  8) What future studies could be conducted? In the Discussion section, it is suggested that DMRT1 could act as a pioneering factor to allow SRY action upon Sox9. How could this be further explored?
  
  To explore the function of DMRT1 as a pioneering factor, it now seems necessary to characterize the epigenetic landscapes of rabbit fetal gonads with and without DMRT1 expression (comparison of control and DMRT1-/- gonads). Two complementary approaches could be prioritized: the study of chromatin opening (ATAC-seq) and the analysis of the activation state of regulatory regions (CUT&Tag). The study of several histone marks, such as H3K4me3 (active promoters), H3K4me1 (primed enhancers), H3K27ac (enhancers and active promoters), and H3K27me3 (enhancers and repressed promoters), would be of great interest. However, these techniques are only relevant for gonads that can be separated from the adjacent mesonephros, which is only possible from the 16 dpc stage in rabbits. To perform a relevant analysis at earlier stages, a single-nucleus approach could be used, such as single-nucleus ATAC-seq or single-nucleus multi-omics combining ATAC-seq and RNA-seq.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.01.13.523925v3
www.biorxiv.org

GnRH pulse generator activity in mouse models of polycystic ovary syndrome

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews.
  
  Reviewer #1 (Public Review):
  
  The manuscript involves 11 research vignettes that interrogate key aspects of GnRH pulse generator in two established mouse models of PCOS (peripubertal and prenatal androgenisation; PPA and PNA) (9 of the vignettes focus on the latter model).
  
  A key message of this paper is that the oft-quoted idea of rapid GnRH/LH pulses associated with PCOS is in fact not readily demonstrable in PNA and PPA mice. This is an important message to make known, but when established dogmas are being challenged, the experiments behind them need to be robust. In this case, underpowered experiments and one or two other issues greatly limit the overall robustness of the study.
  
  General critiques
  
  (1) My main concern is that many/most of the experiments were limited to 4-5 mice per group (PPA experiments 1 and 2, PNA experiments 3, 5, 6, 8, and 9). This seems very underpowered for trying to disprove established dogmas (sometimes falling back on "non-significant trends" - lines 105 and 239).
  
  For the key characterization of GnRH pulse generator activity and LH pulsatility in intact PNA mice (Fig.3, 4, 6), we used 6-8 animals in each experiment which we believe to be sufficient.
  
  It is pertinent to explore the “established dogma”. While there is every expectation that the PNA model should have increased LH pulsatility, in fact there is only a single study (Moore, Prescott et al. 2015) that has shown this. The two other reports that have examined this issue find no change in LH pulse frequency (McCarthy, Dischino et al. 2021 and ours). Hence, we would suggest that expectations rather than evidence presently maintain the PNA “dogma”. For the PPA model, there is in fact not a single paper reporting increased LH pulse frequency.
  
  (2) Page 133-142: it is concerning that the PNA mice didn't have elevated testosterone levels, and this clearly isn't the fault of the assay as this was re-tested in the laboratory of Prof Handelsman, an expert in the field, using LCMS. The point (clearly made in lines 315-336 of the Discussion) that elevated testosterone in PNA mice has been shown in some but not other publications is an important concern to describe for the field. However, the fact remains that it IS elevated in numerous studies, and in the current study it is not so, yet the authors go on to present GnRH pulse generator data as characteristic of the PNA model. Perhaps a demonstration of elevated testosterone levels (by LCMS?) should become a standard model validation prerequisite for publishing any PNA model data.
  
  We provide a Table below showing the huge inconsistencies in testosterone levels reported in the PNA mouse model. If anything, these inconsistencies might be explained by age, although again this is very variable between studies. Much the same as the “dogma” related to LH pulsatility in the PNA model, we would question whether there is any robust increase in testosterone levels in this model. There is no question that women with PCOS have elevated testosterone but whether the PNA mouse is a good model for this is debatable. We have noted this caution and the need for further LC-MS studies in the Discussion.
  
  Author response table 1.
  
  *Same ELISA used in the current study.
  
  (3) Line 191-196: the lack of a significant increase in LH pulse frequency in PNA mice is based on measurements using reasonable group sizes (7-8), although the sampling frequency is low for this type of analysis (10-minute intervals; 6-minute intervals would seem safer for not missing some pulses). The significance of the LH pulse frequency results is not stated (looks like about p=0.01). The authors note that LH concentration IS elevated (approximately doubled), and this clearly is not caused by an increase in amplitude (Figure 4 G, H, I). These things are worth commenting on in the discussion.
  
  We have included the p-value of the LH pulse frequency results and included the relevant discussion.
  
  (4) An interesting observation is that PNA mice appear to continue to have cyclical patterns of GnRH pulse generator activity despite reproductive acyclicity as determined by vaginal cytology (lines 209-241). This finding was used to analyse the frequency of GnRH pulse generator SEs in the machine-learning-identified diestrous-like stage of PNA mice and compare it to diestrous control mice (as identified by vaginal cytology?) (lines 245-254). The idea of a cycle stage-specific comparison is good, but surely the only valid comparison would be to use machine-learning to identify the diestrous-like stage in both groups of mice. Why use machine learning for one and vaginal cytology for the other?
  
  As “machine learning-defined” diestrus is based on the control vaginal cytology information, the diestrous mice are in fact defined by the same machine learning parameters. We have now noted this.
  
  Specific points
  
  (5) With regard to point 2 above, it would be helpful to note the age at which the testosterone samples were taken.
  
  We have included the age in the method.
  
  (6) Lines 198-205 and 258-266: I think these are repeated measures of ANOVA data? If so, report the main relevant effect before the post hoc test result.
  
  We have included the relevant main effect in the manuscript.
  
  (7) Line 415: I don't think the word "although" works in this sentence.
  
  We have changed the wording accordingly.
  
  (8) Lines 514-518: what are the limits of hormone detection in the LCMS assay?
  
  These were originally stated in the figure legend but have now been included in the Methods.
  
  Reviewer #2 (Public Review):
  
  Summary
  
  The authors aimed to investigate the functionality of the GnRH (gonadotropin-releasing hormone) pulse generator in different mouse models to understand its role in reproductive physiology and its implications for conditions like polycystic ovary syndrome (PCOS). They compared the GnRH pulse generator activity in control mice, peripubertal androgen (PPA) treated mice, and prenatal androgen (PNA) exposed mice. The study sought to elucidate how androgen exposure affects the GnRH pulse generator and subsequent LH (luteinizing hormone) secretion, contributing to the pathophysiology of PCOS.
  
  Strengths
  
  (1) Comprehensive Model Selection: The use of both PPA and PNA mouse models allows for a comparative analysis that can distinguish the effects of different timings of androgen exposure.
  
  (2) Detailed Methodology: The methods employed, such as photometry recordings and serial blood sampling, are robust and allow for precise measurement of GnRH pulse generator activity and LH secretion.
  
  (3) Clear Results Presentation: The experimental results are well-documented with appropriate statistical analyses, ensuring the findings are reliable and reproducible.
  
  (4) Relevance to PCOS: The study addresses a significant gap in understanding the neuroendocrine mechanisms underlying PCOS, making the findings relevant to both basic science and potentially clinical research.
  
  Weaknesses
  
  (1) Model Limitations: While the PNA mouse model is suggested as the most appropriate for studying PCOS, the authors acknowledge that it does not completely replicate the human condition, particularly the elevated LH response seen in women with PCOS.
  
  We agree.
  
  (2) Complex Data Interpretation: The reduced progesterone feedback and its effects on the GnRH pulse generator in PNA mice add complexity to data interpretation, making it challenging to draw straightforward conclusions.
  
  We agree.
  
  (3) Machine Learning (ML) Selection and Validation: While k-means clustering is a useful tool for pattern recognition, the manuscript lacks detailed justification for choosing this specific algorithm over other potential methods. The robustness of clustering results has not been validated.
  
  Please see below.
  
  (4) Biological Interpretability: Although the machine learning approach identified cyclical patterns, the biological interpretation of these clusters in the context of PCOS is not thoroughly discussed. A deeper exploration of how these clusters correlate with physiological and pathological states could enhance the study's impact.
  
  It is presently difficult to ascribe specific functions of the various pulse generator states to physiological impact. While it is reasonable to suggest that Cluster_0 activity (representing very infrequent SEs) is responsible for the estrous/luteal-phase pause in pulsatility, we remain unclear on the physiological impact of multi-peak SEs on LH secretion, even in normal mice (see Vas et al., Endo 2024). Thus, for the moment, it is most appropriate to simply state that pulse generator activity remains cyclical in PNA mice without any unfounded speculation.
  
  (5) Sample Size: The study uses a relatively small number of animals (n=4-7 per group), which may limit the generalisability of the findings. Larger sample sizes could provide more robust and statistically significant results.
  
  For the key characterization of GnRH pulse generator activity and LH pulsatility in intact PNA mice (Fig.3, 4, 6), we used 6-8 animals in each experiment which we believe to be sufficient. Some of the subsequent experiments do have smaller N numbers and we are particularly aware of the progesterone treatment study that only has N=3 for the PNA group. However, as this was sufficient to show a statistical difference we did not generate more mice.
  
  (6) Scope of Application: The findings, while interesting, are primarily applicable to mouse models. The translation to human physiology requires cautious interpretation and further validation.
  
  We agree.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The validation of clustering results through additional metrics or comparison with other algorithms would strengthen the methodology. Specifically, the authors selected k=5 for k-means clustering without providing an explicit rationale or evidence of exploratory data analysis (EDA) to support this choice. They refer to their previous publication (Vas, Wall et al. 2024), which does not provide any EDA regarding the choice of a number of clusters nor their robustness. The arbitrary selection of "k" without justification can undermine confidence in the clustering results since clustering results heavily depend on "k". The authors also choose to use Euclidean distance as the "numerical measure" setting in the RapidMiner Studio's software without justification given the chosen features used for clustering and their properties. The lack of exploratory analysis to determine the optimal number of clusters, "k", to be considered means that the authors might have missed identifying the true structure of the data. Common cluster robustness methods, like the elbow method or silhouette analysis, are crucial for justifying the number of clusters. An inappropriate choice could lead to incorrect conclusions about the synchronisation patterns of ARN kisspeptin neurons and their implications for the study's hypotheses. Including EDA and other validation techniques (e.g., silhouette scores, elbow method) would have strengthened the manuscript by providing empirical support for the chosen algorithm and settings.
  
  It is important to clarify that we did not start this exercise with an unknown or uncharacterised data set and that the objective of the clustering was not to provide any initial pattern to the data. Rather, our aim was to develop an unsupervised approach that would automatically detect the onset and existence of the key features of pulse generator cyclicity that were apparent by eye, e.g., the estrous stage slowing and the presence of multi-peak SEs in metestrus. As such, our optimization was driven by the data as well as observation while retaining the unsupervised nature of k-means clustering. We started by assessing 10 variables describing all possible features of the recordings and, through a process of elimination, found that just 5 were sufficient to describe the key stages of the cycle. While we appreciate that the use of multiple different algorithms would progressively increase the robustness of the machine learning approach, it is evident that the current k-means approach with k=5 is already very effective at reporting the estrous cyclicity of the pulse generator in normal mice (Vas et al., Endo 2024). Having validated this approach, we have now used it here to compare the cyclical patterns of activity of PNA- and vehicle-treated mice.
  
  (2) The data and methods presented in this study could be valuable for the research community studying reproductive endocrinology and neuroendocrine disorders provided the authors address my comments above regarding the application of ML methods. The insights gained from this work could potentially inform clinical research aiming to develop better diagnostic and therapeutic strategies for PCOS.
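  
  As an illustration of the cluster-number check raised in point (1) (silhouette analysis over candidate values of k), the following is a minimal sketch only. It is not the authors' RapidMiner Studio workflow: the data are synthetic two-dimensional points rather than photometry features, and the helper names (farthest_first, kmeans, silhouette) are hypothetical. It runs plain k-means with Euclidean distance for several k and reports the k with the highest mean silhouette width.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding: each new centre is the point farthest from the chosen ones."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Lloyd's k-means with Euclidean distance; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width: (b - a) / max(a, b), averaged over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in data (assumption): three well-separated Gaussian blobs.
rng = random.Random(0)
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for cx, cy in blobs for _ in range(30)]

scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

  On real recordings the features would normally be standardised first, since Euclidean distance is sensitive to the scale of each variable; that step is omitted here because the synthetic axes already share a scale.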
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  Zhou and colleagues elegantly used pre-clinical mouse models to understand the nature of abnormally high GnRH/LH pulse secretion in polycystic ovary syndrome (PCOS), a major endocrine disorder affecting female fertility worldwide. This work brings a fundamental question of how altered gonadotropin secretion takes place upstream within the GnRH pulse generator core, which is defined by arcuate nucleus kisspeptin neurons.
  
  Strengths:
  
  The authors use state-of-the-art in vivo calcium imaging with fiber photometry and important physiological manipulations and measurements to dissect the possible neuronal mechanisms underlying such neuroendocrine derangements in PCOS. The additional use of unsupervised k-means clustering analysis for the evaluation of calcium synchronous events greatly enhances the quality of their evidence. The authors nicely propose that neuroendocrine dysfunction in PCOS might involve different setpoints through the hypothalamic-pituitary-gonadal (HPG) axis, and beyond kisspeptin neurons, which importantly pushes our field forward toward future investigations.
  
  Weaknesses:
  
  Although the authors provide important evidence, additional efforts are required to improve the quality of the manuscript and back up their claims. For instance, animal experiments failed to detect high testosterone levels in PNA female mice, a well-established PCOS mouse model. Considering that androgen excess is a hallmark of PCOS, this highly influences the subsequent evaluation of calcium synchronous events in arcuate kisspeptin neurons and the implications for neuroendocrine derangements.
  
  Please see our response to Reviewer 1. It will be important to establish a robust PCOS mouse model in the future that has elevated pulse generator activity in the presence of elevated testosterone concentrations.
  
  Authors also may need to provide LH data from another mouse model used in their work, the peripubertal androgen (PPA) model. Their claims seem to fall short without the pairing evidence of calcium synchronous events in arcuate kisspeptin neurons and LH pulse secretion.
  
  We have demonstrated that ARN-KISS neuron SEs are perfectly correlated with pulsatile LH secretion in intact and gonadectomized male and female mice on many occasions. Given that the pulse generator frequency slows by 50% in PPA mice, it is very hard to imagine how this could result in an elevated LH pulse frequency. While we were undertaking these studies the first paper (to our knowledge) looking at pulsatile LH secretion in the PPA model was published; no change was found.
  
  Another aspect that requires reviewing, is further exploration of their calcium synchronous events data and the increase of animal numbers in some of their experiments.
  
  Please see below.
  
  Reviewer #3 (Recommendations For The Authors):
  
  The reviewer believes that this work will greatly contribute to the field and, to provide better manuscript quality, there might be only a few minor and major revisions to be included in the future version.
  
  Minor:
  
  (1) Line 17: I would change the sentence to "One in ten women in their reproductive age suffer from PCOS" to adapt to more accurate prevalence studies.
  
  We have revised the sentence as recommended.
  
  (2) Line 18 and 19: Although the evidence indeed points to a high LH pulse secretion in PCOS, I would change it to "with increased LH secretion" as most studies show mean values and not LH pulse release data.
  
  While we agree that most human studies show a mean increase in LH, when assessed with sufficient temporal resolution, this results from elevated LH pulse frequency. As such, and to keep the manuscript focussed on the pulse generator, we would like to retain the present wording.
  
  (3) Line 47: Please correct "polycystic ovaries" to polycystic-like ovarian morphology to adapt to the current AEPCOS guidelines.
  
  We have revised the sentence as recommended.
  
  (4) Line 231: Authors stated that "These PNA mice exhibited a cyclical pattern of activity similar to that of control mice" (Figure 5C and D). Please, include the statistical tests here for this claim. Although they say there aren't differences, the colored fields do not reflect this and seem quite different. Could the authors re-evaluate these claims or provide better examples in the figure?
  
  We used Sidak’s multiple comparisons tests for this analysis (as stated in Results). The key data for assessing overall cyclical activity in PNA and control mice is Fig 5B, which suggests very little difference. We accept that the individual traces of activity (Fig.5D) do not look identical to controls and, indeed, they are representative of the data set. The key point is they remain cyclical in an acyclic mouse. We have made sure that this is clear in the text.
  
  (5) Subheadings 6 and 7 of the result section: It sounds confusing to read the foremost claims of the absence of SE differences and next have a clear SE frequency difference in Figures 6 C and D. The reviewer suggests that authors could reorganize the text and figures to make their rationale flow better for future readers.
  
  We have considered this point carefully but find that re-organization creates its own problems, as the machine learning algorithm would have to be used before it is described. It will always be problematic to incorporate this type of data re-analysis in an original paper, but we think the present sequence is the best that can be achieved.
  
  (6) Discussion: If PNA female mice did not have elevated testosterone levels, how can the authors compare their results to the current literature? Could this be the case for lacking a more robust ARNKISS neuronal activity output in their experiments? The reviewer recommends a better discussion concerning these aspects.
  
  Please refer to our response to Reviewer #1 comment (2).
  
  (7) Discussion: the authors claim that diestrous PNA mice exhibited highly variable patterns of ARNKISS neuron activity. Would these differences be due to different circulating sex steroid levels or intrinsic properties? Would the inclusion of future in vitro calcium imaging (brain slices) studies contribute to their research question and conclusions? The reviewer recommends a better discussion concerning these aspects.
  
  We have tried to clarify that the highly variable patterns of activity in “diestrous” PNA mice come from the fact that we are actually randomly recording from ARN-KISS neurons at metestrus, diestrus, proestrus and estrus. The pulse generator is cycling but we only have the acyclic “diestrous” smear to go by. This also makes brain slice studies difficult as we would never know the actual cycle stage.
  
  Major:
  
  (1) Results section: The reviewer strongly recommends that the LH pulse secretion data for the PPA group be included in the manuscript. If the SEs represent the central mechanism of pulse generation, would the LH pulse frequency match those events? If not, could a mismatch be explained by androgen-mediated negative feedback at the pituitary level? What is the pituitary LH response to exogenous GnRH (i.p. injection) in the PPA group?
  
  Our initial observation showed that the frequency of ARN-KISS neuron SEs was halved in PPA mice compared to controls. Additionally, one study reported pulsatile LH secretion to be unchanged in this animal model (Coyle, Prescott et al. 2022). Both pieces of evidence clearly indicate that the PPA mouse does not provide an appropriate PCOS model of elevated pulse generator activity. Therefore, we do not see the value of pursuing further experiments in this animal model.
  
  (2) Although the evaluation of relative frequency and normalized amplitude indicate the dynamic over time, the authors should include the average amplitudes and frequencies of events within the recording session. For instance, looking at Figures 1 A and B and Figures 3 A and B, a reader can observe differences in the amplitude due to different scaling axes. Perhaps, using a Python toolbox such as GuPPy or any preferred analysis pipeline might help authors include these parameters.
  
  The amplitude of recorded SEs for each mouse depends primarily on the fiber position. As such, it has only ever been possible to assess SE amplitude changes within the same mouse. It is not possible to assess differences in SE amplitude between mice.
  
  (3) Line 144-156: (Immunoreactivity results): Authors should proceed with caution when describing these results and clearly state that results show a software-based measurement of immunoreactive signal intensity. In addition, the small sample size of the PNA group (N = 4) compared to controls (N = 6-7) seems to mask possible differences. Could the authors increase the N of the PNA group and re-evaluate these results?
  
  We have clarified that the immunoreactive signal intensity is based on software-based measurement. The N number for PNA mice in these studies varies from 4 to 6 depending on brain section availability for the different immunohistochemistry runs. The scatter of data is such that any new data points would need to be at the extreme of the distributions to likely have any impact on statistical significance. As a minor part of the paper, we did not feel that the use of further mice was warranted.
  
  (4) Considering the great variability of PNA's number of SE/hr, the reviewer suggests increasing the N in this group so that the authors can re-evaluate their findings and draw better analysis/conclusions.
  
  We have n=6 for the PNA group in the study. As noted above, the variability in SE/hr in Figure 3 comes from assessing the pulse generator at random times within the estrous cycle. Once we separate “diestrous-like” stage for the PNA animals, the variability is decreased as shown in Figure 6.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.04.597387v2
www.biorxiv.org

A Pvr–AP-1–Mmp1 signaling pathway is activated in astrocytes upon traumatic brain injury

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  To resolve and further test the claim that TBI did not induce cell proliferation:
  
  How many brains did they analyse? Sample sizes must be provided in Figure S1.
  
  As per the reviewer’s suggestion, we removed one of the unsupported claims shown in Figure S1. The original Figure S1 is shown below with the sample number added.
  
  Author response image 1.
  
  The authors could either improve the TBI method or the detection of cells in S-phase, mitosis or cycling. They could use PCNA-GFP or BrdU, EdU or FUCCI instead and at least provide evidence that they can detect cells in S-phase in intact brains. Timing is critical (ie cell cycle is longer than in larvae) so multiple time points should be tested. Or they could use pH3 but test more time points and rather large sample sizes. If they are not able to provide any evidence, then their lack of evidence is no evidence. The authors should consider removing pH3 and PCNA-GFP related claims instead.
  
  We have removed pH3 and PCNA-GFP related results and claims.
  
  Other unsupported claims:
  
  Figure 2A-C is not very clear what they are showing, but it is not evidence of astrocyte hypertrophy. It does not have cellular resolution and does not show the cell size, membranes, nor number
  
  (1) We have avoided the term “hypertrophy” and changed the description throughout the text to “astrocyte swelling”.
  
  (2) Images in the resolution of Figure 2E and 2F were able to show the enlarged soma of astrocytes, suggesting swelling.
  
  What is the point of using RedStinger in Figure 2?
  
  We used RedStinger to label the astrocyte nuclei.
  
  Figure S5 is not convincing, as anti-Pvr does not look localised to specific cells. Instead, it looks like uniform background. If they really think the antibody is localised, they should do double stainings with cell type specific markers. If the antibody does not work, then remove the data and the claim. They could test with RNAi knock-down in specific cell types and qRT-PCR which cells express pvr instead.
  
  We have removed the claim that “Pvr is predominantly expressed in astrocytes” and changed the description to “Immunostainings using the anti-Pvr antibodies revealed that endogenous Pvr expression is low in the control brains, yet significantly enhanced upon TBI. Reducing Pvr expression, but not Pvr overexpression, in astrocytes blocked the TBI-induced increase of Pvr expression (Figure S5)”.
  
  Figure S6: it is unclear what they are trying to show, but these data do not demonstrate that astrocytes do not engulf debris after TBI, as there isn't sufficient cellular resolution to make such claim. Firstly, they analyse one single cell per treatment. Secondly, the cell projections are not visible in these images, and therefore engulfment cannot be seen. The authors could remove the claim or visualise whether astrocytes phagocytose debris or not either using clones or with TEM.
  
  We agree with the reviewer that our images do not have the resolution to make this claim. We have removed Figure S6 and corresponding text description.
  
  On statistics:
  
  The statistical analysis needs revising as it is wrong in multiple places, eg Fig.1F,G,H; Figure 2D. They only use Student t-tests. These can only be used when data are continuous, distributed uniformly and only two samples are compared; if more than 2 samples, distributed uniformly, then use One-Way ANOVA and multiple comparisons tests. If data are categorical, use Chi-Square.
  
  We have double checked and compared the experimental group to the control separately using the Student t-tests throughout the study.
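The distinction the reviewer raises (pairwise t-tests versus a single test across three or more groups) can be illustrated with a short sketch of the one-way ANOVA F statistic. This is only an illustration in plain Python: the function name and the example numbers are hypothetical and are not data from the study, and a p-value or multiple-comparison test would require a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of each observation around its own group mean
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical example: one control and two experimental groups
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with the returned degrees of freedom indicates that at least one group mean differs; which groups differ is then addressed by a multiple-comparison test.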
  
  Other points for improvement:
  
  Figure 2E,F: what are GFP puncta and how are they counted?
  
  I. Each GFP punctum looks like a little circle, likely representing a functional or dysfunctional structure. The biology of the GFP puncta is currently unknown.
  
  II. We used ImageJ to quantify the GFP puncta:
  
  (1) Image-Type-8-bit
  
  (2) Process-Subtract Background (rolling ball radius: 10)
  
  (3) Image-Adjust-Threshold-Apply
  
  (4) Analyze-Measure-set measurements-choose “area” “limit to threshold”-OK
  
  (5) Count the number of puncta in the chosen area.
  
  (6) Get the number of puncta per square micron.
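The thresholding and counting steps above can be sketched outside ImageJ as follows. This is a minimal illustration in plain Python and not the authors' pipeline: it assumes an already background-subtracted 8-bit image supplied as a list of rows, a hypothetical fixed threshold, 4-connected pixels forming one punctum, a hypothetical pixel size, and a density computed over the whole image field. All function names are illustrative.

```python
from collections import deque


def count_puncta(image, threshold):
    """Count 4-connected groups of pixels with intensity >= threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or seen[r][c]:
                continue
            # New punctum found: flood-fill all pixels connected to it
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def puncta_density(image, threshold, pixel_size_um):
    """Puncta per square micron over the whole image field (an assumption)."""
    area_um2 = len(image) * len(image[0]) * pixel_size_um ** 2
    return count_puncta(image, threshold) / area_um2


# Hypothetical 4 x 4 image with three separate bright spots
demo = [
    [0, 200, 0, 0],
    [0, 210, 0, 0],
    [0, 0, 0, 180],
    [150, 0, 0, 0],
]
print(count_puncta(demo, 128), puncta_density(demo, 128, 0.5))
```

The rolling-ball background subtraction of step (2) is deliberately left out; in practice it would be applied before thresholding.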
  
  All genotypes must be provided (including for MARCM clones), currently they are not.
  
  We have shown the full genotype in the corresponding legend.
  
  Figure 7O,P indicate on figure that these are RNAi
  
  We have revised the labels to RNAi in Figure 7O,P.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Several typos are present in the text.
  
  We have read the manuscript carefully and corrected typos throughout.
  
URL

biorxiv.org/content/10.1101/2023.03.27.534488v2
www.biorxiv.org

CARD8 inflammasome activation during HIV-1 cell-to-cell transmission

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  We again thank you for the positive and constructive feedback on our manuscript, and for highlighting its contributions to understanding the role of CARD8 in viral protease-triggered sensing of viral spread, and the potential impact of our findings on chronic inflammation and immune activation. We agree that it will be important for future work to address whether or not HIV-1 protease-triggered CARD8 inflammasome activation contributes to chronic inflammation in PLWH who are receiving ART.
  
  In response to the question about the baseline level of IL-1β in Fig. 4D, the figure below shows the mock condition for the CD4+ T cell:MDM coculture. We had done this control in parallel with the data presented in the submitted figure. Levels of IL-1β during HIV-1 infection are increased over background (i.e., mock infection). We note that for donor G the IL-1β concentration is below the limit of detection for this assay. Thus, it remains possible that other inflammasomes contribute modestly during cell-to-cell transmission of HIV-1; however, incomplete knockout of CARD8 in a minority of cells may also contribute to the observed levels of IL-1β in response to HIV-1 infection. Nonetheless, collectively, our data strongly support the role of CARD8 in HIV-1 protease-triggered inflammasome activation.
  
  The following is the authors’ response to the original reviews.
  
  Joint Public Review:
  
  Following up on their previous work, the authors investigated whether cell-to-cell transmission of HIV-1 activates the CARD8 inflammasome in macrophages, an important question given that inflammasome activation in myeloid cells triggers proinflammatory cytokine release. The data support the idea that CARD8 is activated by the viral protease and promotes inflammation. However, time-course analyses in primary T cells and macrophages and further information on the specific inflammasome involved would further increase the significance of the study.
  
  Strengths:
  
  The manuscript is well-written and the data is of good quality. The evidence that CARD8 senses the HIV-1 protease in the context of cell-to-cell transmission is important since cell-to-cell transmission is thought to play a key role in viral spread in vivo, and inflammation is a major driver of disease progression. Clean knockout experiments in primary macrophages are a notable strength and the results clearly support the role of CARD8 in protease-dependent sensing of viral spread and the induction of IL1β release and cell death. The finding that HIV-1 strains resistant to protease inhibitors differ in CARD8 activation and IL1β production is interesting and underscores the potential clinical relevance of these results.
  
  Weaknesses:
  
  One weakness is that the authors used T cell lines, which might not faithfully reflect the efficiency of HIV-1 production and cell-cell transfer by primary T cells. To assess whether CARD8 is also activated by protease from incoming viral particles, earlier time points should be analyzed. Finally, while the authors exclude the role of NLRP3 in IL-1b and the death of macrophages, it would be interesting to know whether the effect is still Gasdermin D dependent.
  
  Recommendations for the authors
  
  (1) Co-culture assay should also be done between primary CD4 cells and primary MDMs, because T-cell lines produce much more virus, and the efficiency of cell-to-cell transmission might be dramatically different in primary cells compared to cell lines.
  
  We have now added data from experiments using infected primary CD4 cells as the donor cells in cell-to-cell HIV-1 transmission to MDMs in new Figure 4. The results largely phenocopy the SUPT1:MDM coculture in that we observe inflammasome activation after co-culture of HIV-infected primary T cells with primary MDMs. We find that this inflammasome activity induced by the CD4:MDM cell-to-cell transmission is abrogated by knockout of CARD8 in the MDMs or by treatment with the HIV protease inhibitor lopinavir (LPV) or the caspase 1 inhibitor VX765, suggesting that this activation is dependent on CARD8, HIV protease, and caspase 1. Additionally, the signal persists in the presence of the reverse transcriptase inhibitor nevirapine (NVP), suggesting that the incoming protease is driving activation.
  
  (2) For all co-culture experiments, supernatants were collected at 48 or 72 hours. Since CARD8 activation is expected to be driven by incoming viral particles without RT, they should measure cytokine production at much earlier time points. 2-3 days co-culture raises concerns. Ideally, the authors can provide a time-course.
  
  We have now added a time course of the SUPT1:MDM coculture from 3 unique donors taken at 4, 24, 48, and 72 hours post coculture in the presence or absence of reverse transcriptase inhibitor (see new Figure 3B) as well as for the primary CD4 cells to MDM co-culture (see new Figure 4B). We detect IL-1β at the 24-hour time point (and later), but not at the 4-hour time point, which is slower than what was detected by direct cell-free infection (Kulsuptrakul et al., 2023). However, we still hypothesize that this is driven by active incoming viral protease because the signal is not abrogated by a reverse transcriptase inhibitor, which indicates that de novo protease production is not necessary. We also observed that IL-1β levels do not increase after plateauing 24h after establishing the co-culture, suggesting that secondary infection does not further amplify inflammasome activation. We now speculate on this in the Discussion.
  
  (3) A potential confounder in the data in Figure 4 is that despite rightly including the cognate adaptations in the Gag cleavage sites with the PI-R protease mutants, some of these viruses still display Gag processing defects. Can the authors disentangle the potency of PR mutant cleavage with either reduced cell entry or reduced protease availability due to processing defects in the incoming virions?
  
  The reviewer is correct that although the western blot with the p24<sup>gag</sup> antibody suggests that Gag is processed, we cannot rule out that other variables contribute to the observed difference in CARD8 inflammasome activation. For example, PI-R clones relative to the LAI strain may have distinct protease substrate specificity or variable efficiency/kinetics in viral assembly or Gag dimerization, and these and other factors may ultimately influence CARD8 inflammasome activation. We have updated the text to reflect these possibilities. Nonetheless, this argument does not change the conclusion that CARD8 inflammasome activation is affected by protease mutations acquired during drug resistance.
  
  (4) There is considerable donor variation in the macrophages (unsurprising) but can the authors correlate this with CARD8 expression and are there any off-target effects on macrophage permissivity to HIV-1 infection?
  
  We have now considerably increased the number of primary cell donors from the first submission (see Author response table 1 below). We find that the non-responsive donor presented in the first submission is aberrant since all others do respond to a greater or lesser degree (Figure 3, Figure 4). However, the reviewer may be correct that the particular aberrant donor MDMs were poorly infected. We also note that despite donor variability in the degree of activation (IL-1β secretion) from cocultures with HIV<sub>BaL</sub>-infected SUPT1 cells, HIV-induced activation is comparable to the activation induced by VbP (see new Figure 3–figure supplement 1B). We do not see a notable difference in CARD8 expression between donors. Nonetheless, with the added number of primary cell donors, the data are consistent with a role of primary MDMs from nearly all donors in supporting a CARD8-dependent, HIV-protease dependent inflammasome response after co-culture with infected T cells. We have left in data from all of the donors so that readers can appreciate the variability among primary cells.
  
  Author response table 1.
  
  In addition, to address the reviewer's concerns about off-target effects of the sgRNAs on macrophage permissivity, we assessed our CD4:MDM cocultures for percent infectivity via intracellular p24<sup>gag</sup> in AAVS1 vs CARD8 KO MDMs and observed no significant difference in infectivity (see Author response image 1), suggesting that infection of MDMs after co-culture with T cells is not affected by any potential off-target effects of the sgRNAs.
  
  Author response image 1.
  
  Equivalent infection in AAVS1 vs CARD8 KO MDMs. AAVS1 or CARD8 KO MDMs from donor 12 were cocultured with mock or HIV infected CD4 T cells as described in Figure 4D for 72 hours then assessed for HIV infection of the MDMs by washing away CD4 T cells, harvesting MDMs, and staining attached MDMs for intracellular p24<sup>gag</sup> for flow cytometry analysis. Datasets represent mean ± SD (n=2 technical replicates from one donor). One-way ANOVA with Dunnett’s test using GraphPad Prism 10. ns = not significant, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.
  
  (5) The authors suggest that NLRP3 is unlikely to be the mediator of IL-1b and cell death in the macrophages. Is this death still GSDMD-dependent, what other NLRs are expressed in this system and does it make a difference what PAMP you use to prime the response?
  
  We have now added additional data in support of the conclusion that NLRP3 is not a mediator of the IL-1β secretion in the coculture of infected SUPT1 cells with primary MDMs. In addition to using an NLRP3 inhibitor, we have now also made NLRP3 KO MDMs and used these in the coculture experiments, which show that the IL-1β secretion after coculture of infected SUPT1 cells and primary MDMs is mediated by CARD8 and not NLRP3 because the signal is abrogated by CARD8 knockout, but not by NLRP3 knockout. These new data are shown in Figure 3C and D.
  
  To assess the role of GSDMD, we treated SUPT1:MDM cocultures with disulfiram, a GSDMD inhibitor (Hu et al., 2020). Disulfiram treatment abrogated IL-1β secretion, suggesting that this activation is indeed GSDMD-mediated (see Author response image 2 below). We choose not to include the disulfiram result in the final manuscript since we have not ruled out cytotoxic effects of the drug.
  
  There are likely other NLRs expressed in primary MDMs; however, since inflammasome activation is completely absent in the CARD8 KO MDMs, we infer that CARD8 is the main inflammasome-forming sensor in this system. However, we cannot rule out the possibility of other innate sensors being activated downstream of CARD8 or under different differentiation conditions.
  
  To address the concern that alternative priming affects CARD8 activation, we compared pre-treatment of cells with Pam3CSK4 or lipopolysaccharide (LPS) in the presence or absence of HIV protease inhibitor and reverse transcriptase inhibitor. Regardless of the priming agent used, we observed HIV protease-dependent activation that persisted in the presence of reverse transcriptase inhibitor, suggesting that CARD8 is the main sensor under LPS and Pam3CSK4 priming (new Figure 3–figure supplement 1A).
  
  Author response image 2.
  
  Inflammasome activation following cell-to-cell HIV infection is mediated by GSDMD. SUPT1-CCR5 cells were either mock-infected or infected with HIV-1<sub>NL4.3BaL</sub> for 20 hours before coculturing with MDMs in either the presence or absence of GSDMD inhibitor disulfiram (25μM). Cocultures were harvested 24 hours later to assess (left) IL-1β secretion via IL-1 reporter assay and (right) cell viability via CellTiter-Glo® assay. Viability was calculated by normalizing to relative luminescence units in the mock untreated control. Dotted line indicates limit of detection (LoD). Dashed line indicates 100% viability as determined by untreated mock control. Datasets represent mean ± SD (n=2 technical replicates for one donor). Two-way ANOVA with Sidak’s test using GraphPad Prism 10. ns = not significant, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.
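The viability normalization described in the legend above can be written out as a small calculation. The sketch below is an assumption about the arithmetic, not the authors' analysis script: it takes the mean of the mock untreated replicates as the 100% reference, and the function name and luminescence values are hypothetical.

```python
def percent_viability(sample_rlu, mock_untreated_rlu):
    """Express each sample's luminescence as a percentage of the control mean."""
    # Assumption: the reference is the mean RLU of the mock untreated replicates
    control_mean = sum(mock_untreated_rlu) / len(mock_untreated_rlu)
    return [100.0 * value / control_mean for value in sample_rlu]


# Hypothetical relative luminescence units (RLU), not values from the study
mock_untreated = [900, 1100]   # control replicates, mean = 1000
treated = [500, 1000]

print(percent_viability(treated, mock_untreated))
```

With this convention the mock untreated control sits at 100% by construction, which corresponds to the dashed line described in the legend.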
  
  Minor points
  
  (1) In Figure 1, the authors should clarify whether LAI or LAI-VSV-G was used.
  
  Wild-type virus (LAI strain) was used in Figure 1. This has now been clarified in the figure legend.
  
  (2) In Figure 1, the fraction of infected cells without DEAE was ~20% in both WT and CARD8 KO THP-1, suggesting somewhat efficient viral entry even in the absence of DEAE. How do the authors reconcile this with the lack of IL-1β production? The increase in infection observed in WT THP-1 +DEAE was overall modest (from ~20% to 25-30%) compared to the dramatic difference in IL-1β production. Can they provide more evidence or discuss how DEAE might be impacting cytokine production? If differences in viral entry are the explanation for differences in inflammasome activation, then they should be able to overcome this by using virus at a higher MOI in the absence of DEAE. Experiments proposed in Figure 1 +/- DEAE should be repeated using a range of MOI for LAI and showing the corresponding percent infection in THP-1 cells (which is not shown in Figure S2 for LAI-VSVG).
  
  We hypothesize that the lack of IL-1β production without DEAE is likely due to an insufficient amount of incoming viral protease to induce CARD8 activation. Though the increase in infection with DEAE is modest by intracellular p24<sup>gag</sup> at 24 hours post infection, we infer that intracellular p24<sup>gag</sup> may be largely underestimating the actual increase in viral efficiency achieved with DEAE (now in Supplemental Note). We have also updated Figure S2 (now Figure 2–figure supplement 1) legend to include the percent infection for HIV-1<sub>LAI</sub> and HIV-1<sub>LAI-VSVG</sub> infections. We agree that activation in the absence of DEAE could be overcome by infecting with a more concentrated viral stock to increase the MOI. Indeed, our decision to use the cell-to-cell transmission model achieves this in a more physiologic context.
  
  (3) In Figure S1, the authors point out that RT-activity in the supernatants was similar in the cell-free vs. cell-to-cell model. While in the transwell system THP-1 cells are the only cells capable of producing new virions, how are they able to differentiate viral production from sup-T1 vs. THP-1 in the cell-to-cell system? At a minimum, they should provide some data on the observed RT activity in matching wells containing the same number of infected sup-T1 cells utilized in coculture experiments.
  
  We think this may have been a misinterpretation. In Figure S1 (now Figure 1B, right), we compare the amount of virus available in the lower chamber of the transwell versus the cell-to-cell condition. We are not comparing cell-free to cell-to-cell infection. We have changed the text and figure title to clarify this point.
  
  (4) Can the authors provide additional comments on the lack of IL-1β release in donor C in Figure 3? The donor did not produce IL-1β in response to VbP or HIV, although the WB for CARD8 appears similar to the other two donors.
  
  We have now tested MDMs from additional donors and continue to find a range of IL-1β secretion after the coculture. However, donor C is aberrant since each of the other donors had detectable IL-1β secretion in response to VbP and HIV-1 to greater or lesser extents. Nonetheless, we have included additional donors summarized in the table above corresponding to major comment #4.
  
  (5) For Figure 3, can the authors provide information on the fraction of MDMs that were infected after coculture with sup-T1 cells? Why didn't the authors measure cell death in MDMs?
  
  It is difficult to measure the fraction of MDMs infected or dying in the cocultures since it is hard to separate signal from the T cells. Although it would be possible to do so, in this manuscript, we instead prefer to focus on the potential contribution of CARD8 inflammasome activation in exacerbating chronic inflammation in response to HIV rather than the depletion of macrophages.
  
  (6) In Figure 4, did the authors introduce the mutations associated with PI resistance into the same LAI backbone? If not, this is not a fair comparison, as viral protein expression levels were not at the same level, indicated in Figure 4A. Additionally, such comparison will be further strengthened by using cells other than 293T cells for the coculture assay.
  
  No, we did not introduce these mutations into LAI, since they were already in an NL4.3 backbone and NL4.3 and LAI differ by only 1 amino acid in protease. We have updated Table S1 to report this amino acid difference. We also note that in our previous manuscript we tested much more diverse proteases such as a clade A HIV-1, HIV-2, and SIVs and found comparable CARD8 cleavage to LAI.
  
  Additions not requested by Reviewers:
  
  THP-1 characterization
  
  In our previous work, we noticed that different “wildtype” THP-1 lines behaved uniquely in response to DEAE-dextran. In particular, one THP-1 line showed inflammasome activation in response to DEAE-dextran alone at the concentration used for spinoculations (20μg/mL), whereas the other THP-1 line did not. Thus, we performed STR profiling on each THP-1 cell line and determined that the THP-1 cells used in our studies (JK THP1s) are distinct from THP-1 cells from ATCC at 3 different loci. These data are now included in the Supplemental Note (Figure A1). Please note that all experiments in this and the accompanying manuscript were performed in JK THP-1 cells.
  
  Whole plasmid sequencing of the PI-resistant HIV clones
  
  Since preprint submission, we have done whole plasmid Oxford Nanopore sequencing on the PI-resistant HIV clones obtained from the NIAID HIV/AIDS Specimen Repository Program. Of note, there were a handful of previously unreported mutations included in these plasmid stocks within protease. We have updated Table S1 to include an additional column titled “Additional amino acid changes in HIV<sup>PR</sup> relative to NL4.3.”
  
  References
  
  Hu JJ, Liu X, Xia S, Zhang Z, Zhang Y, Zhao J, Ruan J, Luo X, Lou X, Bai Y, Wang J, Hollingsworth LR, Magupalli VG, Zhao L, Luo HR, Kim J, Lieberman J, Wu H. 2020. FDA-approved disulfiram inhibits pyroptosis by blocking gasdermin D pore formation. Nat Immunol 21:736–745. doi:10.1038/s41590-020-0669-6
  
  Kulsuptrakul J, Turcotte EA, Emerman M, Mitchell PS. 2023. A human-specific motif facilitates CARD8 inflammasome activation after HIV-1 infection. eLife 12:e84108. doi:10.7554/eLife.84108
  
URL

biorxiv.org/content/10.1101/2024.08.21.608981v2
www.biorxiv.org

UGGT1-mediated reglucosylation of N-glycan competes with ER-associated degradation of unstable and misfolded glycoproteins

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Utilizing UGGT-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing premature degradation of misfolded glycoproteins. They proposed a concept by which the fate of glycoproteins can be determined by a tug-of-war between UGGTs and EDEMs.
  
  Strengths:
  
  The authors provided a wealth of data to indicate that UGGT1 competes with EDEMs, which promotes glycoprotein degradation.
  
  Weaknesses:
  
  Less clear, though, is the involvement of UGGT2 in the process. Also, to this reviewer, some data do not necessarily support the conclusion.
  
  Major criticisms:
  
  (1) One of the biggest problems I had on reading through this manuscript is that, while the authors appeared to generate UGGTs-KO cells from HCT116 and HeLa cells, it was not clearly indicated which cell line was used for each experiment. I assume that it was HCT116 cells in most cases, but I did not see that it was clearly mentioned. As the expression level of UGGT2 relative to UGGT1 is quite different between the two cell lines, it would be critical to know which cells were used for each experiment.
  
  Thank you for this comment. We have clarified this point, especially in the figure legends.
  
  (2) While most of the authors' conclusion is sound, some claims, to this reviewer, were not fully supported by the data. Especially I cannot help being puzzled by the authors' claim about the involvement of UGGT2 in the ERAD process. In most of the cases, KO of UGGT2 does not seem to affect the stability of ERAD substrates (ex. Fig. 1C, 2A, 3D). When the author suggests that UGGT2 is also involved in the ERAD, it is far from convincing (ex. Fig. 2D/E). Especially because now it has been suggested that the main role of UGGT2 may be distinct from UGGT1, playing a role in lipid quality control (Hung, et al., PNAS 2022), it is imperative to provide convincing evidence if the authors want to claim the involvement of UGGT2 in a protein quality control system. In fact, it was not clear at all whether even UGGT1 is also involved in the process in Fig. 2D/E, as the difference, if any, is so subtle. How the authors can be sure that this is significant enough? While the authors claim that the difference is statistically significant (n=3), this may end up with experimental artifacts. To say the least, I would urge the authors to try rescue experiments with UGGT1 or 2, to clarify that the defect in UGGT-DKO cells can be reversed. It may also be interesting to see that the subtle difference the authors observed is indeed N-glycan-dependent by testing a non-glycosylated version of the protein (just like NHK-QQQ mutants in Fig. 2C).
  
  We appreciate this comment. In response, we reevaluated the importance of UGGT2 for ER protein quality control. As this reviewer mentioned, KO of UGGT2 does not affect the stability of ATF6a, NHK, rRI332-Flag or EMC1-△PQQ-Flag (Fig. 1E, 2A, and 3DE). Furthermore, we tested whether overexpression of UGGT2 reverses the phenotype of UGGT-DKO regarding the degradation rate of NHK, and we found that it did not affect the degradation rate of NHK, whereas overexpression of UGGT1 restored the degradation rate to that in WT cells.
  
  Author response image 1.
  
  Collectively, these facts suggest that the role of UGGT2 in ER protein quality control is rather limited in HCT116 cells. Therefore, we have decided not to mention UGGT2 in the title, and weakened the overall claim that UGGT2 contributes to ER protein quality control. Tissues with high expression of UGGT2 or cultured cells other than HCT116 would be appropriate for revealing the detailed function of UGGT2.
  
  To this reviewer, it is still possible that the involvement of UGGT1 (or 2, if any) could be totally substrate-dependent, and the substrates used in Fig 2D or E happen not to be dependent to the action of UGGTs. To the reviewer, without the data of Fig. 2D and E the authors provide enough evidence to demonstrate the involvement of UGGT1 in preventing premature degradation of glycoprotein ERAD substrates. I am just afraid that the authors may have overinterpreted the data, as if the UGGTs are involved in stabilization of all glycoproteins destined for ERAD.
  
  Based on the point this reviewer mentioned, we decided to delete previous Fig. 2D and 2E. The efficacy of UGGT1 in preventing early degradation may vary among substrates.
  
  (3) I am a bit puzzled by the DNJ treatment experiments. First, I do not see the detailed conditions of the DNJ treatment (concentration? Time?). Then, I was a bit surprised to see that there were so little G3M9 glycans formed, and there was about the same amount of G2M9 also formed (Figure 1 Figure supplement 4B-D), despite the fact that glucose trimming of newly synthesized glycoproteins is expected to be completely impaired (unless the authors used a DNJ concentration which does not completely impair the trimming of the first Glc). Even considering the involvement of Golgi endo-alpha-mannosidase, a similar amount of G3M9 and G2M9 may suggest that the experimental conditions used for this experiment (i.e. concentration of DNJ, duration of treatment, etc) are not properly optimized.
  
  We think that our experimental conditions for DNJ treatment are appropriate for evaluating the effect of DNJ. Based on other papers (Ali and Field, 2000; Karlsson et al., 1993; Lomako et al., 2010; Pearse et al., 2010; Tannous et al., 2015), 0.5 mM DNJ is appropriate. In our previously reported experiment, 16 h of treatment with the mannosidase inhibitor kifunensine prior to cell collection was sufficient for N-glycan composition analysis (Ninagawa et al., 2014), and we treated cells for a similar time in Figure 1-Figure supplement 4 and 5 (and Figure 1-Figure supplement 6). We could see a clear effect of DNJ in inhibiting degradation of ATF6a with 2 hours of pretreatment (Fig. 1G). Furthermore, our results are reasonable and consistent with previous findings that DNJ increased GM9 the most (Cheatham et al., 2023; Gross et al., 1983; Gross et al., 1986; Romero et al., 1985). In addition to DNJ, we used CST for further experiments in new figures (Fig. 1H and Figure 1-Figure supplement 6). DNJ and CST are both glucosidase inhibitors; DNJ is a stronger inhibitor of glucosidase II, while CST is a stronger inhibitor of glucosidase I (Asano, 2000; Saunier et al., 1982; Szumilo et al., 1987; Zeng et al., 1997). An increase in G3M9 and G2M9 was detected using CST (Figure 1-Figure supplement 6). Like DNJ, CST also inhibited ATF6a degradation in UGGT-DKO cells (Fig. 1H). These findings show that our experimental conditions using glucosidase inhibitors are appropriate and strongly support our model (Fig. 5). Differences between the effects of DNJ and CST are now described on pages 8 to 10 of our manuscript.
  
  Reviewer #2 (Public Review):
  
  In this study, Ninagawa et al., shed light on UGGT's role in ER quality control of glycoproteins. By utilizing UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha where its premature degradation hampers the cell's ability to mount an ER stress response.
  
  While this study convincingly demonstrates early degradation of misfolded glycoproteins in the absence of UGGTs, my major concern is the need for additional experiments to support the "tug of war" model involving UGGTs and EDEMs in influencing the substrate's fate - whether misfolded glycoproteins are pulled into the folding or degradation route. Specifically, it would be valuable to investigate how overexpression of UGGTs and EDEMs in WT cells affects the choice between folding and degradation for misfolded glycoproteins. Considering previous studies indicating that monoglucosylation influences glycoprotein solubility and stability, an essential question is: what is the nature of glycoproteins in UGGTKO/EDEMKO and potentially UGGT/EDEM overexpression cells? Understanding whether these substrates become more soluble/stable when GM9 versus mannose-only translation modification accumulates would provide valuable insights.
  
  In the new figure 2DE, we conducted overexpression experiments of structure formation factors UGGT1 and/or CNX, and degradation factors EDEMs. While overexpression of structure formation factors (Fig. 2DE) and KO of degradation factors (Ninagawa et al., 2015; Ninagawa et al., 2014) increased stability of substrates, KO of UGGT1 (Fig. 1E, 2A and 3DF) and overexpression of degradation factors (Fig. 2DE) (Hirao et al., 2006; Hosokawa et al., 2001; Mast et al., 2005; Olivari et al., 2005) accelerated degradation of substrates. A comparison of the properties of N-glycan with the normal type and the type without glucoses was already reported (Tannous et al., 2015). The rate of degradation of substrate was unchanged, but efficiency of secretion of substrates was affected.
  
  The study delves into the physiological role of UGGT, but is limited in scope, focusing solely on the effect of ATF6alpha in UGGT KO cells' stress response. It is crucial for the authors to investigate the broader impact of UGGT KO, including the assessment of basal ER proteotoxicity levels, examination of the general efflux of glycoproteins from ER, and the exploration of the physiological consequences due to UGGT KO. This broader perspective would be valuable for the wider audience. Additionally, the marked increase in ATF4 activity in UGGTKO requires discussion, which the authors currently omit.
  
  We evaluated the sensitivity of WT and UGGT1-KO cells to ER stress (Figure 4G). KO of UGGT1 increased the sensitivity to ER stress inducer Tg, indicating the importance of UGGT1 for resisting ER stress.
  
  We added the following description to the manuscript about ATF4 activity in UGGT1-KO: “In addition to this, UGGT1 is necessary for proper functioning of ER resident proteins such as ATF6a (Fig. 4B-F). It is highly possible that ATF6a undergoes structural maintenance by UGGT1, which could be necessary to avoid degradation and maintain proper function, because ATF6a with a more rigid structure tended to remain in UGGT1-KO cells (Fig. 4C). Responses of ERSE and UPRE to ER stress, which require ATF6a, were decreased in UGGT1-KO cells (Fig. 4DE). In contrast, ATF4 reporter activity was increased in UGGT1-KO cells (Fig. 4F), while the basal level of ATF4 in UGGT1-KO cells was comparable with that in WT (Figure 1-Figure supplement 2B). The ATF4 pathway might partially compensate the function of the ERSE and UPRE pathways in UGGT1-KO cells in acute ER stress.” This is now described on Page 17 in our manuscript.
  
  The discussion section is brief and could benefit from being a separate section. It is advisable for the authors to explore and suggest other model systems or disease contexts to test UGGT's role in the future. This expansion would help the broader scientific community appreciate the potential applications and implications of this work beyond its current scope.
  
  Thank you for making this point. The DISCUSSION part has now been separated in our manuscript. We added some points in the manuscript about other model organisms and diseases in the DISCUSSION as follows: “Our work focusing on the function of mammalian UGGT1 greatly advances the understanding of how ER homeostasis is maintained in higher animals. Considering that Saccharomyces cerevisiae does not have a functional orthologue of UGGT1 (Ninagawa et al., 2020a) and that KO of UGGT1 causes embryonic lethality in mice (Molinari et al., 2005), it would be interesting to know at what point the function of UGGT1 became evolutionarily necessary for life. Related to its importance in animals, it would also be of interest to know what kind of diseases UGGT1 is associated with. Recently, it has been reported that UGGT1 is involved in ER retention of Trop-2 mutant proteins, which are encoded by a causative gene of gelatinous drop-like corneal dystrophy (Tax et al., 2024). Not only this, but since the ER is known to be involved in over 60 diseases (Guerriero and Brodsky, 2012), we must investigate how UGGT1 and other ER molecules are involved in diseases.”
  
  Reviewer #3 (Public Review):
  
  This manuscript focuses on defining the importance of UGGT1/2 in the process of protein degradation within the ER. The authors prepared cells lacking UGGT1, UGGT2, or both UGGT1/UGGT2 (DKO) HCT116 cells and then monitored the degradation of specific ERAD substrates. Initially, they focused on the ER stress sensor ATF6 and showed that loss of UGGT1 increased the degradation of this protein. This degradation was stabilized by deletion of ERAD-specific factors (e.g., SEL1L, EDEM) or treatment with mannose inhibitors such as kifunensine, indicating that this is mediated through a process involving increased mannose trimming of the ATF6 N-glycan. This increased degradation of ATF6 impaired the function of this ER stress sensor, as expected, reducing the activation of downstream reporters of ER stress-induced ATF6 activation. The authors extended this analysis to monitor the degradation of other well-established ERAD substrates including A1AT-NHK and CD3d, demonstrating similar increases in the degradation of destabilized, misfolding protein substrates in cells deficient in UGGT. Importantly, they did experiments to suggest that re-overexpression of wild-type, but not catalytically deficient, UGGT rescues the increased degradation observed in UGGT1 knockout cells. Further, they demonstrated the dependence of this sensitivity to UGGT depletion on N-glycans using ERAD substrates that lack any glycans. Ultimately, these results suggest a model whereby depletion of UGGT (especially UGGT1 which is the most expressed in these cells) increases degradation of ERAD substrates through a mechanism involving impaired re-glucosylation and subsequent re-entry into the calnexin/calreticulin folding pathway.
  
  I must say that I was under the impression that the main conclusions of this paper (i.e., UGGT1 functions to slow the degradation of ERAD substrates by allowing re-entry into the lectin folding pathway) were well-established in the literature. However, I was not able to find papers explicitly demonstrating this point. Because of this, I do think that this manuscript is valuable, as it supports a previously assumed assertion of the role of UGGT in ER quality control. However, there are a number of issues in the manuscript that should be addressed.
  
  Notably, the focus on well-established, trafficking-deficient ERAD substrates, while a traditional approach to studying these types of processes, limits our understanding of global ER quality control of proteins that are trafficked to downstream secretory environments where proteins can be degraded through multiple mechanisms. For example, in Figure 1-Figure Supplement 2, UGGT1/2 knockout does not seem to increase the degradation of secretion-competent proteins such as A1AT or EPO, instead appearing to stabilize these proteins against degradation. They do show reductions in secretion, but it isn't clear exactly how UGGT loss is impacting ER Quality Control of these more relevant types of ER-targeted secretory proteins.
  
  We appreciate your comment. It is certainly difficult to assess in detail how UGGT1 functions against secretion-competent proteins, but we think that the folding state of these proteins is improved, which avoids their degradation and increases their secretion. In Figure 1-Figure supplement 2E, there is a clear decrease in secretion of EPO in UGGT1-KO cells, suggesting that UGGT1 also inhibits degradation of such substrates. Note that, as shown in Fig. 3A-C, once a protein forms a solid structure, it is rarely degraded in the ER.
  
  Lastly, I don't understand the link between UGGT, ATF6 degradation, and ATF6 activation. I understand that the idea is that increased ATF6 degradation afforded by UGGT depletion will impair activation of this ER stress sensor, but if that is the case, how does UGGT2 depletion, which only minimally impacts ATF6 degradation (Fig. 1), impact activation to levels similar to the UGGT1 knockout (Fig 4)? This suggests UGGT1/2 may serve different functions beyond just regulating the degradation of this ER stress sensor. Also, the authors should quantify the impaired ATF6 processing shown in Fig 4B-D across multiple replicates.
  
  According to this valuable comment, we reevaluated our manuscript. As this reviewer mentioned, involvement of UGGT2 in the activation of ATF6a cannot be explained only by the folding state of ATF6a. Thus, the part about whether UGGT2 is effective in activating ATF6 is outside the scope of this paper. The main focus of this paper is the contribution of UGGT1 to the ER protein quality control mechanism.
  
  Ultimately, I do think the data support a role for UGGT (especially UGGT1) in regulating the degradation of ERAD substrates, which provides experimental support for a role long-predicted in the field. However, there are a number of ways this manuscript could be strengthened to further support this role, some of which can be done with data they have in hand (e.g., the stats) or additional new experiments.
  
  In this revision period, to further elucidate the function of UGGT, we did several additional experiments (new figures Fig. 1H, 2DE, 4G, and Figure 1-Figure Supplement 6). We hope that these will bring our paper up to the level you have requested.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Minor points:
  
  (1) Abbreviations: GlcNAc, N-acetylglucosamines -> why plural?
  
  Corrected.
  
  (2) Abstract: to this reviewer, it may not be so common to cite references in the abstract.
  
  We submitted this manuscript to eLife as a “Research Advance”. The eLife instructions for “Research Advances” state: “A reference to the original eLife article should be included in the abstract, e.g. in the format ‘Previously we showed that XXXX (author, year). Here we show that YYYY.’” We followed this.
  
  (3) Introduction: "as the site of biosynthesis of approximately one-third of all proteins." Probably this statement needs a citation?
  
  We added the reference there. You can also confirm this in “The Human Protein Atlas” website. https://www.proteinatlas.org/humanproteome/tissue/secretome
  
  (4) Figure 1F - the authors claimed that maturation of HA was delayed also in UGGT2 cells, but it was not at all clear to me. Rescue experiments with UGGT2 would be desired.
  
  We agree with this reviewer, but there was a statistically significant difference at 80 min in the UGGT2-KO strain. Previously, it was reported that the HA maturation rate was not affected by UGGT2 (Hung et al., 2022). We think that the difference is not large. A rescue experiment with UGGT2 on the degradation of NHK was conducted, and is shown in this response to referees.
  
  (5) Figure 4A, here also the authors claim that UGGT2 is "slightly" involved in folding of ATF6alpha(P) but it is far from convincing to this reviewer.
  
  Now we also think that involvement of UGGT2 in ER protein quality control should be examined in the future.
  
  (6) Page 11, line 7 from the bottom: "peak of activation was shifted from 1 hour to 4 hours after the treatment of Tg in UGGT-KO cells". I found this statement a bit awkward; how can the authors be sure that "the peak" is 4 hours when the longest timing tested is 4 hours (i.e. peak may be even later)?
  
  Corrected. We deleted the description.
  
  (7) Page 11, line 4 "a more rigid structure that averts degradation" Can the authors speculate what this "rigid" structure actually means? The reviewer has to wonder what kind of change can occur to this protein with or without UGGT1. Binding proteins? The difference in susceptibility against trypsin appears very subtle anyway (Figure 4 Figure Supplement 1).
  
  Let us add our thoughts here: Poorly structured ATF6a is immediately routed for degradation in UGGT1-KO cells. As a result, ATF6a with a stable or rigid structure has remained in the UGGT1-KO strain. ATF6a in a metastable state tends to be degraded without the assistance of UGGT1.
  
  (8) Figure 1 Figure supplement 2; based on the information provided, I calculate the relative ratio of UGGT2/UGGT1 in HCT116 which is 4.5%, and in HeLa 26%. Am I missing something? Also significant figure, at best, should be 2, not 3 (i.e. 30%, not 29.8%).
  
  Corrected. Thank you for this comment.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The effect in Fig. 2B with UGGT1-D1358A add-back is minimal. Testing the inactive and active add-back on other substrates, such as ATF6alpha, which undergoes a more rapid degradation, would provide a more comprehensive assessment.
  
  To examine the effect of full-length UGGT1 and its inactive mutant in UGGT1-KO and UGGT2-KO cells on the rate of degradation of endogenous ATF6a, we tried to select more than 300 colonies stably expressing full-length Myc-UGGT1/2, UGGT1/2-Flag, and UGGT1/2 (no tag), and their point mutants. However, no cell lines expressing nearly as much or more UGGT1/2 than endogenous ones were obtained. The expression level of UGGT1 seemed to be tightly regulated. A low-expressing stable cell line could not recover the phenotype of ATF6a degradation.
  
  We also tried to measure the degradation rate of exogenously expressed ATF6a. But overexpressed ATF6a is partially transported to the Golgi and cleaved by proteases, which makes it difficult to evaluate only the effect of degradation.
  
  (2) In reference to this statement on pg. 11:
  
  "This can be explained by the rigid structure of ATF6(P) lacking structural flexibility to respond to ER stress because the remaining ATF6(P) in UGGT1-KO cells tends to have a more rigid structure that averts degradation, which is supported by its slightly weaker sensitivity to trypsin (Figure 4-figure supplement 1A). "
  
  The rationale for testing ATF6(P) rigidity via trypsin digestion needs clarification. The authors should provide more background, especially if it relates to previous studies demonstrating UGGT's influence on substrate solubility. If trypsin digestion is indeed addressing this, it should be applied consistently to all tested misfolded glycoproteins, ensuring a comprehensive approach.
  
  We now provide more background with three references about trypsin digestion. Trypsin digestion allows us to evaluate the structure of proteins originating from the same gene, but it can sometimes be difficult to comparatively evaluate the structure of proteins originating from different genes. For example, antitrypsin is resistant to trypsin by its nature, which does not necessarily mean that antitrypsin forms a more stable structure than other proteins. NHK, a truncated version of antitrypsin, is still resistant to trypsin compared with other substrates.
  
  (3) Many of the figures described in the manuscript weren't referred to a specific panel. For example, pg. 12 "Fig. 1E and Fig.5," the exact panel for Fig. 5 wasn't referenced.
  
  Thank you for this comment. Corrected.
  
  (4) For experiments measuring the composition of glycoproteins in different KO lines, it is necessary to do the experiment more than once for conducting statistical analysis and comparisons. Moreover, the authors did not include raw composition data for these experiments. Statistical analysis should also be done for Fig. 4E-F.
  
  Our N-glycan composition data (Figure 1-Figure supplement 5 and 6C) are consistent with our previous papers (George et al., 2021; George et al., 2020; Ninagawa et al., 2015; Ninagawa et al., 2014). We did the experiment twice in the previous study; please refer to it regarding statistical analysis (George et al., 2020). We added the raw composition data of N-glycan (Figure 1-Figure supplement 4 and 6B). Statistical analysis is now included in Fig. 4D-F.
  
  Ali, B.R., and M.C. Field. 2000. Glycopeptide export from mammalian microsomes is independent of calcium and is distinct from oligosaccharide export. Glycobiology. 10:383-391.
  
  Asano, N. 2000. Glycosidase-Inhibiting Glycomimetic Alkaloids. Biological Activities and Therapeutic Perspectives. Journal of Synthetic Organic Chemistry, Japan. 58:666-675.
  
  Cheatham, A.M., N.R. Sharma, and P. Satpute-Krishnan. 2023. Competition for calnexin binding regulates secretion and turnover of misfolded GPI-anchored proteins. J Cell Biol. 222.
  
  George, G., S. Ninagawa, H. Yagi, J.I. Furukawa, N. Hashii, A. Ishii-Watabe, Y. Deng, K. Matsushita, T. Ishikawa, Y.P. Mamahit, Y. Maki, Y. Kajihara, K. Kato, T. Okada, and K. Mori. 2021. Purified EDEM3 or EDEM1 alone produces determinant oligosaccharide structures from M8B in mammalian glycoprotein ERAD. Elife. 10.
  
  George, G., S. Ninagawa, H. Yagi, T. Saito, T. Ishikawa, T. Sakuma, T. Yamamoto, K. Imami, Y. Ishihama, K. Kato, T. Okada, and K. Mori. 2020. EDEM2 stably disulfide-bonded to TXNDC11 catalyzes the first mannose trimming step in mammalian glycoprotein ERAD. Elife. 9:e53455.
  
  Gross, V., T. Andus, T.A. Tran-Thi, R.T. Schwarz, K. Decker, and P.C. Heinrich. 1983. 1-deoxynojirimycin impairs oligosaccharide processing of alpha 1-proteinase inhibitor and inhibits its secretion in primary cultures of rat hepatocytes. Journal of Biological Chemistry. 258:12203-12209.
  
  Gross, V., T.A. Tran-Thi, R.T. Schwarz, A.D. Elbein, K. Decker, and P.C. Heinrich. 1986. Different effects of the glucosidase inhibitors 1-deoxynojirimycin, N-methyl-1-deoxynojirimycin and castanospermine on the glycosylation of rat alpha 1-proteinase inhibitor and alpha 1-acid glycoprotein. Biochem J. 236:853-860.
  
  Hirao, K., Y. Natsuka, T. Tamura, I. Wada, D. Morito, S. Natsuka, P. Romero, B. Sleno, L.O. Tremblay, A. Herscovics, K. Nagata, and N. Hosokawa. 2006. EDEM3, a soluble EDEM homolog, enhances glycoprotein endoplasmic reticulum-associated degradation and mannose trimming. J Biol Chem. 281:9650-9658.
  
  Hosokawa, N., I. Wada, K. Hasegawa, T. Yorihuzi, L.O. Tremblay, A. Herscovics, and K. Nagata. 2001. A novel ER alpha-mannosidase-like protein accelerates ER-associated degradation. EMBO reports. 2:415-422.
  
  Hung, H.H., Y. Nagatsuka, T. Solda, V.K. Kodali, K. Iwabuchi, H. Kamiguchi, K. Kano, I. Matsuo, K. Ikeda, R.J. Kaufman, M. Molinari, P. Greimel, and Y. Hirabayashi. 2022. Selective involvement of UGGT variant: UGGT2 in protecting mouse embryonic fibroblasts from saturated lipid-induced ER stress. Proc Natl Acad Sci U S A. 119:e2214957119.
  
  Karlsson, G.B., T.D. Butters, R.A. Dwek, and F.M. Platt. 1993. Effects of the imino sugar N-butyldeoxynojirimycin on the N-glycosylation of recombinant gp120. Journal of Biological Chemistry. 268:570-576.
  
  Lomako, J., W.M. Lomako, C.A. Carothers Carraway, and K.L. Carraway. 2010. Regulation of the membrane mucin Muc4 in corneal epithelial cells by proteosomal degradation and TGF-beta. Journal of cellular physiology. 223:209-214.
  
  Mast, S.W., K. Diekman, K. Karaveg, A. Davis, R.N. Sifers, and K.W. Moremen. 2005. Human EDEM2, a novel homolog of family 47 glycosidases, is involved in ER-associated degradation of glycoproteins. Glycobiology. 15:421-436.
  
  Ninagawa, S., T. Okada, Y. Sumitomo, S. Horimoto, T. Sugimoto, T. Ishikawa, S. Takeda, T. Yamamoto, T. Suzuki, Y. Kamiya, K. Kato, and K. Mori. 2015. Forcible destruction of severely misfolded mammalian glycoproteins by the non-glycoprotein ERAD pathway. J Cell Biol. 211:775-784.
  
  Ninagawa, S., T. Okada, Y. Sumitomo, Y. Kamiya, K. Kato, S. Horimoto, T. Ishikawa, S. Takeda, T. Sakuma, T. Yamamoto, and K. Mori. 2014. EDEM2 initiates mammalian glycoprotein ERAD by catalyzing the first mannose trimming step. J Cell Biol. 206:347-356.
  
  Olivari, S., C. Galli, H. Alanen, L. Ruddock, and M. Molinari. 2005. A novel stress-induced EDEM variant regulating endoplasmic reticulum-associated glycoprotein degradation. J Biol Chem. 280:2424-2428.
  
  Pearse, B.R., T. Tamura, J.C. Sunryd, G.A. Grabowski, R.J. Kaufman, and D.N. Hebert. 2010. The role of UDP-Glc:glycoprotein glucosyltransferase 1 in the maturation of an obligate substrate prosaposin. J Cell Biol. 189:829-841.
  
  Romero, P.A., B. Saunier, and A. Herscovics. 1985. Comparison between 1-deoxynojirimycin and N-methyl-1-deoxynojirimycin as inhibitors of oligosaccharide processing in intestinal epithelial cells. Biochem J. 226:733-740.
  
  Saunier, B., R.D. Kilker, J.S. Tkacz, A. Quaroni, and A. Herscovics. 1982. Inhibition of N-linked complex oligosaccharide formation by 1-deoxynojirimycin, an inhibitor of processing glucosidases. Journal of Biological Chemistry. 257:14155-14161.
  
  Szumilo, T., G.P. Kaushal, and A.D. Elbein. 1987. Purification and properties of the glycoprotein processing N-acetylglucosaminyltransferase II from plants. Biochemistry. 26:5498-5505.
  
  Tannous, A., N. Patel, T. Tamura, and D.N. Hebert. 2015. Reglucosylation by UDP-glucose:glycoprotein glucosyltransferase 1 delays glycoprotein secretion but not degradation. Molecular biology of the cell. 26:390-405.
  
  Zeng, Y., Y.T. Pan, N. Asano, R.J. Nash, and A.D. Elbein. 1997. Homonojirimycin and N-methyl-homonojirimycin inhibit N-linked oligosaccharide processing. Glycobiology. 7:297-304.
  
URL

biorxiv.org/content/10.1101/2023.10.18.562958v2
www.biorxiv.org www.biorxiv.org

Auditory cortex anatomy reflects multilingual phonological experience

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The goal of this project is to test the hypothesis that individual differences in experience with multiple languages relate to differences in brain structure, specifically in the transverse temporal gyrus. The approach used here is to focus specifically on the phonological inventories of these languages, looking at the overall size of the phonological inventory as well as the acoustic and articulatory diversity of the cumulative phonological inventory in people who speak one or more languages. The authors find that the thickness of the transverse temporal gyrus (either the primary TTG, in those with one TTG, or in the second TTG, in people with multiple gyri) was related to language experience, and that accounting for the phonological diversity of those languages improved the model fit. Taken together, the evidence suggests that learning more phonemes (which is more likely if one speaks more than one language) leads to experience-related plasticity in brain regions implicated in early auditory processing.
  
  Strengths:
  
  This project is rigorous in its approach--not only using a large sample, but replicating the primary finding in a smaller, independent sample. Language diversity is difficult to quantify, and likely to be qualitatively and quantitatively distinct across different populations, and the authors use a custom measure of multilingualism (accounting for both number of languages as well as age of acquisition) and three measures of phonological diversity. The team has been careful in discussion of these findings, and while it is possible that pre-existing differences in brain structure could lead to an aptitude difference which could drive one to learn more than one language, the fine-grained relationships with phonological diversity seem less likely to emerge from aptitude rather than experience.
  
  Weaknesses:
  
  It is a bit unclear how the measures of phonological diversity relate to one another--they are partially separable, but rest on the same underlying data (the phonemes in each language). It would be helpful for the reader to understand how these measures are distributed (perhaps in a new figure), and the degree to which they are correlated with one another.
  
  Thank you for the comment. Indeed, our description missed this important detail, which we have now included in the manuscript. Unsurprisingly, the distances all correlated with one another, which we present in Table 2 in Section 2.3 of the revised manuscript. We have also added a figure with distributions of the three distance measures (see Figure S3).
  
  Further, as the authors acknowledge, it is always possible that an unseen factor instead drives these findings--if typological lexical distance measures are available, it would be helpful to enter these into the model to confirm that phonological factors are the specific driver of TTG differences and not language diversity in a more general sense. That said, the relationship between phonological diversity and TTG structure is intuitive.
  
  Thank you for the suggestion. To further establish that our results reflected the relationship between TTG structure and phonological diversity specifically (as opposed to language diversity in a more general sense), we derived a fourth measure of language experience, where the AoA index of different languages was weighted by lexical distances between the languages. Here, we followed the methodology described in Kepinska, Caballero, et al. (2023): We used Levenshtein Distance Normalized Divided (LDND) (Wichmann et al., 2010) which was computed using the ASJP.R program by Wichmann (https://github.com/Sokiwi/InteractiveASJP01). Information on lexical distances was combined with language experience information per participant using Rao's quadratic entropy equation in the same way as for the phonological measures.
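  
  The weighting step described above can be sketched as follows. This is only a minimal illustration of Rao's quadratic entropy, Q = Σ_i Σ_j d_ij p_i p_j, and not the authors' code: the function name, the input layout, and the normalization of per-language weights to proportions are assumptions made for the example.

```python
def rao_quadratic_entropy(weights, distances):
    """Rao's quadratic entropy: Q = sum_i sum_j d_ij * p_i * p_j.

    weights   -- hypothetical per-language experience weights (e.g. an
                 AoA-based index); normalized to proportions here, which
                 is an assumption of this sketch.
    distances -- square matrix of pairwise distances between the same
                 languages (e.g. lexical or phonological), zero diagonal.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0
    p = [w / total for w in weights]
    n = len(p)
    return sum(distances[i][j] * p[i] * p[j]
               for i in range(n) for j in range(n))
```

  Under these assumptions a monolingual speaker scores 0, and the score grows both with more balanced experience across languages and with larger distances between them.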
  
  We then entered this language experience measure accounting for lexical distances between the languages into linear models predicting the thickness of the second left and right TTG (controlling for participants’ age, sex and mean hemispheric thickness) in the main sample, and compared these models with the corresponding models including the original three phonological distance measures (models 2-4 in Author response table 1), and the measure with no typological information (1).
  
  Below, we list adjusted R2 values of all models, from which it is clear that the index of multilingual language experience accounting for lexical distances between languages (5) explained less variance than the index incorporating phoneme-level distances between languages (2), both in the left and the right hemisphere. This further strengthens our conclusion that our results reflected the relationship between TTG structure and phonological diversity specifically, as opposed to language diversity in a more general sense.
  
  Author response table 1.
  
  We have added a description of this analysis to the manuscript, Section 3.3, lines 357-370.
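  
  For readers unfamiliar with the comparison metric, adjusted R² penalizes the ordinary R² for the number of predictors. A minimal sketch of the standard formula follows; the function name and the numbers in the usage note are purely illustrative and are not values from the analysis.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    r2           -- ordinary coefficient of determination of the model
    n_obs        -- number of observations (n)
    n_predictors -- number of predictors, excluding the intercept (p)
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof
```

  For instance, `adjusted_r2(0.5, 11, 2)` evaluates to 0.375. Because the models compared here all have the same number of predictors (one language experience index plus age, sex and mean hemispheric thickness), ranking them by adjusted R² is equivalent to ranking them by R².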
  
  One curious aspect of this paper relates to the much higher prevalence of split or duplicate TTG in the sample. The authors do a good job speculating on how features of the TASH package might lead to this, but it is unclear where the ground truth lies--some discussion of validation of TASH against a gold standard would be useful.
  
  The validation of the TASH toolbox in comparison to gold standard manual measurement involved assessing how well the measurements of left and right Heschl's gyrus (HG) volumes obtained using the TASH method correlated with those obtained through manual labeling (see Dalboni da Rocha et al., 2020 for details). This validation process was conducted across three independent datasets. Additionally, for comparison, the manually labeled HG volumes were also compared with those obtained using FreeSurfer's Destrieux parcellation of the transverse temporal gyrus in the same datasets. The validation process, therefore, involved rigorous comparisons of HG volumes obtained through manual labeling, FreeSurfer, and TASH across different datasets, along with an assessment of inter-rater reliability for the manual labeling procedure. This comprehensive approach ensures that the results are robust and reliable. TASH_complete, the version used in the present work, is an extension of the extensively validated TASH, which apart from the first gyrus, also identifies additional transverse temporal gyri (i.e. Heschl’s gyrus duplications and multiplications) situated in the PT, when present. Since work on the correspondence between manually identified TTG multiplications is still ongoing, as outlined in the Methods section, we complemented the automatic segmentation by extensive visual assessment of the identified posterior gyri. This process involved removing from the analysis those gyri that lay along the portion of the superior temporal plane that curved vertically (i.e., within the parietal extension, Honeycutt et al., 2000), when present. 
  Given that TASH_complete and TASH operate on the same principles and are both based on FreeSurfer’s surface reconstruction and cortical parcellation (which have been extensively validated against manual tracing and other imaging modalities, showing good accuracy), and since we have visually inspected all segmentations, we are confident as to the accuracy of the reported TTG variability. It has to be further noted that the prevalence of TTG multiplications beyond second full posterior duplications was not systematically assessed in previous descriptive reports (Marie, 2015). However, we acknowledge that more work is needed to further ascertain the anatomical accuracy of the segmentations, and we elaborate on this point in the Discussion of the revised manuscript (lines 621-623).
  
  Reviewer #2 (Public Review):
  
  This work investigates the possible association between language experience and morphology of the superior temporal cortex, a part of the brain responsible for the processing of auditory stimuli. Previous studies have found associations between language and music proficiency as well as language learning aptitude and cortical morphometric measures in regions in the primary and associated auditory cortex. These studies have most often, however, focused on finding neuroanatomical effects of difference between features in a few (often two) languages or from learning single phonetic/phonological features and have often been limited in terms of N. On this background, the authors use more sophisticated measures of language experience that take into account the age of onset and the differences in phonology between languages the subjects have been exposed to as well as a larger number of subjects (N = 146 + 69) to relate language experience to the shape and structure of the superior temporal cortex, measured from T1-weighted MRI data. It shows solid evidence for there being a negative relationship between language experience and the right 2nd transverse temporal gyrus as well as some evidence for the relationship representing phoneme-level cross-linguistic information.
  
  Strengths
  
  The use of entropy measures to quantify language experience and include typological distance measures allows for a more general interpretation of the results and is an important step toward respecting and making use of linguistic diversity in neurolinguistic experiments.
  
  A relatively large group of subjects with a range of linguistic backgrounds.
  
  The full analysis of the structure of the superior temporal cortex including cortical volume, area, as well as the shape of the transverse gyrus/gyri. There is a growing literature on the meaning of the shape and number of the transverse gyri in relation to language proficiency and the authors explore all measures given the available data.
  
  The authors chose to use a replication data set to verify their data, which is applaudable. However, see the relevant point under "Weaknesses".
  
  Weaknesses
  
  The authors fail to explain how a thinner cortex could reflect the specialization of the auditory cortex in the processing of diverse speech input. The Dynamic Restructuring Model (Pliatsikas, 2020) which is referred to does not offer clear guidance to interpretation. A more detailed discussion of how a phonologically diverse environment could lead to a thinner cortex would be very helpful.
  
  Thank you for bringing our attention to this point. We have now extended the explanation we had previously included in the Discussion by including the following passage on p. 20 (lines 557-566) of the revised manuscript:
  
  “Experience-induced pruning is essential for maintaining an efficient and adaptive neural network. It reinforces relevant neural circuits for faster more efficient information processing, while diminishing those that are less active, or less beneficial. The cortical specialization may need to arise because phonologically more diverse language experience requires that the mapping of acoustic signal to sound categories is denser, more detailed and more intricate. As a result, the brain may need to engage in more intensive processing to discriminate between and accurately perceive the sound categories of each language. This increased cognitive demand may, in turn, require the auditory and language processing regions of the brain to adapt and become more efficient. Over time, this heightened effort for successful speech perception and sound discrimination may lead to neural plasticity, resulting in cortical specialization. This means that cortical areas become more finely tuned and specialized for processing the unique phonological features of language(s) spoken by individuals.”
  
  We have also added a passage to the Introduction regarding the possible microscopic or physiological underpinnings of the brain structural differences that we observe macroscopically using structural MRI (lines 68-73):
  
  “Such environmental effect on cortical thickness might in turn be tied to microstructural changes to the underlying brain tissue, such as modifications in dendritic length and branching, synaptogenesis or synaptic pruning, growth of capillaries and glia, all previously tied to some kind of environmental enrichment and/or skill learning (see Lövdén et al., 2013; Zatorre et al., 2012 for overviews). Increased cortical thickness may reflect synaptogenesis and dendritic growth, while cortical thinning observed with MRI may be a result of increased myelination (Natu et al., 2019) or synaptic pruning.”
  
  It is difficult to understand what measure of language experience is used when. Clearer and more explicit nomenclature would assist in the interpretation of the results.
  
  We have added a more explicit list of the indices used to the Introduction (lines 104-107 of the revised manuscript) and to Section 2.4, and have used them consistently throughout the text:
  
  (1) language experience index not accounting for typological features: ‘Language experience - no typology’
  
  (2) measures combining language experience with typological distances at different levels:
  
  a. ‘Language experience – features’,
  
  b. ‘Language experience – phonemes’,
  
  c. ‘Language experience – phonological classes’.
  
  There is a lack of description of the language backgrounds of the included subjects. How many came from each of the possible linguistic backgrounds? How did they differ in language exposure? This would be informative to evaluate the generalizability of the conclusions.
  
  Thank you for raising this point. Given the complexity of participants’ language experience, ranging from monolingual to speaking 7 different languages, we opted for a fully parametric approach in quantifying it. We used Shannon’s entropy and Rao’s quadratic entropy equations to create continuous measures of language experience, without the constraints of a minimum sample size per language and the need to exclude participants with underrepresented languages. To add further details to our description of the language background, we summarize the language background of both samples in the newly added Table 1, which presents a breakdown of participants by the number of languages they spoke, and in Supplementary Table S1, which lists all languages spoken by each participant.
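
  As a rough illustration of the two diversity measures named above, the following minimal Python sketch computes Shannon's entropy, H = -sum(p_i log2(p_i)), and Rao's quadratic entropy in its standard form, Q = sum_i sum_j d_ij p_i p_j. It is not the authors' code: the function names, the example proportions and the distance matrix are hypothetical, and the way the manuscript derives the weights p_i from age of acquisition is not reproduced here.

```python
import math

def shannon_entropy(proportions):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i)); zero weights are skipped."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def rao_quadratic_entropy(proportions, distances):
    """Rao's quadratic entropy: Q = sum_i sum_j d_ij * p_i * p_j.

    `distances` is a symmetric matrix of pairwise (e.g. typological)
    distances with zeros on the diagonal.
    """
    n = len(proportions)
    return sum(
        distances[i][j] * proportions[i] * proportions[j]
        for i in range(n)
        for j in range(n)
    )

# Hypothetical speaker with two equally weighted languages
# that are maximally distant from each other (d = 1).
weights = [0.5, 0.5]
typological_distance = [[0.0, 1.0], [1.0, 0.0]]

h = shannon_entropy(weights)
q = rao_quadratic_entropy(weights, typological_distance)
```

  With these definitions a monolingual speaker (a single weight of 1) has zero entropy, and Rao's measure shrinks as the languages a person speaks become typologically closer.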
  
  Only the result from the multiple transverse temporal gyri (2nd TTG) is analyzed in the replicated dataset. Only the association in the right hemisphere 2nd TTG is replicated but this is not reflected in the discussion or the conclusions. The positive correlation in the right TTG is thus not attempted to be replicated.
  
  Thank you for bringing this point to our attention. Since only a few participants presented single gyri in the left (n = 7) and the right hemisphere (n = 14), the replication analysis focused on the second TTG results only. We have now commented on this fact in Section 3.5 (lines 413-415), as well as in the Discussion (lines 594-596).
  
  The replication dataset differed in more ways than the more frequent combination of English and German experience, as mentioned in the discussion. Specifically, the fraction of monolinguals was higher in the replication dataset and the samples came from different scanners. It would be better if the primary and replication datasets were more equally matched.
  
  Indeed, the replication sample did not fully mimic the characteristics of the main sample and a better match between the two samples would have been preferable. As elaborated in the Introduction, however, the data were split into two groups according to the date of data acquisition, which also coincided with the field strength of the scanners used: the main sample’s data were acquired on a 1.5T scanner, the replication sample’s on a 3T scanner. We opted to keep this split, at the cost of not fully matching the two datasets, rather than introduce additional noise by mixing data from different field strengths within a sample. Observing the established effects (even partially) in this somewhat different replication sample, however, seems in our view to further strengthen our results.
  
  Even if the language experience and typological distance measures are a step in the right direction for correctly associating language exposure with cortical plasticity, it still is a measure that is insensitive to the intensity of the exposure. The consequences of this are not discussed.
  
  Indeed, we agree with the reviewer that there is still a lot of ground to cover to fully understand the relationship between language experience and cortical plasticity. We have added a paragraph to the Discussion (lines 587-592 of the revised manuscript) to bring attention to this issue:
  
  “Future research should also further increase the degree of detail in describing the multilingual language experience, as both AoA and proficiency (used here) are not sensitive to other aspects of multilingualism, such as intensity of the exposure to the different languages, or quantity and quality of language input. Since these aspects have been convincingly shown to be associated with neural changes (e.g., Romeo, 2019), incorporating further, more detailed measures describing individuals’ language experience could further enhance our understanding of cortical plasticity in general, and how the brain accommodates variable language experience in particular.”
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  The study uses structural MRI to identify how the number, degree of experience, and phonemic diversity of language(s) that a speaker knows can influence the thickness of different sub-segments of the auditory cortex. In both a primary and replication sample of adult speakers, the authors find key differences in cortical thickness within specific subregions of the cortex due to either the age at which languages are acquired (degree of experience), or the diversity of the phoneme inventories carried by that/those language(s) (breadth of experience).
  
  Strengths:
  
  The results are first and foremost quite fascinating and I do think they make a compelling case for the different ways in which linguistic experience shapes the auditory cortex.
  
  The study uses a number of different measures to quantify linguistic experience, related to how many languages a person knows (taking into account the age at which each was learned) as well as the diversity of the phoneme inventories contained within those languages. The primary sample is moderately large for a study that focuses on brain-behaviour relationships; a somewhat smaller replication sample is also deployed in order to test the generality of the effects.
  
  Analytic approaches benefit from the careful use of brain segmentation techniques that nicely capture key landmarks and account for vagaries in the structure of STG that can vary across individuals (e.g., the number of transverse temporal gyri varies from 1-4 across individuals).
  
  Weaknesses:
  
  The specificity of these effects is interesting; some effects really do appear to be localized to the left hemisphere and specific subregions of the auditory cortex, e.g., TTG. However, because analyses only focus on auditory regions along the STG and MTG, one could be led to the conclusion that these are the only brain regions for which such effects will occur. The hypothesis is that these are specifically auditory effects, but that does make a clear prediction that non-auditory regions should not show the same sort of variability. I recognize that expanding the search space will inflate type-1 errors to a point where maybe it's impossible to know what effects are genuine. And the fine-grained nature of the effects suggests a coarse analysis of other cortical regions is likely to fail. So I don't know the right answer here. Only that I tend to wonder if some control region(s) might have been useful for understanding whether such effects truly are limited to the auditory cortex. Otherwise one might argue that these effects are epiphenomenal, or driven by some hidden factor unrelated to auditory experience, which would predict that we'd also see them in non-auditory cortex, either within or outside the brain's speech network(s).
  
  Thank you for raising this important issue. Our primary analyses indeed focused on the auditory regions, given their involvement in speech and language processing at different levels of the processing hierarchy (from low – HG, to high – STG and STS). Here, we included a fairly broad range of ROIs (8 per hemisphere, 16 in total) and it has to be noted that it was only the bilateral planum temporale which showed an association with multilingualism. In the original submission we had indeed attempted to confirm the specificity of this result by performing a whole-brain vertex-wise analysis in FreeSurfer (see Table 3, Section 3.2, Figure S5), which again showed that the only cluster of vertices related to participants’ language experience at p < .0001 (uncorrected) was located in the superior aspect of the left STG, corresponding to the location of the planum temporale and the second TTG. Lowering the threshold of statistical significance to p < .001 (uncorrected) results in further clusters of vertices whose thickness was positively associated with the degree of multilingual language experience, localized in:
  
  • Left hemisphere: central sulcus (S_central), long insular gyrus and central sulcus of the insula (G_Ins_lg_and_S_cent_ins), lingual gyrus (G_oc-temp_med-Lingual), planum temporale of the superior temporal gyrus (G_temp_sup-Plan_tempo), short insular gyri (G_insular_short), middle temporal gyrus (G_temporal_middle), and planum polare of the superior temporal gyrus (G_temp_sup-Plan_polar)
  
  • Right hemisphere: angular gyrus (G_pariet_inf-Angular), superior temporal sulcus (S_temporal_sup), middle-posterior part of the cingulate gyrus and sulcus (G_and_S_cingul-Mid-Post), marginal branch of the cingulate sulcus (S_cingul-Marginalis), parieto-occipital sulcus (S_parieto_occipital), parahippocampal gyrus (G_oc-temp_med-Parahip), and inferior temporal gyrus (G_temporal_inf)
  
  We present the result of this analysis in Author response image 1, where clusters are labelled according to the Destrieux anatomical atlas implemented in FreeSurfer:
  
  Author response image 1.
  
  As the reviewer points out, relationships between our dependent and independent variables established at a lower threshold of statistical significance might not reflect a true effect, and it is statistically more probable that multilingualism-related cortical thickness effects are specific to the auditory regions. We do not exclude that an analysis of other pre-defined ROIs, performed at a similar level of detail as our present investigation, would uncover further significant associations between multilingual language experience and brain anatomy, but such an investigation is beyond the scope of the present work.
  
  The reason(s) why we might find a link between cortical thickness and experience is not fully discussed. The introduction doesn't really mention why we'd expect cortical thickness to be correlated (positively or negatively) with speech experience. There is some discussion of it in the Discussion section as it relates to the Pliatsikas' Dynamic Restructuring Model, though I think that model only directly predicts thinning as a function of experience (here, negative correlations). It might have less to say about observed positive correlations e.g., HG in the right hemisphere. In any case, I do think that it's interesting to find some relationship between brain morphology and experience but clearer explanations for why these occur could help, and especially some mention of it in the intro so readers are clearer on why cortical thickness is a useful measure.
  
  We have expanded the section of the Introduction introducing cortical thickness pointing to different microstructural changes previously associated with environmental enrichment and skill learning (lines 68-73), and hope the link between cortical thickness and multilingual language experience is clearer now:
  
  “Such environmental effect on cortical thickness might in turn be tied to microstructural changes to the underlying brain tissue, such as modifications in dendritic length and branching, synaptogenesis or synaptic pruning, growth of capillaries and glia, all previously tied to some kind of environmental enrichment and/or skill learning (see Lövdén et al., 2013; Zatorre et al., 2012 for overviews). Increased cortical thickness may reflect synaptogenesis and dendritic growth, while cortical thinning observed with MRI may be a result of increased myelination (Natu et al., 2019) or synaptic pruning.”
  
  In addition, we have also expanded the Discussion section providing more reasoning for the links between cortical thickness and multilingual language experience (lines 557-566):
  
  “Experience-induced pruning is essential for maintaining an efficient and adaptive neural network. It reinforces relevant neural circuits for faster more efficient information processing, while diminishing those that are less active, or less beneficial. The cortical specialization may need to arise because phonologically more diverse language experience requires that the mapping of acoustic signal to sound categories is denser, more detailed and more intricate. As a result, the brain may need to engage in more intensive processing to discriminate between and accurately perceive the sound categories of each language. This increased cognitive demand may, in turn, require the auditory and language processing regions of the brain to adapt and become more efficient. Over time, this heightened effort for successful speech perception and sound discrimination may lead to neural plasticity, resulting in cortical specialization. This means that cortical areas become more finely tuned and specialized for processing the unique phonological features of language(s) spoken by individuals.”
  
  One pitfall of quantifying phoneme overlap across languages is that what we might call a single 'phoneme', shared across languages, will, in reality, be realized differently across them. For instance, English and French may be argued to both use the vowel /u/ although it's realized differently in English vs. French (it's often fronted and diphthongized in many English speaker groups). Maybe the phonetic dictionaries used in this study capture this using a close phonetic transcription, but it's hard to tell; I suspect they don't, and in that case, the diversity measures would be an underestimate of the actual number of unique phonemes that a listener needs to maintain.
  
  The PHOIBLE database uses transcription that reflects phonological descriptive data as closely as possible, according to the available descriptive sources. Different realizations of sounds are (as much as possible) marked in the database. For example, the open front unrounded vowel /a/ is listed as e.g., [a] or [a̟], with the “+” sign denoting a fronted realization. This is done in PHOIBLE by the use of diacritics (see https://phoible.org/conventions) which further specify variations on the language-specific realizations of the phonemes listed in the database. Further details are available in Moran (2012) (https://digital.lib.washington.edu/researchworks/items/0d26e54d-950a-4d0b-b72c-3afb4b1aa9eb). In our calculation of phoneme-based distances, a symbol with a diacritic and the same symbol without it were treated as different phonemes, and therefore the different realizations were accounted for.
  
  That said, we fully agree with the reviewer that in fact any diversity measure will be an underestimation of the actual variation, as between-speaker micro-variation can never be fully reflected in large-scale typological databases such as the one used in the present study. To the best of our knowledge, however, PHOIBLE offers the most comprehensive way of quantifying cross-linguistic variation to date, and we look forward to the field offering further tools capturing linguistic variability at an ever-finer level of detail.
  
  Discussion of potential genetic differences underlying the findings is interesting. One additional data point here is a study finding a relationship between the number of repeats of the READ1 (a factor of the DCDC2 gene) in populations of speakers, and the phoneme inventory of language(s) predominant in that population (DeMille, M. M., Tang, K., Mehta, C. M., Geissler, C., Malins, J. G., Powers, N. R., ... & Gruen, J. R. (2018). Worldwide distribution of the DCDC2 READ1 regulatory element and its relationship with phoneme variation across languages. Proceedings of the National Academy of Sciences, 115(19), 4951-4956.) Admittedly, that paper makes no claim about the cortical expression of that regulatory factor under study, and so more work needs to be done on whether this has any bearing at all on the auditory cortex. But it does represent one alternative account that does not have to do with plasticity/experience.
  
  We thank the reviewer for bringing this important line of research to our attention, which we have now included in the Discussion (lines 494-498 of the revised manuscript).
  
  The replication sample is useful and a great idea. It does however feature roughly half the number of participants meaning statistical power is weaker. Using information from the first sample, the authors might wish to do a post-hoc power analysis that shows the minimum sample size needed to replicate their effect; given small effects in some cases, we might not be surprised that the replication was only partial. I don't think this is a deal breaker as much as it's a way to better understand whether the failure to replicate is an issue of power versus fragile effects.
  
  Thank you for the suggestion. Indeed, the effect sizes established in the analyses using the main sample were small (e.g., f² = 0.07). According to a power analysis performed with G*Power 3.1 (Faul et al., 2009), detecting an effect of this magnitude for the predictor of interest at alpha = .05 (two-tailed), in a linear multiple regression model with 4 predictors (i.e., 3 covariates of no interest: sex, age, hemispheric thickness, and 1 predictor of interest), requires a sample of N = 114 to achieve 80% power. Our partial failure to replicate the effect might therefore indeed be related to the lower power of the replication sample, rather than to the effect itself being fragile.
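
  The order of magnitude of this sample size can be checked with a simple approximation. The sketch below is not the G*Power computation itself: it assumes that the test of a single predictor behaves like a two-sided z-test with noncentrality sqrt(f² × N), which ignores the finite degrees of freedom of the exact noncentral F distribution. It should therefore land close to, but not necessarily exactly on, the reported N = 114. The function names are hypothetical.

```python
from statistics import NormalDist

def approx_power(f2, n, alpha=0.05):
    """Approximate power for testing one regression predictor.

    Normal approximation: the test statistic is treated as N(sqrt(f2 * n), 1)
    under the alternative, compared against a two-sided critical value.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (f2 * n) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def min_sample_size(f2, target_power=0.80, alpha=0.05, n_start=6):
    """Smallest n whose approximate power reaches the target."""
    n = n_start
    while approx_power(f2, n, alpha) < target_power:
        n += 1
    return n

n_needed = min_sample_size(0.07)         # effect size from the main sample
power_at_114 = approx_power(0.07, 114)   # power at the reported N
```

  Because the approximation is slightly optimistic for moderate samples, an exact calculation is expected to require marginally more participants than this sketch suggests.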
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the Authors):
  
  A few remaining details that I think you can handle:
  
  (1) Was there any correction for multiple comparisons, especially when multiple anatomical measures were investigated in separate models? (e.g. ln 130).
  
  Since three different anatomical measures were investigated in Analysis 1 and Analysis 2 (see Table 1), the alpha level of the two linear mixed models was lowered to α = .0166. Note that the p-values of the predictors of interest were p = .012 (mixed model with all auditory regions) and p = .005 (mixed model with all identified TTGs).
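
  The threshold above corresponds to dividing the conventional alpha of .05 by the three anatomical measures (a Bonferroni-style adjustment; that label is our assumption, as the response does not name the method). A minimal check of the reported p-values against it, with hypothetical variable names:

```python
# Family-wise alpha split across the three anatomical measures.
family_alpha = 0.05
n_measures = 3
adjusted_alpha = family_alpha / n_measures  # about .0167 (reported as .0166)

# p-values of the predictors of interest, as reported in the response.
reported_p = {
    "all auditory regions": 0.012,
    "all identified TTGs": 0.005,
}

# True where the predictor survives the adjusted threshold.
survives = {model: p < adjusted_alpha for model, p in reported_p.items()}
```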
  
  (2) In Table 2, since your sample skews heavily female, it would be more useful to present the counts of Male/Female totals for 1, 2, 3, 4, etc TTGs as proportions of the total for that sex rather than counts, so that the distribution across sex is more obvious.
  
  Thank you for bringing this issue to our attention. We have now included an additional row in Table 4, with proportions of males and females presenting different total number of identified gyri in the left and the right hemisphere.
  
  (3) (ln 161) It wasn't clear to me how you dealt statistically with the fact that some participants had only one TTG - did you simply enter "0" as a value for cortical thickness for 2, 3, etc. for those participants? If so, it's possible that this result could reflect the number of split/duplicated gyri rather than the thickness of those gyri.
  
  Indeed, if non-existing gyri were coded with a value of “0” (it being the lowest possible thickness value), the results would reflect the configuration of TTGs (single vs multiple gyri) rather than a relationship between thickness and language experience.
  
  The model was, however, fit to all available thickness values, and the gyri labels (1st, 2nd, 3rd) were modeled as a fixed factor with 3 levels. This procedure allowed us to localize the effect of language experience to a specific gyrus. The following formula was used with the lmer function (lme4 package) in R:
  
  thickness ~ age + sex + whole_brain_thickness + language_experience * gyrus * hemisphere + (1 | participant_id)
  
  We observed a significant interaction between language experience and the 2nd gyrus (NB: the absence of a significant 3-way interaction between language experience, the 2nd gyrus and hemisphere indicated that the effect was bilateral). This result was then followed up with two linear models: one for the thickness values of the 2nd left and one for the 2nd right gyrus, each fit to the available data only (n = 130 for the left hemisphere; n = 96 for the right), see Table 5. This procedure ensured that only the available cortical thickness data were considered when establishing their relationship with our independent variable (language experience).
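
  The reviewer's concern about zero-coding can be made concrete with a toy example. The sketch below uses invented numbers (not study data) for eight hypothetical participants, of whom only the more experienced ones have a 2nd TTG. Coding absent gyri as 0 produces a strong positive correlation that merely tracks the presence of the gyrus, whereas restricting the analysis to available values, as described above, does not.

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented toy data: language experience score and 2nd-TTG thickness (mm).
# None marks participants who have no 2nd TTG.
experience = [0, 1, 2, 3, 4, 5, 6, 7]
second_ttg = [None, None, None, 2.6, None, 2.4, 2.6, 2.4]

# Problematic coding: absent gyri entered as 0 mm.
zero_coded = [0.0 if t is None else t for t in second_ttg]
r_zero_coded = pearson_r(experience, zero_coded)

# Coding described in the response: use available values only.
available = [(e, t) for e, t in zip(experience, second_ttg) if t is not None]
r_available = pearson_r([e for e, _ in available], [t for _, t in available])
```

  In this toy case the zero-coded correlation is strongly positive while the available-data correlation is negative, so the two codings can even disagree in sign.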
  
  (4) I think more could be done in the results section to distinguish your three phonological measures--these details are evident in the Methods section, but if readers consume this paper front to back they may find it difficult to figure out what each measure really means.
  
  Thank you. We have added a more explicit list of the indices used to the Introduction (lines 104-107) and to Section 2.4. As per Reviewer #2’s comments, the Methods section was also moved before the Results section, hopefully further enhancing the readability of the paper.
  
  Typos:
  
  ln 270: "weighed"--could you have meant "weighted"?
  
  Corrected, thank you!
  
  ln 377: "Apart from phoneme-based typological distance measure explaining" --> "Apart from *the* phoneme-based..."
  
  Corrected, thank you!
  
  Reviewer #2 (Recommendations for the Authors):
  
  The interpretation of the results would be much helped by the methods section being moved to precede it. Now, much of the results section is methods summaries that would not have been needed if the reader had been presented with the methods beforehand. This is especially true for the measures of language experience and typological distances used.
  
  Thank you. We have moved the Materials and Methods section before the Results section.
  
  The equation in section "4.2 Language experience" should be H = - sum(p_i log2 (p_i)) and not H = - sum(p_i log2(i)).
  
  Corrected, thank you!
  
  It is unclear what "S" represents in the equation in the section "4.4 Combining typology and language experience (indexed by AoA)".
  
  The explanation has been added, thank you!
  

URL

biorxiv.org/content/10.1101/2023.06.16.545298v3

New submission 03/05/2023, 15:15:12

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Assessment note: “Whereas the results and interpretations are generally solid, the mechanistic aspect of the work and conclusions put forth rely heavily on in vitro studies performed in cultured L6 myocytes, which are highly glycolytic and generally not viewed as a good model for studying muscle metabolism and insulin action.”
  
  While we acknowledge that in vitro models may not fully recapitulate the complexity of in vivo systems, we believe L6 myotubes are appropriate for studying the mechanisms underlying muscle metabolism and insulin action. L6 myotubes possess many important characteristics relevant to our research, including high insulin sensitivity and a similar mitochondrial respiration sensitivity compared to primary muscle fibres. Furthermore, several studies have demonstrated the utility of L6 myotubes as a model for studying insulin sensitivity and metabolism, including our own previous work (PMID: 19805130, 31693893, 19915010) and the work of others (PMID: 12086937, 29486284, 15193147).
  
  Importantly, our observations from the L6 myotube model are supported by in vivo data from both mice and humans. Chow (Figure 3J, K) and high-fat fed mice (new data - Supplementary Figure 4 H-I) demonstrated a reduction in mitochondrial ceramide and an increase in CoQ9. Muscle biopsies from humans showed a strong negative correlation between mitochondrial C18:0 ceramide levels and insulin sensitivity (PMID: 29415895). Further, complex I and IV abundance was strongly correlated with both muscle insulin sensitivity and mitochondrial ceramide (CerC18:0) (Figure 6E, F). This is consistent with our observations in L6 myotubes (Figure 6H, I). These findings support the relevance of our in vitro results to in vivo muscle metabolism.
  
  Points from reviewer 1
  
  Although the authors' results suggest that higher mitochondrial ceramide levels suppress cellular insulin sensitivity, they rely solely on a partial inhibition (i.e., 30%) of insulin-stimulated GLUT4-HA translocation in L6 myocytes. It would be critical to examine how much the increased mitochondrial ceramide would inhibit insulin-induced glucose uptake in myocytes using radiolabeled deoxy-glucose. Another important question to be addressed is whether glycogen synthesis is affected in myocytes under these experimental conditions. Results demonstrating reductions in insulin-stimulated glucose transport and glycogen synthesis in myocytes with dysfunctional mitochondria due to ceramide accumulation would further support the authors' claim.
  
  Response: We have now conducted additional experiments focusing on glycogen synthesis as a readout of insulin sensitivity, as it offers a readout orthogonal to GLUT4 translocation and glucose uptake. L6-myotubes overexpressing the mitochondrial-targeted ASAH1 construct (as described in Fig. 3) were challenged with palmitate, and insulin-stimulated glycogen synthesis was measured using 14C-radiolabeled glucose. As shown below, palmitate suppressed insulin-induced glycogen synthesis, which was effectively prevented by overexpression of ASAH1 (N = 5, * p<0.05), supporting our previous observation using GLUT4 translocation as a readout of insulin sensitivity (Fig. 3). These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism.
  
  These data have now been added to Supplementary Figure 4K and the results modified as follows:
  
  “...For this reason, several in vitro models have been employed involving incubation of insulin sensitive cell types with lipids such as palmitate to mimic lipotoxicity in vivo. In this study we have used cell surface GLUT4-HA abundance as the main readout of insulin response...”
  
  “Notably, mtASAH1 overexpression protected cells from palmitate-induced insulin resistance without affecting basal insulin sensitivity (Fig. 3E). Similar results were observed using insulin-induced glycogen synthesis as an orthogonal technique for GLUT4 translocation. These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism (Sup. Fig. 5K). Importantly, mtASAH1 overexpression did not rescue insulin sensitivity in cells depleted…”
  
  Author response image 1.
  
  Additionally, the following text was added to the method section:
  
  “L6 myotubes overexpressing ASAH1 were grown and differentiated in 12-well plates, as described in the Cell lines section, and stimulated for 16 h with palmitate-BSA or EtOH-BSA, as detailed in the Induction of insulin resistance section.
  
  On day seven of differentiation, myotubes were serum starved in DMEM for 3.5 h. After incubation for 1 h at 37 °C with 2 µCi/ml D-[U-14C]-glucose in the presence or absence of 100 nM insulin, the glycogen synthesis assay was performed, as previously described (Zarini S. et al., J Lipid Res, 63(10): 100270, 2022).”
  
  In addition, it would be critical to assess whether the increased mitochondrial ceramide and consequent lowering of energy levels affect all exocytic pathways in L6 myoblasts or just the GLUT4 trafficking. Is the secretory pathway also disrupted under these conditions?
  
  Response: This is an interesting point raised by the reviewer that is aimed at the next phase of this work: identifying how ceramide-induced mitochondrial dysfunction drives insulin resistance. Looking at energy deficiency in more detail, as well as at general trafficking, is part of ongoing work, but given the complexity of this question, it is beyond the scope of the current study.
  
  Points from reviewer 2
  
  The mechanistic aspect of the work and conclusions put forth rely heavily on studies performed in cultured myocytes, which are highly glycolytic and generally viewed as a poor model for studying muscle metabolism and insulin action. Nonetheless, the findings provide a strong rationale for moving this line of investigation into mouse gain/loss of function models.
  
  Response: We acknowledge that in vitro models may not fully mimic in vivo complexity as described above in the response to the “Assessment note”. We have now added to the Discussion:
  
  “In this study, we mainly utilised L6-myotubes, which share many important characteristics with primary muscle fibres. Both types of cells exhibit high sensitivity to insulin and respond similarly to maximal doses of insulin, with GLUT4 translocation stimulated between 2 and 4 times over basal levels in response to 100 nM insulin (as shown in Fig. 1-4 and (46,47)). Additionally, mitochondrial respiration in L6-myotubes has a similar sensitivity to mitochondrial poisons, as observed in primary muscle fibres (as shown in Fig. 5 (48)). Finally, inhibiting ceramide production increases CoQ levels in both L6-myotubes and adult muscle tissue (as shown in Fig. 2-3). Therefore, L6-myotubes possess the necessary metabolic features to investigate the role of mitochondria in insulin resistance, and this relationship is likely applicable to primary muscle fibres”.
  
  One caveat of the approach taken is that exposure of cells to palmitate alone is not reflective of in vivo physiology. It would be interesting to know if similar effects on CoQ are observed when cells are exposed to a more physiological mixture of fatty acids that includes a high ratio of palmitate, but better mimics in vivo nutrition.
  
  Response: We appreciate the reviewer's comment. Previously, we reported that mitochondrial CoQ depletion occurs in skeletal muscle after 14 and 42 days of HFHSD feeding, coinciding with the onset of insulin resistance (PMID: 29402381, see figure below).
  
  Author response image 2.
  
  These data demonstrated that our in vitro model recapitulates the loss of CoQ in insulin resistance observed in muscle tissue in response to a more physiological mixture of fatty acids. Further, it has been reported that different fatty acids can induce insulin resistance via different mechanisms (PMID: 20609972), which would complicate interpretation of the data. Saturated fatty acids such as palmitate increase ceramides in cell lines and humans, but unsaturated FAs generally do not (PMID: 10446195, 14592453, 34704121). As such, we conclude that palmitate is a cleaner model for studying the effects of ceramide on skeletal muscle function.
  
  We have added to the Discussion:
  
  “…These findings align with our earlier observations demonstrating that mice exposed to HFHSD exhibit mitochondrial CoQ depletion in skeletal muscle (Fazakerley et al. 2018).”
  
  While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. This could be more representative of toxicity rather than pathophysiology. It would be helpful to know if these same effects are observed with other manipulations that lower CoQ to a similar degree. If not, the discrepancies should be discussed.
  
  Response: As the reviewer suggests, many of these lipids can cause cell death (toxicity) if the dose is too high. We have previously found that low levels (0.15 mM) of palmitate were sufficient to trigger insulin resistance without any signs of toxicity (Hoehn K, PNAS; PMID: 19805130). Using a similar approach, we show that mitochondrial membrane potential is maintained in SMPD5-overexpressing cells (Sup. Fig. 2J and Author response image 2). Given that toxicity is associated with a loss of mitochondrial membrane potential (e.g., 50 µM Saclac; right-hand panel), these data suggest SMPD5 overexpression is not causing overt toxicity.
  
  Author response image 3.
  
  Furthermore, we conducted an overrepresentation analysis of molecular processes within our proteomic data from SMPD5-overexpressing cells. As depicted below, no signs of cell toxicity were observed in our model at the protein level. This data is now available in supplementary table 1.
  
  Author response table 1.
  
  Our results are therefore consistent with a pathological condition induced by elevated levels of ceramides independently of cellular toxicity. The following text has been added to the discussion: “...downregulation of the respirasome induced by ceramides may lead to CoQ depletion.
  
  Despite the significant impact of ceramide on mitochondrial respiration, we did not observe any indications of cell damage in any of the treatments, suggesting that our models are not explained by toxicity and increased cell death (Sup. Fig. 2H & J).”
  
  The conclusions could be strengthened by more extensive studies in mice to assess the interplay between mitochondrial ceramides, CoQ depletion and ETC/mitochondrial dysfunction in the context of a standard diet versus HF diet-induced insulin resistance. Does P053 affect mitochondrial ceramide, ETC protein abundance, mitochondrial function, and muscle insulin sensitivity in the predicted directions?
  
  Response: We agree with the referee about the importance of performing in vivo studies to corroborate our in vitro data. We have now conducted extensive new studies in mouse skeletal muscle using targeted metabolomic and lipidomic analyses to investigate the impact of ceramide depletion on CoQ levels in HF-fed mice. Mice were fed a HF diet with or without the administration of P053 (selective inhibitor of CerS1) for 5 weeks. As illustrated in the figures below, the administration of P053 led to a reduction in ceramide levels (left panel), an increase in CoQ levels (middle panel) and a negative correlation between these molecules (right panel), which is consistent with our in vitro findings.
  
  Author response image 4.
  
  Additional suggestions:
  
  Figure 1: How does increased mitochondrial ceramide affect fatty acid oxidation (FAO) in L6-myocytes? As the accumulation of mitochondrial ceramide inhibits respirasome and mitochondrial activity in vitro, could reduced FAO in vivo, due to high mitochondrial ceramide, account for ectopic lipid deposition in skeletal muscle of obese subjects?
  
  Response: We appreciate the reviewer for bringing up this intriguing point. We would like to emphasise that Complex II activity is vital for fatty acid oxidation. As shown in Fig. 5H, our results indicate that specifically Complex II mediated respiration was diminished in cells with SMPD5 overexpression, suggesting that ceramides hinder the mitochondria's capability to oxidise lipids. We agree that this mechanism may potentially play a role in the ectopic lipid accumulation seen in individuals with obesity.
  
  We have added the following text to discussion:
  
  “...the mitochondria to switch between different energy substrates depending on fuel availability, named “metabolic Inflexibility”...this mechanism may potentially play a role in the ectopic lipid accumulation seen in individuals with obesity, a condition linked with cardio-metabolic disease.”
  
  Figure 2: Although the authors show that mtSMPD5 overexpression does not affect ceramide abundance in whole cell lysate, it would be critical to examine the abundance of this lipid in other cellular membranes and organelles, particularly plasma membrane. What is the effect of mtSMPD5 overexpression on plasma membrane lipids composition? Does that affect GLUT4-containing vesicles fusion into the plasma membrane, possibly due to depletion of v-SNARE or tSNARE?
  
  Response: While we acknowledge the importance of this point we strongly feel that measuring lipids in purified membranes has its limitations because it is impossible to purify specific membranes without contamination from other kinds of membranes. For example, we have done proteomics on purified plasma membranes from different cell types and we always observe considerable mitochondrial contamination with these membranes (e.g. PMID 21928809). This was the main factor that led us to use the mitochondrial targeting approach.
  
  Nevertheless we do acknowledge that there is a possibility that ceramides that are produced in the mitochondria in SMPD5 cells could leak out of mitochondria into other membranes and this could influence other aspects of GLUT4 trafficking and insulin action. However, we believe that the studies using mito targeted ASAH mitigate against this problem. Thus, we have now included a statement in the revised manuscript as follows: “It is also possible that ceramides generated within mitochondria in SMPD5 cells leak out from the mitochondria into other membranes (e.g. PM and Glut4 vesicles) affecting other aspects of Glut4 trafficking and insulin action. However, the observation that ASAH1 overexpression reversed IR without affecting whole cell ceramides argues against this possibility.”.
  
  Figure 4: One critical piece of information missing is the effect (if any) of mitochondrial ceramide accumulation on the mRNAs encoding the ETC components affected by this lipid. Although the ETC protein's lower stability may account for the effect of increased ceramide, transcriptional inhibition can't be ruled out without checking the mRNA expression levels for these ETC components.
  
  Response: To address this point, we have quantified the mRNA abundance of nine complex I subunits that exhibit downregulation in our proteomic dataset subsequent to mtSMPD5 overexpression (as depicted in Figure 4G).
  
  Induction of mtSMPD5 expression with doxycycline (below, left-hand panel) had no effect on the mRNA levels of the Complex I subunits (below, right-hand panel). This is consistent with our initial hypothesis that the reduction in electron transport chain (ETC) components, caused by heightened ceramide levels, primarily arises from alterations in protein stability rather than gene expression. While we acknowledge the possibility that certain subunits might be regulated at the transcriptional level, the absence of mRNA downregulation across our data strongly suggests that, at the very least, a portion of the observed protein depletion is attributed to diminished protein stability. We have incorporated this dataset into Supplementary Figure 6J and added the following text to the results:
  
  Author response image 5.
  
  “Importantly, CI downregulation was not associated with reduction in gene expression as shown in Sup. Fig. 6J.”
  
  Additionally, we have added the following text to discussion:
  
  “In addition, the absence of mRNA downregulation in mtSMPD5 overexpressing cells strongly suggests that at least a portion of the observed protein depletion within CI is attributed to diminished protein stability.”
  
  Figure 3: The authors state that neither palmitate nor mtASAH1 overexpression affected insulin-dependent Akt phosphorylation. However, the results in Figure 3F-G do not support this conclusion, as the overexpression of mtASAH1 does enhance the insulin-stimulated AKT (thr-308) phosphorylation. They need to clarify this issue.
  
  Response: We have now analysed these data in a manner that preserves the control variance, consistent with the other figures in the manuscript and there is no significant change in Akt phosphorylation in ASAH over-expressing cells.
  
  Author response image 6.
  
  Figure S2: A functional assessment of mitochondrial function in HeLa cells would be helpful to validate the small effect of Saclac treatment on CI NDUFB8.
  
  Response: Mitochondrial respiration was conducted in cells treated with Saclac (2 µM and 10 µM) for 24 hours. As shown below, in Hela cells, we did not detect any mitochondrial respiratory impairments at low doses, but only at high doses of Saclac. This suggests that the minor effect of Saclac on CI NDUFB8 is insufficient to alter mitochondrial function.
  
  Author response image 7.
  
  Reviewer #2 (Recommendations For The Authors):
  
  Additional questions and comments for consideration:
  
  The working model links ceramide-induced CoQ depletion to a reduction in ETC proteins and accompanying deficits in OxPhos capacity. The idea that mitochondrial dysfunction necessarily precedes and causes insulin resistance has been heavily debated for years because many animal and human studies have found no overt changes in ETC proteins and/or mitochondrial respiratory capacity during the early phases of insulin resistance. How do the investigators reconcile their work in the context of this controversy?
  
  Response: We acknowledge this controversy in our revised manuscript more clearly now as follows on page 21: “We present evidence that mitochondrial dysfunction precedes insulin resistance. However, previous studies have failed to observe changes in mitochondrial morphology, respiration or ETC components during early stages of insulin resistance (72). However, in many cases such studies fail to document changes in insulin-dependent glucose metabolism in the same tissue as was used for assessment of mitochondrial function. This is crucial because we and others do not observe impaired insulin action in all muscles from high fat fed mice for example. In addition, surrogate measures such as insulin-stimulated Akt phosphorylation may not accurately reflect tissue specific insulin action as demonstrated in figure 1C. Thus, further work is required to clarify some of these inconsistencies”.
  
  While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. Is this representative of pathophysiology or toxicity?
  
  Response: We believe we have addressed this in point 3 above (Principal comments, reviewer 1, point 3)
  
  How did this affect other mitochondrial lipids (e.g. cardiolipin)?
  
  Response: As shown in Supplementary Figure 3, SMPD5 overexpression did not affect other lipid species such as cardiolipin (D-J). We have added to results:
  
  “Importantly, mtSMPD5 overexpression did not affect ceramide abundance in the whole cell lysate nor other lipid species inside mitochondria such as cardiolipin, cholesterol and DAGs (Sup. Fig. 3 A, D-J)”
  
  Are these severe effects rescued by CoQ supplementation?
  
  Response: We have performed additional experiments to address this point. As shown below, mitochondrial ceramide accumulation induced by palmitate was not reversed by CoQ supplementation, as demonstrated in Figure 1F. We have added to results:
  
  “Addition of CoQ9 had no effect on control cells but overcame insulin resistance in palmitate treated cells (Fig. 1A). Notably, the protective effect of CoQ9 appears to be downstream of ceramide accumulation, as it had no impact on palmitate-induced ceramide accumulation (Fig. 1E-F). Strikingly, both myriocin and CoQ9…”
  
  Additionally, we assessed mitochondrial respiration by Seahorse extracellular flux analysis in cells with SMPD5 overexpression treated with or without CoQ supplementation. Our results, depicted below, indicate that CoQ supplementation reversed the ceramide-induced decrease in basal and ATP-linked mitochondrial respiration. We have modified Fig. 5.
  
  Author response image 8.
  
  We have added to results:
  
  “Respiration was assessed in intact mtSMPD5-L6 myotubes treated with CoQ9 by Seahorse extracellular flux analysis. mtSMPD5 overexpression decreased basal and ATP-linked mitochondrial respiration (Fig. 5 A, B &C), as well as maximal, proton-leak and non-mitochondrial respiration (Fig. 5 A, D, E & F) suggesting that mitochondrial ceramides induce a generalised attenuation in mitochondrial function. Interestingly, CoQ9 supplementation partially recovered basal and ATP-linked mitochondrial respiration, suggesting that part of the mitochondrial defects are induced by CoQ9 depletion. The attenuation in mitochondrial respiration is consistent with a depletion of the ETC subunits observed in our proteomic dataset (Fig. 4)...”
  
  Are these same effects observed with other manipulations that lower CoQ to a similar degree?
  
  Response: As mentioned in point 5 (additional suggestions from Reviewer 1), we conducted mitochondrial respiration measurements on HeLa cells treated with Saclac (2 µM and 10 µM) for 24 hours. Our findings showed no signs of mitochondrial respiratory impairments at low doses of Saclac in HeLa cells, despite observing CoQ depletion at this dose (Fig. Sup. 2C). We believe that this variation could be due to the varying sensitivity of mitochondrial respiration/ETC abundance to ceramide-induced CoQ depletion in different cell lines. Alternatively, it is possible that reduced mitochondrial respiration is a secondary event to other mitochondrial/cellular defects such as mitochondrial fragmentation or deficient nutrient transport inside mitochondria.
  
  Author response image 9.
  
  The mitochondrial concentrations of CoQ required to maintain insulin sensitivity in L6 myocytes seem to vary from experiment to experiment. Is it the absolute concentration that matters and/or the change relative to a baseline condition?
  
  Response: This is an excellent observation. The findings indicate that the absolute concentration of CoQ is the determining factor for insulin sensitivity, rather than the relative depletion of CoQ compared to basal conditions. We have added to discussion: “Finally, mtASAH1 overexpression increased CoQ levels. In both control and mtASAH1 cells, palmitate induced a depletion of CoQ, however the levels in palmitate treated mtASAH1 cells remained similar to control untreated cells (Fig. 3I). This suggests that the absolute concentration of CoQ is crucial for insulin sensitivity, rather than the relative depletion compared to basal conditions, thus supporting the causal role of mitochondrial ceramide accumulation in reducing CoQ levels in insulin resistance”
  
  Considering that CoQ has been shown to have antioxidant properties, does the rescue observed after a 16 h treatment require the prolonged exposure, or alternatively, are similar effects observed during short-term exposures (~1-2 h), which might imply a different or additional mechanism.
  
  Response: This is an excellent point that we have long considered. The problem is how to address the question in a way that will be definitive and we are concerned that the experiment suggested by the referee will not generate definitive data. A major issue is that CoQ has low solubility and needs to reach the right compartment. As such if short term treatment (as suggested) does not rescue, it would be difficult to make any definite conclusions as this might just be because insufficient CoQ is delivered to mitochondria. Conversely, if short term treatment does rescue this could be either because CoQ does get into mitochondria and regulate ETC or because of its general antioxidant function. So, even if we observe a rescue after 1 hour of incubation with CoQ, it will not clarify whether this is due to the antioxidant effect or simply because 1 hour is adequate to boost mitoCoQ levels. Thus, in our view this experiment might not get us any closer to the answer. Nevertheless, we do feel this is an important point and we have added the following statement to our revised manuscript to acknowledge this: “Because CoQ can accumulate in various intracellular compartments, it's important to consider that its impact on insulin resistance might be due to its overall antioxidant properties rather than being limited to a mitochondrial effect”
  
  In Figure 1, CoQ depletion due to 4NB treatment resulted in increased ceramide levels. Could this be due to impaired palmitate oxidation leading to rerouting of intracellular palmitate to the ceramide pathway? This could be tested using stable isotope tracers.
  
  Response: We have added the statement below to the manuscript to address this point. We feel that while an interesting experiment to perform it is somewhat outside of the major focus of this study.
  
  “One possibility is that CoQ directly controls ceramide turnover (35). An alternate possibility is that CoQ inside mitochondria is necessary for fatty acid oxidation (12) and CoQ depletion triggers lipid overload in the cytoplasm promoting ceramide production (36). Future studies are required to determine how CoQ depletion promotes Cer accumulation. Regardless, these data indicate that ceramide and CoQ have a central role in regulating cellular insulin sensitivity.”
  
  To a similar point, it would be helpful to know if the C2 ceramide analog is sufficient to cause elevated mito-ceramide and/or CoQ depletion. If not, the results might imply mitochondrial uptake of palmitate is required.
  
  Response: We feel this point is analogous to Point 7 above in that this experiment is not definitive enough to make any clear conclusions as it may or may not work for many different reasons. For example, C2 ceramide may not work simply because it has the wrong chain length.
  
  Moreover, it is clear that C2 ceramide has effects that clearly differ from those observed with palmitate most notably the inhibitory effect on Akt signalling. For these reasons we do not agree with the logic of this experiment.
  
  We have mentioned in the results section:
  
  “Based on these data we surmise that C2-ceramide does not faithfully recapitulate physiological insulin resistance, in contrast to that seen with incubation with palmitate”.
  
  Likewise, does inhibition of CPT1 ameliorate or exacerbate palmitate-induced insulin resistance?
  
  Response: This experiment has been performed by a number of different labs. For instance, muscle specific CPT1 overexpression is protective against high fat diet induced insulin resistance in mice (Bruce C, PMID19073774), CPT1 overexpression protects L6E9 muscle cells from fatty acid-induced insulin resistance (Sebastian D, PMID17062841) and increased beta-oxidation in muscle cells enhances insulin stimulated glucose metabolism and is protective against lipid induced insulin resistance (Perdomo G, PMID15105415). We have now cited all of these studies in our revised manuscript in the discussion: “In fact, increased fatty acid oxidation is protective against insulin resistance in several model organisms (37–39)”
  
  Does the addition of palmitate to the cells treated with mtSMPD5 further reduce CoQ9 (Figure 2I and 2J)?
  
  Response: This intriguing observation, as highlighted by the referee, has prompted us to conduct additional experiments to investigate the effects of palmitate and SMPD5 overexpression on Coenzyme Q (CoQ) levels in L6 myotubes. As demonstrated in the figures presented below, both palmitate and SMPD5 overexpression independently resulted in the depletion of CoQ9, with no observed additive effects, suggesting that they share a common pathway driving CoQ9 deficiency. One plausible hypothesis is that ceramides may trigger the depletion of a specific CoQ9 pool localised within the inner mitochondrial membrane, likely the pool associated with Complex I (CI) in the Electron Transport Chain (ETC). This hypothesis is supported by previous studies indicating that approximately 25-35% of CoQ binds to CI (PMID: 33722627) and our data demonstrating that ceramide induces a selective depletion of CI in L6 myotubes (Fig. 4).
  
  We have added this result to Fig. 2I in the main section.
  
  Author response image 10.
  
  We have added to the result section:
  
  “Mitochondrial CoQ levels were depleted in both palmitate-treated and mtSMPD5-overexpressing cells without any additive effects. This suggests that these strategies to increase ceramides share a common mechanism for inducing CoQ depletion in L6 myotubes (Fig. 2I).”
  
  We have added to the discussion section:
  
  “...These are known to form supercomplexes or respirasomes where ~25 - 35 % of CoQ is localised in mammals (58,16).…The observation that both palmitate and SMPD5 overexpression trigger CoQ depletion without additive effects support the notion that ceramides may trigger the depletion of a specific CoQ9 pool localised within the inner mitochondrial membrane.”
  
  Some of the cell-based experiments appear to be underpowered and therefore confidence in the interpretations might benefit from additional repeats. For example, in Figure 3i, it appears that palmitate still causes a substantial reduction of CoQ in the cells treated with mtASAH1, even though mito-ceramide levels are restored to baseline. Please specify if these and other results are representative of multiple cell culture experiments or a single experiment.
  
  Response: All data were derived from a minimum of 3-4 independent experiments from at least two separate cultures of L6 cells. Separate batches of drug treatments were prepared for each experiment. We have previously compared metabolic parameters between batches of cells differentiated at different times (i.e. at least weeks apart) in a previous study (Krycer PMID 31744882) and found variations of <20% for insulin-stimulated glucose oxidation. With an expected variance of 20% and a type I error rate of 0.05, this is sufficient to detect a 40% difference with a power of 0.8. As the reviewer has indicated this is likely underpowered in situations where variance is unexpectedly high or if a small difference needs to be detected.
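The sample-size reasoning above can be checked with a back-of-the-envelope calculation. The sketch below is an illustration, not the authors' own analysis. It assumes that "variance of 20%" means a standard deviation equal to 20% of the mean, that two groups are compared with a two-sided test, and that a normal approximation is acceptable. The function name `replicates_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def replicates_per_group(sd_pct, diff_pct, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d = diff_pct / sd_pct is the standardized effect size.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    d = diff_pct / sd_pct               # assumed: SD and difference share the same units
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Values quoted in the response: 20% variation, 40% difference, alpha 0.05, power 0.8
print(replicates_per_group(sd_pct=20, diff_pct=40))
```

Under these assumptions the estimate comes out at about four replicates per group, in line with the 3-4 independent experiments quoted. A t-distribution-based calculation would ask for somewhat more, which fits the caveat that the design is underpowered when variance is high or the difference is small.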
  
  In terms of Fig3, the reviewer raises an interesting point. As discussed in point 6, the fact that palmitate still appears to cause a depletion of CoQ in mtASAH1 cells likely indicates that the absolute concentration of CoQ is the determining factor for insulin sensitivity, rather than the relative depletion of CoQ compared to basal conditions. We have added to the discussion:
  
  “Finally, mtASAH1 overexpression increased CoQ levels. In both control and mtASAH1 cells, palmitate induced a depletion of CoQ, but this effect was less pronounced in the mtASAH1 cell line (Fig. 3I). Our results suggest that the absolute concentration of CoQ is crucial for insulin sensitivity, rather than the relative depletion compared to basal conditions, thus supporting the causal role of mitochondrial ceramide accumulation in reducing CoQ levels in insulin resistance”
  
  The color scheme of 2E is inconsistent with other panels in the figure.
  
  Response: Corrected
  
  It would be helpful if the axis labels for CoQ graphs were labeled as "Mito-CoQ" for clarity.
  
  Response: Corrected
  
URL

biorxiv.org/content/10.1101/2023.03.10.532020v1
www.biorxiv.org www.biorxiv.org

Kinases in motion: impact of protein and small molecule interactions on kinase conformations

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We would like to thank you and the two Reviewers for the thoughtful evaluation of the manuscript and the support for publication. We have addressed all points raised by the two Reviewers.
  
  - We have extensively streamlined the manuscript. Repetitive passages regarding the respective kinase cascades have been removed.
  
  - We improved the presentation of the main Figures (mainly labeling and font size):
  
  - Figure 1: C, D, E, F
  
  - Figure 2: C, E, F, G, I
  
  - Figure 3: D
  
  - Figure 4: F
  
  - Figure 5: A, B, C, D, E
  
  - We integrated new SI-data related to kinase functions, expression and the ‘cell-type comparisons’ of the KinCon reporter system (Figure Supplement 4, 5).
  
  Below you will find a detailed point-by-point response.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Regarding the issue of the use of the word "dynamics," as described in the public review, here are a few examples of ambiguous use in different sentences:
  
  - Line 27: dynamics of full-length protein kinases. Is this referring to the dynamics of conformational interconversion between inactive and active states?
  
  - Line 138: dynamic functioning of kinases. It is not clear what this means.
  
  - Line 276: ... alters KinCon dynamics. Not clear if they are measuring time-dependent process or a single point.
  
  - Figure legend 4F: dynamics of CDK4/6 reporters. Again, not clear how the assay is measuring dynamics.
  
  In my opinion, the authors use proper terminology that describes their assay in which the term dynamics is not used: Title: "... impact of protein and small molecule interactions on kinase conformations" and Line 89 "... reporter can be used to track conformational changes of kinases...".
  
  We have replaced the “dynamics” sections.
  
  - Line 27: The understanding of the structural dynamics of…
  
  - Line 91: This reporter can be used to track dynamic changes of kinases conformations…
  
  - Line 139: Conventional methods often fall short in capturing the dynamics of kinases within their native cellular environments…
  
  - Line 146: Such insights into the molecular structure dynamics of kinases in intact cells…
  
  - Line 199: In order to enhance our understanding of kinase structure dynamics…
  
  - Line 276: These findings underline that indeed the trimeric complex formation alters….
  
  - Figure Legend 4F: Quantification of alterations of CDK4/6 KinCon reporter bioluminescence signals…
  
  The authors state that KinCon has predictive capabilities (abstract and line 142). What do the authors mean by this?
  
  Previously we have benchmarked the suitability of the KinCon reporter for target engagement assays of wt and mutated kinase activities. With this we determined specificities of melanoma drugs for mutated BRAF variants (Mayrhofer 2020, PNAS).
  
  The authors indicate that KinCon is a highly sensitive assay. Can the authors elaborate on what high sensitivity means?
  
  With sensitivity we mean that we can detect conformation dynamics of the reporter at low expression levels of the hybrid protein expressed in the cell line of choice.
  
  - Line 209: Immunoblotting of cell lysates following luminescence measurements showed expression levels of the reporters in the range and below the endogenous expressed kinases (Figure 1E). …
  
  - Line 219: Using this readout, we showed that at expression levels of the BRAF KinCon reporter below the immunoblotting detection limit, one hour of drug exposure exclusively converted BRAF-V600E to the more closed conformation (Figure 1F, G, Figure Supplement 1B).
  
  - Line 221: These data underline that at expression levels far below the endogenous kinase, protein activity conformations can be tracked in intact cells. …
  
  For example, can they discuss how other fluorescence-based approaches that are less sensitive would not be able to accomplish the same type of results or derive similar conclusions? Can they provide a resolution metric both in space and time? Given that the authors state that this is a technical report, this information is of relevance.
  
  We highlight the key pros & cons of the KinCon reporter technology in following sections:
  
  - Line 529: The KinCon technology, introduced here, seeks to address the previously mentioned challenges. It has the potential to become a valuable asset for tracking kinase functions in living cells which are hard to measure solely via phosphotransferase activities. Overall, it offers an innovative solution for understanding kinase activity conformations, which could pave the way for more novel intervention strategies for kinase entities with limited pharmaceutical targeting potential. So far, this relates to the tracking of kinase-scaffold and pseudo-kinase functions.
  
  - Line 535: A key advantage of the KinCon reporter technology is the robustness of the system to track kinase conformations at varying expression levels. However, in contrast to fluorescence-based reporter read-outs, subcellular analysis and cell sorting are still challenging due to comparably low levels of light emission.
  
  The authors nicely describe how KinCon works in Figure 1B and part of 1C. I do think that the bottom of panel 1C needs to be revised, as well as the text describing the potential scenarios of potency, efficacy, and synergism.
  
  One issue with this part of Figure 1C is that it is not clear what the x-axis in the 3 plots refers to. Is this time? Is this concentration of a small molecule, inhibitor, or binding partner? This was confusing also in the context of the term dynamics used throughout the text. The terms potency, efficacy, and synergism should be subtitles, or the panels and the x-axis should be better defined, especially for a non-specialized reader.
  
  Related to this part of Figure 1C is the text. The authors mention potency, effectiveness, and synergy (Line 195). Can the authors use more fundamental terminology related to these three scenarios, for example, changes in activation constant, and percent of protein activates? Also, why synergy is only related to effectiveness? Can synergy also be associated with potency?
  
  Thank you for bringing this up, we have revised Figure 1C to better reflect the mentioned effects of potency. To avoid confusion, we removed the illustration for drug synergism. Accordingly, we have integrated the axis descriptions for the presented dose-response curves.
  
  Thus, we have further streamlined the text in the introduction – examples are shown below:
  
  - Line 195: Light recordings and subsequent calculations of time-dependent dosage variations of bioluminescence signatures of parallel implemented KinCon configurations aid in establishing dose-response curves. These curves are used for discerning pharmacological characteristics such as drug potency, effectiveness of drug candidates, and potential drug synergies (Figure 1C)
  
  - Figure 1C: Shown is the workflow for the KinCon reporter construct engineering and analyses using KinCon technology. The kinase gene of interest is inserted into the multiple cloning site of a mammalian expression vector which is flanked by respective PCA fragments (-F[1], -F[2]) and separated with interjacent flexible linkers. Expression of the genetically encoded reporter in indicated multi-well formats allows to vary expression levels and define a coherent drug treatment plan. Moreover, it is possible to alter the kinase sequence (mutations) or to co-express or knock-down the respective endogenous kinase, interlinked kinases or proteinogenic regulators of the respective pathway. After systematic administration of pathway modulating drugs or drug candidates, analyses of KinCon structure dynamics may reveal alterations in potency, efficacy, and potential synergistic effects of the tested bioactive small molecules (schematic dose response curves are depicted)
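The distinction between potency and efficacy in a schematic dose-response curve can be made concrete with a small sketch. It assumes the standard four-parameter Hill (logistic) model; the response letter does not state which model the authors fit, and the function name, parameter names and dose values are illustrative placeholders, not data from the study.

```python
def hill_response(dose, bottom, top, ec50, hill=1.0):
    """Four-parameter Hill curve: response moves from bottom towards top as dose rises.

    ec50 sets potency (the dose giving a half-maximal effect);
    top - bottom sets efficacy (the size of the maximal effect).
    dose must be greater than zero.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)


doses = [0.01, 0.1, 1.0, 10.0, 100.0]  # arbitrary units, illustrative only

reference = [hill_response(d, 0.0, 1.0, ec50=1.0) for d in doses]
more_potent = [hill_response(d, 0.0, 1.0, ec50=0.1) for d in doses]       # curve shifts left
less_efficacious = [hill_response(d, 0.0, 0.5, ec50=1.0) for d in doses]  # plateau drops

for row in zip(doses, reference, more_potent, less_efficacious):
    print("dose=%g  reference=%.3f  more_potent=%.3f  less_efficacious=%.3f" % row)
```

A lower `ec50` shifts the curve left (higher potency) without changing its plateau, whereas a lower `top` reduces the plateau (lower efficacy) without moving the midpoint.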
  
  Lastly, the use of these three cartoons gives the impression that the experimental results to come will follow a similar representation. Instead, the results are presented in bar plots for many different conditions. I think this will lead to confusion for a broad audience.
  
  The bottom panel of Figure 1C is not the depiction of real experiments but rather an illustration of fitted dose-response curves. We would like to present previous demonstrations of dose-response curves using BRAF KinCon data and ERK phosphorylation (Röck 2019, Sci. Advances).
  
  We further agree with the reviewer and have therefore added a new part in the methods section addressing the evaluation of data extensively.
  
  - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown. In these cases, absolute bioluminescence values without any normalization are shown. Otherwise, data was indicated as RLU (relative light unit) fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without, as indicated).
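The fold-change normalization described above can be sketched as follows. This is a minimal illustration that assumes each reading is divided by the mean of the control replicates; the letter does not specify whether the mean or another summary is used, and the optional western blot normalization is not covered. The function name and the numbers are hypothetical.

```python
from statistics import mean


def rlu_fold_change(samples, controls):
    """Express raw luminescence readings as fold change over the control mean."""
    control_mean = mean(controls)  # assumed reference: mean of control replicates
    return [s / control_mean for s in samples]


# Placeholder readings in relative light units, not measured data
control_rlu = [100.0, 100.0]
treated_rlu = [200.0, 400.0]
print(rlu_fold_change(treated_rlu, control_rlu))
```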
  
  For a non-expert reader, can the authors clarify the use of tracking basal conformations vs. transient over-expression of the various KinCon constructs? Moreover, the authors use the term transient over-expression for 10, 16, 24, and 48 h (Line 203). This, to a non-expert reader, does not seem transient.
  
  We have revised the manuscript to clarify it:
  
  - Line 207: We showed that transient over-expression of these KinCon reporters for a time frame of 10h, 16h, 24h or 48h in HEK293T cells delivers consistently increasing signals for all KinCon reporters (Figure 1E, Figure Supplement 1A).
  
  - Figure 1E) Representative KinCon experiments of time-dependent expressions of indicated KinCon reporter constructs in HEK293T cells are shown (mean ±SEM). Indicated KinCon reporters were transiently over-expressed in 24-well format in HEK293T cells for 10h, 16h, 24h and 48h each.
  
  Regarding Figure 1E and similar graphical representations: Why is the signal (RLU) nonlinear with time? If the fluorescence of the KinCon construct is linearly related to its expression or concentration inside the cell, one would expect a linear increase. Have the authors plotted RLU/Expression band intensity to account for changes in protein concentration? For instance, some of the results within Figure 3 are normalized to concentration on reporter expression level.
  
  Our intention was to show that varying expression levels can be used for the illustrated target engagement assays. Indeed, the observed elevations of RLU might be due to factors such as:
  
  - Doubling times of cells
  
  - Cell density
  
  - Media composition (which changes over time)
  
  - Reporter protein stabilities
  
  - Abundance of interactors of kinases
  
  For the results with LKB1, the authors claim that intermediate fold change in fluorescence (Figure 2E) is due to a partially closed intermediate state (Line 262). Can the authors discard the possibility by which there is a change in populations of active and inactive that on average give intermediate values?
  
  Based on our experience with KinCon reporter conformation states of kinases we tested so far, we assume that the presented data reflects an intermediate state. We agree that it needs further validation. We have changed the text accordingly:
  
  - Line 264: Upon interaction with LKB1 this conformation shifts to a partially closed intermediate state.
  
  The authors claim in Line 274 that mutations located at the interface of the LKB1/STRADalpha complex affect interactions and hypothesize that allosteric communication between LKB1 and STRADalpha is essential for function. Given that these mutations are at the interaction interface, why would the authors postulate an allosteric mechanism that evokes an effect distant from the interaction/active site? Could it be that function requires surface contacts alone that are disrupted by the mutations?
  
  We agree with the reviewer and changed our argumentation for this point:
  
  - Line 276: These findings underline that indeed the trimeric complex formation alters the opening and closing of the tested full-length kinase structures using the applied KinCon reporter read out.
  
  I was unable to find text to explain the following: Figure 2I shows the mutation R74A as n.s., but in the text, only W308C is mentioned to not change fluorescence. Could the authors clarify why R74A is not discussed in the text? Maybe this reviewer missed the text in which it was discussed.
  
  We adapted the manuscript and included the R74A mutation as follows:
  
  - Line 296: Among these mutations, only the W308C and R74A mutation prevented significant closing of the LKB1 conformation when co-expressed with STRAD𝛼 and MO25 (Figure 2I).
  
  In Figure 2I, the individual measurements of the LKB1-R74A KinCon are highlighted in red to better emphasize the deviations. In the case of the R74A mutation, the effect seen might be due to the high deviation between the experiments (highlighted in red). These deviations are much higher than those of either the wt or the W308C mutant, and can also be seen in the LKB1-R74A-KinCon only condition (white). Even though no significant closing of the LKB1 conformation could be observed in the case of R74A, the trend toward conformation closing upon complex formation is still visible, so we believe the effect is still present. Further replicates would be necessary to validate this hypothesis.
  
  Similarly, the authors state in line 326 that the study included an analysis of RIPK2. However, I was unable to find results, graphs, or additional text discussing RIPK2.
  
  The RIPK2 conformation was analyzed in Figure 3C (page 12).
  
  Some figures of RLU use absolute values, percentages, and fold change. Is there a reason why the authors use different Y-axis values? These should be explained and justified in Methods. Similarly, bars for wt in Figures 3D, G, or 4D, E, F show no errors. How are the authors normalizing the data and repeats so that there is no error, and are they treating the rest of the data (i.e., mutants and/or treated with small molecules) in the same way?
  
  We have changed the Y-axis values. Throughout the manuscript we now show RLU fold change. The exceptions are selected experiments in which only absolute RLU values are shown (such as Figure 1E, F). We have also decided to integrate a paragraph into the methods section (Line 655). Figure 3D was changed as well.
  
  - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown. In these cases absolute bioluminescence values without any normalisation are shown. Otherwise, data was indicated as RLU fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).
  
  The data is generally normalized on wt or untreated conditions, when the cells were treated with small molecules for target engagement assays.
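
The fold-change normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function name `fold_change` and all RLU readings are hypothetical, and it assumes each reading is simply divided by the mean of the control (wt or untreated) condition.

```python
from statistics import mean


def fold_change(values, control_values):
    """Express raw RLU readings as fold change over the mean of the control condition."""
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    return [v / control_mean for v in values]


# Hypothetical raw bioluminescence readings (arbitrary units), not data from the study.
untreated = [1000.0, 1200.0, 800.0]
treated = [500.0, 600.0, 400.0]

untreated_fc = fold_change(untreated, untreated)  # control averages to 1 by construction
treated_fc = fold_change(treated, untreated)      # treated values relative to the control
```

With this form of normalization the control condition averages to exactly 1, while replicate-to-replicate spread is preserved in both conditions.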
  
  Lastly, the section starting in Line 472 reads more like a discussion of results from different types of inhibitors used in this study than results on its own. The authors should consider a new subtitle such as results or make this section a discussion.
  
  We agree with the reviewer, and this part of the results was split into a new section of the Results:
  
  - Line 455: “Effect of different kinase inhibitor types on the KinCon reporter system”.
  
  Reviewer #2 (Recommendations For The Authors):
  
  I have a few suggestions, since the paper is a distillation of a vast amount of work and tells a useful story.
  
  (1) The work is very solid, uses examples from the literature, and also extends into new experimental space. An obvious weakness is mentioned by the authors for the CDK data, in that measurements with Cyclin D (the activating subunit) are not characterized, although Cyclin D might be assumed to be present.
  
  We performed experiments with the CDK4/6 KinCon reporters and co-expressed CyclinD with a ratio of 1:3 (HEK293T cells, expression for 48h). However, in the context of inhibitor treatments we could not track conformation changes in these initial experiments. The cells were treated with the indicated CDK4/6i [1µM] for 3h. This seems to not impact the conformation of CDK4/6 wt or mutated KinCon reporters. There is a tendency that CyclinD co-expression promotes CDK4/6 conformation opening (data not shown).
  
  Author response image 1.
  
  Bioluminescence signal of CDK4/6 KinCon reporters with co-expressed CyclinD3 (HEK293T, expression for 48h) upon exposure to indicated CDK4/6i [1µM] or DMSO for 3h (mean ±SEM, n=3 ind. experiments). No significant changes using the current setting.
  
  (2) The work with the trimeric LKB1 complex involves pseudokinase, STRADalpha, whose conformation is also examined as a function of LKB1 status; since STRAD is an activator of LKB1. A future goal should be the evaluation of the complex in the presence of STRAD inhibitory/activating small molecules.
  
  Thank you for this great idea. We are currently compiling an FWF grant application to get support for such an R&D project.
  
  Minor points
  
  • Have any of the data been repeated in a different cell background? This came to mind because HeLa cells lack LKB1, which might be a useful place to test the LKB1 data in a different context.
  
  This experiment was performed and we show it in Figure Supplement 5. Further, we followed the advice of the reviewer and performed suggested experiments. We integrated the colon cancer cell line SW480 into the experimental setup. Overall, three cell settings showed the same pattern of KinCon reporter analyses for LKB1-STRADα-MO25 complex formation utilizing the LKB1- and STRADα-KinCon reporters.
  
  • The study picks up the PKA Cushings Syndrome field, which makes sense, and data are presented for L206R. PMID 35830806 explains how different patient mutations drive different signaling outcomes through distinct complex formations, and it would be interesting to discuss how mutations in KinCon complexes, especially those with mutations, could affect sub-cellular localization. Could the authors explain if this was done for any of the proteins, whose low experimental expression is a clear advantage, but is presumably hard to maintain across experiments?
  
  The feedback of the reviewer motivated us to perform subcellular fractionation experiments. They were performed with PKAc wt and L206R KinCon reporters as well as BRAF wt and V600E reporters. We were not able to see major differences between the wt and mutated reporter constructs with respect to their nuclear:cytoplasmic localization (Figure Supplement 4). For your information, in an R&D project with the mitochondrial kinase PINK1 we see localization of the reporter, as expected, almost exclusively in the mitochondrial fraction.
  
  - Line 495: In this context of activating kinase mutations we showed that, using PKAc (wt and L206R) and BRAF (wt and V600E) reporters as examples, we could not track alterations of cytoplasmic and nuclear localization (Figure Supplement 4). Furthermore, subcellular localization of PKAc KinCon reporters did not change when the L206R mutant was introduced (Figure Supplement 4). As a control, BRAF wt and V600E KinCon reporters were used and also no changes in localization were observed.
  
  • I suggest changing PMs (Figure 2 and others) simply to mutation, I read this as plasma membrane constantly.
  
  We agree and we have changed it to “patient mutation” in Figure 2C, Figure 3E, Figure 4B.
  
URL

biorxiv.org/content/10.1101/2024.01.11.575270v3
www.biorxiv.org

New submission 26/10/2023, 09:35:56

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author Response
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1:
  
  This manuscript describes a set of four passage-reading experiments which are paired with computational modeling to evaluate how task-optimization might modulate attention during reading. Broadly, participants show faster reading and modulated eye-movement patterns of short passages when given a preview of a question they will be asked. The attention weights of a Transformer-based neural network (BERT and variants) show a statistically reliable fit to these reading patterns above-and-beyond text- and semantic-similarity baseline metrics, as well as a recurrent-network-based baseline. Reading strategies are modulated when questions are not previewed, and when participants are L1 versus L2 readers, and these patterns are also statistically tracked by the same transformer-based network.
  
  I should note that I served as a reviewer on an earlier version of this manuscript at a different venue. I had an overall positive view of the paper at that point, and the same opinion holds here as well.
  
  Strengths:
  
  Task-optimization is a key notion in current models of reading and the current effort provides a computationally rigorous account of how such task effects might be modeled
  
  Multiple experiments provide reasonable effort towards generalization across readers and different reading scenarios
  
  Use of RNN-based baseline, text-based features, and semantic features provides a useful baseline for comparing Transformer-based models like BERT
  
  Thank you for the accurate summary and positive evaluation.
  
  Weaknesses:
  
  1) Generalization across neural network models seems, to me, somewhat limited: The transformer-based models differ from baseline models in numerous ways (model size, training data, scoring algorithm); it is thus not clear what properties of these models necessarily support their fit to human reading patterns.
  
  Thank you for the insightful comment. To dissociate the effect of model architecture and the effect of training data, we have now compared the attention weights across three transformer-based models that have the same architecture but different training data/task: randomized (with all model parameters being randomized), pretrained, and fine-tuned models. Remarkably, even without training on any data, the attention weights in randomly initialized models exhibited significant similarity to human attention patterns (Figure 3A). The predictive power of randomly initialized transformer-based models outperformed that of the SAR model. Through subsequent pre-training and fine-tuning, the predictive capacity of the models was further elevated. Therefore, both model architecture and the training data/task contribute to human-like attention distribution in the transformer models. We have now reported this result:
  
  “The attention weights of randomly initialized transformer-based models could predict the human word reading time and the predictive power, which was around 0.3, was significantly higher than the chance level and the SAR (Fig. 3A, Table S1). The attention weights of pre-trained transformer-based models could also predict the human word reading time, and the predictive power was around 0.5, significantly higher than the predictive power of heuristic models, the SAR, and randomly initialized transformer-based models (Fig. 3A, Table S1). The predictive power was further boosted for local but not global questions when the models were fine-tuned to perform the goal-directed reading task (Fig. 3A, Table S1).”
  
  In addition, we reported how training influenced the sensitivity of attention weights to text features and question relevance. As shown in Figure 4AB, attention in the randomized models was sensitive to text features across all layers. After pretraining, the models exhibited increased sensitivity to text features in the shallow layers and decreased sensitivity to text features in deep layers. Subsequent fine-tuning on the reading comprehension task further attenuated the encoding of text features in deep layers but strengthened the sensitivity to task-relevant information.
  
  2) Inferential statistics are based on a series of linear regressions, but these differ markedly in model size (BERT models involve 144 attention-based regressors, while the RNN-based model uses just 1 attention-based regressor). How are improvements in model fit balanced against changes in model size?
  
  Thank you for pointing out this issue. The performance of linear regressions was evaluated based on 5-fold cross-validation, and the performance we reported was the performance on the test set. To match the number of parameters, we have now predicted human attention using the average of all heads. The predictive power of the average head was still significantly higher than the predictive power of the SAR model. We have now reported this result in our revised manuscript:
  
  “For the fine-tuned models, we also predict the human word reading time using an unweighted average of the 144 attention heads and the predictive power was 0.3, significantly higher than that achieved by the attention weights of SAR (P = 4 × 10^-5, bootstrap).”
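
The evaluation described in this response (a single average-head regressor, 5-fold cross-validation, and predictive power measured as a Pearson correlation on held-out words) might look roughly like the sketch below. Everything here is an assumption for illustration: the data are synthetic, the helper names are hypothetical, folds are assigned by interleaving word indices, and held-out predictions are pooled across folds before correlating. The manuscript's actual fold assignment and pooling are not stated in this response.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_line(x, y):
    """Ordinary least squares with a single regressor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def cv_predictive_power(x, y, k=5):
    """Correlate held-out predictions with observed values, pooled over k folds."""
    n = len(x)
    predictions = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):  # indices held out in this fold
            predictions[i] = slope * x[i] + intercept
    return pearson(predictions, y)


# Synthetic data only: 200 words, 144 heads, reading times that depend on attention.
rng = random.Random(0)
base = [rng.random() for _ in range(200)]
heads = [[b + rng.gauss(0, 0.1) for _ in range(144)] for b in base]
average_head = [mean(row) for row in heads]  # unweighted average of the 144 heads
reading_time = [200 + 100 * a + rng.gauss(0, 20) for a in average_head]

power = cv_predictive_power(average_head, reading_time)
```

Collapsing the 144 heads into one regressor gives the transformer model the same number of attention-based regressors as the single-regressor baseline, which is the point of the comparison.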
  
  Also, it was not clear to me how participant-level variance was accounted for in the modeling effort (mixed-effects regression?) These questions may well be easily remedied by more complete reporting.
  
  In the previous manuscript, the word reading time was averaged across participants, and we did not consider the variance between participants. We have now analyzed the eye movements of each participant and used a linear mixed effects model to test how different factors affected human word reading time, accounting for participant-level and item-level variance.
  
  “Furthermore, a linear mixed effect model also revealed that more than 85% of the DNN attention heads contribute to the prediction of human reading time when considering text features and question relevance as covariates (Supplementary Results).”
  
  “Supplementary Methods: To characterize the influences of different factors on human word reading time, we employed linear mixed effects models [5] implemented in the lmerTest package [6] of R. For the baseline model, we treated the type of questions (local vs. global; local = baseline) and all text/task-related features as fixed factors, and considered the interaction between the type of questions and these text/task-related features. We included participants and items (i.e., questions) as random factors, each with associated random intercepts…”
  
  Supplementary Results The baseline mixed model revealed significant fixed effects for question type and all text/task-related features, as well as significant interactions between question type and these text/task-related features (Table S7). Upon involving SAR attention, we observed a statistically significant fixed effect associated with SAR attention. When involving attention weights of randomly initialized BERT, the mixed model revealed that most attention heads exhibited significant fixed effects, suggesting their contributions to the prediction of human word reading time. A broader range of attention heads showed significant fixed effects for both pre-trained and fine-tuned BERT.
  
  3) Experiment 1 was paired with a relatively comprehensive discussion of how attention weights mapped to reading times, but the same sort of analysis was not reported for Exps 2-4; this seems like a missed opportunity given the broader interest in testing how reading strategies might change across the different parameters of the four experiments.
  
  Thank you for the valuable suggestion. We have now also characterized how different reading measures, e.g., gaze duration and counts of rereading, were affected by text and task-related features in Experiments 2-4.
  
  For Experiment 2: “For local questions, consistent with Experiment 1, the effects of question relevance significantly increased from early to late processing stages that are separately indexed by gaze duration and counts of rereading (Fig. S9A, Table S3).”
  
  For Experiment 3: “For local questions, the layout effect was more salient for gaze duration than for counts of rereading. In contrast, the effect of word-related features and task relevance was more salient for counts of rereading than gaze duration (Fig. S9B, Table S3).”
  
  For Experiment 4: “Both the early and late processing stages of human reading were significantly affected by layout and word features, and the effects were larger for the late processing stage indexed by counts of rereading (Fig. S9C, Table S3).”
  
  4) Comparison of predictive power of BERT weights to human annotations of text relevance is limited: The annotation task asked participants to choose the 5 "most relevant" words for a given question; if >5 words carried utility in answering a question, this would not be captured by the annotation. It seems to me that the improvement of BERT over human annotations discussed around page 10-11 could well be due to this arbitrary limitation of the annotations.
  
  Thank you for the insightful comment. We only allowed a participant to label 5 words since we wanted the participant to only label the most important information. As the reviewer pointed out, five words may not be enough. However, this problem is alleviated by having >26 annotators per question. Although each participant can label up to 5 words, pooling the results across >26 annotators results in a nonzero relevance rating for an average of 21.1 words for local questions and 26.1 words for global questions. More importantly, as was outlined in Experimental Materials, we asked additional participants to answer questions based on only 5 annotated keywords. The accuracy of question answering was 75.9% for global questions and 67.6% for local questions, which was close to the accuracy achieved when the complete passage was present (Fig. 1B), suggesting that even 5 keywords could support question answering.
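
The pooling argument above can be illustrated with a small sketch. It assumes, hypothetically, that a word's relevance rating is the fraction of annotators who selected it; the response does not state the exact formula, and the function name and annotator selections below are invented for illustration.

```python
from collections import Counter


def pooled_relevance(selections, n_words):
    """Fraction of annotators who marked each word index as relevant."""
    n_annotators = len(selections)
    # set() guards against an annotator listing the same word twice
    counts = Counter(i for picks in selections for i in set(picks))
    return [counts[i] / n_annotators for i in range(n_words)]


# Three hypothetical annotators, each limited to 5 word indices in a 10-word passage.
selections = [[0, 1, 2, 3, 4], [2, 3, 4, 5, 6], [4, 5, 6, 7, 8]]
ratings = pooled_relevance(selections, n_words=10)
n_nonzero = sum(r > 0 for r in ratings)  # words with a nonzero pooled rating
```

Even though each annotator is capped at five words, the pooled rating is nonzero for nine of the ten words in this toy case, and it is graded rather than binary.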
  
  5) Abstract ln 35: This concluding sentence didn't really capture the key contribution of the paper which, at least from my perspective, was something closer to "we offer a computational account of how task optimization modulates attention during reading"
  
  p 4 ln 66: I think this sentence does a good job capturing the main contributions of this paper
  
  Thanks for your suggestion. We have modified our conclusion in Abstract accordingly.
  
  6) p 4 ln 81: "therefore is conceptually similar" maybe "may serve a conceptually similar role"
  
  We have rewritten the sentence.
  
  “Attention in DNN also functions as a mechanism to selectively extract useful information, and therefore attention may potentially serve a conceptually similar role in DNN.”
  
  7) p. 7 ln 140: "disproportional to the reading time" I didn't understand this sentence
  
  Sorry for the confusion and we have rewritten the sentence.
  
  “In Experiment 1, participants were allowed to read each passage for 2 minutes. Nevertheless, to encourage the participants to develop an effective reading strategy, the monetary reward the participant received decreased as they spent more time reading the passage (see Materials and Methods for details).”
  
  8) p 8 ln 151: This was another sentence that helped solidify the main research contributions for me; I wonder if this framing could be promoted earlier?
  
  Thank you for the suggestion and we have moved the sentence to Introduction.
  
  9) p. 33: I may be missing something here, but I didn't follow the reasoning behind quantifying model fit against eye-tracking measures using accuracy in a permutation test. Models are assessed in terms of the proportion of random shuffles that show a greater statistical correlation. Does that mean that an accuracy value like 0.3 (p. 10 ln 208) means that 0.7 random permutations of word order led to higher correlations between attention weights and RT? Given that RT is continuous, I wonder if a measure of model fit such as RMSE or even R^2 could be more interpretable.
  
  We have now realized that the term “prediction accuracy” was not clearly defined and may have caused confusion. Therefore, in the revised manuscript, we have replaced this term with “predictive power”. Additionally, we have now introduced a clear definition of “predictive power” at its first mention in Results:
  
  “…the predictive power, i.e., the Pearson correlation coefficient between the predicted and real word reading time, was around 0.2”
  
  The permutation test was used to test whether the predictive power is above chance. Specifically, if the predictive power is higher than the 95th percentile of the chance-level predictive power estimated using permutations, the result is significant at the 0.05 level (i.e., p < 0.05). We have explained this in Statistical tests.
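
A permutation test of this kind can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the data are synthetic, the chance distribution is built by shuffling the pairing between predicted and observed values, and the p value uses the common (count + 1) / (n + 1) convention.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(predicted, observed, n_perm=1000, seed=0):
    """One-sided p value: how often a shuffled pairing matches or beats the real correlation."""
    rng = random.Random(seed)
    real_r = pearson(predicted, observed)
    shuffled = list(observed)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the word-to-reading-time pairing
        if pearson(predicted, shuffled) >= real_r:
            n_at_least += 1
    return real_r, (n_at_least + 1) / (n_perm + 1)


# Synthetic example: observed values track the predictions closely.
predicted = list(range(30))
observed = [3 * v + (v % 2) for v in predicted]
real_r, p_value = permutation_p_value(predicted, observed)
```

A p value below 0.05 from this procedure is equivalent to the real correlation exceeding the 95th percentile of the shuffled correlations.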
  
  10) p. 33: FDR-based multiple comparisons are noted several times, but it wasn't clear to me what the comparison set is for any given test; more details would be helpful (e.g. X comparisons were conducted across passages/model-variants/whatever)
  
  Sorry for missing this important information. We have now mentioned which comparisons are corrected:
  
  “…Furthermore, the predictive power was higher for global than local questions (P = 4 × 10^-5, bootstrap, FDR corrected for comparisons across 3 features, i.e., layout features, word features, and question relevance)…”
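
For readers unfamiliar with FDR correction over a small comparison set, the sketch below applies the Benjamini-Hochberg procedure to three p values, one per feature. The choice of Benjamini-Hochberg is an assumption, since the response only says "FDR corrected", and the p values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for layout features, word features, and question relevance.
raw = [0.01, 0.04, 0.03]
adjusted = benjamini_hochberg(raw)
```

Stating the size of the comparison set, as the revised sentence does, is what lets a reader reproduce the adjustment, because each adjusted value depends on the number of tests m.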
  
  Reviewer #2:
  
  In this study, researchers aim to understand the computational principles behind attention allocation in goal-directed reading tasks. They explore how deep neural networks (DNNs) optimized for reading tasks can predict reading time and attention distribution. The findings show that attention weights in transformer-based DNNs predict reading time for each word. Eye tracking reveals that readers focus on basic text features and question-relevant information during initial reading and rereading, respectively. Attention weights in shallow and deep DNN layers are separately influenced by text features and question relevance. Additionally, when readers read without a specific question in mind, DNNs optimized for word prediction tasks can predict their reading time. Based on these findings, the authors suggest that attention in real-world reading can be understood as a result of task optimization.
  
  The research question pursued by the study is interesting and important. The manuscript was well written and enjoyable to read. However, I do have some concerns.
  
  We thank the reviewer for the accurate summary and positive evaluation.
  
  1) In the first paragraph of the manuscript, it appears that the purpose of the study was to test the optimization hypothesis in natural tasks. However, the cited papers mainly focus on covert visual attention, while the present study primarily focuses on overt attention (eye movements). It is crucial to clearly distinguish between these two types of attention and state that the study mainly focuses on overt attention at the beginning of the manuscript.
  
  Thank you for pointing out this issue. We have explicitly mentioned that we focus on overt attention in the current study. Furthermore, we have also discussed that native readers may rely more on covert attention so that they do not need to spend more time overtly fixating on the task-relevant words.
  
  In Introduction:
  
  “Reading is one of the most common and most sophisticated human behaviors [16, 17], and it is strongly regulated by attention: Since readers can only recognize a couple of words within one fixation, they have to overtly shift their fixation to read a line of text [3]. Thus, eye movements serve as an overt expression of attention allocation during reading [3, 18].”
  
  In Discussion:
  
  “Therefore, it is possible that when readers are more skilled and when the passage is relatively easy to read, their processing is so efficient that they do not need extra time to encode task-relevant information and may rely on covert attention to prioritize the processing of task-relevant information.”
  
  2) The manuscript correctly describes attention in DNN as a mechanism to selectively extract useful information. However, eye-movement measures such as gaze duration and total reading time are primarily influenced by the time needed to process words. Therefore, there is a doubt whether the argument stating that attention in DNN is conceptually similar to the human attention mechanism at the computational level is correct. It is strongly suggested that the authors thoroughly discuss whether these concepts describe the same or different things.
  
  Thank you for bringing up this very important issue and we have added discussions about why human and DNN may generate similar attention distributions. For example, we found that both DNN and human attention distributions are modulated by task relevance and word properties, which include word length, word frequency, and word surprisal. The influence of task relevance is relatively straightforward since both human readers and DNN should rely more on task relevant words to answer questions. The influence of word properties is less apparent for models than for human readers and we have added discussions:
  
  For DNN’s sensitivity to word surprisal:
  
  “The transformer-based DNN models analyzed here are optimized in two steps, i.e., pre-training and fine-tuning. The results show that pre-training leads to text-based attention that can well explain general-purpose reading in Experiment 4, while the fine-tuning process leads to goal-directed attention in Experiments 1-3 (Fig. 4B & Fig. 5A). Pre-training is also achieved through task optimization, and the pre-training task used in all the three models analyzed here is to predict a word based on the context. The purpose of the word prediction task is to let models learn the general statistical regularity in a language based on large corpora, which is crucial for model performance on downstream tasks [21, 22, 33], and this process can naturally introduce the sensitivity to word surprisal, i.e., how unpredictable a word is given the context.”
  
  For DNN’s sensitivity to word length:
  
  “Additionally, the tokenization process in DNN can also contribute to the similarity between human and DNN attention distributions: DNN first separates words into tokens (e.g., “tokenization” is separated into “token” and “ization”). Tokens are units that are learned based on co-occurrence of letters, and are not strictly linked to any linguistically defined units. Since longer words tend to be separated into more tokens, i.e., fragments of frequently co-occurring letters, longer words receive more attention even if the model pays uniform attention to each of its inputs, i.e., each token.”
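
The word-length argument in this passage can be made concrete with a toy example. The tokenizer below is a hypothetical stand-in that cuts words into fixed-length pieces; it is not WordPiece or any tokenizer used by the models in the study. The sketch also assumes that word-level attention is the sum of the attention on a word's tokens.

```python
def toy_tokenize(word, piece_len=5):
    """Hypothetical tokenizer: cut a word into consecutive pieces of at most piece_len letters."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


def word_attention_under_uniform_tokens(words, piece_len=5):
    """Word-level attention when every token receives the same attention weight."""
    pieces = [toy_tokenize(w, piece_len) for w in words]
    n_tokens = sum(len(p) for p in pieces)
    # Each token gets 1 / n_tokens; a word's share is the sum over its tokens.
    return [(w, len(p) / n_tokens) for w, p in zip(words, pieces)]


words = ["the", "model", "tokenization"]
attention = dict(word_attention_under_uniform_tokens(words))
```

Here "tokenization" is split into three pieces while "the" and "model" stay whole, so the longest word collects three fifths of the attention without any word-specific preference in the model.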
  
  3) When reporting how reading time was predicted by attention weights, the authors used "prediction accuracy." While this measure is useful for comparing different models, it is less informative for readers to understand the quality of the prediction. It would be more helpful if the results of regression models were also reported.
  
  Sorry for the confusion. The prediction accuracy was defined as the correlation coefficient between the predicted and actual eye-tracking measures. We have now realized that the term “prediction accuracy” might have caused confusion. Therefore, in the revised manuscript, we have replaced this term with “predictive power”. Additionally, we have now introduced a clear definition of “predictive power” at its first mention in Results:
  
  “…the predictive power, i.e., the Pearson correlation coefficient between the predicted and real word reading time, was around 0.2”
  
  4) The motivations of Experiments 2 and 3 could be better described. In their current form, it is challenging to understand how these experiments contribute to understanding the major research question of the study.
  
  Thank you for pointing out this issue. In Experiment 1, different types of questions were presented in separate blocks, and all the participants were L2 readers. Therefore, we conducted Experiments 2 and 3 to examine how reading behaviors were modulated when different types of questions were presented in a mixed manner, or when participants were L1 readers. We have now clarified the motivations:
  
  “In Experiment 1, different types of questions were presented in blocks which encouraged the participants to develop question-type-specific reading strategies. Next, we ran Experiment 2, in which questions from different types were mixed and presented in a randomized order, to test whether the participants developed question-type-specific strategies in Experiment 1.”
  
  “Experiments 1 and 2 recruited L2 readers. To investigate how language proficiency influenced task modulation of attention and the optimality of attention distribution, we ran Experiment 3, which was the same as Experiment 2 except that the participants were native English readers.”
  
  Reviewer #3:
  
  This paper presents several eyetracking experiments measuring task-directed reading behavior where subjects read texts and answered questions.
  
  It then models the measured reading times using attention patterns derived from deep-neural network models from the natural language processing literature.
  
  Results are taken to support the theoretical claim that human reading reflects task-optimized attention allocation.
  
  STRENGTHS:
  
  1) The paper leverages modern machine learning to model a high-level behavioral task (reading comprehension). While the claim that human attention reflects optimal behavior is not new, the paper considers a substantially more high-level task in comparison to prior work. The paper leverages recent models from the NLP literature which are known to provide strong performance on such question-answering tasks, and is methodologically well grounded in the NLP literature.
  
  2) The modeling uses text- and question-based features in addition to DNNs, specifically evaluates relevant effects, and compares vanilla pretrained and task-finetuned models. This makes the results more transparent and helps assess the contributions of task optimization. In particular, besides finetuned DNNs, the role of the task is further established by directly modeling the question relevance of each word. Specifically, the claim that human reading is predicted better by task-optimized attention distributions rests on (i) a role of question relevance in influencing reading in Expts 1-2 but not 4, and (ii) the fact that fine-tuned DNNs improve prediction of gaze in Expts 1-2 but not 4.
  
  3) The paper conducts experiments on both L2 and L1 speakers.
  
  We thank the reviewer for the accurate summary and positive evaluation.
  
  WEAKNESSES:
  
  1) The paper aims to show that human gaze is predicted by the DNN-derived task-optimal attention distribution, but the paper does not actually derive a task-optimal attention distribution. Rather, the DNNs are used to extract 144 different attention distributions, which are then put into a regression with coefficients fitted to predict human attention. As a consequence, the model has 144 free parameters without apparent a-priori constraint or theoretical interpretation. In this sense, there is a slight mismatch between what the modeling aims to establish and what it actually does.
  
  Regarding Weakness (1): This weakness should be made explicit, at least by rephrasing line 90. The authors could also evaluate whether there is either a specific attention head, or one specific linear combination (e.g. a simple average of all heads) that predicts the human data well.
  
  Thank you for pointing out this issue. On the one hand, we have now also predicted human attention using the average of all heads, i.e., the simple average suggested by the reviewer. The predictive power of the average head was still significantly higher than the predictive power of the SAR model. We have now reported this result in our revised manuscript.
  
  “For the fine-tuned models, we also predict the human word reading time using an unweighted average of the 144 attention heads and the predictive power was 0.3, significantly higher than that achieved by the attention weights of SAR (P = 4 × 10⁻⁵, bootstrap).”
  
  On the other hand, since different attention weights may contribute differently to the prediction of human reading time, we have now also reported the weights assigned to individual attention heads during the original regression analysis (Fig. S4). It was observed that the weight was highly distributed across attention heads and was not dominated by a single head.
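  
  The unweighted average discussed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `average_heads` and the toy 3-head, 3-word matrix are hypothetical, whereas the study averages 144 heads. Unlike the original regression, this predictor has no fitted per-head coefficients.

```python
def average_heads(head_weights):
    """Unweighted mean over heads.

    head_weights[h][w] is the attention weight that head h assigns to word w.
    Returns one averaged attention weight per word.
    """
    n_heads = len(head_weights)
    if n_heads == 0:
        raise ValueError("at least one head is required")
    n_words = len(head_weights[0])
    if any(len(head) != n_words for head in head_weights):
        raise ValueError("all heads must cover the same words")
    return [sum(head[w] for head in head_weights) / n_heads
            for w in range(n_words)]

# Toy example with 3 hypothetical heads and 3 words (illustrative only).
toy_heads = [
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
]
averaged_attention = average_heads(toy_heads)
```

  The averaged vector can then be correlated with word reading time in the same way as any single attention distribution.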
  
  Even more importantly, we have now rephrased the statement in line 90 of the previous manuscript:
  
  “We employed DNNs to derive a set of attention weights that are optimized for the goal-directed reading task, and tested whether such optimal weights could explain human attention measured by eye tracking.”
  
  Furthermore, in Discussion, we mentioned that:
  
  “Furthermore, we demonstrate that both humans and transformer-based DNN models achieve task-optimal attention distribution in multiple steps… Similarly, the DNN models do not yield a single attention distribution, and instead it generates multiple attention distributions, i.e., heads, for each layer. Here, we demonstrate that basic text features mainly modulate the attention weights in shallow layers, while the question relevance of a word modulates the attention weights in deep layers, reflecting hierarchical control of attention to optimize task performance. The attention weights in both the shallow and deep layers of DNN contribute to the explanation of human word reading time (Fig. S4).”
  
  2) While Experiment 1 tests questions from different types in blocks, and the paper mentions that this might encourage the development of question-type-specific reading strategies -- indeed, this specifically motivates Experiment 2, and is confirmed indirectly in the comparison of the effects found in the two experiments ("all these results indicated that the readers developed question-type-specific strategies in Experiment 1") -- the paper seems to miss the opportunity to also test whether DNNs fine-tuned for each of the question-types predict specifically the reading times on the respective question types in Experiment 1. Testing not only whether DNN-derived features can differentially predict normal reading vs targeted reading, but also different targeted reading tasks, would be a strong test of the approach.
  
  Regarding Weakness (2): results after finetuning for each question type could be reported.
  
  Thank you for the valuable suggestion. We have now fine-tuned the models separately based on global and local questions. The hyperparameters used for fine-tuning are presented in Author response table 1.
  
  Author response table 1.
  
  Hyperparameters for fine-tuning the DNN models on specific question types.
  
  The fine-tuning process yielded a slight reduction in loss (i.e., the negative logarithmic score of the correct option) on the validation set. Specifically, for BERT, the loss decreased from 1.08 to 0.96; for ALBERT, it decreased from 1.16 to 0.76; for RoBERTa, it went down from 0.68 to 0.54. Nevertheless, the fine-tuning process did not improve the prediction of reading time (Author response image 1). A likely reason is that the number of global and local questions for training is limited (local questions: 520; global questions: 280), and similar questions also exist in the RACE dataset that is used for the original fine-tuning (sample size: 87,866). Therefore, a small number of questions can significantly change the reading strategy of human readers, but using these questions to effectively fine-tune a model seems to be a more challenging task.
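  
  For readers unfamiliar with the loss mentioned above, the following sketch shows one common way to compute the negative logarithmic score of the correct option for a multiple-choice question. It assumes a softmax over per-option scores and the natural logarithm; both are assumptions, and the function name and example scores are hypothetical.

```python
import math

def multiple_choice_loss(option_scores, correct_index):
    """Negative log probability of the correct option under a softmax.

    option_scores: one unnormalized model score (logit) per answer option.
    correct_index: position of the correct option in option_scores.
    """
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(option_scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in option_scores))
    return log_norm - option_scores[correct_index]

# A model that cannot distinguish four options assigns each probability 1/4.
uninformed_loss = multiple_choice_loss([0.0, 0.0, 0.0, 0.0], correct_index=0)

# Hypothetical scores that favor the correct option give a lower loss.
informed_loss = multiple_choice_loss([2.0, 0.5, 0.1, -1.0], correct_index=0)
```

  Under these assumptions, a four-option question answered at chance gives a loss of ln 4, so lower values indicate that more probability is placed on the correct option.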
  
  Author response image 1.
  
  Fine-tuning based on local and global questions does not significantly modulate the prediction of human reading time. Lighter-color symbols show the results for the 3 BERT-family models (i.e., BERT, ALBERT, and RoBERTa) and the darker-color symbols show the average over the 3 BERT-family models. trans_fine: model fine-tuned based on the RACE dataset; trans_local: models additionally fine-tuned using local questions; trans_global: models additionally fine-tuned using global questions.
  
  3) The paper compares the DNN-derived features to word-related features such as frequency and surprisal and reports that the DNN features are predictive even when the others are regressed out (Figure S3). However, these features are operationalized in a way that puts them at an unfair disadvantage when compared to the DNNs: word frequency is estimated from the BNC corpus; surprisal is derived from the same corpus and derived using a trigram model. The BNC corpus contains 100 Million words, whereas BERT was trained on several Billions of words. Relatedly, trigram models are now far surpassed by DNN-based language models. Specifically, it is known that such models do not fit human eyetracking reading times as well as modern DNN-based models (e.g., Figure 2 Dundee in: Wilcox et al, On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior, CogSci 2020). This means that the predictive power of the word-related features is likely to be underestimated and that some residual predictive power is contained in the DNNs, which may implicitly compute quantities related to frequency and surprisal, but were trained on more data. In order to establish that the DNN models are predictive over and above word-related features, and to reliably quantify the predictive power gained by this, the authors could draw on (1) frequency estimated from the corpora used for BERT (BookCorpus + Wikipedia), (2) either train a strong DNN language model, or simply estimate surprisal from a strong off-the-shelf model such as GPT-2.
  
  This concern does not fundamentally cast doubt on the conclusions, since the authors found a clear effect of the task relevance of individual words, which by definition is not contained in those baseline models. However, Figure S3 -- specifically Figure S3C -- is likely to inflate the contribution of the DNN model over and above the text-based features.
  
  Thank you for pointing out these issues. Following the valuable suggestion of the reviewer, we have now 1) computed word frequencies based on BookCorpus and Wikipedia and 2) calculated word surprisal using GPT-2.
  
  “The word features included word length, logarithmic word frequency estimated based on the BookCorpus [62] and English Wikipedia using SRILM [68], and word surprisal estimated from GPT-2 Medium [69].”
  
  These recalculated word frequency and surprisal are correlated with the original measures (word frequency: 0.98; surprisal: 0.59), and the updated results are also closely aligned with those reported in the previous manuscript.
  
  Others:
  
  1) How does the statistical modeling take into account that measures are repeated both within the items (same texts read by different subjects) and within the subjects (some subjects read multiple texts)? I only see the items-level repetition being addressed in line 715-721 in comparing between local and global questions, but not elsewhere. The standard approach in the literature on human reading times (e.g. the Wilcox et al paper mentioned above, or ref. 44) is to use mixed-effects regression with appropriate random effects for items and subjects. The same question applies to the calculation of chance accuracy (line 702-709), which is done by shuffling words within a passage. Relatedly, how exactly was cross-validation (line 681) calculated? On the level of subjects, individual words, trials, texts, ...?
  
  Thank you for raising this issue. In the previous manuscript, the word reading time was averaged across participants. The cross-validation was conducted on the level of texts (i.e., passages). Following the valuable suggestion, we have now separately analyzed each participant and applied the linear mixed effects models.
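  
  Cross-validation on the level of texts means that all words of a passage fall into the same fold, so no passage contributes to both training and testing. The sketch below illustrates this idea only; the function name, the passage labels, and the number of folds are hypothetical and are not stated in the response.

```python
def passage_level_folds(passage_ids, n_folds):
    """Yield (train_indices, test_indices) with whole passages held out.

    passage_ids: for each word (row), the identifier of its passage.
    n_folds: number of folds; must not exceed the number of passages.
    """
    unique_passages = sorted(set(passage_ids))
    if n_folds < 2 or n_folds > len(unique_passages):
        raise ValueError("n_folds must be between 2 and the number of passages")
    # Assign each passage (not each word) to a fold in round-robin order.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_passages)}
    for k in range(n_folds):
        test = [i for i, pid in enumerate(passage_ids) if fold_of[pid] == k]
        train = [i for i, pid in enumerate(passage_ids) if fold_of[pid] != k]
        yield train, test

# Hypothetical passage label for each of eight words (illustrative only).
word_passages = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4"]
folds = list(passage_level_folds(word_passages, n_folds=2))
```

  Splitting by passage rather than by word avoids testing on words whose neighbors in the same passage were seen during training.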
  
  “Furthermore, a linear mixed effect model also revealed that more than 85% of the DNN attention heads contribute to the prediction of human reading time when considering text features and question relevance as covariates (Supplementary Results).”
  
  “Supplementary Methods: To characterize the influences of different factors on human word reading time, we employed linear mixed effects models [5] implemented in the lmerTest package [6] of R. For the baseline model, we treated the type of questions (local vs. global; local = baseline) and all text/task-related features as fixed factors, and considered the interaction between the type of questions and these text/task-related features. We included participants and items (i.e., questions) as random factors, each with associated random intercepts…”
  
  Supplementary Results: The baseline mixed model revealed significant fixed effects for question type and all text/task-related features, as well as significant interactions between question type and these text/task-related features (Table S7). Upon involving SAR attention, we observed a statistically significant fixed effect associated with SAR attention. When involving attention weights of randomly initialized BERT, the mixed model revealed that most attention heads exhibited significant fixed effects, suggesting their contributions to the prediction of human word reading time. A broader range of attention heads showed significant fixed effects for both pre-trained and fine-tuned BERT.
  
  2) I could not find any statement about code availability (only about data availability). Will the source code and statistical analysis code also be made available?
  
  We have added the code availability statement.
  
  “The code is now available at https://github.com/jiajiezou/TOA.”
  
  3) The theoretical claim, and some basic features of the research, are quite similar to other recent work (Hahn and Keller, Modeling task effects in human reading with neural network-based attention, Cognition, 2023; cited with very little discussion as ref 44), which also considered task-directed reading in a question-answering task and derived task-optimized attention distributions. There are various differences, and the paper under consideration has both weaknesses and strengths when compared to that existing work -- e.g., that paper derived a single attention distribution from task optimization, but the paper under consideration provides more detailed qualitative analysis of the task effects, uses questions requiring more high-level reasoning, and uses more state-of-the-art DNNs.
  
  The paper would benefit from being more explicit about how the work under review provides a novel angle over Ref 44 (Hahn and Keller, Cognition, 2023).
  
  Thanks for bringing up this issue. We have now incorporated a more comprehensive discussion that compares the current study with the recent work conducted by Hahn and Keller:
  
  “When readers read a passage to answer a question that can be answered using a word-matching strategy [45], a recent study has demonstrated that the specific reading goal modulates the word reading time and the effect can be modeled using an RNN model [46]. Here, we focus on questions that cannot be answered using a word-matching strategy (Fig. 1B) and demonstrate that, for these challenging questions, attention is still modulated by the reading goal but the attention modulation cannot be explained by a word-matching model (Fig. S3). Instead, the attention effect is better captured by transformer models than an advanced RNN model, i.e., the SAR (Fig. 3A). Combining the current study and the study by Hahn et al. [46], it is possible that the word reading time during a general-purpose reading task can be explained by a word prediction task, the word reading time during a simple goal-directed reading task that can be solved by word matching can be modeled by an RNN model, while the word reading time during a more complex goal-directed reading task involving inference is better modeled using a transformer model. The current study also further demonstrates that elongated reading time on task-relevant words is caused by counts of rereading and further studies are required to establish whether earlier eye movement measures can be modulated by, e.g., a word matching task.”
  
  4) In Materials&Methods, line 599-636, specifically when "pretraining" is mentioned (line 632), it should be mentioned what datasets these DNNs were pretrained on.
  
  We have now mentioned this in the revised manuscript:
  
  “The pre-training process aimed to learn general statistical regularities in a language based on large corpora, i.e., BooksCorpus [62] and English Wikipedia…”
  
URL

biorxiv.org/content/10.1101/2023.04.25.538252v2
www.biorxiv.org www.biorxiv.org

Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Insulin is crucial for maintaining metabolic homeostasis, and its release is regulated by various pathways, including blood glucose levels and neuromodulatory systems. The authors investigated the role of neuromodulators in regulating the dynamics of the adult Drosophila IPC population. They showed that IPCs express various receptors for monoaminergic and peptidergic neuromodulators, as well as synaptic neurotransmitters with highly heterogeneous profiles across the IPC population. Activating specific modulatory inputs, e.g. dopaminergic, octopaminergic or peptidergic (Leucokinin) using an optogenetic approach coupled with in vivo electrophysiology unveiled heterogeneous responses of individual IPCs resulting in excitatory, inhibitory or no responses. Interestingly, calcium imaging of the entire IPC population with or without simultaneous electrophysiological recording of individual cells showed highly specific and stable responses of individual IPCs suggesting their intrinsic properties are determined by the expressed receptor repertoire. Using the adult fly connectome they further corroborate the synaptic input of excitatory and inhibitory neuronal subsets of IPCs. The authors conclude that the heterogeneous modulation of individual IPC activity is more likely to allow for flexible control of insulin release to adapt to changes in metabolic demand and environmental cues.
  
  Strengths:
  
  This study provides a comprehensive, multi-level analysis of IPC properties utilizing single-nucleus RNA sequencing, anatomical receptor expression mapping, connectomics, electrophysiological recordings, calcium-imaging and an optogenetics-based 'intrinsic pharmacology' approach. It highlights the heterogeneous receptor profiles of IPCs, demonstrating complex and differential modulation within the IPC population. The authors convincingly showed that different neuromodulatory inputs exhibit varied effects on IPC activity and simultaneous occurrence of heterogeneous responses in IPCs with some populations exciting a subset of IPCs while inhibiting others, showcasing the intricate nature of IPC modulation and diverse roles of IPC subgroups. The temporal dynamic of IPC modulation showed that polysynaptic and neuromodulatory connections play a major role in IPC response. The authors demonstrated that certain neuromodulatory inputs, e.g. dopamine, can shift the overall IPC population activity towards either an excited or inhibited state. The study thus provides a fundamental entry point to understanding the complex influence of neuromodulatory inputs on the insulinergic system of Drosophila.
  
  We thank the reviewer for endorsing our study as a fundamental entry point to understanding the complex neuromodulation of the insulin system.
  
  Weakness:
  
  GPCRs are typically expressed at low levels and while the transcriptomic and reporter expression analysis was comprehensive, both approaches have the caveat that they do not allow validating protein level expression. Thus, some receptors might have been missed while others might be false positives. The authors acknowledged the challenges in accurately accessing receptor expression in complex modulatory systems indicating there are limitations in full understanding of the receptor profiles of IPCs.
  
  We agree with the reviewer and acknowledge that both the transcript and protein expression need to be examined in order to obtain higher confidence in receptor expression profiles. The T2A-GAL4 lines used in our anatomical analyses do in fact provide insights into which of the receptor transcripts are translated. We added the following statement to the discussion section to clarify this approach: “The single-nucleus transcriptome analysis reveals which receptor transcripts are expressed whereas the T2A-GAL4 lines used in our anatomical analyses provide insights on which of the receptor transcripts are translated. This is based on the fact that T2A peptides induce ribosome skipping during translation. Therefore, GAL4 protein is only produced when the receptor protein is produced(42,88).”
  
  While this study provides valuable insights into the heterogeneity of IPC responses and receptor expression, it will require future studies to elucidate how these modulatory inputs affect insulin release and transcriptional long-term changes. The authors further analyzed male and female snRNAseq data and claimed that the differences in receptor expression were minimal. The experimental analyses used mated females only and while the study is very complete in this respect, it would have been extremely interesting to compare male flies in terms of their response profiles.
  
  We thank the reviewer for acknowledging that long-term effects on release and transcript levels go beyond the scope of this study and agree that these questions should be addressed in future investigations. Concerning the differences between females and males: we did not find significant differences in the snRNAseq data between the two sexes. Moreover, a parallel study from our lab found no differences between males and females in IPC baseline activity (Bisen et al. 2024, eLife https://doi.org/10.7554/eLife.98514.1). We therefore did not follow this path for the present study. We explained our reasoning in the results section of our paper, by adding: “Since there were little differences in receptor expression between males and females (Fig. S1C), we used the transcriptomes from both sexes for all subsequent analyses.” in the transcriptome section, and “Since baseline recordings from IPCs, in addition to our transcriptomic analysis, revealed no significant difference between male and female flies(26), we only used mated females for our physiological experiments.” in the transition to the physiology section of our manuscript.
  
  Lastly as also pointed out by the authors, their approach of using optogenetically driven excitation of modulatory neuronal subsets limits the interpretation of the results due to the possibly confounding direct or indirect effect of fast synaptic transmission on IPC excitation/inhibition, and the broad expression of some neuromodulatory lines used in this analysis.
  
  We agree that our results are limited to general effects of neuronal populations rather than individual neurons or specific inputs, and that it is generally hard to untangle effects of fast transmitters from those of modulatory inputs. However, we believe that we are careful in presenting and interpreting our results in this regard.
  
  Overall, however, the conclusions of this study are well supported by the data provided by the authors. Moreover, their detailed and thorough analysis of IPC modulation will have a significant impact on the field of metabolic regulation to understand the complex regulatory mechanism of insulin release, which can now be studied further to provide insight about metabolic homeostasis and neural control of metabolic processes.
  
  We thank the referee kindly for these comments!
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Held et al. investigated the distinct activities of Insulin-Producing Cells (IPCs) by electrophysiological recordings and calcium imaging. In the brain of the fruit fly Drosophila melanogaster, there are approximately 14 IPCs that are analogous to mammalian pancreatic beta cells and provide a good model system for monitoring their activities in vivo. The authors performed single-nucleus RNA sequencing analysis to examine what types of neuromodulatory inputs are received by IPCs. A variety of neuromodulatory receptors are expressed heterogeneously in IPCs, which would explain the distinct activities of IPCs in response to the activations of neuromodulatory neurons. The authors also conducted the connectome analysis and G-protein prediction analysis to strengthen their hypothesis that the heterogeneity of IPCs may underlie the flexible insulin release in response to various environmental conditions.
  
  Strengths:
  
  The authors succeeded in performing patch-clamp recordings and calcium imaging of individual IPCs in living animals at a single-cell resolution, which allows them to show the heterogeneity of IPCs precisely. They measured IPC activities in response to 9 types of neurons in patch-clamp recordings and 5 types of neurons in calcium imaging, comparing the similarities and differences in activities between two methods. These results support the idea that the neuromodulatory system affects individual IPC activities differently in a receptor-dependent manner.
  
  We thank the reviewer for emphasizing how our in vivo experiments allow for a precise characterization of the IPC responses to modulatory inputs.
  
  Weaknesses:
  
  One concern is to what extent the heterogeneity of IPC activities on a short time scale is relevant to the net output, a release of insulin-like peptides in response to metabolic demands on a relatively longer time scale. The authors can test their hypothesis by manipulating the heterogeneous expressions of receptor genes in IPCs and examining IPC activities on a longer time scale. Moreover, while the authors focus on IPC activities, they did not show the activation of the neuromodulatory inputs and the net output of insulin levels in the data. The readers might want to know which neurons are indeed activated to send signals to IPCs and how IPC activities result in the secretion of insulin peptides.
  
  We agree with the reviewer that the two experiments described, manipulating receptor expression before long-term recordings and measuring insulin levels after activating modulatory inputs, would deliver exciting insights into the interplay of modulatory inputs, IPC population activity, and insulin release. However, currently available methods for monitoring insulin release do not allow us to perform these experiments with a temporal resolution that would match the sensitivity and time resolution of our physiological experiments and are therefore not suited for a direct comparison. We also acknowledge that it would be extremely exciting to characterize the modulatory populations providing input to IPCs in terms of their sensitivity to internal state changes and external inputs. However, this clearly goes beyond the scope of our study. Essentially, one would have to perform experiments on a similar scale and breadth as we have done for IPCs here for the other populations. We aim to perform some of these experiments in follow up projects to this work.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The authors used a 5% expression cutoff initially, which seems arbitrary. Can you explain the rationale for using this cutoff? If I interpret the authors' logic correctly and given there are 14 IPCs per animal, at 5% there is a 70% chance that 1 cell expresses that receptor.
  
  We used a 5% cutoff to reduce false positives in our transcriptomic analysis. This threshold translates to expression in 0.8 out of 16 IPCs found in an individual fly on average. Hence, this cutoff ensures that receptors are expressed in at least 1 cell. Based on 392 IPC transcriptomes used in our analysis, our 5% threshold means that any receptor expressed in less than 20 transcriptomes will be deemed to be absent. At the population level, this ensures that our expression analysis is based on cells from at least two flies. However, we expect the actual number of flies from which the IPC transcriptomes were derived to be much higher. We added the following statement to the methods section to clarify this point: “To determine if a transcript is present in the IPC transcriptomes, we used a 5% cutoff to reduce false positives. This cutoff is equivalent to expression in 0.8 IPCs out of 16 on average in an individual fly, and hence less than one IPC in the entire population. Since we used 392 IPC transcriptomes in our analysis, this cutoff means that expression in less than 20 IPCs will be deemed false positive”
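  
  The arithmetic behind the figures in this exchange can be checked with a short sketch. The function and variable names are hypothetical; the inputs (a 5% cutoff, 14 or 16 IPCs per fly, and 392 transcriptomes) are the values quoted by the reviewer and the authors.

```python
CUTOFF_PERCENT = 5  # expression cutoff quoted in the response

def cells_at_cutoff(n_cells, cutoff_percent=CUTOFF_PERCENT):
    """Number of cells that corresponds to the cutoff fraction of n_cells."""
    return cutoff_percent * n_cells / 100

# Reviewer's figure: 14 IPCs per animal.
per_fly_reviewer = cells_at_cutoff(14)   # 0.7 cells

# Authors' figure: 16 IPCs per fly on average.
per_fly_authors = cells_at_cutoff(16)    # 0.8 cells

# Whole dataset: 392 IPC transcriptomes.
dataset_threshold = cells_at_cutoff(392)  # 19.6 transcriptomes
```

  The dataset-level value of 19.6 is why receptors detected in fewer than 20 transcriptomes fall below the cutoff.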
  
  (2) Were male and female brains examined separately and tested for divergent expression of T2A-reporter signals? While there were not many strong differences in the snRNAseq dataset, based on some discrepancies with the reporters it might be worthwhile to assess sex-specific differences that might account for the observed expression/non-expression of some receptors.
  
  We did not investigate sex-specific differences using anatomical mapping, since our snRNAseq analysis pointed against that being a major factor. We clarified our reasoning in the results section by adding “Since there were little differences in receptor expression between males and females (Fig. S1C), we used the transcriptomes from both sexes for all subsequent analyses.” in the transcriptome section, and “Since baseline recordings from IPCs, in addition to our transcriptomic analysis, revealed no significant difference between male and female flies(26), we only used mated females for our physiological experiments.” in the transition to the physiology section of our manuscript.
  
  (3) The anatomical reporter and transcriptome data for neuromodulatory receptor expression do not fully complement each other, e.g. in Fig1D Lkr is expressed only in one cluster but anatomical expression is observed in most IPCs. Ultimately, visualizing receptor expression at the protein level and functional analysis with genetic perturbation of the respective receptors is needed to draw strong conclusions.
  
  We agree with the reviewer that visualizing receptor expression at protein level could help clarify some of these differences since neuropeptide GPCR transcripts tend to be less abundant whereas we expect protein expression to be more stable. However, out of the 14 receptors examined in our study, antibodies are only available for two: DH31R and LKR. Since our DH31R-T2A-GAL4 line does not drive expression in IPCs, we did not pursue this further. We did perform preliminary experiments to validate LKR protein expression in IPCs. Unfortunately, we found that the LKR antibody labels cells in the pars intercerebralis in both the wild type and LKR mutants (see Author response image 1 below). Therefore, we do not think it suitable to monitor LKR protein expression. Thus, additional investigations must await future generations of neuropeptide receptor antibodies. One biological reason for the discrepancies could be that anatomical quantification is based on cumulative expression while transcriptomic analysis captures a brief snapshot. We included “One explanation for the discrepancies could be that transcriptomic analysis provides a single snapshot, whereas anatomical data is based on cumulative expression. Fluorescent markers persist long after transcription and translation has terminated. Therefore, a higher likelihood for receptor expression can be expected when it is quantified via anatomical techniques.” in our results part to give the readers more context.
  
  Author response image 1.
  
  (4) In Fig. 1E, the Dop2R reporter signal does not colocalize with IPCs, whereas Dop2R is expressed in all four clusters.
  
  We tested if additional transcript variants with different C-termini are the cause for the discrepancy between transcriptome data and anatomical mapping. However, using a Trojan-GAL4 line for Octα2R that should account for other transcript variants also did not show any expression. At this point, with the tools we have, we cannot conclusively determine what the cause of this discrepancy is. Since we only see them with Dop2R and Octα2R, a mismatch caused by more general differences, e.g. sex-specific differences, seems unlikely. A more plausible reason could be that for those lines, inadequate transgenes lead to failed expression. We added “Hence, inadequate transgenes for Dop2R and Octα2R or the lack of protein translation are the likely cause for the discrepancy between transcriptome analysis and anatomical mapping.” to our results part as a possible explanation for the discrepancy.
  
  (5) Moving the AstANs expression images to the main figure (Fig 1E) would make sense as the authors focus on AstAN rather than MsRT or Dop2R in the later parts of their work.
  
  We thank the reviewer for this suggestion and replaced the LKR image with an AstAR2 image, as suggested. We kept the other two receptors in the main figure as additional examples.
  
  (6) Have the authors considered gap junction coupling of IPCs, which might explain the simultaneous responses in some cases?
  
  We have indeed considered this exciting idea, as gap junctions between IPCs could potentially synchronize activity in connected IPC subpopulations. To test if gap junctions are a major factor in the IPC population, we performed experiments with patch-clamp recordings from a single IPC while performing calcium imaging of the IPC population (as demonstrated in Fig. 4J). In some of these experiments, we injected current into individual IPCs and tested for activity changes in the other IPCs. However, the preliminary data we acquired did not indicate that the current-induced train of action potentials was transmitted to other IPCs. Hence, it is unlikely that the IPCs are directly coupled by gap junctions. Given the challenging nature of these experiments, and the discouraging preliminary results, we have not followed up on the idea any further.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Figure 3D was not described in the text.
  
  We thank the reviewer for pointing out this mistake. We included the panel in Figure 3C and added the reference in the text describing the results from multiple animals shown in the panel.
  
  (2) In Figure 4B, a scale of heat map is required. There is a blue spot with no ROI setting on the left side. On the right side of the photos, the ROI No.6 seemed to turn blue after activation. However, Figure 4D shows the ROI No.6 was inhibited.
  
  We are now using a simplified heatmap in Figure 4B and added a scale bar. We also changed the example images to avoid any confusion. Previously, we used a random snapshot from before LED onset; now we use a snapshot from the actual time window to which we normalized the traces. Regarding the spot where no ROI is depicted but a response is visible: in this area, a trachea made it difficult to clearly delimit the cell body underneath, and we therefore excluded this ROI. Occlusions by tracheae are one reason why we typically cannot image the entire IPC population in a single animal.
  
  (3) In Figure 4F, the regions of gray bars (baseline) contain blue and red colors to some extent, which makes me confused. Moreover, the description "within one cluster, the response seemed homogeneous, e.g., in fly #4 during the activation of DANs (Fig. 4F)." was not clear to me. How about fly #1, #2, and #3? It seems that the responses changed excitedly and inhibitory within a cluster. Although the authors tend to raise some consistent results with examples, it would not be so effective if I can see there are other counter-examples and exceptions in the results.
  
  We apologize for the confusion we caused. The gray bars indicate the time window we used for baseline subtraction: The median activity of each IPC in this window was subtracted from the activity of that IPC. Hence, the median activity in this window is zero, but individual frames can have positive or negative values.
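
  As an illustration only (not the analysis code used for the manuscript), the baseline subtraction described above can be sketched in a few lines of Python. The function name, the window indices, and the example trace are hypothetical and chosen solely to show why the median in the baseline window is zero while individual frames in that window can still be positive or negative.

```python
from statistics import median


def subtract_baseline(trace, start, stop):
    """Subtract the median of trace[start:stop] from every frame of one cell's trace."""
    baseline = median(trace[start:stop])
    return [value - baseline for value in trace]


# Hypothetical activity trace for a single IPC (arbitrary units, not real data).
trace = [1.0, 1.2, 0.9, 1.1, 1.0, 2.0, 2.5]

# Assume the first five frames form the baseline window (the gray bar).
corrected = subtract_baseline(trace, 0, 5)
```

  In this sketch the median of the corrected baseline window is zero by construction, yet single frames within the window deviate in both directions, which is what appears as faint blue and red inside the gray bars.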
  
  We thank the reviewer for pointing out the confusion about the homogeneous responses in one cluster. We clarified this part in the results, by adding “Recording from multiple IPCs at the same time uncovered that the activity of IPCs within a cluster was synchronized in some cases. For example, in fly #1 in the DAN activation experiment, the baseline activity pattern of the excited IPC cluster was already synchronized before the first activation (fly #1, cells 3-8). Furthermore, the excitation onset and duration during the activation of DANs was highly uniform in this cluster. However, in other flies, e.g. #2 and #3 in the DAN activation experiments, we did not observe this synchronicity. While all IPCs in the excited cluster displayed an excitatory response to the DAN activation in these flies, the onset and duration differed between individual IPCs. In addition, the IPCs also showed more variability in their baseline activity (Fig. 4F). These findings point towards a shared input that can lead to the synchronization of IPC activity in some clusters and time windows. One known such input is the behavioral state – flight strongly inhibits the activity of all IPCs with very short delays(22). The flies in our experiments were not flying, but this example illustrates the presence of strong, state-dependent inputs that can synchronize the IPC population activity.”
  
  (4) In Figure 4J, no explanations of arrowheads, gray boxes, or asterisks are available in the legend.
  
  We thank the reviewer for pointing out this omission. We added the missing information to the figure legend.
  
  (5) "IPCs form distinct clusters." Is this cluster located closely each other or distant from one another?
  
  We did not encounter a location-dependent relationship between the IPCs of one cluster in calcium imaging experiments, nor did the anatomical receptor mapping data or connectomics analysis give any indication for anatomical clusters. The location of individual IPC cell bodies is not stereotypical across flies. We clarified this point in the results by adding “IPCs form distinct functional clusters” and “However, we found no evidence in our anatomical data, calcium imaging experiments, or in the fly brain EM volume that these clusters are distinguishable based on IPC soma location in the pars intercerebralis.”
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.14.557555v6
www.biorxiv.org www.biorxiv.org

Mediator kinase inhibition suppresses hyperactive interferon signaling in Down syndrome

1
1. Public_Reviews 24 Oct 2025
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  The main conclusion of this manuscript is that the mediator kinases supporting the IFN response in Down syndrome cell lines represent an important addition to understanding the pathology of this affliction.
  
  Strengths:
  
  Mediator kinase stimulates cytokine production. Both RNAseq and metabolomics clearly demonstrate a stimulatory role for CDK8/CDK19 in the IFN response. The nature of this role, direct vs. indirect, is inferred by previous studies demonstrating that inflammatory transcription factors are Cdk8/19 substrates. The cytokine and metabolic changes are clear-cut and provide a potential avenue to mitigate these associated pathologies.
  
  Weaknesses:
  
  This study revealed a previously undescribed role for the CKM in splicing. The previous identification of splicing factors as substrates of CDK8/CDK19 is also intriguing. However, additional studies seem to be necessary in order to attach this new function to the CKM. As the authors point out, the changes in splicing patterns are relatively modest compared to other regulators. In addition, some indication that the proteins encoded by these genes exhibit reduced levels or activities would support their RNAseq findings.
  
  We have added new splicing data for the version of record. Specifically, we have added splicing data analysis for the "non-sibling" T21 cell line (±cortistatin A, t=4.5h) and for the sibling T21 line (±cortistatin A) at t=24h. The results are summarized in new Figure 5 – figure supplement 2. The data are in agreement with our prior data from the sibling T21 line ±CA at t=4.5h. In particular, i) similar numbers of genes were impacted by splicing changes (alternative exon inclusion or alternative exon skipping) in CA-treated cells in the "non-sibling" T21 line compared with the sibling T21 line; ii) upon completion of a pathway analysis of these alternatively spliced genes, similar pathways were affected by CA in each case (non-sibling T21 vs. sibling T21), in particular those related to IFN signaling; iii) regarding the new t=24h timepoint for the sibling T21 line, similar numbers of genes were alternatively spliced (alternative exon inclusion or alternative exon skipping) in CA-treated cells compared with the 4.5h timepoint, and iv) the IPA results with the alternately spliced genes identified inflammatory signaling, mRNA processing, and lipid metabolism among other pathways, which broadly reflect the cytokine screen and metabolomics data in CA-treated cells (t=24h).
  
  Additional evidence for CDK8/CDK19 regulation of splicing comes from our t=24h RNA-seq data in T21 cells ±CA. GSEA results revealed down-regulation of many pathways related to RNA processing and splicing, suggesting that the splicing changes caused by Mediator kinase inhibition result from reduced expression of splicing regulators, at least at this longer timeframe. These results are summarized in new Figure 2 – figure supplement 2E. Collectively, the data shown in this article reveal a previously unidentified role for Mediator kinases as splicing regulators. We emphasize in the article, however, that the splicing effects of Mediator kinase inhibition appear modest, at least within the cell lines and timeframes of our experiments, especially when compared with CDK7 inhibition [Rimel et al. Genes Dev 2020 1452].
  
  Seahorse analysis is normally calculated with specific units for oxygen consumption, ATP production, etc. It would be of interest to see the actual values of OCR between the D21 and T21 cell lines rather than standardizing the results. This will address the specific question about relative mitochondrial function between these cells. Reduced mitochondrial function has been associated with DS patients. Therefore, it would be important to know whether mitochondrial function is reduced in the T21 cells vs. the D21 control. Importantly for the authors' goal of investigating the use of CDK8/19 inhibitors in DS patients, does CA treatment reduce mitochondrial function to pathological levels?
  
  These are good points. We have addressed as follows.
  
  (1) We have added a comparative analysis of Seahorse data for the sibling-matched T21 and D21 lines. As shown in new Figure 2 – figure supplement 4A-C, the T21 line shows higher basal levels of OCR and ECAR compared with D21. Although reviewer 1 states that "reduced mitochondrial function has been associated with DS patients" we are unaware of the study from which this conclusion was made. Our results are consistent with a Down syndrome mouse model study published last year [Sarver et al. eLife 2023 e86023]. We acknowledge that in this study, T21/D21 OCR levels varied in different tissues, but the majority of tissue types showed elevated OCR in T21, similar to our results in the human B-cells used here.
  
  (2) Interestingly, CA treatment reduced OCR and ECAR in T21 cells (and D21), suggesting that Mediator kinase inhibition might normalize mitochondrial function (and ECAR) toward D21 levels. We show this comparison in new Figure 2 – figure supplement 4D-F. Indeed, CA treatment appears to normalize T21 mitochondrial function and ECAR toward D21 levels. Although this may suggest a therapeutic benefit, we emphasize that more experiments would be needed to make such claims with confidence.
  
  (3) We include a breakdown of mitochondrial parameters from Seahorse data in the bar plots shown in Figure 2–figure supplement 3. This includes ATP production, which shows reduced ATP levels in CA-treated T21 cells specifically.
  
  (4) We have added Seahorse data for ECAR (extracellular acidification rate) in the sibling-matched D21 and T21 cells, ±CA. These results are shown in new Figure 2 – figure supplement 3D, and indicate that CA treatment reduces ECAR in both D21 and T21 cells. This result is consistent with a prior report that analyzed ECAR in CDK8 analog-sensitive HCT116 cells [Galbraith et al. Cell Rep 2017 1495].
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  In this manuscript, Cozzolino et al. demonstrate that inhibition of the Mediator kinase CDK8 and its paralog CDK19 suppresses hyperactive interferon (IFN) signaling in Down syndrome (DS), which results from trisomy of chromosome 21 (T21). Numerous pathologies associated with DS are considered direct consequences of chronic IFN pathway activation, and thus hyperactive IFN signaling lies at the heart of pathophysiology. The collective interrogation of transcriptomics, metabolomics, and cytokine screens in sibling-matched cell lines (T21 vs D21) allows the authors to conclude that Mediator kinase inhibition could mitigate chronic, hyperactive IFN signaling in T21. To probe the functional outcomes of Mediator kinase inhibition, the authors performed cytokine screens, transcriptomics, and untargeted metabolomics. This collective approach revealed that Mediator kinases establish IFN-dependent cytokine responses at least in part through transcriptional regulation of cytokine genes and receptors. Mediator kinase inhibition suppresses cell responses during hyperactive IFN signaling through inhibition of pro-inflammatory transcription factor activity (anti-inflammatory effect) and alteration of core metabolic pathways, including upregulation of anti-inflammatory lipid mediators, which served as ligands for specific nuclear receptors and downstream phenotypic outcomes (e.g., oxygen consumption). These data provided a mechanistic link between Mediator kinase activity and nuclear receptor function. Finally, the authors also disclosed that Mediator kinase inhibition alters splicing outcomes.
  
  Overall, this study reveals a mechanism by which Mediator kinases regulate gene expression and establish that its inhibition antagonizes chronic IFN signaling through collective transcriptional, metabolic, and cytokine responses. The data have implications for DS and other chronic inflammatory conditions, as Mediator kinase inhibition could potentially mitigate pathological immune system hyperactivation.
  
  Strengths:
  
  (1) One major strength of this study is the mechanistic evidence linking Mediator kinases to hyperactive IFN signaling through transcriptional changes impacting cell signaling and metabolism.
  
  (2) Another major strength of this study is the use of sibling-matched cell lines (T21 vs D21) from various donors (not just one sibling pair), and further cross-referencing with data from large cohorts, suggesting that part of the data and conclusions are generalizable.
  
  (3) Another major strength of this study is the combined experimental approach including transcriptomics, untargeted metabolomics, and cytokine screens to define the mechanisms underlying suppression of hyperactive interferon signaling in DS upon Mediator kinase inhibition.
  
  (4) Another major strength of this study is the significance of the work to DS and its potential impact on other chronic inflammatory conditions.
  
  Weakness:
  
  (1) Genetic evidence linking the mentioned nuclear receptors to activation of an anti-inflammatory program upon Mediator kinase inhibition could improve the definition of the mechanism and overall impact of the work.
  
  Existing data from other studies, some of which are cited in the article, have linked PPAR and LXR to lipid biosynthesis and anti-inflammatory signaling cascades. We assume that reviewer 2 is suggesting knockdown and/or degron depletion of specific nuclear receptors, to compare/contrast the effect of CA on IFN responses in T21 and D21 cells. Such experiments would help de-couple the NR-specific contributions from other CA-dependent effects. We consider these experiments important next steps for this project, but beyond the scope of this study. That said, we anticipate that data from such experiments might be challenging to interpret, given the complex and inter-connected cascade of transcriptional and metabolic changes that would result from PPAR or LXR depletion.
  
  (2) Page 5 states that "Mediator kinases broadly regulate cholesterol and fatty acid biosynthesis and this was further confirmed by the metabolomics data", but a clear mechanistic explanation was lacking. Likewise, the data suggest but do not prove, that altered lipid metabolites influence the function of nuclear receptors to regulate an anti-inflammatory program in response to Mediator kinase inhibition (p. 6), despite the fact the gene expression changes elicited by Mediator kinase inhibition tracked with downstream metabolic changes.
  
  We have clarified the text on page 5 to address this comment. Specifically, we note that CA treatment increases expression of FA metabolism and cholesterol metabolism genes in T21 cells under basal conditions, and the genes affected are shown in Figure 2–figure supplement 1E. Thus, the mechanistic explanation is that Mediator kinase inhibition causes elevated levels of FA and cholesterol metabolites via changes in expression of FA and cholesterol biosynthesis genes (at least in part). We further address the mechanism with the PRO-seq data and TFEA results in Figure 6; in particular, p53 activity is rapidly suppressed in CA-treated T21 cells (t=75min), and this alone is sufficient to activate SREBP [Moon et al. Cell 2019 564]. CA-dependent activation of SREBP target genes is a dominant feature in the T21 RNA-seq data (t=4.5h).
  
  We agree with the second point raised by reviewer 2, that our data suggest but do not prove nuclear receptor function is altered by CA treatment. We do cite papers that have provided good evidence that the metabolites elevated in CA-treated cells are NR ligands and activate their target genes. Additional experiments to address this question might involve targeted depletion of select metabolites via inhibition of key biosynthetic enzymes. We consider these experiments beyond the scope of this already expansive article. That said, it will be challenging to conclusively demonstrate clear cause-effect relationships (e.g. to demonstrate whether select metabolites altered by CA treatment directly alter PPARA function), given i) the myriad transcriptional and metabolic changes caused by CA treatment, coupled with the fact that ii) the CA-dependent lipid metabolite changes are spread out across chemically distinct NR agonists (e.g. endocannabinoids, oleamide, or cholesterol metabolites such as desmosterol), and iii) NR activation can occur via multiple different metabolites.
  
  (3) The figures are outstanding but dense.
  
  Thank you. We have done our best to represent the results clearly and within the publication guidelines. There was an enormous amount of data to summarize for this article.
  
  (4) Figure 6 (PRO-Seq). The authors refer to pro-inflammatory TFs (e.g. NF-kB/RelA). It is not clear whether the authors have specifically examined TF binding at enhancers or more broadly at every region occupied by the interrogated TFs?
  
  This is a good point. Our analysis (TFEA) only identified the TFs whose activity was changing in CA-treated cells. It did not distinguish where these TFs were bound (enhancers vs. promoters). We completed a modified TFEA by separating enhancer TFs vs. promoter TFs. The results showed a preference for CA-dependent suppression of enhancer-bound TFs. This result is consistent with the general observation that stimulus-response transcription is controlled by enhancer-bound TFs (e.g. Kim et al. Nature 2010 182; Azofeifa et al. Genome Res 2018 334; Jones et al. bioRxiv 2024 585303). However, our TFEA enhancer/promoter analysis is preliminary and more work would be needed to address this comment in a rigorous way. Therefore, we did not include this analysis in the revision.
  
  Reviewing Editor Comments:
  
  Main suggestions for improvement:
  
  (1) Provide additional information about the mechanistic basis for the changes in lipid levels observed on kinase inhibition.
  
  We have changed the text to better emphasize that the mechanistic basis involves i) gene expression changes resulting from Mediator kinase inhibition (e.g. Fig 2 – figure supplement 1D, E, Fig 2 – figure supplement 2B, Fig 2 – figure supplement 4B-D); ii) activation of SREBP and PPAR and LXR, based upon IPA results with RNA-seq data (e.g. Fig 2B, Fig 2 – figure supplement 1F, Fig 2 – figure supplement 2D, Fig 2 – figure supplement 4E; Fig 3E), and iii) rapid CA-dependent suppression of p53 function (Fig 6A), which will activate SREBP (Moon et al. Cell 2019 564).
  
  (2) Provide direct genetic evidence that the nuclear receptors are activated by the lipid changes to mediate an anti-inflammatory program in response to Mediator kinase inhibition.
  
  This is an excellent question but we consider it beyond the scope of this already expansive study. That said, we cite several papers in the article that demonstrate that the lipids we observe elevated in CA-treated cells i) directly bind PPAR or LXR and ii) activate their TF function. We also note that the anti-inflammatory impacts of Mediator kinase inhibition are broad, affecting distinct gene sets through transcriptional changes, metabolites, and cytokines. Any NR-specific contributions could be challenging to de-couple from CA-dependent effects using knockdown or depletion methods, given the compensatory responses that would result.
  
  (3) Improve/expand the evidence that Mediator kinase inhibition confers reduced mitochondrial function.
  
  We have added new Seahorse data for sibling-matched D21 and T21 cells (±CA) for the version of record. Our prior results showed reduced mitochondrial function and OCR in CA-treated T21 cells. We have added data that compares D21 and T21 mitochondrial function. As shown in new Figure 2 – figure supplement 4A-C, the T21 line shows higher basal levels of OCR and ECAR compared with D21. These results are consistent with a Down syndrome mouse model study published last year [Sarver et al. eLife 2023 e86023]. When we compare CA-treated T21 with D21 cells, mitochondrial respiration and OCR are similar, suggesting that Mediator kinase inhibition might normalize mitochondrial function (and ECAR) toward D21 levels. We show this comparison in new Figure 2 – figure supplement 4D-F. Although this may suggest a therapeutic benefit, we emphasize that more experiments would be needed to make such claims with confidence.
  
  (4) Determine whether mitochondrial function is reduced in the T21 cells vs. the D21 controls and whether kinase inhibition with the inhibitor reduces mitochondrial function to pathological levels.
  
  For the version of record, we have added a direct comparison of mitochondrial parameters and OCR in the sibling-matched D21/T21 lines. The data show that T21 cells have higher OCR compared with D21. These results are consistent with a Down syndrome mouse model study published last year [Sarver et al. eLife 2023 e86023]. Our results also indicate that CA treatment brings OCR and other "mitochondrial parameters" in T21 cells toward D21 levels, as noted above.
  
  (5) Consider whether the CDK8/19 inhibitor has off-target effects that would lessen its therapeutic value.
  
  We chose cortistatin A (CA) for this project because it is the most potent and selective inhibitor available for targeting CDK8/CDK19. Initial published reports suggested off-target effects (Cee et al. Angew Chem IEE 2009), but these experiments used binding assays against the kinase protein alone, and did not measure binding or inhibition with biologically relevant, active kinase complexes. Kinome-wide screens involving native, active kinase complexes showed no evidence of off-target effects for cortistatin A, even at concentrations 5000-times the measured KD (Pelish et al. Nature 2015). See Author response image 1.
  
  Related to CA therapeutic value, that is an important issue but beyond the scope of this study. We consider CA a valuable chemical probe, to use as a means to define CDK8/CDK19-dependent functions in cell line models. As a chemical probe, we consider CA the "best-in-class" Mediator kinase inhibitor, based upon all available data (Clopper & Taatjes Curr Opin Chem Biol 2022 102186).
  
  That said, we understand the concern about off-target effects, which can never be ruled out with a chemical inhibitor. We include quantitative western data (Fig 1 – figure supplement 1A) that compares CA with a structurally distinct CDK8/CDK19 inhibitor, CCT251545. The data show that, as expected, CA (100nM) and CCT251545 (250nM) similarly inhibit STAT1 S727 phosphorylation in IFN-stimulated cells. The samples were pre-treated with inhibitor for 30 minutes prior to IFNg and collected 45 minutes after IFNg treatment.
  
  We did not complete any experiments with knockouts or kinase-dead alleles, primarily because they are not reliable comparisons for chemical inhibition, given the different time frames involved. For example, there will be genetic compensation in edited cell lines (Rossi/Stanier Nature 2015 230) and we and others have shown that there are major differences between kinase protein loss through knockdown or knockout methods vs. rapid inhibition with small molecules (e.g. Poss et al. Cell Rep 2016 436; Sooraj et al. Mol Cell 2022 123).
  
  Author response image 1.
  
  Information about cortistatin A. A) KiNativ kinome screen from HEK293 lysates. CA blocked capture of only CDK8/CDK19 in this MS-based assay, among over 200 kinases detected. B) Equilibrium binding constants and kinetics for CA. C) CA structure; note the dimethylamine is protonated at physiological pH, and forms a pi-cation interaction with W105 (crystal structure, panel D). Only CDK8 and CDK19 have an aromatic residue (W) at this position, providing a structural basis for high selectivity.
  
  (6) Improve the presentation of the splicing data and better discuss how the splicing alterations may be contributing to the disease phenotype.
  
  We have added new splicing data for the version of record. Specifically, we have added splicing data analysis for the "non-sibling" T21 cell line (±cortistatin A, t=4.5h) and for the sibling T21 line (±cortistatin A) at t=24h. The results are summarized in new Figure 5 – figure supplement 2. The data are in agreement with our prior results from the sibling T21 line ±CA at t=4.5h. In particular, i) similar numbers of genes were impacted by splicing changes (alternative exon inclusion or alternative exon skipping) in CA-treated cells in the "non-sibling" T21 line compared with the sibling T21 line; ii) upon completion of a pathway analysis of these alternatively spliced genes, similar pathways, including IFN signaling pathways, were affected by CA in each case (non-sibling T21 vs. sibling T21); iii) regarding the new t=24h timepoint for the sibling T21 line, similar numbers of genes were alternatively spliced (alternative exon inclusion or alternative exon skipping) in CA-treated cells compared with the 4.5h timepoint, and iv) the IPA results with the alternately spliced genes identified inflammatory signaling, mRNA processing, nucleotide and lipid metabolism among other pathways, which broadly reflect the cytokine screen and metabolomics data in CA-treated cells (t=24h).
  
  Additional evidence for CDK8/CDK19 regulation of splicing comes from our t=24h RNA-seq data in T21 cells ±CA. GSEA results from sibling T21 cells ±CA revealed down-regulation of many pathways related to RNA processing and splicing (RNA-seq data, t=24h), suggesting that the splicing changes caused by Mediator kinase inhibition result from reduced expression of splicing regulators, at least at longer timeframes. These results are summarized in new Figure 2 – figure supplement 2E.
  
  Related to how splicing alterations may be contributing to the CA-dependent effects and their potential therapeutic implications, this is an interesting question but open-ended. It will not be straightforward to link specific splicing changes to possible therapeutic outcomes, especially given that there are hundreds of genes affected and because the effects are modest (i.e. not all-or-nothing).
  
  Reviewer #1 (Recommendations For The Authors):
  
  The finding that CA treatment leads to upregulation of as many genes as are downregulated is consistent with previous studies of a 50:50 role for the CKM. However, most previous studies utilized knockout alleles or knockdown approaches. As the authors demonstrated in a previous study, CA inhibits kinase activity without changing CDK8 levels. Does this indicate that the kinase activity of Cdk8/19 is required for transcriptional repression? Previous in vitro studies suggested that Cdk8/19-dependent repression was independent of their kinase activity. The authors should comment on this.
  
  This is a challenging question to address, because the answer will depend on the timing of the experiment and the experimental context. The short answer is that the kinase activity of CDK8/19 will activate some genes and reduce expression of others, at least in part because CDK8/19 phosphorylate TFs, which drive global gene expression programs. TF phosphorylation by CDK8/19 appears to activate some genes and repress others (e.g. STAT1 S727A example from Steinparzer et al. Mol Cell 2019 485), at least based upon RNA-seq data, but this doesn't measure the immediate effects on the transcriptome. It is true that kinase activity isn't required to block pol II incorporation into the PIC (Knuesel et al. Genes Dev 2009 439). This is a kinase-independent function of the module; MKM-Mediator binding will block Mediator-pol II interaction and therefore block PIC assembly and pol II initiation (Knuesel 2009; Ebmeier & Taatjes PNAS 2010 11283). The kinase-independent functions of CDK8/19 were not a focus of the work described here. We only focus on Mediator kinase activity. We also do not focus on potential effects on RNAPII initiation or PIC assembly, although these are important peripheral topics.
  
  Descriptors are less useful as the reader must go back to reconstruct the experiment: "Although metabolites were measured 24h after CA treatment, these data suggest that altered lipid metabolites influence LXR and PPAR function". Does "altered" mean the lipid concentrations were up or down? Similarly, lipids that "influenced" LXR function - were they stimulatory or inhibitory?
  
  Good point. Where possible, we used more accurate language when describing CA-dependent changes.
  
  I found many sections in the text confusing. For example: Figure 3. Mediator kinase inhibition antagonizes IFNγ transcriptional responses in T21 and D21. It takes a while to unpack this figure title. Instead of the double negative, the authors could simply state that "Mediator kinase is required for IFN-dependent transcriptional activation". Describing the protein activity, versus the drug-induced phenotype, can often clarify complicated scenarios.
  
  Good idea. We have edited the text to eliminate some but not all of these double negatives. In some cases we prefer to describe the consequence of kinase inhibition.
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The splicing data analysis is compelling, but not well integrated into the overall story and it cuts the storytelling logic in the Abstract. The authors could consider better integrating the large amount of data generated and better explaining how it relates to the various aspects of the proposed model (transcriptional, metabolism) to help improve potential cause-and-effect outcomes.
  
  We agree. The large amount of data, combined with the different experimental approaches, makes it a challenge to summarize the data in a concise way. We have done our best to organize the results in a logical and clear manner. To address this comment, we have gone through the text and re-organized where possible, and we have edited the abstract. We have added new splicing data and the splicing results are now better integrated (in our opinion) in part because of the pathway results from the t=24h ±CA RNA-seq data, which show major reductions in gene sets related to splicing and RNA processing.
  
  (2) The manuscript could improve its readability by providing specific details throughout. Examples include i) explaining why and what 29 cytokines were chosen for the screen (p. 3, p. 4) ii) providing major data analysis conclusions to the cytokine screen part (p. 3) iii) expanding the conclusions to the metabolic pathway analysis (p. 4) iv) being more precise when referring to T21-specific changes (up or down?) (p. 4), and "significantly altered" by CA treatment in T21 cells (up or down?) (p. 5).
  
  Good points. We have edited the text to address these comments. Please note that the 29 cytokines refer to a different study (Malle et al. Nature 2023) and we had no role in selecting the cytokines. Our screen involved 105 cytokines that were arrayed as part of a commercially available panel.
  
  (3) The figures are outstanding but dense (e.g., Figure 1b, can any simplification and/or highlighting be done to underscore important features?). Some panels are illegible (e.g. Figure 1- supplement Figure 2a and b). The authors could improve data presentation. For example, the Venn diagrams (e.g., Figure 2f) are hard to quickly digest. Can the authors find a better way to highlight important data (e.g., hard to distinguish the meaning of font bolding from italics)?
  
  Thank you for these suggestions. Regarding Figure 1B, we simplified the metabolic pathways to emphasize the biochemicals that specifically relate to this study. We decided against highlighting specific metabolites beyond this simplification, because in our opinion it causes as many problems as it solves. Where possible, we have enlarged the panels with hard-to-read text. The Venn diagrams convey a large amount of information in a single panel: increased or decreased gene expression in T21 or D21, cytokine genes or cytokine receptors, and gene expression convergence or divergence compared with protein levels from cytokine screens. The results could be displayed differently, but doing so would require additional data panels to parse them out. That might be considered better, but we opted for a more information-rich format that requires only a single data panel. Given the large amount of data already shown, we hope the reviewer can understand this choice.
  
  AuthorResponse
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.07.05.547813v4